Don't use quick sort for small arrays, bubble sort works way faster for small
arrays due to cache coherency. This is what qsort() from libc is doing actually.
We can also experiment unrolling some extra small arrays, for example 3 and 4
element arrays.
This reduces tangent space calculation for dragon from 3.1sec to 2.9sec.
Brings tangent space calculation from 4.6sec to 3.1sec for dragon model in BI.
Cycles is also somewhat faster, but it has other bottlenecks.
Funny thing, using simple `static inline` already gives a lot of speedup here.
That's just answering question whether it's OK to leave decision on what to
inline up to a compiler..
Would be nice to be able to catch this with assert as well, will see what would
be the best way to do this/.\
Need to verify with Mai that this solves crash for her and maybe consider
porting this to 2.79.
Fishy cat benchmark was rendering with wrong shadows. Cause is unclear,
adding printf or rearranging code seems to avoid this issue, possibly a
compiler bug. This reverts the fix and solves the OSL bug elsewhere.
This was needed when we accessed OSL closure memory after shader evaluation,
which could get overwritten by another shader evaluation. But all closures
are immediatley converted to ShaderClosure now, so no longer needed.
While unlikely to have had any serious effects because of limited use, the
previous implementation was not actually atomic due to a data race and
incorrectly coded CAS loop. We also had duplicates of this code in a few
places, it's now been moved to a single location with all other atomic
operations.
We need to make sure we can store all volume closures for all objects in volume
stack. This is a bit tricky to detect what would be the "nestness" level of
volumes so for now use maximum possible stack depth. Might cause some slowdown,
but better to give reliable render output than to fail quickly.
Should be safe for 2.79 after extra eyes.
We should only early out with any hit in BVH traversal if the only visibility
bits used are opaque shadow. Not when opaque shadow is one of multiple bits.
Also pass by value and don't write back now that it is just a hash for seeding
and no longer an LCG state. Together this makes CUDA a tiny bit faster in my
tests, but mainly simplifies code.
This implements Arvo's "Stratified sampling of spherical triangles". Similar to how we sample rectangular area lights, this is sampling triangles over their solid angle. It does significantly improve sampling close to the triangle, but doesn't do much for more distant triangles. So I added a simple heuristic to switch between the two methods. Unfortunately, I expect this to add render time in any case, even when it does not make any difference whatsoever. It'll take some benchmarking with various scenes and hardware to estimate how severe the impact is and if it is worth the change.
Reviewers: #cycles, brecht
Reviewed By: #cycles, brecht
Subscribers: Vega-core, brecht, SteffenD
Tags: #cycles
Differential Revision: https://developer.blender.org/D2730
This patch adds "Pixel Size" to the performance options, which allows to render
in a smaller resolution, which is especially useful for displays with high DPI.
Reviewers: Severin, dingto, sergey, brecht
Reviewed By: brecht
Subscribers: Severin, venomgfx, eyecandy, brecht
Differential Revision: https://developer.blender.org/D1619
Basically, make re-alloc and memcpy from the same lock, otherwise one
thread might be re-allocating thread while another one is trying to
copy data there.
Reported by Mohamed Sakr in IRC, thanks!
It doesn't seem that useful in practice, was mostly added to match some
other renderers but also seems to be causing user confusing and accidental
long render times. So let's just keep the UI simple and remove this.
Differential Revision: https://developer.blender.org/D2768
We're adding some bias by default, which now I think is the right thing
to do from a usability point of view since you really need to use those
options anyway to get clean renders in a practical time.
Differential Revision: https://developer.blender.org/D2769
This is a bit confusing, especially when one mixes OpenCL code where ulong equals
to uint64_t with CPU side code where ulong is expected to be something else from
the naming.
This commit makes it so we use explicit name, common on all platforms.
Problem was that some code checks to see if device_pointer is null or
not and the new allocator wasn't even setting the pointer to anything
as it tracks memory location separately. Setting the pointer to non
null keeps all users of device_pointer happy.
- Apparently MSVC does not support compound literals
in C++ (at least by the looks of it).
- Not sure how opencl_device_assert was managing to
set protected property of the Device class.
We don't enable global SSE optimizations in regular kernel, and we
keep those disabled on Linux 32bit.
One possible workaround would be to pass arguments by ccl_ref, but
that is quite a few of code which better be done accurately.
Steps to reproduce:
- Create shader Image texture -> Diffuse BSDF -> Output. Do NOT select image yet!
- Start viewport render.
- Select image from the ID browser of Image Texture node.
Thing is: with the memory manager we always need to inform device that memory
was freed.
Common folks, nobody considered master a C++11 only branch. Such decision is to
be done officially and will involve changes in quite a few infrastructure related
areas.