When using the Normal output of the Texture Coordinate node on Point and Spot lamps, the coordinates now depend on the rotation of the lamp.
On Area lamps, the Parametric output of the Geometry node now returns UV coordinates on the area lamp.
Credit for the Area lamp part goes to Stefan Werner (from D1995).
Oh man, is it a compiler bug? Is it something we do stupid?
For now more crap to prevent crashes. During the conference will talk to
Maxyn about how can we troubleshoot such weird issues.
Basically don't use rcp() in areas which seems to be critical after
second look. Also disabled some multiplication operators, not sure
yet why they might be a problem.
Tomorrow will be setting up a full test with all cases which were
buggy in our farm to see if this fix is complete.
There is some precision issues for big magnitude coordinates which started
to give weird behavior of release builds. Some weird memory usage in BVH
which is tricky to nail down because only happens in release builds and GDB
reports all variables as optimized out when trying to use RelWithDebInfo.
There are two things in this commit:
- Attempt to make vectorized code closer to original one, hoping that it'll
eliminate precision issue.
This seems to work for transform_point().
- Similar trick did not work for transform_direction() even tho absolute
error here is much smaller. For now disabled that function, need a more
careful look here.
Several ideas here:
- Optimize calculation of near_{x,y,z} in a way that does not require
3 if() statements per update, which avoids negative effect of wrong
branch prediction.
- Optimization of direction clamping for BVH.
- Optimization of point/direction transform.
Brings ~1.5% speedup again depending on a scene (unfortunately, this
speedup can't be sum across all previous commits because speedup of
each of the changes varies from scene to scene, but it still seems to
be nice solid speedup of few percent on Linux and bigger speedup was
reported on Windows).
Once again ,thanks Maxym for inspiration!
Still TODO: We have multiple places where we need to calculate near
x,y,z indices in BVH, for now it's only done for main BVH traversal.
Will try to move this calculation to an utility function and see if
that can be easily re-used across all the BVH flavors.
Similar to the previous commit, avoid negative effect of bad branch prediction.
Gives measurable performance up to ~2% in tests here.
Once again, thanks to Maxym Dmytrychenko!
The idea here is to avoid if statements which could cause wrong
branch prediction.
Gives a bit of measurable speedup up to ~1%. Still nice :)
Inspired by Maxym Dmytrychenko, thanks!
Similar to regular triangle intersection case. Gives about 3% speedup rendering
SSS object on my desktop,
Question: how to avoid such a code duplication in a nice way without speed loss?
This will confuse hell of a guarded allocators because it is possible
to have allocation happened prior to Blender's guarded allocator is
fully initialized.
This was causing crashes and assert failures when running blender
with fully guarded memory allocator.
Initialization order of global stats and node types was not strictly
defined and it was possible to have node types initialized first and
stats after that. This will zero out memory which was allocated from
the statistics causing assert failure when de-initializing node types.
It was possible to have non-initialized unaligned BVH split
to be used when regular BVH split SAH was inf. Now we ensure
that unaligned splitter is only used when it's really initialized.
It's a regression and should be in 2.78a.
Note that volume rendering is not supported yet, this is a step towards that.
Reviewed By: brecht
Differential Revision: https://developer.blender.org/D2299
Previously an error message would be printed whenever the OpenCL build produced output.
However, some frameworks seem to print extra information even if the build succeeded, so now the actual returned error is checked as well.
When --debug-cycles is activated, the build output will always be printed, otherwise it only gets printed if there was an error.
The problem here was, as the title says, that the two kernels were swapped.
Since shader evaluation is only used for building the samling map when World MIS is enabled, rendering without it would still work fine, although baking also was broken.
The previous refactor changed the code to use a separate logging mechanism to support multithreaded compilation.
However, since that's not supported by any frameworks yes, it just resulted in bad logging behaviour.
So, this commit changes the logging to go diectly to stdout/stderr once again by default.
This was giving some speedup but made intersection tests to fail
from watertight point of view.
Needs deeper investigation, but need to quickly get it fixed for
the studio.
This gives about 5% speedup on AVX2 kernels (other kernels still
have SSE disabled for math operations) and this solves the slowdown
of koro scene mention in the previous commit.
The title says it all actually. This commit also contains
changes to pass float3 as const reference in affected functions.
This should make MSVC happier without breaking OpenCL because it's
only done in areas which are ifdef-ed for non-OpenCL.
Another patch based on inspiration from Maxym Dmytrychenko, thanks!
This commit basically vectorizes existing code using AVX2 instructions
(without modifying algorithm itself). This gives quite nice speedups:
BMW: -8%
Classroom: -5%
Cat: -5%
Koro: +1%
Barcelona: -8%
That's on Linux machine, reported performance improvement on Windows
goes up to 20%.
Not currently sure why Koro is somewhat slower because it mainly uses
curve intersection tests, could be a time noise? Or osmething with the
cache utilization perhaps? In any case speedup in other scenes makes
me thinking that current state is acceptable for initial implementation.
This is again inspired by Maxym Dmytrychenko.
Based on existing ssef data type and to my knowledge it's also what happens in
Embree nowadays.
Inspired by Maxym Dmytrychenko and required for the upcoming triangle
intersection commit.
Hopefully the copyright message is correct.
When ray hits curve segment with SSS shader it was possible to have
uninitialized hit_P variable used for sampling.
Seems that was a reason of our headache of difference between AVX2
and SSE4 render results here, so now we can revert all the nasty
ifdef-ed inline policies.
Mostly this is making inlining match CUDA 7.5 in a few performance critical
places. The end result is that performance is now better than before, possibly
due to less register spilling or other CUDA 8.0 compiler improvements.
On benchmarks scenes, there are 3% to 35% render time reductions. Stack memory
usage is reduced a little too.
Reviewed By: sergey
Differential Revision: https://developer.blender.org/D2269