* Added a Ray Depth output to the Light Path node, which returns the current ray bounce (0, 1, 2, 3...)
* This makes it possible to use different shaders for direct and indirect lighting, and to create artificial effects.
Examples:
* http://www.pasteall.org/pic/show.php?id=55158 Here we use the output to apply a different shader to the third bounce. As in this example, you can use Math Nodes (Greater Than / Less Than) if you want to use values outside of the 0/1 range.
* http://www.pasteall.org/pic/show.php?id=55159 Here we restrict the maximum bounce on a per-shader basis for the left sphere. The result looks as if the maximum bounces were set to 1 in the scene "Light paths" panel.
This can be used, for example, to improve performance for objects far from the camera that do not need full GI.
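A minimal sketch of the node logic from the second example, written as plain Python rather than actual Cycles code. The function names, the shader labels "diffuse" and "black", and the max_bounce parameter are all hypothetical; only the idea of feeding Ray Depth through a Greater Than math node comes from the notes above.

```python
def greater_than(a, b):
    """Mimics a Math node set to 'Greater Than': returns 1.0 or 0.0."""
    return 1.0 if a > b else 0.0


def mix_shader(fac, shader_a, shader_b):
    """Mimics a Mix Shader driven by a 0/1 factor: 0 picks A, 1 picks B."""
    return shader_b if fac >= 0.5 else shader_a


def shader_for_bounce(ray_depth, max_bounce=1):
    """Pick a shader label for the current bounce (hypothetical helper).

    Rays up to max_bounce see the regular shader; deeper rays see a
    black shader, so they stop contributing light for this material.
    """
    fac = greater_than(ray_depth, max_bounce)
    return mix_shader(fac, "diffuse", "black")
```

With max_bounce=1, bounces 0 and 1 use the regular shader and every deeper bounce gets the black one, which approximates a per-shader bounce limit.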
Technical notes:
* Implemented for both integrators and SVM/OSL.
* This is done by passing state.bounce to the shader_setup_from_* functions.
* Note: state.bounce is not passed to kernel_shader_evaluate(), so shader_setup_from_displacement() does not set the value; displacement is evaluated outside the path trace loop. Maybe a ToDo?
RGB color components gave non-grey results when you might not expect it.
What happens is that some of the color channels are zero in the direct light
pass because their channel is zero in the color pass. The direct light pass is
defined as lighting divided by the color pass, and we can't divide by zero. We
do a division after all samples are added together to ensure that multiplication
in the compositor gives the exact combined pass even with antialiasing, DoF, etc.
Found a simple tweak here: instead of setting such channels to zero, it now sets
them to the average of the other non-zero color channels, which makes the results
look like the expected grey.
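A rough sketch of the tweak in plain Python, not the actual kernel code. The function name is hypothetical, and it assumes that "average of other non-zero color channels" means the average of the divided results for channels whose color value is non-zero.

```python
def divide_by_color(light, color):
    """Divide accumulated lighting by the accumulated color pass, per channel.

    Channels whose color value is zero cannot be divided; instead of
    leaving them at zero they receive the average of the channels that
    could be divided (assumption, see the note above this block).
    """
    divided = [l / c if c != 0.0 else None for l, c in zip(light, color)]
    valid = [d for d in divided if d is not None]
    # If every channel is zero there is nothing to average, so use 0.0.
    fallback = sum(valid) / len(valid) if valid else 0.0
    return [d if d is not None else fallback for d in divided]
```

For a surface color with a zero green channel, the green channel of the direct pass then matches the red and blue ones instead of dropping to zero.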
The issue is caused by missing SSE flags for Clang compilers;
these flags were only set for GNU C compilers.
Added an if branch for Clang, which contains the same
flags apart from -mfpmath=sse, because Clang reports
that one as an unused argument.
OSX probably needs some further checks, since it also
uses Clang. I have no idea why it could have
worked on OSX before.
* After some more thinking, solved the remaining ToDos. :)
* Added is_object check to check if we have a valid object.
* If we operate on the world, and try to convert from/to object space, we now assume world space instead, same as OSL.
* Implementation of the node for SVM. This covers all possible transformations: World <> Object <> Camera space.
As far as I can tell, it also works fine with Motion Blur enabled.
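A small sketch of how conversions between the three spaces can be routed through world space, in plain Python rather than SVM code. All names here are hypothetical, the example matrices are pure translations chosen so the inverses are obvious, and only points are handled. The is_object fallback mirrors the behaviour described in the notes above (object space is treated as world space when there is no valid object).

```python
IDENTITY = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]


def translation(tx, ty, tz):
    """4x4 translation matrix as nested lists (row-major)."""
    return [[1, 0, 0, tx], [0, 1, 0, ty], [0, 0, 1, tz], [0, 0, 0, 1]]


def transform_point(m, p):
    """Apply a 4x4 affine matrix to a 3D point."""
    x, y, z = p
    return tuple(m[i][0] * x + m[i][1] * y + m[i][2] * z + m[i][3]
                 for i in range(3))


def convert(p, src, dst, to_world, from_world, is_object=True):
    """Convert point p from space src to space dst via world space."""
    if not is_object:
        # No valid object (e.g. world shader): assume world space instead.
        src = "world" if src == "object" else src
        dst = "world" if dst == "object" else dst
    return transform_point(from_world[dst], transform_point(to_world[src], p))


# Example setup: object sits at x=1, camera sits at z=5 (made-up values).
TO_WORLD = {"world": IDENTITY, "object": translation(1, 0, 0),
            "camera": translation(0, 0, 5)}
FROM_WORLD = {"world": IDENTITY, "object": translation(-1, 0, 0),
              "camera": translation(0, 0, -5)}
```

Going through world space keeps the number of matrices small: each space only needs its to-world and from-world transform, and any pair of spaces can be combined from those.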
ToDo:
* SVM differs from OSL, when the node is used on the world.
* Reshuffle SSE #ifdefs to try to avoid compilation errors enabling SSE on 32 bit.
* Remove CUDA kernel launch size exception on Mac; it is not needed.
* Make OSL file compilation quiet like c/cpp files.
texture coordinate that should automatically use the default normal or texture
coordinate appropriate for that node, rather than some fixed value specified by
the user.
* On NVIDIA Kepler GPUs (sm_30 and above), there are now 145 byte images available, instead of 95.
We could extend this to about 200 if needed.
Could not test this, as I don't have a Kepler GPU, so feedback on this would be appreciated.
Thanks to Brecht for review and some fixes. :)
* Add CUDA compiler version detection to cmake/scons/runtime
* Remove noinline in kernel_shader.h and reenable --use_fast_math if CUDA 5.x
is used; these were workarounds for CUDA 4.2 bugs
* Change max number of registers to 32 for sm 2.x (based on performance tests
from Martijn Berger and confirmed here), and also for NVidia OpenCL.
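The version-dependent choices above could be sketched like this in Python. This is an illustration only, not the cmake/scons logic from the commit: the function names are hypothetical, the version number is assumed to use the CUDA_VERSION encoding (major * 1000 + minor * 10), and "CUDA 5.x" is interpreted here as major version 5 or newer. The two nvcc options themselves, --use_fast_math and --maxrregcount, are existing nvcc flags.

```python
def cuda_major_minor(version):
    """Split a CUDA version number such as 5000 into (major, minor)."""
    return version // 1000, (version % 1000) // 10


def nvcc_flags(cuda_version, sm_major):
    """Collect nvcc flags for a given CUDA version and sm major version."""
    flags = []
    major, _minor = cuda_major_minor(cuda_version)
    if major >= 5:
        # Fast math was disabled only as a workaround for CUDA 4.2 bugs.
        flags.append("--use_fast_math")
    if sm_major == 2:
        # Register limit for sm 2.x as stated in the notes above.
        flags.append("--maxrregcount=32")
    return flags
```

For example, nvcc_flags(5000, 2) yields both flags, while a CUDA 4.2 build for sm_30 gets neither in this sketch; register limits for other architectures are not stated above and are left out.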
Overall it seems that with these changes and the latest CUDA 5.0 download,
performance is as good as or better than the 2.67b release with the scenes and
graphics cards I tested.
On the BMW scene, this gives roughly a 10% speedup overall with clang/gcc, and a 30%
speedup with Visual Studio (2008). It turns out Visual Studio was optimizing the
existing code quite poorly compared to the pretty good autovectorization by clang/gcc,
but hand-written SSE code also gives a smaller speed boost there.
This code isn't enabled yet when the hair minimum width feature is used; that
still needs to be made to work with the SSE code.