for texture system in advance. Patch by Martijn Berger, with some tweaks.
There was about a 10% performance improvement on OS X in my tests with the
images.blend test file. This may be less on other platforms because OS X has
particularly slow mutex locks.
* Render Passes are now available for Subsurface Scattering (Direct, Indirect and Color pass).
This is part of my GSoC project, SVN merge of r58587, r58828 and r58835.
* Added a node to convert a temperature in Kelvin to an RGB color. This can be used e.g. for lights, to easily find the right color temperature.
= Some common temperatures =
Candle light: 1500 Kelvin
Sunset/Sunrise: 1850 Kelvin
Studio lamps: 3200 Kelvin
Horizon daylight: 5000 Kelvin
Documentation: http://wiki.blender.org/index.php/Doc:2.6/Manual/Render/Cycles/Nodes/More#Blackbody
Thanks to Philipp Oeser (lichtwerk), who essentially contributed to this with a patch! :)
This is part of my GSoC 2013 project. SVN merge of r57424, r57487, r57507, r57525, r58253 and r58774
* Added a Ray Depth output to the Light Path node, which gives the user access to the current bounce.
This can be used to limit the maximum ray bounce on a per shader basis. Another use case is to restrict light influence with this, to have a lamp only contribute to the direct lighting.
http://wiki.blender.org/index.php/Doc:2.6/Manual/Render/Cycles/Nodes/More#Light_Path
This is part of my GSoC 2013 project. SVN merge of r58091 and r58772 from soc-2013-dingto.
* Avoid check for !LABEL_TRANSPARENT in "kernel_path_non_progressive_lighting", transparency is either handled in the outer loop or in the "kernel_path_indirect" function, but not here.
* Increase the maximum amount of closures per shader from 16 to 64, so more complex closure trees can be rendered.
I measured performance on CPU and GPU (Geforce 540M) and couldn't find a performance impact, but if someone encounters a noticeable impact on his system, please report.
* After some more thinking, solved the remaining ToDos. :)
* Added is_object check to check if we have a valid object.
* If we operate on the world, and try to convert from/to object space, we now assume world space instead, same as OSL.
* Implementation of the node for SVM. This covers all possible transformations: World <> Object <> Camera space.
As far as I can tell, it also works fine with Motion Blur enabled.
ToDo:
* SVM differs from OSL, when the node is used on the world.
* Reshuffle SSE #ifdefs to try to avoid compilation errors enabling SSE on 32 bit.
* Remove CUDA kernel launch size exception on Mac, is not needed.
* Make OSL file compilation quiet like c/cpp files.
* On nvidia Kepler GPUs (sm_30 and above), there are now 145 byte images available, instead of 95.
We could extend this to about 200 if needed.
Could not test this, as I don't have a Kepler GPU, so feedback on this would be appreciated.
Thanks to Brecht for review and some fixes. :)
* Add CUDA compiler version detection to cmake/scons/runtime
* Remove noinline in kernel_shader.h and reenable --use_fast_math if CUDA 5.x
is used, these were workarounds for CUDA 4.2 bugs
* Change max number of registers to 32 for sm 2.x (based on performance tests
from Martijn Berger and confirmed here), and also for NVidia OpenCL.
Overall it seems that with these changes and the latest CUDA 5.0 download, that
performance is as good as or better than the 2.67b release with the scenes and
graphics cards I tested.
On the BMW scene, this gives roughly a 10% speedup overall with clang/gcc, and 30%
speedup with visual studio (2008). It turns out visual studio was optimizing the
existing code quite poorly compared to pretty good autovectorization by clang/gcc,
but hand written SSE code also gives a smaller speed boost there.
This code isn't enabled when using the hair minimum width feature yet, need to
make that work with the SSE code still.
* Enable the Non-Progressive integrator on GPU (CUDA) for testing.
In order to compile the CUDA kernel with it, you need at least 6GB of system memory and CUDA Toolkit 5.0 or 5.5.
It should also work with CUDA Toolkit 4.2, but in this case you should have 12GB of RAM.
In case any problems arise, just change line 65 of kernel_types.h to disable Non-Progressive again.
-- #define __NON_PROGRESSIVE__
++ //#define __NON_PROGRESSIVE__
* Replaced the Brute Force version with a nice lookup table, this speeds it up a lot.
Patch by Philipp Oeser (lichtwerk) with some cleanup and changes by myself. Thanks!
ToDo:
* Temperature values between 800 and 804 Kelvin are wrong in SVM, check on this.
* First (brute force) implementation for SVM. This works and delivers the same result as OSL, but it's slow.
* Code inside svm_blackbody.h inspired by a patch by Philipp Oeser (#35698), thanks.
Ideas:
* Use a lookup table to perform the calculations on render/ level.
* Implement it as a RNA property only, and do the calculation like Sun/Sky precompute.