* First step towards a new vector transform node, to convert Points/Vectors between World/Object/Camera space.
This only contains the Blender UI, RNA... code, no Cycles integration yet.
It removes buttons_header.c file, adds a (small) space_properties.py one, with a PROPERTIES_HT_header class, which simply uses the RNA enum to draw the context buttons.
It also fixes that enum, btw, it always featured all contexts, which means you could (try to!) set through RNA invalid contexts...
Thanks to brecht and dingto for the reviews.
* Add CUDA compiler version detection to cmake/scons/runtime
* Remove noinline in kernel_shader.h and reenable --use_fast_math if CUDA 5.x
is used, these were workarounds for CUDA 4.2 bugs
* Change max number of registers to 32 for sm 2.x (based on performance tests
from Martijn Berger and confirmed here), and also for NVidia OpenCL.
Overall it seems that with these changes and the latest CUDA 5.0 download, that
performance is as good as or better than the 2.67b release with the scenes and
graphics cards I tested.
Perhaps real fix would be to make all parts of blender
mandatory and not switchable off, so every area of code
would be compiled and verified after no-functional-changes
commits.
caused by recent changes in normal calculation, however curves were not being very smart about calculating modifiers (calling unneeded re-tessellation for every modifier)
On the BMW scene, this gives roughly a 10% speedup overall with clang/gcc, and 30%
speedup with visual studio (2008). It turns out visual studio was optimizing the
existing code quite poorly compared to pretty good autovectorization by clang/gcc,
but hand written SSE code also gives a smaller speed boost there.
This code isn't enabled when using the hair minimum width feature yet, need to
make that work with the SSE code still.
the distance checks could get into a feedback loop so that the result depended on the order of verts/edges.
now you can randomize vert/edge/faces and get exactly the same results.
also made some internal improvements,
- used fixed sized arrays (no need to realloc).
- use vertex tag flags rather then a visit-hash.
- remove 'tots' array that did nothing (not sure why it was added).
* Enable the Non-Progressive integrator on GPU (CUDA) for testing.
In order to compile the CUDA kernel with it, you need at least 6GB of system memory and CUDA Toolkit 5.0 or 5.5.
It should also work with CUDA Toolkit 4.2, but in this case you should have 12GB of RAM.
In case any problems arise, just change line 65 of kernel_types.h to disable Non-Progressive again.
-- #define __NON_PROGRESSIVE__
++ //#define __NON_PROGRESSIVE__