Stefan Werner 47da8dcbca Cycles: Improved thread order for better CUDA performance.
This patch puts threads that render the same pixel closer together,
as opposed to threads that render the same sample. Thus threads
within a warp are more coherent in memory access and control flow,
leading to performance improvements.

Example benchmarks on a Quadro RTX4000 (WDDM) on Windows 10:
Koro:                 4:23 ->  3:46
BMW:                  1:18 ->  1:25
Barbershop Interior: 17:52 -> 14:55
Classroom:            4:37 ->  3:45

Performance differences on OpenCL/AMD were hit and miss, some scenes
became faster, others lost significantly. Therefore, this is kept as
CUDA only change for now.
2019-03-14 11:45:58 +01:00
2018-12-14 08:13:55 +11:00
2019-01-22 12:50:13 +01:00
2010-10-13 14:44:22 +00:00
Description
No description provided
841 MiB
Languages
C++ 78%
Python 14.9%
C 2.9%
GLSL 1.9%
CMake 1.2%
Other 0.9%