This is an initial implementation of BVH8 optimization structure
and packated triangle intersection. The aim is to get faster ray
to scene intersection checks.
Scene BVH4 BVH8
barbershop_interior 10:24.94 10:10.74
bmw27 02:41.25 02:38.83
classroom 08:16.49 07:56.15
fishy_cat 04:24.56 04:17.29
koro 06:03.06 06:01.45
pavillon_barcelona 09:21.26 09:02.98
victor 23:39.65 22:53.71
As memory goes, peak usage raises by about 4.7% in a complex
scenes.
Note that BVH8 is disabled when using OSL, this is because OSL
kernel does not get per-microarchitecture optimizations and
hence always considers BVH3 is used.
Original BVH8 patch from Anton Gavrikov.
Batched triangles intersection from Victoria Zhislina.
Extra work and tests and fixes from Maxym Dmytrychenko.
With small tiles, the repeated allocations on GPUs can actually slow down the denoising quite a lot.
Allocating the buffer just once reduces rendertime for the default cube with 16x16 tiles and denoising on a mobile 1050 from 22.7sec to 14.0sec.
Textures in 16 bit integer format are sometimes used for displacement, bump and normal maps and can be exported by tools like Substance Painter. Without this patch, Cycles would promote those textures to single precision floating point, causing them to take up twice as much memory as needed.
Reviewers: #cycles, brecht, sergey
Reviewed By: #cycles, brecht, sergey
Subscribers: sergey, dingto, #cycles
Tags: #cycles
Differential Revision: https://developer.blender.org/D3523
This deduplicates the calls for tile (un)mapping and allows to have a target buffer that is different from the source buffer (needed for baking and animation denoising).
The latest clang compiler (at least the one in Xcode 9.4.1) warns about the register keyword and macro expansions using defined().
Since these warnings come from third party code, we can't address them directly in Blender. Silencing them via #pramgas will
at least keep the warnings during a build down to the ones that are relevant to Blender code.
This brings separate initialization for libcuda and libnvrtc, which
fixes Cycles nvrtc compilation not working on build machines without
CUDA hardware available.
Differential Revision: https://developer.blender.org/D3045
We should actually be using CL_DEVICE_MEM_BASE_ADDR_ALIGN for sub buffers,
previous change in this code was incorrect. Renamed the function now to
make the specific purpose of this alignment clear, it's not required for
data types in general.