Pass globals as a bare pointer, same as it sued to be prior to split kernel rework.
AMD CPU platform and Intel OpenCL were complaining about this.
Perhaps we shouldn't pass globals as pointer at all, this isn't something what is
really portable and can cause issues on 32 bit perhaps.
The range is controlled using the following command line arguments:
--cycles-resumable-start-chunk
--cycles-resumable-end-chunk
Those are 1-based index of range for rendering.
While this compiler is not officially supported yet, getting it to work is
a nice thing because more and more AMD cards will fall under MESA driver.
It's also nice to use explicit comparison with NULL, which makes it more
clear whether variable is a boolean or pointer. Even Rust enforces this!
Patch by Ian Bruce with own modifications.
Single program generally compiles kernels faster (2-3 times), loads faster,
takes less drive space (2-3 times), and reduces the number of cached kernels.
Reduces memory allocation for split kernel.
This allows for faster rendering due to bigger global size,
specially when GPU memory is limited.
Perfromance results:
R9 290 total render time
Before After Change
BMW 4:37 4:34 -1.1 %
Classroom 14:43 14:30 -1.5 %
Fishy Cat 11:20 11:04 -2.4 %
Koro 12:11 12:04 -1.0 %
Pabellon Barcelona 22:01 20:44 -5.8 %
Pabellon Barcelona(*) 15:32 15:09 -2.5 %
(*) without glossy connected to volume
Decoupled ray marching is not supported yet.
Transparent shadows are always enabled for volume rendering.
Changes in kernel/bvh and kernel/geom are from Sergey.
This simiplifies code significantly, and prepares it for
record-all transparent shadow function in split kernel.
By calculating the size of the state buffer in the kernel rather than the host
less code is needed and the size actually reflects the requested features.
Will also be a little faster in some cases because of larger global work size.
Because the split kernel can render multiple samples in parallel it is
necessary to have everything initialized before rendering of any samples
begins. The code that normally handles initialization of
`rng_state` (`kernel_path_trace_setup()`) only does so for the first sample,
which was causing artifacts in the split kernel due to uninitialized
`rng_state` for some samples.
Note that because the split kernel can render samples in parallel this
means that the split kernel is incompatible with the LCG.
This was only needed for the previous implementation of parallel samples. As
we don't have that any more it can be removed.
Real reason for removal tho is this: `per_sample_output_buffers` was being
calculated too small and artifacts resulted. The tile buffer is already
the correct size and calculating the size for `per_sample_output_buffers`
is a bit difficult with the current layout of the code. As
`per_sample_output_buffers` was only needed for `sum_all_radiance`,
removing that kernel and writing output to the tile buffer directly
fixes the artifacts.
This is to help debug and track memory usage for generic buffers. We
have similar for textures already since those require a name, but for
buffers the name is only for debugging proposes.