Tested with AMD Radeon Pro WX 9100, where it brings performance back to 2.80
level, and combined with recent changes is about 2-15% faster than 2.80 in
our benchmark scenes.
This somehow appears to specifically address the issue where adding more shader
nodes leads to slower runtime. I found no additional speedup by applying this
to change to 2.80 or removing the new shader node code.
Ref T71479
Patch by Jeroen Bakker.
Differential Revision: https://developer.blender.org/D6252
Back in 2.79 you could either use the debug panel or an
environment variable to override using OpenCL for unsupported
hardware. Which was rather useful for developers when testing
on NVidia just to be sure the CL kernels at-least build properly.
This broke in rB949ab753bb2
This diff restores testing though the CYCLES_OPENCL_TEST
environment variable.
Differential Revision: https://developer.blender.org/D7202
Reviewers: brecht
This feature takes some inspiration from
"RenderMan: An Advanced Path Tracing Architecture for Movie Rendering" and
"A Hierarchical Automatic Stopping Condition for Monte Carlo Global Illumination"
The basic principle is as follows:
While samples are being added to a pixel, the adaptive sampler writes half
of the samples to a separate buffer. This gives it two separate estimates
of the same pixel, and by comparing their difference it estimates convergence.
Once convergence drops below a given threshold, the pixel is considered done.
When a pixel has not converged yet and needs more samples than the minimum,
its immediate neighbors are also set to take more samples. This is done in order
to more reliably detect sharp features such as caustics. A 3x3 box filter that
is run periodically over the tile buffer is used for that purpose.
After a tile has finished rendering, the values of all passes are scaled as if
they were rendered with the full number of samples. This way, any code operating
on these buffers, for example the denoiser, does not need to be changed for
per-pixel sample counts.
Reviewed By: brecht, #cycles
Differential Revision: https://developer.blender.org/D4686
This fixes denoising being delayed until after all rendering has finished. Instead, tile-based
denoising is now part of the "RENDER" task again, so that it is all in one task and does not
cause issues with dedicated task pools where tasks are serialized.
Reviewed By: brecht
Differential Revision: https://developer.blender.org/D6940
This reduces code duplication between the CUDA and OptiX device implementations: The CUDA device
class is now split into declaration and definition (similar to the OpenCL device) and the OptiX device
class implements that and only overrides the functions it actually has to change, while using the CUDA
implementation for everything else.
Reviewed By: brecht
Differential Revision: https://developer.blender.org/D6814
The OptiX denoiser can be a great help when rendering in the viewport, since it is really fast
and needs few samples to produce convincing results. This patch therefore adds support for
using any Cycles denoiser in the viewport also (but only the OptiX one is selectable because
the NLM one is too slow to be usable currently). It also adds support for denoising on a
different device than rendering (so one can e.g. render with the CPU but denoise with OptiX).
Reviewed By: #cycles, brecht
Differential Revision: https://developer.blender.org/D6554
In the current OpenCL implementation we have a work-around for platforms
that didn't support NULL pointers. We used to replace all NULLs and
empty arrays with a pointer to a single byte on the OpenCL Device.
During investigation of {T65924} it was asked to remove this work-around
for testing. This change improves the render times.
SCENE | BEFORE | AFTER
--------------------+--------+-------
bmw27 | 108 | 89
barbershop_interior | 867 | 673
classroom | 270 | 173
fishy_cat | 244 | 196
koro | 249 | 207
pavillon_barcelona | 582 | 414
Note that this change does not fix T65924 it just improves the
rendering performance for OpenCL. We haven't tested this patch on all
platforms so we should keep an eye out on the tracker.
Reviewed By: sergey
Differential Revision: https://developer.blender.org/D6391
OpenCL Parallel compilation only works inside Blender. When using cycles in a different setup (standaline or other software) it failed compiling kernels as they don't have the appropriate Python API and command line arguments.
This change introduces a `running_inside_blender` debug flag, that triggers out of process compilation of the kernels. Compilation still happens in subthread that enabled the preview kernels and compilation of the kernels during BVH building
Reviewed By: brecht
Differential Revision: https://developer.blender.org/D5439
It's effectively always enabled, only not on some unsupported OpenCL devices.
For testing those it's not useful to disable these features. This is replaced
by the more fine grained feature toggles that we have now.
The main goals of this change is faster starting when using foreground
rendering.
This patch will build kernels in parallel to the update process of
the scene. When these optimized kernels are not available (yet) an AO
kernel will be used.
These AO kernels are fast to compile (3-7 seconds) and can be
reused by all scenes. When the final kernels become available we
will switch to these kernels.
In background mode the AO kernels will not be used.
Some kernels are being used during Scene update (displace, background
light). When these kernels are being used the process can halt until
these become available.
Reviewed By: brecht, #cycles
Maniphest Tasks: T61752
Differential Revision: https://developer.blender.org/D4428
The functions that determine the program name + filename of kernels
were missing some base kernels like denoising and base. For completeness
I added those kernels so the function returns the correct results.
This patch will reduce the number of times that we need to
recompile kernels. It does this by (en/dis)abling features
by default. So when the user needs them that the kernels are
already available.
Other features are enabled by default for background and foreground
rendering. When in background rendering the user wants the best
render performance. When in foreground rendering the user wants
the least amount of recompilations.
Enabling volumetrics or subdivision evaluation will still trigger
a recompilation during foreground rendering.
Reviewed By: #cycles, brecht
Differential Revision: https://developer.blender.org/D4485
Part of the cleanup of the OpenCL codebase.
Single program is not effective when using OpenCL, it is slower
to compile and slower during rendering (when used in for example
`barbershop` or `victor`).
Reviewers: brecht, #cycles
Maniphest Tasks: T62267
Differential Revision: https://developer.blender.org/D4481
Displacement and Background kernels are selectively used, but always compiled. This patch will not compile these kernels when they are not needed.
Displacement kernel is only used for true displacement.
Background kernel is only used when there is a (Cycles)Light of type `LIGHT_BACKGROUND`.
Reviewed By: brecht, #cycles
Tags: #cycles
Maniphest Tasks: T61971
Differential Revision: https://developer.blender.org/D4412
The goal of this patch is to have limit the number of times
kernels needs to be compiled and are reused as kernels with
different compile directives can lead to identical same
binaries.
The implementation does this by stripping the compile directives.
and reshuffling kernels so the output is more likely to be the
same.
We focussed on the kernels where it was easy to detect and maintain
(bundle, bake, displace, do_volume and background). More optimizations
could be done but they are probably less obvious.
Merged the data_init and state_buffer_size kernels to split_bundle.
This patch will also remove empty kernels for do_volume and bake
when their features are not enabled.
When using the benchmark files there are less background, bake and
do_volume kernels compiled.
Fix: T61576, T61501, T61466
Reviewed By: brecht, #cycles
Differential Revision: https://developer.blender.org/D4390
Using OpenCL MegaKernel has been slow and therefore not usefull.
This patch will remove the mega kernel from the OpenCL codebase
and the OpenCLDeviceBase class.
T61736: removal of mega kernel
T61703: baking does not work with mega kernel
Tags: #cycles
Differential Revision: https://developer.blender.org/D4383
This patch implements a workaround to get the multithreaded compilation from D2231 working.
So far, it only works for Blender, not for Cycles Standalone. Also, I have only tested the Linux codepath in the helper function.
Depends on D2231.
Patch by lukasstockner97, jbakker, brecht
job | scene_name | compilation_time
----------+-----------------+------------------
Baseline | empty | 22.73
D2264 | empty | 13.94
Baseline | bmw | 56.44
D2264 | bmw | 41.32
Baseline | fishycat | 59.50
D2264 | fishycat | 45.19
Baseline | barbershop | 212.28
D2264 | barbershop | 169.81
Baseline | victor | 67.51
D2264 | victor | 53.60
Baseline | classroom | 51.46
D2264 | classroom | 39.02
Baseline | koro | 62.48
D2264 | koro | 49.03
Baseline | pavillion | 54.37
D2264 | pavillion | 38.82
Baseline | splash279 | 47.43
D2264 | splash279 | 37.94
Baseline | volume_emission | 145.22
D2264 | volume_emission | 121.10
This patch reduced compilation time as the split kernels and base
kernels are compiled in parallel. In cycles debug mode (256) you can set
unmark the opencl single program file, what reduces the compilation time
even further (bmw 17 seconds, barbershop 53 seconds).
Reviewers: brecht, dingto, sergey, juicyfruit, lukasstockner97
Reviewed By: brecht
Subscribers: Loner, jbakker, candreacchio, 3dLuver, LazyDodo, bliblubli
Differential Revision: https://developer.blender.org/D2264
This is the internal implementation, not available from the API or
interface yet. The algorithm takes into account past and future frames,
both to get more coherent animation and reduce noise.
Ref D3889.
Prefiltering of feature passes will happen during rendering, which can
then be used for denoising immediately or written as a render pass for
later (animation) denoising.
The number of denoising data passes written is reduced because of this,
leaving out the feature variance passes. The passes are now Normal,
Albedo, Depth, Shadowing, Variance and Intensity.
Ref D3889.
This is unfortunate, but the number of bugs in this configuration
keeps growing, and almost all of them are caused by bug in OpenCL
compiler.
The compiler is not likely to be fixed, since Apple declared OpenCL
deprecated.
This evil commit is aimed to keep officially supported features
of Blender in a good working and stable state.
Mainly useful for debugging. Previously, when AVX2 was disabled
in the debug panel but BVH layout was kept on BVH8 nothing was
rendered.
Needed to make it so supported BVH layout mask for devices is
queried in "dynamic", so it is possible to use DebugFlags there.
Since my temporary buffer commit (about a month ago), the OpenCL device was zeroing the wrong buffer, leading to
completely wrong filtered feature passes and therefore significantly lower-quality results than CPU and CUDA.
With small tiles, the repeated allocations on GPUs can actually slow down the denoising quite a lot.
Allocating the buffer just once reduces rendertime for the default cube with 16x16 tiles and denoising on a mobile 1050 from 22.7sec to 14.0sec.