Commit Graph

139 Commits

Author SHA1 Message Date
Brecht Van Lommel
394a1373a0 Cycles: use OpenCL C 2.0 if available, to improve performance for AMD
Tested with AMD Radeon Pro WX 9100, where it brings performance back to 2.80
level, and combined with recent changes is about 2-15% faster than 2.80 in
our benchmark scenes.

This somehow appears to specifically address the issue where adding more shader
nodes leads to slower runtime. I found no additional speedup by applying this
to change to 2.80 or removing the new shader node code.

Ref T71479

Patch by Jeroen Bakker.

Differential Revision: https://developer.blender.org/D6252
2020-03-24 20:09:36 +01:00
Ray Molenkamp
44c6b6615b OpenCL: Bring back CYCLES_OPENCL_TEST override
Back in 2.79 you could either use the debug panel or an
environment variable to override using OpenCL for unsupported
hardware. Which was rather useful for developers when testing
on NVidia just to be sure the CL kernels at-least build properly.

This broke in rB949ab753bb2

This diff restores testing though the CYCLES_OPENCL_TEST
environment variable.

Differential Revision: https://developer.blender.org/D7202

Reviewers: brecht
2020-03-21 11:55:45 -06:00
Dalai Felinto
2d1cce8331 Cleanup: make format after SortedIncludes change 2020-03-19 09:33:58 +01:00
Brecht Van Lommel
26bea849cf Cleanup: add device_texture for images, distinct from other global memory
There was too much image texture specific stuff in device_memory, and too
much code duplication between devices.
2020-03-12 17:28:55 +01:00
Brecht Van Lommel
f01bc597a8 Cleanup: stop encoding image data type in slot index
This is legacy code from when we had a fixed number of textures.
2020-03-11 17:07:17 +01:00
Stefan Werner
51e898324d Adaptive Sampling for Cycles.
This feature takes some inspiration from
"RenderMan: An Advanced Path Tracing Architecture for Movie Rendering" and
"A Hierarchical Automatic Stopping Condition for Monte Carlo Global Illumination"

The basic principle is as follows:
While samples are being added to a pixel, the adaptive sampler writes half
of the samples to a separate buffer. This gives it two separate estimates
of the same pixel, and by comparing their difference it estimates convergence.
Once convergence drops below a given threshold, the pixel is considered done.

When a pixel has not converged yet and needs more samples than the minimum,
its immediate neighbors are also set to take more samples. This is done in order
to more reliably detect sharp features such as caustics. A 3x3 box filter that
is run periodically over the tile buffer is used for that purpose.

After a tile has finished rendering, the values of all passes are scaled as if
they were rendered with the full number of samples. This way, any code operating
on these buffers, for example the denoiser, does not need to be changed for
per-pixel sample counts.

Reviewed By: brecht, #cycles

Differential Revision: https://developer.blender.org/D4686
2020-03-05 12:21:38 +01:00
Patrick Mours
af54bbd61c Cycles: Rework tile scheduling for denoising
This fixes denoising being delayed until after all rendering has finished. Instead, tile-based
denoising is now part of the "RENDER" task again, so that it is all in one task and does not
cause issues with dedicated task pools where tasks are serialized.

Reviewed By: brecht

Differential Revision: https://developer.blender.org/D6940
2020-02-28 16:12:29 +01:00
Patrick Mours
153e001c74 Cleanup: Move common CUDA/OptiX Cycles device code into separate file
This reduces code duplication between the CUDA and OptiX device implementations: The CUDA device
class is now split into declaration and definition (similar to the OpenCL device) and the OptiX device
class implements that and only overrides the functions it actually has to change, while using the CUDA
implementation for everything else.

Reviewed By: brecht

Differential Revision: https://developer.blender.org/D6814
2020-02-12 13:11:32 +01:00
Patrick Mours
38589de10c Cycles: Add support for denoising in the viewport
The OptiX denoiser can be a great help when rendering in the viewport, since it is really fast
and needs few samples to produce convincing results. This patch therefore adds support for
using any Cycles denoiser in the viewport also (but only the OptiX one is selectable because
the NLM one is too slow to be usable currently). It also adds support for denoising on a
different device than rendering (so one can e.g. render with the CPU but denoise with OptiX).

Reviewed By: #cycles, brecht

Differential Revision: https://developer.blender.org/D6554
2020-02-11 18:03:43 +01:00
Jeroen Bakker
f5e37af5a8 Cycles/OpenCL: Remove NULL PTR Workaround
In the current OpenCL implementation we have a work-around for platforms
that didn't support NULL pointers. We used to replace all NULLs and
empty arrays with a pointer to a single byte on the OpenCL Device.

During investigation of {T65924} it was asked to remove this work-around
for testing. This change improves the render times.

    SCENE              | BEFORE | AFTER
   --------------------+--------+-------
   bmw27               | 108    | 89
   barbershop_interior | 867    | 673
   classroom           | 270    | 173
   fishy_cat           | 244    | 196
   koro                | 249    | 207
   pavillon_barcelona  | 582    | 414

Note that this change does not fix T65924 it just improves the
rendering performance for OpenCL. We haven't tested this patch on all
platforms so we should keep an eye out on the tracker.

Reviewed By: sergey

Differential Revision: https://developer.blender.org/D6391
2019-12-11 11:59:21 +01:00
Jeroen Bakker
b9ed30c25c Cycles: OpenCL Separate Compilation Debug Flag
OpenCL Parallel compilation only works inside Blender. When using cycles in a different setup (standaline or other software) it failed compiling kernels as they don't have the appropriate Python API and command line arguments.

This change introduces a `running_inside_blender` debug flag, that triggers out of process compilation of the kernels. Compilation still happens in subthread that enabled the preview kernels and compilation of the kernels during BVH building

Reviewed By: brecht

Differential Revision: https://developer.blender.org/D5439
2019-08-30 13:53:23 +02:00
Campbell Barton
2425401a59 Cleanup: spelling 2019-08-04 12:51:44 +10:00
Campbell Barton
cd6b49f995 Cleanup: spelling 2019-07-07 15:38:41 +10:00
Campbell Barton
c47d669f24 Cleanup: comments (long lines) in cycles 2019-05-01 21:41:07 +10:00
Campbell Barton
e12c08e8d1 ClangFormat: apply to source, most of intern
Apply clang format as proposed in T53211.

For details on usage and instructions for migrating branches
without conflicts, see:

https://wiki.blender.org/wiki/Tools/ClangFormat
2019-04-17 06:21:24 +02:00
Campbell Barton
5ef4b0438c Cleanup: trailing space 2019-03-19 15:08:16 +11:00
Brecht Van Lommel
9873005ecd Cleanup: simplify kernel features definition.
No functional changes, logic here got too complex after many changes over
the years.
2019-03-17 12:01:19 +01:00
Brecht Van Lommel
e17f7af0ce Cleanup: remove Cycles advanced shading features toggle.
It's effectively always enabled, only not on some unsupported OpenCL devices.
For testing those it's not useful to disable these features. This is replaced
by the more fine grained feature toggles that we have now.
2019-03-17 01:58:39 +01:00
Jeroen Bakker
2f6257fd7f Cycles/OpenCL: Compile Kernels During Scene Update
The main goals of this change is faster starting when using foreground
rendering.

This patch will build kernels in parallel to the update process of
the scene. When these optimized kernels are not available (yet) an AO
kernel will be used.

These AO kernels are fast to compile (3-7 seconds) and can be
reused by all scenes. When the final kernels become available we
will switch to these kernels.

In background mode the AO kernels will not be used.
Some kernels are being used during Scene update (displace, background
light). When these kernels are being used the process can halt until
these become available.

Reviewed By: brecht, #cycles

Maniphest Tasks: T61752

Differential Revision: https://developer.blender.org/D4428
2019-03-15 16:18:21 +01:00
Jeroen Bakker
6237743111 Cycles/OpenCL: Added missing opencl programs
The functions that determine the program name + filename of kernels
were missing some base kernels like denoising and base. For completeness
I added those kernels so the function returns the correct results.
2019-03-15 08:11:28 +01:00
Jeroen Bakker
298dabc79b Cycles/OpenCL: Reduce How Often Kernel Recompilations Are Needed
This patch will reduce the number of times that we need to
recompile kernels. It does this by (en/dis)abling features
by default. So when the user needs them that the kernels are
already available.

Other features are enabled by default for background and foreground
rendering. When in background rendering the user wants the best
render performance. When in foreground rendering the user wants
the least amount of recompilations.

Enabling volumetrics or subdivision evaluation will still trigger
a recompilation during foreground rendering.

Reviewed By: #cycles, brecht

Differential Revision: https://developer.blender.org/D4485
2019-03-12 14:06:45 +01:00
Jeroen Bakker
02a7e875d7 Cycles OpenCL: Remove single program
Part of the cleanup of the OpenCL codebase.
Single program is not effective when using OpenCL, it is slower
to compile and slower during rendering (when used in for example
`barbershop` or `victor`).

Reviewers: brecht, #cycles

Maniphest Tasks: T62267

Differential Revision: https://developer.blender.org/D4481
2019-03-08 16:31:35 +01:00
Jeroen Bakker
76442e676e Codestyle: comments 2019-03-08 08:56:16 +01:00
Brecht Van Lommel
564d252d60 Cleanup: remove unnecessary assert. 2019-02-26 20:01:20 +01:00
Ray Molenkamp
ff304d3665 Cycles: Fix build error
introduced by rBdabe5cd31add8aa55b9ad4bce1b591ed4e98f1a1
2019-02-26 08:32:41 -07:00
Jeroen Bakker
dabe5cd31a T61971: Compilation Displacement/Background Kernel
Displacement and Background kernels are selectively used, but always compiled. This patch will not compile these kernels when they are not needed.

Displacement kernel is only used for true displacement.
Background kernel is only used when there is a (Cycles)Light of type `LIGHT_BACKGROUND`.

Reviewed By: brecht, #cycles

Tags: #cycles

Maniphest Tasks: T61971

Differential Revision: https://developer.blender.org/D4412
2019-02-26 14:06:25 +01:00
Jeroen Bakker
e6099c7e46 T61576: Do Not (Re-)Compile OpenCL kernels
The goal of this patch is to have limit the number of times
kernels needs to be compiled and are reused as kernels with
different compile directives can lead to identical same
binaries.

The implementation does this by stripping the compile directives.
and reshuffling kernels so the output is more likely to be the
same.

We focussed on the kernels where it was easy to detect and maintain
(bundle, bake, displace, do_volume and background). More optimizations
could be done but they are probably less obvious.

Merged the data_init and state_buffer_size kernels to split_bundle.

This patch will also remove empty kernels for do_volume and bake
when their features are not enabled.

When using the benchmark files there are less background, bake and
do_volume kernels compiled.

Fix: T61576, T61501, T61466

Reviewed By: brecht, #cycles

Differential Revision: https://developer.blender.org/D4390
2019-02-26 12:45:26 +01:00
Brecht Van Lommel
f1304c973f Fix T61810: Cycles OpenCL denoising broken after recent changes. 2019-02-21 16:47:04 +01:00
Jeroen Bakker
6e9dca2214 Codestyle: Indentation 2019-02-21 08:52:04 +01:00
Brecht Van Lommel
fda79dbd79 Cleanup: fix compiler warning. 2019-02-20 16:39:12 +01:00
Jeroen Bakker
949ab753bb Cycles OpenCL: Remove OpenCL MegaKernel
Using OpenCL MegaKernel has been slow and therefore not usefull.
This patch will remove the mega kernel from the OpenCL codebase
and the OpenCLDeviceBase class.

T61736: removal of mega kernel
T61703: baking does not work with mega kernel

Tags: #cycles

Differential Revision: https://developer.blender.org/D4383
2019-02-20 15:17:22 +01:00
Jeroen Bakker
667033e89e T61463: Separate Baking kernels
Cycles OpenCL: Split baking kernels in own program

Fix T61463. Before this patch baking was part of the base kernels. There
are 3 baking kernels that and all 3 uses shader evaluation. Only for one
of these kernels the functionality was wrapped in the __NO_BAKING__
compile directive.

When you start baking this leads to long compile times. By separating
in individual programs will reduce the compile times.

Also wrapped all baking kernels with __NO_BAKING__ to reduce the
compilation times.

Impact on compilation time

    job   |   scene_name    | previous |  new  | percentage
  --------+-----------------+----------+-------+------------
   T61463 | empty           |    10.63 |  7.27 |         32%
   T61463 | bmw             |    17.91 | 14.24 |         20%
   T61463 | fishycat        |    19.57 | 15.08 |         23%
   T61463 | barbershop      |    54.10 | 48.18 |         11%
   T61463 | classroom       |    17.55 | 14.42 |         18%
   T61463 | koro            |    18.92 | 17.15 |          9%
   T61463 | pavillion       |    17.43 | 14.23 |         18%
   T61463 | splash279       |    16.48 | 15.33 |          7%
   T61463 | volume_emission |    36.22 | 34.19 |          6%

Impact on render time

    job   |   scene_name    | previous |   new   | percentage
  --------+-----------------+----------+---------+------------
   T61463 | empty           |    21.06 |   20.54 |          2%
   T61463 | bmw             |   198.44 |  189.59 |          4%
   T61463 | fishycat        |   394.20 |  388.50 |          1%
   T61463 | barbershop      |  1188.16 | 1185.49 |          0%
   T61463 | classroom       |   341.08 |  339.27 |          1%
   T61463 | koro            |   472.43 |  360.70 |         24%
   T61463 | pavillion       |   905.77 |  902.14 |          0%
   T61463 | splash279       |    55.26 |   54.92 |          1%
   T61463 | volume_emission |    62.59 |   39.09 |         38%

I don't have a grounded explanation why koro and volume_emission is this much
faster; I have done several tests though...

Maniphest Tasks: T61463

Differential Revision: https://developer.blender.org/D4376
2019-02-19 16:34:55 +01:00
Brecht Van Lommel
8138eb0dfe Fix Cycles OpenCL multithreaded compilation not working on Windows. 2019-02-19 13:48:56 +01:00
Brecht Van Lommel
9800837b98 Cycles: Support multithreaded compilation of kernels
This patch implements a workaround to get the multithreaded compilation from D2231 working.
So far, it only works for Blender, not for Cycles Standalone. Also, I have only tested the Linux codepath in the helper function.
Depends on D2231.

Patch by lukasstockner97, jbakker, brecht

    job    |   scene_name    | compilation_time
----------+-----------------+------------------
    Baseline | empty           |            22.73
    D2264    | empty           |            13.94
    Baseline | bmw             |            56.44
    D2264    | bmw             |            41.32
    Baseline | fishycat        |            59.50
    D2264    | fishycat        |            45.19
    Baseline | barbershop      |           212.28
    D2264    | barbershop      |           169.81
    Baseline | victor          |            67.51
    D2264    | victor          |            53.60
    Baseline | classroom       |            51.46
    D2264    | classroom       |            39.02
    Baseline | koro            |            62.48
    D2264    | koro            |            49.03
    Baseline | pavillion       |            54.37
    D2264    | pavillion       |            38.82
    Baseline | splash279       |            47.43
    D2264    | splash279       |            37.94
    Baseline | volume_emission |           145.22
    D2264    | volume_emission |           121.10

This patch reduced compilation time as the split kernels and base
kernels are compiled in parallel. In cycles debug mode (256) you can set
unmark the opencl single program file, what reduces the compilation time
even further (bmw 17 seconds, barbershop 53 seconds).

Reviewers: brecht, dingto, sergey, juicyfruit, lukasstockner97

Reviewed By: brecht

Subscribers: Loner, jbakker, candreacchio, 3dLuver, LazyDodo, bliblubli

Differential Revision: https://developer.blender.org/D2264
2019-02-15 08:56:20 +01:00
Lukas Stockner
fccf506ed7 Cycles: animation denoising support in the kernel.
This is the internal implementation, not available from the API or
interface yet. The algorithm takes into account past and future frames,
both to get more coherent animation and reduce noise.

Ref D3889.
2019-02-06 15:18:42 +01:00
Lukas Stockner
405cacd4cd Cycles: prefilter feature passes separate from denoising.
Prefiltering of feature passes will happen during rendering, which can
then be used for denoising immediately or written as a render pass for
later (animation) denoising.

The number of denoising data passes written is reduced because of this,
leaving out the feature variance passes. The passes are now Normal,
Albedo, Depth, Shadowing, Variance and Intensity.

Ref D3889.
2019-02-06 15:18:29 +01:00
Sergey Sharybin
bb0d812d98 Cycles: Disable OpenCL on macOS
This is unfortunate, but the number of bugs in this configuration
keeps growing, and almost all of them are caused by bug in OpenCL
compiler.

The compiler is not likely to be fixed, since Apple declared OpenCL
deprecated.

This evil commit is aimed to keep officially supported features
of Blender in a good working and stable state.
2018-12-07 14:37:47 +01:00
Brecht Van Lommel
a8b8da5567 Fix T58183: crash with CPU + GPU rendering after profiling changes.
Multi-device was not passing along profiler to the CPU.
2018-11-29 23:43:27 +01:00
Sergey Sharybin
203de0bbf0 Cycles: Cleanup, space after (void)
It was used in like 95% of places.
2018-11-09 12:08:51 +01:00
Sergey Sharybin
cb4b5e12ab Cycles: Cleanup, spacing after preprocessor
It is supposed to be two spaces before comment stating which if
else/endif statements corresponds to. Was mainly violated in the
header guards.
2018-11-09 11:34:54 +01:00
Sergey Sharybin
e0cc3e9809 Cycles: Fix wrong BVH used when disabling AVX2 in debug settings
Mainly useful for debugging. Previously, when AVX2 was disabled
in the debug panel but BVH layout was kept on BVH8 nothing was
rendered.

Needed to make it so supported BVH layout mask for devices is
queried in "dynamic", so it is possible to use DebugFlags there.
2018-10-31 11:46:52 +01:00
Lukas Stockner
7920ebd157 Cycles: Fix NLM denoising kernels zeroing the wrong buffer on OpenCL
Since my temporary buffer commit (about a month ago), the OpenCL device was zeroing the wrong buffer, leading to
completely wrong filtered feature passes and therefore significantly lower-quality results than CPU and CUDA.
2018-10-09 00:14:29 +02:00
Lukas Stockner
7aaeb06fb6 Cycles: Clean up extra minus in previous commit
Forgot to add that change, sorry for the noise.
2018-10-08 22:22:05 +02:00
Lukas Stockner
15e9d80375 Cycles: Use existing shared temporary memory in reconstruction step of the denoiser
Previously the code allocated its own temporary memory, but it's possible to just use the existing shared one instead.
2018-10-08 22:13:40 +02:00
Alex Fuller
107f1c0a2b Fix Cycles half float pragma for strict OpenCL compilers (like ROCm).
Differential Revision: https://developer.blender.org/D3669
2018-09-03 12:03:56 +02:00
Lukas Stockner
94efc651d4 Cycles Denoiser: Allocate a single temporary buffer for the entire denoising process
With small tiles, the repeated allocations on GPUs can actually slow down the denoising quite a lot.
Allocating the buffer just once reduces rendertime for the default cube with 16x16 tiles and denoising on a mobile 1050 from 22.7sec to 14.0sec.
2018-08-25 12:23:52 -07:00
Sergey Sharybin
658a9c6cf5 Cycles: Cleanup, style
I wouldn't mind changing style to have space after keyword, but there was
no official code style change proposed.
2018-08-24 14:36:18 +02:00
Stefan Werner
a9700e7ad2 Fix T56359: Unitialized variable in Cycles OpenCL could cause crashes. 2018-08-14 22:51:53 +02:00
fclem
e20a0798dc Cycles: Append compute units for RX Vega card names
Makes it more clear whether compute device is Vega 56 or Vega 64.
2018-08-09 15:51:23 +02:00
Stefan Werner
df30b50f2f Cycles: Enabled half precision textures for OpenCL devices that support the cl_khr_fp16 extension. 2018-07-06 11:42:34 +02:00