Commit Graph

116 Commits

Author SHA1 Message Date
Brecht Van Lommel
564d252d60 Cleanup: remove unnecessary assert. 2019-02-26 20:01:20 +01:00
Ray Molenkamp
ff304d3665 Cycles: Fix build error
introduced by rBdabe5cd31add8aa55b9ad4bce1b591ed4e98f1a1
2019-02-26 08:32:41 -07:00
Jeroen Bakker
dabe5cd31a T61971: Compilation Displacement/Background Kernel
Displacement and Background kernels are selectively used, but always compiled. This patch will not compile these kernels when they are not needed.

Displacement kernel is only used for true displacement.
Background kernel is only used when there is a (Cycles)Light of type `LIGHT_BACKGROUND`.

Reviewed By: brecht, #cycles

Tags: #cycles

Maniphest Tasks: T61971

Differential Revision: https://developer.blender.org/D4412
2019-02-26 14:06:25 +01:00
Jeroen Bakker
e6099c7e46 T61576: Do Not (Re-)Compile OpenCL kernels
The goal of this patch is to have limit the number of times
kernels needs to be compiled and are reused as kernels with
different compile directives can lead to identical same
binaries.

The implementation does this by stripping the compile directives.
and reshuffling kernels so the output is more likely to be the
same.

We focussed on the kernels where it was easy to detect and maintain
(bundle, bake, displace, do_volume and background). More optimizations
could be done but they are probably less obvious.

Merged the data_init and state_buffer_size kernels to split_bundle.

This patch will also remove empty kernels for do_volume and bake
when their features are not enabled.

When using the benchmark files there are less background, bake and
do_volume kernels compiled.

Fix: T61576, T61501, T61466

Reviewed By: brecht, #cycles

Differential Revision: https://developer.blender.org/D4390
2019-02-26 12:45:26 +01:00
Brecht Van Lommel
f1304c973f Fix T61810: Cycles OpenCL denoising broken after recent changes. 2019-02-21 16:47:04 +01:00
Jeroen Bakker
6e9dca2214 Codestyle: Indentation 2019-02-21 08:52:04 +01:00
Brecht Van Lommel
fda79dbd79 Cleanup: fix compiler warning. 2019-02-20 16:39:12 +01:00
Jeroen Bakker
949ab753bb Cycles OpenCL: Remove OpenCL MegaKernel
Using OpenCL MegaKernel has been slow and therefore not usefull.
This patch will remove the mega kernel from the OpenCL codebase
and the OpenCLDeviceBase class.

T61736: removal of mega kernel
T61703: baking does not work with mega kernel

Tags: #cycles

Differential Revision: https://developer.blender.org/D4383
2019-02-20 15:17:22 +01:00
Jeroen Bakker
667033e89e T61463: Separate Baking kernels
Cycles OpenCL: Split baking kernels in own program

Fix T61463. Before this patch baking was part of the base kernels. There
are 3 baking kernels that and all 3 uses shader evaluation. Only for one
of these kernels the functionality was wrapped in the __NO_BAKING__
compile directive.

When you start baking this leads to long compile times. By separating
in individual programs will reduce the compile times.

Also wrapped all baking kernels with __NO_BAKING__ to reduce the
compilation times.

Impact on compilation time

    job   |   scene_name    | previous |  new  | percentage
  --------+-----------------+----------+-------+------------
   T61463 | empty           |    10.63 |  7.27 |         32%
   T61463 | bmw             |    17.91 | 14.24 |         20%
   T61463 | fishycat        |    19.57 | 15.08 |         23%
   T61463 | barbershop      |    54.10 | 48.18 |         11%
   T61463 | classroom       |    17.55 | 14.42 |         18%
   T61463 | koro            |    18.92 | 17.15 |          9%
   T61463 | pavillion       |    17.43 | 14.23 |         18%
   T61463 | splash279       |    16.48 | 15.33 |          7%
   T61463 | volume_emission |    36.22 | 34.19 |          6%

Impact on render time

    job   |   scene_name    | previous |   new   | percentage
  --------+-----------------+----------+---------+------------
   T61463 | empty           |    21.06 |   20.54 |          2%
   T61463 | bmw             |   198.44 |  189.59 |          4%
   T61463 | fishycat        |   394.20 |  388.50 |          1%
   T61463 | barbershop      |  1188.16 | 1185.49 |          0%
   T61463 | classroom       |   341.08 |  339.27 |          1%
   T61463 | koro            |   472.43 |  360.70 |         24%
   T61463 | pavillion       |   905.77 |  902.14 |          0%
   T61463 | splash279       |    55.26 |   54.92 |          1%
   T61463 | volume_emission |    62.59 |   39.09 |         38%

I don't have a grounded explanation why koro and volume_emission is this much
faster; I have done several tests though...

Maniphest Tasks: T61463

Differential Revision: https://developer.blender.org/D4376
2019-02-19 16:34:55 +01:00
Brecht Van Lommel
8138eb0dfe Fix Cycles OpenCL multithreaded compilation not working on Windows. 2019-02-19 13:48:56 +01:00
Brecht Van Lommel
9800837b98 Cycles: Support multithreaded compilation of kernels
This patch implements a workaround to get the multithreaded compilation from D2231 working.
So far, it only works for Blender, not for Cycles Standalone. Also, I have only tested the Linux codepath in the helper function.
Depends on D2231.

Patch by lukasstockner97, jbakker, brecht

    job    |   scene_name    | compilation_time
----------+-----------------+------------------
    Baseline | empty           |            22.73
    D2264    | empty           |            13.94
    Baseline | bmw             |            56.44
    D2264    | bmw             |            41.32
    Baseline | fishycat        |            59.50
    D2264    | fishycat        |            45.19
    Baseline | barbershop      |           212.28
    D2264    | barbershop      |           169.81
    Baseline | victor          |            67.51
    D2264    | victor          |            53.60
    Baseline | classroom       |            51.46
    D2264    | classroom       |            39.02
    Baseline | koro            |            62.48
    D2264    | koro            |            49.03
    Baseline | pavillion       |            54.37
    D2264    | pavillion       |            38.82
    Baseline | splash279       |            47.43
    D2264    | splash279       |            37.94
    Baseline | volume_emission |           145.22
    D2264    | volume_emission |           121.10

This patch reduced compilation time as the split kernels and base
kernels are compiled in parallel. In cycles debug mode (256) you can set
unmark the opencl single program file, what reduces the compilation time
even further (bmw 17 seconds, barbershop 53 seconds).

Reviewers: brecht, dingto, sergey, juicyfruit, lukasstockner97

Reviewed By: brecht

Subscribers: Loner, jbakker, candreacchio, 3dLuver, LazyDodo, bliblubli

Differential Revision: https://developer.blender.org/D2264
2019-02-15 08:56:20 +01:00
Lukas Stockner
fccf506ed7 Cycles: animation denoising support in the kernel.
This is the internal implementation, not available from the API or
interface yet. The algorithm takes into account past and future frames,
both to get more coherent animation and reduce noise.

Ref D3889.
2019-02-06 15:18:42 +01:00
Lukas Stockner
405cacd4cd Cycles: prefilter feature passes separate from denoising.
Prefiltering of feature passes will happen during rendering, which can
then be used for denoising immediately or written as a render pass for
later (animation) denoising.

The number of denoising data passes written is reduced because of this,
leaving out the feature variance passes. The passes are now Normal,
Albedo, Depth, Shadowing, Variance and Intensity.

Ref D3889.
2019-02-06 15:18:29 +01:00
Sergey Sharybin
bb0d812d98 Cycles: Disable OpenCL on macOS
This is unfortunate, but the number of bugs in this configuration
keeps growing, and almost all of them are caused by bug in OpenCL
compiler.

The compiler is not likely to be fixed, since Apple declared OpenCL
deprecated.

This evil commit is aimed to keep officially supported features
of Blender in a good working and stable state.
2018-12-07 14:37:47 +01:00
Brecht Van Lommel
a8b8da5567 Fix T58183: crash with CPU + GPU rendering after profiling changes.
Multi-device was not passing along profiler to the CPU.
2018-11-29 23:43:27 +01:00
Sergey Sharybin
203de0bbf0 Cycles: Cleanup, space after (void)
It was used in like 95% of places.
2018-11-09 12:08:51 +01:00
Sergey Sharybin
cb4b5e12ab Cycles: Cleanup, spacing after preprocessor
It is supposed to be two spaces before comment stating which if
else/endif statements corresponds to. Was mainly violated in the
header guards.
2018-11-09 11:34:54 +01:00
Sergey Sharybin
e0cc3e9809 Cycles: Fix wrong BVH used when disabling AVX2 in debug settings
Mainly useful for debugging. Previously, when AVX2 was disabled
in the debug panel but BVH layout was kept on BVH8 nothing was
rendered.

Needed to make it so supported BVH layout mask for devices is
queried in "dynamic", so it is possible to use DebugFlags there.
2018-10-31 11:46:52 +01:00
Lukas Stockner
7920ebd157 Cycles: Fix NLM denoising kernels zeroing the wrong buffer on OpenCL
Since my temporary buffer commit (about a month ago), the OpenCL device was zeroing the wrong buffer, leading to
completely wrong filtered feature passes and therefore significantly lower-quality results than CPU and CUDA.
2018-10-09 00:14:29 +02:00
Lukas Stockner
7aaeb06fb6 Cycles: Clean up extra minus in previous commit
Forgot to add that change, sorry for the noise.
2018-10-08 22:22:05 +02:00
Lukas Stockner
15e9d80375 Cycles: Use existing shared temporary memory in reconstruction step of the denoiser
Previously the code allocated its own temporary memory, but it's possible to just use the existing shared one instead.
2018-10-08 22:13:40 +02:00
Alex Fuller
107f1c0a2b Fix Cycles half float pragma for strict OpenCL compilers (like ROCm).
Differential Revision: https://developer.blender.org/D3669
2018-09-03 12:03:56 +02:00
Lukas Stockner
94efc651d4 Cycles Denoiser: Allocate a single temporary buffer for the entire denoising process
With small tiles, the repeated allocations on GPUs can actually slow down the denoising quite a lot.
Allocating the buffer just once reduces rendertime for the default cube with 16x16 tiles and denoising on a mobile 1050 from 22.7sec to 14.0sec.
2018-08-25 12:23:52 -07:00
Sergey Sharybin
658a9c6cf5 Cycles: Cleanup, style
I wouldn't mind changing style to have space after keyword, but there was
no official code style change proposed.
2018-08-24 14:36:18 +02:00
Stefan Werner
a9700e7ad2 Fix T56359: Unitialized variable in Cycles OpenCL could cause crashes. 2018-08-14 22:51:53 +02:00
fclem
e20a0798dc Cycles: Append compute units for RX Vega card names
Makes it more clear whether compute device is Vega 56 or Vega 64.
2018-08-09 15:51:23 +02:00
Stefan Werner
df30b50f2f Cycles: Enabled half precision textures for OpenCL devices that support the cl_khr_fp16 extension. 2018-07-06 11:42:34 +02:00
Campbell Barton
1daa20ad9f Cleanup: strip trailing space for cycles 2018-07-06 10:17:58 +02:00
Lukas Stockner
c960804747 Cycles Denoising: Pass tile buffers to every OpenCL kernel to conform to standard and get rid of set_tile_info 2018-07-04 14:38:03 +02:00
Lukas Stockner
9db8bdbc65 Cycles Denoising: Cleanup: Rename tiles to tile_info 2018-07-04 14:37:24 +02:00
Lukas Stockner
97a0d6fcc7 Cycles Denoising: Refactor denoiser tile handling
This deduplicates the calls for tile (un)mapping and allows to have a target buffer that is different from the source buffer (needed for baking and animation denoising).
2018-07-04 14:36:01 +02:00
Lukas Stockner
b10c64bd2f Cycles Denoising: Split main function into logical steps 2018-07-04 14:35:05 +02:00
Mai Lavelle
8c4d28cdb9 Fix T54400: Some GCN 1 cards available to select for use with Cycles
Hainan was missing from the list of GCN 1 cards.
2018-04-03 23:15:07 -04:00
Brecht Van Lommel
ce3e0afe59 Fix T54001: AMD OpenCL fails with certain resolutions, after recent changes.
We should actually be using CL_DEVICE_MEM_BASE_ADDR_ALIGN for sub buffers,
previous change in this code was incorrect. Renamed the function now to
make the specific purpose of this alignment clear, it's not required for
data types in general.
2018-02-05 22:19:49 +01:00
Sergey Sharybin
8e1dd7ed81 Cycles: Remove unneeded include statements
Also try to move them from headers to implementation files as much as possible.
2018-01-19 15:19:45 +01:00
Brecht Van Lommel
0fe41009f0 Fix T53830: Cycles OpenCL debug assert on macOS,
This was probably harmless besides some unnecessary memory usage due to
aligning allocations too much.
2018-01-19 11:35:07 +01:00
Lukas Stockner
fa3d50af95 Cycles: Improve denoising speed on GPUs with small tile sizes
Previously, the NLM kernels would be launched once per offset with one thread per pixel.
However, with the smaller tile sizes that are now feasible, there wasn't enough work to fully occupy GPUs which results in a significant slowdown.

Therefore, the kernels are now launched in a single call that handles all offsets at once.
This has two downsides: Memory accesses to accumulating buffers are now atomic, and more importantly, the temporary memory now has to be allocated for every shift at once, increasing the required memory.
On the other hand, of course, the smaller tiles significantly reduce the size of the memory.

The main bottleneck right now is the construction of the transformation - there is nothing to be parallelized there, one thread per pixel is the maximum.
I tried to parallelize the SVD implementation by storing the matrix in shared memory and launching one block per pixel, but that wasn't really going anywhere.

To make the new code somewhat readable, the handling of rectangular regions was cleaned up a bit and commented, it should be easier to understand what's going on now.
Also, some variables have been renamed to make the difference between buffer width and stride more apparent, in addition to some general style cleanup.
2017-11-30 07:37:08 +01:00
Lukas Stockner
40f528a7da Cycles: Add per-tile render time debug pass
Reviewers: sergey, brecht

Differential Revision: https://developer.blender.org/D2920
2017-11-17 16:40:24 +01:00
Brecht Van Lommel
bd4bea3e98 Cycles: avoid reallocating tile denoising memory many times during render. 2017-11-09 20:28:00 +01:00
Brecht Van Lommel
5801ef71e4 Code refactor: device memory cleanups, preparing for mapped host memory. 2017-11-05 15:22:04 +01:00
Mai Lavelle
5cb8730689 Cycles: Add another limit to OpenCL memory usage
Some drivers may report very large allocation sizes, which could cause
unnecessary memory usage. This is now limited to 2gb which should
still be enough to get the needed performance benefits without waste.
2017-11-02 08:14:21 -04:00
Brecht Van Lommel
34fe3f9c06 Code refactor: remove MEM_WRITE_ONLY, always use MEM_READ_WRITE.
It's unlikely the driver can do useful optimizations with this, and if
we sum multiple samples we are reading from the memory anyway.
2017-10-24 23:53:09 +02:00
Sergey Sharybin
eccd18a91f Cycles: Fix compilation error without C++11 2017-10-24 11:14:01 +02:00
Brecht Van Lommel
070a668d04 Code refactor: move more memory allocation logic into device API.
* Remove tex_* and pixels_* functions, replace by mem_*.
* Add MEM_TEXTURE and MEM_PIXELS as memory types recognized by devices.
* No longer create device_memory and call mem_* directly, always go
  through device_only_memory, device_vector and device_pixels.
2017-10-24 01:25:19 +02:00
Brecht Van Lommel
aa8b4c5d81 Code refactor: use device_only_memory and device_vector in more places. 2017-10-24 01:25:13 +02:00
Brecht Van Lommel
7ad9333fad Code refactor: store device/interp/extension/type in each device_memory. 2017-10-24 01:03:59 +02:00
Brecht Van Lommel
57a0cb797d Code refactor: avoid some unnecessary device memory copying. 2017-10-21 20:58:28 +02:00
Brecht Van Lommel
dc9eb8234f Cycles: combined CPU + GPU rendering support.
CPU rendering will be restricted to a BVH2, which is not ideal for raytracing
performance but can be shared with the GPU. Decoupled volume shading will be
disabled to match GPU volume sampling.

The number of CPU rendering threads is reduced to leave one core dedicated to
each GPU. Viewport rendering will also only use GPU rendering still. So along
with the BVH2 usage, perfect scaling should not be expected.

Go to User Preferences > System to enable the CPU to render alongside the GPU.

Differential Revision: https://developer.blender.org/D2873
2017-10-21 20:13:44 +02:00
Brecht Van Lommel
92611dada6 Fix T53098, T53079: OpenCL world texture errors after recent changes. 2017-10-18 03:13:25 +02:00
Brecht Van Lommel
23098cda99 Code refactor: make texture code more consistent between devices.
* Use common TextureInfo struct for all devices, except CUDA fermi.
* Move image sampling code to kernels/*/kernel_*_image.h files.
* Use arrays for data textures on Fermi too, so device_vector<Struct> works.
2017-10-07 14:53:14 +02:00