Commit Graph

134 Commits

Author SHA1 Message Date
Brecht Van Lommel
006025ead0 Cycles: support for different 3D transform per volume grid
This is not yet fully supported by automatic volume bounds but works fine in
most cases that will have mostly matching bounds.

Ref T73201
2020-03-18 11:23:05 +01:00
Brecht Van Lommel
26bea849cf Cleanup: add device_texture for images, distinct from other global memory
There was too much image texture specific stuff in device_memory, and too
much code duplication between devices.
2020-03-12 17:28:55 +01:00
Brecht Van Lommel
f01bc597a8 Cleanup: stop encoding image data type in slot index
This is legacy code from when we had a fixed number of textures.
2020-03-11 17:07:17 +01:00
Brecht Van Lommel
c8ac760c59 Cleanup: tweak Cycles #includes in preparation for clang-format sorting 2020-03-06 14:44:42 +01:00
Stefan Werner
51e898324d Adaptive Sampling for Cycles.
This feature takes some inspiration from
"RenderMan: An Advanced Path Tracing Architecture for Movie Rendering" and
"A Hierarchical Automatic Stopping Condition for Monte Carlo Global Illumination"

The basic principle is as follows:
While samples are being added to a pixel, the adaptive sampler writes half
of the samples to a separate buffer. This gives it two separate estimates
of the same pixel, and by comparing their difference it estimates convergence.
Once convergence drops below a given threshold, the pixel is considered done.

When a pixel has not converged yet and needs more samples than the minimum,
its immediate neighbors are also set to take more samples. This is done in order
to more reliably detect sharp features such as caustics. A 3x3 box filter that
is run periodically over the tile buffer is used for that purpose.

After a tile has finished rendering, the values of all passes are scaled as if
they were rendered with the full number of samples. This way, any code operating
on these buffers, for example the denoiser, does not need to be changed for
per-pixel sample counts.

Reviewed By: brecht, #cycles

Differential Revision: https://developer.blender.org/D4686
2020-03-05 12:21:38 +01:00
Patrick Mours
38589de10c Cycles: Add support for denoising in the viewport
The OptiX denoiser can be a great help when rendering in the viewport, since it is really fast
and needs few samples to produce convincing results. This patch therefore adds support for
using any Cycles denoiser in the viewport also (but only the OptiX one is selectable because
the NLM one is too slow to be usable currently). It also adds support for denoising on a
different device than rendering (so one can e.g. render with the CPU but denoise with OptiX).

Reviewed By: #cycles, brecht

Differential Revision: https://developer.blender.org/D6554
2020-02-11 18:03:43 +01:00
Patrick Mours
d5ca72191c Cycles: Add OptiX AI denoiser support
This patch adds support for the OptiX denoiser as an alternative to the existing NLM denoiser in Cycles. It's re-using the same denoising architecture based on tiles and therefore implicitly also works with multiple GPUs.

Reviewed By: sergey

Differential Revision: https://developer.blender.org/D6395
2020-01-08 16:53:11 +01:00
Patrick Mours
59aef0ad5d Fix T71255: Particle hair not showing in viewport with OptiX after scaling
The OptiX intersection program for curves uses "optixGetObjectRayDirection"
to get the ray direction in object space (which was inverse transformed
with the current transformation matrix). OptiX does no additional operations
on it, so if there is a scaling transform, the direction is not normalized.
But the curve intersection routine expects that. In addition, the distances
used in "optixGetRayTmax()" and "optixReportIntersection()" are in world
space, so need to adjust them accordingly.
2019-11-22 17:30:22 +01:00
Patrick Mours
53932f1f06 Cycles: add Optix support in the kernel
This adds all the kernel side changes for the Optix backend.

Ref D5363
2019-09-13 11:46:22 +02:00
Lazydodo
ea8e0df672 Fix T55054: possible use of unsupported instructions in Cycles texture code
Differential Revision: https://developer.blender.org/D5326
2019-08-16 16:49:04 +02:00
Campbell Barton
c47d669f24 Cleanup: comments (long lines) in cycles 2019-05-01 21:41:07 +10:00
Campbell Barton
e12c08e8d1 ClangFormat: apply to source, most of intern
Apply clang format as proposed in T53211.

For details on usage and instructions for migrating branches
without conflicts, see:

https://wiki.blender.org/wiki/Tools/ClangFormat
2019-04-17 06:21:24 +02:00
Jeroen Bakker
02a7e875d7 Cycles OpenCL: Remove single program
Part of the cleanup of the OpenCL codebase.
Single program is not effective when using OpenCL, it is slower
to compile and slower during rendering (when used in for example
`barbershop` or `victor`).

Reviewers: brecht, #cycles

Maniphest Tasks: T62267

Differential Revision: https://developer.blender.org/D4481
2019-03-08 16:31:35 +01:00
Jeroen Bakker
e6099c7e46 T61576: Do Not (Re-)Compile OpenCL kernels
The goal of this patch is to have limit the number of times
kernels needs to be compiled and are reused as kernels with
different compile directives can lead to identical same
binaries.

The implementation does this by stripping the compile directives.
and reshuffling kernels so the output is more likely to be the
same.

We focussed on the kernels where it was easy to detect and maintain
(bundle, bake, displace, do_volume and background). More optimizations
could be done but they are probably less obvious.

Merged the data_init and state_buffer_size kernels to split_bundle.

This patch will also remove empty kernels for do_volume and bake
when their features are not enabled.

When using the benchmark files there are less background, bake and
do_volume kernels compiled.

Fix: T61576, T61501, T61466

Reviewed By: brecht, #cycles

Differential Revision: https://developer.blender.org/D4390
2019-02-26 12:45:26 +01:00
Jeroen Bakker
a51d08f473 Fix: Missing closing brackets in include 2019-02-21 14:36:51 +01:00
Jeroen Bakker
fab6c5040d Fix: OpenCL Displacement and light sampling
The bake kernels are also used during mesh displacement and light
importance sampling. We disabled the implementation of these kernels
when baking was not enabled.
2019-02-21 08:11:02 +01:00
Jeroen Bakker
949ab753bb Cycles OpenCL: Remove OpenCL MegaKernel
Using OpenCL MegaKernel has been slow and therefore not usefull.
This patch will remove the mega kernel from the OpenCL codebase
and the OpenCLDeviceBase class.

T61736: removal of mega kernel
T61703: baking does not work with mega kernel

Tags: #cycles

Differential Revision: https://developer.blender.org/D4383
2019-02-20 15:17:22 +01:00
Jeroen Bakker
667033e89e T61463: Separate Baking kernels
Cycles OpenCL: Split baking kernels in own program

Fix T61463. Before this patch baking was part of the base kernels. There
are 3 baking kernels that and all 3 uses shader evaluation. Only for one
of these kernels the functionality was wrapped in the __NO_BAKING__
compile directive.

When you start baking this leads to long compile times. By separating
in individual programs will reduce the compile times.

Also wrapped all baking kernels with __NO_BAKING__ to reduce the
compilation times.

Impact on compilation time

    job   |   scene_name    | previous |  new  | percentage
  --------+-----------------+----------+-------+------------
   T61463 | empty           |    10.63 |  7.27 |         32%
   T61463 | bmw             |    17.91 | 14.24 |         20%
   T61463 | fishycat        |    19.57 | 15.08 |         23%
   T61463 | barbershop      |    54.10 | 48.18 |         11%
   T61463 | classroom       |    17.55 | 14.42 |         18%
   T61463 | koro            |    18.92 | 17.15 |          9%
   T61463 | pavillion       |    17.43 | 14.23 |         18%
   T61463 | splash279       |    16.48 | 15.33 |          7%
   T61463 | volume_emission |    36.22 | 34.19 |          6%

Impact on render time

    job   |   scene_name    | previous |   new   | percentage
  --------+-----------------+----------+---------+------------
   T61463 | empty           |    21.06 |   20.54 |          2%
   T61463 | bmw             |   198.44 |  189.59 |          4%
   T61463 | fishycat        |   394.20 |  388.50 |          1%
   T61463 | barbershop      |  1188.16 | 1185.49 |          0%
   T61463 | classroom       |   341.08 |  339.27 |          1%
   T61463 | koro            |   472.43 |  360.70 |         24%
   T61463 | pavillion       |   905.77 |  902.14 |          0%
   T61463 | splash279       |    55.26 |   54.92 |          1%
   T61463 | volume_emission |    62.59 |   39.09 |         38%

I don't have a grounded explanation why koro and volume_emission is this much
faster; I have done several tests though...

Maniphest Tasks: T61463

Differential Revision: https://developer.blender.org/D4376
2019-02-19 16:34:55 +01:00
Brecht Van Lommel
9800837b98 Cycles: Support multithreaded compilation of kernels
This patch implements a workaround to get the multithreaded compilation from D2231 working.
So far, it only works for Blender, not for Cycles Standalone. Also, I have only tested the Linux codepath in the helper function.
Depends on D2231.

Patch by lukasstockner97, jbakker, brecht

    job    |   scene_name    | compilation_time
----------+-----------------+------------------
    Baseline | empty           |            22.73
    D2264    | empty           |            13.94
    Baseline | bmw             |            56.44
    D2264    | bmw             |            41.32
    Baseline | fishycat        |            59.50
    D2264    | fishycat        |            45.19
    Baseline | barbershop      |           212.28
    D2264    | barbershop      |           169.81
    Baseline | victor          |            67.51
    D2264    | victor          |            53.60
    Baseline | classroom       |            51.46
    D2264    | classroom       |            39.02
    Baseline | koro            |            62.48
    D2264    | koro            |            49.03
    Baseline | pavillion       |            54.37
    D2264    | pavillion       |            38.82
    Baseline | splash279       |            47.43
    D2264    | splash279       |            37.94
    Baseline | volume_emission |           145.22
    D2264    | volume_emission |           121.10

This patch reduced compilation time as the split kernels and base
kernels are compiled in parallel. In cycles debug mode (256) you can set
unmark the opencl single program file, what reduces the compilation time
even further (bmw 17 seconds, barbershop 53 seconds).

Reviewers: brecht, dingto, sergey, juicyfruit, lukasstockner97

Reviewed By: brecht

Subscribers: Loner, jbakker, candreacchio, 3dLuver, LazyDodo, bliblubli

Differential Revision: https://developer.blender.org/D2264
2019-02-15 08:56:20 +01:00
Lukas Stockner
fccf506ed7 Cycles: animation denoising support in the kernel.
This is the internal implementation, not available from the API or
interface yet. The algorithm takes into account past and future frames,
both to get more coherent animation and reduce noise.

Ref D3889.
2019-02-06 15:18:42 +01:00
Lukas Stockner
405cacd4cd Cycles: prefilter feature passes separate from denoising.
Prefiltering of feature passes will happen during rendering, which can
then be used for denoising immediately or written as a render pass for
later (animation) denoising.

The number of denoising data passes written is reduced because of this,
leaving out the feature variance passes. The passes are now Normal,
Albedo, Depth, Shadowing, Variance and Intensity.

Ref D3889.
2019-02-06 15:18:29 +01:00
Brecht Van Lommel
b14ec18601 Cycles: add initial CUDA 10.0 support, but only recommend use for Turing cards.
There may still be rendering errors when used for older graphics cards.
2018-12-04 16:03:18 +01:00
Sergey Sharybin
203de0bbf0 Cycles: Cleanup, space after (void)
It was used in like 95% of places.
2018-11-09 12:08:51 +01:00
Sergey Sharybin
cb4b5e12ab Cycles: Cleanup, spacing after preprocessor
It is supposed to be two spaces before comment stating which if
else/endif statements corresponds to. Was mainly violated in the
header guards.
2018-11-09 11:34:54 +01:00
Stefan Werner
e58c6cf0c6 Cycles: Added Cryptomatte output.
This allows for extra output passes that encode automatic object and material masks
for the entire scene. It is an implementation of the Cryptomatte standard as
introduced by Psyop. A good future extension would be to add a manifest to the
export and to do plenty of testing to ensure that it is fully compatible with other
renderers and compositing programs that use Cryptomatte.

Internally, it adds the ability for Cycles to have several passes of the same type
that are distinguished by their name.

Differential Revision: https://developer.blender.org/D3538
2018-10-28 05:37:41 -04:00
Lukas Stockner
15e9d80375 Cycles: Use existing shared temporary memory in reconstruction step of the denoiser
Previously the code allocated its own temporary memory, but it's possible to just use the existing shared one instead.
2018-10-08 22:13:40 +02:00
Lukas Stockner
a0cc7bd961 Cycles: Implement vectorized NLM kernels for faster CPU denoising 2018-10-06 21:49:54 +02:00
Stefan Werner
df30b50f2f Cycles: Enabled half precision textures for OpenCL devices that support the cl_khr_fp16 extension. 2018-07-06 11:42:34 +02:00
Campbell Barton
1daa20ad9f Cleanup: strip trailing space for cycles 2018-07-06 10:17:58 +02:00
Stefan Werner
4d00e95ee3 Cycles: Adding native support for UINT16 textures.
Textures in 16 bit integer format are sometimes used for displacement, bump and normal maps and can be exported by tools like Substance Painter. Without this patch, Cycles would promote those textures to single precision floating point, causing them to take up twice as much memory as needed.

Reviewers: #cycles, brecht, sergey

Reviewed By: #cycles, brecht, sergey

Subscribers: sergey, dingto, #cycles

Tags: #cycles

Differential Revision: https://developer.blender.org/D3523
2018-07-05 13:53:34 +02:00
Lukas Stockner
c960804747 Cycles Denoising: Pass tile buffers to every OpenCL kernel to conform to standard and get rid of set_tile_info 2018-07-04 14:38:03 +02:00
Lukas Stockner
9db8bdbc65 Cycles Denoising: Cleanup: Rename tiles to tile_info 2018-07-04 14:37:24 +02:00
Lukas Stockner
3ee606621c Cycles: Query XYZ to/from Scene Linear conversion from OCIO instead of assuming sRGB
I've limited it to just the RGB<->XYZ stuff for now, correct image handling is the next step.

Reviewers: brecht, sergey

Differential Revision: https://developer.blender.org/D3478
2018-06-14 22:21:37 +02:00
Ray Molenkamp
81060ff6b2 Windows: Add support for building with clang.
This commit contains the minimum to make clang build/work with blender, asan and ninja build support is forthcoming

Things to note:

1) Builds and runs, and is able to pass all tests (except for the freestyle_stroke_material.blend test which was broken at that time for all platforms by the looks of it)

2) It's slightly faster than msvc when using cycles. (time in seconds, on an i7-3370)

victor_cpu
	msvc:3099.51
	clang:2796.43

pavillon_barcelona_cpu
	msvc:1872.05
	clang:1827.72

koro_cpu
	msvc:1097.58
	clang:1006.51

fishy_cat_cpu
	msvc:815.37
	clang:722.2

classroom_cpu
	msvc:1705.39
	clang:1575.43

bmw27_cpu
	msvc:552.38
	clang:561.53

barbershop_interior_cpu
	msvc:2134.93
	clang:1922.33

3) clang on windows uses a drop in replacement for the Microsoft cl.exe (takes some of the Microsoft parameters, but not all, and takes some of the clang parameters but not all) and uses ms headers + libraries + linker, so you still need visual studio installed and will use our existing vc14 svn libs.

4) X64 only currently, X86 builds but crashes on startup.

5) Tested with llvm/clang 6.0.0

6) Requires visual studio integration, available at https://github.com/LazyDodo/llvm-vs2017-integration

7) The Microsoft compiler spawns a few copies of cl in parallel to get faster build times, clang doesn't, so the build time is 3-4x slower than with msvc.

8) No openmp support yet. Have not looked at this much, the binary distribution of clang doesn't seem to include it on windows.

9) No ASAN support yet, some of the sanitizers can be made to work, but it was decided to leave support out of this commit.

Reviewers: campbellbarton

Differential Revision: https://developer.blender.org/D3304
2018-05-28 14:34:47 -06:00
Brecht Van Lommel
1dcd7db73d Code cleanup: remove some more unused code after recent CUDA changes. 2018-02-18 00:53:03 +01:00
Thomas Dinges
9e717c0495 Cycles: Remove Fermi texture code.
This should be the last Fermi removal commit, unless I missed something.
It's been a pleasure Fermi!
2018-02-17 22:56:58 +01:00
Thomas Dinges
e1ef902058 Cycles: Remove fermi related defines from the code.
Did not touch Texture related defines, that comes next.
2018-02-17 22:19:54 +01:00
Sergey Sharybin
54632dc830 Cycles: Remove util_debug include from kernel code
Not sure why it was in there, all the debug flags stuff is to be handled outside
of kernel.
2018-01-19 15:21:34 +01:00
Sergey Sharybin
2e8914549b Cycles: Fix difference in image Clip extension method between CPU and GPU
Our own implementation was behaving different comparing to OSL and GPU,
namely on the border pixels OSL and CUDA was doing interpolation with
black, but we were clamping coordinate.

This partially fixes issue reported in T53452.

Similar change should also be done for 3D interpolation perhaps, but this
is to be investigated separately.
2017-12-08 12:03:11 +01:00
Sergey Sharybin
f31fb4a014 Cycles: Cleanup, split 2D interpolation function 2017-12-08 11:22:04 +01:00
Lukas Stockner
fa3d50af95 Cycles: Improve denoising speed on GPUs with small tile sizes
Previously, the NLM kernels would be launched once per offset with one thread per pixel.
However, with the smaller tile sizes that are now feasible, there wasn't enough work to fully occupy GPUs which results in a significant slowdown.

Therefore, the kernels are now launched in a single call that handles all offsets at once.
This has two downsides: Memory accesses to accumulating buffers are now atomic, and more importantly, the temporary memory now has to be allocated for every shift at once, increasing the required memory.
On the other hand, of course, the smaller tiles significantly reduce the size of the memory.

The main bottleneck right now is the construction of the transformation - there is nothing to be parallelized there, one thread per pixel is the maximum.
I tried to parallelize the SVD implementation by storing the matrix in shared memory and launching one block per pixel, but that wasn't really going anywhere.

To make the new code somewhat readable, the handling of rectangular regions was cleaned up a bit and commented, it should be easier to understand what's going on now.
Also, some variables have been renamed to make the difference between buffer width and stride more apparent, in addition to some general style cleanup.
2017-11-30 07:37:08 +01:00
Stefan Werner
58a15b2bfe Cycles: Fixed compilation of CUDA kernels. Follow-up fix for my last commit. 2017-11-21 10:43:40 +01:00
Stefan Werner
1febc85855 Cycles: Workaround for performance loss with the CUDA 9.0 SDK.
CUDA 9.0.176 apparently caused some slow down on high-end Pascal cards that can be mitigated by increasing the number of registers. See https://developer.blender.org/F1142667 for a detailed comparison.
2017-11-21 10:29:11 +01:00
Brecht Van Lommel
5801ef71e4 Code refactor: device memory cleanups, preparing for mapped host memory. 2017-11-05 15:22:04 +01:00
Brecht Van Lommel
7ad9333fad Code refactor: store device/interp/extension/type in each device_memory. 2017-10-24 01:03:59 +02:00
Brecht Van Lommel
2e50add164 Fix OpenCL performance regression after cubic interpolation.
Reorganize code to reduce register pressure.
2017-10-15 17:46:50 +02:00
Sergey Sharybin
8d73ba58b6 Cycles: Fix compilation of sm_20 and sm_21 kernels
Was broken since the bicubic commit for GPU support.
2017-10-10 12:26:02 +05:00
Brecht Van Lommel
f61c340bc1 Cycles: OpenCL bicubic and tricubic texture interpolation support. 2017-10-08 02:55:44 +02:00
Brecht Van Lommel
2d92988f6b Cycles: CUDA bicubic and tricubic texture interpolation support.
While cubic interpolation is quite expensive on the CPU compared to linear
interpolation, the difference on the GPU is quite small.
2017-10-07 15:30:57 +02:00
Brecht Van Lommel
23098cda99 Code refactor: make texture code more consistent between devices.
* Use common TextureInfo struct for all devices, except CUDA fermi.
* Move image sampling code to kernels/*/kernel_*_image.h files.
* Use arrays for data textures on Fermi too, so device_vector<Struct> works.
2017-10-07 14:53:14 +02:00