Commit Graph

817 Commits

Author SHA1 Message Date
Brecht Van Lommel
2d8c59ccb9 Fix T77095: fix Cycles performance regression with AMD RX cards
Apply the workaround only for known problematic drivers. The latest pro driver
appears to work correctly, hopefully the regular driver will as well once it
is updated to the same OpenCL driver version (3075.13).
2020-06-30 12:01:40 +02:00
Brecht Van Lommel
fb68a30af6 Fix crash compiling Cycles OpenCL, after recent TBB changes 2020-06-26 17:44:24 +02:00
Campbell Barton
fd5c185beb Cleanup: spelling 2020-06-25 23:14:36 +10:00
Brecht Van Lommel
b4e1571d0b Cleanup: compiler warnings 2020-06-24 17:25:44 +02:00
Brecht Van Lommel
669befdfbe Cycles: add Intel OpenImageDenoise support for viewport denoising
Compared to Optix denoise, this is usually slower since there is no GPU
acceleration. Some optimizations may still be possible, in avoid copies
to the GPU and/or denoising less often.

The main thing is that this adds viewport denoising support for computers
without an NVIDIA GPU (as long as the CPU supports SSE 4.1, which is nearly
all of them).

Ref T76259
2020-06-24 15:17:36 +02:00
Brecht Van Lommel
0a3bde6300 Cycles: add denoising settings to the render properties
Enabling render and viewport denoising is now both done from the render
properties. View layers still can individually be enabled/disabled for
denoising and have their own denoising parameters.

Note that the denoising engine also affects how denoising data passes are
output even if no denoising happens on the render itself, to make the passes
compatible with the engine.

This includes internal refactoring for how denoising parameters are passed
along, trying to avoid code duplication and unclear naming.

Ref T76259
2020-06-24 15:17:36 +02:00
Brecht Van Lommel
207338bb58 Cycles: port curve-ray intersection from Embree for use in Cycles GPU
This keeps render results compatible for combined CPU + GPU rendering.
Peformance and quality primitives is quite different than before. There
are now two options:

* Rounded Ribbon: render hair as flat ribbon with (fake) rounded normals, for
  fast rendering. Hair curves are subdivided with a fixed number of user
  specified subdivisions.

  This gives relatively good results, especially when used with the Principled
  Hair BSDF and hair viewed from a typical distance. There are artifacts when
  viewed closed up, though this was also the case with all previous primitives
  (but different ones).

* 3D Curve: render hair as 3D curve, for accurate results when viewing hair
  close up. This automatically subdivides the curve until it is smooth.

  This gives higher quality than any of the previous primitives, but does come
  at a performance cost and is somewhat slower than our previous Thick curves.

The main problem here is performance. For CPU and OpenCL rendering performance
seems usually quite close or better for similar quality results.

However for CUDA and Optix, performance of 3D curve intersection is problematic,
with e.g. 1.45x longer render time in Koro (though there is no equivalent quality
and rounded ribbons seem fine for that scene). Any help or ideas to optimize this
are welcome.

Ref T73778

Depends on D8012

Maniphest Tasks: T73778

Differential Revision: https://developer.blender.org/D8013
2020-06-22 13:28:01 +02:00
Brecht Van Lommel
d1ef5146d7 Cycles: remove SIMD BVH optimizations, to be replaced by Embree
Ref T73778

Depends on D8011

Maniphest Tasks: T73778

Differential Revision: https://developer.blender.org/D8012
2020-06-22 13:28:01 +02:00
Brecht Van Lommel
e50f1ddc65 Cycles: use TBB for task pools and task scheduler
No significant performance improvement is expected, but it means we have a
single thread pool throughout Blender. And it should make adding more
parallellization in the future easier.

After previous refactoring commits this is basically a drop-in replacement.
One difference is that the task pool had a mechanism for scheduling tasks to
the front of the queue to minimize memory usage. TBB has a smarter algorithm
to balance depth-first and breadth-first scheduling of tasks and we assume that
removes the need to manually provide hints to the scheduler.

Fixes T77533
2020-06-22 13:27:37 +02:00
Brecht Van Lommel
54e3487c9e Cleanup: remove task pool stop() and finished() 2020-06-22 13:06:47 +02:00
Brecht Van Lommel
b10b7cdb43 Cleanup: use lambdas instead of functors for task pools, remove threadid 2020-06-22 13:06:47 +02:00
Brecht Van Lommel
ace3268482 Cleanup: minor refactoring around DeviceTask 2020-06-22 13:06:47 +02:00
Brecht Van Lommel
6899cb3c07 Fix for T77095: work around render artifacts with AMD Radeon RX 4xx and 5xx 2020-06-18 14:41:51 +02:00
Brecht Van Lommel
fc7c34e380 Cleanup: fix compiler warnings 2020-06-17 14:36:51 +02:00
Patrick Mours
b586f801fc Cycles: Improve CUDA and OptiX error reporting in the viewport
This patch makes the infamous "Cancel" error in the viewport a thing of the past. Instead it
now shows a more useful error message and streamlines the error handling process in CUDA.

Reviewed By: brecht

Differential Revision: https://developer.blender.org/D8008
2020-06-12 18:24:15 +02:00
Brecht Van Lommel
faf5f7b63d Cleanup: fix compiler warning after recent changes
It would be good to use override for all member functions, but doing it for
only somes generates compiler warning.
2020-06-10 20:34:01 +02:00
Patrick Mours
f367f1e5a5 Cycles: Improve OptiX viewport denoising performance with CUDA rendering
With this patch Cycles recognizing when a logical OptiX and CUDA device represent the same
physical GPU and attempts to eliminate unnecessary tile copies for viewport rendering if that
is the case for all active devices. In addition, denoising is now no longer performed on the first
available OptiX device only, but instead it will try to match CUDA and OptiX
rendering/denoising devices exactly to maximize utilization.

Reviewed By: brecht

Differential Revision: https://developer.blender.org/D7975
2020-06-10 14:12:13 +02:00
Patrick Mours
9f7d84b656 Cycles: Add support for P2P memory distribution (e.g. via NVLink)
This change modifies the multi-device implementation to support memory distribution
across devices, to reduce the overall memory footprint of large scenes and allow scenes to
fit entirely into combined GPU memory that previously had to fall back to host memory.

Reviewed By: brecht

Differential Revision: https://developer.blender.org/D7426
2020-06-08 17:55:49 +02:00
Patrick Mours
473aaa389c Cycles: Enable OptiX on all Maxwell+ GPUs 2020-06-05 12:33:00 +02:00
Patrick Mours
49c295813b Merge branch 'blender-v2.83-release' 2020-05-27 15:31:03 +02:00
Patrick Mours
28d9368538 Fix T76947: Optix realtime denoiser progressively reduces brightness of very bright objects
The input data to the OptiX denoiser was clamped to 0..10000 as required, but it could easily
exceed that range with a high number of samples (since the data contains the overall sum). To
fix that, divide by the number of samples first and multiply it back in after the denoiser ran.
2020-05-27 15:17:47 +02:00
Brecht Van Lommel
d9773edaa3 Cycles: code refactor to bake using regular render session and tiles
There should be no user visible change from this, except that tile size
now affects performance. The goal here is to simplify bake denoising in
D3099, letting it reuse more denoising tiles and pass code.

A lot of code is now shared with regular rendering, with the two main
differences being that we read some render result passes from the bake API
when starting to render a tile, and call the bake kernel instead of the
path trace kernel.

With this kind of design where Cycles asks for tiles from the bake API,
it should eventually be easier to reduce memory usage, show tiles as
they are baked, or bake multiple passes at once, though there's still
quite some work needed for that.

Reviewers: #cycles

Subscribers: monio, wmatyjewicz, lukasstockner97, michaelknubben

Differential Revision: https://developer.blender.org/D3108
2020-05-15 20:25:24 +02:00
Brecht Van Lommel
97f50c71b9 Fix --debug-cycles printing CUDA devices twice
Reuse the CUDA devices list for Optix device detection.
2020-05-14 16:07:22 +02:00
Brecht Van Lommel
d97c83712c Cycles: mark CUDA 10.2 as officially supported
It appears to work fine after a recent bugfix and testing for the past few
weeks.
2020-05-05 15:06:49 +02:00
Ray Molenkamp
aeb42cf8ab Cycles/Optix: Support building the optix kernels on demand.
CMake: `WITH_CYCLES_DEVICE_OPTIX` did not respect `WITH_CYCLES_CUDA_BINARIES` causing the optix kernel to be always build at build time.

Code: `device_optix.cpp` did not count on the optix kernel not existing in the default location.

For this to work, one should have before starting blender

1) working nvcc environment
2) Optix SDK installed and the OPTIX_ROOT_DIR environment variable pointing to it which is not set by default

Differential Revision: https://developer.blender.org/D7400

Reviewed By: Brecht
2020-04-11 12:59:21 -06:00
Brecht Van Lommel
53981c7fb6 Cleanup: refactor adaptive sampling to more easily change some parameters
No functional changes yet, this is work towards making CPU and GPU results
match more closely.
2020-04-07 20:29:48 +02:00
Ray Molenkamp
58ea0d93f1 Cycles/Optix: Add CYCLES_OPTIX_TEST override
This works similarly to the CYCLES_OPENCL_TEST
environment variable to allow testing on unsupported
hardware.

Note: like the OPENCL test override, this is
for *testing* only and bug reports on unsupported
hardware will *not* be accepted at this point in
time.
2020-03-26 11:30:17 -06:00
Brecht Van Lommel
f48d15a861 Cycles: limit number of processes compiling OpenCL kernel based on memory
The numbers here can probably be tweaked to be better, but it's hard to
predict and this should at least avoid excessive memory swapping.

Fixes T57064.
2020-03-25 16:39:37 +01:00
Brecht Van Lommel
394a1373a0 Cycles: use OpenCL C 2.0 if available, to improve performance for AMD
Tested with AMD Radeon Pro WX 9100, where it brings performance back to 2.80
level, and combined with recent changes is about 2-15% faster than 2.80 in
our benchmark scenes.

This somehow appears to specifically address the issue where adding more shader
nodes leads to slower runtime. I found no additional speedup by applying this
to change to 2.80 or removing the new shader node code.

Ref T71479

Patch by Jeroen Bakker.

Differential Revision: https://developer.blender.org/D6252
2020-03-24 20:09:36 +01:00
Ray Molenkamp
44c6b6615b OpenCL: Bring back CYCLES_OPENCL_TEST override
Back in 2.79 you could either use the debug panel or an
environment variable to override using OpenCL for unsupported
hardware. Which was rather useful for developers when testing
on NVidia just to be sure the CL kernels at-least build properly.

This broke in rB949ab753bb2

This diff restores testing though the CYCLES_OPENCL_TEST
environment variable.

Differential Revision: https://developer.blender.org/D7202

Reviewers: brecht
2020-03-21 11:55:45 -06:00
Dalai Felinto
2d1cce8331 Cleanup: make format after SortedIncludes change 2020-03-19 09:33:58 +01:00
Brecht Van Lommel
472534d16e Fix memory leak in recent Cycles image texture refactor 2020-03-12 20:30:49 +01:00
Brecht Van Lommel
26bea849cf Cleanup: add device_texture for images, distinct from other global memory
There was too much image texture specific stuff in device_memory, and too
much code duplication between devices.
2020-03-12 17:28:55 +01:00
Brecht Van Lommel
21821601f2 Fix Optix build error on Linux with some compilers 2020-03-11 20:35:38 +01:00
Brecht Van Lommel
f01bc597a8 Cleanup: stop encoding image data type in slot index
This is legacy code from when we had a fixed number of textures.
2020-03-11 17:07:17 +01:00
Brecht Van Lommel
dcdcc23488 Fix T74504: Cycles wrong progress bar with CPU adaptive sampling 2020-03-06 23:46:58 +01:00
Brecht Van Lommel
b31b44c223 Fix error in Cycles Optix adaptive sampling after recent cleanup 2020-03-06 23:46:58 +01:00
Dalai Felinto
c60366c01f Cycles: cleanup warning 2020-03-06 15:27:50 +01:00
Brecht Van Lommel
c8ac760c59 Cleanup: tweak Cycles #includes in preparation for clang-format sorting 2020-03-06 14:44:42 +01:00
Campbell Barton
8574d68aa0 Cleanup: spelling 2020-03-06 11:52:32 +11:00
Patrick Mours
88db9a17ce Fix T74393: Cycles crashes when both OSL and Optix Denoising are enabled
Enabling viewport denoising causes Cycles to use a multi-device, which always returned NULL when
asked for OSL memory and would subsequently crash. This fixes that by returning the correct OSL
memory pointer from the CPU device in the special viewport denoising multi-device.
2020-03-05 16:28:31 +01:00
Stefan Werner
51e898324d Adaptive Sampling for Cycles.
This feature takes some inspiration from
"RenderMan: An Advanced Path Tracing Architecture for Movie Rendering" and
"A Hierarchical Automatic Stopping Condition for Monte Carlo Global Illumination"

The basic principle is as follows:
While samples are being added to a pixel, the adaptive sampler writes half
of the samples to a separate buffer. This gives it two separate estimates
of the same pixel, and by comparing their difference it estimates convergence.
Once convergence drops below a given threshold, the pixel is considered done.

When a pixel has not converged yet and needs more samples than the minimum,
its immediate neighbors are also set to take more samples. This is done in order
to more reliably detect sharp features such as caustics. A 3x3 box filter that
is run periodically over the tile buffer is used for that purpose.

After a tile has finished rendering, the values of all passes are scaled as if
they were rendered with the full number of samples. This way, any code operating
on these buffers, for example the denoiser, does not need to be changed for
per-pixel sample counts.

Reviewed By: brecht, #cycles

Differential Revision: https://developer.blender.org/D4686
2020-03-05 12:21:38 +01:00
Patrick Mours
af54bbd61c Cycles: Rework tile scheduling for denoising
This fixes denoising being delayed until after all rendering has finished. Instead, tile-based
denoising is now part of the "RENDER" task again, so that it is all in one task and does not
cause issues with dedicated task pools where tasks are serialized.

Reviewed By: brecht

Differential Revision: https://developer.blender.org/D6940
2020-02-28 16:12:29 +01:00
Patrick Mours
0cea9353fd Fix CUDA out of memory error with OptiX viewport denoising on small GPUs
This makes the memory allocation for the denoiser state use the memory allocator in Cycles, which
will evict textures to host memory when there is not enough space on the device. This means the
allocation for the denoiser state won't just fail if there is no more space and instead more space is
made for it to work. Also simplifies code somewhat.
2020-02-28 15:58:17 +01:00
Patrick Mours
a93043153a Cleanup: Remove superfluous "cuda_device_ptr" function 2020-02-25 17:13:59 +01:00
Dalai Felinto
213b4f76ee Cleanup: make format 2020-02-19 18:44:22 +01:00
Patrick Mours
2278aa0da9 Cycles: Add support for adaptive kernel compilation to OptiX device
This modifies the common CUDA implementation for adaptive kernel compilation slightly to support both CUBIN and PTX output (the latter which is then used in the OptiX device). It also fixes adaptive kernel compilation on Windows.

Reviewed By: brecht

Differential Revision: https://developer.blender.org/D6851
2020-02-17 14:27:44 +01:00
Ray molenkamp
9339dc6dd1 Fix T70685: Cycles crash using WITH_CYCLES_NATIVE_ONLY on Windows
MSVC does not have -march=native, so the kernel gets built without AVX2 and
BVH8 support. The code assumed it to be available and crashed

Differential Revision: https://developer.blender.org/D6082
2020-02-14 13:55:11 +01:00
Patrick Mours
0d750d7c06 Fix OptiX denoising when multiple CUDA streams are active 2020-02-13 15:22:26 +01:00
Patrick Mours
63bde1063f Cleanup: Remove some unnecessary OptiX device code 2020-02-13 15:22:26 +01:00