
150 Commits

Author SHA1 Message Date
Aras Pranckevicius
1c9d7d5267 Fix reported build failure due to missing ostream include
On FreeBSD 13 / clang 15 the header apparently is not included
indirectly, see
https://projects.blender.org/blender/blender/pulls/111063#issuecomment-1033764
2023-09-29 17:59:14 +03:00
Stefan Werner
d7f1e6fb12 Cycles: GPU denoising refactor
Moved more generic code from OptiX to GPU denoiser in order to reuse
it for OIDN GPU support.

Pull Request: https://projects.blender.org/blender/blender/pulls/112229
2023-09-11 17:09:23 +02:00
Campbell Barton
24a8d6425a CMake: include missing files in source files 2023-08-24 11:51:25 +10:00
Aras Pranckevicius
7875074532 Cleanup: fewer iostreams related includes in Cycles
In the commonly used Cycles headers, it's enough to include the
much smaller <iosfwd> instead of the full <iostream>. While looking at it,
also removed inclusion of some other headers from commonly used headers
that did not seem to be needed.

Pull Request: https://projects.blender.org/blender/blender/pulls/111063
2023-08-15 13:55:38 +02:00
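The <iosfwd> cleanup in 7875074532 above, and the FreeBSD build failure later fixed in 1c9d7d5267, come down to one rule. A header only needs a forward declaration of `std::ostream` to declare a stream operator, but the file that actually streams values must include `<ostream>` itself and not rely on it arriving indirectly. A minimal sketch follows. The `Float3` type and `format_float3` helper are hypothetical, not Cycles code, and the "header" and "source" parts are shown in one file for brevity.

```cpp
// --- Header part: a declaration of std::ostream is enough ---
#include <iosfwd>  // declares std::ostream without pulling in the stream classes

struct Float3 {  // hypothetical type, for illustration only
  float x, y, z;
};

// Declaring (not defining) the operator only needs the forward declaration.
std::ostream &operator<<(std::ostream &os, const Float3 &v);

// --- Source part: include what you use, explicitly ---
#include <cassert>
#include <ostream>  // required here; do not rely on transitive includes
#include <sstream>
#include <string>

std::ostream &operator<<(std::ostream &os, const Float3 &v)
{
  return os << '(' << v.x << ", " << v.y << ", " << v.z << ')';
}

// Hypothetical helper that formats a Float3 through the operator above.
std::string format_float3(const Float3 &v)
{
  std::ostringstream ss;
  ss << v;
  return ss.str();
}
```

Only translation units that define or call `operator<<` pay for the full stream headers. Everything else that includes the header sees just the lightweight declarations.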
Campbell Barton
c12994612b License headers: use SPDX-FileCopyrightText in intern/cycles 2023-06-14 16:53:23 +10:00
Sergey Sharybin
ba3f26fac5 Cycles: light and shadow linking
With light linking, lights can be set to affect only specific objects in the
scene. Shadow linking additionally gives control over which objects act as
shadow blockers for a light.

Usage:
https://wiki.blender.org/wiki/Reference/Release_Notes/4.0/Cycles

Implementation:
https://wiki.blender.org/wiki/Source/Render/Cycles/LightLinking

Ref #104972
Co-authored-by: Brecht Van Lommel <brecht@blender.org>
2023-05-24 14:11:47 +02:00
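As a rough illustration of the light and shadow linking idea in ba3f26fac5, the sketch below models light sets and shadow sets as 64-bit membership masks. A light affects a receiver only if the receiver's light set is in the light's membership mask. An object blocks a light's shadow rays only if its blocker set is in the light's shadow membership mask. All struct fields and function names here are hypothetical. The implementation page linked in the commit describes the actual design.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical data layout, not the actual Cycles structs.
struct Light {
  uint64_t light_set_membership = ~uint64_t(0);   // receiver sets this light illuminates
  uint64_t shadow_set_membership = ~uint64_t(0);  // blocker sets that cast this light's shadows
};

struct Object {
  int receiver_light_set = 0;  // index in [0, 64); 0 is the default set
  int blocker_shadow_set = 0;  // index in [0, 64); 0 is the default set
};

inline uint64_t set_bit(int set_index)
{
  assert(set_index >= 0 && set_index < 64);
  return uint64_t(1) << set_index;
}

// Light linking: does this light contribute to the shading of this object?
bool light_affects(const Light &light, const Object &receiver)
{
  return (light.light_set_membership & set_bit(receiver.receiver_light_set)) != 0;
}

// Shadow linking: does this object block shadow rays toward this light?
bool blocks_shadow(const Light &light, const Object &blocker)
{
  return (light.shadow_set_membership & set_bit(blocker.blocker_shadow_set)) != 0;
}
```

With the all-ones default masks, every light affects every object and is shadowed by every object. This matches unlinked rendering. Linking a light to specific objects then amounts to clearing bits in its masks.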
Sebastian Herholz
493856427d Cycles: bumping OpenPGL minimum version to 0.5 and removing version checks 2023-05-23 13:23:09 +02:00
Sebastian Herholz
8d17458569 Cycles: Path Guiding: Adding guiding on glossy surfaces via RIS
Pull Request: https://projects.blender.org/blender/blender/pulls/107782
2023-05-22 16:47:05 +02:00
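Resampled importance sampling (RIS), which 8d17458569 uses for guiding on glossy surfaces, works in three steps. It draws several candidates from a cheap source distribution, picks one with probability proportional to `target / source_pdf`, and weights the pick so the estimator stays unbiased. The generic sketch below shows only that selection step. The `Candidate` values and the single uniform random number `u` are placeholders. How Cycles and OpenPGL combine the BSDF and guiding distributions is not shown.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Candidate {
  double source_pdf;  // pdf of the distribution the candidate was drawn from
  double target;      // unnormalized target function value at the candidate
};

struct RisResult {
  std::size_t index;       // selected candidate
  double weight_sum;       // sum of the resampling weights w_i
  double unbiased_weight;  // (weight_sum / M) / target(selected)
};

// Select one candidate with probability proportional to w_i = target_i / source_pdf_i.
// `u` is a uniform random number in [0, 1).
RisResult ris_select(const std::vector<Candidate> &candidates, double u)
{
  assert(!candidates.empty());
  std::vector<double> weights;
  double weight_sum = 0.0;
  for (const Candidate &c : candidates) {
    const double w = c.source_pdf > 0.0 ? c.target / c.source_pdf : 0.0;
    weights.push_back(w);
    weight_sum += w;
  }
  // Walk the discrete CDF of the weights.
  double threshold = u * weight_sum;
  std::size_t selected = candidates.size() - 1;
  for (std::size_t i = 0; i < weights.size(); i++) {
    if (threshold < weights[i]) {
      selected = i;
      break;
    }
    threshold -= weights[i];
  }
  const double m = double(candidates.size());
  const double target = candidates[selected].target;
  const double unbiased = target > 0.0 ? (weight_sum / m) / target : 0.0;
  return {selected, weight_sum, unbiased};
}
```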
Nikita Sirgienko
04fc6fd8a7 Cycles: avoid doing zero-sized allocations with partitioned shader sorting 2023-05-17 11:07:56 +02:00
Brecht Van Lommel
36e5157693 Cleanup: remove redundant lerp function, mix already does the same 2023-05-12 21:00:52 +02:00
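The cleanup in 36e5157693 relies on `mix` and `lerp` computing the same linear interpolation. Below is a minimal scalar sketch, assuming the common GLSL-style definition `mix(a, b, t) = a + t * (b - a)`. The Cycles overloads for vector types are not reproduced here.

```cpp
#include <cassert>

// Linear interpolation between a and b: t = 0 gives a, t = 1 gives b.
// Assumed GLSL-style definition; a separate lerp() with the same body is redundant.
inline float mix(float a, float b, float t)
{
  return a + t * (b - a);
}
```

The form `a + t * (b - a)` is not guaranteed to return exactly `b` at `t = 1` for every floating-point input, because `b - a` is rounded. The alternative `(1 - t) * a + t * b` is exact at both ends but costs an extra multiply.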
Campbell Barton
3958ae7241 Cleanup: use STRNCPY, SNPRINTF macros 2023-05-09 14:08:19 +10:00
Campbell Barton
6859bb6e67 Cleanup: format (with BraceWrapping::AfterControlStatement "MultiLine") 2023-05-02 09:37:49 +10:00
Sahar A. Kashi
557a245dd5 Cycles: add HIP RT device, for AMD hardware ray tracing on Windows
HIP RT enables AMD hardware ray tracing on RDNA2 and above, and falls back to a
shader implementation for older graphics cards. It offers an average 25%
sample rendering rate improvement in Cycles benchmarks, on a W6800 card.

The ray tracing feature functions are accessed through HIP RT SDK, available on
GPUOpen. HIP RT traversal functionality is pre-compiled in bitcode format and
shipped with the SDK.

This is not yet enabled as there are issues to be resolved, but landing the
code now makes testing and further changes easier.

Known limitations:
* Not working yet with current public AMD drivers.
* Visual artifact in motion blur.
* One of the buffers allocated for traversal has a static size. Allocating it
  dynamically would reduce memory usage.
* This is for Windows only currently, no Linux support.

Co-authored-by: Brecht Van Lommel <brecht@blender.org>

Ref #105538
2023-04-25 20:19:43 +02:00
Sergey Sharybin
daaed83a32 Fix set but unused variables in Cycles 2023-04-19 10:02:09 +02:00
Sergey Sharybin
7982d86117 Fix unqualified access to std::move in Cycles 2023-04-19 10:02:09 +02:00
Xavier Hallade
9821a2d397 Cycles: pass kernel features to get_bvh_layout_mask
This allows selectively disabling hardware ray tracing in the oneAPI
backend, depending on the features used.
2023-04-18 22:09:42 +02:00
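Passing kernel features into `get_bvh_layout_mask`, as 9821a2d397 does, lets a device choose its acceleration structure per scene. The sketch below is a hypothetical illustration only. The layout enum values, the placeholder feature bits, the `ExampleDevice` type, and the rule that certain features force the software path are assumptions made for the example, not the actual oneAPI logic.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical layout values, for illustration only.
enum BVHLayout : uint32_t {
  BVH_LAYOUT_SOFTWARE = 1u << 0,  // portable software BVH traversal
  BVH_LAYOUT_HARDWARE = 1u << 1,  // hardware ray tracing
};

// Placeholder feature bits (not real Cycles flags).
constexpr uint32_t FEATURE_A = 1u << 0;
constexpr uint32_t FEATURE_B = 1u << 1;

struct ExampleDevice {
  bool hardware_raytracing_available = true;
  uint32_t features_unsupported_by_hwrt = 0;  // features the hardware path cannot handle

  // The requested kernel features decide whether hardware ray tracing may be used.
  uint32_t get_bvh_layout_mask(uint32_t kernel_features) const
  {
    const bool use_hwrt = hardware_raytracing_available &&
                          (kernel_features & features_unsupported_by_hwrt) == 0;
    return use_hwrt ? BVH_LAYOUT_HARDWARE : BVH_LAYOUT_SOFTWARE;
  }
};
```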
Brecht Van Lommel
0bc957063c Fix #106405: Cycles multi GPU crash with vertex color baking
Avoid division by zero when one of the devices gets no work.
2023-04-17 15:31:35 +02:00
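The fix in 0bc957063c guards against a device that receives no work when several GPUs share a bake. A hypothetical sketch of that guard pattern follows. The `per_item_time` function and what it computes are illustrative and not the actual Cycles code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical: average time per work item for each device.
// A device that received no work reports 0 instead of dividing by zero.
std::vector<double> per_item_time(const std::vector<double> &device_time,
                                  const std::vector<int64_t> &device_work_items)
{
  assert(device_time.size() == device_work_items.size());
  std::vector<double> result(device_time.size(), 0.0);
  for (std::size_t i = 0; i < device_time.size(); i++) {
    if (device_work_items[i] == 0) {
      continue;  // no work on this device: skip instead of producing inf or NaN
    }
    result[i] = device_time[i] / double(device_work_items[i]);
  }
  return result;
}
```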
Sebastian Parborg
aa6e95281f Add support for OpenPGL 0.5.0
Some functions changed slightly for this non-beta release.
There are no functional changes, though, as we didn't use what was removed.

Pull Request: https://projects.blender.org/blender/blender/pulls/106861
2023-04-13 11:44:35 +02:00
Sergey Sharybin
44d5a894c1 Fix #106667: Cycles: Multi-device denoise runs denoising data passes
Only use the denoised buffer for access of denoised passes, and
access the rest of the passes from the original render buffer.

This allows in-place modification of the guiding passes needed
by the denoiser without affecting the final render result pixels.

Pull Request: https://projects.blender.org/blender/blender/pulls/106668
2023-04-07 16:49:43 +02:00
Stefan Werner
a76bf65c9d Cycles: Refactored GPU denoising code
To prepare for OIDN2 with GPU support, some of the code that was exclusive to the OptiXDenoiser is being moved to the DenoiserGPU superclass.

Co-authored-by: Stefan Werner <stefan.werner@intel.com>
Pull Request: https://projects.blender.org/blender/blender/pulls/106496
2023-04-05 11:19:15 +02:00
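The refactor in a76bf65c9d (continued in d7f1e6fb12) moves code shared by GPU denoisers out of `OptiXDenoiser` into the `DenoiserGPU` superclass, so an OIDN GPU denoiser can reuse it. A structural sketch of that kind of split follows, using the template-method pattern. The method names, the upload and download counters, and the `OIDNDenoiserGPU` class name are hypothetical and do not reflect the actual Cycles interfaces.

```cpp
#include <cassert>
#include <memory>

class Denoiser {
 public:
  virtual ~Denoiser() = default;
  virtual bool denoise_buffer() = 0;
};

// Shared GPU logic: move passes to the device, denoise, and copy results back.
// Only the backend-specific denoise step is left to subclasses.
class DenoiserGPU : public Denoiser {
 public:
  bool denoise_buffer() override
  {
    upload_passes();
    const bool ok = denoise_on_device();
    download_result();
    return ok;
  }

  int uploads = 0;    // counters standing in for real buffer copies
  int downloads = 0;

 protected:
  virtual bool denoise_on_device() = 0;  // backend-specific part

 private:
  void upload_passes() { uploads++; }
  void download_result() { downloads++; }
};

class OptiXDenoiser : public DenoiserGPU {
 protected:
  bool denoise_on_device() override { return true; }  // would invoke OptiX here
};

class OIDNDenoiserGPU : public DenoiserGPU {  // hypothetical name
 protected:
  bool denoise_on_device() override { return true; }  // would invoke OIDN here
};
```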
Campbell Barton
440cccecdc Cleanup: spelling in comments 2023-04-05 14:39:51 +10:00
Sergey Sharybin
d32d787f5f Clang-Format: Allow empty functions to be single-line
For example

```
OIIOOutputDriver::~OIIOOutputDriver()
{
}
```

becomes

```
OIIOOutputDriver::~OIIOOutputDriver() {}
```

Saves quite a lot of vertical space, which is especially handy for
constructors.

Pull Request: https://projects.blender.org/blender/blender/pulls/105594
2023-03-29 16:50:54 +02:00
Campbell Barton
7cda559d7c Cleanup: format, spelling, struct member comment 2023-03-20 11:12:34 +11:00
Alaska
0963ee559e Cycles: adjust resolution divider to achieve a more usable viewport
This changes the maximum viewport resolution divider for Cycles to
help users get a more responsive viewport.

This is done by changing the maximum viewport resolution divider
to a divider that aims to have the largest axis of the viewport
roughly equal to 128 pixels.

Depending on the circumstances, this change can result in a few
noticeable differences:
 - Users with slow hardware and a large pixel_size, or slow hardware
   and a low-resolution screen, may observe a higher resolution viewport
   during navigation, making the scene more readable. However, this comes
   at the cost of reduced responsiveness.

 - Users with slow hardware, a low pixel_size and a high-resolution
   screen may observe a lower resolution viewport during navigation,
   making the viewport more responsive while navigating.

Along with that, the way Cycles iterates through resolution dividers
is changed to promote quick transitions between resolution dividers.
This means users don't need to wait through as many iterations to get
from a low navigation resolution to a 1:1 viewport resolution.

Pull Request: https://projects.blender.org/blender/blender/pulls/105581
2023-03-17 11:15:58 +01:00
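0963ee559e picks the maximum resolution divider so that the largest viewport axis ends up at roughly 128 pixels. A hypothetical sketch of that computation follows. It assumes the divider must stay a multiple of `pixel_size` (as described in 9fecf1f8b8) and never drop below `pixel_size`. The function name and the round-down choice are illustrative, not the actual Cycles code.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical: largest divider such that max(width, height) / divider is about
// `target_size` pixels, rounded down to a multiple of pixel_size.
int max_resolution_divider(int width, int height, int pixel_size, int target_size = 128)
{
  assert(width > 0 && height > 0 && pixel_size > 0 && target_size > 0);
  const int largest_axis = std::max(width, height);
  const int divider = largest_axis / target_size;  // integer division rounds down
  const int rounded = (divider / pixel_size) * pixel_size;
  return std::max(rounded, pixel_size);  // never finer than pixel_size
}
```

For a 1920x1080 viewport with `pixel_size` 1 this gives a divider of 15, so navigation would render at about 128x72 pixels.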
Campbell Barton
b3625e6bfd Cleanup: comment blocks 2023-03-09 10:39:49 +11:00
Campbell Barton
0fa34aa0ec Cleanup: spelling in comments, reference enum types in doc-strings
Also use doxy formatting for structs in sculpt_uv.c.
2023-02-14 10:29:48 +11:00
Alaska
9fecf1f8b8 Cycles: Replace resolution divider loop with an analytical formula
As a side effect of this change, more resolution divisions are now available.
Before this patch the possible resolution divisions were all powers of two.
Now the possible resolution divisions are the multiples of pixel_size.

This increase in possible resolution divisions is the same idea proposed in https://archive.blender.org/developer/D13590.
In that patch there were concerns that this would increase the time between a user navigating
and seeing the 1:1 render. To my knowledge this is a non-issue, and there should be
little to no increase in time between those two events.

Pull Request #104450
2023-02-13 13:02:47 +01:00
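To make the difference described in 9fecf1f8b8 concrete, the sketch below lists the dividers reachable up to some maximum. Before the change only powers of two were possible; afterwards every multiple of `pixel_size` is. The helper names and the explicit enumeration are hypothetical. The commit itself replaces an iterative loop with a formula rather than building such lists.

```cpp
#include <cassert>
#include <vector>

// Dividers available before the change: 1, 2, 4, 8, ...
std::vector<int> power_of_two_dividers(int max_divider)
{
  std::vector<int> result;
  for (int d = 1; d <= max_divider; d *= 2) {
    result.push_back(d);
  }
  return result;
}

// Dividers available after the change: every multiple of pixel_size.
std::vector<int> multiple_of_pixel_size_dividers(int pixel_size, int max_divider)
{
  assert(pixel_size > 0);
  std::vector<int> result;
  for (int d = pixel_size; d <= max_divider; d += pixel_size) {
    result.push_back(d);
  }
  return result;
}
```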
Michael Jones
654e1e901b Cycles: Use local atomics for faster shader sorting (enabled on Metal)
This patch adds two new kernels: SORT_BUCKET_PASS and SORT_WRITE_PASS. These replace PREFIX_SUM and SORTED_PATHS_ARRAY on supported devices (currently implemented on Metal, but it should be trivial to enable on the other backends). The new kernels exploit sort partitioning (see D15331) by sorting each partition separately using local atomics. This can give an overall render speedup of 2-3% depending on architecture. As before, we fall back to the original non-partitioned sorting when the shader count is "too high".

Reviewed By: brecht

Differential Revision: https://developer.blender.org/D16909
2023-02-06 11:18:26 +00:00
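The bucket pass and write pass in 654e1e901b amount to a counting sort performed independently for each partition. The sequential CPU sketch below assumes keys are shader indices in `[0, num_shaders)` and that active paths are split into fixed-size partitions. On the GPU, each partition would be handled by one work-group using local atomics for the counters; this single-threaded version uses plain increments instead. The function and parameter names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Sort path indices by key (shader index) within each partition of
// `partition_size` consecutive entries; returns the permuted path indices.
std::vector<int> partitioned_counting_sort(const std::vector<int> &keys,
                                           int num_shaders,
                                           int partition_size)
{
  assert(num_shaders > 0 && partition_size > 0);
  const int n = int(keys.size());
  std::vector<int> sorted(n);
  for (int begin = 0; begin < n; begin += partition_size) {
    const int end = std::min(begin + partition_size, n);
    // Bucket pass: count how many paths in this partition use each shader.
    std::vector<int> offsets(num_shaders, 0);
    for (int i = begin; i < end; i++) {
      assert(keys[i] >= 0 && keys[i] < num_shaders);
      offsets[keys[i]]++;
    }
    // Exclusive prefix sum turns counts into write offsets within the partition.
    int running = begin;
    for (int s = 0; s < num_shaders; s++) {
      const int count = offsets[s];
      offsets[s] = running;
      running += count;
    }
    // Write pass: scatter each path index into its bucket slot.
    for (int i = begin; i < end; i++) {
      sorted[offsets[keys[i]]++] = i;
    }
  }
  return sorted;
}
```

Because each partition only touches its own counters, the counters fit in fast local memory. This is the property the commit exploits on Metal.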
Campbell Barton
66dee44088 CMake: quiet references to undeclared variable warnings
These warnings can reveal errors in logic, so quiet them by checking
if the features are enabled before using variables or by assigning
empty strings in some cases.

- Check CMAKE_THREAD_LIBS_INIT is set before use as CMake docs
  note that this may be left unset if it's not needed.
- Remove BOOST/OPENVDB/VULKAN references when disabled.
- Define INC_SYS even when empty.
- Remove PNG_INC from freetype (not defined anywhere).
2023-01-19 17:10:42 +11:00
Michael Jones
77c3e67d3d Cycles: Improved render start/stop responsiveness on Metal
All kernel specialisation is now performed in the background regardless of kernel type, meaning that the first render will be visible a few seconds sooner. The only exception is during benchmark warm up, in which case we wait for all kernels to be cached. When stopping a render, we call a new `cancel()` method on the device which causes any outstanding compilation work to be cancelled, and we destroy the device in a detached thread so that any stale queued compilations can be safely purged without blocking the UI for longer than necessary.

Reviewed By: brecht

Differential Revision: https://developer.blender.org/D16371
2023-01-04 16:00:53 +00:00
Chris Blackbourn
60523ea523 Cleanup: format 2022-11-16 12:59:47 +13:00
Patrick Mours
a859837cde Cleanup: Move OptiX denoiser code from device into denoiser class
Cycles already treats denoising fairly separate in its code, with a
dedicated `Denoiser` base class used to describe denoising
behavior. That class has been fully implemented for OIDN
(`denoiser_oidn.cpp`), but for OptiX was mostly empty
(`denoiser_optix.cpp`) and denoising was instead implemented in
the OptiX device. That meant denoising code was split over various
files and directories, making it a bit awkward to work with. This
patch moves the OptiX denoising implementation into the existing
`OptiXDenoiser` class, so that everything is in one place. There are
no functional changes, code has been mostly moved as-is. To
retain support for potential other denoiser implementations based
on a GPU device in the future, the `DeviceDenoiser` base class was
kept and slightly extended (and its file renamed to
`denoiser_gpu.cpp` to follow similar naming rules as
`path_trace_work_*.cpp`).

Differential Revision: https://developer.blender.org/D16502
2022-11-15 15:50:01 +01:00
Campbell Barton
afc091c3c4 Cleanup: spelling in comments 2022-11-01 12:24:58 +11:00
Michael Jones
8dd7b5b26b Cycles: Metal integrator state size tuning
This patch tunes the integrator state sizing for Metal (`num_concurrent_states` and `num_concurrent_busy_states`).

On all GPU architectures, we adjust the busy:total states ratio to be 1:4, which gives better rendering performance than the previous 1:16 ratio (independent of total state count). This gives a small performance uplift (e.g. 2-3% on M1 Ultra).

Additionally for M2 architectures, we double the overall state size if there is available headroom. Inclusive of the first change, we can expect uplift of close to 10% in future, as this results in larger dispatch sizes and minimises work submission overheads. In order to make an accurate determination of available headroom, we defer the calculation of `num_concurrent_states` and `num_concurrent_busy_states` until the time of integrator state allocation (i.e. after all of the scene data has been allocated). We also refactor `alloc_integrator_soa` to calculate an *exact* single-state-size in a first pass, right before allocating the integrator SoA buffers in a second pass.

Reviewed By: brecht

Differential Revision: https://developer.blender.org/D16313
2022-10-24 17:14:33 +01:00
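A hypothetical sketch of the sizing rule described in 8dd7b5b26b follows. It derives `num_concurrent_states` at integrator-state allocation time, doubles it only when the doubled allocation still fits in the remaining memory, and keeps `num_concurrent_busy_states` at a quarter of it. The base state count, the exact per-state size argument, and the headroom test are assumptions for illustration, not the Metal device's actual heuristics.

```cpp
#include <cassert>
#include <cstdint>

struct StateSizing {
  int64_t num_concurrent_states;
  int64_t num_concurrent_busy_states;
};

// Hypothetical: decide state counts once scene data has been allocated, so
// free_memory_bytes reflects the real headroom.
StateSizing size_integrator_states(int64_t base_states,
                                   int64_t single_state_size_bytes,
                                   int64_t free_memory_bytes,
                                   bool allow_doubling)
{
  assert(base_states > 0 && single_state_size_bytes > 0);
  int64_t states = base_states;
  // Double the state count only if the doubled allocation still fits.
  if (allow_doubling && 2 * states * single_state_size_bytes <= free_memory_bytes) {
    states *= 2;
  }
  // Busy:total ratio of 1:4 (previously 1:16).
  return {states, states / 4};
}
```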
Sebastian Herholz
2006c3ed10 Fix T101529: Blender crashes when using Path Guiding 2022-10-18 13:59:12 +02:00
Lukas Stockner
95aac5df73 Fix T101651: Cycles crashes when failing to initialize render device
The issue here was that PathTraceWork was set up before checking if
any error occurred, and it didn't account for the dummy device so
it called a non-implemented function.

This fix therefore avoids creating PathTraceWork for dummy devices
and checks for device creation errors earlier in the process.
2022-10-10 17:55:08 +02:00
Campbell Barton
210f4db81c Cleanup: spelling in comments 2022-10-10 11:22:41 +11:00
Campbell Barton
6d1d1bf2b1 Cleanup: spelling in comments
Also add missing task ID.
2022-09-28 09:41:31 +10:00
Hans Goudey
b145cc9d36 Cleanup: Unused variable warning with path guiding turned off 2022-09-27 15:00:37 -05:00
Sebastian Herhoz
75a6d3abf7 Cycles: add Path Guiding on CPU through Intel OpenPGL
This adds path guiding features into Cycles by integrating Intel's Open Path
Guiding Library. It can be enabled in the Sampling > Path Guiding panel in the
render properties.

This feature helps reduce noise in scenes where finding a path to light is
difficult for regular path tracing.

The current implementation supports guiding directional sampling decisions on
surfaces, when the material contains at least one diffuse component, and in
volumes with isotropic and anisotropic Henyey-Greenstein phase functions.

On surfaces, the guided sampling decision is proportional to the product of
the incident radiance and the normal-oriented cosine lobe and in volumes it
is proportional to the product of the incident radiance and the phase function.

The incident radiance field of a scene is learned and updated during rendering
after each per-frame rendering iteration/progression.

At the moment, path guiding is only supported by the CPU backend. Support for
GPU backends will be added in future versions of OpenPGL.

Ref T92571

Differential Revision: https://developer.blender.org/D15286
2022-09-27 15:56:32 +02:00
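The two guided sampling targets described in 75a6d3abf7 can be written out as follows. Here $L_i(x, \omega)$ is the learned incident radiance at $x$ from direction $\omega$, $n$ is the shading normal, and $\theta$ is the scattering angle. The Henyey-Greenstein phase function $f_{HG}$ has anisotropy parameter $g$. This notation is ours, not taken from the Cycles or OpenPGL sources.

```latex
p_{\text{surface}}(\omega) \propto L_i(x, \omega)\,\max(0,\; n \cdot \omega)
\qquad
p_{\text{volume}}(\omega) \propto L_i(x, \omega)\, f_{HG}(\cos\theta;\, g)

f_{HG}(\cos\theta;\, g) = \frac{1}{4\pi}\,\frac{1 - g^2}{\left(1 + g^2 - 2g\cos\theta\right)^{3/2}}
```

For $g = 0$ the phase function reduces to the isotropic $1/(4\pi)$, so in that case the volume target is proportional to the incident radiance alone.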
Brecht Van Lommel
3a605b23d0 Fix T100708: Cycles bake of diffuse/glossy color not outputting alpha 2022-08-31 20:51:50 +02:00
Brecht Van Lommel
6a4f4810f3 Fix T100246: Cycles GPU render error when adding AO node during viewport render 2022-08-18 20:04:22 +02:00
Patrick Mours
515a15f200 Fix syntax error introduced in previous commit 2022-08-12 16:13:09 +02:00
Patrick Mours
79787bf8e1 Cycles: Improve denoiser update performance when rendering with multiple GPUs
This patch causes the render buffers to be copied to the denoiser
device only once before denoising and output/display is then fed
from that single buffer on the denoiser device. That way usually all
but one copy (from all the render devices to the denoiser device)
can be eliminated, provided that the denoiser device is also the
display device (in which case interop is used to update the display).
As such this patch also adds some logic that tries to ensure the
chosen denoiser device is the same as the display device.

Differential Revision: https://developer.blender.org/D15657
2022-08-12 16:00:54 +02:00
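79787bf8e1 tries to make the denoiser device the same as the display device, so the single copied buffer can be displayed through interop. A hypothetical sketch of that preference order follows. The `DeviceInfo` fields and the `choose_denoiser_device` function are illustrative, not the actual Cycles code.

```cpp
#include <cassert>
#include <vector>

struct DeviceInfo {  // hypothetical
  int id;
  bool supports_denoising;
  bool is_display_device;
};

// Prefer a denoising-capable device that also drives the display; otherwise
// fall back to the first denoising-capable device. Returns -1 if there is none.
int choose_denoiser_device(const std::vector<DeviceInfo> &devices)
{
  int fallback = -1;
  for (const DeviceInfo &info : devices) {
    if (!info.supports_denoising) {
      continue;
    }
    if (info.is_display_device) {
      return info.id;
    }
    if (fallback == -1) {
      fallback = info.id;
    }
  }
  return fallback;
}
```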
Brecht Van Lommel
523bbf7065 Cycles: generalize shader sorting / locality heuristic to all GPU devices
This was added for Metal, but also gives good results with CUDA and OptiX.
Also enable it for future Apple GPUs instead of only M1 and M2, since this has
been shown to help across multiple GPUs, so enabling rather than disabling it
seems the better bet.

Also moves some of the logic outside of the Metal device code, and always
enables the code in the kernel since other devices don't do dynamic compile.

Time per sample with OptiX + RTX A6000:
                                         new                  old
barbershop_interior                      0.0730s              0.0727s
bmw27                                    0.0047s              0.0053s
classroom                                0.0428s              0.0464s
fishy_cat                                0.0102s              0.0108s
junkshop                                 0.0366s              0.0395s
koro                                     0.0567s              0.0578s
monster                                  0.0206s              0.0223s
pabellon                                 0.0158s              0.0174s
sponza                                   0.0088s              0.0100s
spring                                   0.1267s              0.1280s
victor                                   0.0524s              0.0531s
wdas_cloud                               0.0817s              0.0816s

Ref D15331, T87836
2022-07-15 13:42:47 +02:00
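The table in 523bbf7065 above reports time per sample, so lower is better. As a worked check on those figures, the relative reduction is (old - new) / old. The short program below computes it for a few rows copied from the table. The values are from the commit message; nothing has been re-measured.

```cpp
#include <cassert>
#include <string>
#include <vector>

struct Row {
  std::string scene;
  double new_time;  // seconds per sample, with the heuristic
  double old_time;  // seconds per sample, without it
};

// Fraction of time per sample saved by the change; negative means slower.
double time_reduction(const Row &row)
{
  return (row.old_time - row.new_time) / row.old_time;
}

// A few rows from the OptiX + RTX A6000 table.
const std::vector<Row> rows = {
    {"barbershop_interior", 0.0730, 0.0727},
    {"bmw27", 0.0047, 0.0053},
    {"classroom", 0.0428, 0.0464},
    {"sponza", 0.0088, 0.0100},
};
```

This gives about 11.3% less time per sample for bmw27, 12% for sponza and 7.8% for classroom. barbershop_interior is about 0.4% slower, within what looks like measurement noise.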
Michael Jones
4b1d315017 Cycles: Improve cache usage on Apple GPUs by chunking active indices
This patch partitions the active indices into chunks prior to sorting by material in order to trade off some material coherence for better locality. On Apple Silicon GPUs (particularly higher-end M1-family GPUs), we observe overall render time speedups of up to 15%. The partitioning is implemented by repeating the range of `shader_sort_key` for each partition, and encoding a "locator" key which distributes the indices into sorted chunks.

Reviewed By: brecht

Differential Revision: https://developer.blender.org/D15331
2022-07-14 14:26:18 +01:00
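4b1d315017 describes repeating the range of `shader_sort_key` once per partition. One way to picture that encoding follows. It assumes a fixed partition size and a key of the form `partition * num_shaders + shader`. The function name is hypothetical, and the actual "locator" encoding may differ.

```cpp
#include <cassert>

// Hypothetical key: partitions are ordered first, shaders second, so sorting by
// this key groups paths by material only within each chunk of partition_size paths.
int partitioned_sort_key(int path_index, int shader, int num_shaders, int partition_size)
{
  assert(shader >= 0 && shader < num_shaders && partition_size > 0);
  const int partition = path_index / partition_size;
  return partition * num_shaders + shader;
}
```

Sorting by such a key means the same shader appears once per chunk instead of once overall. That loses some material coherence but keeps nearby paths close in memory, which is the trade-off the commit describes.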
Xavier Hallade
a02992f131 Cycles: Add support for rendering on Intel GPUs using oneAPI
This patch adds a new Cycles device with similar functionality to the
existing GPU devices.  Kernel compilation and runtime interaction happen
via oneAPI DPC++ compiler and SYCL API.

This implementation primarily focuses on Intel® Arc™ GPUs and other
future Intel GPUs.  The first supported drivers are 101.1660 on Windows
and 22.10.22597 on Linux.

The necessary tools for compilation are:
- A SYCL compiler such as oneAPI DPC++ compiler or
  https://github.com/intel/llvm
- Intel® oneAPI Level Zero which is used for low level device queries:
  https://github.com/oneapi-src/level-zero
- To optionally generate prebuilt graphics binaries: Intel® Graphics
  Compiler.

All are included in the Linux precompiled libraries on svn
(https://svn.blender.org/svnroot/bf-blender/trunk/lib). The same goes for the
Windows precompiled binaries, except for the graphics compiler, which is
available as "Intel® Graphics Offline Compiler for OpenCL™ Code" from
https://www.intel.com/content/www/us/en/developer/articles/tool/oneapi-standalone-components.html,
and whose path can be set as OCLOC_INSTALL_DIR.

Being based on the open SYCL standard, this implementation could also be
extended to run on other compatible non-Intel hardware in the future.

Reviewed By: sergey, brecht

Differential Revision: https://developer.blender.org/D15254

Co-authored-by: Nikita Sirgienko <nikita.sirgienko@intel.com>
Co-authored-by: Stefan Werner <stefan.werner@intel.com>
2022-06-29 12:58:04 +02:00
Brecht Van Lommel
ff1883307f Cleanup: renaming and consistency for kernel data
* Rename "texture" to "data array". This has not used textures for a long time,
  there are just global memory arrays now. (On old CUDA GPUs there was a cache
  for textures but not global memory, so we used to put all data in textures.)
* For CUDA and HIP, put globals in KernelParams struct like other devices.
* Drop the __ prefix for data array names; there is no possibility of a naming
  conflict now that these are in a struct.
2022-06-20 12:30:48 +02:00
Brecht Van Lommel
2c1bffa286 Cleanup: add verbose logging category names instead of numbers
And use them more consistently than before.
2022-06-17 14:08:14 +02:00
Brecht Van Lommel
f2cd7e08fe Fix Cycles MNEE not working for Metal
Move MNEE to own kernel, separate from shader ray-tracing. This does introduce
the limitation that a shader can't use both MNEE and AO/bevel, but that seems
like the better trade-off for now.

We can experiment with bigger kernel organization changes later.

Differential Revision: https://developer.blender.org/D15070
2022-05-31 17:24:43 +02:00