73 Commits

Author SHA1 Message Date
Weizhen Huang
2b0a1cae06 Cycles: Add an option to use ray marching for volume rendering
Null Scattering currently has performance and noise issues, and it will
take time to address them. For now add the previous Ray Marching back as
an option.

Co-authored-by: Brecht Van Lommel <brecht@blender.org>
Pull Request: https://projects.blender.org/blender/blender/pulls/146317
2025-09-26 12:14:45 +02:00
Brecht Van Lommel
2615cecf10 Refactor: Cycles: Align log levels with CLOG
WORK -> DEBUG
DEBUG, STATS -> TRACE
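The renaming is a fixed mapping from the old Cycles levels to the CLOG-aligned ones. The following is a minimal sketch of that mapping. The enum and function names are hypothetical and are not the actual Cycles logging types.

```cpp
// Hypothetical enums for illustration; not the actual Cycles logging types.
enum class OldLogLevel { Work, Debug, Stats };
enum class NewLogLevel { Debug, Trace };

// WORK -> DEBUG; DEBUG and STATS both collapse into TRACE.
constexpr NewLogLevel align_with_clog(OldLogLevel level)
{
  switch (level) {
    case OldLogLevel::Work:
      return NewLogLevel::Debug;
    case OldLogLevel::Debug:
    case OldLogLevel::Stats:
      return NewLogLevel::Trace;
  }
  return NewLogLevel::Trace;  // unreachable for valid enum values
}
```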

Pull Request: https://projects.blender.org/blender/blender/pulls/144490
2025-08-18 20:22:44 +02:00
Weizhen Huang
a4f8e0bfa2 Cycles: Use RGBE for denoised guiding buffers to reduce memory usage
Co-authored-by: Brecht Van Lommel <brecht@blender.org>
2025-08-13 10:28:50 +02:00
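RGBE stores three colour channels as 8-bit mantissas that share one 8-bit exponent. That is 4 bytes per pixel instead of 12 for three floats, which is where the memory saving comes from. The following is a minimal sketch of the classic Ward-style RGBE encoding. The function names are hypothetical. Whether Cycles uses exactly this variant (rounding, exponent bias, clamping of negatives) is an assumption.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cmath>
#include <cstdint>

// Classic shared-exponent RGBE encoding (Greg Ward's Radiance format).
// Negative inputs are clamped to zero.
std::array<uint8_t, 4> rgbe_encode(float r, float g, float b)
{
  r = std::max(r, 0.0f);
  g = std::max(g, 0.0f);
  b = std::max(b, 0.0f);
  const float v = std::max({r, g, b});
  if (v < 1e-32f) {
    return {0, 0, 0, 0};
  }
  int e;
  // frexp gives v = mantissa * 2^e with mantissa in [0.5, 1).
  const float scale = std::frexp(v, &e) * 256.0f / v;
  // The biased exponent must fit in one byte.
  assert(e + 128 <= 255);
  auto quantize = [scale](float c) {
    return static_cast<uint8_t>(std::min(c * scale, 255.0f));
  };
  return {quantize(r), quantize(g), quantize(b), static_cast<uint8_t>(e + 128)};
}

std::array<float, 3> rgbe_decode(const std::array<uint8_t, 4> &rgbe)
{
  if (rgbe[3] == 0) {
    return {0.0f, 0.0f, 0.0f};
  }
  // 2^(e - 128 - 8): undo the exponent bias and the 8-bit mantissa scale.
  const float f = std::ldexp(1.0f, int(rgbe[3]) - (128 + 8));
  return {rgbe[0] * f, rgbe[1] * f, rgbe[2] * f};
}
```

The encoding is lossy: the channel with the largest value keeps about 8 bits of precision, and much smaller channels in the same pixel lose more.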
Weizhen Huang
5cb6014efd Cycles: Volume Scattering Probability Guiding
Guide the probability to scatter in or transmit through the volume.
Only applied for primary rays.

Co-authored-by: Brecht Van Lommel <brecht@blender.org>
2025-08-13 10:28:50 +02:00
Hugh Delaney
930a942dd0 Refactor: Cycles: Move block sizes into common header
This change puts all the block size macros in the same common header, so
they can be included in host side code without needing to also include
the kernels that are defined in the device headers that contained these
values.

This change also removes a magic number used to enqueue a kernel, which
happened to agree with the GPU_PARALLEL_SORT_BLOCK_SIZE macro.
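Defining the block size once lets host code derive launch sizes from the same constant the kernel is compiled with. The following is a sketch under assumptions. The value 1024 is a placeholder and is not necessarily what Cycles defines. The helper names are hypothetical.

```cpp
#include <cassert>

// Stand-in for the shared block-size header; 1024 is a placeholder value.
#define GPU_PARALLEL_SORT_BLOCK_SIZE 1024

// Number of blocks needed to cover `work_size` items (ceiling division).
constexpr int num_blocks_for(int work_size, int block_size)
{
  return (work_size + block_size - 1) / block_size;
}

// Host side: use the macro instead of repeating a literal that merely
// happens to match it.
int sort_kernel_num_blocks(int num_items)
{
  assert(num_items >= 0);
  return num_blocks_for(num_items, GPU_PARALLEL_SORT_BLOCK_SIZE);
}
```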

Pull Request: https://projects.blender.org/blender/blender/pulls/143646
2025-08-01 13:26:02 +02:00
Brecht Van Lommel
73fe848e07 Fix: Cycles log levels conflict with macros on some platforms
In particular DEBUG, but prefix all of them to be sure.

Pull Request: https://projects.blender.org/blender/blender/pulls/141749
2025-07-10 19:44:14 +02:00
Brecht Van Lommel
fb4e3c8167 Refactor: Cycles: Remove distinction between severity and verbosity
Only use LOG() and LOG_IS_ON() macros, no more VLOG_.

Pull Request: https://projects.blender.org/blender/blender/pulls/140244
2025-07-09 20:59:24 +02:00
Brecht Van Lommel
cf7f276d49 Refactor: Cycles: Tweak logging to prepare for dropping glog
* Implement own simple ScopedMockLog
* Always use names instead of numbers
* Avoid logging in header files

Pull Request: https://projects.blender.org/blender/blender/pulls/140244
2025-07-09 20:59:24 +02:00
Xavier Hallade
2df163a648 Fix: Cycles low performance with scenes with many shaders on Arc B570
The performance of the sorted_paths_array kernel on B570 is problematic.
Relying on local sorting and partitioning instead gives a 25% overall rendering
speedup and no regression in shade_surface when rendering the Agent 327 Barbershop scene.
On Arc A770, it still gives a 2% speedup when rendering Barbershop.

Pull Request: https://projects.blender.org/blender/blender/pulls/140308
2025-06-18 08:21:19 +02:00
Brecht Van Lommel
afad355060 Fix: Properly free Vulkan interop handle for Cycles
Unlike OpenGL and Metal, this handle is not shared, but rather Cycles
has to take ownership of it. This required a fair amount of refactoring
to ensure the handle is closed, ownership is properly transferred, and
the handle is recreated once when the pixel buffer is modified.
2025-05-26 10:59:49 +02:00
Brecht Van Lommel
4d7bd22beb Refactor: Cycles: Graphics interop changes
* Add GraphicsInteropDevice to check if interop is possible with device
* Rename GraphicsInterop to GraphicsInteropBuffer
* Include display device type and memory size in GraphicsInteropBuffer
* Unnest graphics interop class to make forward declarations possible

Pull Request: https://projects.blender.org/blender/blender/pulls/137363
2025-04-28 11:38:56 +02:00
Brecht Van Lommel
96eef81d90 Fix: Cycles assert cancelling render during kernel compilation 2025-01-13 10:07:37 +01:00
Brecht Van Lommel
9971648783 Refactor: Cycles: Replace new/delete by unique_ptr, in simple cases
Pull Request: https://projects.blender.org/blender/blender/pulls/132361
2025-01-03 10:23:30 +01:00
Brecht Van Lommel
57ff24cb99 Refactor: Cycles: Add const keyword to more function parameters
Pull Request: https://projects.blender.org/blender/blender/pulls/132361
2025-01-03 10:23:24 +01:00
Brecht Van Lommel
dd51c8660b Refactor: Cycles: Add const keyword where possible, using clang-tidy
Check was misc-const-correctness, combined with readability-isolate-declaration
as suggested by the docs.

Temporarily clang-format "QualifierAlignment: Left" was used to get consistency
with the prevailing order of keywords.

Pull Request: https://projects.blender.org/blender/blender/pulls/132361
2025-01-03 10:23:20 +01:00
Brecht Van Lommel
d0c2e68e5f Refactor: Cycles: Automated clang-tidy fixups in Cycles
* Use .empty() and .data()
* Use nullptr instead of 0
* No else after return
* Simple class member initialization
* Add override for virtual methods
* Include C++ instead of C headers
* Remove some unused includes
* Use default constructors
* Always use braces
* Consistent names in definition and declaration
* Change typedef to using

Pull Request: https://projects.blender.org/blender/blender/pulls/132361
2025-01-03 10:22:55 +01:00
Sergey Sharybin
d36c2e0fdc Merge branch 'blender-v4.3-release' 2024-10-31 14:48:40 +01:00
Sergey Sharybin
e5a4beb518 Fix #129476: Dual GPU - Bake to Color Attribute Crashes Blender
Various changes to avoid division by zero.

Also avoid invoking kernels with zero work size: CUDA reports an error
in such cases.

Pull Request: https://projects.blender.org/blender/blender/pulls/129633
2024-10-31 14:47:59 +01:00
William Leeson
9ebdd49f39 Fix: Only compact if index is a ratio of the number of paths
Currently the number of shadow paths is multiplied by a ratio of
0.5f, which halves the number of paths. However, the index can
never be smaller than the number of paths, so the shadow paths will
always be compacted.
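The following is a minimal sketch of a ratio-based compaction test. It assumes compaction should run only when the active shadow paths would fit into half of the index range currently in use. The function and parameter names are hypothetical, and this is not necessarily the exact Cycles condition.

```cpp
// Hypothetical compaction test. The ratio scales the index range in use,
// not the active count: since next_path_index >= num_active_paths always
// holds, comparing the index against num_active_paths * ratio would hold
// whenever any path is active.
constexpr bool should_compact_shadow_paths(int num_active_paths,
                                           int next_path_index,
                                           float compact_ratio = 0.5f)
{
  return next_path_index > 0 && num_active_paths < next_path_index * compact_ratio;
}
```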

Pull Request: https://projects.blender.org/blender/blender/pulls/125048
2024-10-29 14:50:16 +01:00
Michael Jones
5be30b7d2b Cycles: "Struct-of-array-of-packed-structs" for parts of the integrator state
On an M3 MacBook Pro, this change increases the benchmark score by 8% (with classroom seeing a path-tracing speedup of 15%).

The integrator state is currently stored using struct-of-arrays, with one array per field. Such fine-grained separation can result in poor GPU cache utilisation in cases where multiple fields of the same parent struct are accessed together. This PR changes the layout of the `ray`, `isect`, `subsurface`, and `shadow_ray` structs so that the data is interleaved (per parent struct) instead of separate. To keep this change localised, I encapsulated the layout change by extending the integrator state access macros. However, maybe we want to do this more explicitly (e.g. by updating every bit of code that accesses these parts of the state)? Feedback welcome.
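The difference between the two layouts comes down to where the fields of one path end up in memory. The following sketch compares byte offsets for a hypothetical 8-float ray struct. The field set and names are illustrative, not the real Cycles state layout.

```cpp
#include <cstddef>

// Hypothetical subset of per-path ray state.
struct PackedRay {
  float P[3];  // origin
  float D[3];  // direction
  float tmin;
  float tmax;
};
constexpr std::size_t kRayFloats = 8;
static_assert(sizeof(PackedRay) == kRayFloats * sizeof(float), "no padding expected");

// Struct-of-arrays: float field `f` of path `i` lives in its own array, so
// consecutive fields of one path are num_paths * sizeof(float) bytes apart.
constexpr std::size_t soa_byte_offset(std::size_t f, std::size_t i, std::size_t num_paths)
{
  return (f * num_paths + i) * sizeof(float);
}

// Struct-of-array-of-packed-structs: all fields of path `i` are adjacent,
// so reading a whole ray touches one small contiguous region.
constexpr std::size_t packed_byte_offset(std::size_t f, std::size_t i)
{
  return i * sizeof(PackedRay) + f * sizeof(float);
}
```

With 1024 paths, the first and last fields of one ray are 28 bytes apart in the packed layout but 28672 bytes apart in the struct-of-arrays layout. That is why reading several fields together can use far fewer cache lines.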

Pull Request: https://projects.blender.org/blender/blender/pulls/122015
2024-06-04 14:53:30 +02:00
Michael Jones
e82d69daa1 Cycles: Disambiguate shadow integrator state buffer names
This patch adds a "shadow" prefix & array index suffixes to the shadow integrator state buffer names. This eliminates confusion when looking at GPU traces etc.

Pull Request: https://projects.blender.org/blender/blender/pulls/121745
2024-05-15 23:19:24 +02:00
Brecht Van Lommel
d377ef2543 Clang Format: bump to version 17
Along with the 4.1 libraries upgrade, we are bumping the clang-format
version from 8-12 to 17. This affects quite a few files.

If not already the case, you may consider pointing your IDE to the
clang-format binary bundled with the Blender precompiled libraries.
2024-01-03 13:38:14 +01:00
Brecht Van Lommel
d015e98ee6 Fix Cycles ASAN error with boolean kernel arguments 2023-12-12 13:27:36 +01:00
Brecht Van Lommel
49c3dc9d7f Fix 114336: Cycles crash switching render pass in the viewport 2023-11-02 17:24:54 +01:00
Campbell Barton
c12994612b License headers: use SPDX-FileCopyrightText in intern/cycles 2023-06-14 16:53:23 +10:00
Sergey Sharybin
ba3f26fac5 Cycles: light and shadow linking
With light linking, lights can be set to affect only specific objects in the
scene. Shadow linking additionally gives control over which objects act as
shadow blockers for a light.

Usage:
https://wiki.blender.org/wiki/Reference/Release_Notes/4.0/Cycles

Implementation:
https://wiki.blender.org/wiki/Source/Render/Cycles/LightLinking

Ref #104972
Co-authored-by: Brecht Van Lommel <brecht@blender.org>
2023-05-24 14:11:47 +02:00
Nikita Sirgienko
04fc6fd8a7 Cycles: avoid doing zero-sized allocations with partitioned shader sorting 2023-05-17 11:07:56 +02:00
Campbell Barton
6859bb6e67 Cleanup: format (with BraceWrapping::AfterControlStatement "MultiLine") 2023-05-02 09:37:49 +10:00
Sergey Sharybin
daaed83a32 Fix set but unused variables in Cycles 2023-04-19 10:02:09 +02:00
Xavier Hallade
9821a2d397 Cycles: pass kernel features to get_bvh_layout_mask
This allows selectively disabling Hardware Raytracing in the oneAPI
backend, depending on the features used.
2023-04-18 22:09:42 +02:00
Michael Jones
654e1e901b Cycles: Use local atomics for faster shader sorting (enabled on Metal)
This patch adds two new kernels: SORT_BUCKET_PASS and SORT_WRITE_PASS. These replace PREFIX_SUM and SORTED_PATHS_ARRAY on supported devices (currently implemented on Metal, but will be trivial to enable on the other backends). The new kernels exploit sort partitioning (see D15331) by sorting each partition separately using local atomics. This can give an overall render speedup of 2-3% depending on architecture. As before, we fall back to the original non-partitioned sorting when the shader count is "too high".

Reviewed By: brecht

Differential Revision: https://developer.blender.org/D16909
2023-02-06 11:18:26 +00:00
Campbell Barton
afc091c3c4 Cleanup: spelling in comments 2022-11-01 12:24:58 +11:00
Michael Jones
8dd7b5b26b Cycles: Metal integrator state size tuning
This patch tunes the integrator state sizing for Metal (`num_concurrent_states` and `num_concurrent_busy_states`).

On all GPU architectures, we adjust the busy:total states ratio to 1:4, which gives better rendering performance than the previous 1:16 ratio (independent of total state count). This gives a small performance uplift (e.g. 2-3% on M1 Ultra).

Additionally for M2 architectures, we double the overall state size if there is available headroom. Inclusive of the first change, we can expect uplift of close to 10% in future, as this results in larger dispatch sizes and minimises work submission overheads. In order to make an accurate determination of available headroom, we defer the calculation of `num_concurrent_states` and `num_concurrent_busy_states` until the time of integrator state allocation (i.e. after all of the scene data has been allocated). We also refactor `alloc_integrator_soa` to calculate an *exact* single-state-size in a first pass, right before allocating the integrator SoA buffers in a second pass.
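The sizing rule described above can be sketched as a small helper. The following sketch makes two assumptions: the caller already knows the exact per-state size and the free device memory left after scene data is allocated, and `allow_doubling` stands in for the M2 check. All names are hypothetical.

```cpp
#include <cstddef>

struct StateSizing {
  std::size_t num_concurrent_states;
  std::size_t num_concurrent_busy_states;
};

// Hypothetical sizing helper: double the state count only if it fits in the
// remaining memory, then keep busy:total at 1:4.
constexpr StateSizing size_integrator_states(std::size_t default_states,
                                             bool allow_doubling,
                                             std::size_t single_state_size,
                                             std::size_t free_memory)
{
  std::size_t states = default_states;
  if (allow_doubling && 2 * default_states * single_state_size <= free_memory) {
    states *= 2;
  }
  return {states, states / 4};
}
```

Deferring this until allocation time matters because `free_memory` is only meaningful once all scene data has been allocated.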

Reviewed By: brecht

Differential Revision: https://developer.blender.org/D16313
2022-10-24 17:14:33 +01:00
Brecht Van Lommel
6a4f4810f3 Fix T100246: Cycles GPU render error when adding AO node during viewport render 2022-08-18 20:04:22 +02:00
Brecht Van Lommel
523bbf7065 Cycles: generalize shader sorting / locality heuristic to all GPU devices
This was added for Metal, but also gives good results with CUDA and OptiX.
Also enable it for future Apple GPUs instead of only M1 and M2: since it has
been shown to help across multiple GPUs, enabling it seems the better bet.

Also moves some of the logic outside of the Metal device code, and always
enables the code in the kernel since other devices don't do dynamic compilation.

Time per sample with OptiX + RTX A6000:
                                         new                  old
barbershop_interior                      0.0730s              0.0727s
bmw27                                    0.0047s              0.0053s
classroom                                0.0428s              0.0464s
fishy_cat                                0.0102s              0.0108s
junkshop                                 0.0366s              0.0395s
koro                                     0.0567s              0.0578s
monster                                  0.0206s              0.0223s
pabellon                                 0.0158s              0.0174s
sponza                                   0.0088s              0.0100s
spring                                   0.1267s              0.1280s
victor                                   0.0524s              0.0531s
wdas_cloud                               0.0817s              0.0816s

Ref D15331, T87836
2022-07-15 13:42:47 +02:00
Michael Jones
4b1d315017 Cycles: Improve cache usage on Apple GPUs by chunking active indices
This patch partitions the active indices into chunks prior to sorting by material, in order to trade off some material coherence for better locality. On Apple Silicon GPUs (particularly higher end M1-family GPUs), we observe overall render time speedups of up to 15%. The partitioning is implemented by repeating the range of `shader_sort_key` for each partition, and encoding a "locator" key which distributes the indices into sorted chunks.
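The locator-key idea can be sketched on the CPU with a counting sort. This sketch is a simplification: it assumes every path is active, and it uses a plain serial counting sort rather than the GPU kernels. The function name is hypothetical. The shader key range is repeated once per partition of `partition_size` consecutive path indices, so paths are grouped by shader within each chunk instead of globally.

```cpp
#include <cassert>
#include <vector>

std::vector<int> sort_paths_partitioned(const std::vector<int> &shader_of_path,
                                        int num_shaders,
                                        int partition_size)
{
  assert(num_shaders > 0 && partition_size > 0);
  const int num_paths = static_cast<int>(shader_of_path.size());
  const int num_partitions = (num_paths + partition_size - 1) / partition_size;
  const int num_keys = num_partitions * num_shaders;

  // "Locator" key: the partition selects the chunk, the shader selects the bucket.
  auto key_of = [&](int path) {
    assert(shader_of_path[path] >= 0 && shader_of_path[path] < num_shaders);
    return (path / partition_size) * num_shaders + shader_of_path[path];
  };

  // Count paths per key, then prefix-sum into bucket start offsets.
  std::vector<int> offset(num_keys + 1, 0);
  for (int path = 0; path < num_paths; ++path) {
    ++offset[key_of(path) + 1];
  }
  for (int k = 0; k < num_keys; ++k) {
    offset[k + 1] += offset[k];
  }

  // Scatter path indices into their buckets (stable within each bucket).
  std::vector<int> sorted(num_paths);
  for (int path = 0; path < num_paths; ++path) {
    sorted[offset[key_of(path)]++] = path;
  }
  return sorted;
}
```

With shaders `{1, 0, 1, 0, 1, 0, 1, 0}` and a partition size of 4, the result is `{1, 3, 0, 2, 5, 7, 4, 6}`. With a single partition of 8, it becomes the global order `{1, 3, 5, 7, 0, 2, 4, 6}`. The partitioned order keeps nearby path indices together at the cost of more shader switches.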

Reviewed By: brecht

Differential Revision: https://developer.blender.org/D15331
2022-07-14 14:26:18 +01:00
Brecht Van Lommel
ff1883307f Cleanup: renaming and consistency for kernel data
* Rename "texture" to "data array". This has not used textures for a long time,
  there are just global memory arrays now. (On old CUDA GPUs there was a cache
  for textures but not global memory, so we used to put all data in textures.)
* For CUDA and HIP, put globals in KernelParams struct like other devices.
* Drop __ prefix for data array names, no possibility for naming conflict now that
  these are in a struct.
2022-06-20 12:30:48 +02:00
Brecht Van Lommel
2c1bffa286 Cleanup: add verbose logging category names instead of numbers
And use them more consistently than before.
2022-06-17 14:08:14 +02:00
Brecht Van Lommel
f2cd7e08fe Fix Cycles MNEE not working for Metal
Move MNEE to own kernel, separate from shader ray-tracing. This does introduce
the limitation that a shader can't use both MNEE and AO/bevel, but that seems
like the better trade-off for now.

We can experiment with bigger kernel organization changes later.

Differential Revision: https://developer.blender.org/D15070
2022-05-31 17:24:43 +02:00
Brecht Van Lommel
9cfc7967dd Cycles: use SPDX license headers
* Replace license text in headers with SPDX identifiers.
* Remove specific license info from outdated readme.txt, instead leave details
  to the source files.
* Add list of SPDX license identifiers used, and corresponding license texts.
* Update copyright dates while we're at it.

Ref D14069, T95597
2022-02-11 17:47:34 +01:00
Brecht Van Lommel
ae28d90578 Fix T93350: Cycles renders shows black during rendering huge resolutions
The root of the issue is Cycles ignoring the OpenGL limit on the
maximum resolution of textures: Cycles was allocating a texture at the
final render resolution, which exceeded the limit on certain GPUs and
drivers.

The idea is simple: use multiple textures for the display, each of which
will fit into OpenGL limitations.

There is some code which allows the display driver to know when to start
a new tile. Also added some code to allow forcing graphics interop to be
re-created. The latter ended up not being used in the final version of the
patch, but it might be helpful for other drivers' implementations.

The tile size is limited to 8K now as it is the safest size for textures
on many GPUs and OpenGL drivers.
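Splitting the display into textures that each respect the size limit is a simple grid subdivision. The following is a minimal sketch. The names are hypothetical, and the 8192 default mirrors the 8K limit mentioned above.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

struct Tile {
  int x, y, width, height;
};

// Cover a width x height image with tiles no larger than max_tile_size in
// either dimension; edge tiles are smaller when the size does not divide evenly.
std::vector<Tile> split_into_display_tiles(int width, int height, int max_tile_size = 8192)
{
  assert(max_tile_size > 0);
  std::vector<Tile> tiles;
  for (int y = 0; y < height; y += max_tile_size) {
    for (int x = 0; x < width; x += max_tile_size) {
      tiles.push_back({x, y, std::min(max_tile_size, width - x), std::min(max_tile_size, height - y)});
    }
  }
  return tiles;
}
```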

This is an updated fix with a workaround for freezing with the NVIDIA
driver on Linux.

Differential Revision: https://developer.blender.org/D13385
2022-01-07 17:20:04 +01:00
Brecht Van Lommel
204ae33d75 Revert "Fix T93350: Cycles renders shows black during rendering huge resolutions"
This reverts commit 5e37f70307.

It is leading to freezing of the entire desktop for a few seconds when stopping
3D viewport rendering on my Linux / NVIDIA system.
2021-12-07 20:49:34 +01:00
Sergey Sharybin
5e37f70307 Fix T93350: Cycles renders shows black during rendering huge resolutions
The root of the issue is Cycles ignoring the OpenGL limit on the
maximum resolution of textures: Cycles was allocating a texture at the
final render resolution, which exceeded the limit on certain GPUs and
drivers.

The idea is simple: use multiple textures for the display, each of which
will fit into OpenGL limitations.

There is some code which allows the display driver to know when to start
a new tile. Also added some code to allow forcing graphics interop to be
re-created. The latter ended up not being used in the final version of the
patch, but it might be helpful for other drivers' implementations.

The tile size is limited to 8K now as it is the safest size for textures
on many GPUs and OpenGL drivers.

Differential Revision: https://developer.blender.org/D13385
2021-12-07 19:01:42 +01:00
Brecht Van Lommel
2e6f914e37 Fix debug build error after recent Cycles kernel argument changes 2021-11-29 18:41:37 +01:00
Michael Jones
98a5c924fc Cycles: Metal readiness: Specify DeviceQueue::enqueue arg types
This patch adds new arg-type parameters to `DeviceQueue::enqueue` and its overrides. This is in preparation for the Metal backend which needs this information for correct argument encoding.

Ref T92212

Reviewed By: brecht

Maniphest Tasks: T92212

Differential Revision: https://developer.blender.org/D13357
2021-11-29 14:56:06 +00:00
William Leeson
c49d2cbe92 Merge branch 'blender-v3.0-release' to bring in D13042:
Fix performance decrease with Scrambling Distance on
2021-11-25 09:41:03 +01:00
Alaska
b41c72b710 Fix performance decrease with Scrambling Distance on
With the current code in master, devices without hardware-accelerated ray tracing see a measurable performance decrease with scrambling distance enabled, compared to having it disabled. From testing, this performance decrease comes from the large tile sizes scheduled in `tile.cpp`.

This patch attempts to address the performance decrease by using different algorithms to calculate the tile size for devices with and without hardware-accelerated ray traversal: large tile sizes for hardware-accelerated devices and small tile sizes for others.

Most of this code is based on proposals from @brecht and @leesonw

Reviewed By: brecht, leesonw

Differential Revision: https://developer.blender.org/D13042
2021-11-25 09:32:26 +01:00
Sergey Sharybin
ce395c84a3 Merge branch 'blender-v3.0-release' 2021-11-11 15:29:35 +01:00
Sergey Sharybin
d26d3cfe19 Fix T92868: Cycles catcher with transparency crashes
The issue was caused by splitting happening twice.

Fixed by checking for the split flag, which is assigned to both states
during the split.

The tricky part was writing the catcher data at the moment of the split: the
transparency and shadow catcher sample count has to be accumulated at
that point. This now happens in the `intersect_closest` kernel.
The downside is that the render buffer has to be passed to the kernel, but
the benefit is that the extra split bounce check is no longer needed.

Had to move the passes write to the shadow catcher header, since including
`film/passes.h` would require the BSDF data structures to be
available.

Differential Revision: https://developer.blender.org/D13177
2021-11-11 15:21:35 +01:00
Andrii
c63e735f6b Cycles: Add sample offset option
This patch exposes the sampling offset option to Blender. It is located in the "Sampling > Advanced" panel.
For example, this can be useful to parallelize rendering by distributing different chunks of samples to each computer.
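Distributing samples across machines means giving each one a contiguous, non-overlapping range: a sample offset plus a sample count. The following is a minimal sketch of computing those ranges. The helper is hypothetical and not part of Blender. Merging the resulting images (e.g. averaging weighted by each machine's sample count) is outside this sketch.

```cpp
#include <cassert>
#include <vector>

struct SampleRange {
  int offset;  // value for the sample offset setting on this machine
  int count;   // number of samples this machine renders
};

// Split total_samples into num_machines contiguous ranges whose sizes
// differ by at most one sample.
std::vector<SampleRange> split_sample_range(int total_samples, int num_machines)
{
  assert(total_samples >= 0 && num_machines > 0);
  const int base = total_samples / num_machines;
  const int remainder = total_samples % num_machines;
  std::vector<SampleRange> ranges;
  int offset = 0;
  for (int i = 0; i < num_machines; ++i) {
    const int count = base + (i < remainder ? 1 : 0);
    ranges.push_back({offset, count});
    offset += count;
  }
  return ranges;
}
```

For 100 samples on 3 machines, this gives offsets 0, 34 and 67 with counts 34, 33 and 33.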

---

I also had to add this option to `RenderWork` and `RenderScheduler` classes so that the sample count in the status string can be calculated correctly.

Reviewed By: leesonw

Differential Revision: https://developer.blender.org/D13086
2021-11-11 09:39:25 +01:00