Commit Graph

41 Commits

Author SHA1 Message Date
Brecht Van Lommel
4d7bd22beb Refactor: Cycles: Graphics interop changes
* Add GraphicsInteropDevice to check if interop is possible with device
* Rename GraphcisInterop to GraphicsInteropBuffer
* Include display device type and memory size in GraphicsInteropBuffer
* Unnest graphics interop class to make forward declarations possible

Pull Request: https://projects.blender.org/blender/blender/pulls/137363
2025-04-28 11:38:56 +02:00
Michael Jones
326d5bca03 Cycles: Support Decomposed MetalRT motion interpolation
Currently MetalRT interpolates transformation matrix on per-element basis
which leads to issues like #135659.

This change adds implementation of for decomposed (Scale/Rotate/Translate)
motion interpolation, matching behavior of BVH2 and other HW-RT.

This requires macOS 15 and Xcode 16 in order to use this interpolation.
On older platforms and compilers old interpolation is used.

Currently there is no changes on the user (by default) and it is only
available via CYCLES_METALRT_PCMI environment variable. This is because
there are some issues with complex motion paths that need to be looked
into. Having code available makes it easier to do further debugging.

Ref #135659

Authored by Emma Liu

Pull Request: https://projects.blender.org/blender/blender/pulls/136253
2025-04-03 16:24:04 +02:00
Michael Jones
c23c4ae6ba Cycles: Fix issue affecting Metal kernel profiling (normally disabled)
This issue only affects profiling mode (`CYCLES_METAL_PROFILING=1`). There's a modest limit to the number of concurrent counter sampling buffers per device, so instead of creating one per device queue, we create one per device that can be reused by successive device queues.

Authored by Emma Liu.

Pull Request: https://projects.blender.org/blender/blender/pulls/136248
2025-03-21 12:47:15 +01:00
Michael Jones
584f19a5af Cycles: Apple Silicon tidy: Remove non-UMA codepaths (v2)
This PR removes a bunch of dead code following #123551 (removal of AMD and Intel GPU support). It is safe to assume that UMA will be available, so a lot of codepaths that dealt with copying between CPU and GPU are now just clutter.

Pull Request: https://projects.blender.org/blender/blender/pulls/136146
2025-03-19 12:53:01 +01:00
Brecht Van Lommel
ab3204e251 Revert "Cycles: Apple Silicon tidy: Remove non-UMA codepaths"
This reverts commit 1a93dfe4fc.

This is hitting asserts in the tests, revert until it's fixed.

Ref #136117
2025-03-18 20:37:23 +01:00
Michael Jones
1a93dfe4fc Cycles: Apple Silicon tidy: Remove non-UMA codepaths
This PR removes a bunch of dead code following #123551 (removal of AMD and Intel GPU support). It is safe to assume that UMA will be available, so a lot of codepaths that dealt with copying between CPU and GPU are now just clutter.

Pull Request: https://projects.blender.org/blender/blender/pulls/136117
2025-03-18 19:09:25 +01:00
Brecht Van Lommel
2cfe2e0bfe Fix: Cycles: Re-copy memory from host to device without realloc
Should be a bit more efficient, and it fixes host memory fallback bugs,
where host memory was incorrectly freed during re-copy. For the case
where memory should get reallocated on the host, a new mem_move_to_host
was added.

Thanks to Jorn Visser for investigating and finding this problem.

Pull Request: https://projects.blender.org/blender/blender/pulls/132912
2025-01-29 14:11:50 +01:00
Brecht Van Lommel
57ff24cb99 Refactor: Cycles: Add const keyword to more function parameters
Pull Request: https://projects.blender.org/blender/blender/pulls/132361
2025-01-03 10:23:24 +01:00
Brecht Van Lommel
d0c2e68e5f Refactor: Cycles: Automated clang-tidy fixups in Cycles
* Use .empty() and .data()
* Use nullptr instead of 0
* No else after return
* Simple class member initialization
* Add override for virtual methods
* Include C++ instead of C headers
* Remove some unused includes
* Use default constructors
* Always use braces
* Consistent names in definition and declaration
* Change typedef to using

Pull Request: https://projects.blender.org/blender/blender/pulls/132361
2025-01-03 10:22:55 +01:00
Sergey Sharybin
b96a7b7204 Fix #127622: 4.1 splash screen won't render with MetalRT
The commit which made the issue to be more easily discoverable is 4651f8a08f.

The fix is similar to #127114.

Pull Request: https://projects.blender.org/blender/blender/pulls/128173
2024-09-26 13:39:22 +02:00
Weizhen Huang
4a4270d73c Merge branch 'blender-v4.2-release' 2024-07-08 16:19:41 +02:00
Michael Jones
5a29be3c75 Cycles: Fix #116243, #122022 - MetalRT live viewport stability issues
This PR fixes live viewport stability issues on Mac when MetalRT is enabled.

There were two sources of instability:

1) `MTLAccelerationStructure` instances were not being correctly retained meaning that use-after-free crashes could occur following a geometry sync.
2) `MTLIntersectionFunctionTable` objects could be unsafely shared between multiple `MetalDeviceQueue` instances (in this case, `setBuffer` being the unsafe mutation)

The solution to 2 involves creating a new `MetalDispatchPipeline` type which is strictly used by only 1 `MetalDeviceQueue` instance.

Pull Request: https://projects.blender.org/blender/blender/pulls/124055
2024-07-08 16:18:34 +02:00
Alaska
c8340cf754 Cycles: Remove AMD and Intel GPU support from Metal backend
This is because with the addition of new features to Cycles, these GPUs
experienced significant performance regressions and bugs, all stemming
from bugs in the Metal GPU driver/compiler. The only reasonable way to
work around these issues was to disable parts of Cycles code on
these GPUs to avoid the driver/compiler bugs.

This resulted in increased development time maintaining these platforms
while being unable to deliver feature parity with other
GPU backends.

It has been decided that this development time is better spent
maintaining platforms that are still actively maintained by
hardware/software vendors, and so AMD and Intel GPU support will be
removed from the Metal backend for Cycles.

Pull Request: https://projects.blender.org/blender/blender/pulls/123551
2024-06-26 17:16:20 +02:00
Sergey Sharybin
b803d7fabb Fix: Command line Cycles render crash on multi-CUDA device
Since #118841 there are more cases where Cycles would check for the
graphics interop support. This could lead to a crash when graphics
interop functions are called without having active graphics context.

This change makes it so there is no graphics interop calls when doing
headless render. In order to achieve this the device creation is now
aware of the headless mode.

Pull Request: https://projects.blender.org/blender/blender/pulls/122844
2024-06-07 17:53:44 +02:00
Brecht Van Lommel
f57e4c5b98 Fix #119551: Cycles denoising crash canceling tiled render with MetalRT
The BVH has been freed at this point, but the Metal queue sets it on
every invocation. Make sure it's null so it doesn't get used anymore.

Pull Request: https://projects.blender.org/blender/blender/pulls/119581
2024-03-18 11:00:21 +01:00
Stefan Werner
31d55e87f9 Cycles: Metal support for OpenImageDenoise
This is supported on Apple Silicon GPUs and macOS 13.0+.

Co-authored-by: Stefan Werner <stefan.werner@intel.com>
Co-authored-by: Attila Afra <attila.t.afra@intel.com>
Pull Request: https://projects.blender.org/blender/blender/pulls/116124
2024-02-06 21:13:23 +01:00
Brecht Van Lommel
25e74f0115 Fix (harmless) uninitialized variable usage in Cycles Metal device 2023-12-11 14:46:19 +01:00
Michael Jones
af9ffee152 Cycles: Metal: Fix occasional anim corruption (KernelData MD5 refresh bug)
This fixes an issue where animation frames occasionally get corrupted (e.g. when rendering "Pokedstudio" Blender 2.77 splash screen). This happens when the KernelData is refreshed but the MD5 isn't immediately regenerated which can cause the wrong PSO to be selected.

Pull Request: https://projects.blender.org/blender/blender/pulls/114153
2023-10-25 17:47:13 +02:00
Michael Jones
4e3ee4f026 Cycles: Fix animation hangs/crashes in Metal due to leaking temp objects
This PR adds `@autoreleasepool` blocks around functions that have been observed to create hidden temporary NSObjects, and eventually cause command buffer failures. A couple of allocations needed to be tweaked in order to maintain correct retain/release behaviour. This PR also fixes the command buffer error text to show more useful information.
2023-10-24 23:20:16 +01:00
Michael Jones
b8833a7f8c Cycles: Disable NanoVDB if not needed when specialising Metal PSOs
This patch adds a check to see whether we're actually using NanoVDB textures, and if not, removes `#define WITH_NANOVDB` when generating the scene-optimised kernels. This results in marginally faster render times (maybe 2 or 3%) for scenes that do not use NanoVDB. The generic kernels are unaffected, so this will not impact responsiveness on first render.

Pull Request: https://projects.blender.org/blender/blender/pulls/112822
2023-09-25 14:56:58 +02:00
Michael Jones
6c98cb73ac Cycles: Use new MetalRT curve primitives for 3D curves and ribbons
This patch updates the experimental MetalRT code path to use new [curve primitives](https://developer.apple.com/videos/play/wwdc2023/10128/) which were recently added in macOS 14. This replaces the previous custom box intersection implementation, allowing the driver to better optimise curve acceleration structures for the GPU. On existing hardware, this can speed up MetalRT renders by up to 40% for scenes that use hair / curve primitives extensively.

The MetalRT option will only be available on macOS >= 14, and requires Xcode >= 15 to build (otherwise the option will be compiled out).

Authored by Marco Giordano, Michael Jones, and Jason Fielder

---
Before / after render times (M1 Max MacBook Pro, macOS 14 beta, MetalRT enabled):
```
                  Custom box intersection      MetalRT curve primitives       Speedup
fishy_cat           111.5                         80.5                         1.39
koro                114.4                         86.7                         1.32
sinosauropteryx     291.8                        279.2                         1.05
spring              142.3                        142.2                         1.00
victor              442.7                        347.7                         1.27
```

---

Pull Request: https://projects.blender.org/blender/blender/pulls/111795
2023-09-13 16:02:49 +02:00
Campbell Barton
c12994612b License headers: use SPDX-FileCopyrightText in intern/cycles 2023-06-14 16:53:23 +10:00
Michael Jones
25fe9d74c9 Cycles: Fix hang when MSL->AIR compilation fails
Follow up to [#100066](https://projects.blender.org/blender/blender/pulls/105122/files). Scenes that do pre-render work on the GPU (e.g. the bake unit tests) will still hang if there are MSL compile failures. This patch adds a loop break out in case of an error.

Pull Request: https://projects.blender.org/blender/blender/pulls/107657
2023-05-05 18:52:54 +02:00
Xavier Hallade
9821a2d397 Cycles: pass kernel features to get_bvh_layout_mask
This allows to selectively disable Hardware Raytracing in oneAPI
backend, depending on features used.
2023-04-18 22:09:42 +02:00
Chris Blackbourn
86ceb6722f Cleanup: format 2023-02-26 11:55:22 +13:00
Sybren A. Stüvel
c8ed48d5ca Merge remote-tracking branch 'origin/blender-v3.5-release' 2023-02-23 11:28:02 +01:00
Michael Jones
482fb791ce Fix #105100: Metal using wrong kernels in multi-pass renders
This fixes issue [#105100](https://projects.blender.org/blender/blender/issues/105100) where multi-pass renders can be incorrect due to kernels using stale specialisation constants (e.g. when rendering Pokedstudio).

This patch adds a new group of md5 hashes (`global_defines_md5`) to track whether the injected block of #defines is stale and regenerate the source string as appropriate. It also renames the existing group of md5 hashes from `source_md5` to `kernels_md5` to clarify that these refer to a specific kernel set rather than just the source (which might build an arbitrarily large number of kernel sets).

Pull Request #105103
2023-02-23 11:07:28 +01:00
Brecht Van Lommel
02c2970983 Cycles: add NanoVDB support for Metal on Apple Silicon
Contributed by Yulia Kuznetcova at Apple.

NanoVDB is patched to give add address spaces required by Metal. We hope that
in the future Metal will support the generic address space.

For AMD and Intel this is currently not available since it causes a performance
regression also on scenes without volumes.

Pull Request #104837
2023-02-21 15:03:52 +01:00
Michael Jones
2d994de77c Cycles: MetalRT optimisation for subsurface intersection queries
This patch optimises subsurface intersection queries on MetalRT. Currently intersect_local traverses from the scene root, retrospectively discarding all non-local hits. Using a lookup of bottom level acceleration structures, we can explicitly query only the relevant instance. On M1 Max, with MetalRT selected, this can give a render speedup of 15-20% for scenes like Monster which make heavy use of subsurface scattering.

Patch authored by Marco Giordano.

Reviewed By: brecht

Differential Revision: https://developer.blender.org/D17153
2023-02-06 19:12:29 +00:00
Michael Jones
654e1e901b Cycles: Use local atomics for faster shader sorting (enabled on Metal)
This patch adds two new kernels: SORT_BUCKET_PASS and SORT_WRITE_PASS. These replace PREFIX_SUM and SORTED_PATHS_ARRAY on supported devices (currently implemented on Metal, but will be trivial to enable on the other backends). The new kernels exploit sort partitioning (see D15331) by sorting each partition separately using local atomics. This can give an overall render speedup of 2-3% depending on architecture. As before, we fall back to the original non-partitioned sorting when the shader count is "too high".

Reviewed By: brecht

Differential Revision: https://developer.blender.org/D16909
2023-02-06 11:18:26 +00:00
Michael Jones
77c3e67d3d Cycles: Improved render start/stop responsiveness on Metal
All kernel specialisation is now performed in the background regardless of kernel type, meaning that the first render will be visible a few seconds sooner. The only exception is during benchmark warm up, in which case we wait for all kernels to be cached. When stopping a render, we call a new `cancel()` method on the device which causes any outstanding compilation work to be cancelled, and we destroy the device in a detached thread so that any stale queued compilations can be safely purged without blocking the UI for longer than necessary.

Reviewed By: brecht

Differential Revision: https://developer.blender.org/D16371
2023-01-04 16:00:53 +00:00
Michael Jones
2dc51fccb8 Fix T101787, T102786. Cycles: Improved out-of-memory messaging on Metal
This patch adds a new `max_working_set_exceeded()` check on Metal so that we can display a "System is out of GPU memory" message to the user. Without this, we get obtuse "CommandBuffer failed" errors at render time due to exceeding the size limit of resident resources.

Likely fix for T101787 & T102786.

Reviewed By: brecht

Differential Revision: https://developer.blender.org/D16713
2022-12-07 13:56:21 +00:00
Michael Jones
da4ef05e4d Cycles: Apple Silicon optimization to specialize intersection kernels
The Metal backend now compiles and caches a second set of kernels which are
optimized for scene contents, enabled for Apple Silicon.

The implementation supports doing this both for intersection and shading
kernels. However this is currently only enabled for intersection kernels that
are quick to compile, and already give a good speedup. Enabling this for
shading kernels would be faster still, however this also causes a long wait
times and would need a good user interface to control this.

M1 Max samples per minute (macOS 13.0):

                    PSO_GENERIC  PSO_SPECIALIZED_INTERSECT  PSO_SPECIALIZED_SHADE

barbershop_interior       83.4	            89.5                   93.7
bmw27                   1486.1	          1671.0                 1825.8
classroom                175.2	           196.8                  206.3
fishy_cat                674.2	           704.3                  719.3
junkshop                 205.4	           212.0                  257.7
koro                     310.1	           336.1                  342.8
monster                  376.7	           418.6                  424.1
pabellon                 273.5	           325.4                  339.8
sponza                   830.6	           929.6                 1142.4
victor                    86.7              96.4                   96.3
wdas_cloud               111.8	           112.7                  183.1

Code contributed by Jason Fielder, Morteza Mostajabodaveh and Michael Jones

Differential Revision: https://developer.blender.org/D14645
2022-07-15 13:40:04 +02:00
Michael Jones
328a911379 Cycles: Distinguish Apple GPUs by core count
This patch suffixes Apple GPU device names with `(GPU - # cores)` so that variant GPUs with the same chipset can be distinguished. Currently benchmark scores for these M1 family GPUs are being incorrectly merged:

- M1: 7 or 8 cores
- M1 Pro: 14 or 16 cores
- M1 Max: 24 or 32 cores
- M1 Ultra: 48 or 64 cores

Reviewed By: brecht, sergey

Differential Revision: https://developer.blender.org/D15257
2022-06-22 22:32:56 +01:00
Michael Jones
4412e14708 Cycles: Useful Metal backend debug & profiling functionality
This patch adds some useful debugging & profiling env vars to the Metal backend:

- `CYCLES_METAL_PROFILING`: output a per-kernel timing report at the end of the render
- `CYCLES_METAL_DEBUG`: enable per-dispatch tracing (very verbose)
- `CYCLES_DEBUG_METAL_CAPTURE_KERNEL`: enable programatic .gputrace capture for a specified kernel index

Here's an example of the timing report with `CYCLES_METAL_PROFILING` enabled:

```
---------------------------------------------------------------------------------------------------
Kernel name                                 Total threads   Dispatches     Avg. T/D    Time   Time%
---------------------------------------------------------------------------------------------------
integrator_init_from_camera                   657,407,232          161    4,083,274   0.24s   0.51%
integrator_intersect_closest                1,629,288,440          681    2,392,494  15.18s  32.12%
integrator_intersect_shadow                   751,652,291          470    1,599,260   5.80s  12.28%
integrator_shade_background                   304,612,074          263    1,158,220   1.16s   2.45%
integrator_shade_surface                    1,159,764,041          676    1,715,627  20.57s  43.52%
integrator_shade_shadow                       598,885,847          418    1,432,741   1.27s   2.69%
integrator_queued_paths_array               2,969,650,130          805    3,689,006   0.35s   0.74%
integrator_queued_shadow_paths_array          593,936,619          379    1,567,115   0.14s   0.29%
integrator_terminated_paths_array              22,205,417          155      143,260   0.05s   0.10%
integrator_sorted_paths_array               2,517,140,043          676    3,723,579   1.65s   3.50%
integrator_compact_paths_array                648,912,748          155    4,186,533   0.03s   0.07%
integrator_compact_states                      20,872,687          155      134,662   0.14s   0.29%
integrator_terminated_shadow_paths_array      374,100,675          438      854,111   0.16s   0.33%
integrator_compact_shadow_paths_array         503,768,657          438    1,150,156   0.05s   0.10%
integrator_compact_shadow_states               37,664,941          202      186,460   0.23s   0.50%
integrator_reset                               25,165,824            6    4,194,304   0.06s   0.12%
film_convert_combined_half_rgba                 3,110,400            6      518,400   0.00s   0.01%
prefix_sum                                            676          676            1   0.19s   0.40%
---------------------------------------------------------------------------------------------------
                                                                 6,760               47.27s 100.00%
---------------------------------------------------------------------------------------------------
```

Reviewed By: brecht

Differential Revision: https://developer.blender.org/D15044
2022-06-07 11:08:39 +01:00
Michael Jones
007184bcf2 Enable inlining on Apple Silicon. Use new process-wide ShaderCache in order to safely re-enable binary archives
This patch is the same as D14763, but with a fix for unit test failures caused by ShaderCache fetch logic not working in the non-MetalRT case:

```
diff --git a/intern/cycles/device/metal/kernel.mm b/intern/cycles/device/metal/kernel.mm
index ad268ae7057..6aa1a56056e 100644
--- a/intern/cycles/device/metal/kernel.mm
+++ b/intern/cycles/device/metal/kernel.mm
@@ -203,9 +203,12 @@ bool kernel_has_intersection(DeviceKernel device_kernel)

   /* metalrt options */
   request.pipeline->use_metalrt = device->use_metalrt;
-  request.pipeline->metalrt_hair = device->kernel_features & KERNEL_FEATURE_HAIR;
-  request.pipeline->metalrt_hair_thick = device->kernel_features & KERNEL_FEATURE_HAIR_THICK;
-  request.pipeline->metalrt_pointcloud = device->kernel_features & KERNEL_FEATURE_POINTCLOUD;
+  request.pipeline->metalrt_hair = device->use_metalrt &&
+                                   (device->kernel_features & KERNEL_FEATURE_HAIR);
+  request.pipeline->metalrt_hair_thick = device->use_metalrt &&
+                                         (device->kernel_features & KERNEL_FEATURE_HAIR_THICK);
+  request.pipeline->metalrt_pointcloud = device->use_metalrt &&
+                                         (device->kernel_features & KERNEL_FEATURE_POINTCLOUD);

   {
     thread_scoped_lock lock(cache_mutex);
@@ -225,9 +228,9 @@ bool kernel_has_intersection(DeviceKernel device_kernel)

   /* metalrt options */
   bool use_metalrt = device->use_metalrt;
-  bool metalrt_hair = device->kernel_features & KERNEL_FEATURE_HAIR;
-  bool metalrt_hair_thick = device->kernel_features & KERNEL_FEATURE_HAIR_THICK;
-  bool metalrt_pointcloud = device->kernel_features & KERNEL_FEATURE_POINTCLOUD;
+  bool metalrt_hair = use_metalrt && (device->kernel_features & KERNEL_FEATURE_HAIR);
+  bool metalrt_hair_thick = use_metalrt && (device->kernel_features & KERNEL_FEATURE_HAIR_THICK);
+  bool metalrt_pointcloud = use_metalrt && (device->kernel_features & KERNEL_FEATURE_POINTCLOUD);

   MetalKernelPipeline *best_pipeline = nullptr;
   for (auto &pipeline : collection) {

```

Reviewed By: brecht

Differential Revision: https://developer.blender.org/D14923
2022-05-11 16:20:59 +01:00
Brecht Van Lommel
52a5f68562 Revert "Cycles: Enable inlining on Apple Silicon for 1.1x speedup"
This reverts commit b82de02e7c. It is causing
crashes in various regression tests.

Ref D14763
2022-04-28 00:46:43 +02:00
Michael Jones
b82de02e7c Cycles: Enable inlining on Apple Silicon for 1.1x speedup
This is a stripped down version of D14645 without the scene specialisation optimisations.

The two major changes in this patch are:

- Enables more aggressive inlining on Apple Silicon resulting in a 1.1x speedup and 10% reduction in spill, at the cost of longer pipeline build times
- Revival of shader binary archives through a new ShaderCache which is shared between MetalDevice instances using the same physical MTLDevice. This mitigates the extra compile times via explicit caching (rather than, as before, relying on the implicit system shader cache which can be purged without notice)

Reviewed By: brecht

Differential Revision: https://developer.blender.org/D14763
2022-04-26 22:17:16 +01:00
Brecht Van Lommel
9cfc7967dd Cycles: use SPDX license headers
* Replace license text in headers with SPDX identifiers.
* Remove specific license info from outdated readme.txt, instead leave details
  to the source files.
* Add list of SPDX license identifiers used, and corresponding license texts.
* Update copyright dates while we're at it.

Ref D14069, T95597
2022-02-11 17:47:34 +01:00
Michael Jones
17cab47ed1 Cycles: Fix T94736: Crash when modifying strength of world environment texture
This patch fixes crash T94736 on Metal in which the launch_params were not being updated to reflect destruction of MetalMem objects.

Reviewed By: brecht

Differential Revision: https://developer.blender.org/D13875
2022-01-19 18:17:37 +00:00
Michael Jones
9558fa5196 Cycles: Metal host-side code
This patch adds the Metal host-side code:

- Add all core host-side Metal backend files (device_impl, queue, etc)
- Add MetalRT BVH setup files
- Integrate with Cycles device enumeration code
- Revive `path_source_replace_includes` in util/path (required for MSL compilation)

This patch also includes a couple of small kernel-side fixes:

- Add an implementation of `lgammaf` for Metal [Nemes, Gergő (2010), "New asymptotic expansion for the Gamma function", Archiv der Mathematik](https://users.renyi.hu/~gergonemes/)
- include "work_stealing.h" inside the Metal context class because it accesses state now

Ref T92212

Reviewed By: brecht

Maniphest Tasks: T92212

Differential Revision: https://developer.blender.org/D13423
2021-12-07 15:52:21 +00:00