Commit Graph

68 Commits

Author SHA1 Message Date
Campbell Barton
58ea0e051f Cleanup: spelling in comments 2023-11-09 09:54:28 +11:00
Campbell Barton
6bba008325 Cleanup: format 2023-11-09 09:34:49 +11:00
Michael Jones
051ce95628 Cycles: Use Metal Program Scope Global Built-ins on macOS >= 14.0
This PR simplifies the kernel entrypoints by using [Metal Program Scope Global Built-ins](https://developer.apple.com/metal/Metal-Shading-Language-Specification.pdf) when available (macOS >= 14.0).

Pull Request: https://projects.blender.org/blender/blender/pulls/114535
2023-11-07 11:20:16 +01:00
Michael Jones
1c1c6ac457 Cycles: Fix last failing unit test (T39823) on MetalRT
This PR fixes T39823, the sole failing unit test when running with MetalRT.  It does so by implementing and binding a missing intersection handler (`__anyhit__cycles_metalrt_volume_test_tri`) which is required for `scene_intersect_volume` (as used by `integrator_volume_stack_update_for_subsurface`) to work as intended. This scene exposed the error as it uses subsurface scattering on a sphere which is intersected by volume.

Pull Request: https://projects.blender.org/blender/blender/pulls/112876
2023-09-25 22:41:27 +02:00
Campbell Barton
b7f3e0d84e Cleanup: spelling & punctuation in comments
Also remove some unhelpful/redundant comments.
2023-09-14 13:25:24 +10:00
Harley Acheson
092b568a90 Cleanup: Make format
Formatting changes resulting from Make Format
2023-09-13 11:03:43 -07:00
Michael Jones
6c98cb73ac Cycles: Use new MetalRT curve primitives for 3D curves and ribbons
This patch updates the experimental MetalRT code path to use new [curve primitives](https://developer.apple.com/videos/play/wwdc2023/10128/) which were recently added in macOS 14. This replaces the previous custom box intersection implementation, allowing the driver to better optimise curve acceleration structures for the GPU. On existing hardware, this can speed up MetalRT renders by up to 40% for scenes that use hair / curve primitives extensively.

The MetalRT option will only be available on macOS >= 14, and requires Xcode >= 15 to build (otherwise the option will be compiled out).

Authored by Marco Giordano, Michael Jones, and Jason Fielder

---
Before / after render times (M1 Max MacBook Pro, macOS 14 beta, MetalRT enabled):
```
                  Custom box intersection      MetalRT curve primitives       Speedup
fishy_cat           111.5                         80.5                         1.39
koro                114.4                         86.7                         1.32
sinosauropteryx     291.8                        279.2                         1.05
spring              142.3                        142.2                         1.00
victor              442.7                        347.7                         1.27
```

---

Pull Request: https://projects.blender.org/blender/blender/pulls/111795
2023-09-13 16:02:49 +02:00
Campbell Barton
9e41eccc6e Cleanup: spelling in comments 2023-09-08 17:12:29 +10:00
Sergey Sharybin
7e4a51329b Fix shadow linking for Cycles Metal RT
The shadow intersection kernels needs to perform extra checks
to see whether object is really considered a blocker.

Pull Request: https://projects.blender.org/blender/blender/pulls/112012
2023-09-06 15:25:30 +02:00
Campbell Barton
1f01a64403 Cleanup: spelling in comments 2023-09-06 14:23:01 +10:00
Sergey Sharybin
71b4a97cbc Refactor: De-duplicate Metal RT self intersection checks
Use the common BVH utilities header for this.

Added a special type qualifier ccl_ray_data which is defined to ccl_private
for all platforms but Metal. On Metal it is defined to ray_data.

The tricky part is that the BVH utilities are wrapped into the Metal context
class. In some of the BVH functions the context has been already constructed,
but it wasn't done in all the callbacks.

From a quick render tests of the Junkshop benchmark scene there is no render
time difference,

No functional changes are expected.

Pull Request: https://projects.blender.org/blender/blender/pulls/111967
2023-09-05 17:21:49 +02:00
Sergey Sharybin
7365f0b094 Cleanup: Cover .metal files with make format
Pull Request: https://projects.blender.org/blender/blender/pulls/111930
2023-09-05 09:59:47 +02:00
Sergey Sharybin
c59c97c947 Cleanup: Ensure correct order of headers in Metal kernel
Explicitly splint into groups of headers, so that clang-format
does not ruin the required order of headers.
2023-09-05 09:59:41 +02:00
Campbell Barton
0caf227530 License headers: use SPDX-FileCopyrightText for *.inl and *.osl files 2023-08-04 13:24:17 +10:00
Campbell Barton
c12994612b License headers: use SPDX-FileCopyrightText in intern/cycles 2023-06-14 16:53:23 +10:00
Michael Jones
d0467a277a Cycles: MetalRT: Don't apply local object transform if it is baked
This patch fixes the failing shader/bevel unit test when MetalRT is enabled. The ray was being transformed into local object space even when the SD_OBJECT_TRANSFORM_APPLIED flag was set.

Pull Request: https://projects.blender.org/blender/blender/pulls/107292
2023-04-24 15:20:21 +02:00
Michael Jones
5f61eca7af Cycles: Exploit non-uniform threadgroup sizes on Metal
This patch replaces `dispatchThreadgroups` with `dispatchThreads` which takes care of non-uniform threadgroup bounds. This allows us to remove the bounds guards in the integrator kernel entry points.

Pull Request: https://projects.blender.org/blender/blender/pulls/106217
2023-03-29 21:46:11 +02:00
Michael Jones
944a5854c6 Cycles: Fix MetalRT shadow all hit bug
This patch fixes a MetalRT issue where viable shadow hits are discounted based on the false assumption that hits are ordered by distance. With this patch, the following unit tests now pass:

- openvdb smoke
- shadow catcher pt transparent lamp only 0.8
- shadow catcher pt transparent lamp only 1.0

Pull Request: https://projects.blender.org/blender/blender/pulls/106276
2023-03-29 20:20:07 +02:00
Julian Eisel
30e517c3ca Merge branch 'blender-v3.5-release' 2023-03-15 13:07:26 +01:00
Michael Jones
089e8a1887 Cycles: Fix Metal API validation error (use uint instead of ushort)
This PR fixes an error that is given when Metal API validation is enabled. The compute grid can exceed 65536 threads so `ushort` is not sufficient for `metal_grid_id [[threadgroup_position_in_grid]]`.

This PR also fixes OS version warnings ([Cycles Metal: Unguarded access to newer macOS features #105630](https://projects.blender.org/blender/blender/issues/105630))

Pull Request: https://projects.blender.org/blender/blender/pulls/105763
2023-03-14 22:05:55 +01:00
William Leeson
6c03339e48 Cycles: reduce mesh memory usage by unflattening
To improve mesh upload speeds and reduce the size of the scene data which allows larger scenes to be rendered.

The meshes in Cycles are currently stored as flattened meshes, where each triangle is stored as a set of 3 vertices. Unflattening writes out the vertices in a list according to the index buffer. This uses a lot of memory and for current hardware does not provide a noticeable benefit. This change unflattens the mesh by directly using the meshes vertex and index buffers directly and skips the unflattening. This change allows for larger scenes and also a reduction in the sizes of the meshes. Further it results in a decrease the amount of time it takes to upload the data to a GPU. This is especially important for when multiple GPUs are used in a single machine.

Pull Request #105173
2023-02-27 10:39:19 +01:00
Brecht Van Lommel
02c2970983 Cycles: add NanoVDB support for Metal on Apple Silicon
Contributed by Yulia Kuznetcova at Apple.

NanoVDB is patched to give add address spaces required by Metal. We hope that
in the future Metal will support the generic address space.

For AMD and Intel this is currently not available since it causes a performance
regression also on scenes without volumes.

Pull Request #104837
2023-02-21 15:03:52 +01:00
Michael Jones
2d994de77c Cycles: MetalRT optimisation for subsurface intersection queries
This patch optimises subsurface intersection queries on MetalRT. Currently intersect_local traverses from the scene root, retrospectively discarding all non-local hits. Using a lookup of bottom level acceleration structures, we can explicitly query only the relevant instance. On M1 Max, with MetalRT selected, this can give a render speedup of 15-20% for scenes like Monster which make heavy use of subsurface scattering.

Patch authored by Marco Giordano.

Reviewed By: brecht

Differential Revision: https://developer.blender.org/D17153
2023-02-06 19:12:29 +00:00
Michael Jones
654e1e901b Cycles: Use local atomics for faster shader sorting (enabled on Metal)
This patch adds two new kernels: SORT_BUCKET_PASS and SORT_WRITE_PASS. These replace PREFIX_SUM and SORTED_PATHS_ARRAY on supported devices (currently implemented on Metal, but will be trivial to enable on the other backends). The new kernels exploit sort partitioning (see D15331) by sorting each partition separately using local atomics. This can give an overall render speedup of 2-3% depending on architecture. As before, we fall back to the original non-partitioned sorting when the shader count is "too high".

Reviewed By: brecht

Differential Revision: https://developer.blender.org/D16909
2023-02-06 11:18:26 +00:00
Campbell Barton
79c82fc1c5 Cleanup: trailing space 2023-01-31 15:49:04 +11:00
Hallam Roberts
a501a2dbff Images: add mirror extension type
This adds a new mirror image extension type for shaders and
geometry nodes (next to the existing repeat, extend and clip
options).

See D16432 for a more detailed explanation of `wrap_mirror`.

This also adds a new sampler flag `GPU_SAMPLER_MIRROR_REPEAT`.
It acts as a modifier to `GPU_SAMPLER_REPEAT`, so any `REPEAT`
flag must be set for the `MIRROR` flag to have an effect.

Differential Revision: https://developer.blender.org/D16432
2022-12-14 19:27:29 +01:00
Michael Jones
b0e2e45496 Cycles: Enable MetalRT pointclouds & other fixes
Code authored by Marco Giordano.

This fixes pointcloud rendering on MetalRT and some other subtle MetalRT bugs:
- Incorrect kernel hashing
- Missing specialisation constants
- Incorrect visibility filtering
- Missing null pointer check

Reviewed By: brecht

Differential Revision: https://developer.blender.org/D16499
2022-11-14 16:39:18 +00:00
Patrick Mours
e6b38deb9d Cycles: Add basic support for using OSL with OptiX
This patch  generalizes the OSL support in Cycles to include GPU
device types and adds an implementation for that in the OptiX
device. There are some caveats still, including simplified texturing
due to lack of OIIO on the GPU and a few missing OSL intrinsics.

Note that this is incomplete and missing an update to the OSL
library before being enabled! The implementation is already
committed now to simplify further development.

Maniphest Tasks: T101222

Differential Revision: https://developer.blender.org/D15902
2022-11-09 15:30:21 +01:00
Lukas Stockner
e2a93e9c7c Fix T94136: Cycles: No Hair Shadows with Transparent BSDF 2022-10-20 04:47:21 +02:00
Morteza Mostajab
e6902d19a0 Cycles: Allow Intel GPUs under Metal
Known Issues:
- Command buffer failures when using binary archives (binary archives is disabled for Intel GPUs as a workaround)
- Wrong texture sampler being applied (to be addressed in the future)

Ref T92212

Reviewed By: brecht

Maniphest Tasks: T92212

Differential Revision: https://developer.blender.org/D16253
2022-10-19 17:09:38 +01:00
Michael Jones
2b88ee50fb Cycles: Tweak inlining policy on Metal
This patch optimises the Metal inlining policy. It gives a small speedup (2-3% on M1 Max) with no notable compilation slowdown vs what is already in master. Previously noted compilation slowdowns (as reported in T100102) were caused by forcing inlining for `ccl_device`, but we get better rendering perf by relying on compiler heuristics in these cases.

Reviewed By: brecht

Differential Revision: https://developer.blender.org/D16081
2022-09-27 17:01:28 +01:00
Brecht Van Lommel
6d08ba8a50 Fix T100824: Cycles GPU render broken on macOS 13 Beta and Apple silicon
The recent revert of Apple silicon inlining changes to avoid long compile times
worked on macOS 12, but in macOS 13 Beta it results in render errors. This may
be a compiler bug and perhaps get fixed in time, but try to be on the safe side
and ensure Blender 3.3.0 works regardless.

This brings part of the inlining back, which brings improved performance but
also longer compiler times again. Compile time is around 2min now, where the
previous full inlining was about 5-7min.

Patch by Michael Jones.

Differential Revision: https://developer.blender.org/D15897
2022-09-06 19:11:52 +02:00
Brecht Van Lommel
9961aae1e6 Merge branch 'blender-v3.3-release' 2022-08-18 20:31:34 +02:00
Brecht Van Lommel
e11c899e71 Cycles: disable Metal inlining optimization on Apple GPUs
This gave a 1.1x speedup, however also leads to very long compile times
that make it seems like Blender has stopped working.

This can be brought back in the future behind an option that users can
explicitly enabled.

Fix T100102

Ref D14923, D14763, T92212
2022-08-18 20:01:29 +02:00
Brecht Van Lommel
3aeacb9ab3 Merge branch 'blender-v3.3-release' 2022-08-15 13:53:42 +02:00
Brecht Van Lommel
c2c019dda8 Fix Cycles MetalRT compile error 2022-08-13 19:55:38 +02:00
Brecht Van Lommel
1988665c3c Cleanup: make vector types make/print functions consistent between CPU and GPU
Now all the same ones are available on CPU and GPU, which was previously not
possible due to lack of operator overloadng in OpenCL. Print functions are
no-ops on some GPUs.

Ref D15535
2022-08-09 16:07:23 +02:00
Brecht Van Lommel
fa514564b0 Fix T99201: Cycles render difference with 3D hair curves between OptiX and Emrbee
It should consistently use the Cycles pirmitive ID for self intersection detection,
not the one from the OptiX or Embree acceleration structure.

Differential Revision: https://developer.blender.org/D15632
2022-08-05 15:03:47 +02:00
Brecht Van Lommel
38af5b0501 Cycles: switch Cycles triangle barycentric convention to match Embree/OptiX
Simplifies intersection code a little and slightly improves precision regarding
self intersection.

The parametric texture coordinate in shader nodes is still the same as before
for compatibility.
2022-07-27 21:03:33 +02:00
Brecht Van Lommel
4cf6524731 Fix Cycles Metal build errors after recent changes
float8 is a reserved type in Metal, but is not implemented. So rename to
float8_t for now.

Also move back intersection handlers to kernel.metal, they can't be in the
class that encapsulates the other Metal kernel functions.
2022-07-26 00:17:37 +02:00
Brecht Van Lommel
7a74d91e32 Cleanup: move device BVH code to kernel/device/*/bvh.h
Having the OptiX/MetalRT/Embree/MetalRT implementations all in one file with
many #ifdefs became too confusing. Instead split it up per device, and also
move it together with device specific hit/filter/intersect functions and
associated data types.
2022-07-25 16:34:22 +02:00
Brecht Van Lommel
484ad31653 Cycles: simplify handling of ray distance in GPU rendering
All our intersections functions now work with unnormalized ray direction,
which means we no longer need to transform ray distance between world and
object space, they can all remain in world space.

There doesn't seem to be any real performance difference one way or the
other, but it does simplify the code.
2022-07-25 13:27:40 +02:00
Brecht Van Lommel
5152c7c152 Cycles: refactor rays to have start and end distance, fix precision issues
For transparency, volume and light intersection rays, adjust these distances
rather than the ray start position. This way we increment the start distance
by the smallest possible float increment to avoid self intersections, and be
sure it works as the distance compared to be will be exactly the same as
before, due to the ray start position and direction remaining the same.

Fix T98764, T96537, hair ray tracing precision issues.

Differential Revision: https://developer.blender.org/D15455
2022-07-15 18:46:24 +02:00
Brecht Van Lommel
bb376da6df Fix Cycles MetalRT error after recent specialization changes 2022-07-15 18:28:13 +02:00
Michael Jones
da4ef05e4d Cycles: Apple Silicon optimization to specialize intersection kernels
The Metal backend now compiles and caches a second set of kernels which are
optimized for scene contents, enabled for Apple Silicon.

The implementation supports doing this both for intersection and shading
kernels. However this is currently only enabled for intersection kernels that
are quick to compile, and already give a good speedup. Enabling this for
shading kernels would be faster still, however this also causes a long wait
times and would need a good user interface to control this.

M1 Max samples per minute (macOS 13.0):

                    PSO_GENERIC  PSO_SPECIALIZED_INTERSECT  PSO_SPECIALIZED_SHADE

barbershop_interior       83.4	            89.5                   93.7
bmw27                   1486.1	          1671.0                 1825.8
classroom                175.2	           196.8                  206.3
fishy_cat                674.2	           704.3                  719.3
junkshop                 205.4	           212.0                  257.7
koro                     310.1	           336.1                  342.8
monster                  376.7	           418.6                  424.1
pabellon                 273.5	           325.4                  339.8
sponza                   830.6	           929.6                 1142.4
victor                    86.7              96.4                   96.3
wdas_cloud               111.8	           112.7                  183.1

Code contributed by Jason Fielder, Morteza Mostajabodaveh and Michael Jones

Differential Revision: https://developer.blender.org/D14645
2022-07-15 13:40:04 +02:00
Brecht Van Lommel
ff1883307f Cleanup: renaming and consistency for kernel data
* Rename "texture" to "data array". This has not used textures for a long time,
  there are just global memory arrays now. (On old CUDA GPUs there was a cache
  for textures but not global memory, so we used to put all data in textures.)
* For CUDA and HIP, put globals in KernelParams struct like other devices.
* Drop __ prefix for data array names, no possibility for naming conflict now that
  these are in a struct.
2022-06-20 12:30:48 +02:00
Michael Jones
007184bcf2 Enable inlining on Apple Silicon. Use new process-wide ShaderCache in order to safely re-enable binary archives
This patch is the same as D14763, but with a fix for unit test failures caused by ShaderCache fetch logic not working in the non-MetalRT case:

```
diff --git a/intern/cycles/device/metal/kernel.mm b/intern/cycles/device/metal/kernel.mm
index ad268ae7057..6aa1a56056e 100644
--- a/intern/cycles/device/metal/kernel.mm
+++ b/intern/cycles/device/metal/kernel.mm
@@ -203,9 +203,12 @@ bool kernel_has_intersection(DeviceKernel device_kernel)

   /* metalrt options */
   request.pipeline->use_metalrt = device->use_metalrt;
-  request.pipeline->metalrt_hair = device->kernel_features & KERNEL_FEATURE_HAIR;
-  request.pipeline->metalrt_hair_thick = device->kernel_features & KERNEL_FEATURE_HAIR_THICK;
-  request.pipeline->metalrt_pointcloud = device->kernel_features & KERNEL_FEATURE_POINTCLOUD;
+  request.pipeline->metalrt_hair = device->use_metalrt &&
+                                   (device->kernel_features & KERNEL_FEATURE_HAIR);
+  request.pipeline->metalrt_hair_thick = device->use_metalrt &&
+                                         (device->kernel_features & KERNEL_FEATURE_HAIR_THICK);
+  request.pipeline->metalrt_pointcloud = device->use_metalrt &&
+                                         (device->kernel_features & KERNEL_FEATURE_POINTCLOUD);

   {
     thread_scoped_lock lock(cache_mutex);
@@ -225,9 +228,9 @@ bool kernel_has_intersection(DeviceKernel device_kernel)

   /* metalrt options */
   bool use_metalrt = device->use_metalrt;
-  bool metalrt_hair = device->kernel_features & KERNEL_FEATURE_HAIR;
-  bool metalrt_hair_thick = device->kernel_features & KERNEL_FEATURE_HAIR_THICK;
-  bool metalrt_pointcloud = device->kernel_features & KERNEL_FEATURE_POINTCLOUD;
+  bool metalrt_hair = use_metalrt && (device->kernel_features & KERNEL_FEATURE_HAIR);
+  bool metalrt_hair_thick = use_metalrt && (device->kernel_features & KERNEL_FEATURE_HAIR_THICK);
+  bool metalrt_pointcloud = use_metalrt && (device->kernel_features & KERNEL_FEATURE_POINTCLOUD);

   MetalKernelPipeline *best_pipeline = nullptr;
   for (auto &pipeline : collection) {

```

Reviewed By: brecht

Differential Revision: https://developer.blender.org/D14923
2022-05-11 16:20:59 +01:00
Brecht Van Lommel
52a5f68562 Revert "Cycles: Enable inlining on Apple Silicon for 1.1x speedup"
This reverts commit b82de02e7c. It is causing
crashes in various regression tests.

Ref D14763
2022-04-28 00:46:43 +02:00
Michael Jones
b82de02e7c Cycles: Enable inlining on Apple Silicon for 1.1x speedup
This is a stripped down version of D14645 without the scene specialisation optimisations.

The two major changes in this patch are:

- Enables more aggressive inlining on Apple Silicon resulting in a 1.1x speedup and 10% reduction in spill, at the cost of longer pipeline build times
- Revival of shader binary archives through a new ShaderCache which is shared between MetalDevice instances using the same physical MTLDevice. This mitigates the extra compile times via explicit caching (rather than, as before, relying on the implicit system shader cache which can be purged without notice)

Reviewed By: brecht

Differential Revision: https://developer.blender.org/D14763
2022-04-26 22:17:16 +01:00
Stefan Werner
65dcb5ebd3 Cycles: Semantically separate 2D and 3D texture objects
Currently there are no functional changes.

Preparing for an upcoming oneAPI integration where such separation
in types is needed.
2022-04-01 19:44:31 +02:00