240 Commits

Author SHA1 Message Date
Alaska
c8340cf754 Cycles: Remove AMD and Intel GPU support from Metal backend
This is because with the addition of new features to Cycles, these GPUs
experienced significant performance regressions and bugs, all stemming
from bugs in the Metal GPU driver/compiler. The only reasonable way to
work around these issues was to disable parts of Cycles code on
these GPUs to avoid the driver/compiler bugs.

This resulted in increased development time maintaining these platforms
while being unable to deliver feature parity with other
GPU backends.

It has been decided that this development time is better spent
maintaining platforms that are still actively maintained by
hardware/software vendors, and so AMD and Intel GPU support will be
removed from the Metal backend for Cycles.

Pull Request: https://projects.blender.org/blender/blender/pulls/123551
2024-06-26 17:16:20 +02:00
Brecht Van Lommel
d72c4f0096 Fix: Cycles build issues when disabling various kernel features 2024-06-13 19:41:19 +02:00
Lukas Stockner
f3f05f945c Cycles: Add missing make_uintX definitions for Metal 2024-06-05 03:04:04 +02:00
Michael Jones
5be30b7d2b Cycles: "Struct-of-array-of-packed-structs" for parts of the integrator state
On an M3 MacBook Pro, this change increases the benchmark score by 8% (with classroom seeing a path-tracing speedup of 15%).

The integrator state is currently stored using struct-of-arrays, with one array per field. Such fine-grained separation can result in poor GPU cache utilisation in cases where multiple fields of the same parent struct are accessed together. This PR changes the layout of the `ray`, `isect`, `subsurface`, and `shadow_ray` structs so that the data is interleaved (per parent struct) instead of separate. To try and keep this change localised, I encapsulated the layout change by extending the integrator state access macros, however maybe we want to do this more explicitly? (e.g. by updating every bit of code that accesses these parts of the state). Feedback welcome.
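
As a rough illustration of the two layouts described above, the following sketch contrasts one array per field with one array of packed structs per parent struct. All type names, field names, and accessors here are hypothetical and heavily simplified; they are not the actual Cycles integrator state definitions or macros.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical, trimmed-down "ray" state with three float fields.
struct PackedRay {
  float tmin;
  float tmax;
  float time;
};
static_assert(sizeof(PackedRay) == 3 * sizeof(float), "fields are expected to be tightly packed");

// Struct-of-arrays: one array per field. The fields that belong to path `i`
// live in three separate allocations, far apart in memory.
struct RayStateSoA {
  std::vector<float> tmin, tmax, time;
  explicit RayStateSoA(std::size_t num_paths) : tmin(num_paths), tmax(num_paths), time(num_paths) {}
};

// Struct-of-array-of-packed-structs: one array per parent struct. The fields
// that belong to path `i` are interleaved and sit next to each other.
struct RayStatePacked {
  std::vector<PackedRay> ray;
  explicit RayStatePacked(std::size_t num_paths) : ray(num_paths) {}
};

// Accessor functions hide the layout from the calling code, in the same
// spirit as routing all accesses through state access macros.
inline float ray_extent(const RayStateSoA &state, std::size_t i)
{
  assert(i < state.tmin.size());
  return state.tmax[i] - state.tmin[i]; /* Touches two distinct arrays. */
}

inline float ray_extent(const RayStatePacked &state, std::size_t i)
{
  assert(i < state.ray.size());
  const PackedRay &r = state.ray[i];
  return r.tmax - r.tmin; /* Touches one contiguous 12-byte record. */
}
```

Because callers only go through `ray_extent`, swapping one layout for the other does not require touching them, which is the localisation trade-off the message asks for feedback on.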

Pull Request: https://projects.blender.org/blender/blender/pulls/122015
2024-06-04 14:53:30 +02:00
Nikita Sirgienko
759bb6c768 Cycles: oneAPI: Enable host memory migration
This enables scenes with all textures not fitting in GPU
memory to finally render. For scenes that are fitting,
no functional change or performance change is expected.

Pull Request: https://projects.blender.org/blender/blender/pulls/122385
2024-05-28 19:04:19 +02:00
Michael Jones
5508b41a40 Cycles: MetalRT optimisations (scene_intersect_shadow + random_walk)
This PR contains optimisations and a general tidy-up of the MetalRT backend.

- Currently `scene_intersect` is used for both normal and (opaque) shadow rays, however the usage patterns are different enough to warrant specialisation. Shadow intersection tests (flagged with `PATH_RAY_SHADOW_OPAQUE`) only need a bool result, but need a larger "self" payload in order to exclude hits against target lights. By specialising we can minimise the payload size in each case (which helps performance) and avoid some dynamic branching. This PR introduces a new `scene_intersect_shadow` function which is specialised in Metal, and currently redirects to `scene_intersect` in the other backends.

- Currently `scene_intersect_local` is implemented for worst-case payload requirements as demanded by `subsurface_disk` (where `max_hits` is 4). The random_walk case only demands 1 hit result which we can retrieve directly from the intersector object (rather than stashing it in the payload). By specialising, we significantly reduce the payload size for random_walk queries, which has a big impact on performance. Additionally, we only need to use a custom intersection function for the first ray test in a random walk (for self-primitive filtering), so this PR forces faster `opaque` intersection testing for all but the first random walk test.

- Currently `scene_intersect_volume` has a lot of redundant code to handle non-triangle primitives despite volumes only being enclosed by trimeshes. This PR removes this code.
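
The idea behind the first bullet, a closest-hit query that fills in a full record versus a shadow query that only answers yes or no, can be sketched on the CPU as follows. This is a simplified illustration only: the `Candidate` list stands in for real ray traversal, the self-exclusion is reduced to a single primitive id, and none of the names below are the actual Cycles or MetalRT functions.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for traversal: primitives the ray would hit, with
// their hit distances already computed.
struct Candidate {
  int prim;
  float t;
};

// Full result record needed by the general (closest-hit) query.
struct Hit {
  int prim = -1;
  float t = 0.0f;
};

// General query: must look at every candidate to find the closest one and
// has to carry the whole hit record back to the caller.
inline bool intersect_closest(const std::vector<Candidate> &candidates,
                              float tmax,
                              int self_prim,
                              Hit &hit)
{
  bool found = false;
  float closest = tmax;
  for (const Candidate &c : candidates) {
    if (c.prim == self_prim || c.t >= closest) {
      continue;
    }
    closest = c.t;
    hit.prim = c.prim;
    hit.t = c.t;
    found = true;
  }
  return found;
}

// Opaque shadow query: any blocker closer than tmax is enough, so the
// function returns a bool, carries no hit record, and exits early.
inline bool intersect_shadow(const std::vector<Candidate> &candidates, float tmax, int self_prim)
{
  for (const Candidate &c : candidates) {
    if (c.prim != self_prim && c.t < tmax) {
      return true;
    }
  }
  return false;
}
```

In this sketch the shadow variant needs no output structure at all, which mirrors the payload-size argument made in the bullet, although the real payloads and filtering rules are more involved.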

Additionally, this PR tidies up the convoluted intersection function linking code, removes some redundant intersection handlers, and uses more consistent naming of intersection functions.

On an M3 MacBook Pro, these changes give a 2-3% performance increase on typical scenes with opaque trimesh materials (e.g. barbershop, classroom, junkshop), but can give over 15% performance increase for certain scenes using random walk SSS (e.g. monster).

Pull Request: https://projects.blender.org/blender/blender/pulls/121397
2024-05-10 16:38:02 +02:00
Michael Jones
99f5433445 Cycles: Dormant fixes for adaptive feature compilation
This PR fixes the (currently unused) scene-based selective feature compilation macros. These feature based macros haven't been used for a few years, and enabling them currently results in compilation errors.

The only functional change in this PR is in geom/primitive.h where undef-ing `__HAIR__` had exposed an inconsistency in how pointcloud attributes were being fetched. Using the more general `primitive_surface_attribute_float4` (instead of `curve_attribute_float4`) fixed a compilation error that occurred when rendering pointcloud unit test scenes with adaptive compilation enabled.

Pull Request: https://projects.blender.org/blender/blender/pulls/121216
2024-04-30 12:56:22 +02:00
Weizhen Huang
418acfe8bb Cleanup: remove unused function parameters
This is not a complete list of all the unused parameters in kernel, but
those I touch often, so I am more confident that it's safe to delete them.
2024-04-17 18:49:00 +02:00
Xavier Hallade
891d71a4d4 Cycles: Drop noinline keyword for oneAPI device
fdc2962beb indirectly introduced a change
in inlining (light_tree_pdf started getting inlined) that led to a 5-10%
drop in performance for most scenes.
Dropping the noinline keyword for oneAPI device recovers it.
It however brings another performance regression to MNEE and Raytrace
kernels, that we'll look into separately.
2024-04-02 18:29:35 +02:00
Campbell Barton
2be407fc82 Cleanup: spelling in comments 2024-04-01 16:47:57 +11:00
Weizhen Huang
b81b0308fd Fix: WITH_CYCLES_DEBUG flag not enabled on Metal
It seems to be enabled on other GPUs already.

Pull Request: https://projects.blender.org/blender/blender/pulls/119701
2024-03-20 16:42:42 +01:00
Thomas Dinges
2b095c97fa Cycles: Increase minimum target on x86 to SSE4.2
* Compile regular host code with SSE4.2
* Remove the SSE2 kernel, only the SSE4.2 and AVX2 kernel remain

Pull Request: https://projects.blender.org/blender/blender/pulls/118471
2024-02-26 14:49:19 +01:00
Thomas Dinges
30a22b92ca Cycles: Rename SSE4.1 kernel to SSE4.2
This commit updates all defines, compiler flags and cleans up some code for unused CPU capabilities.

There should be no functional change, unless it's run on a CPU that supports sse41 but not sse42. It will fall back to the SSE2 kernel in this case.
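
The fallback behaviour described above can be pictured with a small selection function. This is a minimal sketch under assumed names: `CpuCaps`, `CpuKernel`, and `select_kernel` are hypothetical and do not reflect how Cycles actually queries CPU capabilities or dispatches kernels.

```cpp
#include <cassert>

// Hypothetical kernel variants available after the rename.
enum class CpuKernel { SSE2, SSE42, AVX2 };

// Hypothetical capability flags, as they might be reported by a CPUID query.
struct CpuCaps {
  bool sse41;
  bool sse42;
  bool avx2;
};

// Pick the most capable kernel the CPU supports. Note that `sse41` is not
// consulted at all: a CPU with SSE4.1 but without SSE4.2 ends up on SSE2.
inline CpuKernel select_kernel(const CpuCaps &caps)
{
  /* Every real AVX2-capable CPU also supports SSE4.2. */
  assert(!caps.avx2 || caps.sse42);
  if (caps.avx2) {
    return CpuKernel::AVX2;
  }
  if (caps.sse42) {
    return CpuKernel::SSE42;
  }
  return CpuKernel::SSE2;
}
```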

In preparation for the new SSE4.2 minimum in Blender 4.2.

Pull Request: https://projects.blender.org/blender/blender/pulls/118043
2024-02-09 17:25:58 +01:00
Stefan Werner
31d55e87f9 Cycles: Metal support for OpenImageDenoise
This is supported on Apple Silicon GPUs and macOS 13.0+.

Co-authored-by: Stefan Werner <stefan.werner@intel.com>
Co-authored-by: Attila Afra <attila.t.afra@intel.com>
Pull Request: https://projects.blender.org/blender/blender/pulls/116124
2024-02-06 21:13:23 +01:00
Campbell Barton
617f7b76df Cleanup: comment block formatting 2024-01-08 11:31:43 +11:00
Brecht Van Lommel
d377ef2543 Clang Format: bump to version 17
Along with the 4.1 libraries upgrade, we are bumping the clang-format
version from 8-12 to 17. This affects quite a few files.

If not already the case, you may consider pointing your IDE to the
clang-format binary bundled with the Blender precompiled libraries.
2024-01-03 13:38:14 +01:00
Brecht Van Lommel
d015e98ee6 Fix Cycles ASAN error with boolean kernel arguments 2023-12-12 13:27:36 +01:00
Brecht Van Lommel
6cdb43195e Refactor: replace NanoVDB kernel side implementation by own code
The NanoVDB headers are not compatible with Metal due to missing address
space qualifiers. We currently have a big patch for NanoVDB header
files, which is difficult to update for OpenVDB 11. Instead extract a
few hundred lines of code from NanoVDB to do just what we need.

Pull Request: https://projects.blender.org/blender/blender/pulls/115992
2023-12-10 19:37:36 +01:00
Brecht Van Lommel
8ba474dc4f Refactor: replace NanoVDB SampleFromVoxels by own code
This makes the GPU tricubic implementation more efficient. The dense
grid code implemented this in terms of trilinear lookups that are
hardware accelerated, but for NanoVDB this just causes unnecessary voxel
reads. Instead match the CPU code.

Pull Request: https://projects.blender.org/blender/blender/pulls/115992
2023-12-10 19:37:36 +01:00
Brecht Van Lommel
798a0b301e Cycles: update OSL to work with version 1.13.5
This keeps compatibility with older stable versions, but not
older unreleased versions in the 1.13.x series.

Ref #113157

Pull Request: https://projects.blender.org/blender/blender/pulls/116004
2023-12-10 17:08:47 +01:00
Stefan Werner
02b5e27f89 Cycles: Add Intel GPU support for OpenImageDenoise
OpenImageDenoise V2 comes with GPU support for various backends. This adds a new class, OIDNDenoiserGPU, in order to add this functionality into the existing Cycles post processing pipeline without having to change it much. OptiX and OIDN CPU denoising remain as they are. Rendering on a supported Intel GPU will automatically select the GPU denoiser.

Device support is initially limited to the oneAPI devices that are supported by Cycles, but can be extended.

Ref #115045

Co-authored-by: Stefan Werner <stefan.werner@intel.com>
Co-authored-by: Ray Molenkamp <github@lazydodo.com>
Pull Request: https://projects.blender.org/blender/blender/pulls/108314
2023-11-20 11:12:41 +01:00
Brecht Van Lommel
7b26c3d517 Merge branch 'blender-v4.0-release' into main 2023-11-09 18:24:05 +01:00
salipourto
13171183fa Fix Cycles HIP RT issues with deformation motion blur
The first problem was that triangles with motion blur were all grouped into
one category without separating the ones with and without triangle
motion steps.

The second problem was HIP RT uses the generic motion triangle
intersection function and this function checks prim_visibility buffer.
HIP RT doesn't provide the buffer per primitive but passes it to HIP RT
core per instance.

The buffer name was changed to prim_visibility from visibility to be
the same as what Cycles uses but when the motion triangle intersection
function is called from HIP RT kernels, the instance id is passed to
the function instead of primitive id.

Pull Request: https://projects.blender.org/blender/blender/pulls/114555
2023-11-09 18:22:59 +01:00
Campbell Barton
58ea0e051f Cleanup: spelling in comments 2023-11-09 09:54:28 +11:00
Campbell Barton
6bba008325 Cleanup: format 2023-11-09 09:34:49 +11:00
Michael Jones
051ce95628 Cycles: Use Metal Program Scope Global Built-ins on macOS >= 14.0
This PR simplifies the kernel entrypoints by using [Metal Program Scope Global Built-ins](https://developer.apple.com/metal/Metal-Shading-Language-Specification.pdf) when available (macOS >= 14.0).

Pull Request: https://projects.blender.org/blender/blender/pulls/114535
2023-11-07 11:20:16 +01:00
Brecht Van Lommel
39107b3133 Revert changes from main commits that were merged into blender-v4.0-release
The last good commit was 8474716abb.

After this, commits from main were pushed to blender-v4.0-release. These are
being reverted.

Commits from a4880576dc to b26f176d1a that happened afterwards were meant for
4.0, and their contents are preserved.
2023-10-30 21:40:35 +01:00
Xavier Hallade
d26a2b09bc Cycles: oneAPI: use hardware cos
Speckles and missing lights were experienced in scenes with Nishita Sky
Texture and a Sun Size smaller than 1.5°, such as in Lone Monk and Attic
scenes.
We previously worked around these by using a more precise
software implementation of cosine.
After recent changes in Cycles, it turns out this workaround isn't
currently needed.
2023-10-06 13:10:27 +02:00
Michael Jones
1c1c6ac457 Cycles: Fix last failing unit test (T39823) on MetalRT
This PR fixes T39823, the sole failing unit test when running with MetalRT. It does so by implementing and binding a missing intersection handler (`__anyhit__cycles_metalrt_volume_test_tri`) which is required for `scene_intersect_volume` (as used by `integrator_volume_stack_update_for_subsurface`) to work as intended. This scene exposed the error as it uses subsurface scattering on a sphere which is intersected by a volume.

Pull Request: https://projects.blender.org/blender/blender/pulls/112876
2023-09-25 22:41:27 +02:00
Campbell Barton
2721b937fb Cleanup: use braces in headers 2023-09-24 14:52:38 +10:00
Campbell Barton
b7f3e0d84e Cleanup: spelling & punctuation in comments
Also remove some unhelpful/redundant comments.
2023-09-14 13:25:24 +10:00
Harley Acheson
092b568a90 Cleanup: Make format
Formatting changes resulting from Make Format
2023-09-13 11:03:43 -07:00
Michael Jones
6c98cb73ac Cycles: Use new MetalRT curve primitives for 3D curves and ribbons
This patch updates the experimental MetalRT code path to use new [curve primitives](https://developer.apple.com/videos/play/wwdc2023/10128/) which were recently added in macOS 14. This replaces the previous custom box intersection implementation, allowing the driver to better optimise curve acceleration structures for the GPU. On existing hardware, this can speed up MetalRT renders by up to 40% for scenes that use hair / curve primitives extensively.

The MetalRT option will only be available on macOS >= 14, and requires Xcode >= 15 to build (otherwise the option will be compiled out).

Authored by Marco Giordano, Michael Jones, and Jason Fielder

---
Before / after render times (M1 Max MacBook Pro, macOS 14 beta, MetalRT enabled):
```
                  Custom box intersection      MetalRT curve primitives       Speedup
fishy_cat           111.5                         80.5                         1.39
koro                114.4                         86.7                         1.32
sinosauropteryx     291.8                        279.2                         1.05
spring              142.3                        142.2                         1.00
victor              442.7                        347.7                         1.27
```

---

Pull Request: https://projects.blender.org/blender/blender/pulls/111795
2023-09-13 16:02:49 +02:00
Xavier Hallade
01931e213f Cycles: oneAPI: only export necessary symbols
The API for the kernels library is defined, there is no need to
export more than that. This change only affects Linux since hidden
visibility is the default on Windows.
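
For readers unfamiliar with symbol visibility, a common way to export only an intended API looks roughly like the sketch below. The macro and function names (`EXAMPLE_KERNEL_API`, `example_kernel_entry`, `internal_helper`) are made up for illustration, and this is not claimed to be the mechanism used by the commit. On GCC and Clang the pattern is typically combined with the `-fvisibility=hidden` compiler flag so that everything not explicitly marked stays private to the shared library.

```cpp
#include <cassert>

// Hypothetical export macro: mark only the public entry points.
#if defined(_WIN32)
#  define EXAMPLE_KERNEL_API __declspec(dllexport)
#else
#  define EXAMPLE_KERNEL_API __attribute__((visibility("default")))
#endif

// Internal helper: with internal linkage it never appears in the
// library's dynamic symbol table, regardless of compiler flags.
static int internal_helper(int x)
{
  return x * 2;
}

// Public entry point: explicitly exported, so it stays visible even when
// the library is built with hidden visibility by default.
extern "C" EXAMPLE_KERNEL_API int example_kernel_entry(int x)
{
  return internal_helper(x) + 1;
}
```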
2023-09-08 15:44:39 +02:00
Campbell Barton
9e41eccc6e Cleanup: spelling in comments 2023-09-08 17:12:29 +10:00
Sergey Sharybin
7e4a51329b Fix shadow linking for Cycles Metal RT
The shadow intersection kernels need to perform extra checks
to see whether the object is really considered a blocker.

Pull Request: https://projects.blender.org/blender/blender/pulls/112012
2023-09-06 15:25:30 +02:00
Sergey Sharybin
8e49bc4a05 Refactor: Make Cycles shadow linking primitives receive ray self primitives
No functional changes.

Makes it closer to other self-intersection checks, making it easier to
re-use functions from the HW RT kernels.

Pull Request: https://projects.blender.org/blender/blender/pulls/111971
2023-09-06 09:53:29 +02:00
Campbell Barton
1f01a64403 Cleanup: spelling in comments 2023-09-06 14:23:01 +10:00
Sergey Sharybin
71b4a97cbc Refactor: De-duplicate Metal RT self intersection checks
Use the common BVH utilities header for this.

Added a special type qualifier ccl_ray_data which is defined to ccl_private
for all platforms but Metal. On Metal it is defined to ray_data.
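
The qualifier described above can be sketched as a pair of preprocessor definitions. In this sketch `EXAMPLE_METAL_BUILD` is a made-up switch, `RaySelfPrimitives` and `is_self_hit` are hypothetical stand-ins for the shared BVH utilities, and the expansion of `ccl_private` to `thread` on Metal is an assumption. Only the non-Metal branch is compiled here.

```cpp
#include <cassert>

#ifdef EXAMPLE_METAL_BUILD /* Hypothetical switch, never defined in this sketch. */
#  define ccl_private thread    /* Assumed: Metal's per-thread address space. */
#  define ccl_ray_data ray_data /* Address space of the ray payload on Metal. */
#else
#  define ccl_private              /* No address spaces on other platforms. */
#  define ccl_ray_data ccl_private /* Falls back to the private qualifier. */
#endif

// Hypothetical, simplified record of the primitive a ray started from.
struct RaySelfPrimitives {
  int prim;
  int object;
};

// A shared helper can take the ray data through the new qualifier, so the
// same source compiles for both kinds of callers.
inline bool is_self_hit(const ccl_ray_data RaySelfPrimitives *self, int prim, int object)
{
  return self->prim == prim && self->object == object;
}
```

On a non-Metal build both macros expand to nothing, so the parameter is an ordinary `const RaySelfPrimitives *`.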

The tricky part is that the BVH utilities are wrapped into the Metal context
class. In some of the BVH functions the context has been already constructed,
but it wasn't done in all the callbacks.

From a quick render test of the Junkshop benchmark scene there is no render
time difference.

No functional changes are expected.

Pull Request: https://projects.blender.org/blender/blender/pulls/111967
2023-09-05 17:21:49 +02:00
Sergey Sharybin
7365f0b094 Cleanup: Cover .metal files with make format
Pull Request: https://projects.blender.org/blender/blender/pulls/111930
2023-09-05 09:59:47 +02:00
Sergey Sharybin
c59c97c947 Cleanup: Ensure correct order of headers in Metal kernel
Explicitly split into groups of headers, so that clang-format
does not ruin the required order of headers.
2023-09-05 09:59:41 +02:00
salipourto
359bbf6af2 Fix Cycles HIP RT issues with curves and transparent shadows
Ref #104110

Pull Request: https://projects.blender.org/blender/blender/pulls/111414
2023-09-04 17:55:21 +02:00
Xavier Hallade
40a39c2976 Cycles: oneAPI: cleanup: drop __spirv_ocl_cos workaround
As __FAST_MATH__ isn't defined anymore since
09df1f4caf, sycl::cos uses the precise
implementation, no need to call __spirv_ocl_cos anymore.
2023-08-31 13:10:29 +02:00
Nikita Sirgienko
abab47a805 Cycles: oneAPI: Refactoring of local size choice logic 2023-08-22 19:04:16 +02:00
Campbell Barton
7f34ad736a Cleanup: spelling in comments 2023-08-05 13:54:25 +10:00
Campbell Barton
0caf227530 License headers: use SPDX-FileCopyrightText for *.inl and *.osl files 2023-08-04 13:24:17 +10:00
Xavier Hallade
aefc9835f8 Cycles: oneAPI: fix kernel host-side compilation with MSVC 17.7
The <algorithm> header include is missing from some sycl headers. This will
be fixed upstream with https://github.com/intel/llvm/pull/10424;
meanwhile, we work around it by including it directly.
2023-07-25 12:01:09 +02:00
Ray Molenkamp
235c564aa0 Cycles: re-Fixed oneAPI build on Windows
fixes one uint missed in a0846a60c9
2023-07-06 14:47:35 -06:00
Stefan Werner
a0846a60c9 Cycles: Fixed oneAPI build on Windows
Turns out uint wasn't defined this early in our kernels on Windows.
Using unsigned int instead should fix this.
2023-07-06 21:50:03 +02:00
Werner, Stefan
7befc40386 Cycles: Use sycl::bitcast in oneAPI backend
Using sycl::bitcast instead of union hack
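
To show why a dedicated bit cast is preferable to a union, here is a portable sketch of the same idea in standard C++. The `example_bitcast` helper is hypothetical and built on `std::memcpy`; it is only an analogue of the SYCL facility named above, not its implementation. Reading a union member other than the one last written is undefined behaviour in C++, which is what this pattern avoids.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <type_traits>

// Reinterpret the bits of `from` as a value of type To without going
// through a union. Compilers typically optimise the memcpy away.
template<typename To, typename From> inline To example_bitcast(const From &from)
{
  static_assert(sizeof(To) == sizeof(From), "source and destination sizes must match");
  static_assert(std::is_trivially_copyable<From>::value, "From must be trivially copyable");
  static_assert(std::is_trivially_copyable<To>::value, "To must be trivially copyable");
  To to;
  std::memcpy(&to, &from, sizeof(To));
  return to;
}

// Typical use in rendering code: inspect or build the raw bits of a float.
// The specific bit patterns assume IEEE 754 single precision.
inline std::uint32_t float_as_uint(float f)
{
  return example_bitcast<std::uint32_t>(f);
}

inline float uint_as_float(std::uint32_t u)
{
  return example_bitcast<float>(u);
}
```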
2023-07-06 15:06:33 +02:00