OpenImageDenoise V2 comes with GPU support for various backends. This adds a new class, OIDNDenoiserGPU, in order to add this functionality into the existing Cycles post processing pipeline without having to change it much. OptiX and OIDN CPU denoising remain as they are. Rendering on a supported Intel GPU will automatically select the GPU denoiser.
Device support is initially limited to the oneAPI devices that are supported by Cycles, but can be extended.
Ref #115045
Co-authored-by: Stefan Werner <stefan.werner@intel.com>
Co-authored-by: Ray Molenkamp <github@lazydodo.com>
Pull Request: https://projects.blender.org/blender/blender/pulls/108314
Empty hair geometry in Cycles may still report having one curve, even when
there are no actual segments in that curve. This caused an attempt to build
an acceleration structure with zero primitives, which due to other setup
OptiX rejected with an error. Fix that by checking the number of segments
rather than the number of curves in the hair geometry, since the former will
always be zero for empty geometry.
Pull Request: https://projects.blender.org/blender/blender/pulls/115044
The first problem was triangles with motion blur were all grouped into
one category without separating the ones with and without triangle
motion steps.
The second problem was HIP RT uses the generic motion triangle
intersection function and this function checks prim_visibility buffer.
HIP RT doesn't provide the buffer per primitive but passes it to HIP RT
core per instance.
The buffer name was changed to prim_visibility from visibility to be
the same as what Cycles uses but when the motion triangle intersection
function is called from HIP RT kernels, the instance id is passed to
the function instead of primitive id.
Pull Request: https://projects.blender.org/blender/blender/pulls/114555
This PR works around an issue where zero-filled motion TLAS instance descriptors can cause unexpected hangs during downstream TLAS builds on M3. Instead of zeroing the descriptor we insert an explicit "null" BLAS, achieving the same result.
Pull Request: https://projects.blender.org/blender/blender/pulls/114544
In order to speedup compilation, we upgrade IGC to 1.0.14828.26 along
with ocloc and the associated dependencies.
We also bump min-driver version accordingly to 26918.
Ref !114341
This PR adds tunings for the [newly announced](https://www.youtube.com/watch?v=ctkW3V0Mh-k) M3 family of chips. In particular, MetalRT will be enabled as the automatic default for intersection testing on M3 and beyond to take advantage of hardware raytracing. This will result in significant path-tracing speedups, as well as faster BVH builds.
Pull Request: https://projects.blender.org/blender/blender/pulls/114296
The last good commit was 8474716abb.
After this commits from main were pushed to blender-v4.0-release. These are
being reverted.
Commits a4880576dc from to b26f176d1a that happend afterwards were meant for
4.0, and their contents is preserved.
_(NOTE: This is a clone of [PR 114067](https://projects.blender.org/blender/blender/pulls/114067), but targeting `blender-v4.0-release` as originally intended)_
This PR removes the "experimental" disclaimer from the MetalRT control now that the unit tests all render correctly with it enabled. As well as "Off" and "On", this adds a third "Auto" setting - a new default which can be used to pick the best option.
Pull Request: https://projects.blender.org/blender/blender/pulls/114232
This fixes an issue where animation frames occasionally get corrupted (e.g. when rendering "Pokedstudio" Blender 2.77 splash screen). This happens when the KernelData is refreshed but the MD5 isn't immediately regenerated which can cause the wrong PSO to be selected.
Pull Request: https://projects.blender.org/blender/blender/pulls/114153
This PR adds `@autoreleasepool` blocks around functions that have been observed to create hidden temporary NSObjects, and eventually cause command buffer failures. A couple of allocations needed to be tweaked in order to maintain correct retain/release behaviour. This PR also fixes the command buffer error text to show more useful information.
This PR removes the "experimental" disclaimer from the MetalRT control now that the unit tests all render correctly with it enabled. As well as "Off" and "On", this adds a third "Auto" setting - a new default which can be used to pick the best option.
Pull Request: https://projects.blender.org/blender/blender/pulls/114067
The increased amount of BSDF code from Principled BSDF v2 and the
microfacet BSDF led to a big performance regression on Metal and AMD.
We have not been able to find a good workaround for all scenes.
This change disables the Principled Hair BSDF code when it is not used
in the scene. This makes common benchmark scenes faster, but
performance is still bad in scenes that do use it.
Ref #112596
Pull Request: https://projects.blender.org/blender/blender/pulls/113904
The first public Windows driver version with a higher number is
101.4824, so we bump the min-required driver version on Windows to this
one to ensure compatibility.
<algorithm> header include is missing from some sycl headers, this will
be fixed upstream with https://github.com/intel/llvm/pull/10424,
meanwhile, we work around it by including it directly.
This PR fixes T39823, the sole failing unit test when running with MetalRT. It does so by implementing and binding a missing intersection handler (`__anyhit__cycles_metalrt_volume_test_tri`) which is required for `scene_intersect_volume` (as used by `integrator_volume_stack_update_for_subsurface`) to work as intended. This scene exposed the error as it uses subsurface scattering on a sphere which is intersected by volume.
Pull Request: https://projects.blender.org/blender/blender/pulls/112876
This patch adds `BVHMetalBuildThrottler` which limits the amount of Metal BVH building work that runs concurrently on the GPU. Previously we submitted BVH build requests to the GPU as fast as possible, but in extreme cases this could fail when the device's working set size passes safe limits.
Pull Request: https://projects.blender.org/blender/blender/pulls/112821
This patch fixes the memory leak described in #107714 by adding an `@autoreleasepool` around Metal BVH builds. Certain NSObjects were being retained indefinitely, specifically ones which had been value-passed via an NSArray into acceleration structure descriptors.
Pull Request: https://projects.blender.org/blender/blender/pulls/112820
This patch adds a check to see whether we're actually using NanoVDB textures, and if not, removes `#define WITH_NANOVDB` when generating the scene-optimised kernels. This results in marginally faster render times (maybe 2 or 3%) for scenes that do not use NanoVDB. The generic kernels are unaffected, so this will not impact responsiveness on first render.
Pull Request: https://projects.blender.org/blender/blender/pulls/112822
This patch updates the experimental MetalRT code path to use new [curve primitives](https://developer.apple.com/videos/play/wwdc2023/10128/) which were recently added in macOS 14. This replaces the previous custom box intersection implementation, allowing the driver to better optimise curve acceleration structures for the GPU. On existing hardware, this can speed up MetalRT renders by up to 40% for scenes that use hair / curve primitives extensively.
The MetalRT option will only be available on macOS >= 14, and requires Xcode >= 15 to build (otherwise the option will be compiled out).
Authored by Marco Giordano, Michael Jones, and Jason Fielder
---
Before / after render times (M1 Max MacBook Pro, macOS 14 beta, MetalRT enabled):
```
Custom box intersection MetalRT curve primitives Speedup
fishy_cat 111.5 80.5 1.39
koro 114.4 86.7 1.32
sinosauropteryx 291.8 279.2 1.05
spring 142.3 142.2 1.00
victor 442.7 347.7 1.27
```
---
Pull Request: https://projects.blender.org/blender/blender/pulls/111795
Recent versions of DPC++ dropped using the environment variable
SYCL_PI_LEVEL_ZERO_USE_COPY_ENGINE_FOR_IN_ORDER_QUEUE=0 we were setting.
We're now also setting SYCL_PI_LEVEL_ZERO_USE_COPY_ENGINE=0 by default
to keep a consistent behavior.
In the commonly used cycles headers, it's enough to include
much smaller <iosfwd> than the full <iostream>. While looking at it,
removed inclusion of some other headers from commonly used headers,
that seemed to not be needed.
Pull Request: https://projects.blender.org/blender/blender/pulls/111063