This PR fixes live viewport stability issues on Mac when MetalRT is enabled.
There were two sources of instability:
1) `MTLAccelerationStructure` instances were not being correctly retained meaning that use-after-free crashes could occur following a geometry sync.
2) `MTLIntersectionFunctionTable` objects could be unsafely shared between multiple `MetalDeviceQueue` instances (in this case, `setBuffer` being the unsafe mutation)
The solution to 2 involves creating a new `MetalDispatchPipeline` type which is strictly used by only 1 `MetalDeviceQueue` instance.
Pull Request: https://projects.blender.org/blender/blender/pulls/124055
This PR fixes a buffer overrun crash in the MetalRT backend. When non-traceable objects are in the scene, 'num_motion_transforms' is undercounting and the downstream buffer writes (i.e.`motion_transforms[motion_transform_index++]`) are overrunning.
Pull Request: https://projects.blender.org/blender/blender/pulls/124351
Embree device pointer can end up being nullptr even when Embree on GPU is
expected to be used. Previous implementation overlooked this possibility,
leading to a completely silent fallback to the non-Hardware ray-tracing path,
this commit fixes it. We've noticed this as now Embree relies on a driver
component: https://github.com/intel/level-zero-raytracing-support that can
potentially be missing from a system.
Pull Request: https://projects.blender.org/blender/blender/pulls/124085
SYCL runtime currently relies on an internal driver behavior that will
break the driver version string returned by SYCL if it changes:
https://github.com/oneapi-src/unified-runtime/issues/1777
This will be fixed at SYCL runtime level but until we use a new enough
one, we need to add additional verifications to avoid blocking execution
on a driver that will change this internal behavior.
Pull Request: https://projects.blender.org/blender/blender/pulls/124084
This is because with the addition of new features to Cycles, these GPUs
experienced significant performance regressions and bugs, all stemming
from bugs in the Metal GPU driver/compiler. The only reasonable way to
work around these issues was to disable parts of Cycles code on
these GPUs to avoid the driver/compiler bugs.
This resulted in increased development time maintaining these platforms
while being unable to deliver feature parity with other
GPU backends.
It has been decided that this development time is better spent
maintaining platforms that are still actively maintained by
hardware/software vendors, and so AMD and Intel GPU support will be
removed from the Metal backend for Cycles.
Pull Request: https://projects.blender.org/blender/blender/pulls/123551
On some Macs, MNEE would be disabled in Cycles to work around a bug.
However this just led to these devices skipping over MNEE related
parts of the rendering pipeline and not properly progressing through
the render.
This commit fixes this issue by properly disabling MNEE on these devices.
Pull Request: https://projects.blender.org/blender/blender/pulls/123765
Precompiled Cycles kernels make up a considerable fraction of the total size of
Blender builds nowadays. As we add more features and support for more
architectures, this will only continue to increase.
However, since these kernels tend to be quite compressible, we can save a lot
of storage by storing them in compressed form and decompressing the required
kernel(s) during loading.
By using Zstandard compression with a high level, we can get decent compression
ratios (~5x for the current kernels) while keeping decompression time low
(about 30ms in the worse case in my tests). And since we already require zstd
for Blender, this doesn't introduce a new dependency.
While the main improvement is to the size of the extracted Blender installation
(which is reduced by ~400-500MB currently), this also shrinks the download on
Windows, since .zip's deflate compression is less effective. It doesn't help on
Linux since we're already using .tar.xz there, but the smaller installed size
is still a good thing.
See #123522 for initial discussion.
Pull Request: https://projects.blender.org/blender/blender/pulls/123557
The callables generated by OSL reference other external functions
(defined in the OSL services module), in which case OptiX cannot
calculate the right stack size just based on the callable alone, it needs to
know all functions linked together in the pipeline to get to an accurate
result. `optixProgramGroupGetStackSize` has an optional pipeline
argument for this purpose, so make use of that to ensure the correct
stack size is calculated.
Ref #122779
Pull Request: https://projects.blender.org/blender/blender/pulls/123368
Since #118841 there are more cases where Cycles would check for the
graphics interop support. This could lead to a crash when graphics
interop functions are called without having active graphics context.
This change makes it so there is no graphics interop calls when doing
headless render. In order to achieve this the device creation is now
aware of the headless mode.
Pull Request: https://projects.blender.org/blender/blender/pulls/122844
A regression since #118841.
It is possible that the selected preference device is not found, in which
case a default-initialized DeviceInfo would have added to the list. This
device is set to CPU, but with differnet other fields (such as description)
compared to the actual CPU device.
Pull Request: https://projects.blender.org/blender/blender/pulls/122701
ZES_ENABLE_SYSMAN is supposed to be set for free_memory queries to be
available.
These queries are then optionally used since
759bb6c768, for the host memory fallback
feature.
Setting SYCL_ENABLE_PCI was leading ZES_ENABLE_SYSMAN to be set by DPCPP
2022-12 but it's not used by newer versions of DPCPP.
We however temporarily disable SYSMAN by default on Linux as builds with
JEMALLOC enabled currently lead to driver runtime issues. These can be
worked around by using LD_PRELOAD=libigsc.so.
This enables scenes with all textures not fitting in GPU
memory to finally render. For scenes that are fitting,
no functional change or performance change is expected.
Pull Request: https://projects.blender.org/blender/blender/pulls/122385
This patch adds a "shadow" prefix & array index suffixes to the shadow integrator state buffer names. This eliminates confusion when looking at GPU traces etc.
Pull Request: https://projects.blender.org/blender/blender/pulls/121745
This PR contains optimisations and a general tidy-up of the MetalRT backend.
- Currently `scene_intersect` is used for both normal and (opaque) shadow rays, however the usage patterns are different enough to warrant specialisation. Shadow intersection tests (flagged with `PATH_RAY_SHADOW_OPAQUE`) only need a bool result, but need a larger "self" payload in order to exclude hits against target lights. By specialising we can minimise the payload size in each case (which is helps performance) and avoid some dynamic branching. This PR introduces a new `scene_intersect_shadow` function which is specialised in Metal, and currently redirects to `scene_intersect` in the other backends.
- Currently `scene_intersect_local` is implemented for worst-case payload requirements as demanded by `subsurface_disk` (where `max_hits` is 4). The random_walk case only demands 1 hit result which we can retrieve directly from the intersector object (rather than stashing it in the payload). By specialising, we significantly reduce the payload size for random_walk queries, which has a big impact on performance. Additionally, we only need to use a custom intersection function for the first ray test in a random walk (for self-primitive filtering), so this PR forces faster `opaque` intersection testing for all but the first random walk test.
- Currently `scene_intersect_volume` has a lot of redundant code to handle non-triangle primitives despite volumes only being enclosed by trimeshes. This PR removes this code.
Additionally, this PR tidies up the convoluted intersection function linking code, removes some redundant intersection handlers, and uses more consistent naming of intersection functions.
On a M3 MacBook Pro, these changes give 2-3% performance increase on typical scenes with opaque trimesh materials (e.g. barbershop, classroom junkshop), but can give over 15% performance increase for certain scenes using random walk SSS (e.g. monster).
Pull Request: https://projects.blender.org/blender/blender/pulls/121397
This enables the new lazy module loading behavior introduced in OIDN 2.3,
without breaking compatibility with older versions of OIDN (using separate
code paths).
Also, the detection of OIDN support for devices is now much cleaner, and
devices do not need to be matched by PCI address or device name anymore.
Pull Request: https://projects.blender.org/blender/blender/pulls/121362
This PR replaces the existing CPU wall-clock based profiling mechanism with more precise GPU counter based timestamps. As before, it is enabled by setting the env var `CYCLES_METAL_PROFILING=1`. Original implementation by Morteza Mostajabodaveh.
Pull Request: https://projects.blender.org/blender/blender/pulls/121208
get_apple_gpu_architecture will now report if the GPU being checked
is not an Apple GPU.
At the moment this has no functional changes. But it reduces the
chances of mistakes in the future where a developer tries to enable
a feature on newer Apple GPUs using get_apple_gpu_architecture,
and accidentally enables it on unsupported AMD and Intel GPUs.
Pull Request: https://projects.blender.org/blender/blender/pulls/120448
With the switch to using the primary CUDA context it became possible
for peer access between CUDA devices to already have been enabled for
that context, either by a previous Cycles session or third-party library,
thus causing the call to `cuCtxEnablePeerAccess` to return
`CUDA_ERROR_PEER_ACCESS_ALREADY_ENABLED`. This is not a failure
state however, so just needs to be handled like a success return value.
Pull Request: https://projects.blender.org/blender/blender/pulls/120255
Ever since commit [1], `use_metalrt_by_default` will be True
if the GPU being used is not a M1 or M2 based system.
The intention of this was to enable MetalRT by default for
M3 and newer devices that have hardware for ray traversal.
However the side effect of this change was that all AMD GPUs would
have `use_metalrt_by_default` set to True. Which appears to be the
main culprit causing crashes on older AMD GPUs in #120126.
Since these GPUs don't support MetalRT.
This commit fixes this issue by only setting
`use_metalrt_by_default` to True if the GPU is not M1 or M2 based,
and the GPU is Apple Silicon based. Which equates to M3 or newer.
Which is the original intent of this code.
This resolves the issue where AMD GPUs were being told to use MetalRT
by default, when they shouldn't be.
[1] 322a2f7b12
Pull Request: https://projects.blender.org/blender/blender/pulls/120299
Running a very simple files when Blender is built with the
WITH_COMPILER_ASAN=ON and WITH_CYCLES_KERNEL_ASAN=ON CMake options
leads to ASAN reporting an unknown-crash at line where the worker
pool is being filled in.
It is not entirely clear if it is a real issue in the code, since
placing debug prints with `this` address report proper addresses,
however there is no harm on capturing `this` pointer by value and
it does solve the ASAN reporting issues.
It is possible to reproduce the ASAN crash with the following steps:
- Start with --factory-startup
- Enable Metal device in User Preferences
- Switch render device to GPU Compute
- Switch viewport more to Rendered
Pull Request: https://projects.blender.org/blender/blender/pulls/119867
In older drivers with an integrated GPU, this may crash. This not only
affects HIP, but also can crash when using Cycles with an NVIDIA or
Intel GPU in combination with an AMD CPU.
Fixes for this are expected to be coming, but there will not be enough
time for user testing, and it is difficult to be certain that the fix
is complete.
So to be careful, this is postponed until it has had more testing.
Pull Request: https://projects.blender.org/blender/blender/pulls/119476
Previously the CUDA context was always destroyed and the module along
with it. Now that this no longer happens, the missing module free became
a memory leak.
Also fix the same issue for HIP, though this is destroying the context
so it's not a problem yet.
Fix part of #119035
Co-authored-by: Brecht Van Lommel <brecht@blender.org>
Global built-ins appear to not work on AMD cards.
Also add a tweak to avoid a performance regression, similar
to what was done before. Disable adaptive subdivision kernel
code if not used.
Pull Request: https://projects.blender.org/blender/blender/pulls/119175