Commit Graph

1344 Commits

Author SHA1 Message Date
Xavier Hallade
2cfe69c07d Cycles: Fix error handling of BVH transfer to device
Previously, in case of a failure during BVH transfer, when running out
of memory for example, we could get an error such as "BVH failed to
migrate to the GPU due to Embree library error (no error)", because
embree error status was actually reset before being queried.
This commit fixes its propagation.

Pull Request: https://projects.blender.org/blender/blender/pulls/129022
2024-10-15 10:31:30 +02:00
Sergey Sharybin
6c3f3a7fb6 Fix: Proper forward declaration for friend class
Turns out it is possible to have code to pick up wrong class
when defining a friend:

```
intern\cycles\device/memory.h(255): warning C4099: 'GPUDevice': type name first seen using 'struct' now seen using 'class'
source\blender\gpu\GPU_platform.hh(69): note: see declaration of 'GPUDevice'
```

Now made it so the classes have forward declaration in the CCL
namespace, avoiding possible conflict with the classes with the
same name in the global namespace.

Pull Request: https://projects.blender.org/blender/blender/pulls/128485
2024-10-04 09:56:54 +02:00
Sergey Sharybin
95f361ac31 Fix: Cycles occasional crash after Metal render
Happens for renders from command line, when kernel specialization
thread is still working after the allocators on the Blender side
have been deinitialized.

Add an explicit deinitializaiton, which ensures all Cycles worker
and cache threads are finished before the allocators are deinitialized.

This should solve occasional crashes when running regression tests
for Metal or Metal-RT.

Pull Request: https://projects.blender.org/blender/blender/pulls/128239
2024-09-27 14:39:49 +02:00
Sergey Sharybin
b96a7b7204 Fix #127622: 4.1 splash screen won't render with MetalRT
The commit which made the issue to be more easily discoverable is 4651f8a08f.

The fix is similar to #127114.

Pull Request: https://projects.blender.org/blender/blender/pulls/128173
2024-09-26 13:39:22 +02:00
Sahar A. Kashi
26ed4d3892 Cycles: Linux Support for HIP-RT
This change switches Cycles to an opensource HIP-RT library which
implements hardware ray-tracing. This library is now used on
both Windows and Linux. While there should be no noticeable changes
on Windows, on Linux this adds support for hardware ray-tracing on
AMD GPUs.

The majority of the change is typical platform code to add new
library to the dependency builder, and a change in the way how
ahead-of-time (AoT) kernels are compiled. There are changes in
Cycles itself, but they are rather straightforward: some APIs
changed in the opensource version of the library.

There are a couple of extra files which are needed for this to
work: hiprt02003_6.1_amd.hipfb and oro_compiled_kernels.hipfb.
There are some assumptions in the HIP-RT library about how they
are available. Currently they follow the same rule as AoT
kernels for oneAPI:
- On Windows they are next to blender.exe
- On Linux they are in the lib/ folder

Performance comparison on Ubuntu 22.04.5:
```
GPU: AMD Radeon PRO W7800
Driver: amdgpu-install_6.1.60103-1_all.deb
                       main         hip-rt
attic                  0.1414s      0.0932s
barbershop_interior    0.1563s      0.1258s
bistro                 0.2134s      0.1597s
bmw27                  0.0119s      0.0099s
classroom              0.1006s      0.0803s
fishy_cat              0.0248s      0.0178s
junkshop               0.0916s      0.0713s
koro                   0.0589s      0.0720s
monster                0.0435s      0.0385s
pabellon               0.0543s      0.0391s
sponza                 0.0223s      0.0180s
spring                 0.1026s      1.5145s
victor                 0.1901s      0.1239s
wdas_cloud             0.1153s      0.1125s
```

Co-authored-by: Brecht Van Lommel <brecht@blender.org>
Co-authored-by: Ray Molenkamp <github@lazydodo.com>
Co-authored-by: Sergey Sharybin <sergey@blender.org>

Pull Request: https://projects.blender.org/blender/blender/pulls/121050
2024-09-24 14:35:24 +02:00
salipourto
17ddca5017 Fix #127240: Deforming motion blurred point clouds do not render under certain conditions
Deforming motion blurred point clouds do not render in Cycles
HIP-RT when BVH timesteps != 0 if Blender is launched with
debug memory.

The root cause is that the size of allocated memory for the
bounding boxes is reported to HIP-RT not the number of valid
bounding boxes.

Pull Request: https://projects.blender.org/blender/blender/pulls/127432
2024-09-12 16:47:13 +02:00
salipourto
4bfee1936c Fix #126749: HIP-RT Memory leak in Cycles viewport
hiprtScene object wasn't being freed between frames.

Pull Request: https://projects.blender.org/blender/blender/pulls/127473
2024-09-12 15:05:08 +02:00
Xavier Hallade
a1182e07b1 Build: upgrade Intel Graphics Compiler and ocloc on Linux
IGC 1.0.17384, ocloc 24.31.30508, which:
- add support for Battlemage and Lunar Lake GPUs
- recover from recent performance regression on Linux
- allow to drop older work-around
  (9d5164d472) and need for a patched
  version on Windows
- ocloc now needs "dg2,mtl" naming for fat binaries.

opencl-clang patches don't get applied anymore by igc build scripts
when llvm is not a git repository, hence I could also drop we can drop
current patch disabling patching.

I've only slightly pushed min-driver-version updates after carefull
testing, instead of jumping to the same version as ocloc as we use to.

Pull Request: https://projects.blender.org/blender/blender/pulls/127251
2024-09-12 09:11:56 +02:00
Xavier Hallade
56db2d393d Cycles: oneAPI: use ocloc 101.5972 on Windows
This new version of the graphics compiler solves a performance
regression on Arc, adds support for Battlemage and Lunar Lake GPUs, and
allows to drop older patch to build fat binaries with broad
compatibility.
This latter change requires using -device dg2,mtl naming instead of
passing architecture ids.

Pull Request: https://projects.blender.org/blender/blender/pulls/127371
2024-09-11 17:34:13 +02:00
salipourto
d4597e20b6 Fix #127131: Deforming motion blurred point clouds do not render in Cycles HIP-RT when BVH timesteps != 0
The device code was disabled for primitives with deformation blur
and the intersection function always returned false, hence no
rendered primitive.

Other than that, there were a few bugs on both device and host codes
(e.g., the order of current and previous times and the primitive name.)

Pull Request: https://projects.blender.org/blender/blender/pulls/127163
2024-09-06 12:27:17 +02:00
Nikita Sirgienko
94c9898f41 Fix #124811: Cycles: oneAPI: no hair strands in viewport with Embree
oneAPI kernels preloading logic was letting un-needed kernels to be
compiled without features, which would then miss when these kernels
were needed later.

Pull Request: https://projects.blender.org/blender/blender/pulls/127114
2024-09-04 11:08:00 +02:00
Alaska
8cf4d47fe2 Fix: Improve Cycles point clouds in HIPRT
Fixes a few issues with point clouds with HIPRT.
1. Crashing when building the BLAS due to an incorrect sized array.
2. A typo leading to all point cloud intersections being skipped.
3. A typo leading to some motion blurred point clouds rendering
as if they were stationary, or not rendering at all.

Pointclouds, with deformable motion blur, with BVH time steps set to >0
still do not render. Curves seem to have the same issue.

Ref #125086

Pull Request: https://projects.blender.org/blender/blender/pulls/125834
2024-09-03 16:31:41 +02:00
Sergey Sharybin
92733a9415 Fix: Cycles memory leak in HIP-RT
Some of the device memory objects had their host_pointer overwritten
with another CPU-side buffer after allocation. This leads to a leak of
host memory allocated by the device_memory.

There are few remaining places where the host_pointer is assigned and
those seems to be fine because the memory was not yet allocated with
a alloc() call.

While the approach in this change is not very ideal, it is small and
potentially could be ported to the LTS tracks. More ideal solution
would be to utilize device_vector::give_data().

Pull Request: https://projects.blender.org/blender/blender/pulls/126788
2024-08-27 12:46:54 +02:00
Patrick Mours
013a2ce765 Cycles: Change OptiX curve vertex data generation to use more compact representation
OptiX has accepted Catmull-Rom curve data natively since OptiX 7.4, but due to the previous conversion to B-Spline code, the format that data is fed to OptiX wasn't optimal.

Each curve segment was put in the vertex buffer as four independent control points, even though continuous segments actually share control points between each other. This patch compacts that so shared control points only occur once in the vertex buffer.
This compact form uses less memory and also allows OptiX to easily identify segments that belong together into a curve (those where the step between indices is one).

Pull Request: https://projects.blender.org/blender/blender/pulls/125899
2024-08-15 15:00:56 +02:00
Sergey Sharybin
ce6454d02f Fix #126005: x64 Blender on Apple Silicon doesn't render properly in Cycles GPU
The GPU packed state is a static check from the Cycles core perspective,
and it is disabled for non-Apple Silicon GPUs. However, the Metal kernel
always used packed integrator.

This change makes it so the Host and Device side checks for the Host CPU
are aligned, and that Device-side packed state check does not differ from
the Host side.

Pull Request: https://projects.blender.org/blender/blender/pulls/126082
2024-08-08 16:01:23 +02:00
Alaska
50ba7a3033 Fix build failure after recent HIP-RT change
Fixes build failure after 6b848a9993

Pull Request: https://projects.blender.org/blender/blender/pulls/125936
2024-08-06 02:41:11 +02:00
Alaska
6b848a9993 Fix: Crash with deforming motion blurred meshes in HIPRT
Fixes a crash that can occur if motion blur was on, there is a
deforming mesh in the scene with deformable motion blur turned on,
with BVH time steps set >0.

Render results in my test scene appear to match CPU Embree.

Pull Request: https://projects.blender.org/blender/blender/pulls/125854
2024-08-05 18:38:28 +02:00
Xavier Hallade
cee4ad4518 Refactor: Cycles: oneAPI: Simplify num_concurrent_states()
Deduplicated code by reusing num_concurrent_busy_states().
2024-07-18 15:46:17 +02:00
Xavier Hallade
c8421a0007 Cycles: set num_sort_partition_elements to 65536 for simd16+ Intel GPUs
Intel(R) Data Center GPU Max greatly benefits from this change since
its bigger simd width leads to a greater execution divergence.
2024-07-18 15:15:00 +02:00
Weizhen Huang
4a4270d73c Merge branch 'blender-v4.2-release' 2024-07-08 16:19:41 +02:00
Michael Jones
5a29be3c75 Cycles: Fix #116243, #122022 - MetalRT live viewport stability issues
This PR fixes live viewport stability issues on Mac when MetalRT is enabled.

There were two sources of instability:

1) `MTLAccelerationStructure` instances were not being correctly retained meaning that use-after-free crashes could occur following a geometry sync.
2) `MTLIntersectionFunctionTable` objects could be unsafely shared between multiple `MetalDeviceQueue` instances (in this case, `setBuffer` being the unsafe mutation)

The solution to 2 involves creating a new `MetalDispatchPipeline` type which is strictly used by only 1 `MetalDeviceQueue` instance.

Pull Request: https://projects.blender.org/blender/blender/pulls/124055
2024-07-08 16:18:34 +02:00
Michael Jones
3c38bff667 Cycles: Fix MetalRT motion blur setup buffer overrun
This PR fixes a buffer overrun crash in the MetalRT backend. When non-traceable objects are in the scene, 'num_motion_transforms' is undercounting and the downstream buffer writes (i.e.`motion_transforms[motion_transform_index++]`) are overrunning.

Pull Request: https://projects.blender.org/blender/blender/pulls/124351
2024-07-08 16:17:19 +02:00
Nikita Sirgienko
74c09b2e63 Cycles: oneAPI: Fix undefined behavior when embree fails initializing
Embree device pointer can end up being nullptr even when Embree on GPU is
expected to be used.  Previous implementation overlooked this possibility,
leading to a completely silent fallback to the non-Hardware ray-tracing path,
this commit fixes it.  We've noticed this as now Embree relies on a driver
component: https://github.com/intel/level-zero-raytracing-support that can
potentially be missing from a system.

Pull Request: https://projects.blender.org/blender/blender/pulls/124085
2024-07-03 14:13:01 +02:00
Xavier Hallade
4477641467 Cycles: oneAPI: Fix driver version check for future Intel GPU drivers
SYCL runtime currently relies on an internal driver behavior that will
break the driver version string returned by SYCL if it changes:
https://github.com/oneapi-src/unified-runtime/issues/1777
This will be fixed at SYCL runtime level but until we use a new enough
one, we need to add additional verifications to avoid blocking execution
on a driver that will change this internal behavior.

Pull Request: https://projects.blender.org/blender/blender/pulls/124084
2024-07-03 14:12:16 +02:00
Xavier Hallade
275b6ee008 Merge branch 'blender-v4.2-release' 2024-07-03 14:03:37 +02:00
Alaska
c8340cf754 Cycles: Remove AMD and Intel GPU support from Metal backend
This is because with the addition of new features to Cycles, these GPUs
experienced significant performance regressions and bugs, all stemming
from bugs in the Metal GPU driver/compiler. The only reasonable way to
work around these issues was to disable parts of Cycles code on
these GPUs to avoid the driver/compiler bugs.

This resulted in increased development time maintaining these platforms
while being unable to deliver feature parity with other
GPU backends.

It has been decided that this development time is better spent
maintaining platforms that are still actively maintained by
hardware/software vendors, and so AMD and Intel GPU support will be
removed from the Metal backend for Cycles.

Pull Request: https://projects.blender.org/blender/blender/pulls/123551
2024-06-26 17:16:20 +02:00
Alaska
3232458152 Fix #123763: Cycles Metal renders with MNEE stuck on some Macs
On some Macs, MNEE would be disabled in Cycles to work around a bug.
However this just led to these devices skipping over MNEE related
parts of the rendering pipeline and not properly progressing through
the render.

This commit fixes this issue by properly disabling MNEE on these devices.

Pull Request: https://projects.blender.org/blender/blender/pulls/123765
2024-06-26 14:15:01 +02:00
Lukas Stockner
4bde68cdd6 Cycles: Compress GPU kernels to reduce file size
Precompiled Cycles kernels make up a considerable fraction of the total size of
Blender builds nowadays. As we add more features and support for more
architectures, this will only continue to increase.

However, since these kernels tend to be quite compressible, we can save a lot
of storage by storing them in compressed form and decompressing the required
kernel(s) during loading.

By using Zstandard compression with a high level, we can get decent compression
ratios (~5x for the current kernels) while keeping decompression time low
(about 30ms in the worse case in my tests). And since we already require zstd
for Blender, this doesn't introduce a new dependency.

While the main improvement is to the size of the extracted Blender installation
(which is reduced by ~400-500MB currently), this also shrinks the download on
Windows, since .zip's deflate compression is less effective. It doesn't help on
Linux since we're already using .tar.xz there, but the smaller installed size
is still a good thing.

See #123522 for initial discussion.

Pull Request: https://projects.blender.org/blender/blender/pulls/123557
2024-06-23 00:52:30 +02:00
Luya Tshimbalanga
a9fe638972 Fix: Cycles runtime compile using outdated HIP parameters
This commit resolves an warning message.

Signed-off-by: Luya Tshimbalanga <luya@fedoraproject.org>

Pull Request: https://projects.blender.org/blender/blender/pulls/118401
2024-06-20 12:43:27 +02:00
Patrick Mours
56c1163c21 Fix: Cycles OptiX wrong stack size for OSL pipeline
The callables generated by OSL reference other external functions
(defined in the OSL services module), in which case OptiX cannot
calculate the right stack size just based on the callable alone, it needs to
know all functions linked together in the pipeline to get to an accurate
result. `optixProgramGroupGetStackSize` has an optional pipeline
argument for this purpose, so make use of that to ensure the correct
stack size is calculated.

Ref #122779

Pull Request: https://projects.blender.org/blender/blender/pulls/123368
2024-06-18 15:27:14 +02:00
Brecht Van Lommel
90f09f016e Fix: Incorrect call to cuCtxPopCurrent
cuDevicePrimaryCtxRetain does not push the context onto the stack,
unlike cuCtxCreate.
2024-06-13 19:41:20 +02:00
Sergey Sharybin
b803d7fabb Fix: Command line Cycles render crash on multi-CUDA device
Since #118841 there are more cases where Cycles would check for the
graphics interop support. This could lead to a crash when graphics
interop functions are called without having active graphics context.

This change makes it so there is no graphics interop calls when doing
headless render. In order to achieve this the device creation is now
aware of the headless mode.

Pull Request: https://projects.blender.org/blender/blender/pulls/122844
2024-06-07 17:53:44 +02:00
Sergey Sharybin
11d311e300 Fix: Cycles assert in device consistency check
A regression since #118841.

It is possible that the selected preference device is not found, in which
case a default-initialized DeviceInfo would have added to the list. This
device is set to CPU, but with differnet other fields (such as description)
compared to the actual CPU device.

Pull Request: https://projects.blender.org/blender/blender/pulls/122701
2024-06-04 12:49:30 +02:00
Brecht Van Lommel
a1d52ee950 Fix: Cycles CUDA runtime compilation should mark CUDA 12 as supported 2024-06-03 14:04:30 +02:00
Xavier Hallade
db8021d61a Cycles: oneAPI: explicitly enable/disable SYSMAN
ZES_ENABLE_SYSMAN is supposed to be set for free_memory queries to be
available.
These queries are then optionally used since
759bb6c768, for the host memory fallback
feature.
Setting SYCL_ENABLE_PCI was leading ZES_ENABLE_SYSMAN to be set by DPCPP
2022-12 but it's not used by newer versions of DPCPP.

We however temporarily disable SYSMAN by default on Linux as builds with
JEMALLOC enabled currently lead to driver runtime issues. These can be
worked around by using LD_PRELOAD=libigsc.so.
2024-05-30 12:16:16 +02:00
Xavier Hallade
0b3157dc93 Cleanup: Cycles: remove unused SYCL environment variable
SYCL_PI_LEVEL_ZERO_USE_COPY_ENGINE_FOR_IN_ORDER_QUEUE has been removed
from SYCL runtime years ago.
2024-05-30 11:43:26 +02:00
Nikita Sirgienko
8ee8d01711 Cycles: oneAPI: Fix Out-Of-Memory errors on some integrated GPUs 2024-05-29 21:57:13 +02:00
Campbell Barton
c5a27f011e Cleanup: spelling in comments 2024-05-29 12:49:07 +10:00
Nikita Sirgienko
759bb6c768 Cycles: oneAPI: Enable host memory migration
This enables scenes with all textures not fitting in GPU
memory to finally render. For scenes that are fitting,
no functional change or performance change is expected.

Pull Request: https://projects.blender.org/blender/blender/pulls/122385
2024-05-28 19:04:19 +02:00
Michael Jones
e82d69daa1 Cycles: Disambiguate shadow integrator state buffer names
This patch adds a "shadow" prefix & array index suffixes to the shadow integrator state buffer names. This eliminates confusion when looking at GPU traces etc.

Pull Request: https://projects.blender.org/blender/blender/pulls/121745
2024-05-15 23:19:24 +02:00
Michael Jones
5508b41a40 Cycles: MetalRT optimisations (scene_intersect_shadow + random_walk)
This PR contains optimisations and a general tidy-up of the MetalRT backend.

- Currently `scene_intersect` is used for both normal and (opaque) shadow rays, however the usage patterns are different enough to warrant specialisation. Shadow intersection tests (flagged with `PATH_RAY_SHADOW_OPAQUE`) only need a bool result, but need a larger "self" payload in order to exclude hits against target lights. By specialising we can minimise the payload size in each case (which is helps performance) and avoid some dynamic branching. This PR introduces a new `scene_intersect_shadow` function which is specialised in Metal, and currently redirects to `scene_intersect` in the other backends.

- Currently `scene_intersect_local` is implemented for worst-case payload requirements as demanded by `subsurface_disk` (where `max_hits` is 4). The random_walk case only demands 1 hit result which we can retrieve directly from the intersector object (rather than stashing it in the payload). By specialising, we significantly reduce the payload size for random_walk queries, which has a big impact on performance. Additionally, we only need to use a custom intersection function for the first ray test in a random walk (for self-primitive filtering), so this PR forces faster `opaque` intersection testing for all but the first random walk test.

- Currently `scene_intersect_volume` has a lot of redundant code to handle non-triangle primitives despite volumes only being enclosed by trimeshes. This PR removes this code.

Additionally, this PR tidies up the convoluted intersection function linking code, removes some redundant intersection handlers, and uses more consistent naming of intersection functions.

On a M3 MacBook Pro, these changes give 2-3% performance increase on typical scenes with opaque trimesh materials (e.g. barbershop, classroom junkshop), but can give over 15% performance increase for certain scenes using random walk SSS (e.g. monster).

Pull Request: https://projects.blender.org/blender/blender/pulls/121397
2024-05-10 16:38:02 +02:00
Attila Áfra
26c93c8359 Cycles: Enable OIDN 2.3 lazy device module loading
This enables the new lazy module loading behavior introduced in OIDN 2.3,
without breaking compatibility with older versions of OIDN (using separate
code paths).

Also, the detection of OIDN support for devices is now much cleaner, and
devices do not need to be matched by PCI address or device name anymore.

Pull Request: https://projects.blender.org/blender/blender/pulls/121362
2024-05-07 14:07:39 +02:00
Attila Áfra
2a0a6f18cc Cycles: Add OpenImageDenoise quality option
This adds a new "Quality" option for OIDN to switch between the existing
"High" and "Balanced" modes and the new "Fast" mode introduced in OIDN 2.3.

Pull Request: https://projects.blender.org/blender/blender/pulls/121374
2024-05-06 18:56:16 +02:00
Michael Jones
9b833fdeba Cycles: Use more accurate GPU counter timestamps for profiling in Metal
This PR replaces the existing CPU wall-clock based profiling mechanism with more precise GPU counter based timestamps. As before, it is enabled by setting the env var `CYCLES_METAL_PROFILING=1`. Original implementation by Morteza Mostajabodaveh.

Pull Request: https://projects.blender.org/blender/blender/pulls/121208
2024-04-29 15:25:32 +02:00
Alaska
1dede89eee Refactor: Allow get_apple_gpu_architecture to report non Apple GPUs
get_apple_gpu_architecture will now report if the GPU being checked
is not an Apple GPU.

At the moment this has no functional changes. But it reduces the
chances of mistakes in the future where a developer tries to enable
a feature on newer Apple GPUs using get_apple_gpu_architecture,
and accidentally enables it on unsupported AMD and Intel GPUs.

Pull Request: https://projects.blender.org/blender/blender/pulls/120448
2024-04-15 15:04:23 +02:00
Patrick Mours
33d7fa8cb3 Fix #119959: Enabling "Distribute memory between devices" for Cycles results in error
With the switch to using the primary CUDA context it became possible
for peer access between CUDA devices to already have been enabled for
that context, either by a previous Cycles session or third-party library,
thus causing the call to `cuCtxEnablePeerAccess` to return
`CUDA_ERROR_PEER_ACCESS_ALREADY_ENABLED`. This is not a failure
state however, so just needs to be handled like a success return value.

Pull Request: https://projects.blender.org/blender/blender/pulls/120255
2024-04-15 12:17:32 +02:00
Alaska
eff4fe24cf Cycles: Properly default to Metal-RT off unless GPU is a M3 or newer
Ever since commit [1], `use_metalrt_by_default` will be True
if the GPU being used is not a M1 or M2 based system.
The intention of this was to enable MetalRT by default for
M3 and newer devices that have hardware for ray traversal.

However the side effect of this change was that all AMD GPUs would
have `use_metalrt_by_default` set to True. Which appears to be the
main culprit causing crashes on older AMD GPUs in #120126.
Since these GPUs don't support MetalRT.

This commit fixes this issue by only setting
`use_metalrt_by_default` to True if the GPU is not M1 or M2 based,
and the GPU is Apple Silicon based. Which equates to M3 or newer.
Which is the original intent of this code.

This resolves the issue where AMD GPUs were being told to use MetalRT
by default, when they shouldn't be.

[1] 322a2f7b12

Pull Request: https://projects.blender.org/blender/blender/pulls/120299
2024-04-09 16:19:24 +02:00
Xavier Hallade
cbc7962a73 Cycles: Tune kernel sizes for oneAPI device
This brings a 1-3% performance improvement depending on the scenes, on
the Arc A770.
2024-04-04 16:04:13 +02:00
Brecht Van Lommel
bd1f4343c3 Build: Improve OSL library dependency handling in Cycles
Might fix some missing symbols when the OSL library gets updated.

Pull Request: https://projects.blender.org/blender/blender/pulls/119391
2024-03-29 15:24:30 +01:00
Brecht Van Lommel
53e9fb6b78 Fix #117566: Cycles persistent data not updated by device preferences
Pull Request: https://projects.blender.org/blender/blender/pulls/119970
2024-03-27 18:55:46 +01:00