1520 Commits

Author SHA1 Message Date
Campbell Barton
cccc2c77c5 Cleanup: consistent for C-style comment blocks 2025-08-08 07:37:33 +10:00
Michael Jones
50363918c7 Cycles: Stop Metal API validation asserts
Dynamic enqueue arguments weren't padded out to struct alignment causing API validation to assert.
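
A minimal sketch of padding an argument blob out to struct alignment. The helper names `align_up` and `pad_args` are hypothetical and are not taken from the Cycles source; the sketch only illustrates the rounding described above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Round `size` up to the next multiple of `alignment`.
// `alignment` must be a power of two. Hypothetical helper, not a Cycles function.
inline std::size_t align_up(std::size_t size, std::size_t alignment)
{
  assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
  return (size + alignment - 1) & ~(alignment - 1);
}

// Copy `size` bytes of arguments into a zero-filled buffer whose length is
// padded out to the struct alignment, so the API sees a full-sized struct.
inline std::vector<unsigned char> pad_args(const void *args,
                                           std::size_t size,
                                           std::size_t alignment)
{
  std::vector<unsigned char> padded(align_up(size, alignment), 0);
  if (size > 0) {
    std::memcpy(padded.data(), args, size);
  }
  return padded;
}
```

With a 16-byte alignment, 20 bytes of arguments would be submitted as a 32-byte buffer under this scheme.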

Pull Request: https://projects.blender.org/blender/blender/pulls/143991
2025-08-05 14:45:14 +02:00
Campbell Barton
2c27d2be54 Cleanup: grammar corrections, minor improvements to wording 2025-08-01 21:41:24 +10:00
Patrick Mours
6487395fa5 Cycles: Add linear curve shape
Add new "Linear 3D Curves" option in the Curves panel in the render
properties. This renders curves as linear segments rather than smooth
curves, for faster render time at the cost of accuracy.

On NVIDIA Blackwell GPUs, this can give a 6x speedup compared to smooth
curves, due to hardware acceleration. On NVIDIA Ada there is still
a 3x speedup, and CPU and other GPU backends will also render this
faster.

A difference with smooth curves is that these have end caps, as this
was simpler to implement and they are usually helpful anyway.

In the future this functionality will also be used to properly support
the CURVE_TYPE_POLY on the new curves object.

Pull Request: https://projects.blender.org/blender/blender/pulls/139735
2025-07-29 17:05:01 +02:00
Stefan Werner
c81e1d95c1 Cycles: Fixed typo in my last commit 2025-07-29 10:53:13 +02:00
Stefan Werner
e7312b1ad5 Cycles: Explicitly setting SYCL device for Embree
This fixes issues when using Embree on multiple GPUs.
A previous workaround used separate contexts; this one
lets us keep a single context for all GPUs.

Pull Request: https://projects.blender.org/blender/blender/pulls/143089
2025-07-29 10:40:28 +02:00
Michael Jones
6f1c63597d Cycles: Disable lossless MTLTexture compression & render up to 2% faster
Disallow lossless texture compression in MetalDevice. Path-tracing texture access patterns are very random, and cache reuse gains are typically too low to offset the decompression overheads. This change doesn't increase memory usage for any of the benchmark scenes (https://projects.blender.org/blender/blender-benchmarks/src/branch/main/cycles) as most textures are high entropy and don't compress well using lossless methods.

Pull Request: https://projects.blender.org/blender/blender/pulls/143074
2025-07-25 17:29:27 +02:00
Michael Jones
f3485cc925 Cycles: MetalRT: Only use extended limits if needed (revisited)
Currently MetalRT renders always use extended limits, which is needed to correctly render scenes where the max primitive count can exceed 2^28 or the instance count can exceed 2^24. This patch adopts Metal best practices of only enabling this flag if it is needed.
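
The threshold check described above could look roughly like the following. This is only a sketch: `BVHLimitsSketch` and `needs_extended_limits` are hypothetical names, and only the two thresholds (2^28 primitives, 2^24 instances) come from the commit message.

```cpp
#include <cassert>
#include <cstdint>

// Thresholds quoted in the commit message.
constexpr std::uint64_t kMaxPrimitivesWithoutExtendedLimits = std::uint64_t(1) << 28;
constexpr std::uint64_t kMaxInstancesWithoutExtendedLimits = std::uint64_t(1) << 24;

// Hypothetical summary of a scene scan.
struct BVHLimitsSketch {
  std::uint64_t max_primitive_count = 0;  // largest primitive count in any BVH
  std::uint64_t instance_count = 0;       // number of instances in the scene
};

// Extended limits are requested only when a threshold is exceeded.
inline bool needs_extended_limits(const BVHLimitsSketch &limits)
{
  return limits.max_primitive_count > kMaxPrimitivesWithoutExtendedLimits ||
         limits.instance_count > kMaxInstancesWithoutExtendedLimits;
}
```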

This PR is similar to #133364, but there are some notable differences:

1) The old PR made an overly optimistic assumption that all the relevant visibility bits could be squeezed into 8 bits. This new PR adopts the same approach that OptiX takes of using 8 bits as a primary HW filter, and checking the full 32-bit mask inside the SW intersection handler.

~~2) I moved the scene scanning check from Scene into MetalDevice. This avoids platform specific details leaking into platform agnostic areas.~~

~~3) In live viewport mode, we always use extended limits in case we tip over the threshold.~~

_EDIT:_
2) The limits are scanned in `Scene::update_kernel_features`, and given to the device by a new `set_bvh_limits` method which returns true if the BVH and kernels need to be reloaded.
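
The two-stage visibility test from point 1 can be sketched as below. The folding scheme (OR-ing the four bytes of the mask) is an assumption chosen because it is conservative; the commit message does not say how the 8-bit mask is derived, and all function names here are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Fold a 32-bit visibility mask into 8 bits by OR-ing its four bytes.
// Conservative: if two 32-bit masks share a bit, their folded masks do too.
inline std::uint8_t fold_mask(std::uint32_t mask)
{
  return static_cast<std::uint8_t>((mask | (mask >> 8) | (mask >> 16) | (mask >> 24)) & 0xFFu);
}

// Stage 1: coarse 8-bit filter, standing in for the hardware mask test.
inline bool hw_filter_pass(std::uint32_t ray_mask, std::uint32_t object_mask)
{
  return (fold_mask(ray_mask) & fold_mask(object_mask)) != 0;
}

// Stage 2: exact 32-bit test, standing in for the software intersection handler.
inline bool sw_filter_pass(std::uint32_t ray_mask, std::uint32_t object_mask)
{
  return (ray_mask & object_mask) != 0;
}

// A hit is accepted only if it survives both stages.
inline bool is_visible(std::uint32_t ray_mask, std::uint32_t object_mask)
{
  return hw_filter_pass(ray_mask, object_mask) && sw_filter_pass(ray_mask, object_mask);
}
```

The coarse filter may let through false positives (for example masks `0x100` and `0x10000` both fold to `0x01`), which is why the exact check is still needed in the second stage.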

Pull Request: https://projects.blender.org/blender/blender/pulls/142401
2025-07-24 13:27:20 +02:00
Thomas Dinges
ce0ae95ed3 Cycles: Bump minimum supported CUDA architecture to sm_50
Pull Request: https://projects.blender.org/blender/blender/pulls/142212
2025-07-21 19:49:21 +02:00
Michael Jones
8077384e3a Cycles: Improve Metal kernel specialisation
This improves the existing scene specialisation mechanism by replacing "kernel_data.kernel_features" with a function constant. It doesn't cause any additional compilation requests, but allows the backend compiler to eliminate more dead code. An additional compiler hint is provided for dead-stripping "volume_stack_enter_exit" which results in slightly faster rendering of non-volumetric scenes.

Pull Request: https://projects.blender.org/blender/blender/pulls/142235
2025-07-18 11:18:43 +02:00
Brecht Van Lommel
df6d6c0932 Refactor: Cycles: Use logging system for GPU error print
Pull Request: https://projects.blender.org/blender/blender/pulls/142257
2025-07-17 21:14:30 +02:00
Michael Jones
9d9d0a7259 Cycles: MTLAccelerationStructureUsagePreferFastIntersection on macOS>=26
macOS 26 introduces a new BVH usage hint: [MTLAccelerationStructureUsagePreferFastIntersection](https://developer.apple.com/documentation/metal/mtlaccelerationstructureusage/preferfastintersection?changes=_3&language=objc)

This will only be compiled if built with Xcode >= 26.

Pull Request: https://projects.blender.org/blender/blender/pulls/141891
2025-07-14 16:59:47 +02:00
Hans Goudey
c3181490f3 Cleanup: Formatting 2025-07-14 10:22:46 -04:00
Nikita Sirgienko
609f8ddbef Cycles: oneAPI: Fix DPC++ level issues for multi GPU execution
These changes introduce modifications to the SYCL queue creation
in OneapiDevice::create_queue. In case several DPC++ devices are
detected by Blender and exposed through it, we are now creating
a new SYCL context for each device, which allows us to prevent
execution failures due to some known issues in the DPC++ runtime
regarding multi GPU support. As this has a small performance
impact of a few percent, it is only applied to
multi GPU configurations, while the behavior for a single
GPU configuration remains the same.

Pull Request: https://projects.blender.org/blender/blender/pulls/141834
2025-07-14 14:33:42 +02:00
Brecht Van Lommel
73fe848e07 Fix: Cycles log levels conflict with macros on some platforms
In particular DEBUG, but prefix all of them to be sure.
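
A small illustration of the kind of conflict being avoided. The `LOG_LEVEL_` prefix and the `log_is_enabled` helper are hypothetical choices for this sketch and may not match the names used in Cycles.

```cpp
#include <cassert>

// Some platform headers define common words as macros. Simulate one here.
#define DEBUG 1

// An unprefixed enumerator would now break:
//   enum LogLevel { INFO, DEBUG };   // expands to `INFO, 1` and fails to compile

// Prefixed names are distinct identifiers, so the macro does not touch them.
enum LogLevel {
  LOG_LEVEL_ERROR = 0,
  LOG_LEVEL_WARNING,
  LOG_LEVEL_INFO,
  LOG_LEVEL_DEBUG,
};

// A message is emitted when its level is at or below the active threshold.
inline bool log_is_enabled(LogLevel threshold, LogLevel message)
{
  return message <= threshold;
}
```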

Pull Request: https://projects.blender.org/blender/blender/pulls/141749
2025-07-10 19:44:14 +02:00
Miguel Pozo
b5ca00a403 Merge branch 'blender-v4.5-release' 2025-07-10 18:00:04 +02:00
Xavier Hallade
94e9203713 Fix previous 4.5 merge 2025-07-10 17:47:03 +02:00
Xavier Hallade
48f89ff1c3 Merge branch 'blender-v4.5-release' 2025-07-10 17:43:30 +02:00
Michael Jones
7ec0adf033 Fix: Cycles MetalRT motion blur crash in some scenes with static objects
Crash encountered during top-level BVH setup of an Agent 327 asset. Object had no keyframes so `decomp` was empty. Use the object's transform instead.
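
The fallback can be sketched as follows, under simplifying assumptions: `TransformSketch`, `ObjectSketch` and `motion_key_or_static` are hypothetical, and the real `decomp` array in Cycles holds decomposed transforms rather than plain matrices.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Illustrative 3x4 matrix stored as 12 floats.
using TransformSketch = std::array<float, 12>;

struct ObjectSketch {
  TransformSketch tfm{};                // static object transform
  std::vector<TransformSketch> decomp;  // per-step motion keys, empty for static objects
};

// Return the motion key for `step`, or the static transform when the object
// has no keyframes. Indexing an empty `decomp` is what this guard avoids.
inline const TransformSketch &motion_key_or_static(const ObjectSketch &object, std::size_t step)
{
  if (object.decomp.empty()) {
    return object.tfm;
  }
  return object.decomp[std::min(step, object.decomp.size() - 1)];
}
```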

Pull Request: https://projects.blender.org/blender/blender/pulls/141740
2025-07-10 17:42:49 +02:00
Xavier Hallade
05f27f594e Fix #141661: Crash when selecting oneAPI in preferences with legacy drivers
On systems with multiple Intel GPUs with a mix of recent and old
unsupported drivers (such as 101.3302), the Level-Zero stack may have
trouble initializing, leading to a crash while enumerating devices.

Luckily this condition actually leads to an exception we can catch,
as implemented here in this commit.
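
The general pattern is sketched below, with the enumeration step passed in as a callback so the example stays self-contained. `enumerate_devices_safe` is a hypothetical name, and the real code calls into the SYCL runtime rather than a `std::function`.

```cpp
#include <cassert>
#include <exception>
#include <functional>
#include <iostream>
#include <stdexcept>
#include <string>
#include <vector>

// Run a device enumeration callback and treat any exception as
// "no usable devices" instead of letting it terminate the program.
inline std::vector<std::string> enumerate_devices_safe(
    const std::function<std::vector<std::string>()> &enumerate)
{
  try {
    return enumerate();
  }
  catch (const std::exception &e) {
    std::cerr << "Device enumeration failed: " << e.what() << std::endl;
    return {};
  }
}
```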

Pull Request: https://projects.blender.org/blender/blender/pulls/141674
2025-07-10 17:36:00 +02:00
Brecht Van Lommel
4c25b49875 Refactor: Cycles: Deduplicate 3D texture sampling between devices
Pull Request: https://projects.blender.org/blender/blender/pulls/132908
2025-07-09 21:04:38 +02:00
Brecht Van Lommel
b6c4233b28 Refactor: Cycles: Remove now unused 3D image texture support
Pull Request: https://projects.blender.org/blender/blender/pulls/132908
2025-07-09 21:04:38 +02:00
Brecht Van Lommel
7978799e6f Cycles: Always render volume as NanoVDB
All GPU backends now support NanoVDB, using our own kernel side code
that is easily portable. This simplifies kernel and device code.

Volume bounds are now built from the NanoVDB grid instead of OpenVDB,
to avoid having to keep around the OpenVDB grid after loading.

While this reduces memory usage, it does have a performance impact,
particularly for the Cubic filter. That will be addressed by
another commit.

Pull Request: https://projects.blender.org/blender/blender/pulls/132908
2025-07-09 21:04:38 +02:00
Brecht Van Lommel
8cf031ba95 Fix: Wrong Cycles NanoVDB memory alignment on Windows
This was not a problem in practice so far, but will be with upcoming changes.

Pull Request: https://projects.blender.org/blender/blender/pulls/132908
2025-07-09 20:59:27 +02:00
Brecht Van Lommel
cf36acbc0c Refactor: Cycles: Replace remaining fprintf with logging
Pull Request: https://projects.blender.org/blender/blender/pulls/140244
2025-07-09 20:59:25 +02:00
Brecht Van Lommel
fb4e3c8167 Refactor: Cycles: Remove distinction between severity and verbosity
Only use LOG() and LOG_IS_ON() macros, no more VLOG_.

Pull Request: https://projects.blender.org/blender/blender/pulls/140244
2025-07-09 20:59:24 +02:00
Brecht Van Lommel
370ef854c0 Refactor: Cycles: Avoid unnecessary newlines in Metal logs
Pull Request: https://projects.blender.org/blender/blender/pulls/140244
2025-07-09 20:59:24 +02:00
Brecht Van Lommel
cf7f276d49 Refactor: Cycles: Tweak logging to prepare for dropping glog
* Implement own simple ScopedMockLog
* Always use names instead of numbers
* Avoid logging in header files

Pull Request: https://projects.blender.org/blender/blender/pulls/140244
2025-07-09 20:59:24 +02:00
Brecht Van Lommel
50a9472604 Fix: Cycles Metal issue rendering with multiple NanoVDB grids
After recent changes in b4be954856, pointer was written at the wrong offset.
2025-07-09 20:59:23 +02:00
Michael Jones
b4be954856 Cycles: Simplify Metal backend with direct bindless resource encoding
This PR is a more extensive follow on from #123551 (removal of AMD and Intel GPU support).

All supported Apple GPUs have Metal 3 and tier 2 argument buffer support. The invariant resource properties `gpuAddress` and `gpuResourceID` can be written directly into GPU structs once at setup time rather than once per dispatch. More background info can be found in [this article](https://developer.apple.com/documentation/metal/improving-cpu-performance-by-using-argument-buffers?language=objc).

Code changes:
- All code relating to `MTLArgumentEncoder` is removed
- `KernelParamsMetal` updates are directly written into `id<MTLBuffer> launch_params_buffer` which is used for the "static" dispatch arguments
- Dynamic dispatch arguments are small enough to be encoded using the `MTLComputeCommandEncoder.setBytes` function, eliminating the need for cycling temporary arg buffers

Pull Request: https://projects.blender.org/blender/blender/pulls/140671
2025-07-08 23:20:16 +02:00
Alaska
a6f92974ac Cycles: Allow Metal to print GPU Queue stats
When running Blender with `--debug-cycles` and the right
verbosity level, Cycles can output "GPU Queue Stats" to the terminal
at the end of rendering detailing how much time was spent in each kernel.

The Metal GPU backend did not support this specific way of gathering the
information. This commit fixes that by adding support for it to the Metal
GPU backend.

Note: This kind of information could already be gathered for the Metal
GPU backend using the `CYCLES_METAL_PROFILING` environment variable,
and this is still the recommended way of gathering that information for
Metal. This change is just to add some consistency between platforms.

Pull Request: https://projects.blender.org/blender/blender/pulls/137592
2025-07-07 16:16:37 +02:00
Sergey Sharybin
9ace788faf Merge branch 'blender-v4.5-release' 2025-07-02 10:42:01 +02:00
Michael Jones
681eed7e4d Fix #135659: Some types of motion are incorrect at low step counts with MetalRT
Following #136253, this PR enables decomposed MetalRT motion
interpolation on macOS 15.6. The bounding box issue is fixed
in the latest macOS 15.6 beta (24G5054d).

Pull Request: https://projects.blender.org/blender/blender/pulls/141207
2025-07-02 10:41:42 +02:00
Thomas Dinges
61d51a5643 Cleanup: Make format 2025-07-01 15:14:37 +02:00
Thomas Dinges
b6b90d6835 Cleanup: make format 2025-07-01 15:12:26 +02:00
Sergey Sharybin
e01ca1fdae Merge branch 'blender-v4.5-release' 2025-07-01 14:25:40 +02:00
Michael Jones
03183c3328 Fix #135194: Deleting the last object in a scene leaves it visible in the viewport with MetalRT
TLAS wasn't being refreshed when empty.

This PR removes a spurious early-exit during BVH build that was preventing
the TLAS from being recreated when it was empty.
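
A toy model of the bug and its fix, not the actual Metal code: `TopLevelSketch` and `build_tlas` are hypothetical, and instances are represented as plain integers.

```cpp
#include <cassert>
#include <vector>

// Toy stand-in for a top-level acceleration structure.
struct TopLevelSketch {
  std::vector<int> instances;
  int build_count = 0;
};

// Rebuild the top level from the current object list.
// The buggy version would have started with:
//   if (objects.empty()) return;   // leaves the stale instances in place
// so the last deleted object stayed visible. Here an empty list is built too.
inline void build_tlas(TopLevelSketch &tlas, const std::vector<int> &objects)
{
  tlas.instances = objects;
  ++tlas.build_count;
}
```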

Pull Request: https://projects.blender.org/blender/blender/pulls/141215
2025-07-01 14:25:09 +02:00
Xavier Hallade
6e4c82e804 Merge branch 'blender-v4.5-release' 2025-06-30 16:40:46 +02:00
Xavier Hallade
7691e6520b Fix #141171: oneAPI: Rendering artifacts in barbershop scene
max_shaders was not updated when Embree was disabled.

Pull Request: https://projects.blender.org/blender/blender/pulls/141175
2025-06-30 16:39:53 +02:00
Brecht Van Lommel
78ab68431c Merge branch 'blender-v4.5-release' 2025-06-19 15:22:10 +02:00
Brecht Van Lommel
39e7c2444e Fix #139614: Cycles CUDA + Vulkan interop fails with unknown error on GTX 970
The cause of the error is unknown, but instead of failing only print an error
and continue without graphics interop.

Pull Request: https://projects.blender.org/blender/blender/pulls/140657
2025-06-19 15:21:11 +02:00
Brecht Van Lommel
975322dde5 Merge branch 'blender-v4.5-release' 2025-06-18 19:28:50 +02:00
Brecht Van Lommel
a7f9ad5af6 Fix #140527: Cycles CUDA + Vulkan animation render memory leak
Missing call to free memory for graphics interop.

Pull Request: https://projects.blender.org/blender/blender/pulls/140612
2025-06-18 19:27:44 +02:00
Brecht Van Lommel
7e93c5b387 Merge branch 'blender-v4.5-release' 2025-06-18 16:02:35 +02:00
Alaska
353789c559 Fix: Cycles distributed memory toggle could appear on unsupported configurations
The distributed memory access toggle in Cycles preferences would show up
when a user has two GPUs that can access each other's memory, but only one
of them is supported by Cycles.

For example the AMD RX 5700XT and AMD Vega 64 can access each other's
memory, but only the 5700XT is supported by Cycles.
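
One way to express the corrected condition is sketched below. This is a simplification: `GpuInfoSketch` and `show_distributed_memory_toggle` are hypothetical, and peer access is modeled as a single flag per device rather than a pairwise relation.

```cpp
#include <cassert>
#include <vector>

// Hypothetical per-device summary.
struct GpuInfoSketch {
  bool supported_by_cycles = false;
  bool has_peer_memory_access = false;
};

// Show the toggle only if at least two devices are both supported and
// capable of peer memory access. Unsupported devices do not count.
inline bool show_distributed_memory_toggle(const std::vector<GpuInfoSketch> &gpus)
{
  int eligible = 0;
  for (const GpuInfoSketch &gpu : gpus) {
    if (gpu.supported_by_cycles && gpu.has_peer_memory_access) {
      ++eligible;
    }
  }
  return eligible >= 2;
}
```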

Pull Request: https://projects.blender.org/blender/blender/pulls/140521
2025-06-18 16:02:06 +02:00
Xavier Hallade
588b9ff3cd Merge branch 'blender-v4.5-release' 2025-06-18 08:22:04 +02:00
Xavier Hallade
2df163a648 Fix: Cycles low performance with scenes with many shaders on Arc B570
The performance of the sorted_paths_array kernel on B570 is problematic.
Relying on local sorting+partitioning instead gives a 25% overall rendering
speedup and no regression in shade_surface when rendering Agent 327 Barbershop scene.
On Arc A770, it still gives a 2% speedup when rendering Barbershop.

Pull Request: https://projects.blender.org/blender/blender/pulls/140308
2025-06-18 08:21:19 +02:00
Brecht Van Lommel
b10b2d509c Merge branch 'blender-v4.5-release' 2025-06-16 18:03:22 +02:00
Brecht Van Lommel
e84fad92ea Fix #139986: Cycles crash on some scene updates, after Embree upgrade
Device::const_copy_to is sometimes called when the Embree BVH has been freed
and not replaced yet. Previously this was a simpler pointer copy, now there is
a function call. Make sure it's just a function copy.

Thanks to Nikita Sirgienko for figuring this out.

Pull Request: https://projects.blender.org/blender/blender/pulls/140457
2025-06-16 17:59:57 +02:00
Brecht Van Lommel
f7ffcfe652 Cleanup: Cycles: Use default initializers in oneAPI device
Ref #140457
2025-06-16 17:59:50 +02:00