284 Commits

Author SHA1 Message Date
Bastien Montagne
dd98cede18 Merge branch 'blender-v4.4-release' 2025-03-14 18:20:26 +01:00
Sahar A. Kashi
9ad3b74867 Fix: SSS and Motion Blur or Curves not working on HIP-RT
This change fixes the remaining failing tests with SSS when using HIP-RT.
This includes a crash when SSS is used on curves, and objects with motion
blur and SSS rendering black.

The root cause for both cases was the fact that traversal was always
assuming regular BVH (built for triangles), while curves and motion
triangles are using custom primitives, which require specialized BVH
traversal.

This change includes:

- Early output from `scene_intersect_local()` for non-triangle and
  non-motion-triangle primitives. This fixes `sss_hair.blend` test,
  and also avoids unnecessary BVH traversal when the local intersection
  is requested from a curve object. The same early-output could be added
  to other BVH traversal implementations.

- Use `hiprtGeomCustomTraversalAnyHitCustomStack` for motion triangle
  primitives. This fixes objects with motion blur and SSS rendering black.
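
The early-output idea from the first item can be illustrated with a minimal sketch. It is not the HIP-RT kernel code: the flag names, `local_intersect_supported()`, `intersect_local_sketch()` and the `traverse` callback are all hypothetical stand-ins, assuming only that local intersections are meaningful for triangles and motion triangles.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical primitive type flags for illustration only; the Cycles kernel
// defines its own primitive flags with different names and values.
enum PrimitiveTypeSketch : uint32_t {
  PRIM_TRIANGLE = 1u << 0,
  PRIM_MOTION_TRIANGLE = 1u << 1,
  PRIM_CURVE = 1u << 2,
  PRIM_POINT = 1u << 3,
};

// True when a local intersection query is meaningful for this primitive type.
inline bool local_intersect_supported(const uint32_t primitive_type)
{
  return (primitive_type & (PRIM_TRIANGLE | PRIM_MOTION_TRIANGLE)) != 0;
}

// Stand-in for a local intersection entry point. `traverse` is a hypothetical
// callback representing the backend-specific BVH traversal.
inline bool intersect_local_sketch(const uint32_t primitive_type, bool (*traverse)())
{
  assert(traverse != nullptr);
  if (!local_intersect_supported(primitive_type)) {
    // Early output: skip BVH traversal entirely for curves, points, etc.
    return false;
  }
  return traverse();
}
```

With this shape, a query issued from a curve object returns "no hit" without ever entering the traversal callback.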

Fixes #135856

Co-authored-by: Sahar A. Kashi <sahar.alipourkashi@amd.com>
Co-authored-by: Sergey Sharybin <sergey@blender.org>

Pull Request: https://projects.blender.org/blender/blender/pulls/135943
2025-03-14 18:17:54 +01:00
Sergey Sharybin
977a334f6f Merge branch 'blender-v4.4-release' 2025-03-12 19:24:01 +01:00
Sergey Sharybin
a3eb0faa3f Fix: Incorrect ray time used for HIP-RT local intersections
It was always hard-coded to be 0.

It does not seem to result in any extra tests passing, but they are
probably not sophisticated enough.

Noticed while looking into the details of #135856.

Pull Request: https://projects.blender.org/blender/blender/pulls/135878
2025-03-12 19:23:38 +01:00
Xavier Hallade
90a10dcd50 Cycles: Adjust inlining attributes for oneAPI device
Now ccl_device sets inlining and ccl_device_inline forces inlining.
This matches more closely with what is currently done for cuda and metal
backends.
I've measured from 1% to 6% overall performance improvement in rendering
benchmark scenes on Arc B580, as well as a small decrease in compile
time.
2025-03-03 18:20:02 +01:00
Alaska
fb7b53143e Merge branch 'blender-v4.4-release' 2025-02-27 12:03:30 +13:00
Alaska
d840d249b3 Cycles: Re-enable HIPRT point cloud rendering
Previously point cloud rendering was disabled on the HIPRT backend due
to unexpected performance regressions introduced by it.

With the recent update to HIP SDK 6.3 and HIPRT 2.5, these performance
regressions have been resolved and so this commit re-enables
point cloud rendering on HIPRT.

Pull Request: https://projects.blender.org/blender/blender/pulls/134902
2025-02-27 00:01:35 +01:00
Lukas Stockner
8cb5e05c48 Cleanup: Cycles: Deduplicate kernel attribute code using templating
The attribute handling code in the kernel is currently highly duplicated since
it needs to handle five different data types and we couldn't use templates
back then.
We can now, so might as well make use of it and get rid of ~1000 lines.

There are also some small fixes for the GPU OSL code:
- Wrong derivative for .w component when converting float2/float3->float4
- Different conversion for float2->float (CPU averages, GPU used to take .x)
- Removed useless code for converting to float2, not used by OSL
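
As a rough illustration of the deduplication idea (not the actual kernel code), one templated reader can replace a hand-written function per data type. The `float2` struct here is a minimal stand-in, and `to_float()` and `attribute_as_float()` are hypothetical names; the averaging of `float2` follows the CPU behavior mentioned above.

```cpp
#include <cassert>

// Minimal stand-in for the kernel vector type; not the real Cycles definition.
struct float2 {
  float x, y;
};

// Per-type conversions to float. Averaging float2 follows the CPU behavior
// mentioned in the commit message.
inline float to_float(const float v)
{
  return v;
}
inline float to_float(const float2 v)
{
  return 0.5f * (v.x + v.y);
}

// One templated reader replaces a hand-written function per data type.
// `attribute_as_float` is a hypothetical name, not a Cycles kernel function.
template<typename T> inline float attribute_as_float(const T *data, const int index)
{
  assert(data != nullptr && index >= 0);
  return to_float(data[index]);
}
```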

Pull Request: https://projects.blender.org/blender/blender/pulls/134694
2025-02-20 19:28:45 +01:00
Sahar A. Kashi
6363181af9 Cycles: HIP-RT 2.5 integration and gfx12 support
This change brings the following improvements on the user level:
- Support of GPUs with gfx12 architecture
- New HIP-RT library which in addition to the gfx12 support brings
  various bug-fixes.

The known limitation of gfx12 is that Open Image Denoise does not yet
support this GPU architecture. This means that while Cycles will take
full advantage of gfx12 (including hardware accelerated ray-tracing),
denoising will only be possible on the CPU, or on a secondary GPU that
is gfx11 or below. This requires a change in OIDN, and it is too late to
do it for Blender 4.4, but it is something to look forward to in Blender 4.5.

The gfx12 changes for the pre-compiled kernels are rather trivial,
so they come together (in the same PR) with the bigger HIP-RT change.

On the development side this change brings the following improvements:
- One step compile and link (much simpler CMake rules)
- Embedding BVH binaries in hiprt dll (which makes it easier to package
  and load, without relying on special path configuration)

Co-authored-by: Sahar Kashi <sahar.kashi@amd.com>
Co-authored-by: Sergey Sharybin <sergey@blender.org>
Co-authored-by: Brecht Van Lommel <brecht@blender.org>

Pull Request: https://projects.blender.org/blender/blender/pulls/133129
2025-02-20 17:34:14 +01:00
Nikita Sirgienko
2bab4ae370 Cycles: oneAPI: Optimize texture access by using GPU HW sampler
The current usage of software-based texture operations in
the oneAPI implementation puts additional register pressure on
the GPU compiler during register allocation. And it also creates
code that requires maintenance. This commit is intended to address
this situation by utilizing a recently productized SYCL bindless
texture API to enable HW-based texture operations using
Intel GPUs' hardware sampler.

This currently translates to 1-11% rendering speedups (scene-specific)
on my Arc A770 and Arc B580. At the moment, there are small
performance regressions with NanoVDB texture operations on Arc B580
and small performance regressions in shade surface MNEE and Raytrace
kernels on Arc A770, but they look recoverable and will be handled
in the future.

Pull Request: https://projects.blender.org/blender/blender/pulls/133457
2025-02-12 21:47:34 +01:00
Nikita Sirgienko
a0b7ad436b Cleanup: Cycles: oneAPI: Switch to non-experimental work item API
There is now a non-experimental API for this_work_item functionality, so
let's use it for better code quality and also to avoid the deprecation
warning during compilation.

No functional or performance changes are expected.

Pull Request: https://projects.blender.org/blender/blender/pulls/133472
2025-02-12 21:46:22 +01:00
Patrick Mours
5810c94f95 Cycles: Add Blackwell to Cycles CUDA binaries architectures
Enables building of a Cubin for GPUs based on Blackwell architecture
if CUDA toolkit version 12.8 or higher is installed.
Only added sm_120 to the default set, since it is the one relevant for
consumer GPUs (RTX 5090 etc.) that are generally used with Blender.

Pull Request: https://projects.blender.org/blender/blender/pulls/134170
2025-02-10 14:55:28 +01:00
Brecht Van Lommel
f2bf9d747e Cleanup: Cycles: Remove some unused kernel entry points on CPU 2025-01-13 10:07:37 +01:00
Brecht Van Lommel
2bf6d0fd71 Cleanup: Cycles: Remove unnecessary SSE4.2 CPU kernel
This is the minimum requirement, so just the regular kernel already
includes these instructions if supported by the CPU architecture.
2025-01-13 10:07:37 +01:00
Brecht Van Lommel
9971648783 Refactor: Cycles: Replace new/delete by unique_ptr, in simple cases
Pull Request: https://projects.blender.org/blender/blender/pulls/132361
2025-01-03 10:23:30 +01:00
Brecht Van Lommel
a8654a1dbe Refactor: Cycles: Make CPU kernel globals storage more sane
Pull Request: https://projects.blender.org/blender/blender/pulls/132361
2025-01-03 10:23:27 +01:00
Brecht Van Lommel
57ff24cb99 Refactor: Cycles: Add const keyword to more function parameters
Pull Request: https://projects.blender.org/blender/blender/pulls/132361
2025-01-03 10:23:24 +01:00
Brecht Van Lommel
dd51c8660b Refactor: Cycles: Add const keyword where possible, using clang-tidy
Check was misc-const-correctness, combined with readability-isolate-declaration
as suggested by the docs.

Temporarily clang-format "QualifierAlignment: Left" was used to get consistency
with the prevailing order of keywords.

Pull Request: https://projects.blender.org/blender/blender/pulls/132361
2025-01-03 10:23:20 +01:00
Brecht Van Lommel
71b8ecdd84 Cleanup: Cycles: Remove workaround for slow expf in glibc < 2.16
We're on 2.28 now, and were already on 2.17 for many years before that.

Pull Request: https://projects.blender.org/blender/blender/pulls/132361
2025-01-03 10:23:03 +01:00
Brecht Van Lommel
3a57b97eba Cleanup: Cycles: Remove unneeded oneAPI double emulation for NanoVDB
Pull Request: https://projects.blender.org/blender/blender/pulls/132361
2025-01-03 10:22:59 +01:00
Brecht Van Lommel
d0c2e68e5f Refactor: Cycles: Automated clang-tidy fixups in Cycles
* Use .empty() and .data()
* Use nullptr instead of 0
* No else after return
* Simple class member initialization
* Add override for virtual methods
* Include C++ instead of C headers
* Remove some unused includes
* Use default constructors
* Always use braces
* Consistent names in definition and declaration
* Change typedef to using
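
For readers unfamiliar with these checks, the following small sketch shows what code looks like after several of the fixups above are applied. The types and function names are invented for illustration and do not come from the Cycles sources.

```cpp
#include <string>
#include <vector>

// `using` alias instead of `typedef`.
using NameList = std::vector<std::string>;

class Node {
 public:
  // Defaulted special member instead of an empty body.
  virtual ~Node() = default;
  virtual const char *label() const
  {
    return nullptr;  // nullptr instead of 0 or NULL.
  }
};

class NamedNode : public Node {
 public:
  // `override` on the overriding virtual method.
  const char *label() const override
  {
    return "named";
  }
};

inline const std::string *first_name(const NameList &names)
{
  // .empty() instead of size() == 0, and braces around the single statement.
  if (names.empty()) {
    return nullptr;
  }
  // No `else` after `return`; .data() instead of &names[0].
  return names.data();
}
```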

Pull Request: https://projects.blender.org/blender/blender/pulls/132361
2025-01-03 10:22:55 +01:00
Brecht Van Lommel
5c46063607 Refactor: Cycles: Make kernel headers work by themselves
Shuffle around some code and add more includes so that individual
header files compile without errors.

Pull Request: https://projects.blender.org/blender/blender/pulls/132361
2025-01-03 10:22:50 +01:00
Brecht Van Lommel
3c2a6fbb9c Refactor: Cycles: Use nullptr instead of NULL
Pull Request: https://projects.blender.org/blender/blender/pulls/132361
2025-01-03 10:22:43 +01:00
Brecht Van Lommel
4453ca25b4 Fix: Cycles table precompute app build failure 2024-12-31 00:50:44 +01:00
Thomas Dinges
1be75e86aa Cleanup: replace floatX_to_floatY() with make_floatY()
Now that function overloads are usable on all GPUs, replace the former explicit functions.

Pull Request: https://projects.blender.org/blender/blender/pulls/132067
2024-12-19 09:41:55 +01:00
Thomas Dinges
22e16ca096 Cycles: add make_float4(float3 a, float b) type
This resolves a todo from the code. Part of the Quality Project.
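
A minimal sketch of what such an overload looks like, using simplified stand-in structs rather than the real Cycles vector types (which differ per device):

```cpp
// Minimal stand-ins for the kernel vector types; not the real Cycles definitions.
struct float3 {
  float x, y, z;
};
struct float4 {
  float x, y, z, w;
};

// Constructor-style helper taking four scalars.
inline float4 make_float4(const float x, const float y, const float z, const float w)
{
  return {x, y, z, w};
}

// Overload described by the commit title: xyz from a float3, w from a scalar.
inline float4 make_float4(const float3 a, const float b)
{
  return {a.x, a.y, a.z, b};
}
```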

Pull Request: https://projects.blender.org/blender/blender/pulls/131915
2024-12-17 09:11:08 +01:00
Michael Jones
8fe2e37dd0 Fix #130641: MetalRT: Motion Blur (render errors)
This PR fixes #130641. The bug was caused by a missing self-object constraint when performing SSS on motion blur scenes. scene_intersect_local tests were erroneously hitting other objects, and out of range primitive IDs were causing spurious downstream behavior.

Pull Request: https://projects.blender.org/blender/blender/pulls/131156
2024-12-03 20:24:36 +01:00
Weizhen Huang
e2d7681fe6 Cleanup: Cycles: remove unused ccl_loop_no_unroll
It was added in 6121c28501 to ensure compilation
on OpenCL; the definition is now empty on all platforms.

Pull Request: https://projects.blender.org/blender/blender/pulls/131100
2024-11-28 16:37:01 +01:00
Nikita Sirgienko
2aa9203f2f Cycles: Reintroduce noinline keyword for oneAPI device
In 891d71a4d4 this keyword was
dropped due to a performance regression after
fdc2962beb, but the code currently
does not experience this performance degradation, and in fact
there is a minor performance improvement on Lunar Lake GPUs,
along with an expected improvement in compile time.
However, this change brings a minor performance regression to
the shade_surface kernel on Intel Arc and Meteor Lake GPUs, which
will be solved later by disabling this keyword for
these platforms only.

Pull Request: https://projects.blender.org/blender/blender/pulls/130299
2024-11-15 12:09:37 +01:00
Campbell Barton
1b320d5205 Merge branch 'blender-v4.3-release' 2024-10-25 08:03:11 +11:00
Michael Jones
029cd1f739 Cycles: Remove invalid use of MetalRT accept_any_intersection in scene_intersect_local
This PR fixes a latent issue arising from invalid use of `accept_any_intersection(true)` when performing SSS ray-stepping with MetalRT. The comment incorrectly states that "we can optimize and accept the first hit", but to guarantee correct behaviour in future we need to request the closest hit.
2024-10-24 10:42:59 +01:00
Xavier Hallade
b614953971 Cycles: oneAPI: fix Linux compilation with fno-honor-nans
Previously, when compiling on Rocky Linux 8 with fno-honor-nans, compile
time was more than 5x longer than expected, and there was an unresolved
symbol to __sqrtf_finite in GPU binaries.
After defining sqrtf in compat.h, both issues are effectively gone; this
was certainly due to problematic interactions with the build system's math
library headers.
So we can remove the current workaround of defining fhonor-nans, and now
have the same set of flags on both Windows and Linux.
2024-10-04 17:50:24 +02:00
Nikita Sirgienko
fb21f3fb56 Cleanup: Cycles: oneAPI: Fix deprecation warnings about get_pointer() 2024-10-01 22:26:15 +02:00
Sahar A. Kashi
26ed4d3892 Cycles: Linux Support for HIP-RT
This change switches Cycles to an open-source HIP-RT library which
implements hardware ray-tracing. This library is now used on
both Windows and Linux. While there should be no noticeable changes
on Windows, on Linux this adds support for hardware ray-tracing on
AMD GPUs.

The majority of the change is typical platform code to add a new
library to the dependency builder, and a change in how
ahead-of-time (AoT) kernels are compiled. There are changes in
Cycles itself, but they are rather straightforward: some APIs
changed in the open-source version of the library.

There are a couple of extra files which are needed for this to
work: hiprt02003_6.1_amd.hipfb and oro_compiled_kernels.hipfb.
There are some assumptions in the HIP-RT library about how they
are available. Currently they follow the same rule as AoT
kernels for oneAPI:
- On Windows they are next to blender.exe
- On Linux they are in the lib/ folder

Performance comparison on Ubuntu 22.04.5:
```
GPU: AMD Radeon PRO W7800
Driver: amdgpu-install_6.1.60103-1_all.deb
                       main         hip-rt
attic                  0.1414s      0.0932s
barbershop_interior    0.1563s      0.1258s
bistro                 0.2134s      0.1597s
bmw27                  0.0119s      0.0099s
classroom              0.1006s      0.0803s
fishy_cat              0.0248s      0.0178s
junkshop               0.0916s      0.0713s
koro                   0.0589s      0.0720s
monster                0.0435s      0.0385s
pabellon               0.0543s      0.0391s
sponza                 0.0223s      0.0180s
spring                 0.1026s      1.5145s
victor                 0.1901s      0.1239s
wdas_cloud             0.1153s      0.1125s
```

Co-authored-by: Brecht Van Lommel <brecht@blender.org>
Co-authored-by: Ray Molenkamp <github@lazydodo.com>
Co-authored-by: Sergey Sharybin <sergey@blender.org>

Pull Request: https://projects.blender.org/blender/blender/pulls/121050
2024-09-24 14:35:24 +02:00
Campbell Barton
0fc27c8d81 Cleanup: spelling in comments 2024-09-20 13:14:57 +10:00
Alaska
27680118db Fix #127464: Disable HIPRT point clouds to fix performance regression
Temporarily disable point cloud rendering in HIPRT to fix a performance
regression triggered by increased register pressure until
a better solution can be developed.

Pull Request: https://projects.blender.org/blender/blender/pulls/127738
2024-09-17 17:59:18 +02:00
Nikita Sirgienko
d300098ee5 Fix #125093: Cycles: oneAPI: transparent shadows opaque when bounces>=1024
Pull Request: https://projects.blender.org/blender/blender/pulls/127404
2024-09-16 16:37:55 +02:00
salipourto
d4597e20b6 Fix #127131: Deforming motion blurred point clouds do not render in Cycles HIP-RT when BVH timesteps != 0
The device code was disabled for primitives with deformation blur
and the intersection function always returned false, hence no
rendered primitive.

Other than that, there were a few bugs in both device and host code
(e.g., the order of current and previous times and the primitive name).

Pull Request: https://projects.blender.org/blender/blender/pulls/127163
2024-09-06 12:27:17 +02:00
Nikita Sirgienko
94c9898f41 Fix #124811: Cycles: oneAPI: no hair strands in viewport with Embree
The oneAPI kernel preloading logic was allowing unneeded kernels to be
compiled without features, which would then be missing when these kernels
were needed later.

Pull Request: https://projects.blender.org/blender/blender/pulls/127114
2024-09-04 11:08:00 +02:00
Alaska
8cf4d47fe2 Fix: Improve Cycles point clouds in HIPRT
Fixes a few issues with point clouds with HIPRT.
1. Crashing when building the BLAS due to an incorrectly sized array.
2. A typo leading to all point cloud intersections being skipped.
3. A typo leading to some motion blurred point clouds rendering
as if they were stationary, or not rendering at all.

Point clouds with deformable motion blur and BVH time steps set to >0
still do not render. Curves seem to have the same issue.

Ref #125086

Pull Request: https://projects.blender.org/blender/blender/pulls/125834
2024-09-03 16:31:41 +02:00
Xavier Hallade
1a0dbbd242 Fix: Cannot render Victor and Spring with embree disabled on Intel GPUs
The kernel used for zeroing memory since we added the host memory fallback
didn't expect large inputs, so with these scenes, it was running into
"Provided range is out of integer limits. Pass
`-fno-sycl-id-queries-fit-in-int' to disable range check" error.

This kernel was used instead of memset to avoid some issues with the
free_memory queries not always being updated.
As we can't reproduce these with recent drivers, we now use memset,
which fixes rendering with BVH2.
2024-09-02 18:35:51 +02:00
Weizhen Huang
d4ceade5ea Fix: Cycles BVH2 and Embree missing some transparent shadow bounces
The code snippet is supposed to compute the maximal `isect.t` in the
array, which is used to determine if subsequent intersections should be
added.

However, the previous implementation includes the old `isect.t` which is
going to be replaced, resulting in an overestimation of `tmax_hits` and
thus missing closer intersections.

For BVH2, the issue is fixed by computing the `max_t` after a new entry
is inserted.

For Embree, the issue is fixed by finding the `second_largest_t` as well, and
comparing that with the new insertion to find the new `max_t`.
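
The BVH2-style fix can be sketched as follows. This is a simplified illustration, not the kernel code: `ClosestHitsSketch` and its members are hypothetical, and the real implementation stores full intersection records rather than bare distances. The point it shows is that `max_t` is recomputed only after the farthest entry has been overwritten, so the replaced value no longer contributes.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Keeps the `max_hits` closest intersection distances seen so far.
struct ClosestHitsSketch {
  std::vector<float> t;  // Recorded distances, unordered.
  std::size_t max_hits;  // Capacity of the hit array.
  float max_t;           // Hits at or beyond this distance are rejected.

  ClosestHitsSketch(const std::size_t max_hits, const float ray_tmax)
      : max_hits(max_hits), max_t(ray_tmax)
  {
    assert(max_hits > 0);
    t.reserve(max_hits);
  }

  void record(const float new_t)
  {
    if (new_t >= max_t) {
      return;  // Too far to be among the closest hits.
    }
    if (t.size() < max_hits) {
      t.push_back(new_t);
      if (t.size() == max_hits) {
        // The array just became full: from now on the farthest hit is the limit.
        max_t = *std::max_element(t.begin(), t.end());
      }
      return;
    }
    // Array is full: overwrite the farthest recorded hit with the closer one.
    *std::max_element(t.begin(), t.end()) = new_t;
    // Recompute the limit after the insertion, so the replaced value is excluded.
    max_t = *std::max_element(t.begin(), t.end());
  }
};
```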

Pull Request: https://projects.blender.org/blender/blender/pulls/125739
2024-08-06 15:37:49 +02:00
Campbell Barton
c071030ac3 Cleanup: spelling in comments 2024-08-04 13:45:06 +10:00
Alaska
5ce29bedf6 Fix: Cycles Shadow linking with HIP-RT
Fix shadow linking not working on HIP-RT by adding code to correctly
ignore certain shadow ray hits.

Ref #125086

Pull Request: https://projects.blender.org/blender/blender/pulls/125803
2024-08-02 12:17:09 +02:00
Alaska
c8340cf754 Cycles: Remove AMD and Intel GPU support from Metal backend
This is because with the addition of new features to Cycles, these GPUs
experienced significant performance regressions and bugs, all stemming
from bugs in the Metal GPU driver/compiler. The only reasonable way to
work around these issues was to disable parts of Cycles code on
these GPUs to avoid the driver/compiler bugs.

This resulted in increased development time maintaining these platforms
while being unable to deliver feature parity with other
GPU backends.

It has been decided that this development time is better spent
maintaining platforms that are still actively maintained by
hardware/software vendors, and so AMD and Intel GPU support will be
removed from the Metal backend for Cycles.

Pull Request: https://projects.blender.org/blender/blender/pulls/123551
2024-06-26 17:16:20 +02:00
Brecht Van Lommel
d72c4f0096 Fix: Cycles build issues when disabling various kernel features 2024-06-13 19:41:19 +02:00
Lukas Stockner
f3f05f945c Cycles: Add missing make_uintX definitions for Metal 2024-06-05 03:04:04 +02:00
Michael Jones
5be30b7d2b Cycles: "Struct-of-array-of-packed-structs" for parts of the integrator state
On an M3 MacBook Pro, this change increases the benchmark score by 8% (with classroom seeing a path-tracing speedup of 15%).

The integrator state is currently stored using struct-of-arrays, with one array per field. Such fine-grained separation can result in poor GPU cache utilisation in cases where multiple fields of the same parent struct are accessed together. This PR changes the layout of the `ray`, `isect`, `subsurface`, and `shadow_ray` structs so that the data is interleaved (per parent struct) instead of separate. To try and keep this change localised, I encapsulated the layout change by extending the integrator state access macros, however maybe we want to do this more explicitly? (e.g. by updating every bit of code that accesses these parts of the state). Feedback welcome.
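
To make the two layouts concrete, here is a minimal host-side sketch. The struct and field names are hypothetical simplifications (the real integrator state is generated through macros and has many more fields); the sketch only contrasts one array per field with one array of packed structs per parent struct.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Layout A: struct-of-arrays, one array per field.
struct RaySoA {
  std::vector<float> tmin;
  std::vector<float> tmax;
  std::vector<float> time;
};

// Layout B: fields of the same parent struct are packed together, and the
// state holds one array of packed structs per parent struct.
struct PackedRay {
  float tmin;
  float tmax;
  float time;
};
struct PackedIsect {
  float t;
  int prim;
};
struct IntegratorStateSketch {
  std::vector<PackedRay> ray;
  std::vector<PackedIsect> isect;
};

// Reading two fields of one path touches two separate arrays.
inline float ray_extent_soa(const RaySoA &state, const std::size_t path)
{
  assert(path < state.tmin.size() && path < state.tmax.size());
  return state.tmax[path] - state.tmin[path];
}

// The same read from the packed layout: both fields are adjacent in memory.
inline float ray_extent_packed(const IntegratorStateSketch &state, const std::size_t path)
{
  assert(path < state.ray.size());
  const PackedRay &ray = state.ray[path];
  return ray.tmax - ray.tmin;
}
```

Both accessors return the same value; only the memory access pattern differs, which is what the cache utilisation argument above is about.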

Pull Request: https://projects.blender.org/blender/blender/pulls/122015
2024-06-04 14:53:30 +02:00
Nikita Sirgienko
759bb6c768 Cycles: oneAPI: Enable host memory migration
This enables scenes whose textures do not all fit in GPU
memory to finally render. For scenes that do fit,
no functional change or performance change is expected.

Pull Request: https://projects.blender.org/blender/blender/pulls/122385
2024-05-28 19:04:19 +02:00
Michael Jones
5508b41a40 Cycles: MetalRT optimisations (scene_intersect_shadow + random_walk)
This PR contains optimisations and a general tidy-up of the MetalRT backend.

- Currently `scene_intersect` is used for both normal and (opaque) shadow rays, however the usage patterns are different enough to warrant specialisation. Shadow intersection tests (flagged with `PATH_RAY_SHADOW_OPAQUE`) only need a bool result, but need a larger "self" payload in order to exclude hits against target lights. By specialising we can minimise the payload size in each case (which helps performance) and avoid some dynamic branching. This PR introduces a new `scene_intersect_shadow` function which is specialised in Metal, and currently redirects to `scene_intersect` in the other backends.

- Currently `scene_intersect_local` is implemented for worst-case payload requirements as demanded by `subsurface_disk` (where `max_hits` is 4). The random_walk case only demands 1 hit result which we can retrieve directly from the intersector object (rather than stashing it in the payload). By specialising, we significantly reduce the payload size for random_walk queries, which has a big impact on performance. Additionally, we only need to use a custom intersection function for the first ray test in a random walk (for self-primitive filtering), so this PR forces faster `opaque` intersection testing for all but the first random walk test.

- Currently `scene_intersect_volume` has a lot of redundant code to handle non-triangle primitives despite volumes only being enclosed by trimeshes. This PR removes this code.
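
The redirect mentioned in the first item can be pictured with a small sketch. The `Ray` and `Intersection` structs and both function signatures are simplified assumptions, and the "scene" is a single hypothetical surface at a fixed distance; the only point is that a generic shadow query can forward to the closest-hit query and discard everything but the boolean result.

```cpp
#include <cassert>

// Simplified stand-ins; the real kernel types carry many more fields.
struct Ray {
  float tmin;
  float tmax;
};
struct Intersection {
  float t;
  int prim;
};

// Stand-in for a closest-hit query against a hypothetical scene that contains
// a single surface at distance 2 along every ray.
inline bool scene_intersect(const Ray &ray, Intersection *isect)
{
  assert(isect != nullptr);
  const float surface_t = 2.0f;
  if (surface_t <= ray.tmin || surface_t >= ray.tmax) {
    return false;
  }
  isect->t = surface_t;
  isect->prim = 0;
  return true;
}

// Generic fallback for opaque shadow rays: only the boolean result is needed,
// so the intersection record is a throwaway local.
inline bool scene_intersect_shadow(const Ray &ray)
{
  Intersection isect{};
  return scene_intersect(ray, &isect);
}
```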

Additionally, this PR tidies up the convoluted intersection function linking code, removes some redundant intersection handlers, and uses more consistent naming of intersection functions.

On an M3 MacBook Pro, these changes give a 2-3% performance increase on typical scenes with opaque trimesh materials (e.g. barbershop, classroom, junkshop), but can give a performance increase of over 15% for certain scenes using random walk SSS (e.g. monster).

Pull Request: https://projects.blender.org/blender/blender/pulls/121397
2024-05-10 16:38:02 +02:00