Commit Graph

309 Commits

Author SHA1 Message Date
Sebastian Herholz
5abf42012d Cycles: Guiding cleaning up and refactoring the guiding code
In detail:
- Direct accesses of state attributes are replaced with the INTEGRATOR_STATE and INTEGRATOR_STATE_WRITE macros.
- Unified the checks for the __PATH_GUIDING define to use #  if defined (__PATH_GUIDING__).
- Even if __PATH_GUIDING__ is defined, we now check if the feature is enabled using if ((kernel_data.kernel_features & KERNEL_FEATURE_PATH_GUIDING)) {. This is important for later GPU ports.
- The kernel usage of the guiding field, surface, and volume sampling distributions is wrapped behind macros for each specific device (atm only CPU). This will make it easier for a GPU port later.
2025-05-22 13:46:30 +02:00
Nikita Sirgienko
54766b6a54 Cycles: Introducing the code for adoption of Embree 4.4
Embree 4.4 introduces an improvement in the Embree GPU
implementation by dropping shared memory usage in favor
of direct controllable memory transfers. This should allow
addressing several problems spotted in Blender regarding
multithreading and memory corruption when BVH and rendering
happen at the same time. However, to implement such
improvements, the API has changed for several functions, and
this commit adopts Blender code to these changes, making Blender
buildable and functional with all existing Embree 4.X
versions, before and after 4.4.

No functional changes in Blender behavior are expected if
using Embree versions below 4.4.

Pull Request: https://projects.blender.org/blender/blender/pulls/139061
2025-05-19 11:25:50 +02:00
Brecht Van Lommel
2c99edbffa Cycles: Bump Embree minimum version to 4.0.0
The build is already failing with Embree 3, as noticed in #137556.
And Embree 4 was released 2 years ago.

Pull Request: https://projects.blender.org/blender/blender/pulls/138221
2025-04-30 19:50:14 +02:00
Lukas Stockner
0dc4754da4 Cycles: Move OptiX OSL Camera kernel into its own PTX module
On the one hand, this improves initialization time since we don't need to
load/compile the full OSL module with all the shading logic if we're only
using a custom camera with SVM shading.

On the other hand, it also fixes a bug I noticed while preparing test scenes:
The AO and Bevel nodes don't work when using custom cameras with SVM on OptiX.

The issue there is that those two are handled by the SHADE_SURFACE_RAYTRACE
kernel, but since that one has intersection logic, we use the OptiX-specific
kernel even if OSL shading is disabled.
However, with the previous unified OSL module, this would mean loading
SHADE_SURFACE_RAYTRACE from kernel_osl.cu, which has `#undef __SVM__` and
therefore doesn't handle them correctly.

With this change, we'll use the kernels from kernel_shader_raytrace.cu in that
case, which do support SVM nodes just fine.

Disk usage of the new kernel_optix_osl_camera.ptx.zst file is 30KB, so this
also doesn't blow up the kernel disk size (and kernel_optix_osl.ptx.zst is
probably smaller by that amount now).

Since it seems that we can mix modules just fine, I'm suspecting that we could
split the modules properly (intersection, SVM shading with raytracing,
OSL shading, OSL camera), instead of the current approach where modules
essentially correspond to feature set tiers and each includes the previous
one's kernels as well - but that's a separate refactor.

Pull Request: https://projects.blender.org/blender/blender/pulls/138021
2025-04-28 12:49:35 +02:00
Campbell Barton
682e5e3597 Cleanup: spelling in comments (make check_spelling_*) 2025-04-26 00:48:04 +00:00
Lukas Stockner
bf412ed9dd Cycles: Support for custom OSL cameras
This allows users to implement arbitrary camera models using OSL by writing
shaders that take an image position as input and compute ray origin and
direction.

The obvious applications for this are e.g. panorama modes, lens distortion
models and realistic lens simulation, but the possibilities are endless.

Currently, this is only supported on devices with OSL support, so CPU and
OptiX. However, it is independent from the shading model used, so custom
cameras can be used without getting the performance hit of OSL shading.

A few samples are provided as Text Editor templates.

One notable current limitation (in addition to the limited device support)
is that inverse mapping is not supported, so Window texture coordinates and
the Vector pass will not work with custom cameras.

Pull Request: https://projects.blender.org/blender/blender/pulls/129495
2025-04-25 19:27:30 +02:00
Weizhen Huang
23c762e388 Fix: Cycles: Do not count volume bounds bounce as transparent
In forward path tracing, when we pass volume bounding meshes, we
accumulate `volume_bounds_bounce`. We should match this behaviour in NEE
instead of accumulating `transparent_bounce`.

Pull Request: https://projects.blender.org/blender/blender/pulls/137556
2025-04-24 13:10:33 +02:00
Sergey Sharybin
36559fd89f Fix #136811: HIP-RT performance regression in 4.5
Reduce the register pressure and branching in the switch() by using
subclass and cast from void* to the base class.

This ensures intersection functions are not inlined multiple times,
bringing performance back.

Alternative could be to avoid functions (they are quite large) but
that only partially resolves the performance regression.

Pull Request: https://projects.blender.org/blender/blender/pulls/136823
2025-04-01 17:59:44 +02:00
Campbell Barton
42ad772a1f Cleanup: spelling & repeated terms (make check_spelling_*)
Also use comment blocks for English text.
2025-03-27 01:13:34 +00:00
Sergey Sharybin
2ab231d802 Refactor: Pass proper KernelGlobals
HIP-RT functions do have access to kg, and it was used inconsistently:
some functions were passed actual kg, other were passed nullptr.

This change makes it consistent and passes kg everywhere.

Pull Request: https://projects.blender.org/blender/blender/pulls/136503
2025-03-26 11:07:06 +01:00
Sergey Sharybin
709371b278 Refactor: Avoid creation of local copy of RaySelfPrimitives 2025-03-26 11:07:04 +01:00
Sergey Sharybin
888c7e1df9 Cleanup: Avoid redundant data fetch 2025-03-26 11:07:04 +01:00
Sergey Sharybin
3d882acee2 Cleanup: Else after return 2025-03-26 11:07:04 +01:00
Sergey Sharybin
b2dd523d0d Cleanup: Avoid default hit initialization
The entire object is assigned later on, no need to initialize it.
2025-03-26 11:07:04 +01:00
Sergey Sharybin
323e27d825 Cleanup: Remove redundant assignment
The payload stores pointers, no need to restore pointer
of the function argument to the same value.
2025-03-26 11:07:04 +01:00
Sergey Sharybin
e92a8042c3 Refactor: Payload for shadow intersection and filter in HIP-RT
The code before this change was relying on the ShadowPayload have
the same "header" as RayPayload for some of the primitive types
(curve, motion triangle, point): intersection functions were shared
between "regular" and shadow rays (shadow in this case is shadow_all),
but extra filter function was used for shadow rays.

This is fragile if someone changes one of these structures. What is
worse is that compiler might actually decide to shuffle things in
some structs, or remove unused fields.

This change also solves confusion about ShadowPayload::prim_type
seemingly only being assigned to PRIMITIVE_NONE. With time it is
not impossible that compiler will also see this, and constant-fold
some checks, or even remove the field. If that happens then the
render result will be wrong. Maybe it is already happening as there
are some GPU and driver and optimization flag specific bugs in the
area.

It is unclear whether it was causing any actual problem: W7800
seems to render all hair correctly on Linux.
2025-03-26 11:07:04 +01:00
Sergey Sharybin
cdb3f34944 Cleanup: Use full name for the primitive_type
Makes it extra clear locally type of what the variable contains:
primitive, ray, or something else.
2025-03-26 11:07:04 +01:00
Sergey Sharybin
72542f3bb4 Cleanup: Follow Blender style and use more const
Also make some style decisions more consistent: for example,
the way how stop/continue search return value is commented.
Prefer lower vertical space for those.
2025-03-26 11:07:04 +01:00
Sergey Sharybin
bf9c95f164 Cleanup: Move payload type cast to caller in HIP-RT
Mainly readability purposes:
- Having variables called local_payload is ambiguous: does it refer to
  LocalPayload type or to a variable be local in a function?
- Some of the functions are used for different ray types, so having the
  type case in intersectFunc and filterFunc makes it easier to scan.

For the latter: now it is more obvious that Curve_Intersect_Shadow
expects RayPayload, but Curve_Filter_Shadow expects ShadowPayload.
It might not be a problem currently as ShadowPayload has the same
"header" RayPayload, but it might change in the future. Also, compiler
might optimize fields out from one but not from the other.
2025-03-26 11:07:04 +01:00
Sergey Sharybin
3daaf21bab Cleanup: Remove unused function argument in HIP-RT 2025-03-26 11:07:04 +01:00
Sergey Sharybin
5ce4e91a80 Fix #136319: Incorrect transparent bounce count with spatial splits
The transparent bounce test was too optimistic in regards to the intersection
being considered. The check needs to happen after it has been validated that
it is not duplicate.

It was already the case for Metal and HIP-RT, but not for Embree and BVH2.

Tests updated by: Alaska <Alaskayou01@gmail.com>
Pull Request: https://projects.blender.org/blender/blender/pulls/136325
2025-03-22 04:51:42 +01:00
Sergey Sharybin
50180283e9 Fix #117527: Spatial split leads to artifacts on transparent shadows
The reason for this to happen is because when spatial split is used
the same intersection could be recorded twice (via different BVH nodes).

This change introduces check for the intersection being already recoded,
similar to the check in the local BVH. The check is done during BVH
intersection which allows to properly ignore intersections even for the
maximum bounce number check. A faster approach would be to do such
filtering after sorting, but then we can not keep bounce check in the
BVH code consistent with and without spatial splits.

Intuitively it seems that it should be possible to merge the new loop
with the one that checks for which intersection to keep. But it is not
so trivial in practice: it doesn't run for all intersections, and also
it is formulated in a way that updates isect_index for the next record.

Pull Request: https://projects.blender.org/blender/blender/pulls/136251
2025-03-21 13:56:50 +01:00
Sergey Sharybin
bf65b64708 Refactor: De-duplicate local intersection reservoir sampling logic
The code which was checking whether local intersection is to be
recorded, and under which index was duplicated for triangles,
motion triangles, and HIP-RT triangle filter function.

This change moves the common logic to an utility function which
is reused from all the places mentioned above.

Pull Request: https://projects.blender.org/blender/blender/pulls/136244
2025-03-20 17:19:31 +01:00
Sergey Sharybin
7165146fb2 Cleanup: More spelling fixes in comments 2025-03-20 10:37:09 +01:00
Sergey Sharybin
ae4f6026dc Cleanup: Spelling in comments 2025-03-20 10:36:12 +01:00
Bastien Montagne
dd98cede18 Merge branch 'blender-v4.4-release' 2025-03-14 18:20:26 +01:00
Sahar A. Kashi
9ad3b74867 Fix: SSS and Motion Blur or Curves not working on HIP-RT
This change fixes the remaining failing tests with SSS when using HIP-RT.
This includes crash when SSS is used on curves, and objects with motion
blur and SSS rendering black.

The root cause for both cases was the fact that traversal was always
assuming regular BVH (built for triangles), while curves and motion
triangles are using custom primitives, which requires specialized BVH
traversal.

This change includes:

- Early output from `scene_intersect_local()` for non-triangle and
  non-motion-triangle primitives. This fixes `sss_hair.blend` test,
  and also avoids unnecessary BVH traversal when the local intersection
  is requested from curve object. The same early-output could be added
  to other BVH traversal implementation.

- Use `hiprtGeomCustomTraversalAnyHitCustomStack` for motion triangles
  primitives. This fixes motion blur on objects with SSS render black.

Fixes #135856

Co-authored-by: Sahar A. Kashi <sahar.alipourkashi@amd.com>
Co-authored-by: Sergey Sharybin <sergey@blender.org>

Pull Request: https://projects.blender.org/blender/blender/pulls/135943
2025-03-14 18:17:54 +01:00
Sergey Sharybin
977a334f6f Merge branch 'blender-v4.4-release' 2025-03-12 19:24:01 +01:00
Sergey Sharybin
a3eb0faa3f Fix: Incorrect ray time used for HIP-RT local intersections
It was always hard-coded to be 0.

It does not seem to result in any extra tests passing, but they are
probably not sophisticated enough.

Noticed while looking into details for the #135856.

Pull Request: https://projects.blender.org/blender/blender/pulls/135878
2025-03-12 19:23:38 +01:00
Xavier Hallade
90a10dcd50 Cycles: Adjust inlining attributes for oneAPI device
Now ccl_device sets inlining and ccl_device_inline forces inlining.
This matches more closely with what is currently done for cuda and metal
backends.
I've measured from 1% to 6% overall performance improvement in rendering
benchmark scenes on Arc B580, as well as a small decrease in compile
time.
2025-03-03 18:20:02 +01:00
Alaska
fb7b53143e Merge branch 'blender-v4.4-release' 2025-02-27 12:03:30 +13:00
Alaska
d840d249b3 Cycles: Re-enable HIPRT point cloud rendering
Previously point cloud rendering was disabled on the HIPRT backend due
to unexpected performance regressions introduce by it.

With the recent update to HIP SDK 6.3 and HIPRT 2.5, these performance
regressions have been resolved and so this commit re-enables
point cloud rendering on HIPRT.

Pull Request: https://projects.blender.org/blender/blender/pulls/134902
2025-02-27 00:01:35 +01:00
Lukas Stockner
8cb5e05c48 Cleanup: Cycles: Deduplicate kernel attribute code using templating
The attribute handling code in the kernel is currently highly duplicated since
it needs to handle five different data types and we couldn't use templates
back then.
We can now, so might as well make use of it and get rid of ~1000 lines.

There are also some small fixes for the GPU OSL code:
- Wrong derivative for .w component when converting float2/float3->float4
- Different conversion for float2->float (CPU averages, GPU used to take .x)
- Removed useless code for converting to float2, not used by OSL

Pull Request: https://projects.blender.org/blender/blender/pulls/134694
2025-02-20 19:28:45 +01:00
Sahar A. Kashi
6363181af9 Cycles: HIP-RT 2.5 integration and gfx12 support
This change brings the following improvements on the user level
- Support of GPUs with gfx12 architecture
- New HIP-RT library which in addition to the gfx12 support brings
  various bug-fixes.

The known limitation of gfx12 is that OpenImageDenoiser does not yet
support this GPU architecture. This means that while Cycles will use the
full advantage of the gfx12 (including hardware accelerated ray-tracing),
denoising will only be possible on CPU, or secondary gfx11 or below GPU.
This is something that requires a change in OIDN and it is to late to do
it for Blender 4.4, but it is something to look forward for Blender 4.5.

The gfx12 changes for the pre-compiled kernels is rather trivial,
so it comes together (in the same PR) as the bigger HIP-RT change.

On the development side this change brings the following improvements:
- One step compile and link (much simpler CMake rules)
- Embedding BVH binaries in hiprt dll (which makes it easier to package
  and load, without relying on special path configuration)

Co-authored-by: Sahar Kashi <sahar.kashi@amd.com>
Co-authored-by: Sergey Sharybin <sergey@blender.org>
Co-authored-by: Brecht Van Lommel <brecht@blender.org>

Pull Request: https://projects.blender.org/blender/blender/pulls/133129
2025-02-20 17:34:14 +01:00
Nikita Sirgienko
2bab4ae370 Cycles: oneAPI: Optimize texture access by using GPU HW sampler
The current usage of software-based texture operations in
the oneAPI implementation puts additional register pressure on
the GPU compiler during register allocation. And it also creates
code that requires maintenance. This commit is intended to address
this situation by utilizing a recently productized SYCL bindless
texture API to enable HW-based texture operations using
Intel GPUs' hardware sampler.

This currently translates to 1-11% rendering speedups (scene-specific)
on my Arc A770 and Arc B580. At the moment, there are small
performance regressions with NanoVDB texture operations on Arc B580
and small performance regressions in shade surface MNEE and Raytrace
kernels on Arc A770, but they look recoverable and will be handled
in the future.

Pull Request: https://projects.blender.org/blender/blender/pulls/133457
2025-02-12 21:47:34 +01:00
Nikita Sirgienko
a0b7ad436b Cleanup: Cycles: oneAPI: Switch to non-experimental work item API
There is now a non-experimental API for this_work_item functionality, so
let's use it for better code quality and also to avoid the deprecation
warning during compilation.

No functional or performance changes are expected.

Pull Request: https://projects.blender.org/blender/blender/pulls/133472
2025-02-12 21:46:22 +01:00
Patrick Mours
5810c94f95 Cycles: Add Blackwell to Cycles CUDA binaries architectures
Enables building of a Cubin for GPUs based on Blackwell architecture
if CUDA toolkit version 12.8 or higher is installed.
Only added sm_120 to the default set, since it is the one relevant for
consumer GPUs (RTX 5090 etc.) that are generally used with Blender.

Pull Request: https://projects.blender.org/blender/blender/pulls/134170
2025-02-10 14:55:28 +01:00
Brecht Van Lommel
f2bf9d747e Cleanup: Cycles: Remove some unused kernel entry points on CPU 2025-01-13 10:07:37 +01:00
Brecht Van Lommel
2bf6d0fd71 Cleanup: Cycles: Remove unnecessary SSE4.2 CPU kernel
This is the minimum requirement, so just the regular kernel already
includes these instructions if supported by the CPU architecture.
2025-01-13 10:07:37 +01:00
Brecht Van Lommel
9971648783 Refactor: Cycles: Replace new/delete by unique_ptr, in simple cases
Pull Request: https://projects.blender.org/blender/blender/pulls/132361
2025-01-03 10:23:30 +01:00
Brecht Van Lommel
a8654a1dbe Refactor: Cycles: Make CPU kernel globals storage more sane
Pull Request: https://projects.blender.org/blender/blender/pulls/132361
2025-01-03 10:23:27 +01:00
Brecht Van Lommel
57ff24cb99 Refactor: Cycles: Add const keyword to more function parameters
Pull Request: https://projects.blender.org/blender/blender/pulls/132361
2025-01-03 10:23:24 +01:00
Brecht Van Lommel
dd51c8660b Refactor: Cycles: Add const keyword where possible, using clang-tidy
Check was misc-const-correctness, combined with readability-isolate-declaration
as suggested by the docs.

Temporarily clang-format "QualifierAlignment: Left" was used to get consistency
with the prevailing order of keywords.

Pull Request: https://projects.blender.org/blender/blender/pulls/132361
2025-01-03 10:23:20 +01:00
Brecht Van Lommel
71b8ecdd84 Cleanup: Cycles: Remove workaround for slow expf in glibc < 2.16
We're on 2.28 now, and were already on 2.17 for many years before that.

Pull Request: https://projects.blender.org/blender/blender/pulls/132361
2025-01-03 10:23:03 +01:00
Brecht Van Lommel
3a57b97eba Cleanup: Cycles: Remove unneeded oneAPI double emulation for NanoVDB
Pull Request: https://projects.blender.org/blender/blender/pulls/132361
2025-01-03 10:22:59 +01:00
Brecht Van Lommel
d0c2e68e5f Refactor: Cycles: Automated clang-tidy fixups in Cycles
* Use .empty() and .data()
* Use nullptr instead of 0
* No else after return
* Simple class member initialization
* Add override for virtual methods
* Include C++ instead of C headers
* Remove some unused includes
* Use default constructors
* Always use braces
* Consistent names in definition and declaration
* Change typedef to using

Pull Request: https://projects.blender.org/blender/blender/pulls/132361
2025-01-03 10:22:55 +01:00
Brecht Van Lommel
5c46063607 Refactor: Cycles: Make kernel headers work by themselves
Shuffle around some code and add more includes so that individual
header files compile without errors.

Pull Request: https://projects.blender.org/blender/blender/pulls/132361
2025-01-03 10:22:50 +01:00
Brecht Van Lommel
3c2a6fbb9c Refactor: Cycles: Use nullptr instead of NULL
Pull Request: https://projects.blender.org/blender/blender/pulls/132361
2025-01-03 10:22:43 +01:00
Brecht Van Lommel
4453ca25b4 Fix: Cycles table precompute app build failure 2024-12-31 00:50:44 +01:00
Thomas Dinges
1be75e86aa Cleanup: replace floatX_to_floatY() with make_floatY()
Now that function overloads are usable on all GPUs, replace the former explicit functions.

Pull Request: https://projects.blender.org/blender/blender/pulls/132067
2024-12-19 09:41:55 +01:00