340 Commits

Author SHA1 Message Date
Sergey Sharybin
15fd8ad7a1 Fix: Cycles linear curves on Metal-RT
Metal-RT implementation for curve intersect has an additional self
intersection check happening in curve_ribbon_accept(). It is done
for all curve types that have the PRIMITIVE_CURVE_RIBBON bit set on them,
including Thick Linear curves. However, the logic in the function is
hardcoded to handle flat ribbon curves with the Catmull Rom basis.

This change makes it so curve_ribbon_accept() is only called for the
ribbon curve type, not whenever the type has the ribbon bit set.

Additionally, other places where curve type was checked as a bitmask
were fixed.

Ref #146072

Pull Request: https://projects.blender.org/blender/blender/pulls/146140
2025-09-12 14:16:09 +02:00
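To illustrate the distinction described in 15fd8ad7a1, the sketch below contrasts a bit test with an exact type comparison. The enum values are hypothetical and chosen only so that the thick linear type shares the ribbon bit; they are not the real Cycles definitions, and the helper names are made up for this example.

```cpp
#include <cstdint>

// Hypothetical primitive type encoding, NOT the real Cycles values.
// The thick linear type is deliberately built on top of the ribbon bit.
enum PrimitiveType : uint32_t {
  PRIMITIVE_CURVE_THICK = 1u << 0,
  PRIMITIVE_CURVE_RIBBON = 1u << 1,
  PRIMITIVE_CURVE_THICK_LINEAR = PRIMITIVE_CURVE_THICK | PRIMITIVE_CURVE_RIBBON,
  PRIMITIVE_MOTION = 1u << 2,
  // Mask covering all bits that identify the curve type.
  PRIMITIVE_CURVE_MASK = PRIMITIVE_CURVE_THICK | PRIMITIVE_CURVE_RIBBON,
};

// Bitmask test: true for ANY type that has the ribbon bit set,
// which includes the thick linear type in this encoding.
inline bool has_ribbon_bit(uint32_t type)
{
  return (type & PRIMITIVE_CURVE_RIBBON) != 0;
}

// Exact test: strip unrelated flags, then compare the whole curve type.
inline bool is_ribbon_curve(uint32_t type)
{
  return (type & PRIMITIVE_CURVE_MASK) == PRIMITIVE_CURVE_RIBBON;
}
```

With this encoding, `has_ribbon_bit(PRIMITIVE_CURVE_THICK_LINEAR)` is true while `is_ribbon_curve(PRIMITIVE_CURVE_THICK_LINEAR)` is false, which is the kind of mismatch the fix addresses.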
Patrick Mours
b4bb075285 Cycles: Flip image vertically before passing to OptiX denoiser to improve result quality
Experiments have shown that the OptiX denoiser performs best when
operating on images that have their origin at the top-left corner,
while Blender renders with the origin at the bottom-left corner.
Simply flipping the image vertically before and after denoising is a
relatively trivial operation, so this patch introduces this as an
additional preprocessing and postprocessing step for denoising when the
OptiX denoiser is used. Additionally, this patch also removes an unused
helper function, now that OptiX 8.0 is the minimum.

Pull Request: https://projects.blender.org/blender/blender/pulls/145358
2025-09-04 16:04:23 +02:00
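The flip described in b4bb075285 can be sketched as an in-place row swap. This is a minimal CPU-side illustration assuming a tightly packed, row-major float buffer; the function name and buffer layout are assumptions, and the actual patch may well perform the flip on the GPU.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Flip a tightly packed, row-major image in place by swapping rows.
// `row_stride` is the number of floats per row (width * channels).
inline void flip_vertical(std::vector<float> &pixels, std::size_t height, std::size_t row_stride)
{
  for (std::size_t y = 0; y < height / 2; y++) {
    float *top = pixels.data() + y * row_stride;
    float *bottom = pixels.data() + (height - 1 - y) * row_stride;
    // Exchange row y with its mirror row; the middle row of an
    // odd-height image stays where it is.
    std::swap_ranges(top, top + row_stride, bottom);
  }
}
```

Applying the function twice restores the original image, which matches the "flip before and after denoising" pattern.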
Nikita Sirgienko
a984114d5e Cleanup: oneAPI: Fix warnings about unused variables
No performance or functional changes are expected
2025-09-03 11:01:20 +02:00
Patrick Mours
1b42975e94 Cycles: Add support for building with CUDA 13.0 and OptiX 9.0
The compiler in the CUDA 13 toolkit dropped support for Maxwell, Pascal and Volta architectures (sm_5X, sm_6X and sm_70), which affects both CUDA and OptiX kernel compilation for Cycles. This patch makes it so building CUDA kernel binaries for those architectures is skipped when CUDA 13 is used, but they will still be built if there is a CUDA 11 toolkit available (e.g. on buildbot), like how things are handled for other architectures. The OptiX PTX kernel is compiled with the minimum architecture available (compute_75 with CUDA 13, compute_50 with previous CUDA versions).

In addition, loading the PTX kernel after initializing OptiX version 9.0 would fail with an OPTIX_ERROR_INVALID_FUNCTION_USE, due to the use of "optixTrace" within direct callables (as part of the AO and bevel SVM nodes). Starting with OptiX 9.0 this is no longer allowed; instead, one has to use "optixTraverse" in those cases. This patch thus changes the affected intersection routines to use "optixTraverse". As a side effect it also simplifies the `scene_intersect_shadow` function, which no longer invokes the closest hit program, and can just quickly return hit status. The minimum OptiX version Cycles requires is already 8.0, which supports "optixTraverse", so it can just be applied always.

Finally, this patch also adds the `--split-compile=0` argument to nvcc when available, which tells the compiler to internally split the module into pieces that can be processed in parallel on multiple threads (the `=0` tells it to use as many threads as there are CPU cores). This can greatly improve compile times without compromising on performance.

Pull Request: https://projects.blender.org/blender/blender/pulls/145130
2025-08-27 14:28:01 +02:00
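The architecture selection described in 1b42975e94 can be sketched roughly as below. Both helper functions are hypothetical and only mirror the rules stated in the message; the real logic lives in the build system, and treating toolkits newer than 13 like CUDA 13 is an assumption.

```cpp
#include <string>
#include <vector>

// Lowest architecture to target for the OptiX PTX kernel, per the
// commit message: compute_75 with CUDA 13, compute_50 with earlier
// toolkits. Applying this to versions above 13 is an assumption.
inline std::string optix_ptx_arch(int cuda_major)
{
  return cuda_major >= 13 ? "compute_75" : "compute_50";
}

// Hypothetical filter: skip sm_5X, sm_6X and sm_70 when the toolkit
// is CUDA 13 (or newer), keep everything for older toolkits.
inline std::vector<int> buildable_archs(const std::vector<int> &requested, int cuda_major)
{
  std::vector<int> result;
  for (const int sm : requested) {
    const bool dropped = (cuda_major >= 13) && (sm <= 70);
    if (!dropped) {
      result.push_back(sm);
    }
  }
  return result;
}
```

For example, `buildable_archs({50, 61, 70, 75, 86}, 13)` keeps only 75 and 86, while the same request with CUDA 12 keeps all five.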
Michael Jones
193e22ee7e Refactor: Cycles: Simplify Metal backend with direct bindless resource encoding
This re-applies pull request #140671, but with a fix for #144713 where the
non-pointer part of IntegratorStateGPU was not initialized.

This PR is a more extensive follow on from #123551 (removal of AMD and Intel
GPU support).

All supported Apple GPUs have Metal 3 and tier 2 argument buffer support.
The invariant resource properties `gpuAddress` and `gpuResourceID` can be
written directly into GPU structs once at setup time rather than once per
dispatch. More background info can be found in this article:
https://developer.apple.com/documentation/metal/improving-cpu-performance-by-using-argument-buffers?language=objc

Code changes:
- All code relating to `MTLArgumentEncoder` is removed
- `KernelParamsMetal` updates are directly written into
  `id<MTLBuffer> launch_params_buffer` which is used for the "static"
  dispatch arguments
- Dynamic dispatch arguments are small enough to be encoded using the
  `MTLComputeCommandEncoder.setBytes` function, eliminating the need for
  cycling temporary arg buffers

Fix #144713

Co-authored-by: Brecht Van Lommel <brecht@noreply.localhost>
Pull Request: https://projects.blender.org/blender/blender/pulls/145175
2025-08-27 13:58:30 +02:00
Brecht Van Lommel
98e9dd1aa2 Revert "Cycles: Simplify Metal backend with direct bindless resource encoding"
This reverts commit b4be954856.

It is causing render artifacts in the barbershop benchmark. There were some
conflicts to resolve when reverting this, mainly related to the removal of
3D textures.

Fix #144713
Ref #140671, #144712

Pull Request: https://projects.blender.org/blender/blender/pulls/144880
2025-08-20 20:53:40 +02:00
Weizhen Huang
a4f8e0bfa2 Cycles: Use RGBE for denoised guiding buffers to reduce memory usage
Co-authored-by: Brecht Van Lommel <brecht@blender.org>
2025-08-13 10:28:50 +02:00
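For readers unfamiliar with RGBE, the sketch below shows the classic shared-exponent encoding (three 8-bit mantissas plus one 8-bit exponent, 4 bytes instead of 12 for three floats). It is a generic illustration assuming non-negative inputs below 2^127; the exact variant and rounding used in a4f8e0bfa2 are not stated in the message and may differ.

```cpp
#include <algorithm>
#include <array>
#include <cmath>
#include <cstdint>

using RGBE = std::array<uint8_t, 4>;

// Encode non-negative RGB into three 8-bit mantissas and a shared exponent.
inline RGBE rgbe_encode(float r, float g, float b)
{
  const float v = std::max({r, g, b});
  if (v < 1e-32f) {
    return {0, 0, 0, 0};
  }
  int exponent;
  // v == mantissa * 2^exponent, with mantissa in [0.5, 1).
  const float mantissa = std::frexp(v, &exponent);
  const float scale = mantissa * 256.0f / v;
  return {uint8_t(r * scale), uint8_t(g * scale), uint8_t(b * scale), uint8_t(exponent + 128)};
}

// Decode back to floats; precision is limited to 8 bits per channel.
inline std::array<float, 3> rgbe_decode(const RGBE &c)
{
  if (c[3] == 0) {
    return {0.0f, 0.0f, 0.0f};
  }
  const float f = std::ldexp(1.0f, int(c[3]) - (128 + 8));
  return {c[0] * f, c[1] * f, c[2] * f};
}
```

Values that are exact multiples of the shared step, such as (1.0, 0.5, 0.25), survive the round trip unchanged; other values are truncated to the nearest representable step.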
Weizhen Huang
5cb6014efd Cycles: Volume Scattering Probability Guiding
Guide the probability to scatter in or transmit through the volume.
Only applied for primary rays.

Co-authored-by: Brecht Van Lommel <brecht@blender.org>
2025-08-13 10:28:50 +02:00
Weizhen Huang
8c36f9ce49 Cycles: Compute volume transmittance using telescoping 2025-08-13 10:28:50 +02:00
Weizhen Huang
b2b2d9a4f3 Cycles: Render volume by ray marching through octrees
One octree is built per volume per shader, based on the density. This is
in preparation for null scattering.
2025-08-13 10:28:50 +02:00
Brecht Van Lommel
dce6269d1f Fix #143714: Cycles OptiX fails to render linear and ribbon curves together
This case was not accounted for previously, but is now possible when
the new curves object has curves with type poly.

Pull Request: https://projects.blender.org/blender/blender/pulls/144087
2025-08-11 19:36:26 +02:00
Weizhen Huang
1667d69d3b Cleanup: Cycles: use constexpr in kernel
instead of lambda and macro guard. Should be possible after ce0ae95ed3

Pull Request: https://projects.blender.org/blender/blender/pulls/143723
2025-08-01 14:06:13 +02:00
Hugh Delaney
930a942dd0 Refactor: Cycles: Move block sizes into common header
This change puts all the block size macros in the same common header, so
they can be included in host side code without needing to also include
the kernels that are defined in the device headers that contained these
values.

This change also removes a magic number used to enqueue a kernel, which
happened to agree with the GPU_PARALLEL_SORT_BLOCK_SIZE macro.

Pull Request: https://projects.blender.org/blender/blender/pulls/143646
2025-08-01 13:26:02 +02:00
Patrick Mours
6487395fa5 Cycles: Add linear curve shape
Add new "Linear 3D Curves" option in the Curves panel in the render
properties. This renders curves as linear segments rather than smooth
curves, for faster render time at the cost of accuracy.

On NVIDIA Blackwell GPUs, this can give a 6x speedup compared to smooth
curves, due to hardware acceleration. On NVIDIA Ada there is still
a 3x speedup, and CPU and other GPU backends will also render this
faster.

A difference with smooth curves is that these have end caps, as this
was simpler to implement and they are usually helpful anyway.

In the future this functionality will also be used to properly support
the CURVE_TYPE_POLY on the new curves object.

Pull Request: https://projects.blender.org/blender/blender/pulls/139735
2025-07-29 17:05:01 +02:00
Michael Jones
f3485cc925 Cycles: MetalRT: Only use extended limits if needed (revisited)
Currently MetalRT renders always use extended limits, which is needed to correctly render scenes where the max primitive count can exceed 2^28 or the instance count can exceed 2^24. This patch adopts Metal best practices of only enabling this flag if it is needed.

This PR is similar to #133364, but there are some notable differences:

1) The old PR made an overly optimistic assumption that all the relevant visibility bits could be squeezed into 8 bits. This new PR adopts the same approach that OptiX takes of using 8 bits as a primary HW filter, and checking the full 32-bit mask inside the SW intersection handler.

~~2) I moved the scene scanning check from Scene into MetalDevice. This avoids platform specific details leaking into platform agnostic areas.~~

~~3) In live viewport mode, we always use extended limits in case we tip over the threshold.~~

_EDIT:_
2) The limits are scanned in `Scene::update_kernel_features`, and given to the device by a new `set_bvh_limits` method which returns true if the BVH and kernels need to be reloaded.

Pull Request: https://projects.blender.org/blender/blender/pulls/142401
2025-07-24 13:27:20 +02:00
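The two-stage visibility test described in f3485cc925 can be sketched as follows. The folding scheme (OR-ing the four bytes of the mask) is a hypothetical choice made for this example, not necessarily how Cycles derives its 8-bit mask; the only property relied on is that the coarse filter never rejects a hit that the full 32-bit test would accept.

```cpp
#include <cstdint>

// Fold a 32-bit visibility mask into 8 bits by OR-ing its four bytes.
// Hypothetical mapping. It is conservative: if two 32-bit masks share
// a bit, their folded masks share a bit as well.
inline uint8_t fold_visibility(uint32_t mask)
{
  return uint8_t((mask | (mask >> 8) | (mask >> 16) | (mask >> 24)) & 0xFFu);
}

// Stage 1: coarse 8-bit test, standing in for the hardware mask filter.
inline bool hw_filter_accepts(uint32_t ray_visibility, uint32_t object_visibility)
{
  return (fold_visibility(ray_visibility) & fold_visibility(object_visibility)) != 0;
}

// Stage 2: exact 32-bit test, as done inside a software intersection handler.
inline bool sw_filter_accepts(uint32_t ray_visibility, uint32_t object_visibility)
{
  return (ray_visibility & object_visibility) != 0;
}

// A hit counts only if it passes both stages.
inline bool hit_accepted(uint32_t ray_visibility, uint32_t object_visibility)
{
  return hw_filter_accepts(ray_visibility, object_visibility) &&
         sw_filter_accepts(ray_visibility, object_visibility);
}
```

A ray with only bit 9 set and an object with only bit 1 set pass the coarse filter (both fold to bit 1) but are rejected by the full 32-bit check, which is why the second stage is needed.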
Thomas Dinges
ce0ae95ed3 Cycles: Bump minimum supported CUDA architecture to sm_50
Pull Request: https://projects.blender.org/blender/blender/pulls/142212
2025-07-21 19:49:21 +02:00
Nikita Sirgienko
9875836519 Cycles: oneAPI: Compile only needed device binaries in multi-GPU case
Before this modification, the "oneapi_load_kernels" function loaded
kernels, compiling them if needed, for all devices in the associated
GPU context. This makes sense when rendering with a single GPU or
with multiple identical GPUs. However, when Blender users render
with several different GPUs, the previous implementation would
compile all kernels for all devices once for each device,
unnecessarily doing the same work multiple times. Because of this,
I am changing the implementation so that compilation now happens
only for the device in use, once per used device, ensuring that no
unnecessary work is done.

No render performance changes are expected.
2025-07-19 14:15:36 +02:00
Michael Jones
8077384e3a Cycles: Improve Metal kernel specialisation
This improves the existing scene specialisation mechanism by replacing "kernel_data.kernel_features" with a function constant. It doesn't cause any additional compilation requests, but allows the backend compiler to eliminate more dead code. An additional compiler hint is provided for dead-stripping "volume_stack_enter_exit" which results in slightly faster rendering of non-volumetric scenes.

Pull Request: https://projects.blender.org/blender/blender/pulls/142235
2025-07-18 11:18:43 +02:00
Brecht Van Lommel
4c25b49875 Refactor: Cycles: Deduplicate 3D texture sampling between devices
Pull Request: https://projects.blender.org/blender/blender/pulls/132908
2025-07-09 21:04:38 +02:00
Brecht Van Lommel
b6c4233b28 Refactor: Cycles: Remove now unused 3D image texture support
Pull Request: https://projects.blender.org/blender/blender/pulls/132908
2025-07-09 21:04:38 +02:00
Brecht Van Lommel
7978799e6f Cycles: Always render volume as NanoVDB
All GPU backends now support NanoVDB, using our own kernel side code
that is easily portable. This simplifies kernel and device code.

Volume bounds are now built from the NanoVDB grid instead of OpenVDB,
to avoid having to keep around the OpenVDB grid after loading.

While this reduces memory usage, it does have a performance impact,
particularly for the Cubic filter. That will be addressed by
another commit.

Pull Request: https://projects.blender.org/blender/blender/pulls/132908
2025-07-09 21:04:38 +02:00
Michael Jones
b4be954856 Cycles: Simplify Metal backend with direct bindless resource encoding
This PR is a more extensive follow on from #123551 (removal of AMD and Intel GPU support).

All supported Apple GPUs have Metal 3 and tier 2 argument buffer support. The invariant resource properties `gpuAddress` and `gpuResourceID` can be written directly into GPU structs once at setup time rather than once per dispatch. More background info can be found in [this article](https://developer.apple.com/documentation/metal/improving-cpu-performance-by-using-argument-buffers?language=objc).

Code changes:
- All code relating to `MTLArgumentEncoder` is removed
- `KernelParamsMetal` updates are directly written into `id<MTLBuffer> launch_params_buffer` which is used for the "static" dispatch arguments
- Dynamic dispatch arguments are small enough to be encoded using the `MTLComputeCommandEncoder.setBytes` function, eliminating the need for cycling temporary arg buffers

Pull Request: https://projects.blender.org/blender/blender/pulls/140671
2025-07-08 23:20:16 +02:00
Weizhen Huang
2f7797dd4d Merge branch 'blender-v4.5-release' 2025-06-20 14:20:00 +02:00
weizhen
bf9836da65 Fix: Cycles not building with OptiX 9.0
As suggested by @pmoursnv

Was throwing errors like `identifier "half" is undefined`.

Pull Request: https://projects.blender.org/blender/blender/pulls/140676
2025-06-20 14:19:43 +02:00
Brecht Van Lommel
7f380e0644 Revert "Fix: Cycles: Do not count volume bounds bounce as transparent"
This reverts commit 23c762e388 in the
blender-v4.5-release branch to work around HIP compiler issues. It will
remain in the main branch.

Ref blender/blender#139836
2025-06-11 15:47:07 +02:00
Brecht Van Lommel
04e325029f Revert "Cycles: Guiding cleaning up and refactoring the guiding code"
This reverts commit 5abf42012d in the
blender-v4.5-release branch to work around HIP compiler issues. It will
remain in the main branch.

Ref blender/blender#139836
2025-06-11 15:47:06 +02:00
Brecht Van Lommel
501b4641f6 Revert "Cleanup: Unused arguments in Cycles kernel"
This reverts commit 0e7a696819 in the
blender-v4.5-release branch to work around HIP compiler issues. It will
remain in the main branch.

Ref blender/blender#139836
2025-06-11 15:47:06 +02:00
Campbell Barton
07121d44ae Cleanup: use braces (follow own style guide) 2025-06-11 09:05:26 +00:00
Lukas Stockner
39d7576844 Cycles: Switch OptiX OSL to use LLVM bitcode for shadeops
This is required to make ray differentials work correctly for OSL custom
cameras.

But it also lets us simplify the implementation, and makes the OSL
functionality more complete, such as implementing all noise types.

Pull Request: https://projects.blender.org/blender/blender/pulls/138161
2025-06-03 20:12:07 +02:00
Nikita Sirgienko
69091c5028 Cycles: Show device optimizations status in preferences for oneAPI
With these changes, we can now mark devices which are expected to perform as
well as possible, and devices which were not optimized for some reason.

A device may be unoptimized because, for example, it was released after the
Blender release, making it impossible for developers to optimize for it in
already released, unchangeable code. This is primarily relevant for the LTS
versions, which are supported for two years and require proper communication
about optimization status for the new devices released during this time.

This is implemented for oneAPI devices. Other device types currently are
marked as optimized for compatibility with old behavior, but may implement
the same in the future.

Pull Request: https://projects.blender.org/blender/blender/pulls/139751
2025-06-03 20:07:52 +02:00
Brecht Van Lommel
0e7a696819 Cleanup: Unused arguments in Cycles kernel
And add back the compiler flag that hid them.

Pull Request: https://projects.blender.org/blender/blender/pulls/139497
2025-05-27 21:30:45 +02:00
Sebastian Herholz
5abf42012d Cycles: Guiding cleaning up and refactoring the guiding code
In detail:
- Direct accesses of state attributes are replaced with the INTEGRATOR_STATE and INTEGRATOR_STATE_WRITE macros.
- Unified the checks for the __PATH_GUIDING__ define to use #  if defined (__PATH_GUIDING__).
- Even if __PATH_GUIDING__ is defined, we now check if the feature is enabled using if ((kernel_data.kernel_features & KERNEL_FEATURE_PATH_GUIDING)) {. This is important for later GPU ports.
- The kernel usage of the guiding field, surface, and volume sampling distributions is wrapped behind macros for each specific device (atm only CPU). This will make it easier for a GPU port later.
2025-05-22 13:46:30 +02:00
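The combination of a compile-time define and a runtime feature check described in 5abf42012d can be sketched as below. The `KernelData` struct, the flag value, and the `guiding_active` helper are simplified, hypothetical stand-ins for the real Cycles kernel types.

```cpp
#include <cstdint>

// Compile-time switch; in a real build this would come from the build
// configuration rather than being defined in the source file.
#define __PATH_GUIDING__

// Hypothetical stand-ins for the Cycles kernel types and feature flags.
enum KernelFeature : uint32_t {
  KERNEL_FEATURE_PATH_GUIDING = 1u << 0,
};

struct KernelData {
  uint32_t kernel_features;
};

// Guiding code runs only if it was compiled in AND the scene enabled it.
inline bool guiding_active(const KernelData &kernel_data)
{
#if defined(__PATH_GUIDING__)
  if ((kernel_data.kernel_features & KERNEL_FEATURE_PATH_GUIDING)) {
    return true;
  }
#endif
  (void)kernel_data; /* Avoid an unused warning when guiding is compiled out. */
  return false;
}
```

The runtime check lets a single compiled kernel skip guiding work for scenes that do not use it, which is the property the message calls important for later GPU ports.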
Nikita Sirgienko
54766b6a54 Cycles: Introducing the code for adoption of Embree 4.4
Embree 4.4 introduces an improvement in the Embree GPU
implementation by dropping shared memory usage in favor
of direct controllable memory transfers. This should allow
addressing several problems spotted in Blender regarding
multithreading and memory corruption when BVH and rendering
happen at the same time. However, to implement such
improvements, the API has changed for several functions, and
this commit adapts Blender code to these changes, making Blender
buildable and functional with all existing Embree 4.X
versions, before and after 4.4.

No functional changes in Blender behavior are expected if
using Embree versions below 4.4.

Pull Request: https://projects.blender.org/blender/blender/pulls/139061
2025-05-19 11:25:50 +02:00
Brecht Van Lommel
2c99edbffa Cycles: Bump Embree minimum version to 4.0.0
The build is already failing with Embree 3, as noticed in #137556.
And Embree 4 was released 2 years ago.

Pull Request: https://projects.blender.org/blender/blender/pulls/138221
2025-04-30 19:50:14 +02:00
Lukas Stockner
0dc4754da4 Cycles: Move OptiX OSL Camera kernel into its own PTX module
On the one hand, this improves initialization time since we don't need to
load/compile the full OSL module with all the shading logic if we're only
using a custom camera with SVM shading.

On the other hand, it also fixes a bug I noticed while preparing test scenes:
The AO and Bevel nodes don't work when using custom cameras with SVM on OptiX.

The issue there is that those two are handled by the SHADE_SURFACE_RAYTRACE
kernel, but since that one has intersection logic, we use the OptiX-specific
kernel even if OSL shading is disabled.
However, with the previous unified OSL module, this would mean loading
SHADE_SURFACE_RAYTRACE from kernel_osl.cu, which has `#undef __SVM__` and
therefore doesn't handle them correctly.

With this change, we'll use the kernels from kernel_shader_raytrace.cu in that
case, which do support SVM nodes just fine.

Disk usage of the new kernel_optix_osl_camera.ptx.zst file is 30KB, so this
also doesn't blow up the kernel disk size (and kernel_optix_osl.ptx.zst is
probably smaller by that amount now).

Since it seems that we can mix modules just fine, I'm suspecting that we could
split the modules properly (intersection, SVM shading with raytracing,
OSL shading, OSL camera), instead of the current approach where modules
essentially correspond to feature set tiers and each includes the previous
one's kernels as well - but that's a separate refactor.

Pull Request: https://projects.blender.org/blender/blender/pulls/138021
2025-04-28 12:49:35 +02:00
Campbell Barton
682e5e3597 Cleanup: spelling in comments (make check_spelling_*) 2025-04-26 00:48:04 +00:00
Lukas Stockner
bf412ed9dd Cycles: Support for custom OSL cameras
This allows users to implement arbitrary camera models using OSL by writing
shaders that take an image position as input and compute ray origin and
direction.

The obvious applications for this are e.g. panorama modes, lens distortion
models and realistic lens simulation, but the possibilities are endless.

Currently, this is only supported on devices with OSL support, so CPU and
OptiX. However, it is independent from the shading model used, so custom
cameras can be used without getting the performance hit of OSL shading.

A few samples are provided as Text Editor templates.

One notable current limitation (in addition to the limited device support)
is that inverse mapping is not supported, so Window texture coordinates and
the Vector pass will not work with custom cameras.

Pull Request: https://projects.blender.org/blender/blender/pulls/129495
2025-04-25 19:27:30 +02:00
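The mapping that a custom camera from bf412ed9dd has to provide (image position in, ray origin and direction out) can be sketched in C++ with a simple pinhole model. This is only an analogy: real custom cameras are written as OSL shaders, and the types, function name, and parameter conventions below are assumptions made for this example.

```cpp
#include <cmath>

struct Vec3 {
  float x, y, z;
};

struct Ray {
  Vec3 origin;
  Vec3 direction;
};

// Map an image position (u, v) in [0, 1]^2 to a camera-space ray.
// The camera sits at the origin and looks down the negative Z axis.
// `fov_radians` is the horizontal field of view, `aspect` is width / height.
inline Ray pinhole_camera(float u, float v, float fov_radians, float aspect)
{
  const float half_width = std::tan(0.5f * fov_radians);
  const float px = (2.0f * u - 1.0f) * half_width;
  const float py = (2.0f * v - 1.0f) * half_width / aspect;
  // Normalize the vector (px, py, -1).
  const float inv_len = 1.0f / std::sqrt(px * px + py * py + 1.0f);
  return {{0.0f, 0.0f, 0.0f}, {px * inv_len, py * inv_len, -inv_len}};
}
```

Panorama or lens distortion models would replace the body of this function with a different mapping while keeping the same inputs and outputs.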
Weizhen Huang
23c762e388 Fix: Cycles: Do not count volume bounds bounce as transparent
In forward path tracing, when we pass volume bounding meshes, we
accumulate `volume_bounds_bounce`. We should match this behaviour in NEE
instead of accumulating `transparent_bounce`.

Pull Request: https://projects.blender.org/blender/blender/pulls/137556
2025-04-24 13:10:33 +02:00
Sergey Sharybin
36559fd89f Fix #136811: HIP-RT performance regression in 4.5
Reduce the register pressure and branching in the switch() by using
subclass and cast from void* to the base class.

This ensures intersection functions are not inlined multiple times,
bringing performance back.

An alternative could be to avoid functions (they are quite large), but
that only partially resolves the performance regression.

Pull Request: https://projects.blender.org/blender/blender/pulls/136823
2025-04-01 17:59:44 +02:00
Campbell Barton
42ad772a1f Cleanup: spelling & repeated terms (make check_spelling_*)
Also use comment blocks for English text.
2025-03-27 01:13:34 +00:00
Sergey Sharybin
2ab231d802 Refactor: Pass proper KernelGlobals
HIP-RT functions do have access to kg, and it was used inconsistently:
some functions were passed the actual kg, others were passed nullptr.

This change makes it consistent and passes kg everywhere.

Pull Request: https://projects.blender.org/blender/blender/pulls/136503
2025-03-26 11:07:06 +01:00
Sergey Sharybin
709371b278 Refactor: Avoid creation of local copy of RaySelfPrimitives 2025-03-26 11:07:04 +01:00
Sergey Sharybin
888c7e1df9 Cleanup: Avoid redundant data fetch 2025-03-26 11:07:04 +01:00
Sergey Sharybin
3d882acee2 Cleanup: Else after return 2025-03-26 11:07:04 +01:00
Sergey Sharybin
b2dd523d0d Cleanup: Avoid default hit initialization
The entire object is assigned later on, no need to initialize it.
2025-03-26 11:07:04 +01:00
Sergey Sharybin
323e27d825 Cleanup: Remove redundant assignment
The payload stores pointers, no need to restore pointer
of the function argument to the same value.
2025-03-26 11:07:04 +01:00
Sergey Sharybin
e92a8042c3 Refactor: Payload for shadow intersection and filter in HIP-RT
The code before this change was relying on the ShadowPayload having
the same "header" as RayPayload for some of the primitive types
(curve, motion triangle, point): intersection functions were shared
between "regular" and shadow rays (shadow in this case is shadow_all),
but extra filter function was used for shadow rays.

This is fragile if someone changes one of these structures. What is
worse is that the compiler might actually decide to shuffle things in
some structs, or remove unused fields.

This change also solves confusion about ShadowPayload::prim_type
seemingly only being assigned to PRIMITIVE_NONE. With time it is
not impossible that the compiler will also see this, and constant-fold
some checks, or even remove the field. If that happens then the
render result will be wrong. Maybe it is already happening as there
are some GPU, driver, and optimization flag specific bugs in the
area.

It is unclear whether it was causing any actual problem: W7800
seems to render all hair correctly on Linux.
2025-03-26 11:07:04 +01:00
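The implicit "same header" assumption that e92a8042c3 moves away from can be made visible with compile-time layout checks. The struct definitions below are hypothetical, not the actual HIP-RT payloads in Cycles, and the static assertions are a generic guard shown for illustration rather than what the commit itself does.

```cpp
#include <cstddef>
#include <cstdint>
#include <type_traits>

// Hypothetical payload layouts, NOT the actual Cycles structs.
struct RayPayload {
  const void *kg;
  uint32_t self_prim;
  uint32_t visibility;
  int prim_type;
  float ray_time;
};

struct ShadowPayload {
  const void *kg;
  uint32_t self_prim;
  uint32_t visibility;
  int prim_type;
  float ray_time;
  int max_hits;
  int num_hits;
};

// Code that treats a ShadowPayload as if it started with a RayPayload
// silently depends on these offsets matching. The checks below turn a
// layout change into a compile error instead of a wrong render.
static_assert(std::is_standard_layout<RayPayload>::value &&
                  std::is_standard_layout<ShadowPayload>::value,
              "offsetof requires standard-layout types");
static_assert(offsetof(RayPayload, prim_type) == offsetof(ShadowPayload, prim_type),
              "prim_type moved: shared intersection code would read the wrong field");
static_assert(offsetof(RayPayload, ray_time) == offsetof(ShadowPayload, ray_time),
              "ray_time moved: shared intersection code would read the wrong field");
```

Reordering a field in only one of the two structs makes the build fail, which shows how easily the shared-header assumption breaks when either structure is edited.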
Sergey Sharybin
cdb3f34944 Cleanup: Use full name for the primitive_type
Makes it extra clear locally what kind of type the variable contains:
primitive, ray, or something else.
2025-03-26 11:07:04 +01:00
Sergey Sharybin
72542f3bb4 Cleanup: Follow Blender style and use more const
Also make some style decisions more consistent: for example,
the way the stop/continue search return value is commented.
Prefer less vertical space for those.
2025-03-26 11:07:04 +01:00
Sergey Sharybin
bf9c95f164 Cleanup: Move payload type cast to caller in HIP-RT
Mainly for readability purposes:
- Having variables called local_payload is ambiguous: does it refer to
  the LocalPayload type or to a variable being local to a function?
- Some of the functions are used for different ray types, so having the
  type cast in intersectFunc and filterFunc makes it easier to scan.

For the latter: now it is more obvious that Curve_Intersect_Shadow
expects RayPayload, but Curve_Filter_Shadow expects ShadowPayload.
It might not be a problem currently as ShadowPayload has the same
"header" as RayPayload, but it might change in the future. Also, the
compiler might optimize fields out from one but not from the other.
2025-03-26 11:07:04 +01:00