Commit Graph

3938 Commits

Author SHA1 Message Date
Sergey Sharybin
15fd8ad7a1 Fix: Cycles linear curves on Metal-RT
Metal-RT implementation for curve intersect has an additional self
intersection check happening in curve_ribbon_accept(). It is done
for all curve types that has PRIMITIVE_CURVE_RIBBON bit set on them,
including Thick Linear curves. However, the logic in the function is
hardcoded to handle flat ribbon curves with the Catmull Rom basis.

This change makes it so curve_ribbon_accept() is only called for the
ribbon curve type, not when type has ribbon bit set.

Additionally, other places where curve type was checked as a bitmask
were fixed.

Ref #146072

Pull Request: https://projects.blender.org/blender/blender/pulls/146140
2025-09-12 14:16:09 +02:00
Campbell Barton
3c7f4edd92 Cleanup: spelling in comments & string
Also back-tick quote literals in CMakeLists files.
2025-09-06 09:27:54 +10:00
Amogh Shivaram
11d98c14b7 Fix #144258: Cycles: Subsurface scattering doesn't work with shadow linking
When shadow linking is enabled, `intersect_dedicated_light` is scheduled even
if the `PATH_RAY_SUBSURFACE` flag is set. This checks the flag and schedules
`intersect_subsurface` instead.

Pull Request: https://projects.blender.org/blender/blender/pulls/145621
2025-09-05 15:31:50 +02:00
Brecht Van Lommel
9856615813 Color Management: Change byte color attributes to always be sRGB
These don't really work as scene linear with sRGB transfer function for e.g.
ACEScg, there are not enough bits. If you want wide gamut you need to use
float colors.

Pull Request: https://projects.blender.org/blender/blender/pulls/145763
2025-09-05 11:11:33 +02:00
Patrick Mours
b4bb075285 Cycles: Flip image vertically before passing to OptiX denoiser to improve result quality
Experiments have shown that the OptiX denoiser performs best when
operating on images that have their origin at the top-left corner,
while Blender renders with the origin at the bottom-left corner.
Simply flipping the image vertically before and after denoising is a
relatively trivial operation, so this patch introduces this as an
additional preprocessing and postprocessing step for denoising when the
OptiX denoiser is used. Additionally, this patch also removes an unused
helper function, now that OptiX 8.0 is the minimum.

Pull Request: https://projects.blender.org/blender/blender/pulls/145358
2025-09-04 16:04:23 +02:00
Nikita Sirgienko
a984114d5e Cleanup: oneAPI: Fix warnings about unused variables
No performance or functional changes are expected
2025-09-03 11:01:20 +02:00
Campbell Barton
b2abb81b65 Cleanup: repeated word 2025-09-03 17:53:27 +10:00
Brecht Van Lommel
f49b3dabf1 Fix #144786: Cycles curve thickness missing transform in viewport
This should be in world space, like point radius and most other shader nodes.

Pull Request: https://projects.blender.org/blender/blender/pulls/144802
2025-09-02 19:01:50 +02:00
Brecht Van Lommel
c693e72841 Fix #145254: Cycles normal map node strength wrong results with displacement
Can not use tangent space interpolation in this case, as the tangent space
is for the undisplaced normal.

Pull Request: https://projects.blender.org/blender/blender/pulls/145273
2025-09-02 17:50:41 +02:00
Sergey Sharybin
03003365cc Cycles: Switch to HIP SDK 6.4.2 on Windows
This also reverts 367d5b7eabd53229fb7e79465b4761e65e531741,
as the math flags workaround is no longer needed.

Fix #139796
Fix #138646
Fix #139071
Fix #139070

Ref #140278

Co-authored-by: Brecht Van Lommel <brecht@blender.org>
Pull Request: https://projects.blender.org/blender/blender/pulls/145311
2025-08-29 12:49:11 +02:00
Weizhen Huang
f77881a795 Fix #144918: World volume with zero density not rendering correctly
OneAPI has some problem with `exp(-0 * FLT_MAX)`.

Pull Request: https://projects.blender.org/blender/blender/pulls/144979
2025-08-28 10:41:33 +02:00
Patrick Mours
1b42975e94 Cycles: Add support for building with CUDA 13.0 and OptiX 9.0
The compiler in the CUDA 13 toolkit dropped support for Maxwell, Pascal and Volta architectures (sm_5X, sm_6X and sm_70), which affects both CUDA and OptiX kernel compilation for Cycles. This patch makes it so building CUDA kernel binaries for those architectures are skipped when CUDA 13 is used, but it will still build them if there is a CUDA 11 toolkit available (e.g. on buildbot), like how things are handled for other architectures. The OptiX PTX kernel is compiled with the minimum architecture available (compute_75 with CUDA 13, compute_50 with previous CUDA versions).

In addition, loading the PTX kernel after initializing OptiX version 9.0 would fail with a OPTIX_ERROR_INVALID_FUNCTION_USE, due to the use of "optixTrace" within direct callables (as part of the AO and bevel SVM nodes). Starting with OptiX 9.0 this is no longer allowed, rather one has to use "optixTraverse" in those cases. This patch thus changes the affected intersection routines to use "optixTraverse". As a side effect it also simplifies the `scene_intersect_shadow` function, which no longer invokes the closest hit program, and can just quickly return hit status. The minimum OptiX version Cycles requires is already 8.0, which supports "optixTraverse", so it can just be applied always.

Finally, this patch also adds the `--split-compile=0` argument to nvcc when available, which tells the compiler to internally split the module into pieces that can be processed in parallel on multiple threads (the `=0` notes to use as many threads as there are CPU cores), which can greatly improving compile times, while not making compromises on performance.

Pull Request: https://projects.blender.org/blender/blender/pulls/145130
2025-08-27 14:28:01 +02:00
Michael Jones
193e22ee7e Refactor: Cycles: Simplify Metal backend with direct bindless resource encoding
This re-applies pull request #140671, but with a fix for #144713 where the
non-pointer part of IntegratorStateGPU was not initialized.

This PR is a more extensive follow on from #123551 (removal of AMD and Intel
GPU support).

All supported Apple GPUs have Metal 3 and tier 2 argument buffer support.
The invariant resource properties `gpuAddress` and `gpuResourceID` can be
written directly into GPU structs once at setup time rather than once per
dispatch. More background info can be found in this article:
https://developer.apple.com/documentation/metal/improving-cpu-performance-by-using-argument-buffers?language=objc

Code changes:
- All code relating to `MTLArgumentEncoder` is removed
- `KernelParamsMetal` updates are directly written into
  `id<MTLBuffer> launch_params_buffer` which is used for the "static"
  dispatch arguments
- Dynamic dispatch arguments are small enough to be encoded using the
  `MTLComputeCommandEncoder.setBytes` function, eliminating the need for
  cycling temporary arg buffers

Fix #144713

Co-authored-by: Brecht Van Lommel <brecht@noreply.localhost>
Pull Request: https://projects.blender.org/blender/blender/pulls/145175
2025-08-27 13:58:30 +02:00
Lukas Tönne
12f0bc7736 Fix #138388: Use grid voxel corners as value locations like OpenVDB
Blender grid rendering interprets voxel transforms in such a way that the voxel
values are located at the center of a voxel. This is inconsistent with OpenVDB
where the values are located at the lower corners for the purpose or sampling
and related algorithms.

While it is possible to offset grids when communicating with the OpenVDB
library, this is also error-prone and does not add any major advantage.
Every time a grid is passed to OpenVDB we currently have to take care to
transform by half a voxel to ensure correct sampling weights are used that match
the density displayed by the viewport rendering.

This patch changes volume grid generation, conversion, and rendering code so
that grid transforms match the corner-located values in OpenVDB.

- The volume primitive cube node aligns the grid transform with the location of
  the first value, which is now also the same as min/max bounds input of the
  node.
- Mesh<->Grid conversion does no longer require offsetting grid transform and
  mesh vertices respectively by 0.5 voxels.
- Texture space for viewport rendering is offset by half a voxel, so that it
  covers the same area as before and voxel centers remain at the same texture
  space locations.

Co-authored-by: Brecht Van Lommel <brecht@blender.org>
Pull Request: https://projects.blender.org/blender/blender/pulls/138449
2025-08-26 12:27:20 +02:00
Brecht Van Lommel
1d9bd460fc Fix #144814: Cycles OSL crash accessing geom:name string attribute
This should be a ustring hash now, not a ustring.

Pull Request: https://projects.blender.org/blender/blender/pulls/144881
2025-08-20 21:00:12 +02:00
Brecht Van Lommel
98e9dd1aa2 Revert "Cycles: Simplify Metal backend with direct bindless resource encoding"
This reverts commit b4be954856.

It is causing render artifacts in the barbershop benchmark. There were some
conflicts to resolve when reverting this, mainly related to the removal of
3D textures.

Fix #144713
Ref #140671, #144712

Pull Request: https://projects.blender.org/blender/blender/pulls/144880
2025-08-20 20:53:40 +02:00
Brecht Van Lommel
f41a0d5ab9 Fix: Cycles OptiX + OSL fails to render images with OSL releases
It works with the beta we are using to build Blender 4.5, but the official
release is a bit different. This fix was tested to work with OSL 1.14.7.

Thanks to Paul Zander for finding the OSL commit that lead to this.

Pull Request: https://projects.blender.org/blender/blender/pulls/144715
2025-08-19 13:22:07 +02:00
Weizhen Huang
df496eb894 Cycles: use one-tap stochastic interpolation for volume
It has ~1.2x speed-up on CPU and ~1.5x speed-up on GPU (tested on Metal
M2 Ultra).

Individual samples are noisier, but equal time renders are mostly
better.

Note that volume emission renders differently than before.

Pull Request: https://projects.blender.org/blender/blender/pulls/144451
2025-08-14 15:22:44 +02:00
Weizhen Huang
0c371ca3c5 Cycles: use deterministic linear interpolation for velocity
Cubic is too costly, stochastic interpolation is inaccurate.
2025-08-14 15:22:43 +02:00
Weizhen Huang
6eb7075fa1 Fix: Cycles: lcg_state uninitialized before volume density baking
_No response_

Pull Request: https://projects.blender.org/blender/blender/pulls/144489
2025-08-13 16:10:06 +02:00
Weizhen Huang
d717c78ca4 Revert "Cycles: Store octree parent nodes in a stack"
This reverts commit bccad10b3be75deb0825b9234087e613678af407.
The stack approach seems slower

Pull Request: https://projects.blender.org/blender/blender/pulls/134460
2025-08-13 10:28:53 +02:00
Weizhen Huang
146ac0d9fe Cycles: Store octree parent nodes in a stack 2025-08-13 10:28:50 +02:00
Weizhen Huang
ed48905b41 Cycles: Use analytic formula for homogeneous volume 2025-08-13 10:28:50 +02:00
Weizhen Huang
a4f8e0bfa2 Cycles: Use RGBE for denoised guiding buffers to reduce memory usage
Co-authored-by: Brecht Van Lommel <brecht@blender.org>
2025-08-13 10:28:50 +02:00
Weizhen Huang
5cb6014efd Cycles: Volume Scattering Probability Guiding
Guide the probability to scatter in or transmit through the volume.
Only applied for primary rays.

Co-authored-by: Brecht Van Lommel <brecht@blender.org>
2025-08-13 10:28:50 +02:00
Weizhen Huang
a7283fc1d5 Cycles: Shade volume with null scattering
The distance sampling is mostly based on weighted delta tracking from
[Monte Carlo Methods for Volumetric Light Transport Simulation]
(http://iliyan.com/publications/VolumeSTAR/VolumeSTAR_EG2018.pdf).

The recursive Monte Carlo estimation of the Radiative Transfer Equation is
\[\langle L \rangle=\frac{\bar T(x\rightarrow y)}{\bar p(x\rightarrow
y)}(L_e+\sigma_s L_s + \sigma_n L).\]
where \(\bar T(x\rightarrow y) = e^{-\bar\sigma\Vert x-y\Vert}\) is the
majorant transmittance between points \(x\) and \(y\), \(p(x\rightarrow
y) = \bar\sigma e^{-\bar\sigma\Vert x-y\Vert}\) is the probability of
sampling point \(y\) from point \(x\) following exponential
distribution.

At each recursive step, we randomly pick one of the two events
proportional to their weights:
* If \(\xi < \frac{\sigma_s}{\sigma_s+\vert\sigma_n\vert}\), we sample
scatter event and evaluate \(L_s\).
* Otherwise, no real collision happens and we continue the recursive
process.

The emission \(L_e\) is evaluated at each step.

This also removes some unused volume settings from the UI:

* "Max Steps" is removed, because the step size is automatically specified
by the volume octree. There is a hard-coded threshold `VOLUME_MAX_STEPS`
to prevent numerical issues.
* "Homogeneous" is automatically detected during density evaluation

An option "Unbiased" is added to the UI. When enabled, densities above
the majorant are clamped.
2025-08-13 10:28:50 +02:00
Weizhen Huang
8c36f9ce49 Cycles: Compute volume transmittance using telescoping 2025-08-13 10:28:50 +02:00
Weizhen Huang
b2b2d9a4f3 Cycles: Render volume by ray marching through octrees
One octree per volume per shader based on the density. In preparation
for the null scattering
2025-08-13 10:28:50 +02:00
Weizhen Huang
872528814e Cycles: do not sample direct light when ray segment is invalid
Since we sample the same light for distance sampling and equiangular
sampling, the sample is invalid anyway, so just avoid sampling direct
light for distance sampling too.
2025-08-13 10:28:50 +02:00
Brecht Van Lommel
dce6269d1f Fix #143714: Cycles OptiX fails to render linear and ribbon curves together
This case was not accounted for previously, but is now possible when
the new curves object has curves with type poly.

Pull Request: https://projects.blender.org/blender/blender/pulls/144087
2025-08-11 19:36:26 +02:00
Brecht Van Lommel
2193096106 Cycles: Change normal map node to work with undisplaced normal and tangent
This fits better with the way normal and displacement maps are typically
combined. Previously there was a mixing of displaced normal and undisplaced
tangent, which was broken behavior.

Additionally, to undisplaced_N and undisplaced_tangent attributes must now
always be used to get undisplaced coordinates. The regular N and tangent
attributes now always include displacement.

Ref #142022

Pull Request: https://projects.blender.org/blender/blender/pulls/143109
2025-08-11 12:08:12 +02:00
Campbell Barton
cccc2c77c5 Cleanup: consistent for C-style comment blocks 2025-08-08 07:37:33 +10:00
Campbell Barton
e8501d2f54 Cleanup: grammar corrections, minor improvements to wording
Also back-tick quote some code references in comments
to differentiate them from English text.
2025-08-06 00:20:39 +00:00
Lukas Stockner
793040ad1c Cycles: Improve parameter packing for the Principled BSDF
The Principled BSDF has a ton of inputs, and the previous SVM code just always
allocated stack space for all of them. This results in a ton of additional
NODE_VALUE_x SVM nodes, which slow down execution.

However, this is not really needed for two reasons:
- First, many inputs are only used consitionally. For example, if the
  subsurface weight is zero, none of the other subsurface inputs are used.
- Many of the inputs have a "usual" value that they will have in most
  materials, so if they happen to have that value we can just indicate that
  by not allocating space for them.
  This is a bit similar to the standard "pack the fixed value and provide
  a stack offset if there's a link" pattern, except that the fixed value
  is a constant in the code and we allocate a NODE_VALUE_x if a different
  fixed value is used.

Therefore, this PR re-implements the parameter packing in a more efficient way:
- If we can determine that a component is disabled, all conditional inputs are
  disconnected (to avoid generating upstream nodes).
- If we can determine that a component is disabled, we skip allocating all
  conditional inputs on the stack.
- The inputs for which a reasonable "usual" value exists are changed to
  respect that, and to only be allocated if they differ.
- param1 and param2 (which are fixed-value-packed as on all BSDF nodes) are
  used to store IOR and roughness, which have a decent chance to be fixed
  values.
- The parameter packing is more aggressive about using uchar4, which allows
  to get rid of two SVM nodes while still storing the same inputs.

The result is a considerable speedup in scenes that make heavy use of the
Principled BSDF:

| Scene | CPU speedup | OptiX speedup |
| --- | --- | --- |
| attic | 5% | 9% |
| bistro | 5% | 8% |
| junkshop | 5% | 10% |
| monster | 3% | 4% |
| spring | 1% | 6% |

Pull Request: https://projects.blender.org/blender/blender/pulls/143910
2025-08-04 18:34:58 +02:00
Amogh Shivaram
ff4d840cf8 Cycles: Add polarized Fresnel function for conductors
This PR adds a new `fresnel_conductor_polarized` function, which calculates reflectance and phase shift (if requested) for both parallel and perpendicular polarized light. This is needed for applying thin film iridescence to conductors (see !141131).

For consistency, this PR also makes `fresnel_conductor` call `fresnel_conductor_polarized` instead of using a fast approximation of the Fresnel equations that is inaccurate at lower n and k values. This will change the output of some Metallic BSDF renders using Physical Conductor and prevent discrepancies when enabling thin film iridescence.

I didn't do any rigorous performance testing, but from timing the functions outside of Blender, `fresnel_conductor_polarized` is significantly slower than the approximation, between 1.5-3x depending on the compiler. This makes sense because it has three square roots and the approximation has none. In some informal tests with metallic_multiggx_physical.blend modified to have more spheres, the new renders took around 1-2% longer on both CPU and GPU.

There are some avoidable inefficiencies in this approach of just calling `fresnel_conductor_polarized`:

- one of the three square roots could be saved since `fresnel_conductor` never needs the phase shift and there are simplifications possible when only calculating the reflectance
- there are several unnecessary multiplications by 1.0 since `fresnel_conductor` uses relative IOR and `fresnel_conductor_polarized` doesn't, though those could get optimized out if inlined

Pull Request: https://projects.blender.org/blender/blender/pulls/143903
2025-08-04 15:36:36 +02:00
Lukas Stockner
3107d1f962 Cycles: Improve parameter packing for BSDFs and emission
This replaces `stack_assign` with `stack_assign_if_linked`, which should save a few SVM nodes for constant parameters.

Running benchmarks (all scenes in the benchmark repo, 3 runs, median value for each) shows 1.0% improvement on CPU and 1.5% on OptiX. Not huge, but fairly (all between -0.2% and 3.0%).

Pull Request: https://projects.blender.org/blender/blender/pulls/143404
2025-08-04 15:19:40 +02:00
Campbell Barton
a3bf386d43 Cleanup: use full sentences in text editor code-comments
Also minor improvements, clarifications.
2025-08-02 13:33:05 +10:00
Weizhen Huang
1667d69d3b Cleanup: Cycles: use constexpr in kernel
instead of lambda and macro guard. Should be possible after ce0ae95ed3

Pull Request: https://projects.blender.org/blender/blender/pulls/143723
2025-08-01 14:06:13 +02:00
Campbell Barton
2c27d2be54 Cleanup: grammar corrections, minor improvements to wording 2025-08-01 21:41:24 +10:00
Hugh Delaney
930a942dd0 Refactor: Cycles: Move block sizes into common header
This change puts all the block size macros in the same common header, so
they can be included in host side code without needing to also include
the kernels that are defined in the device headers that contained these
values.

This change also removes a magic number used to enqueue a kernel, which
happened to agree with the GPU_PARALLEL_SORT_BLOCK_SIZE macro.

Pull Request: https://projects.blender.org/blender/blender/pulls/143646
2025-08-01 13:26:02 +02:00
Weizhen Huang
f8eae6b58a Fix: Cycles: Division by zero in Oren-Nayar shader
`Eavg` can still be 1 for very small roughness, causing division by zero
when computing `Ems`.
A roughness of 1e-5 gives an `Evg` of 0.999998, seems reasonable.

Pull Request: https://projects.blender.org/blender/blender/pulls/143637
2025-07-30 16:57:00 +02:00
Patrick Mours
6487395fa5 Cycles: Add linear curve shape
Add new "Linear 3D Curves" option in the Curves panel in the render
properties. This renders curves as linear segments rather than smooth
curves, for faster render time at the cost of accuracy.

On NVIDIA Blackwell GPUs, this can give a 6x speedup compared to smooth
curves, due to hardware acceleration. On NVIDIA Ada there is still
a 3x speedup, and CPU and other GPU backends will also render this
faster.

A difference with smooth curves is that these have end caps, as this
was simpler to implement and they are usually helpful anyway.

In the future this functionality will also be used to properly support
the CURVE_TYPE_POLY on the new curves object.

Pull Request: https://projects.blender.org/blender/blender/pulls/139735
2025-07-29 17:05:01 +02:00
Brecht Van Lommel
f03ac5ec4b Fix #142876: Cycles crash with OSL and interactive updates
Update use_shading, use_camera and the shading system pointers in the same
location, so that when the render is interrupted they are in a consistent state.

The added null pointer checks are not strictly needed, but just in case it
goes out of sync for another reason.

Pull Request: https://projects.blender.org/blender/blender/pulls/143467
2025-07-28 18:43:57 +02:00
Weizhen Huang
ea45c776fd Cycles: introduce dual types
to replace some uses of dfdx/dfdy/differentials.
No functional change expected.

Pull Request: https://projects.blender.org/blender/blender/pulls/143178
2025-07-28 17:34:24 +02:00
Weizhen Huang
345d23bff8 Cleanup: Cycles: add more float3 util functions
and vectorize `wrap` and `safe_fmod`.
2025-07-28 17:34:21 +02:00
Weizhen Huang
48777385c2 Cleanup: Cycles: simplify computation of dPdx and dPdy
`sd->dPdu`, `sd->dPdv`, `sd->du` and `sd->dv` are computed from
`sd->dP` by constructing a local frame, so both results are the same, subject to some numerical differences.

This avoids constructing the local frame again, so might be faster.
2025-07-28 17:34:21 +02:00
Weizhen Huang
f9a65ebbea Cleanup: Cycles: Deduplication svm bump functions 2025-07-28 17:34:21 +02:00
Sergey Sharybin
dcae48d1d3 Cycles: Add Portal Depth light pass information
It allows to implement tricks based on a knowledge whether the path
ever cam through a portal or not, and even something more advanced
based on the number of portals.

The main current objective is for strokes shading: stroke shader
uses Ray Portal BSDF to place ray to the center of the stroke and
point it in the direction of the surface it is generated for. This
gives stroke a single color which matches shading of the original
object. For this usecase to work the ray bounced from the original
surface should ignore the strokes, which is now possible by using
Portal Depth input and mixing with the Transparent BSDF. It also
helps to make shading look better when there are multiple stroke
layers.

A solution of using portal depth is chosen over a single flag due
to various factors:
- Last time we've looked into it it was a bit tricky to implement
	as a flag due to us running out of bits.
- It feels to be more flexible solution, even though it is a bit
	hard to come up with 100% compelling setup for it.
- It needs to be slightly different from the current "Is Foo"
	flags, and be more "Is Portal Descendant" or something.

An extra uint16 is added to the state to count the portal depth,
but it is only allocated for scenes that use Ray Portal BSDF.

Portal BSDF still increments Transparent bounce, as it is required
to have some "limiting" factor so that ray does not get infinitely
move to different place of the scene.

Ref #125213

Pull Request: https://projects.blender.org/blender/blender/pulls/143107
2025-07-25 18:09:38 +02:00
Weizhen Huang
3a1fbe17b9 Fix: OSL: Attribute "generated" not available for World and Point Cloud
When "generated" is required via Attribute Node, it is not available for
World and Point Cloud.

Make OSL match the SVM behavior to use the object coordinates
(see `svm/attribute.h`),

Pull Request: https://projects.blender.org/blender/blender/pulls/143198
2025-07-25 15:39:06 +02:00
Brecht Van Lommel
47f9b7a98e Fix #142022: Cycles undisplaced normal not available
Previously with adaptive subdivision this happened to work with the N
attribute, but that was not meant to be undisplaced. This adds a new
undisplaced_N attribute specifically for this purpose.

For backwards compatibility in Blender 4.5, this also keeps N undisplaced.
But that will be changed in 5.0.

Pull Request: https://projects.blender.org/blender/blender/pulls/142090
2025-07-24 18:16:25 +02:00