The Principled BSDF has a ton of inputs, and the previous SVM code just always
allocated stack space for all of them. This results in a ton of additional
NODE_VALUE_x SVM nodes, which slow down execution.
However, this is not really needed for two reasons:
- First, many inputs are only used consitionally. For example, if the
subsurface weight is zero, none of the other subsurface inputs are used.
- Many of the inputs have a "usual" value that they will have in most
materials, so if they happen to have that value we can just indicate that
by not allocating space for them.
This is a bit similar to the standard "pack the fixed value and provide
a stack offset if there's a link" pattern, except that the fixed value
is a constant in the code and we allocate a NODE_VALUE_x if a different
fixed value is used.
Therefore, this PR re-implements the parameter packing in a more efficient way:
- If we can determine that a component is disabled, all conditional inputs are
disconnected (to avoid generating upstream nodes).
- If we can determine that a component is disabled, we skip allocating all
conditional inputs on the stack.
- The inputs for which a reasonable "usual" value exists are changed to
respect that, and to only be allocated if they differ.
- param1 and param2 (which are fixed-value-packed as on all BSDF nodes) are
used to store IOR and roughness, which have a decent chance to be fixed
values.
- The parameter packing is more aggressive about using uchar4, which allows
to get rid of two SVM nodes while still storing the same inputs.
The result is a considerable speedup in scenes that make heavy use of the
Principled BSDF:
| Scene | CPU speedup | OptiX speedup |
| --- | --- | --- |
| attic | 5% | 9% |
| bistro | 5% | 8% |
| junkshop | 5% | 10% |
| monster | 3% | 4% |
| spring | 1% | 6% |
Pull Request: https://projects.blender.org/blender/blender/pulls/143910
This replaces `stack_assign` with `stack_assign_if_linked`, which should save a few SVM nodes for constant parameters.
Running benchmarks (all scenes in the benchmark repo, 3 runs, median value for each) shows 1.0% improvement on CPU and 1.5% on OptiX. Not huge, but fairly (all between -0.2% and 3.0%).
Pull Request: https://projects.blender.org/blender/blender/pulls/143404
Add new "Linear 3D Curves" option in the Curves panel in the render
properties. This renders curves as linear segments rather than smooth
curves, for faster render time at the cost of accuracy.
On NVIDIA Blackwell GPUs, this can give a 6x speedup compared to smooth
curves, due to hardware acceleration. On NVIDIA Ada there is still
a 3x speedup, and CPU and other GPU backends will also render this
faster.
A difference with smooth curves is that these have end caps, as this
was simpler to implement and they are usually helpful anyway.
In the future this functionality will also be used to properly support
the CURVE_TYPE_POLY on the new curves object.
Pull Request: https://projects.blender.org/blender/blender/pulls/139735
Update use_shading, use_camera and the shading system pointers in the same
location, so that when the render is interrupted they are in a consistent state.
The added null pointer checks are not strictly needed, but just in case it
goes out of sync for another reason.
Pull Request: https://projects.blender.org/blender/blender/pulls/143467
It allows to implement tricks based on a knowledge whether the path
ever cam through a portal or not, and even something more advanced
based on the number of portals.
The main current objective is for strokes shading: stroke shader
uses Ray Portal BSDF to place ray to the center of the stroke and
point it in the direction of the surface it is generated for. This
gives stroke a single color which matches shading of the original
object. For this usecase to work the ray bounced from the original
surface should ignore the strokes, which is now possible by using
Portal Depth input and mixing with the Transparent BSDF. It also
helps to make shading look better when there are multiple stroke
layers.
A solution of using portal depth is chosen over a single flag due
to various factors:
- Last time we've looked into it it was a bit tricky to implement
as a flag due to us running out of bits.
- It feels to be more flexible solution, even though it is a bit
hard to come up with 100% compelling setup for it.
- It needs to be slightly different from the current "Is Foo"
flags, and be more "Is Portal Descendant" or something.
An extra uint16 is added to the state to count the portal depth,
but it is only allocated for scenes that use Ray Portal BSDF.
Portal BSDF still increments Transparent bounce, as it is required
to have some "limiting" factor so that ray does not get infinitely
move to different place of the scene.
Ref #125213
Pull Request: https://projects.blender.org/blender/blender/pulls/143107
Previously with adaptive subdivision this happened to work with the N
attribute, but that was not meant to be undisplaced. This adds a new
undisplaced_N attribute specifically for this purpose.
For backwards compatibility in Blender 4.5, this also keeps N undisplaced.
But that will be changed in 5.0.
Pull Request: https://projects.blender.org/blender/blender/pulls/142090
Currently MetalRT renders always use extended limits, which is needed to correctly render scenes where the max primitive count can exceed 2^28 or the instance count can exceed 2^24. This patch adopts Metal best practices of only enabling this flag if it is needed.
This PR is similar to #133364, but there are some notable differences:
1) The old PR made an overly optimistic assumption that all the relevant visibility bits could be squeezed into 8 bits. This new PR adopts the same approach that Optix takes of using 8 bits as a primary HW filter, and checking the full 32 bit mask inside the SW intersection handler.
~~2) I moved the scene scanning check from Scene into MetalDevice. This avoids platform specific details leaking into platform agnostic areas.~~
~~3) In live viewport mode, we always use extended limits in case we tip over the threshold.~~
_EDIT:_
2) The limits are scanned in `Scene::update_kernel_features`, and given to the device by a new `set_bvh_limits` method which returns true if the BVH and kernels need to be reloaded.
Pull Request: https://projects.blender.org/blender/blender/pulls/142401
Use Principled BSDF instead of Diffuse BSDF. This is a breaking change but
likely does not affect many scenes significantly. It adds some specularity
when an object does not have a a material assigned.
Fix#142538
Pull Request: https://projects.blender.org/blender/blender/pulls/142703
Cycles automatically tries to decide if the camera ray should be a
surface or a volume + surface camera ray by checking to see if the
scene contains a volumetric material, and if it does, is it near
where the camera rays are expected to spawn. This step is done
during scene intialization.
With the OSL camera, it is impratical to predict where the
camera rays will spawn during scene intialization, which makes it
impratical to predict if the OSL camera ray will spawn near a
volumetric object. So this commit marks all OSL cameras as
"inside a volume", leading to the spawning of volume + surface camera
rays for OSL cameras while the scene contains a volumetric material.
This leads to increased render times ranging between 1% - 5% in scenes
that use a OSL camera, has a volumetric object in it, and the
volumetric object is far away from the camera. Every other scene should
see no performance impact.
Testing was done on a AMD Ryzen 9 5950X and a NVIDIA GeForce RTX 4090.
Pull Request: https://projects.blender.org/blender/blender/pulls/142036
This is not an actual solution, it falls back to a perspective camera instead
of crashing. Note full_rastertocamera exists specifically for computing raster
size for adaptive subdivision, and changing it should not affect anything else.
Pull Request: https://projects.blender.org/blender/blender/pulls/141905
Previously, we used precomputed Gaussian fits to the XYZ CMFs, performed
the spectral integration in that space, and then converted the result
to the RGB working space.
That worked because we're only supporting dielectric base layers for
the thin film code, so the inputs to the spectral integration
(reflectivity and phase) are both constant w.r.t. wavelength.
However, this will no longer work for conductive base layers.
We could handle reflectivity by converting to XYZ, but that won't work
for phase since its effect on the output is nonlinear.
Therefore, it's time to do this properly by performing the spectral
integration directly in the RGB primaries. To do this, we need to:
- Compute the RGB CMFs from the XYZ CMFs and XYZ-to-RGB matrix
- Resample the RGB CMFs to be parametrized by frequency instead of wavelength
- Compute the FFT of the CMFs
- Store it as a LUT to be used by the kernel code
However, there's two optimizations we can make:
- Both the resampling and the FFT are linear operations, as is the
XYZ-to-RGB conversion. Therefore, we can resample and Fourier-transform
the XYZ CMFs once, store the result in a precomputed table, and then just
multiply the entries by the XYZ-to-RGB matrix at runtime.
- I've included the Python script used to compute the table under
`intern/cycles/doc/precompute`.
- The reference implementation by the paper authors [1] simply stores the
real and imaginary parts in the LUT, and then computes
`cos(shift)*real + sin(shift)*imag`. However, the real and imaginary parts
are oscillating, so the LUT with linear interpolation is not particularly
good at representing them. Instead, we can convert the table to
Magnitude/Phase representation, which is much smoother, and do
`mag * cos(phase - shift)` in the kernel.
- Phase needs to be unwrapped to handle the interpolation decently,
but that's easy.
- This requires an extra trig operation in the kernel in the dielectric case,
but for the conductive case we'll actually save three.
Rendered output is mostly the same, just slightly different because we're
no longer using the Gaussian approximation.
[1] "A Practical Extension to Microfacet Theory for the Modeling of
Varying Iridescence" by Laurent Belcour and Pascal Barla,
https://belcour.github.io/blog/research/publication/2017/05/01/brdf-thin-film.html
Pull Request: https://projects.blender.org/blender/blender/pulls/140944
Supporting this on the Metallic BSDF will require some extra work,
and on the Glossy BSDF it doesn't make much sense conceptually
(for that kind of shader setup, we'll want to support layering in SVM),
but Glass BSDF just needs to be hooked up so might as well do that.
Pull Request: https://projects.blender.org/blender/blender/pulls/140832
Detect which volume attributes nodes have a linear mapping to their usage
as density / color / temperature in volume shader nodes, and use stochastic
sampling for them.
Pull Request: https://projects.blender.org/blender/blender/pulls/132908
All GPU backends now support NanoVDB, using our own kernel side code
that is easily portable. This simplifies kernel and device code.
Volume bounds are now built from the NanoVDB grid instead of OpenVDB,
to avoid having to keep around the OpenVDB grid after loading.
While this reduces memory usage, it does have a performance impact,
particularly for the Cubic filter. That will be addressed by
another commit.
Pull Request: https://projects.blender.org/blender/blender/pulls/132908
After spending way too much time looking into the image handle code
because I assumed the issue is some interaction between OSL code using
a texture and the image system that's in SVM mode (which never happened
before the custom camera) with printf debuggine because OptiX doesn't
work in debug builds, it turns out the issue was something else entirely:
C++ iterator bullshit.
Specifically, when we remove an entry from services->textures, this
invalidates the iterator, so the code restarts from the beginning.
However, the for-loop still increments the iterator *before* checking
the termination criterion.
If we remove the only element of the map, we:
- Set it = map.begin(), which equals map.end() since it's empty now
- Increment it at the end of the loop iteration
- Compare it == map.end(), which is wrong now since we're past the end
No idea how this didn't blow up sooner, none of this seems camera-specific??
Anyways, the fix is simple - only increment if we didn't restart.
Pull Request: https://projects.blender.org/blender/blender/pulls/141580
In Blender 3.3 (1) the individual combine and separate color nodes were
combined together into a single combine/separate color node.
To ensure legacy addons still worked, the old nodes were left in
Blender, but hidden from the Add menus.
It has been nearly 3 years since that change was made, most if not all
addons should have been updated by now. So this commit removes these
hidden legacy nodes.
(1) blender/blender@82df48227b
Pull Request: https://projects.blender.org/blender/blender/pulls/135376
This is replaced by geometry nodes, where volumes can now be generated from
point clouds and meshes with more control, and more efficient rendering as a
sparse volume.
No backwareds compatibility is provided, as this would be complicated, and
probably this feature was not used much in the past few years.
This node was supported in Cycles only, not by EEVEE.
Pull Request: https://projects.blender.org/blender/blender/pulls/140292
This reverts commit 64dc9cc98c in the
blender-v4.5-release branch to work around HIP compiler issues. It will
remain in the main branch.
Ref blender/blender#139836
when there is no uv, we call the function `map_to_sphere()` to create
temporary uv for computing the tangent. It could happen that a triangle
has vertices with the u coordinates going across the line where u wraps
from 1 to 0. In this case, just computing the difference of the u
coordinates results in the wrong triangle area.
To fix this problem, we compute distance in toroidal (wrap around)
space.
This is safe for coordinates generated by `map_to_sphere()` function,
because it is not supposed to map the positions of a triangle to u
coordinates that span larger than 0.5.
Pull Request: https://projects.blender.org/blender/blender/pulls/139880
Keep around the dummy BVH for lights, even if it serves no purpose for now.
Previously I assumed it was not needed, but there is some device specific
code that assumes it exists, and not much point trying to refactor that now
when in the future we actually want to create a BVH for lights.
Pull Request: https://projects.blender.org/blender/blender/pulls/139798
This started with investigating a render issue that appears to be caused by
GCC 15. From what I can tell, it was caused by
`*viewplane = (*viewplane) * bcam->zoom;`.
I'm not entirely sure what the root cause is (potentially pointer aliasing?),
but the restructured code works fine now.
Pull Request: https://projects.blender.org/blender/blender/pulls/139416
If Mix Color node clamps the result, we do not bypass the node, but
disconnect other inputs. However, for the case where both inputs were the
same, the disconnected one would use the stored socket value instead, so
set the factor to 0 to only use the intended input.
Pull Request: https://projects.blender.org/blender/blender/pulls/139098