The Principled BSDF has a ton of inputs, and the previous SVM code just always
allocated stack space for all of them. This results in a ton of additional
NODE_VALUE_x SVM nodes, which slow down execution.
However, this is not really needed for two reasons:
- First, many inputs are only used consitionally. For example, if the
subsurface weight is zero, none of the other subsurface inputs are used.
- Many of the inputs have a "usual" value that they will have in most
materials, so if they happen to have that value we can just indicate that
by not allocating space for them.
This is a bit similar to the standard "pack the fixed value and provide
a stack offset if there's a link" pattern, except that the fixed value
is a constant in the code and we allocate a NODE_VALUE_x if a different
fixed value is used.
Therefore, this PR re-implements the parameter packing in a more efficient way:
- If we can determine that a component is disabled, all conditional inputs are
disconnected (to avoid generating upstream nodes).
- If we can determine that a component is disabled, we skip allocating all
conditional inputs on the stack.
- The inputs for which a reasonable "usual" value exists are changed to
respect that, and to only be allocated if they differ.
- param1 and param2 (which are fixed-value-packed as on all BSDF nodes) are
used to store IOR and roughness, which have a decent chance to be fixed
values.
- The parameter packing is more aggressive about using uchar4, which allows
to get rid of two SVM nodes while still storing the same inputs.
The result is a considerable speedup in scenes that make heavy use of the
Principled BSDF:
| Scene | CPU speedup | OptiX speedup |
| --- | --- | --- |
| attic | 5% | 9% |
| bistro | 5% | 8% |
| junkshop | 5% | 10% |
| monster | 3% | 4% |
| spring | 1% | 6% |
Pull Request: https://projects.blender.org/blender/blender/pulls/143910
This PR adds a new `fresnel_conductor_polarized` function, which calculates reflectance and phase shift (if requested) for both parallel and perpendicular polarized light. This is needed for applying thin film iridescence to conductors (see !141131).
For consistency, this PR also makes `fresnel_conductor` call `fresnel_conductor_polarized` instead of using a fast approximation of the Fresnel equations that is inaccurate at lower n and k values. This will change the output of some Metallic BSDF renders using Physical Conductor and prevent discrepancies when enabling thin film iridescence.
I didn't do any rigorous performance testing, but from timing the functions outside of Blender, `fresnel_conductor_polarized` is significantly slower than the approximation, between 1.5-3x depending on the compiler. This makes sense because it has three square roots and the approximation has none. In some informal tests with metallic_multiggx_physical.blend modified to have more spheres, the new renders took around 1-2% longer on both CPU and GPU.
There are some avoidable inefficiencies in this approach of just calling `fresnel_conductor_polarized`:
- one of the three square roots could be saved since `fresnel_conductor` never needs the phase shift and there are simplifications possible when only calculating the reflectance
- there are several unnecessary multiplications by 1.0 since `fresnel_conductor` uses relative IOR and `fresnel_conductor_polarized` doesn't, though those could get optimized out if inlined
Pull Request: https://projects.blender.org/blender/blender/pulls/143903
This replaces `stack_assign` with `stack_assign_if_linked`, which should save a few SVM nodes for constant parameters.
Running benchmarks (all scenes in the benchmark repo, 3 runs, median value for each) shows 1.0% improvement on CPU and 1.5% on OptiX. Not huge, but fairly (all between -0.2% and 3.0%).
Pull Request: https://projects.blender.org/blender/blender/pulls/143404
`Eavg` can still be 1 for very small roughness, causing division by zero
when computing `Ems`.
A roughness of 1e-5 gives an `Evg` of 0.999998, seems reasonable.
Pull Request: https://projects.blender.org/blender/blender/pulls/143637
`sd->dPdu`, `sd->dPdv`, `sd->du` and `sd->dv` are computed from
`sd->dP` by constructing a local frame, so both results are the same, subject to some numerical differences.
This avoids constructing the local frame again, so might be faster.
It allows to implement tricks based on a knowledge whether the path
ever cam through a portal or not, and even something more advanced
based on the number of portals.
The main current objective is for strokes shading: stroke shader
uses Ray Portal BSDF to place ray to the center of the stroke and
point it in the direction of the surface it is generated for. This
gives stroke a single color which matches shading of the original
object. For this usecase to work the ray bounced from the original
surface should ignore the strokes, which is now possible by using
Portal Depth input and mixing with the Transparent BSDF. It also
helps to make shading look better when there are multiple stroke
layers.
A solution of using portal depth is chosen over a single flag due
to various factors:
- Last time we've looked into it it was a bit tricky to implement
as a flag due to us running out of bits.
- It feels to be more flexible solution, even though it is a bit
hard to come up with 100% compelling setup for it.
- It needs to be slightly different from the current "Is Foo"
flags, and be more "Is Portal Descendant" or something.
An extra uint16 is added to the state to count the portal depth,
but it is only allocated for scenes that use Ray Portal BSDF.
Portal BSDF still increments Transparent bounce, as it is required
to have some "limiting" factor so that ray does not get infinitely
move to different place of the scene.
Ref #125213
Pull Request: https://projects.blender.org/blender/blender/pulls/143107
Supporting this on the Metallic BSDF will require some extra work,
and on the Glossy BSDF it doesn't make much sense conceptually
(for that kind of shader setup, we'll want to support layering in SVM),
but Glass BSDF just needs to be hooked up so might as well do that.
Pull Request: https://projects.blender.org/blender/blender/pulls/140832
Detect which volume attributes nodes have a linear mapping to their usage
as density / color / temperature in volume shader nodes, and use stochastic
sampling for them.
Pull Request: https://projects.blender.org/blender/blender/pulls/132908
Stochastically turn a tricubic filter into a trilinear one. This
reduces the number of taps from 64 to 8. It combines ideas from
the "Stochastic Texture Filtering" paper and our previous GPU
sampling of 3D textures.
This is currently only used in a few places where we know stochastic
interpolation is valid or close enough in practice.
* Principled volume density, color and temperature
* Motion blur velocity
On an Macbook Pro M3 with the openvdb_smoke.blend regression test
and cubic sampling, this gives a ~2x speedup for CPU and ~4x speedup
for GPU. However it also increases noise, usually only a little. Equal
time renders for this scene show a clear reduction in noise for both
CPU and GPU.
Note we can probably get a bigger speedup with acceptable noise trade-off
using full stochastic sampling, but will investigate that separately.
Pull Request: https://projects.blender.org/blender/blender/pulls/132908
In Blender 3.3 (1) the individual combine and separate color nodes were
combined together into a single combine/separate color node.
To ensure legacy addons still worked, the old nodes were left in
Blender, but hidden from the Add menus.
It has been nearly 3 years since that change was made, most if not all
addons should have been updated by now. So this commit removes these
hidden legacy nodes.
(1) blender/blender@82df48227b
Pull Request: https://projects.blender.org/blender/blender/pulls/135376
This is replaced by geometry nodes, where volumes can now be generated from
point clouds and meshes with more control, and more efficient rendering as a
sparse volume.
No backwareds compatibility is provided, as this would be complicated, and
probably this feature was not used much in the past few years.
This node was supported in Cycles only, not by EEVEE.
Pull Request: https://projects.blender.org/blender/blender/pulls/140292
The 2D->2D, 3D->3D, 4D->4D hash functions used in Voronoi node were
using quite an expensive hash function. Switch these to dedicated
2D/3D/4D hash functions (pcg2d, pcg3d, pcg4d) -- these are still very
good quality, but the hash function itself is 3x-4x faster.
Which makes Voronoi node calculation overall be around 2x faster. In
some cases when using OSL, the speedup is even larger.
This visibly changes output of the Voronoi noise however. The actual
noise "behaves" the same, just if someone was depending on the noise
pattern being exactly like it was before, this will change the pattern.
Images, more performance results and details wrt OSL are in the PR.
Pull Request: https://projects.blender.org/blender/blender/pulls/139520
a degenerate triangle could produce a tangent that is antiparallel to
the normal, resulting the mapped normal to be zero, and becomes NaN when
normalized in `object_normal_transform()`. Fixed by falling back to
unperturbed normal in this case.
Fixes an assertion in the attic benchmark scene.
Pull Request: https://projects.blender.org/blender/blender/pulls/140135
Several small speedups for Voronoi node (no behavior change). This
affects Cycles and CPU execution of Voronoi node e.g. in Compositor.
- F1 mode: when evaluating distance for Voronoi cells, use a faster
distance estimation, and only do final distance calculation on the
resulting closest cell. This is only really relevant for the default
Euclidian distance, where this saves a square root per evaluated cell
(in 3D Voronoi case saves 26 square roots; in 4D case saves 80 square
roots).
- N-Sphere Radius mode: speedup by doing squared distance calculations.
We only need to find the closest one, so again doing the square root
per cell is not needed here.
Something like 5%-10% speedup for F1 3D Voronoi; more performance details
in the PR.
Pull Request: https://projects.blender.org/blender/blender/pulls/139490
Add a new shader node to control volume coefficients (scattering,
absorption and emission) directly, making it easier to model existing
volumes with measured data.
Pull Request: https://projects.blender.org/blender/blender/pulls/136287
The goal is to reduce the affect of the fmod() used in the noise code,
which was initially reported in the comment:
https://projects.blender.org/blender/blender/pulls/119884#issuecomment-1258902
Basic idea is to benefit from SIMD vectorization on CPU.
Tested on Linux i9-11900K and macOS on M2 Ultra, in both cases performance
after this change is very close to what it could be with the fmod() commented
out (the call itself, `p = p + precision_correction`).
On macOS the penalty of fmod() was about 10%, on Linux it was closer to 30%
when built with GCC-13. With Linux builds from the buildbot it is more like 18%.
The optimization is only done for 3d and 4d noise. It might be possible to
gain some performance improvement for 1d and 2d cases, but the approach would
need to be different: we'd need to optimize scalar version fmodf(). Maybe
tricks with integer cast will be faster (since we are a bit optimistic in the
kernel and do not guarantee exact behavior in extreme cases such as NaN inputs).
Pull Request: https://projects.blender.org/blender/blender/pulls/137109
This makes it possible to restore previous Blender 4.3 behavior of bump
mapping, where the large filter width was sometimes (ab)used to get a bevel
like effect on stepwise textures.
For bump from the displacement socket, filter width remains fixed at 0.1.
Ref #133991, #135841
Pull Request: https://projects.blender.org/blender/blender/pulls/136465
This is an intermediate steps towards making lights actual geometry.
Light is now a subclass of Geometry, which simplifies some code.
The geometry is not added to the BVH yet, which would be the next
step and improve light intersection performance with many lights.
This makes object attributes work on lights.
Co-authored-by: Lukas Stockner <lukas@lukasstockner.de>
Pull Request: https://projects.blender.org/blender/blender/pulls/134846
The attribute handling code in the kernel is currently highly duplicated since
it needs to handle five different data types and we couldn't use templates
back then.
We can now, so might as well make use of it and get rid of ~1000 lines.
There are also some small fixes for the GPU OSL code:
- Wrong derivative for .w component when converting float2/float3->float4
- Different conversion for float2->float (CPU averages, GPU used to take .x)
- Removed useless code for converting to float2, not used by OSL
Pull Request: https://projects.blender.org/blender/blender/pulls/134694
In Cycles lights can be given a light falloff node to control their
light falloff.
This worked by multiplying the light's strength by different
combinations of the ray length, which would be FLT_MAX for
distant lights. This resulting in almost every configuration of the
light falloff node overflowing when used on distant lights, which is
undesirable.
This commit fixes this issue by ignoring most of the functions of the
light falloff node when used on a distant light.
And in the process fixes a small discrepancy between SVM and OSL when
using the light falloff node on distant lights.
Pull Request: https://projects.blender.org/blender/blender/pulls/134539
This commit fixes a issue where ray depth for emissive objects
(E.g. Lights) was incorrect when using the ray depth output of the
light path node in Cycles OSL.
Pull Request: https://projects.blender.org/blender/blender/pulls/134496
The derivatives of the normal were simply not computed.
The offsetted normals are computed by perturbating the barycentric
coordinates. At triangle boundaries, the normals are extrapolated,
so discontinuities might be visible.
Currently only supported on triangles.
Pull Request: https://projects.blender.org/blender/blender/pulls/133769
Use sub-pixel differentials for bump mapping helps with reducing
artifacts when objects are moving or when textures have high frequency
details.
Currently we scale it by 0.1 because it seems to work good in practice,
we can adjust the value in the future if it turns out to be impractical.
Ref: #122892
Pull Request: https://projects.blender.org/blender/blender/pulls/133991
- Rename dx/dy -> dfdx/dfdy to match the actual computed quantity
- Add template functions to compute dfdx/dfdy on triangles for sharing
among different data types
- Add documentation to some functions
- Some code shuffling that makes it easier to scale dfdx/dfdy in the
future
- Some other trivial changes
In a recent refactor (1), the subsurface weight was set to be
constant, when it can be modified in the case that `__SUBSURFACE__`
is false, such as when using the adaptive compilation feature.
This commit fixes this issue by rearranging the code so the subsurface
weight is never overwritten.
(1) blender/blender@dd51c8660b
Pull Request: https://projects.blender.org/blender/blender/pulls/132620
Check was misc-const-correctness, combined with readability-isolate-declaration
as suggested by the docs.
Temporarily clang-format "QualifierAlignment: Left" was used to get consistency
with the prevailing order of keywords.
Pull Request: https://projects.blender.org/blender/blender/pulls/132361
* Use .empty() and .data()
* Use nullptr instead of 0
* No else after return
* Simple class member initialization
* Add override for virtual methods
* Include C++ instead of C headers
* Remove some unused includes
* Use default constructors
* Always use braces
* Consistent names in definition and declaration
* Change typedef to using
Pull Request: https://projects.blender.org/blender/blender/pulls/132361
Previous implemenation of 5 < d < 50 was taken from the main paper,
fitting for smaller sizes are found in the supplemental. They are less
forward-scattering.
Pull Request: https://projects.blender.org/blender/blender/pulls/130234
`CLOSURE_WEIGHT_CUTOFF` avoids allocating a closure when its weight is
too small. It makes sense for surface closures, but for volume closures
the contribution also depends on the object size/ray length, such a
cutoff seems random and is causing problem in atmospheric scatterings.
Therefore remove the cutoff for volume, just make sure the weight is
positive.
Pull Request: https://projects.blender.org/blender/blender/pulls/131696