Add new "Linear 3D Curves" option in the Curves panel in the render
properties. This renders curves as linear segments rather than smooth
curves, for faster render time at the cost of accuracy.
On NVIDIA Blackwell GPUs, this can give a 6x speedup compared to smooth
curves, due to hardware acceleration. On NVIDIA Ada there is still
a 3x speedup, and CPU and other GPU backends will also render this
faster.
A difference with smooth curves is that these have end caps, as this
was simpler to implement and they are usually helpful anyway.
In the future this functionality will also be used to properly support
the CURVE_TYPE_POLY on the new curves object.
Pull Request: https://projects.blender.org/blender/blender/pulls/139735
Update use_shading, use_camera and the shading system pointers in the same
location, so that when the render is interrupted they are in a consistent state.
The added null pointer checks are not strictly needed, but just in case it
goes out of sync for another reason.
Pull Request: https://projects.blender.org/blender/blender/pulls/143467
`sd->dPdu`, `sd->dPdv`, `sd->du` and `sd->dv` are computed from
`sd->dP` by constructing a local frame, so both results are the same, subject to some numerical differences.
This avoids constructing the local frame again, so might be faster.
It allows to implement tricks based on a knowledge whether the path
ever cam through a portal or not, and even something more advanced
based on the number of portals.
The main current objective is for strokes shading: stroke shader
uses Ray Portal BSDF to place ray to the center of the stroke and
point it in the direction of the surface it is generated for. This
gives stroke a single color which matches shading of the original
object. For this usecase to work the ray bounced from the original
surface should ignore the strokes, which is now possible by using
Portal Depth input and mixing with the Transparent BSDF. It also
helps to make shading look better when there are multiple stroke
layers.
A solution of using portal depth is chosen over a single flag due
to various factors:
- Last time we've looked into it it was a bit tricky to implement
as a flag due to us running out of bits.
- It feels to be more flexible solution, even though it is a bit
hard to come up with 100% compelling setup for it.
- It needs to be slightly different from the current "Is Foo"
flags, and be more "Is Portal Descendant" or something.
An extra uint16 is added to the state to count the portal depth,
but it is only allocated for scenes that use Ray Portal BSDF.
Portal BSDF still increments Transparent bounce, as it is required
to have some "limiting" factor so that ray does not get infinitely
move to different place of the scene.
Ref #125213
Pull Request: https://projects.blender.org/blender/blender/pulls/143107
When "generated" is required via Attribute Node, it is not available for
World and Point Cloud.
Make OSL match the SVM behavior to use the object coordinates
(see `svm/attribute.h`),
Pull Request: https://projects.blender.org/blender/blender/pulls/143198
Previously with adaptive subdivision this happened to work with the N
attribute, but that was not meant to be undisplaced. This adds a new
undisplaced_N attribute specifically for this purpose.
For backwards compatibility in Blender 4.5, this also keeps N undisplaced.
But that will be changed in 5.0.
Pull Request: https://projects.blender.org/blender/blender/pulls/142090
Currently MetalRT renders always use extended limits, which is needed to correctly render scenes where the max primitive count can exceed 2^28 or the instance count can exceed 2^24. This patch adopts Metal best practices of only enabling this flag if it is needed.
This PR is similar to #133364, but there are some notable differences:
1) The old PR made an overly optimistic assumption that all the relevant visibility bits could be squeezed into 8 bits. This new PR adopts the same approach that Optix takes of using 8 bits as a primary HW filter, and checking the full 32 bit mask inside the SW intersection handler.
~~2) I moved the scene scanning check from Scene into MetalDevice. This avoids platform specific details leaking into platform agnostic areas.~~
~~3) In live viewport mode, we always use extended limits in case we tip over the threshold.~~
_EDIT:_
2) The limits are scanned in `Scene::update_kernel_features`, and given to the device by a new `set_bvh_limits` method which returns true if the BVH and kernels need to be reloaded.
Pull Request: https://projects.blender.org/blender/blender/pulls/142401
When `delta` is significantly larger than the difference between `tmin`
and `tmax`, the precision of `theta_a` and `theta_b` reduces, resulting
in banding artefacts.
For equiangular sampling, we use uniform sampling to fix this case.
For light tree, we use the equality
atan(a) + atan(b) = atan2(a + b, 1 - a*b).
Pull Request: https://projects.blender.org/blender/blender/pulls/142845
The code of the "oneapi_load_kernels" function before this modification
was loading kernels and compiling them, if needed, for all devices in
the associated GPU context. This makes sense for one GPU execution
scenario, as well as for execution scenario of multi identical GPU,
but in cases where Blender users have several different GPUs in
render, the previous implementation would compile all kernels
for all devices for each device, unnecessarily doing the same
work multiple times. Because of this, I am changing the
implementation so that now compilation happens only for the used
device per used device, ensuring that no unnecessary work is done.
No render performance changes are expected.
This improves the existing scene specialisation mechanism by replacing "kernel_data.kernel_features" with a function constant. It doesn't cause any additional compilation requests, but allows the backend compiler to eliminate more dead code. An additional compiler hint is provided for dead-stripping "volume_stack_enter_exit" which results in slightly faster rendering of non-volumetric scenes.
Pull Request: https://projects.blender.org/blender/blender/pulls/142235
Previously, we used precomputed Gaussian fits to the XYZ CMFs, performed
the spectral integration in that space, and then converted the result
to the RGB working space.
That worked because we're only supporting dielectric base layers for
the thin film code, so the inputs to the spectral integration
(reflectivity and phase) are both constant w.r.t. wavelength.
However, this will no longer work for conductive base layers.
We could handle reflectivity by converting to XYZ, but that won't work
for phase since its effect on the output is nonlinear.
Therefore, it's time to do this properly by performing the spectral
integration directly in the RGB primaries. To do this, we need to:
- Compute the RGB CMFs from the XYZ CMFs and XYZ-to-RGB matrix
- Resample the RGB CMFs to be parametrized by frequency instead of wavelength
- Compute the FFT of the CMFs
- Store it as a LUT to be used by the kernel code
However, there's two optimizations we can make:
- Both the resampling and the FFT are linear operations, as is the
XYZ-to-RGB conversion. Therefore, we can resample and Fourier-transform
the XYZ CMFs once, store the result in a precomputed table, and then just
multiply the entries by the XYZ-to-RGB matrix at runtime.
- I've included the Python script used to compute the table under
`intern/cycles/doc/precompute`.
- The reference implementation by the paper authors [1] simply stores the
real and imaginary parts in the LUT, and then computes
`cos(shift)*real + sin(shift)*imag`. However, the real and imaginary parts
are oscillating, so the LUT with linear interpolation is not particularly
good at representing them. Instead, we can convert the table to
Magnitude/Phase representation, which is much smoother, and do
`mag * cos(phase - shift)` in the kernel.
- Phase needs to be unwrapped to handle the interpolation decently,
but that's easy.
- This requires an extra trig operation in the kernel in the dielectric case,
but for the conductive case we'll actually save three.
Rendered output is mostly the same, just slightly different because we're
no longer using the Gaussian approximation.
[1] "A Practical Extension to Microfacet Theory for the Modeling of
Varying Iridescence" by Laurent Belcour and Pascal Barla,
https://belcour.github.io/blog/research/publication/2017/05/01/brdf-thin-film.html
Pull Request: https://projects.blender.org/blender/blender/pulls/140944
Supporting this on the Metallic BSDF will require some extra work,
and on the Glossy BSDF it doesn't make much sense conceptually
(for that kind of shader setup, we'll want to support layering in SVM),
but Glass BSDF just needs to be hooked up so might as well do that.
Pull Request: https://projects.blender.org/blender/blender/pulls/140832
Detect which volume attributes nodes have a linear mapping to their usage
as density / color / temperature in volume shader nodes, and use stochastic
sampling for them.
Pull Request: https://projects.blender.org/blender/blender/pulls/132908
Stochastically turn a tricubic filter into a trilinear one. This
reduces the number of taps from 64 to 8. It combines ideas from
the "Stochastic Texture Filtering" paper and our previous GPU
sampling of 3D textures.
This is currently only used in a few places where we know stochastic
interpolation is valid or close enough in practice.
* Principled volume density, color and temperature
* Motion blur velocity
On an Macbook Pro M3 with the openvdb_smoke.blend regression test
and cubic sampling, this gives a ~2x speedup for CPU and ~4x speedup
for GPU. However it also increases noise, usually only a little. Equal
time renders for this scene show a clear reduction in noise for both
CPU and GPU.
Note we can probably get a bigger speedup with acceptable noise trade-off
using full stochastic sampling, but will investigate that separately.
Pull Request: https://projects.blender.org/blender/blender/pulls/132908
All GPU backends now support NanoVDB, using our own kernel side code
that is easily portable. This simplifies kernel and device code.
Volume bounds are now built from the NanoVDB grid instead of OpenVDB,
to avoid having to keep around the OpenVDB grid after loading.
While this reduces memory usage, it does have a performance impact,
particularly for the Cubic filter. That will be addressed by
another commit.
Pull Request: https://projects.blender.org/blender/blender/pulls/132908
This PR is a more extensive follow on from #123551 (removal of AMD and Intel GPU support).
All supported Apple GPUs have Metal 3 and tier 2 argument buffer support. The invariant resource properties `gpuAddress` and `gpuResourceID` can be written directly into GPU structs once at setup time rather than once per dispatch. More background info can be found in [this article](https://developer.apple.com/documentation/metal/improving-cpu-performance-by-using-argument-buffers?language=objc).
Code changes:
- All code relating to `MTLArgumentEncoder` is removed
- `KernelParamsMetal` updates are directly written into `id<MTLBuffer> launch_params_buffer` which is used for the "static" dispatch arguments
- Dynamic dispatch arguments are small enough to be encoded using the `MTLComputeCommandEncoder.setBytes` function, eliminating the need for cycling temporary arg buffers
Pull Request: https://projects.blender.org/blender/blender/pulls/140671
Multi-bounce was mainly disabled for disk sampling where the probability of
hitting something is relatively low even with high albedo, but this is not so
much an issue with random walk.
This reduces darkening artifacts at the cost of some extra render time. The
difference is mainly visible when using a high radius.
Pull Request: https://projects.blender.org/blender/blender/pulls/140665
This was broken by !138632, the refactor of the microfacet code to no longer
check the "geometric normal", which in reality was the smoothed normal.
Since the logic is now the same for all closure types, it seemed weird that
the light leak only affects Microfacet closures, not Diffuse.
Turns out that for diffuse closures, the relevant paths were rejected by
the initial hemisphere check in the smooth bump terminator code, which also
incorporates the smoothed but non-bump/normal-mapped normal sd->N.
So, we can detect and prevent the new light leaks by extending this check to
all closure types for the eval case. Sampling already has stricter checks,
so this doesn't apply there.
With this change, we can revert the two test cases back to their pre-refactor
version. In hindsight it was a mistake to just shrug off these changes as okay,
I should have looked closer into the difference.
Pull Request: https://projects.blender.org/blender/blender/pulls/140415
In Blender 3.3 (1) the individual combine and separate color nodes were
combined together into a single combine/separate color node.
To ensure legacy addons still worked, the old nodes were left in
Blender, but hidden from the Add menus.
It has been nearly 3 years since that change was made, most if not all
addons should have been updated by now. So this commit removes these
hidden legacy nodes.
(1) blender/blender@82df48227b
Pull Request: https://projects.blender.org/blender/blender/pulls/135376
This is replaced by geometry nodes, where volumes can now be generated from
point clouds and meshes with more control, and more efficient rendering as a
sparse volume.
No backwareds compatibility is provided, as this would be complicated, and
probably this feature was not used much in the past few years.
This node was supported in Cycles only, not by EEVEE.
Pull Request: https://projects.blender.org/blender/blender/pulls/140292
The 2D->2D, 3D->3D, 4D->4D hash functions used in Voronoi node were
using quite an expensive hash function. Switch these to dedicated
2D/3D/4D hash functions (pcg2d, pcg3d, pcg4d) -- these are still very
good quality, but the hash function itself is 3x-4x faster.
Which makes Voronoi node calculation overall be around 2x faster. In
some cases when using OSL, the speedup is even larger.
This visibly changes output of the Voronoi noise however. The actual
noise "behaves" the same, just if someone was depending on the noise
pattern being exactly like it was before, this will change the pattern.
Images, more performance results and details wrt OSL are in the PR.
Pull Request: https://projects.blender.org/blender/blender/pulls/139520
a degenerate triangle could produce a tangent that is antiparallel to
the normal, resulting the mapped normal to be zero, and becomes NaN when
normalized in `object_normal_transform()`. Fixed by falling back to
unperturbed normal in this case.
Fixes an assertion in the attic benchmark scene.
Pull Request: https://projects.blender.org/blender/blender/pulls/140135
This reverts commit 23c762e388 in the
blender-v4.5-release branch to work around HIP compiler issues. It will
remain in the main branch.
Ref blender/blender#139836
This reverts commit 64dc9cc98c in the
blender-v4.5-release branch to work around HIP compiler issues. It will
remain in the main branch.
Ref blender/blender#139836
This reverts commit a6015e1411 in the
blender-v4.5-release branch to work around HIP compiler issues. It will
remain in the main branch.
Ref blender/blender#139836
This reverts commit 5abf42012d in the
blender-v4.5-release branch to work around HIP compiler issues. It will
remain in the main branch.
Ref blender/blender#139836
This reverts commit 0e7a696819 in the
blender-v4.5-release branch to work around HIP compiler issues. It will
remain in the main branch.
Ref blender/blender#139836
This is required to make ray differentials work correctly for OSL custom
cameras.
But it also lets us simplify the implementation, and makes the OSL
functionality more complete, such as implementing all noise types.
Pull Request: https://projects.blender.org/blender/blender/pulls/138161
With these changes, we can now mark devices which are expected to work as
performant as possible, and devices which were not optimized for some reason.
For example, because the device was released after the Blender release,
making it impossible for developers to optimize for devices in already
released unchangeable code. This is primarily relevant for the LTS versions,
which are supported for two years and require proper communication about
optimization status for the new devices released during this time.
This is implemented for oneAPI devices. Other device types currently are
marked as optimized for compatibility with old behavior, but may implement
the same in the future.
Pull Request: https://projects.blender.org/blender/blender/pulls/139751