`CLOSURE_WEIGHT_CUTOFF` avoids allocating a closure when its weight is
too small. It makes sense for surface closures, but for volume closures
the contribution also depends on the object size/ray length, such a
cutoff seems random and is causing problem in atmospheric scatterings.
Therefore remove the cutoff for volume, just make sure the weight is
positive.
Pull Request: https://projects.blender.org/blender/blender/pulls/131696
The original paper uses the single scattering albedo `sigma_s/sigma_t`
to pick a channel for sampling the scattering distance. However, this
only considers the situation where there is scattering inside the volume.
If some channel has an extinction coefficient of zero, the light passes
through without attenuation for that channel. We assign such channel
with a weight of 1 instead of 0 to make sure it can be sampled.
Pull Request: https://projects.blender.org/blender/blender/pulls/131741
The cause is numerical issues with `fast_sinf()`. While fixing
`fast_sinf()` would ultimately fix the problem, it involves more
complications in other code paths, and it is safer to clamp the
integration range anyway.
Pull Request: https://projects.blender.org/blender/blender/pulls/131689
This PR fixes#130641. The bug was caused by a missing self-object constraint when performing SSS on motion blur scenes. scene_intersect_local tests were erroneously hitting other objects, and out of range primitive IDs were causing spurious downstream behavior.
Pull Request: https://projects.blender.org/blender/blender/pulls/131156
This logic is copied from surface shader, so that the sampled closure
does not need to be evaluated twice when summing all the closures, but
it is not used in volume.
Based on #123377 by @brecht, but Gitea doesn't like the rebase these
so here's a new PR.
The purpose here is to switch to fused OptiX programs for OSL execution
on CUDA. On the one hand, this makes the code easier since, but there's
also another advantage - how memory allocation is managed.
OSL shaders need memory to store intermediate values, but how much is
needed depends on the complexity of the shader. With the split program
approach, Cycles had to provide that memory, so we had to allocate a
certain amount (2 KiB, to be precise) statically and show an error if
the shader would need more. If the shader used less (which is the case
for the vast majority), the memory was just wasted.
By switching to fused kernels, OSL knows the required amount during JIT
codegen, so it can allocate only what's required, which avoids this
waste. One still needs to set a maximum, and in theory, OSL would also
support spilling over into a Cycles-provided alternative memory region.
However, we currently don't implement that - instead, we default to the
same 2048 limit as before and let advanced users override it via the
CYCLES_OSL_GROUPDATA_ALLOC environment variable if really needed.
Co-authored-by: Brecht Van Lommel <brecht@blender.org>
Pull Request: https://projects.blender.org/blender/blender/pulls/130149
Turns out that with `-fassociative-math`, GCC turns
`(1.0f - cos_NH2) + alpha2 * cos_NH2` into
`cos_NH2 * (alpha2 - 1.0f) + 1.0f`.
Not sure why since the operation count is the same, but if alpha2 is very
small, `alpha2 - 1.0f` will be exactly -1.0f, which then causes issues.
Luckily, having one_minus_cos_NH2 as its own variable appears to be enough to
make GCC keep the original formulation.
Just to be safe, I've also used one_minus_cos_NH2 in the other branch to
hopefully reduce the chance of it being folded in again. Also turns a
division into a reciprocal, which is in theory slightly faster.
Pull Request: https://projects.blender.org/blender/blender/pulls/130469
In 891d71a4d4 this keyword was
dropped due to performance regression after
fdc2962beb, but currently code
does not experience this performance degradation, and in fact
there is minor performance improvement on Lunar Lake GPUs,
along with an expected improvement in compile time.
However, this change brings a minor performance regression to
shade_surface kernel on Intel Arc and Meteor Lake GPUs, which
will be solved later by disabling this keyword for
these platforms only.
Pull Request: https://projects.blender.org/blender/blender/pulls/130299
This change makes it so only kernels of the same vendor are compiled in
parallel. For example for the release builds it will be:
1. All CUDA kernels
2. All OptiX kernels
3. All HIP kernels
4. All OneAPI kernels
This potentially leads to a lower CPU utilization, but it makes it much
easier to manage memory usage and tweak per-vendor concurrency.
The goal of this change is to solve occasional out-of-memory during the
GPU kernels compilation step on the CI/CD farm.
This change also includes tweaks to the prallel jobs for HIP-RT and
oneAPI. The tweak is based on measuring apparent memory usage peak on
Linux when doing single-thread compilation, and giving some safe margin
from the available memory on the buildbot.
Pull Request: https://projects.blender.org/blender/blender/pulls/129945
All the OSL matrix functions had been implemented using the
`Transform` utility of Cycles, but that's built around a 4x3 matrix,
when the OSL matrix functions are working with 4x4 matrices.
This resulted in them not producing results consistent with the
CPU implementation.
This fixes that by making use of the `ProjectionTransform` utility
of Cycles instead, because it's built around a 4x4 matrix. Since
matrix inversion is required, I had to make a few more utility
functions available on the GPU (except Metal, due to use of
references/pointers without specification) that were previously
CPU-only.
Co-authored-by: Brecht Van Lommel <brecht@blender.org>
Pull Request: https://projects.blender.org/blender/blender/pulls/110102
The `osl_wavelength_color_vf` intrinsic was missing an implementation for
OptiX, causing a link error when attempting to load OSL shaders using the
wavelength node.
Pull Request: https://projects.blender.org/blender/blender/pulls/129372
In volume segment, the minimal angle formed by the emitter bounding cone
axis and the vector pointing from the cluster centroid to any point on
the ray is computed via `dot(bcone.axis, point_to_centroid)`, see Fig.8.
in paper.
For distant light this angle is 0, but due to numerical issues this is
not always true. Therefore explicitly assign `-bcone.axis` to
`point_to_centroid` in this case.
Pull Request: https://projects.blender.org/blender/blender/pulls/129489
This PR fixes a latent issue arising from invalid use of `accept_any_intersection(true)` when performing SSS ray-stepping with MetalRT. The comment incorrectly states that "we can optimize and accept the first hit", but to guarantee correct behaviour in future we need to request the closest hit.
Draine phase function sampling internally use Henyey-Greenstein and
Rayleigh sampling for degenerated cases, but the sampling pattern was
different between Draine and Rayleigh. The commit effectively replace
`rand` with `1 - rand` in Rayleigh sampling.
Pull Request: https://projects.blender.org/blender/blender/pulls/129261
When `g == 0`, the Draine phase function from
https://doi.org/10.1051/0004-6361/202142437 simplifies to
\[\Phi(\theta)=\frac{3}{4\pi(3+\alpha)}(1+\alpha\cos^2\theta).\]
Similar as Rayleigh sampling in https://doi.org/10.1364/JOSAA.28.002436,
The solution to the CDF of the marginal density function is
\[\cos^3\theta+a\cos\theta+b=0,\]
with
\[a=\frac{3}{\alpha},\quad b=\frac{3+\alpha}{\alpha}(2\xi_1-1),\]
which has only one real root since \(\alpha > 0\),
resulting in the sample technique
\[\cos\theta=u-\frac{1}{\alpha u}.\]
Pull Request: https://projects.blender.org/blender/blender/pulls/129259
Fix the failing rendering of volumes on Windows with HIP SDK 6.1
by reducing the optimization level.
There should be no functional or performance difference for the average
user as the Blender foundation currently does not use HIP SDK 6.1
on Windows. This change is primarily to fix issues for community members
building Blender locally.
Pull Request: https://projects.blender.org/blender/blender/pulls/128836
Gitea would complain the apostrophe in one of the code comments in
tree.h was an ambiguous Unicode character. So fix it by swapping it
for a more common apostrophe type.
The ray offsetting triangle tests are not numerically identical to
those found in custom BVH implementations.
There was a TODO to fix this, but there was no explaination for why
it should be done. This fixes that.
This packs the SVM stack, current node offset and closure weight into one struct, and just passes that to each SVM node implementation.
This way we don't have to pass the offset back and forth all over the place, and adding additional state (e.g. for layering in the future) becomes easier.
Pull Request: https://projects.blender.org/blender/blender/pulls/110443
- Deduplicate Fisheye projection code
- Replace spherical/cartesian conversions with shared helpers
- Replace transforms from/to local coordinate systems with shared helpers
The main type of repeated transform that's not covered here is `to/from_coords`, but with separate values for xy and z (e.g. BSDFs that already computed `dot(wi, N)` earlier, so they only need `dot(wi, X)` and `dot(wi, Y)` later). Could also be replaced, but it would feel weirdly specific for a helper function.
Pull Request: https://projects.blender.org/blender/blender/pulls/125999
Previously, when compiling on Rocky Linux 8 with fno-honor-nans, compile
time was more than 5x longer than expected, and there was an unresolved
symbol to __sqrtf_finite in GPU binaries.
Once defining sqrtf in compat.h, both issues are effectively gone, this
was certainly due to problematic interactions with build system's math
library headers.
So we can remove current workaround of defining fhonor-nans, and now
have the same set of flags on both Windows and Linux.
Fixes a issue where the Principled BSDF would render incorrectly if
`__SUBSURFACE__` is off. Which is common when using adaptive kernel
compilation (a unsupported Cycles feature).
Pull Request: https://projects.blender.org/blender/blender/pulls/128003
Previously, Cycles only supported the Henyey-Greenstein phase function for volume scattering.
While HG is flexible and works for a wide range of effects, sometimes a more physically accurate
phase function may be needed for realism.
Therefore, this adds three new phase functions to the code:
Rayleigh: For particles with a size below the wavelength of light, mostly athmospheric scattering.
Fournier-Forand: For realistic underwater scattering.
Draine: Fairly specific on its own (mostly for interstellar dust), but useful for the next entry.
Mie: Approximates Mie scattering in water droplets using a mix of Draine and HG phase functions.
These phase functions can be combined using Mix nodes as usual.
Co-authored-by: Lukas Stockner <lukas@lukasstockner.de>
Pull Request: https://projects.blender.org/blender/blender/pulls/123532
This enables most of the GPU compiler's optimizations while -ffast-math
isn't set at DPC++ level.
It brings an overall 1% speedup and currently doesn't change the unit
tests pass rate.