This recently changed after a fix in 28f93d5443
but we get better performance by ensuring int3 is packed instead.
Packing int3 currently gives a 7% speedup when rendering wdas_cloud on
Intel Arc B580.
Pull Request: https://projects.blender.org/blender/blender/pulls/145593
Meshes that require adaptive subdivision are currently tesselated one at
a time. Change this part of device update to be done in parallel.
To remove the possibility of the status message going backwards, a mutex
was required to keep that portion of the loop atomic.
Results for the loop in question: On one particular scene with over 300
meshes requiring tesselation, the update time drops from ~16 seconds to
~3 seconds. The attached synthetic test drops from ~9 seconds down to ~1
second.
Pull Request: https://projects.blender.org/blender/blender/pulls/145220
It has ~1.2x speed-up on CPU and ~1.5x speed-up on GPU (tested on Metal
M2 Ultra).
Individual samples are noisier, but equal time renders are mostly
better.
Note that volume emission renders differently than before.
Pull Request: https://projects.blender.org/blender/blender/pulls/144451
Guide the probability to scatter in or transmit through the volume.
Only applied for primary rays.
Co-authored-by: Brecht Van Lommel <brecht@blender.org>
This PR adds a new `fresnel_conductor_polarized` function, which calculates reflectance and phase shift (if requested) for both parallel and perpendicular polarized light. This is needed for applying thin film iridescence to conductors (see !141131).
For consistency, this PR also makes `fresnel_conductor` call `fresnel_conductor_polarized` instead of using a fast approximation of the Fresnel equations that is inaccurate at lower n and k values. This will change the output of some Metallic BSDF renders using Physical Conductor and prevent discrepancies when enabling thin film iridescence.
I didn't do any rigorous performance testing, but from timing the functions outside of Blender, `fresnel_conductor_polarized` is significantly slower than the approximation, between 1.5-3x depending on the compiler. This makes sense because it has three square roots and the approximation has none. In some informal tests with metallic_multiggx_physical.blend modified to have more spheres, the new renders took around 1-2% longer on both CPU and GPU.
There are some avoidable inefficiencies in this approach of just calling `fresnel_conductor_polarized`:
- one of the three square roots could be saved since `fresnel_conductor` never needs the phase shift and there are simplifications possible when only calculating the reflectance
- there are several unnecessary multiplications by 1.0 since `fresnel_conductor` uses relative IOR and `fresnel_conductor_polarized` doesn't, though those could get optimized out if inlined
Pull Request: https://projects.blender.org/blender/blender/pulls/143903
Previously, we used precomputed Gaussian fits to the XYZ CMFs, performed
the spectral integration in that space, and then converted the result
to the RGB working space.
That worked because we're only supporting dielectric base layers for
the thin film code, so the inputs to the spectral integration
(reflectivity and phase) are both constant w.r.t. wavelength.
However, this will no longer work for conductive base layers.
We could handle reflectivity by converting to XYZ, but that won't work
for phase since its effect on the output is nonlinear.
Therefore, it's time to do this properly by performing the spectral
integration directly in the RGB primaries. To do this, we need to:
- Compute the RGB CMFs from the XYZ CMFs and XYZ-to-RGB matrix
- Resample the RGB CMFs to be parametrized by frequency instead of wavelength
- Compute the FFT of the CMFs
- Store it as a LUT to be used by the kernel code
However, there's two optimizations we can make:
- Both the resampling and the FFT are linear operations, as is the
XYZ-to-RGB conversion. Therefore, we can resample and Fourier-transform
the XYZ CMFs once, store the result in a precomputed table, and then just
multiply the entries by the XYZ-to-RGB matrix at runtime.
- I've included the Python script used to compute the table under
`intern/cycles/doc/precompute`.
- The reference implementation by the paper authors [1] simply stores the
real and imaginary parts in the LUT, and then computes
`cos(shift)*real + sin(shift)*imag`. However, the real and imaginary parts
are oscillating, so the LUT with linear interpolation is not particularly
good at representing them. Instead, we can convert the table to
Magnitude/Phase representation, which is much smoother, and do
`mag * cos(phase - shift)` in the kernel.
- Phase needs to be unwrapped to handle the interpolation decently,
but that's easy.
- This requires an extra trig operation in the kernel in the dielectric case,
but for the conductive case we'll actually save three.
Rendered output is mostly the same, just slightly different because we're
no longer using the Gaussian approximation.
[1] "A Practical Extension to Microfacet Theory for the Modeling of
Varying Iridescence" by Laurent Belcour and Pascal Barla,
https://belcour.github.io/blog/research/publication/2017/05/01/brdf-thin-film.html
Pull Request: https://projects.blender.org/blender/blender/pulls/140944
All GPU backends now support NanoVDB, using our own kernel side code
that is easily portable. This simplifies kernel and device code.
Volume bounds are now built from the NanoVDB grid instead of OpenVDB,
to avoid having to keep around the OpenVDB grid after loading.
While this reduces memory usage, it does have a performance impact,
particularly for the Cubic filter. That will be addressed by
another commit.
Pull Request: https://projects.blender.org/blender/blender/pulls/132908
* Add own simple logging system to replace glog, which is no longer
maintained by Google.
* When building in Blender, integrate with CLOG and print all messages
through that system instead.
* --log cycles now replaces --debug-cycles. The latter still works but
is no longer documented.
Pull Request: https://projects.blender.org/blender/blender/pulls/140244
The 2D->2D, 3D->3D, 4D->4D hash functions used in Voronoi node were
using quite an expensive hash function. Switch these to dedicated
2D/3D/4D hash functions (pcg2d, pcg3d, pcg4d) -- these are still very
good quality, but the hash function itself is 3x-4x faster.
Which makes Voronoi node calculation overall be around 2x faster. In
some cases when using OSL, the speedup is even larger.
This visibly changes output of the Voronoi noise however. The actual
noise "behaves" the same, just if someone was depending on the noise
pattern being exactly like it was before, this will change the pattern.
Images, more performance results and details wrt OSL are in the PR.
Pull Request: https://projects.blender.org/blender/blender/pulls/139520
This reverts commit 5abf42012d in the
blender-v4.5-release branch to work around HIP compiler issues. It will
remain in the main branch.
Ref blender/blender#139836
This reverts commit 0e7a696819 in the
blender-v4.5-release branch to work around HIP compiler issues. It will
remain in the main branch.
Ref blender/blender#139836
This started with investigating a render issue that appears to be caused by
GCC 15. From what I can tell, it was caused by
`*viewplane = (*viewplane) * bcam->zoom;`.
I'm not entirely sure what the root cause is (potentially pointer aliasing?),
but the restructured code works fine now.
Pull Request: https://projects.blender.org/blender/blender/pulls/139416
This PR adds a global mutex to `path_create_directories` to fix a thread-safety issue which can occur when concurrently creating multiple subdirectories with common stems.
Pull Request: https://projects.blender.org/blender/blender/pulls/139266
In detail:
- Direct accesses of state attributes are replaced with the INTEGRATOR_STATE and INTEGRATOR_STATE_WRITE macros.
- Unified the checks for the __PATH_GUIDING define to use # if defined (__PATH_GUIDING__).
- Even if __PATH_GUIDING__ is defined, we now check if the feature is enabled using if ((kernel_data.kernel_features & KERNEL_FEATURE_PATH_GUIDING)) {. This is important for later GPU ports.
- The kernel usage of the guiding field, surface, and volume sampling distributions is wrapped behind macros for each specific device (atm only CPU). This will make it easier for a GPU port later.
The particle system generates some particles with NaN values. The
set_if_different mechanism skipped copying those due to a refactor
in the matrix equality test. Revert that part of 689633d802 for now.
A better solution would be to improve handling of NaNs in Cycles,
and to find and fix the cause of the NaN in the particle system.
Pull Request: https://projects.blender.org/blender/blender/pulls/139238
There is a corner case where one side of a quad needs splitting and the other
side has only one segment. Previously this would produce either gaps or after
recent changes to stitch together geometry, uninitialized memory.
Now solve this by splitting into triangular patches, as suggested in the
DiagSplit paper. These triangular patches can be further subdivided themselves.
Dicing has special cases for 1 or 2 segments on edges. For more segments it
works the same as: quad dicing: A regular inner triangle grid stitched to the
outer edges.
Fix#136973: Inconsistent results with adaptive subdivision
Pull Request: https://projects.blender.org/blender/blender/pulls/139062
osl_transform_triple(), osl_transform_dvmdv() and so on are supposed to apply
the given transform in the context of OSL's auto-differentiation system.
Therefore, the given input is a dual vector, containing both the value as v[0]
and its derivatives w.r.t. X and Y in v[1] and v[2].
However, the existing code treats these as a simple list of vectors, applying
the same operation to all three instead of propagating the derivatives.
On top of that, it also treated the given matrix input as if there were three
of them, which isn't the case.
Therefore, this commit replaces the implementation to do the right thing.
The Vector and Normal case are straightforward since the operation is linear,
so applying the same operation to all three vectors works.
The Point case is a bit more complicated, but not too bad when written out.
This bug mostly became apparent when using Object or Camera texture coordinates
with a Bump node, since that node uses OSL differentials and Object/Camera
coordinates are implemented using transform().
I'm pretty sure that all the other builtin functions (e.g. sin) at the bottom
of services_gpu.h have the same problem, but one thing at a time...
Pull Request: https://projects.blender.org/blender/blender/pulls/138045
Previously spell checker ignored text in single quotes however this
meant incorrect spelling was ignored in text where it shouldn't have
been.
In cases single quotes were used for literal strings
(such as variables, code & compiler flags),
replace these with back-ticks.
In cases they were used for UI labels,
replace these with double quotes.
In cases they were used to reference symbols,
replace them with doxygens symbol link syntax (leading hash).
Apply some spelling corrections & tweaks (for check_spelling_* targets).
The goal is to reduce the affect of the fmod() used in the noise code,
which was initially reported in the comment:
https://projects.blender.org/blender/blender/pulls/119884#issuecomment-1258902
Basic idea is to benefit from SIMD vectorization on CPU.
Tested on Linux i9-11900K and macOS on M2 Ultra, in both cases performance
after this change is very close to what it could be with the fmod() commented
out (the call itself, `p = p + precision_correction`).
On macOS the penalty of fmod() was about 10%, on Linux it was closer to 30%
when built with GCC-13. With Linux builds from the buildbot it is more like 18%.
The optimization is only done for 3d and 4d noise. It might be possible to
gain some performance improvement for 1d and 2d cases, but the approach would
need to be different: we'd need to optimize scalar version fmodf(). Maybe
tricks with integer cast will be faster (since we are a bit optimistic in the
kernel and do not guarantee exact behavior in extreme cases such as NaN inputs).
Pull Request: https://projects.blender.org/blender/blender/pulls/137109