Commit Graph

1327 Commits

Author SHA1 Message Date
Xavier Hallade
aeb103fb50 Cycles: Pack uint3/int3 structs for oneAPI
This recently changed after a fix in 28f93d5443
but we get better performance by ensuring int3 is packed instead.

Packing int3 currently gives a 7% speedup when rendering wdas_cloud on
Intel Arc B580.

Pull Request: https://projects.blender.org/blender/blender/pulls/145593
2025-09-08 09:22:32 +02:00
Jesse Yurkovich
96e7242678 Cycles: Tessellate adaptive subdivision meshes in parallel
Meshes that require adaptive subdivision are currently tessellated one at
a time. Change this part of device update to be done in parallel.

To remove the possibility of the status message going backwards, a mutex
was required to keep that portion of the loop atomic.

Results for the loop in question: On one particular scene with over 300
meshes requiring tessellation, the update time drops from ~16 seconds to
~3 seconds. The attached synthetic test drops from ~9 seconds down to ~1
second.
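
A minimal sketch of the pattern described above, not the actual Cycles code: workers pull mesh indices from a shared counter, and the progress counter is incremented and reported under a single lock so the reported number can never go backwards. The names `tessellate_all` and `MeshTask` and the message text are hypothetical.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <functional>
#include <mutex>
#include <string>
#include <thread>
#include <vector>

// Hypothetical stand-in for the per-mesh tessellation work.
using MeshTask = std::function<void()>;

// Runs all tasks on `num_threads` workers and returns the status messages
// in the order in which they were reported.
std::vector<std::string> tessellate_all(const std::vector<MeshTask> &tasks,
                                        unsigned num_threads)
{
  std::vector<std::string> status_log;
  std::mutex status_mutex;
  std::atomic<std::size_t> next_task{0};
  std::size_t num_done = 0;  // Guarded by status_mutex.

  auto worker = [&]() {
    for (;;) {
      const std::size_t i = next_task.fetch_add(1);
      if (i >= tasks.size()) {
        return;
      }
      tasks[i]();  // The expensive part runs without holding the lock.

      // Increment and report in one critical section, so a slower thread
      // cannot publish a smaller count after a faster one.
      std::lock_guard<std::mutex> lock(status_mutex);
      num_done++;
      status_log.push_back("Tessellated " + std::to_string(num_done) + "/" +
                           std::to_string(tasks.size()));
    }
  };

  std::vector<std::thread> threads;
  for (unsigned t = 0; t < std::max(1u, num_threads); t++) {
    threads.emplace_back(worker);
  }
  for (std::thread &thread : threads) {
    thread.join();
  }
  return status_log;
}
```

Only the counter update and the message are serialized, so the lock should stay cheap relative to the tessellation itself.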

Pull Request: https://projects.blender.org/blender/blender/pulls/145220
2025-08-28 20:22:14 +02:00
Campbell Barton
c45ee0eb98 Cleanup: quiet compiler warnings
Suppressing "null-pointer-subtraction" was needed for clang
but caused a warning with GCC.
2025-08-20 11:18:29 +10:00
Brecht Van Lommel
c7e2368d6c Fix #144528: Cycles renders OpenVDB grids with rotation wrong
Pull Request: https://projects.blender.org/blender/blender/pulls/144825
2025-08-19 21:39:30 +02:00
Brecht Van Lommel
28f93d5443 Fix #144569: Cycles NanoVDB rendering broken with oneAPI
Wrong assumption about packed_int3, and not caught because the assert was in
the wrong place.

Pull Request: https://projects.blender.org/blender/blender/pulls/144803
2025-08-19 18:41:53 +02:00
Brecht Van Lommel
2615cecf10 Refactor: Cycles: Align log levels with CLOG
WORK -> DEBUG
DEBUG, STATS -> TRACE

Pull Request: https://projects.blender.org/blender/blender/pulls/144490
2025-08-18 20:22:44 +02:00
Weizhen Huang
df496eb894 Cycles: use one-tap stochastic interpolation for volume
It has ~1.2x speed-up on CPU and ~1.5x speed-up on GPU (tested on Metal
M2 Ultra).

Individual samples are noisier, but equal time renders are mostly
better.

Note that volume emission renders differently than before.
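
As an illustration only (the commit does not spell out the scheme, so this is an assumption about what "one-tap stochastic interpolation" means here): instead of blending neighboring voxels, pick a single neighbor with probability equal to its interpolation weight. The 1D sketch below uses hypothetical names; in 3D the same choice would be made per axis with independent random numbers.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Picks one of the two neighboring samples around `x`. The upper neighbor is
// chosen with probability equal to the fractional part of `x`, so the
// expected value matches linear interpolation.
inline int stochastic_tap_index(float x, float u)
{
  const float base = std::floor(x);
  const float frac = x - base;
  return int(base) + (u < frac ? 1 : 0);
}

// Averages many one-tap lookups using stratified random numbers in [0, 1).
// This only demonstrates convergence; a renderer takes one tap per sample.
inline float average_of_taps(const std::vector<float> &data, float x, int num_samples)
{
  float sum = 0.0f;
  for (int i = 0; i < num_samples; i++) {
    const float u = (float(i) + 0.5f) / float(num_samples);
    sum += data[std::size_t(stochastic_tap_index(x, u))];
  }
  return sum / float(num_samples);
}
```

Each lookup reads one voxel instead of several, which fits the trade-off noted above: noisier individual samples in exchange for cheaper lookups.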

Pull Request: https://projects.blender.org/blender/blender/pulls/144451
2025-08-14 15:22:44 +02:00
Weizhen Huang
a4f8e0bfa2 Cycles: Use RGBE for denoised guiding buffers to reduce memory usage
Co-authored-by: Brecht Van Lommel <brecht@blender.org>
2025-08-13 10:28:50 +02:00
Weizhen Huang
5cb6014efd Cycles: Volume Scattering Probability Guiding
Guide the probability to scatter in or transmit through the volume.
Only applied for primary rays.

Co-authored-by: Brecht Van Lommel <brecht@blender.org>
2025-08-13 10:28:50 +02:00
Weizhen Huang
b2b2d9a4f3 Cycles: Render volume by ray marching through octrees
One octree per volume per shader based on the density. In preparation
for the null scattering.
2025-08-13 10:28:50 +02:00
Campbell Barton
77d6960d24 Cleanup: quiet GCC warning for pointer subtraction
Ref !144032
2025-08-06 20:31:14 +00:00
Campbell Barton
e8501d2f54 Cleanup: grammar corrections, minor improvements to wording
Also back-tick quote some code references in comments
to differentiate them from English text.
2025-08-06 00:20:39 +00:00
Amogh Shivaram
ff4d840cf8 Cycles: Add polarized Fresnel function for conductors
This PR adds a new `fresnel_conductor_polarized` function, which calculates reflectance and phase shift (if requested) for both parallel and perpendicular polarized light. This is needed for applying thin film iridescence to conductors (see !141131).

For consistency, this PR also makes `fresnel_conductor` call `fresnel_conductor_polarized` instead of using a fast approximation of the Fresnel equations that is inaccurate at lower n and k values. This will change the output of some Metallic BSDF renders using Physical Conductor and prevent discrepancies when enabling thin film iridescence.

I didn't do any rigorous performance testing, but from timing the functions outside of Blender, `fresnel_conductor_polarized` is significantly slower than the approximation, between 1.5-3x depending on the compiler. This makes sense because it has three square roots and the approximation has none. In some informal tests with metallic_multiggx_physical.blend modified to have more spheres, the new renders took around 1-2% longer on both CPU and GPU.

There are some avoidable inefficiencies in this approach of just calling `fresnel_conductor_polarized`:

- one of the three square roots could be saved since `fresnel_conductor` never needs the phase shift and there are simplifications possible when only calculating the reflectance
- there are several unnecessary multiplications by 1.0 since `fresnel_conductor` uses relative IOR and `fresnel_conductor_polarized` doesn't, though those could get optimized out if inlined
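
For reference, a sketch of the textbook exact Fresnel reflectance for a conductor, split into perpendicular (s) and parallel (p) polarization. This is not the `fresnel_conductor_polarized` function from the PR: the names and signature below are hypothetical, the phase shift is omitted, and `eta` and `k` are assumed to be relative to the incident medium. Because the phase is left out, this version needs only two square roots.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

struct ConductorReflectance {
  float rs;  // Perpendicular (s) polarization.
  float rp;  // Parallel (p) polarization.
};

// Exact reflectance of a conductor with complex IOR (eta + i*k) for light
// arriving at an angle whose cosine is `cos_theta`.
inline ConductorReflectance conductor_reflectance(float cos_theta, float eta, float k)
{
  const float cos2 = cos_theta * cos_theta;
  const float sin2 = 1.0f - cos2;
  const float eta2 = eta * eta;
  const float k2 = k * k;

  const float t0 = eta2 - k2 - sin2;
  const float a2_plus_b2 = std::sqrt(t0 * t0 + 4.0f * eta2 * k2);
  const float a = std::sqrt(std::max(0.0f, 0.5f * (a2_plus_b2 + t0)));

  const float t1 = a2_plus_b2 + cos2;
  const float t2 = 2.0f * a * cos_theta;
  const float rs = (t1 - t2) / (t1 + t2);

  const float t3 = cos2 * a2_plus_b2 + sin2 * sin2;
  const float t4 = t2 * sin2;
  const float rp = rs * (t3 - t4) / (t3 + t4);

  return {rs, rp};
}

// Unpolarized reflectance is the average of the two polarizations.
inline float conductor_reflectance_unpolarized(float cos_theta, float eta, float k)
{
  const ConductorReflectance r = conductor_reflectance(cos_theta, eta, k);
  return 0.5f * (r.rs + r.rp);
}
```

At normal incidence both polarizations reduce to `((eta - 1)^2 + k^2) / ((eta + 1)^2 + k^2)`, which makes a convenient sanity check.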

Pull Request: https://projects.blender.org/blender/blender/pulls/143903
2025-08-04 15:36:36 +02:00
Weizhen Huang
a7042ca30c Fix: warning template-id-cdtor on gcc
2025-07-29 10:41:17 +02:00
Weizhen Huang
ea45c776fd Cycles: introduce dual types
to replace some uses of dfdx/dfdy/differentials.
No functional change expected.

Pull Request: https://projects.blender.org/blender/blender/pulls/143178
2025-07-28 17:34:24 +02:00
Weizhen Huang
345d23bff8 Cleanup: Cycles: add more float3 util functions
and vectorize `wrap` and `safe_fmod`.
2025-07-28 17:34:21 +02:00
Weizhen Huang
9404db8c7c Fix #141388: Cycles: CPU/GPU difference in pow function with 0 base
If the base is 0 and the exponent is non-zero, return 0 for both CPU and GPU.
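
The rule can be sketched as a small wrapper (the name `pow_zero_base` is hypothetical; the actual Cycles helper may differ). It matters mainly for negative exponents, where a plain IEEE `pow` would return infinity for a zero base.

```cpp
#include <cassert>
#include <cmath>

// Returns 0 when the base is 0 and the exponent is non-zero; otherwise
// defers to the standard library.
inline float pow_zero_base(float base, float exponent)
{
  if (base == 0.0f && exponent != 0.0f) {
    return 0.0f;
  }
  return std::pow(base, exponent);
}
```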

Pull Request: https://projects.blender.org/blender/blender/pulls/142678
2025-07-21 14:45:30 +02:00
Brecht Van Lommel
f38b4323f9 Fix: Build error with NDEBUG after recent fix for log level macro
2025-07-10 21:10:36 +02:00
Brecht Van Lommel
73fe848e07 Fix: Cycles log levels conflict with macros on some platforms
In particular DEBUG, but prefix all of them to be sure.

Pull Request: https://projects.blender.org/blender/blender/pulls/141749
2025-07-10 19:44:14 +02:00
Campbell Barton
ce7561982a Cleanup: use conventional license formatting
Quiet "make check_licenses" warning.
2025-07-10 00:38:11 +00:00
Lukas Stockner
eaa5f63ba2 Cycles: Replace thin-film basis function approximation with accurate LUTs
Previously, we used precomputed Gaussian fits to the XYZ CMFs, performed
the spectral integration in that space, and then converted the result
to the RGB working space.

That worked because we're only supporting dielectric base layers for
the thin film code, so the inputs to the spectral integration
(reflectivity and phase) are both constant w.r.t. wavelength.

However, this will no longer work for conductive base layers.
We could handle reflectivity by converting to XYZ, but that won't work
for phase since its effect on the output is nonlinear.

Therefore, it's time to do this properly by performing the spectral
integration directly in the RGB primaries. To do this, we need to:
- Compute the RGB CMFs from the XYZ CMFs and XYZ-to-RGB matrix
- Resample the RGB CMFs to be parametrized by frequency instead of wavelength
- Compute the FFT of the CMFs
- Store it as a LUT to be used by the kernel code

However, there are two optimizations we can make:
- Both the resampling and the FFT are linear operations, as is the
  XYZ-to-RGB conversion. Therefore, we can resample and Fourier-transform
  the XYZ CMFs once, store the result in a precomputed table, and then just
  multiply the entries by the XYZ-to-RGB matrix at runtime.
  - I've included the Python script used to compute the table under
    `intern/cycles/doc/precompute`.
- The reference implementation by the paper authors [1] simply stores the
  real and imaginary parts in the LUT, and then computes
  `cos(shift)*real + sin(shift)*imag`. However, the real and imaginary parts
  are oscillating, so the LUT with linear interpolation is not particularly
  good at representing them. Instead, we can convert the table to
  Magnitude/Phase representation, which is much smoother, and do
  `mag * cos(phase - shift)` in the kernel.
  - Phase needs to be unwrapped to handle the interpolation decently,
    but that's easy.
  - This requires an extra trig operation in the kernel in the dielectric case,
    but for the conductive case we'll actually save three.
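
The equivalence of the two evaluations can be checked with a short sketch (names are hypothetical, and this ignores the LUT interpolation and phase unwrapping that the real table needs): with `mag = hypot(real, imag)` and `phase = atan2(imag, real)`, the identity `mag * cos(phase - shift) = cos(shift)*real + sin(shift)*imag` follows from the cosine difference formula.

```cpp
#include <cassert>
#include <cmath>

struct MagPhase {
  float mag;
  float phase;
};

// Converts one table entry from real/imaginary to magnitude/phase form.
inline MagPhase to_mag_phase(float real, float imag)
{
  return {std::hypot(real, imag), std::atan2(imag, real)};
}

// Evaluation as in the reference implementation: two trig calls per entry.
inline float eval_real_imag(float real, float imag, float shift)
{
  return std::cos(shift) * real + std::sin(shift) * imag;
}

// Evaluation from the magnitude/phase form: one trig call per entry.
inline float eval_mag_phase(MagPhase entry, float shift)
{
  return entry.mag * std::cos(entry.phase - shift);
}
```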

Rendered output is mostly the same, just slightly different because we're
no longer using the Gaussian approximation.

[1] "A Practical Extension to Microfacet Theory for the Modeling of
    Varying Iridescence" by Laurent Belcour and Pascal Barla,
    https://belcour.github.io/blog/research/publication/2017/05/01/brdf-thin-film.html

Pull Request: https://projects.blender.org/blender/blender/pulls/140944
2025-07-09 22:10:28 +02:00
Brecht Van Lommel
4c25b49875 Refactor: Cycles: Deduplicate 3D texture sampling between devices
Pull Request: https://projects.blender.org/blender/blender/pulls/132908
2025-07-09 21:04:38 +02:00
Brecht Van Lommel
b6c4233b28 Refactor: Cycles: Remove now unused 3D image texture support
Pull Request: https://projects.blender.org/blender/blender/pulls/132908
2025-07-09 21:04:38 +02:00
Brecht Van Lommel
7978799e6f Cycles: Always render volume as NanoVDB
All GPU backends now support NanoVDB, using our own kernel side code
that is easily portable. This simplifies kernel and device code.

Volume bounds are now built from the NanoVDB grid instead of OpenVDB,
to avoid having to keep around the OpenVDB grid after loading.

While this reduces memory usage, it does have a performance impact,
particularly for the Cubic filter. That will be addressed by
another commit.

Pull Request: https://projects.blender.org/blender/blender/pulls/132908
2025-07-09 21:04:38 +02:00
Brecht Van Lommel
8cf031ba95 Fix: Wrong Cycles NanoVDB memory alignment on Windows
This was not a problem in practice so far, but will be with upcoming changes.

Pull Request: https://projects.blender.org/blender/blender/pulls/132908
2025-07-09 20:59:27 +02:00
Brecht Van Lommel
8111152c67 Refactor: Cycles: Add some OpenVDB and NanoVDB functions to util
OpenVDB to NanoVDB was moved, a new NanoVDB to OpenVDB mask grid was
added for future use. Some redundant CMake code was simplified.

Pull Request: https://projects.blender.org/blender/blender/pulls/132908
2025-07-09 20:59:27 +02:00
Brecht Van Lommel
cf36acbc0c Refactor: Cycles: Replace remaining fprintf with logging
Pull Request: https://projects.blender.org/blender/blender/pulls/140244
2025-07-09 20:59:25 +02:00
Brecht Van Lommel
b9d7bab6e6 Refactor: Cycles: Add comments to explain the logging API
Pull Request: https://projects.blender.org/blender/blender/pulls/140244
2025-07-09 20:59:25 +02:00
Brecht Van Lommel
fb4e3c8167 Refactor: Cycles: Remove distinction between severity and verbosity
Only use LOG() and LOG_IS_ON() macros, no more VLOG_.

Pull Request: https://projects.blender.org/blender/blender/pulls/140244
2025-07-09 20:59:24 +02:00
Brecht Van Lommel
8392ca915b Cycles: Remove glog dependency, redirect logs to CLOG
* Add own simple logging system to replace glog, which is no longer
  maintained by Google.
* When building in Blender, integrate with CLOG and print all messages
  through that system instead.
* --log cycles now replaces --debug-cycles. The latter still works but
  is no longer documented.

Pull Request: https://projects.blender.org/blender/blender/pulls/140244
2025-07-09 20:59:24 +02:00
Brecht Van Lommel
cf7f276d49 Refactor: Cycles: Tweak logging to prepare for dropping glog
* Implement own simple ScopedMockLog
* Always use names instead of numbers
* Avoid logging in header files

Pull Request: https://projects.blender.org/blender/blender/pulls/140244
2025-07-09 20:59:24 +02:00
Sergey Sharybin
9ace788faf Merge branch 'blender-v4.5-release'
2025-07-02 10:42:01 +02:00
Michael Jones
681eed7e4d Fix #135659: Some types of motion are incorrect at low step counts with MetalRT
Following #136253, this PR enables decomposed MetalRT motion
interpolation on macOS 15.6. The bounding box issue is fixed
in the latest macOS 15.6 beta (24G5054d).

Pull Request: https://projects.blender.org/blender/blender/pulls/141207
2025-07-02 10:41:42 +02:00
Aras Pranckevicius
68111db969 Nodes: Speedup Voronoi by changing the hash function
The 2D->2D, 3D->3D, 4D->4D hash functions used in Voronoi node were
using quite an expensive hash function. Switch these to dedicated
2D/3D/4D hash functions (pcg2d, pcg3d, pcg4d) -- these are still very
good quality, but the hash function itself is 3x-4x faster, which makes
the Voronoi node calculation around 2x faster overall. In some cases
when using OSL, the speedup is even larger.
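
For readers unfamiliar with these hashes, here is a sketch of `pcg3d` as commonly given in Jarzynski and Olano's "Hash Functions for GPU Rendering". The version in Cycles may differ in details; `UInt3` and `hash_to_unit_float` are hypothetical helpers added for illustration.

```cpp
#include <cassert>
#include <cstdint>

struct UInt3 {
  uint32_t x, y, z;
};

// 3D -> 3D integer hash: one LCG step per component, then two rounds of
// cross-component mixing with an xor-shift in between. All arithmetic wraps
// modulo 2^32.
inline UInt3 pcg3d(UInt3 v)
{
  v.x = v.x * 1664525u + 1013904223u;
  v.y = v.y * 1664525u + 1013904223u;
  v.z = v.z * 1664525u + 1013904223u;

  v.x += v.y * v.z;
  v.y += v.z * v.x;
  v.z += v.x * v.y;

  v.x ^= v.x >> 16;
  v.y ^= v.y >> 16;
  v.z ^= v.z >> 16;

  v.x += v.y * v.z;
  v.y += v.z * v.x;
  v.z += v.x * v.y;
  return v;
}

// Maps the top 24 bits of a hash value to a float in [0, 1).
inline float hash_to_unit_float(uint32_t h)
{
  return float(h >> 8) * (1.0f / 16777216.0f);
}
```

The whole hash is a handful of integer multiply-adds and shifts per input, which is where the speedup over a more expensive general-purpose hash would come from.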

This visibly changes the output of the Voronoi noise, however. The actual
noise "behaves" the same, but anyone who depended on the exact noise
pattern from before will get a different pattern.

Images, more performance results and details wrt OSL are in the PR.

Pull Request: https://projects.blender.org/blender/blender/pulls/139520
2025-06-12 20:07:52 +02:00
Brecht Van Lommel
04e325029f Revert "Cycles: Guiding cleaning up and refactoring the guiding code"
This reverts commit 5abf42012d in the
blender-v4.5-release branch to work around HIP compiler issues. It will
remain in the main branch.

Ref blender/blender#139836
2025-06-11 15:47:06 +02:00
Brecht Van Lommel
501b4641f6 Revert "Cleanup: Unused arguments in Cycles kernel"
This reverts commit 0e7a696819 in the
blender-v4.5-release branch to work around HIP compiler issues. It will
remain in the main branch.

Ref blender/blender#139836
2025-06-11 15:47:06 +02:00
Campbell Barton
07121d44ae Cleanup: use braces (follow own style guide)
2025-06-11 09:05:26 +00:00
Brecht Van Lommel
0e7a696819 Cleanup: Unused arguments in Cycles kernel
And add back the compiler flag that hid them.

Pull Request: https://projects.blender.org/blender/blender/pulls/139497
2025-05-27 21:30:45 +02:00
Lukas Stockner
507267393e Cleanup: Cycles: Restructure camera viewplane calculation
This started with investigating a render issue that appears to be caused by
GCC 15. From what I can tell, it was caused by
`*viewplane = (*viewplane) * bcam->zoom;`.

I'm not entirely sure what the root cause is (potentially pointer aliasing?),
but the restructured code works fine now.

Pull Request: https://projects.blender.org/blender/blender/pulls/139416
2025-05-26 22:24:20 +02:00
Michael Jones
8dd9aeb11e Cycles: Fix occasional failure in path_create_directories
This PR adds a global mutex to `path_create_directories` to fix a thread-safety issue which can occur when concurrently creating multiple subdirectories with common stems.

Pull Request: https://projects.blender.org/blender/blender/pulls/139266
2025-05-22 16:06:51 +02:00
Sebastian Herholz
5abf42012d Cycles: Guiding cleaning up and refactoring the guiding code
In detail:
- Direct accesses of state attributes are replaced with the `INTEGRATOR_STATE` and `INTEGRATOR_STATE_WRITE` macros.
- Unified the checks for the `__PATH_GUIDING__` define to use `#  if defined(__PATH_GUIDING__)`.
- Even if `__PATH_GUIDING__` is defined, we now check if the feature is enabled using `if ((kernel_data.kernel_features & KERNEL_FEATURE_PATH_GUIDING)) {`. This is important for later GPU ports.
- The kernel usage of the guiding field, surface, and volume sampling distributions is wrapped behind macros for each specific device (currently only CPU). This will make it easier for a GPU port later.
2025-05-22 13:46:30 +02:00
Brecht Van Lommel
fc686ff257 Fix #139002: Cycles particle object instance appears in center of scene
The particle system generates some particles with NaN values. The
set_if_different mechanism skipped copying those due to a refactor
in the matrix equality test. Revert that part of 689633d802 for now.

A better solution would be to improve handling of NaNs in Cycles,
and to find and fix the cause of the NaN in the particle system.

Pull Request: https://projects.blender.org/blender/blender/pulls/139238
2025-05-22 01:10:19 +02:00
Brecht Van Lommel
59b4842117 Cycles: Adaptive subdivision triangular patches
There is a corner case where one side of a quad needs splitting and the other
side has only one segment. Previously this would produce either gaps or, after
recent changes to stitch together geometry, uninitialized memory.

Now solve this by splitting into triangular patches, as suggested in the
DiagSplit paper. These triangular patches can be further subdivided themselves.
Dicing has special cases for 1 or 2 segments on edges. For more segments it
works the same as quad dicing: a regular inner triangle grid stitched to the
outer edges.

Fix #136973: Inconsistent results with adaptive subdivision

Pull Request: https://projects.blender.org/blender/blender/pulls/139062
2025-05-19 12:04:11 +02:00
Campbell Barton
b3dfde88f3 Cleanup: spelling in comments (check_spelling_* target)
Also uppercase acronyms: API, UTF & ASCII.
2025-05-17 10:17:37 +10:00
Weizhen Huang
1f01a1aee9 Cleanup: remove unnecessary defined(__KERNEL_METAL__)
The top-level guard is already `#ifndef __KERNEL_METAL__`, so an additional
guard is not only unnecessary but also confusing.
2025-05-05 18:35:24 +02:00
Campbell Barton
43af16a4c1 Cleanup: spelling in comments, correct comment block formatting
Also use doxygen comments more consistently.
2025-05-01 11:44:33 +10:00
Lukas Stockner
8bc9f174d3 Fix: Cycles: Wrong derivative handling in OptiX OSL transform()
osl_transform_triple(), osl_transform_dvmdv() and so on are supposed to apply
the given transform in the context of OSL's auto-differentiation system.
Therefore, the given input is a dual vector, containing both the value as v[0]
and its derivatives w.r.t. X and Y in v[1] and v[2].

However, the existing code treats these as a simple list of vectors, applying
the same operation to all three instead of propagating the derivatives.
On top of that, it also treated the given matrix input as if there were three
of them, which isn't the case.

Therefore, this commit replaces the implementation to do the right thing.
The Vector and Normal case are straightforward since the operation is linear,
so applying the same operation to all three vectors works.
The Point case is a bit more complicated, but not too bad when written out.
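
To illustrate the difference between the cases, here is a sketch that assumes an affine transform with no projective divide, which may be simpler than what the commit actually handles. Type and function names are hypothetical. A dual vector carries a value plus its derivatives w.r.t. X and Y: for vectors, the linear part applies to all three entries, while for points the translation applies to the value only, since a constant offset has zero derivative.

```cpp
#include <cassert>

struct Vec3 {
  float x, y, z;
};

// Value plus derivatives w.r.t. X and Y, like v[0], v[1], v[2] above.
struct DualVec3 {
  Vec3 val, dx, dy;
};

// 3x4 affine transform: the left 3x3 block is linear, the last column is
// the translation.
struct AffineTransform {
  float m[3][4];
};

inline Vec3 transform_direction(const AffineTransform &t, const Vec3 &v)
{
  return {t.m[0][0] * v.x + t.m[0][1] * v.y + t.m[0][2] * v.z,
          t.m[1][0] * v.x + t.m[1][1] * v.y + t.m[1][2] * v.z,
          t.m[2][0] * v.x + t.m[2][1] * v.y + t.m[2][2] * v.z};
}

inline Vec3 transform_point(const AffineTransform &t, const Vec3 &p)
{
  const Vec3 r = transform_direction(t, p);
  return {r.x + t.m[0][3], r.y + t.m[1][3], r.z + t.m[2][3]};
}

// Vector case: the operation is linear, so the same transform is applied to
// the value and to both derivatives.
inline DualVec3 transform_dual_vector(const AffineTransform &t, const DualVec3 &v)
{
  return {transform_direction(t, v.val), transform_direction(t, v.dx),
          transform_direction(t, v.dy)};
}

// Point case (affine only): the value is translated, the derivatives are not.
inline DualVec3 transform_dual_point(const AffineTransform &t, const DualVec3 &p)
{
  return {transform_point(t, p.val), transform_direction(t, p.dx),
          transform_direction(t, p.dy)};
}
```

With a projective matrix the point case would additionally need the quotient rule for the divide by w, which this sketch leaves out.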

This bug mostly became apparent when using Object or Camera texture coordinates
with a Bump node, since that node uses OSL differentials and Object/Camera
coordinates are implemented using transform().

I'm pretty sure that all the other builtin functions (e.g. sin) at the bottom
of services_gpu.h have the same problem, but one thing at a time...

Pull Request: https://projects.blender.org/blender/blender/pulls/138045
2025-04-28 12:46:54 +02:00
Brecht Van Lommel
b174e5f0d1 Cycles: Vulkan CUDA graphics interop
* Using CUDA external memory
* Checks that device UUID matches Vulkan

Pull Request: https://projects.blender.org/blender/blender/pulls/137363
2025-04-28 11:38:56 +02:00
Campbell Barton
c90e8bae0b Cleanup: spelling in comments & replace some use of single quotes
Previously the spell checker ignored text in single quotes; however, this
meant incorrect spelling was ignored in text where it shouldn't have
been.

In cases single quotes were used for literal strings
(such as variables, code & compiler flags),
replace these with back-ticks.

In cases they were used for UI labels,
replace these with double quotes.

In cases they were used to reference symbols,
replace them with doxygen's symbol link syntax (leading hash).

Apply some spelling corrections & tweaks (for check_spelling_* targets).
2025-04-26 11:17:13 +00:00
Sergey Sharybin
30b962b3d8 Cycles: Optimize 3d and 4d noise
The goal is to reduce the effect of the fmod() used in the noise code,
which was initially reported in the comment:

    https://projects.blender.org/blender/blender/pulls/119884#issuecomment-1258902

Basic idea is to benefit from SIMD vectorization on CPU.

Tested on Linux i9-11900K and macOS on M2 Ultra, in both cases performance
after this change is very close to what it could be with the fmod() commented
out (the call itself, `p = p + precision_correction`).

On macOS the penalty of fmod() was about 10%, on Linux it was closer to 30%
when built with GCC-13. With Linux builds from the buildbot it is more like 18%.

The optimization is only done for 3d and 4d noise. It might be possible to
gain some performance improvement for 1d and 2d cases, but the approach would
need to be different: we'd need to optimize the scalar version, fmodf(). Maybe
tricks with integer cast will be faster (since we are a bit optimistic in the
kernel and do not guarantee exact behavior in extreme cases such as NaN inputs).

Pull Request: https://projects.blender.org/blender/blender/pulls/137109
2025-04-09 13:40:10 +02:00