Ever since the introduction of GPU OIDN denoising while rendering on
CPU devices, using the path_tracing_device info to pick the automatic
denoiser has often led to incorrect results.
This commit fixes this issue by using the denoising device info to pick
the denoiser.
Pull Request: https://projects.blender.org/blender/blender/pulls/123593
Precompiled Cycles kernels make up a considerable fraction of the total size of
Blender builds nowadays. As we add more features and support for more
architectures, this will only continue to increase.
However, since these kernels tend to be quite compressible, we can save a lot
of storage by storing them in compressed form and decompressing the required
kernel(s) during loading.
By using Zstandard compression with a high level, we can get decent compression
ratios (~5x for the current kernels) while keeping decompression time low
(about 30ms in the worst case in my tests). And since we already require zstd
for Blender, this doesn't introduce a new dependency.
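For illustration, here is a minimal sketch of the decompression side using
zstd's one-shot API; the function name and error handling are hypothetical,
not the actual Blender code:

```cpp
#include <zstd.h>

#include <stdexcept>
#include <vector>

/* Hypothetical helper: decompress one precompiled kernel blob that was
 * compressed as a single frame at build time. */
static std::vector<char> decompress_kernel(const void *src, size_t src_size)
{
  /* Single-frame compression stores the uncompressed size in the header. */
  const unsigned long long dst_size = ZSTD_getFrameContentSize(src, src_size);
  if (dst_size == ZSTD_CONTENTSIZE_ERROR || dst_size == ZSTD_CONTENTSIZE_UNKNOWN) {
    throw std::runtime_error("Invalid compressed kernel data");
  }

  std::vector<char> dst(size_t(dst_size));
  const size_t num_bytes = ZSTD_decompress(dst.data(), dst.size(), src, src_size);
  if (ZSTD_isError(num_bytes)) {
    throw std::runtime_error(ZSTD_getErrorName(num_bytes));
  }
  dst.resize(num_bytes);
  return dst;
}
```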
While the main improvement is to the size of the extracted Blender installation
(which is reduced by ~400-500MB currently), this also shrinks the download on
Windows, since .zip's deflate compression is less effective. It doesn't help on
Linux since we're already using .tar.xz there, but the smaller installed size
is still a good thing.
See #123522 for initial discussion.
Pull Request: https://projects.blender.org/blender/blender/pulls/123557
The Cycles automatic denoiser picker assumed that OIDN could not
run on the GPU while the CPU was the render device. So if the user was
using their CPU for rendering, the automatic denoiser picker would
"fall back" to a different denoiser (OptiX or CPU OIDN). This was true
in Blender 4.1, but changed in 4.2. The UI assumed that OIDN could run
on the GPU if there was a compatible OIDN GPU device.
This led to an issue on systems using the CPU for rendering
while having an NVIDIA GPU installed. The
UI suggested that OIDN would be used, and would switch between
CPU and GPU depending on user preferences. But the automatic
denoiser picker in the Cycles backend said OIDN could not run on
the GPU in this situation and would always "fall back" to the
OptiX denoiser running on the NVIDIA GPU.
This created a mismatch between the UI and what Cycles was
actually doing. This issue did not affect other GPU vendors because
their "fallback" was the OIDN denoiser.
This commit fixes the issue by aligning the Cycles automatic
denoiser picker in the backend with the UI: use OIDN if the GPU
supports it, fall back to OptiX if it doesn't, fall back to
CPU OIDN if OptiX isn't supported either, and finally fall back
to no denoiser at all, as sketched below.
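A minimal sketch of that order, with hypothetical device-capability
fields (the real Cycles queries differ):

```cpp
enum DenoiserType { DENOISER_NONE, DENOISER_OPENIMAGEDENOISE, DENOISER_OPTIX };

/* Hypothetical capability flags for the denoising device. */
struct DenoiseDeviceInfo {
  bool supports_oidn_gpu;
  bool supports_optix;
  bool supports_oidn_cpu;
};

static DenoiserType pick_automatic_denoiser(const DenoiseDeviceInfo &device)
{
  if (device.supports_oidn_gpu) {
    return DENOISER_OPENIMAGEDENOISE; /* OIDN on the GPU, matching the UI. */
  }
  if (device.supports_optix) {
    return DENOISER_OPTIX; /* NVIDIA GPU without OIDN GPU support. */
  }
  if (device.supports_oidn_cpu) {
    return DENOISER_OPENIMAGEDENOISE; /* OIDN on the CPU. */
  }
  return DENOISER_NONE; /* No denoiser supported at all. */
}
```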
Pull Request: https://projects.blender.org/blender/blender/pulls/123530
Almost certainly not an issue in the current codebase (this 'copy' version
of `MEM_cnew` does not seem to be used much in the first place), but it is
better to be consistent with the 'allocating' version.
Pull Request: https://projects.blender.org/blender/blender/pulls/123445
Better sync the checks on the alignment value between
`MEM_lockfree_mallocN_aligned` and `MEM_guarded_mallocN_aligned`.
The only significant change, in `MEM_guarded_mallocN_aligned`, is the
usage of `ALIGNED_MALLOC_MINIMUM_ALIGNMENT` instead of the 'magic value' `8`.
This should not have any effect on 64-bit platforms, but on 32-bit ones
the minimum alignment is now reduced from `8` to `4`.
NOTE: we could also consider moving these checks into a utility
function instead of duplicating them in the codebase, e.g. as sketched below.
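A rough illustration of what such a utility could look like (names
hypothetical; the actual checks in the allocators may differ):

```cpp
#include <assert.h>
#include <stddef.h>

#ifndef ALIGNED_MALLOC_MINIMUM_ALIGNMENT
#  define ALIGNED_MALLOC_MINIMUM_ALIGNMENT sizeof(void *)
#endif

/* Hypothetical shared helper for the checks both allocators duplicate. */
static size_t sanitize_alignment(size_t alignment)
{
  /* The alignment must be a power of two. */
  assert((alignment & (alignment - 1)) == 0);

  /* Aligned allocation backends require at least pointer-size alignment;
   * this is 8 bytes on 64-bit platforms and 4 bytes on 32-bit ones. */
  if (alignment < ALIGNED_MALLOC_MINIMUM_ALIGNMENT) {
    alignment = ALIGNED_MALLOC_MINIMUM_ALIGNMENT;
  }
  return alignment;
}
```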
Cycles checks whether the camera is possibly inside a
volumetric object by testing whether the bounding boxes of the camera
and the volumetric object intersect.
If the calculation is wrong (Cycles says the camera is outside the
volume when it's inside it), then the volume will not render properly.
This commit resolves most of these issues by making the camera
bounding box larger than before, taking into consideration (see the
sketch after this list):
1. The impact depth of field can have on the camera ray start position,
and how that should grow the bounding box.
2. Near clipping, which was missed for the orthographic camera due to
an oversight in a previous commit (08cc73a9bb).
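A minimal sketch of the idea with stand-in types; the actual Cycles
camera and bounding-box code differ:

```cpp
/* Stand-in types; the real Cycles float3/BoundBox differ. */
struct Float3 {
  float x, y, z;
};

struct CameraParams {
  Float3 position;     /* Camera origin in world space. */
  float aperture_size; /* DOF aperture radius; zero when DOF is disabled. */
  float near_clip;     /* Near clipping distance. */
};

/* Bounds around every possible camera-ray start position: rays can
 * originate anywhere on the aperture disk (DOF), and effectively start
 * at the near clipping plane. */
static void camera_ray_bounds(const CameraParams &cam, Float3 *bb_min, Float3 *bb_max)
{
  const float margin = cam.aperture_size + cam.near_clip;
  *bb_min = {cam.position.x - margin, cam.position.y - margin, cam.position.z - margin};
  *bb_max = {cam.position.x + margin, cam.position.y + margin, cam.position.z + margin};
}
```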
Pull Request: https://projects.blender.org/blender/blender/pulls/123341
The callables generated by OSL reference other external functions
(defined in the OSL services module), in which case OptiX cannot
calculate the right stack size based on the callable alone; it needs to
know about all functions linked together in the pipeline to arrive at an
accurate result. `optixProgramGroupGetStackSize` has an optional pipeline
argument for this purpose, so make use of that to ensure the correct
stack size is calculated.
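A minimal sketch, assuming a valid program group and pipeline (the
wrapper function itself is hypothetical):

```cpp
#include <optix.h>

/* Query the stack sizes of an OSL callable while letting OptiX see the
 * whole pipeline it is linked into. Since OptiX SDK 7.7,
 * optixProgramGroupGetStackSize() takes this optional pipeline argument
 * (passing null restores the old per-group behavior). */
static OptixResult callable_stack_size(OptixProgramGroup osl_callable_group,
                                       OptixPipeline pipeline,
                                       OptixStackSizes *r_stack_size)
{
  return optixProgramGroupGetStackSize(osl_callable_group, r_stack_size, pipeline);
}
```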
Ref #122779
Pull Request: https://projects.blender.org/blender/blender/pulls/123368
This makes the message more verbose, and a little clearer that devices prior to 8cx Gen3 are not supported in >=v4.0. It makes the error message from #113674 more prominent than just being printed to cout.
Spurred by an email I got from someone trying to run Blender on a Surface Pro X, and getting an error that is not very helpful on old devices.
Pull Request: https://projects.blender.org/blender/blender/pulls/122732
The base-4 Owen scrambling hash needs a seed value that's somewhat
random-looking, so the default value of 0 causes problems. Hashing the
input seed avoids this.
To avoid changing the noise pattern in pre-4.2 scenes, this hash is only
applied to blue-noise patterns.
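A sketch of the idea using a well-known 32-bit mixing hash (lowbias32);
the actual hash Cycles uses may differ:

```cpp
#include <cstdint>

/* lowbias32 by Chris Wellons: a cheap, well-mixed 32-bit integer hash. */
static uint32_t hash_uint32(uint32_t x)
{
  x ^= x >> 16;
  x *= 0x7feb352du;
  x ^= x >> 15;
  x *= 0x846ca68bu;
  x ^= x >> 16;
  return x;
}

/* Hash the user-provided seed only for blue-noise patterns, so that the
 * default seed of 0 still yields a random-looking value while pre-4.2
 * scenes keep their existing noise pattern. */
static uint32_t scrambling_seed(uint32_t seed, bool use_blue_noise)
{
  return use_blue_noise ? hash_uint32(seed) : seed;
}
```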
Pull Request: https://projects.blender.org/blender/blender/pulls/123274
`num_distribution` in `KernelIntegrator` has type `int`, whose
maximal value is 2147483647. However, the distribution is computed
using `size_t`, which can go beyond this value and result in a
negative value when converted to `int`.
This PR handles this case as an error, stops rendering, and suggests
alternative solutions, as sketched below.
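A minimal sketch of the guard (the function and the error message are
hypothetical):

```cpp
#include <climits>
#include <cstddef>
#include <string>

/* The flattened light distribution is indexed with a 32-bit int in the
 * kernel, so refuse to build one with more than INT_MAX entries instead
 * of letting the size wrap to a negative value. */
static bool check_num_distribution(size_t num_distribution, std::string *r_error)
{
  if (num_distribution > size_t(INT_MAX)) {
    *r_error =
        "Light distribution exceeds the supported size; "
        "consider enabling the light tree or reducing the number of emitters";
    return false;
  }
  return true;
}
```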
Also return early when `use_light_tree` is enabled. That block was only
there because `num_distribution` was needed for the light tree before
bfd1836861.
Pull Request: https://projects.blender.org/blender/blender/pulls/123177
Extract
- Cycles denoiser enum.
- Extensions user preferences UI.
- Node operator poll message from new node function.
Improve
- Split "(Enabled|Disabled) on startup, overriding the preference."
into two messages.
Disambiguate
- "Add" when describing the action of adding something should use the
Operator context.
- "Dimensions", in noise textures.
- "Transform" as a noun, the matrix transform type of Geometry Nodes,
as opposed to the verb to move things in space.
- "Parent" as a noun or verb (the parent of an object, to parent an
object to another).
Some issues reported by Satoshi Yamasaki, deathblood, and Gabriel Gazzán.
Pull Request: https://projects.blender.org/blender/blender/pulls/122969
* Always define root directories in LIBDIR even when not needed,
to silence some warnings.
* Only show warnings about not finding libs when oneAPI is enabled.
* Prefix message for context.
Light linking never worked correctly in the volume segment with the light
tree, because `sd->object` was not assigned, thus
`light_link_receiver_nee(kg, sd)` always returned `OBJECT_NONE`, causing
the light tree sample to fail. This problem was revealed by fdc2962beb,
since now the same light is used for the volume segment and the volume.
Also ensure we don't sample a position on the light if sampling from
the volume segment failed, by setting `emitter_id` to `EMITTER_NONE` in
such cases, as sketched below.
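A rough sketch of both fixes with stand-in types (the real kernel code
differs):

```cpp
/* Stand-in constants and types; the real kernel definitions differ. */
#define OBJECT_NONE (~0u)
#define EMITTER_NONE (-1)

struct ShaderDataStub {
  unsigned int object; /* Must not be left as OBJECT_NONE. */
};

struct LightSampleStub {
  int emitter_id;
};

/* 1) Assign the receiver object before light tree sampling, so the
 *    light-linking receiver lookup does not see OBJECT_NONE. */
static void volume_segment_setup(ShaderDataStub *sd, unsigned int volume_object)
{
  sd->object = volume_object;
}

/* 2) If sampling from the volume segment failed, invalidate the emitter
 *    so no light position is sampled from it later. */
static void volume_segment_sample_failed(LightSampleStub *ls)
{
  ls->emitter_id = EMITTER_NONE;
}
```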
Pull Request: https://projects.blender.org/blender/blender/pulls/122999
Since #118841 there are more cases where Cycles checks for
graphics interop support. This could lead to a crash when graphics
interop functions are called without an active graphics context.
This change ensures there are no graphics interop calls when doing a
headless render. To achieve this, device creation is now
aware of the headless mode.
Pull Request: https://projects.blender.org/blender/blender/pulls/122844
An additional requirement for the crash is having OpenImageDenoise
enabled while the devices do not support the OIDN denoiser.
Reproduced here in the studio on Linux systems: one with dual
Quadro GP100 cards, and one with a Quadro 6000 + Quadro 6000 ADA.
The reason for the crash is that find_best_device() might return
nullptr, which was never checked.
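A minimal sketch of the missing check (hypothetical signatures):

```cpp
#include <vector>

struct DeviceInfo; /* Stand-in declaration for the real Cycles type. */

/* Assumed helper matching the message above; returns nullptr when no
 * device in the list supports the requested denoiser. */
const DeviceInfo *find_best_device(const std::vector<const DeviceInfo *> &devices);

static bool setup_denoise_device(const std::vector<const DeviceInfo *> &devices)
{
  const DeviceInfo *best = find_best_device(devices);
  if (best == nullptr) {
    /* No capable device found: bail out instead of dereferencing nullptr. */
    return false;
  }
  /* ... continue configuring denoising on `best` ... */
  return true;
}
```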
Pull Request: https://projects.blender.org/blender/blender/pulls/122823
Previously, Cycles would render up to 4 SPP during viewport navigation when
using reduced resolution, even when the overall number of samples was set
lower.
This causes problems with the blue-noise pattern, so ensure that the
number of samples is always clamped to the configured maximum.
This is a proper fix for the issue worked around in 11d311e300.
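A minimal sketch of the clamp (the helper is hypothetical; the cap value
comes from the behavior described above):

```cpp
#include <algorithm>

/* Cap the navigation sample count by the configured maximum, so a scene
 * set to fewer samples never renders more of them at reduced resolution. */
static int navigation_sample_count(int configured_max_samples)
{
  const int navigation_samples = 4; /* Cap used during viewport navigation. */
  return std::min(navigation_samples, configured_max_samples);
}
```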
Previously, the device info adjustment for a preferred device was
guarded by an incorrect condition. This change removes that condition,
so the adjustment is done correctly and unconditionally.
This patch implements blue-noise dithered sampling as described by Nathan Vegdahl (https://psychopath.io/post/2022_07_24_owen_scrambling_based_dithered_blue_noise_sampling), which in turn is based on "Screen-Space Blue-Noise Diffusion of Monte Carlo Sampling Error via Hierarchical Ordering of Pixels" (https://repository.kaust.edu.sa/items/1269ae24-2596-400b-a839-e54486033a93).
The basic idea is simple: Instead of generating independent sequences for each pixel by scrambling them, we use a single sequence for the entire image, with each pixel getting one chunk of the samples. The ordering across pixels is determined by hierarchical scrambling of the pixel's position along a space-filling curve, which ends up being pretty much the same operation as already used for the underlying sequence.
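A rough sketch of that idea: the Morton encoding below is standard, while the scramble is just a generic stand-in for the base-4 Owen scrambling that the real implementation shares with the underlying sequence (it is not a faithful Owen scramble).

```cpp
#include <cstdint>

/* Interleave the low 16 bits of v with zeros (helper for Morton encoding). */
static uint32_t part_1by1(uint32_t v)
{
  v &= 0x0000ffff;
  v = (v | (v << 8)) & 0x00ff00ff;
  v = (v | (v << 4)) & 0x0f0f0f0f;
  v = (v | (v << 2)) & 0x33333333;
  v = (v | (v << 1)) & 0x55555555;
  return v;
}

/* Z-order (Morton) index of a pixel: a simple space-filling curve. */
static uint32_t morton_encode(uint32_t x, uint32_t y)
{
  return part_1by1(x) | (part_1by1(y) << 1);
}

/* Generic mixing hash standing in for the hierarchical scrambling of the
 * pixel's curve index. */
static uint32_t scramble(uint32_t v)
{
  v ^= v >> 16;
  v *= 0x7feb352du;
  v ^= v >> 15;
  v *= 0x846ca68bu;
  v ^= v >> 16;
  return v;
}

/* Each pixel draws one contiguous chunk of a single global sequence,
 * ordered across pixels by the scrambled space-filling-curve index. */
static uint64_t global_sample_index(uint32_t x, uint32_t y,
                                    uint32_t sample, uint32_t samples_per_pixel)
{
  const uint32_t pixel_rank = scramble(morton_encode(x, y));
  return uint64_t(pixel_rank) * samples_per_pixel + sample;
}
```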
The result is a more high-frequency noise distribution, which appears smoother despite not being less noisy overall.
The main limitation at the moment is that the improvement is only clear if the full sample amount is used per pixel, so interactive preview rendering and adaptive sampling will not receive the benefit. One exception to this is that when using the new "Automatic" setting, the first sample in interactive rendering will also be blue-noise-distributed.
The sampling mode option is now exposed in the UI, with the three options being Blue Noise (the new mode), Classic (the previous Tabulated Sobol method) and the new default, Automatic (blue noise, with the additional property of ensuring the first sample is also blue-noise-distributed in interactive rendering). When debug mode is enabled, additional options appear, such as Sobol-Burley.
Note that the scrambling distance option is not compatible with the blue-noise pattern.
Pull Request: https://projects.blender.org/blender/blender/pulls/118479
On an M3 MacBook Pro, this change increases the benchmark score by 8% (with classroom seeing a path-tracing speedup of 15%).
The integrator state is currently stored using struct-of-arrays, with one array per field. Such fine-grained separation can result in poor GPU cache utilisation in cases where multiple fields of the same parent struct are accessed together. This PR changes the layout of the `ray`, `isect`, `subsurface`, and `shadow_ray` structs so that the data is interleaved (per parent struct) instead of separate. To try and keep this change localised, I encapsulated the layout change by extending the integrator state access macros; however, maybe we want to do this more explicitly? (e.g. by updating every bit of code that accesses these parts of the state). Feedback welcome.
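A simplified sketch of the layout change (the real integrator state and
access macros are more involved):

```cpp
/* Before: struct-of-arrays with one array per field. Fields of the same
 * ray live in separate allocations, so touching several of them for one
 * path pulls in multiple unrelated cache lines. */
struct IntegratorStateSoA {
  float *ray_t;    /* ray_t[path_index] */
  float *ray_time; /* ray_time[path_index] */
};

/* After: the fields of one parent struct are interleaved per path, so
 * the fields of one ray sit next to each other in memory. */
struct RayState {
  float t;
  float time;
};

struct IntegratorStateAoS {
  RayState *ray; /* ray[path_index] holds all ray fields together. */
};

/* The state access macro hides which layout is used, keeping call sites
 * unchanged: INTEGRATOR_STATE(state, ray, t) can expand to either
 * state->ray_t[path_index] (SoA) or state->ray[path_index].t (interleaved). */
```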
Pull Request: https://projects.blender.org/blender/blender/pulls/122015
This is an oversight in #122543, for which benchmarking was done in
headless mode.
The solution is to tweak the policy a little bit: keep refresh intervals
low for the first 10 seconds of the render, after which the interval
increases to 15 seconds (as sketched after the list below). Doing so allows us to:
- Have a quick cancel of complex files when the error is noticed during
the first few samples.
- Have a more predictable cancel time after a long render.
- Mitigate the performance regression.
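A minimal sketch of the tweaked policy (the early interval value here is
a placeholder, not the exact one used):

```cpp
/* Frequent display/cancel updates for the first 10 seconds of the render,
 * sparse updates afterwards. */
static double update_interval_in_seconds(double time_since_render_start)
{
  const double early_interval = 0.1; /* Placeholder for the short interval. */
  return (time_since_render_start < 10.0) ? early_interval : 15.0;
}
```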
This does not fully solve the regression, but it makes it much more
manageable. There are some compromises to performance for UI renders.
The interactivity is also not as good, but that could be solved later
by introducing some "Instant Cancel" operation which would also be able
to stop the render in the middle of a sample.
Performance measured with the Spring file on an M2 Ultra (GPU render),
path tracing time in seconds:

  Samples                    300   1024   2048
  Base (prior to #122543)   29.1   85.4  174.1
  This patch                37.0   95.7  180.2
The penalty is close to a constant time (the time within which a more
interactive cancel is possible).
Pull Request: https://projects.blender.org/blender/blender/pulls/122658
A regression since #118841.
It is possible that the selected preference device is not found, in which
case a default-initialized DeviceInfo would have been added to the list.
This device is set to CPU, but with different other fields (such as the
description) compared to the actual CPU device.
Pull Request: https://projects.blender.org/blender/blender/pulls/122701