Currently, during baking each pixel stores a seed input that comes from the
Blender side. This is only needed for vertex color baking, however -
for regular image baking, we can just as well hash the pixel coordinates.
Therefore, we can save some memory (4 byte per pixel) by splitting the seed
info out into a separate pass and only storing it when needed.
Pull Request: https://projects.blender.org/blender/blender/pulls/122806
This patch implements blue-noise dithered sampling as described by Nathan Vegdahl (https://psychopath.io/post/2022_07_24_owen_scrambling_based_dithered_blue_noise_sampling), which in turn is based on "Screen-Space Blue-Noise Diffusion of Monte Carlo Sampling Error via Hierarchical Ordering of Pixels"(https://repository.kaust.edu.sa/items/1269ae24-2596-400b-a839-e54486033a93).
The basic idea is simple: Instead of generating independent sequences for each pixel by scrambling them, we use a single sequence for the entire image, with each pixel getting one chunk of the samples. The ordering across pixels is determined by hierarchical scrambling of the pixel's position along a space-filling curve, which ends up being pretty much the same operation as already used for the underlying sequence.
This results in a more high-frequency noise distribution, which appears smoother despite not being less noisy overall.
The main limitation at the moment is that the improvement is only clear if the full sample amount is used per pixel, so interactive preview rendering and adaptive sampling will not receive the benefit. One exception to this is that when using the new "Automatic" setting, the first sample in interactive rendering will also be blue-noise-distributed.
The sampling mode option is now exposed in the UI, with the three options being Blue Noise (the new mode), Classic (the previous Tabulated Sobol method) and the new default, Automatic (blue noise, with the additional property of ensuring the first sample is also blue-noise-distributed in interactive rendering). When debug mode is enabled, additional options appear, such as Sobol-Burley.
Note that the scrambling distance option is not compatible with the blue-noise pattern.
Pull Request: https://projects.blender.org/blender/blender/pulls/118479
On a M3 MacBook Pro, this change increases the benchmark score by 8% (with classroom seeing a path-tracing speedup of 15%).
The integrator state is currently store using struct-of-arrays, with one array per field. Such fine grained separation can result in poor GPU cache utilisation in cases where multiple fields of the same parent struct are accessed together. This PR changes the layout of the `ray`, `isect`, `subsurface`, and `shadow_ray` structs so that the data is interleaved (per parent struct) instead of separate. To try and keep this change localised, I encapsulated the layout change by extending the integrator state access macros, however maybe we want to do this more explicitly? (e.g. by updating every bit of code that accesses these parts of the state). Feedback welcome.
Pull Request: https://projects.blender.org/blender/blender/pulls/122015
This fixes#69535 and #98930.
We use a equi-solid-angle sampling algorithm for rectangular area lights,
but it is not particularly robust for small area lights (either small
in general and/or small because it's being viewed from grazing angles).
The actual sampling part is fine since it just gets clamped into the
valid area anyways, and the difference isn't notable for small lights.
However, we also need to compute the solid angle to get the sampling PDF,
and that computation is quite sensitive to numerical issues for small
values.
Therefore, this commit adds a fallback path for small values, which instead
uses the classic equi-area sampling PDF term times the area-to-solid-angle
Jacobian term. This approximation assumes that all points on the light have
the same distance and angle to the sampling point, which is of course not
strictly the case, but it's close enough for small area lights and better
than failing altogether.
Pull Request: https://projects.blender.org/blender/blender/pulls/122323
Reformulates some terms in the equi-solid-angle rectangle sampling code to
handle small area lamps better, and allows for some rounding error in the
check whether the sampled position is inside the area light.
Pull Request: https://projects.blender.org/blender/blender/pulls/122323
In the original paper, the falloff inside `bcone.theta_e` is assumed to
be `pi/2`, which is too large for spot light and resulted in an
overestimation near the cone boundary.
To address this issue, attenuate the energy of a spot light using the
minimal possible angle formed by the light axis and the shading point
when traversing the light tree.
Ref: #122362
Pull Request: https://projects.blender.org/blender/blender/pulls/122667
Since the previous fix to properly support volumes and transparent objects
it became very easy to make it so the intersection loop takes all 1024
iterations to find intersections.
This change makes it so the number of intersection is limited by the max
number of volume/transparent bounces.
This should minimize possible performance impact of the previous fix.
Pull Request: https://projects.blender.org/blender/blender/pulls/122448
We can't do the optimization to shorten the ray when we might still need
to go through transparent surfaces or volumes to reach the light.
This issue was not light tree specific, however in the test file it was
more noticable because the light tree poorly handles some areas. This in
in turn causes MIS weights for forward path tracing to become higher,
which is where the error was.
Pull Request: https://projects.blender.org/blender/blender/pulls/122404
This enables scenes with all textures not fitting in GPU
memory to finally render. For scenes that are fitting,
no functional change or performance change is expected.
Pull Request: https://projects.blender.org/blender/blender/pulls/122385
One of the properties of Perlin noise is that it always evaluates to 0.0
when not normalized (or 0.5 when normalized) when the input consists of
only whole integers in all vector components.
Blender's Perlin noise implementation uses single precision floats with
a machine epsilon of 1.19e-07 meaning that for numbers that are greater
than 1/(1.19e-07) = 8.40e6 there mantissa doesn't have any bits left to
store a rational part of the number, effectively meaning that any number
greater than 8.40e6 is a whole integer as far as Blender is concerned.
Therefore when evaluating Perlin noise for any coordinates greater than
that it always results in 0.0 (or 0.5 when normalized).
This fix works as follows: If the original input number is larger than
1.0e6 it is offset by 0.5 after it underwent modulo, which always outputs
numbers in a [0.0, 1.0e5) range leaving the mantissa room for a rational
part. This way the quantization error still persists however the outputs
are random again instead of a constant 0.0.
Pull Request: https://projects.blender.org/blender/blender/pulls/122112
The refactor in 97d9bbbc97 changed the way q is computed in the spherical triangle sampling code. While the new approach is more efficient and saves a few operations, it introduces numerical precision issues for skinny/small (spherical) triangles.
Therefore, this change moves the computation of q back to the method from the paper, while keeping the more efficient solid angle computation.
Pull Request: https://projects.blender.org/blender/blender/pulls/119224
when ray exceeds `max_bounce`, we do not allocate any closure at
intersection. However, Ray Portal BSDF still added `SD_BSDF` flag,
resulting in undefined behavior in
`integrate_surface_bsdf_bssrdf_bounce()`.
This part of code was similar to Transparent BSDF, however, Transparent
closure was still allocated in this case.
To fix the undefined behavior, add `SD_BSDF` flag only when the Ray
Portal closure was allocated.
This PR contains optimisations and a general tidy-up of the MetalRT backend.
- Currently `scene_intersect` is used for both normal and (opaque) shadow rays, however the usage patterns are different enough to warrant specialisation. Shadow intersection tests (flagged with `PATH_RAY_SHADOW_OPAQUE`) only need a bool result, but need a larger "self" payload in order to exclude hits against target lights. By specialising we can minimise the payload size in each case (which is helps performance) and avoid some dynamic branching. This PR introduces a new `scene_intersect_shadow` function which is specialised in Metal, and currently redirects to `scene_intersect` in the other backends.
- Currently `scene_intersect_local` is implemented for worst-case payload requirements as demanded by `subsurface_disk` (where `max_hits` is 4). The random_walk case only demands 1 hit result which we can retrieve directly from the intersector object (rather than stashing it in the payload). By specialising, we significantly reduce the payload size for random_walk queries, which has a big impact on performance. Additionally, we only need to use a custom intersection function for the first ray test in a random walk (for self-primitive filtering), so this PR forces faster `opaque` intersection testing for all but the first random walk test.
- Currently `scene_intersect_volume` has a lot of redundant code to handle non-triangle primitives despite volumes only being enclosed by trimeshes. This PR removes this code.
Additionally, this PR tidies up the convoluted intersection function linking code, removes some redundant intersection handlers, and uses more consistent naming of intersection functions.
On a M3 MacBook Pro, these changes give 2-3% performance increase on typical scenes with opaque trimesh materials (e.g. barbershop, classroom junkshop), but can give over 15% performance increase for certain scenes using random walk SSS (e.g. monster).
Pull Request: https://projects.blender.org/blender/blender/pulls/121397
This is an implementation of thin film iridescence in the Principled BSDF based on "A Practical Extension to Microfacet Theory for the Modeling of Varying Iridescence".
There are still several open topics that are left for future work:
- Currently, the thin film only affects dielectric Fresnel, not metallic. Properly specifying thin films on metals requires a proper conductive Fresnel term with complex IOR inputs, any attempt of trying to hack it into the F82 model we currently use for the Principled BSDF is fundamentally flawed. In the future, we'll add a node for proper conductive Fresnel, including thin films.
- The F0/F90 control is not very elegantly implemented right now. It fundamentally works, but enabling thin film while using a Specular Tint causes a jump in appearance since the models integrate it differently. Then again, thin film interference is a physical effect, so of course a non-physical tweak doesn't play nicely with it.
- The white point handling is currently quite crude. In short: The code computes XYZ values of the reflectance spectrum, but we'd need the XYZ values of the product of the reflectance spectrum and the neutral illuminant of the working color space. Currently, this is addressed by just dividing by the XYZ values of the illuminant, but it would be better to do a proper chromatic adaptation transform or to use the proper reference curves for the working space instead of the XYZ curves from the paper.
Pull Request: https://projects.blender.org/blender/blender/pulls/118477
This PR fixes the (currently unused) scene-based selective feature compilation macros. These feature based macros haven't been used for a few years, and enabling them currently results in compilation errors.
The only functional change in this PR is in geom/primitive.h where undef-ing `__HAIR__` had exposed an inconsistency in how pointcloud attributes were being fetched. Using the more general `primitive_surface_attribute_float4` (instead of `curve_attribute_float4`) fixed a compilation error that occurred when rendering pointcloud unit test scenes with adaptive compilation enabled.
Pull Request: https://projects.blender.org/blender/blender/pulls/121216
Transport rays that enter to another location in the scene, with
specified ray position and normal. This may be used to render portals
for visual effects, and other production rendering tricks.
This acts much like a Transparent BSDF. Render passes are passed
through, and this is affected by light path max transparent bounces.
Pull Request: https://projects.blender.org/blender/blender/pulls/114386
Clamp some of the inputs of the Glossy BSDF, Glass BSDF, Sheen BSDF,
and Subsurface Scattering nodes to improve consistency between render
engines and to avoid unexpected results.
* Clamp roughness to 0..1
* Clamp subsurface radius to 0..inf
* Clamp colors to 0..inf
Pull Request: https://projects.blender.org/blender/blender/pulls/120390
This replaces the fixed Tangent input in BsdfNode::compile
with a custom input.
This is done because very few nodes actually use the tangent input
and it would be better to have this slot available for other inputs
on different nodes in the future.
Pull Request: https://projects.blender.org/blender/blender/pulls/119042
by ensuring `KernelLight.lightgroup` is properly assigned in
`device_update_light()`. The value is later retrieved via
`lamp_lightgroup(kg, lamp)`.
`light_manager->device_update()` is called after
`background->device_update()`, so the background light group should
already been assigned when `device_update_lights()` is called.
Pull Request: https://projects.blender.org/blender/blender/pulls/120714
use available `film_pass_pixel_render_buffer()` to access the pointer
to the render buffer.
For shadow state, a similar function `film_pass_pixel_render_buffer_shadow()`
is created, because `shadow_path` instead of `path` is needed.
it is difficult to keep in mind when MIS weight is needed, better to
handle this logic in the lower-level functions.
This reduces code duplication in many places.
This is a regression since fdc2962beb
The size of state can not be different between CPU and GPU.
This change replaces compile-time condition with a kernel feature
check, which solves the render regression on AMD Metal. It also
minimizes the state size on other GPUs when Light Tree is disabled.
Pull Request: https://projects.blender.org/blender/blender/pulls/120476
This is a regression in 4.1, caused by 36e603c430.
For unbiased MIS weight in light tree, we should use sd->N for
mis_origin_n, since sc->N is not available in NEE.
The change also makes it so we do not sample lights below sd->N even
when bump map correction is disabled. This diverges from the original
idea of giving full control to artists, but ensures the internal math
is happy.
Pull Request: https://projects.blender.org/blender/blender/pulls/120216
fdc2962beb indirectly introduced a change
in inlining (light_tree_pdf started getting inlined) that led to a 5-10%
drop in performance for most scenes.
Dropping the noinline keyword for oneAPI device recovers it.
It however brings another performance regression to MNEE and Raytrace
kernels, that we'll look into separately.
The Perlin noise algorithms suffer from precision issues when a coordinate
is greater than about 250000.
To fix this the Perlin noise texture is repeated every 100000 on each axis.
This causes discontinuities every 100000, however at such scales this
usually shouldn't be noticeable.
Pull Request: https://projects.blender.org/blender/blender/pulls/119884
it is not clear from which point the `cos_theta_u` should be computed in
volume segment, so the original implementation was mixing the closest point
and the point where the minimal angle is formed.
Use the closest point on segment as a conservative measure.
Pull Request: https://projects.blender.org/blender/blender/pulls/119965
in the test scene `all_light_types_in_volume.blend`, `theta - theta_o -
theta_u` is slightly above the threshold. Even if we do a strict check
with `acos` on the failing cases, it will go to the other branch and
deliver a result which is also 1.0f. Better to relax the threshold.
During the some of the shading for volumetrics, Cycles would try to
write to a variable that does not exist if the device has the
light tree disabled.
At the moment this only impacts AMD GPUs with the Metal backend.
Pull Request: https://projects.blender.org/blender/blender/pulls/119906
The same random number was used when sampling from the volume segment
and from the direct scattering position, causing correlation issues with
light tree.
To solve this problem, we ensure the same light is picked for
volume segment/direct scattering, equiangular/distance sampling by
sampling the light tree only once in volume segment. From the direct
scattering position in volume, we sample a position on the picked light
as usual. If sampling from the light tree fails, we continue with
indirect scattering.
For unbiased MIS weight for forward sampling, we retrieve the `P`, `D`
and `t` used in volume segment for traversing the light tree.
The main changes are:
1. `light_tree_sample()` and `light_distribution_sample()` now only pick
lights. Sampling a position on light is done separately via
`light_sample()`.
2. `light_tree_sample()` is now only called only once from volume
segment. For direct lighting we call `light_sample()`.
3. `light_tree_pdf()` now has a template `<in_volume_segment>`.
4. A new field `emitter_id` is added to struct `LightSample`, which just
stores the picked emitter index.
5. Additional field `previous_dt = ray->tmax - ray->tmin` is added to
`state->ray`, because we need this quantity for computing the pdf.
6. Distant/Background lights are also picked by light tree in volume
segment now, because we have no way to pick them afterwards. The direct
sample event for these lights will be handled by
`VOLUME_SAMPLE_DISTANCE`.
7. Original paper suggests to use the maximal importance, this results
in very poor sampling probability for distant and point lights therefore
excessive noise. We have a minimal importance for surface to balance, we
could do the same for volume but I do not want to spend much time on
this now. Just doing `min_importance = 0.0f` seems to do the job
okayish. This way we still won't sample the light with zero
`max_importance`.
The current solution might perform worse with distance sampling, because
the light tree measure is biased towards equiangular sampling. However,
it is difficult to perform MIS between equiangular and distance sampling
if different lights are picked for each method. This is something we can
look into in the future if proved to be a serious regression.
Pull Request: https://projects.blender.org/blender/blender/pulls/119389