This is an implementation of thin film iridescence in the Principled BSDF based on "A Practical Extension to Microfacet Theory for the Modeling of Varying Iridescence".
There are still several open topics that are left for future work:
- Currently, the thin film only affects dielectric Fresnel, not metallic. Properly specifying thin films on metals requires a proper conductive Fresnel term with complex IOR inputs; any attempt to hack it into the F82 model we currently use for the Principled BSDF is fundamentally flawed. In the future, we'll add a node for proper conductive Fresnel, including thin films.
- The F0/F90 control is not very elegantly implemented right now. It fundamentally works, but enabling thin film while using a Specular Tint causes a jump in appearance since the models integrate it differently. Then again, thin film interference is a physical effect, so of course a non-physical tweak doesn't play nicely with it.
- The white point handling is currently quite crude. In short: The code computes XYZ values of the reflectance spectrum, but we'd need the XYZ values of the product of the reflectance spectrum and the neutral illuminant of the working color space. Currently, this is addressed by just dividing by the XYZ values of the illuminant, but it would be better to do a proper chromatic adaptation transform or to use the proper reference curves for the working space instead of the XYZ curves from the paper.
Pull Request: https://projects.blender.org/blender/blender/pulls/118477
This was caused by the 2 transmission event approximation we
do for objects with non-null thickness. This follows Cycles
by using the square root of the color.
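
The square-root choice can be checked numerically. This is a minimal sketch, assuming a ray crosses the surface exactly twice (entry and exit); `per_event_tint` is a hypothetical helper for illustration, not a Blender function:

```python
import math

def per_event_tint(color):
    """Tint applied at each of the two transmission events (hypothetical helper)."""
    return tuple(math.sqrt(c) for c in color)

color = (0.64, 0.25, 0.04)
tint = per_event_tint(color)
# Applying the tint once on entry and once on exit multiplies it by itself,
# which recovers the authored color.
after_two_events = tuple(t * t for t in tint)
```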
Resolves offset error when building index buffers on device for
curves/hair strips using Triangles instead of TriangleStrips.
Authored by Apple: Michael Parkin-White
Pull Request: https://projects.blender.org/blender/blender/pulls/121218
Transport rays that enter to another location in the scene, with
specified ray position and normal. This may be used to render portals
for visual effects, and other production rendering tricks.
This acts much like a Transparent BSDF. Render passes are passed
through, and this is affected by light path max transparent bounces.
Pull Request: https://projects.blender.org/blender/blender/pulls/114386
Clamp some of the inputs of the Glossy BSDF, Glass BSDF, Sheen BSDF,
and Subsurface Scattering nodes to improve consistency between render
engines and to avoid unexpected results.
* Clamp roughness to 0..1
* Clamp subsurface radius to 0..inf
* Clamp colors to 0..inf
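
The ranges above could be expressed as follows. This is only an illustrative sketch; `clamp_inputs` and its argument names are hypothetical and do not correspond to the actual node code:

```python
def clamp_inputs(roughness, radius, color):
    """Clamp shader inputs to the ranges listed above (illustrative only)."""
    roughness = min(max(roughness, 0.0), 1.0)      # 0..1
    radius = tuple(max(r, 0.0) for r in radius)    # 0..inf
    color = tuple(max(c, 0.0) for c in color)      # 0..inf
    return roughness, radius, color

clamped = clamp_inputs(1.5, (-1.0, 2.0, 0.5), (-0.2, 0.5, 3.0))
```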
Pull Request: https://projects.blender.org/blender/blender/pulls/120390
This adds a new `Transform` type similar to Cycles that reduces
the amount of data passed for a typical affine 3D transform.
This then applies this type to the light data and cleans up
all usage of the former `object_mat`. This also changes the axes
macros into utility accessor functions.
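
The data saving comes from the fact that the last row of an affine 4x4 matrix is always (0, 0, 0, 1) and does not need to be stored. A rough sketch of the idea, assuming a row-major layout; the helper names are hypothetical and the actual `Transform` layout may differ:

```python
def to_compact(mat4):
    """Keep only the first three rows of an affine 4x4 matrix (12 floats)."""
    return [list(row) for row in mat4[:3]]

def transform_point(compact, point):
    """Apply the compact transform to a 3D point (implicit w = 1)."""
    x, y, z = point
    return tuple(r[0] * x + r[1] * y + r[2] * z + r[3] for r in compact)

# A pure translation by (5, -2, 3).
mat4 = [[1, 0, 0, 5], [0, 1, 0, -2], [0, 0, 1, 3], [0, 0, 0, 1]]
compact = to_compact(mat4)
moved = transform_point(compact, (1, 1, 1))
```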
Pull Request: https://projects.blender.org/blender/blender/pulls/121089
This contains two things:
- The default (nothing connected to the socket) uses the bounding box min axis.
- The value plugged into the socket is transformed to world space.
We arbitrarily choose to output the axis with the minimum extent since
it is the axis along which the object is usually viewed.
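
Picking the axis with the minimum extent can be sketched like this. The function name and the bounding box representation are assumptions made for illustration only:

```python
def min_extent_axis(bbox_min, bbox_max):
    """Return the index (0=X, 1=Y, 2=Z) of the axis with the smallest extent.

    On ties, the first such axis is returned.
    """
    extents = [hi - lo for lo, hi in zip(bbox_min, bbox_max)]
    return extents.index(min(extents))

axis = min_extent_axis((0.0, 0.0, 0.0), (2.0, 0.5, 3.0))
```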
Rel #120384
Pull Request: https://projects.blender.org/blender/blender/pulls/120607
- Expected results should come before actual results.
- Add test case for 8192 bytes as Apple has a push constants size of 4096.
- Add more variation to the first order test data.
These improvements were found while working on the Vulkan backend and
validated to work on Metal and OpenGL as well.
Pull Request: https://projects.blender.org/blender/blender/pulls/120557
The Perlin noise algorithms suffer from precision issues when a coordinate
is greater than about 250000.
To fix this the Perlin noise texture is repeated every 100000 on each axis.
This causes discontinuities every 100000, however at such scales this
usually shouldn't be noticeable.
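
Repeating the texture amounts to wrapping each coordinate into a bounded range before evaluating the noise. A minimal sketch, assuming a plain floor-based wrap; `wrap_coordinate` is a hypothetical name and the actual implementation may wrap differently:

```python
import math

PERIOD = 100000.0  # repeat distance stated above

def wrap_coordinate(x, period=PERIOD):
    """Map x into [0, period) so the noise function only sees small inputs."""
    return x - math.floor(x / period) * period

wrapped = wrap_coordinate(250001.5)
wrapped_negative = wrap_coordinate(-1.0)
```

Because the noise values on either side of a multiple of the period come from unrelated lattice cells, the result is discontinuous at each repeat boundary.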
Pull Request: https://projects.blender.org/blender/blender/pulls/119884
This limits the number of tilemaps per LOD that can be fed to avoid the
easy to hit "Too many shadow updates" (#119757).
This allows for a max of 64 tilemaps to be updated at once at their lowest
requested LOD (so ~10.6667 point lights if every face of the punctual
shadow map is needed, but likely more in practice).
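
The light count quoted above follows from dividing the tilemap budget by the number of faces per punctual shadow map. The value of 6 faces is inferred from the quoted figure (a cube-map style layout) rather than stated explicitly, and the constant names are made up for this sketch:

```python
MAX_TILEMAPS_PER_UPDATE = 64
FACES_PER_POINT_LIGHT = 6  # assumption: one tilemap per cube face

# Worst case: every face of every point light needs an update.
lights_per_update = MAX_TILEMAPS_PER_UPDATE / FACES_PER_POINT_LIGHT
```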
Unfortunately this is still quite low and will surely be hit quite soon
with directional shadows added to it. One idea to work around this would
be to time slice the update of some lights, but this opens a whole can
of worms that I'm not ready to open for now so I created #119890 for
future reference.
Some notes: most lights seem to request around 3 LODs. It might help
to allow requesting at least 2 LODs if we are rendering since volumes
might want a lower LOD available.
I added a very simplistic heuristic that also lowers the max tilemaps
when transforming, animation playback or navigating the 3D view to
improve the responsiveness of the engine. Note that this doesn't
only lower the resolution to the minimum requested one. So it should
be good enough in most cases.
Pull Request: https://projects.blender.org/blender/blender/pulls/119889
Resolves an issue with stroke rendering in
Metal using the geometry shader fallback
path. Stroke rendering now matches OpenGL
which should enable the GPencil fill tool to
function correctly at all zoom levels.
Authored by Apple: Michael Parkin-White
Pull Request: https://projects.blender.org/blender/blender/pulls/119660
This uses Spherical Harmonics to store the indirect lighting and
distant lighting visibility.
We can then reuse this information for each closure, which divides
the cost by 2 or 3 in many cases since the scanning is done only once.
The storage cost is higher than with the previous method, so we split the
resolution scaling to be independent of raytracing.
The spatial filtering has been split to its own pass for performance
reasons. Upsampling now only uses 4 bilinearly interpolated samples
(instead of 9) using bilateral weights to avoid bleeding.
This also adds a missing dot product (which softens the lighting
around corners) and fixes the blocky artifacts seen at lower
resolution.
Pull Request: https://projects.blender.org/blender/blender/pulls/118924
Simplifies/optimizes the "font" shader. It runs faster now too, but primarily
this is so that it loads/initializes faster.
* Instead of doing blur via individual bilinear samples (where each sample is 4
texel fetches), do raw texel fetches of the kernel footprint and compute final
result by shifting the kernel weights according to bilinear fraction weight.
For 5x5 blur, this reduces the number of texel fetches from 64 down to 36.
* Instead of checking "is the texel inside the glyph box? if so, then fetch it",
first fetch it, and then set result to zero if it was outside. Simplifies the
branching code flow in the compiled GPU shader.
* Avoid costly integer modulo/division for "unwrapping" the font texture. The
texture width is always power of two size, so division/modulo can be replaced
by masking and a shift. Set up uniforms to contain the needed data.
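
The modulo/division replacement in the last point relies on the width being a power of two. A small sketch of the equivalence; `unwrap_index` and `width_shift` are hypothetical names, not the uniforms used by the shader:

```python
def unwrap_index(index, width_shift):
    """Split a linear texel index into (x, y) for a texture of width 2**width_shift."""
    width_mask = (1 << width_shift) - 1
    x = index & width_mask     # same as index % width
    y = index >> width_shift   # same as index // width
    return x, y

WIDTH_SHIFT = 9  # texture width of 512 texels
coords = unwrap_index(1300, WIDTH_SHIFT)
```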
### Fixes
* The 3x3 blur was not doing a 3x3 blur, due to a copy-pasta typo (one of the
sample offsets was repeated twice, and thus another sample offset was
missing).
* Blur towards left/top edges of the glyphs had artifacts, because float->int
casting in GLSL rounds towards zero, but the code actually wanted to round
towards floor.
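
The rounding difference behind the second fix only shows up for negative values, which is why it affected the left/top edges. Python's `int()` truncates toward zero like the GLSL cast described above, so it can be used to illustrate the issue:

```python
import math

x = -0.25
truncated = int(x)       # rounds toward zero, like a float->int cast
floored = math.floor(x)  # rounds toward negative infinity

# For non-negative values both agree.
same_for_positive = int(2.75) == math.floor(2.75)
```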
An image of how the blur has changed is in the PR.
### First time initialization
* Windows 10, NVIDIA RTX 3080Ti, OpenGL: 274.4ms -> 51.3ms
* macOS, Apple M1 Max, Metal: 456ms -> 289ms (this is including PSO creation
time).
### Shader performance/complexity
Performance I only measured on macOS (M1 Max), by making a BLF text that is
scaled up to cover most of screen via Python. Using Xcode Metal profiler,
drawing that text with 5x5 shadow blur: 1.5ms -> 0.3ms.
More performance analysis details in PR.
Pull Request: https://projects.blender.org/blender/blender/pulls/119653
Cryptomatte passes would generate a feathered outline
in Metal due to missing texture fence in chained
read->modify->write->read->... patterns.
Added imageFence function to explicitly state that
imageStore's should be visible to future imageLoad's.
Authored by Apple: Michael Parkin-White
Pull Request: https://projects.blender.org/blender/blender/pulls/119163
EEVEE-Next performs worse on integrated GPUs than on discrete GPUs.
Most shaders have been analyzed, but there will always be bottlenecks
related to architectural differences.
In order to make EEVEE-Next run smoothly on integrated GPUs, this change
implements a viewport pixel size option similar to Cycles. The main difference
is that the samples will still be weighted and up-sampled to the final film
resolution. This keeps the pixels from looking square in the viewport, and the
image will resolve to something close to the result without up-scaling.
This improves the performance, especially on integrated GPUs. The improvement
for discrete GPUs is less noticeable. The stats below are from playing back
`rain_restaurant.blend` on a RAPHAEL_MENDOCINO iGPU.
| Pixel size | Frames per second |
|------------|-------------------|
| 1x | 0.25 FPS |
| 2x | 4.14 FPS |
| 4x | 6.90 FPS |
| 8x | 9.95 FPS |
Related to: #114597
See PR for some example images.
Pull Request: https://projects.blender.org/blender/blender/pulls/118903
This implements the design of #118961.
- Add aliases in GLSL since these types are
not supported.
- Add detection mechanism that prevents usage
inside shader shared code.
Check is only done in debug build to avoid slowing down
application startup.
Pull Request: https://projects.blender.org/blender/blender/pulls/119226
This defines all aliases for supported types,
documents which ones to use in C++ shared code,
and moves relevant defines to their backend file.
Rename `bool1` to `bool32_t` and clean up
its usage as mentioned in #118961.
Rel. #118961
Pull Request: https://projects.blender.org/blender/blender/pulls/119098
This optimizes a few loops that become significant bottlenecks during
viewport rendering of scenes with large numbers of curves.
To render a curves object, Blender needs to generate a potentially
very large (but trivial) index buffer. As previously implemented,
this index buffer is generated in an extremely inefficient manner,
with a single-threaded loop and an explicit function call per entry.
The buffer then needs to be pushed onto the GPU, which is also a fairly
slow task.
The PR generates the index buffer directly on the GPU with a compute
shader.
Pull Request: https://projects.blender.org/blender/blender/pulls/116617
The Voronoi texture node only sets the first 3 components of the
color. The alpha value is never set. Normally this is covered
when using it in a shader node, but when directly connected to
the AOV output, the color was stored as a pure emissive color.
This resulted in incorrect colors in the viewport and image renders.
This is a partial fix for #118494
Pull Request: https://projects.blender.org/blender/blender/pulls/118497
The output of the Color Ramp node in the GPU compositor and EEVEE is
slightly off. That's because the factor is evaluated directly at the
sampler without proper half pixel offsets to account for the sampler's
linear interpolation, which this patch adds.
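
The half pixel offset can be written down explicitly. This is a sketch under the usual assumption that texel centers of a 1D texture with `size` texels sit at `(i + 0.5) / size`; the function name is hypothetical and not taken from the patch:

```python
def sampler_coordinate(factor, size):
    """Map a factor in [0, 1] onto the range between the first and last texel centers."""
    return (factor * (size - 1) + 0.5) / size

SIZE = 256
first = sampler_coordinate(0.0, SIZE)  # center of texel 0
last = sampler_coordinate(1.0, SIZE)   # center of texel SIZE - 1
```

Sampling with the raw factor instead would place 0 and 1 on the texture edges, half a texel away from the first and last stored values.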
Pull Request: https://projects.blender.org/blender/blender/pulls/117677
Even if related, they don't have the same performance
impact.
To avoid any performance hit, we replace the Diffuse
by a Subsurface Closure for legacy EEVEE and
use the subsurface closure only where needed for
EEVEE-Next leveraging the random sampling.
This increases the compatibility with Cycles, which
doesn't modulate the radius of the subsurface anymore.
This change is only present in EEVEE-Next.
This commit changes the principled BSDF code so that
it is easier to follow the flow of data.
For legacy EEVEE, the SSS switch is moved to a
`radius == -1` check.
Along with the 4.1 libraries upgrade, we are bumping the clang-format
version from 8-12 to 17. This affects quite a few files.
If not already the case, you may consider pointing your IDE to the
clang-format binary bundled with the Blender precompiled libraries.
The GPU compositor incorrectly extrapolates values of the RGBA Curves node.
That's because the code introduces a half-pixel offset to the color
values since they will be used to sample the curve maps. Those same
values are then used for extrapolation, which shouldn't take the
half-pixel value into account.
This patch fixes that by computing the sampler coordinate in a separate
step.
Pull Request: https://projects.blender.org/blender/blender/pulls/116586
Adds API to allow usage of specialization constants in shaders.
Specialization constants are dynamic runtime constants which can
be compiled into a shader pipeline state object (PSO) to improve
runtime performance by reducing shader complexity through
shader compiler constant-folding.
This API allows specialization constant values to be specified
along with a default value if no constant value has been declared.
Each GPU backend is then responsible for caching PSO permutations
against the current specialization configuration.
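
The caching responsibility described above can be pictured with a toy model. This is not the GPU module API: `PipelineCache`, its methods, and the constant names are all hypothetical, and "compiling" is reduced to building a string:

```python
class PipelineCache:
    """Toy cache of pipeline permutations keyed by specialization constants."""

    def __init__(self, defaults):
        self.defaults = dict(defaults)  # used when no value has been set
        self.pipelines = {}
        self.compile_count = 0

    def get(self, **constants):
        config = {**self.defaults, **constants}
        key = tuple(sorted(config.items()))
        if key not in self.pipelines:
            # Stand-in for an expensive PSO compilation with constants folded in.
            self.compile_count += 1
            self.pipelines[key] = f"pso{key}"
        return self.pipelines[key]

cache = PipelineCache({"use_shadows": True, "sample_count": 4})
a = cache.get(sample_count=8)
b = cache.get(sample_count=8)  # same configuration: served from the cache
c = cache.get()                # defaults only: a new permutation
```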
This patch adds support for specialization constants in the
Metal backend and provides a generalised high-level solution
which can be adopted by other graphics APIs supporting
this feature.
Authored by Apple: Michael Parkin-White
Authored by Blender: Clément Foucault (files in gpu/test folder)
Pull Request: https://projects.blender.org/blender/blender/pulls/115193
This modifies the GBuffer layout to store fewer bits per closure.
This allows packing all closures into 64 bits or 96 bits.
In turn, this reduces the amount of data stored for most
usual materials.
Moreover, this contains some groundwork for getting rid of the
hard-coded closure types. But evaluation shaders still use
the hard-coded types.
This adds tests checking that packing and unpacking of the gbuffer
doesn't lose any data.
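
A round-trip check of this kind could look like the sketch below. The 10 bits per value layout is purely hypothetical and is not the actual GBuffer encoding; the helpers only illustrate the pack/unpack test pattern:

```python
def pack_unorm(values, bits):
    """Pack floats in [0, 1] into one integer, `bits` bits each (no clamping)."""
    scale = (1 << bits) - 1
    packed = 0
    for i, v in enumerate(values):
        packed |= round(v * scale) << (i * bits)
    return packed

def unpack_unorm(packed, count, bits):
    """Inverse of pack_unorm for `count` values."""
    scale = (1 << bits) - 1
    return [((packed >> (i * bits)) & scale) / scale for i in range(count)]

BITS = 10
# Values that are exactly representable at this precision must survive unchanged.
quantized = [0 / 1023, 512 / 1023, 1023 / 1023]
packed = pack_unorm(quantized, BITS)
restored = unpack_unorm(packed, len(quantized), BITS)
```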
Related to #115966
Pull Request: https://projects.blender.org/blender/blender/pulls/116476