EEVEE crashes when it is not able to allocate buffers. Previously we had a
message showing to the user that it tries to allocate a texture larger than
supported by the GPU. But was not implemented for EEVEE-next. This fix will
add back this error message.

Pull Request: https://projects.blender.org/blender/blender/pulls/134725
This patch adds support for multi-pass compositing for EEVEE. This is
done by copying the passes used by the compositor node tree to the DRW
view data, which can then be accessed by the viewport compositor.
The viewport compositor will fallback to the viewport texture or an
invalid output of the passes were not initialized, this is currently the
case for any render engine that is not EEVEE.
A future optimization that we can do is eliminate the film pass copy
shaders and only copy the data that EEVEE rendered, which can be a
subset of the viewport for border rendering. This is not done at the
moment because not all engines support passes at the moment, so the
compositor expects full viewport passes.
Depends on: #123685, #123817, #123815.
Pull Request: https://projects.blender.org/blender/blender/pulls/123378
This fixes a few issues and clear up some confusion
in the code.
Note that this changes the behavior of render region;
they now reduce the internal render size. This is
matching the new design documentation.
- Data passes have correct accumulation.
- Adhere to naming conventions for extents, film and render pixels.
- Jitter over final pixels first before doing random sampling
in order to speed up convergence.
- Ensure enough sample to cover at least all the film pixels once.
- Always include the four neighbor pixels in case one is nearer.
- Fix projection matrix computation to align overscan pixels.
Pull Request: https://projects.blender.org/blender/blender/pulls/124735
EEVEE Film accumulation workaround for Metal/Intel iGPUs.
On Metal the Intel iGPUs do not support image read write
on array textures. However this limitation doesn't show
any artifacts when using the compute shader.
This PR is a work around that uses the film_comp shader
to process the film samples, but uses a separate film_copy_frag
shader to read the result and copy them to the frame buffer.
I deliberately didn't include the fix to the film_frag shader
as that would change the read/write resources and could lead
to performance issues for other platforms. Writable resources
are typically slower compared to read only resources.
Some code needed to be duplicated (and not added to `*_lib.glsl`)
as compilers would still raise compilation errors due to imageStore/Load
on incompatible resource access.
The Metal/Intel iGPU is also marked to have limited support as
raytracing and probes still produces big artifacts.
This workaround can be tested on any platform just by setting
`use_compute_ = true` in `Film::sync`
Related to #122361
Pull Request: https://projects.blender.org/blender/blender/pulls/123330
EEVEE has a transparent render pass that renders materials where
the render mode is set to blended. This was introduced when EEVEE-Next
was still in development, but was never fully backported.
This PR adds the missing pieces.
Pull Request: https://projects.blender.org/blender/blender/pulls/123298
EEVEE-Next performes less on integrated GPUs then discrete GPUs.
Most shaders have been analyzed, but there will always be bottlenecks
related to architectural differences.
In order to make EEVEE-Next run smooth on integrated GPUs this change
will implement viewport pixel size option similar to Cycles. The main difference
is that the samples will still be weighted and up-sampled to the final film
resolution. This makes the pixels not look squared in the viewport but will
resolve to something close to the results without up-scaling.
This improves the performance especially on integrated GPUs. The improvement
for discrete GPUs are less noticeable. See here the stats when playing
`rain_restaurant.blend` back on a RAPHAEL_MENDOCINO iGPU.
| Pixel size | Frames per second |
|------------|-------------------|
| 1x | 0.25 FPS |
| 2x | 4.14 FPS |
| 4x | 6.90 FPS |
| 8x | 9.95 FPS |
Related to: #114597
See PR for some example images.
Pull Request: https://projects.blender.org/blender/blender/pulls/118903
On lower end hardware the film accumulation has bad performance. Sometimes
upto 10ms. This PR improves the performance somewhat by adding a
specialization constant around the renderpasses that are actually needed for
rendering, the number of samples and if reprojection is enabled.
`enabled_categories`: Based on the enabled render passes some outer loops are
enabled/disabled that handle the specific render passes. This improves the performance
as no memory will be reserved for branches that are never accessed.
`samples_len` & `use_reprojection`: GPU compilers tend to optimize texture fetches
when they to the outer loop. This is only possible when the inner loop can be unrolled.
In the case of the film accumulation the inner loop couldn't be unrolled. By adding a
specialization constant would allow unrolling of the inner loop.
On old or low-end devices the improvement is around 40%. On newer devices
the improvement is 50+%. Performance of this shader is similar to
the godot.
| GPU | Before | New |
|----------------------|--------|-------|
| NVIDIA GTX 760 | 3.5ms | 2.4ms |
| GFX1036 (RDNA2 iGPU) | 9.9ms | 6.2ms |
| AMD Radeon Pro W7500 | 2.1ms | 0.9ms |
Pull Request: https://projects.blender.org/blender/blender/pulls/118385
Adds API to allow usage of specialization constants in shaders.
Specialization constants are dynamic runtime constants which can
be compiled into a shader pipeline state object (PSO) to improve
runtime performance by reducing shader complexity through
shader compiler constant-folding.
This API allows specialization constant values to be specified
along with a default value if no constant value has been declared.
Each GPU backend is then responsible for caching PSO permutations
against the current specialization configuration.
This patch adds support for specialization constants in the
Metal backend and provides a generalised high-level solution
which can be adopted by other graphics APIs supporting
this feature.
Authored by Apple: Michael Parkin-White
Authored by Blender: Clément Foucault (files in gpu/test folder)
Pull Request: https://projects.blender.org/blender/blender/pulls/115193
The `VIEWLAYER_PT_eevee_next_layer_passes_data` class name was
re-used by mistake for Workbench Next in ba982119cd,
and the actual EEVEE Next class was then removed in 678dc456e3.
This adds back the UI as it was, and the missing passes (Vector and
Position) it referenced.
Pull Request: https://projects.blender.org/blender/blender/pulls/112162
Merge all the small UBOs used by the engine to save binding slots.
Each module is still responsible for filling its own data (by storing a
reference to it at construction) and this data is still private for other
modules.
The engine instance pushes the data to the GPU at the end of
`end_sync`, so only the modules that modify their data outside of the
sync functions need to manually call `Instance::push_uniform_data()`.
Pull Request: https://projects.blender.org/blender/blender/pulls/112046
Fix render regions and make them work with overscan:
- Clarify and improve the naming used by the Film module.
- Convert Display texel coordinates to Film texel coordinates before
calling `film_process_data`, and convert from Film texel coordinates to
Render texel coordinates in `film_sample_get`.
- Returns the actual display extent (and not the film extent) in
`Film::display_extent_get` so the overscan camera matrix is computed
correctly when using render regions.
Regression caused by 567a2e5a6f.
Pull Request: https://projects.blender.org/blender/blender/pulls/111691
Including <iostream> or similar headers is quite expensive, since it
also pulls in things like <locale> and so on. In many BLI headers,
iostreams are only used to implement some sort of "debug print",
or an operator<< for ostream.
Change some of the commonly used places to instead include <iosfwd>,
which is the standard way of forward-declaring iostreams related
classes, and move the actual debug-print / operator<< implementations
into .cc files.
This is not done for templated classes though (it would be possible
to provide explicit operator<< instantiations somewhere in the
source file, but that would lead to hard-to-figure-out linker error
whenever someone would add a different template type). There, where
possible, I changed from full <iostream> include to only the needed
<ostream> part.
For Span<T>, I just removed print_as_lines since it's not used by
anything. It could be moved into a .cc file using a similar approach
as above if needed.
Doing full blender build changes include counts this way:
- <iostream> 1986 -> 978
- <sstream> 2880 -> 925
It does not affect the total build time much though, mostly because
towards the end of it there's just several CPU cores finishing
compiling OpenVDB related source files.
Pull Request: https://projects.blender.org/blender/blender/pulls/111046
Listing the "Blender Foundation" as copyright holder implied the Blender
Foundation holds copyright to files which may include work from many
developers.
While keeping copyright on headers makes sense for isolated libraries,
Blender's own code may be refactored or moved between files in a way
that makes the per file copyright holders less meaningful.
Copyright references to the "Blender Foundation" have been replaced with
"Blender Authors", with the exception of `./extern/` since these this
contains libraries which are more isolated, any changed to license
headers there can be handled on a case-by-case basis.
Some directories in `./intern/` have also been excluded:
- `./intern/cycles/` it's own `AUTHORS` file is planned.
- `./intern/opensubdiv/`.
An "AUTHORS" file has been added, using the chromium projects authors
file as a template.
Design task: #110784
Ref !110783.
Port Ambient Occlusion to EEVEE Next.
Add support for the AO Node and the AO RenderPass.
AO shading integration is still missing.
This also fixes the Shadow RenderPass.
Pull Request: https://projects.blender.org/blender/blender/pulls/108398
A lot of files were missing copyright field in the header and
the Blender Foundation contributed to them in a sense of bug
fixing and general maintenance.
This change makes it explicit that those files are at least
partially copyrighted by the Blender Foundation.
Note that this does not make it so the Blender Foundation is
the only holder of the copyright in those files, and developers
who do not have a signed contract with the foundation still
hold the copyright as well.
Another aspect of this change is using SPDX format for the
header. We already used it for the license specification,
and now we state it for the copyright as well, following the
FAQ:
https://reuse.software/faq/
This change adds cryptomatte render passes to EEVEE-Next. Due to the upcoming viewport
compositor we also improved cryptomatte so it will be real-time. This also allows viewing
the cryptomatte passes in the viewport directly.
{F13482749}
A surface shader would store any active cryptomatte layer to a texture. Object hash is stored
as R, Asset hash as G and Material hash as B. Hashes are only calculated when the cryptomatte
layer is active to reduce any unneeded work.
During film accumulation the hashes are separated and stored in a texture array that matches
the cryptomatte standard. For the real-time use case sorting is skipped. For final rendering
the samples are sorted and normalized.
NOTE: Eventually we should also do sample normalization in the viewport in order to extract the correct
mask when using the viewport compositor.
Reviewed By: fclem
Maniphest Tasks: T99390
Differential Revision: https://developer.blender.org/D15753
This is a port of the previous implementation but using compute
shaders instead of using the raster pipeline for every steps.
Only the scatter passes is kept as a raster pass for obvious performance
reasons.
Many steps have been rewritten to take advantage of LDS which allows faster
and simpler downsampling and filtering for some passes.
A new stabilize phase has been separated from another setup pass in order
to improve it in the future with better stabilization.
The scatter pass shaders and pipeline also changed. We now use indirect
drawcall to draw quads using triangle strips primitives. This reduces
fragment shader invocation count & overdraw compared to a bounding
triangle. This also reduces the amount of vertex shader invocation
drastically to the bare minimum instead of having always 3 verts per
4 pixels (for each ground).
The new implementation leverage compute shaders to reduce the
number of passes and complexity.
The max blur amount is now detected automatically, replacing the property
in the render panel by a simple checkbox.
The dilation algorithm has also been rewritten from scratch into a 1 pass
algorithm that does the dilation more efficiently and more precisely.
Some differences with the old implementation can be observed in areas with
complex motion.
A few offsets were missing.
Reminder that this does not change the actual render resolution but it
reduces the VRAM consumption of accumulation buffers.
The swaps during accumulation were ignored because of the way the
`SwapChain<>` implementation works.
Using external references and updating them fixes the issue.
The improvements over the old implementation are:
- Improved history reprojection filter (catmull-rom)
- Use proper velocity for history reprojection.
- History clipping is now done in YCoCg color space using better algorithm.
- Velocity is dilated to keep correct edge anti-aliasing on moving objects.
As a result, the 3x3 blocks that made the image smoother in the previous
implementation are no longer visible is replaced by correct antialiasing.
This removes the velocity resolve pass in order to reduce the bandwidth
usage. The velocities are just resolved as they are loadded in the film
pass.
This modules handles renderpasses allocation and filling. Also handles
blitting to viewport framebuffer and render result reading.
Changes against the old implementation:
- the filling of the renderpasses happens all at once requiring
only 1 geometry pass.
- The filtering is optimized with weights precomputed on CPU and
reuse of neighboor pixels.
- Only one accumulation buffer for renderpasses (no ping-pong).
- Accumulation happens in one pass for every passes using a single
dispatch or fullscreen triangle pass.
TAA and history reprojection is not yet implemented.
AOVs support is present but with a 16 AOV limit for now.
Cryptomatte is not yet implemented.