Blender crashes when the user scales up an image into huge scales. This
is due to integer overflow when doing memory allocations. So we fix this
by using larger integers when doing math on memory allocations and
indexing.
Note that while this solves the random crashes, users scaling up images
to huge scales will face things like the OOM killer activating or at
best significant slow downs due to swapping.
It is not clear how to handle such cases, but something like a global
maximum size option that is set to 16k by default might be worth adding.
Pull Request: https://projects.blender.org/blender/blender/pulls/120921
The File Output node writes single elements as full images in 4.1, while
such values were skipped in 4.0. This included invalid outputs, for
instance, when the Render Layers node does not have a result for the
selected view layer. Which would then just write an image with an
arbitrary color.
To fix this, we detect single element values and skip writing file
outputs for them.
Pull Request: https://projects.blender.org/blender/blender/pulls/120749
This patch replaces the is_set_operation flag with the
is_constant_operation flag to allow input constants to propagate
through the node tree using the constant folder.
This PR adds a context function to consider all
buffer bindings obsolete. This is in order to
track missing binds and invalid lingering states
accross `draw::Pass`es.
The functions `GPU_storagebuf_debug_unbind_all`
and `GPU_uniformbuf_debug_unbind_all` do nothing
more than resetting the internal debug slot bits
to zero. This is what OpenGL backend does as it
doesn't track the bindings themselves.
Other backends might have other way to detect
missing bindings. If not they should be
implemented separately anyway.
I renamed the function to `debug_unbind_all` to
denote that it actually does something related to
debugging.
This also add SSBO binding check for OpenGL as it
was also missing.
#### Future
This error checking logic is pretty much backend
agnostic. While it would be nice to move it at
`gpu::Context` level, we don't have the resources
for that now.
Pull Request: https://projects.blender.org/blender/blender/pulls/120716
The compositor execute functions have a `rendering` argument to specify
if the compositor is executing as part of the render pipeline. But the
render context argument is null if we are not rendering, so the
`rendering` arguement is redundant and can be removed.
Additionally, we no longer use use_file_output as a hack to detect
rendering.
Pull Request: https://projects.blender.org/blender/blender/pulls/120659
This patch improves the interactivity of the GPU compositor for
interactive node tree edits by waiting on GPU work to finish to support
more granular canceling.
This does have a performance penalty, but it does not affect final
rendering and it seems worth it because even though it is slower it will
feel faster for users.
Pull Request: https://projects.blender.org/blender/blender/pulls/120656
This patch adds support for inter-operation canceling. Though it should
be noted that canceling will actually not take place except in certain
circumstances because operations are already submitted to the GPU by
this point and can't be canceled.
However, for operations that do CPU<->GPU transfers, like OIDN denoise,
which is one of the most expensive operations, this will actually cancel
the evaluation and greatly improve interactivity.
Pull Request: https://projects.blender.org/blender/blender/pulls/119917
The Vector Blur implementation slightly differs between CPU and GPU due
to different precision in the sqrt2 constant, so use the float variant
in CPU to make it closer to GPU.
This patch ports the GPU Vector Blur node to the CPU, which is in turn
ported from EEVEE. This is a breaking change since it produces different
motion blur results that are more similar to EEVEE's motion blur.
Further, the Curved, Minimum, and Maximum options were removed on the
user level since they are not used in the new implementation.
There are no significant changes to the code, except in the max velocity
computation as well as the velocity dilation passes. The GPU code uses
atomic indirection buffers, while the CPU runs single threaded for the
dilation pass, since it is a fast pass anyways. However, we impose
artificial constraints on the precision of the dilation process for
compatibility with the atomic implementation.
There are still tiny differences between CPU and GPU that I haven't been
able to solve, but I shall solve them in a later patch.
Pull Request: https://projects.blender.org/blender/blender/pulls/120135
Set the w component to 1 in Float to Vector implicit conversion for the
Realtime Compositor. That's because the Vector Output node expects a
vector for the velocity input, but the CPU compositor fakes it as a
color input, while assumes a fourth component of 1.
Retain the alpha channel in Color to Vector implicit conversion for the
Realtime Compositor. That's because speed passes are exposed as color
sockets, so connecting them to a node like Vector Blur will lose the
y component of the next vectors.
The CPU compositor handles that by faking vector sockets as color
sockets internally, while the realtime compositor does not do that.
This patch unifies the Defocus node between the CPU and GPU compositors.
Both nodes now use a variable sized bokeh kernel which is always odd
sized for a center pixel guarantee. Further the CPU implementation now
properly handles half pixel offsets when doing interpolation, and always
sets the threshold to zero similar to the GPU implementation.
This patch unifies the CPU and GPU implementation for the Bokeh Image
node. The bokeh is now evaluated at the center of pixels and allows
different sizes for the output kernel.
The Sun Beams node is off by half a pixel. That's because we add a half
pixel offset to the initial coordinates, but then sample the texture
with the half pixel. To fix thus, use the texture_bilinear_extend
utility to sample the image, which takes care of the half pixel offset.
The Lens Distortion node is off by half a pixel because their normalized
coordinates were at the pixel corners as opposed to their centers, where
this patch changes the behavior to the latter.
The Plane Track and Corner Pin nodes are off by half a pixel because
their mask is computed at the pixel corners as opposed to their center,
where this patch changes the behavior to the latter.
The Directional Blur node is off by half a pixel because it transforms
the pixels at their corner as opposed to their center, where this patch
changes the behavior to the latter.
The Movie Distort node is off by half a pixel because it evaluates the
distortion at the corner of pixels as opposed to their center, where
this patch changes the behavior to the latter.
The compositor does not correctly free, wrap, or allocate images. That's
because the resetting mechanism didn't reset all members. This patch
rests all members and only retains the important ones.
This patch refactors the backdrop offset to be stored as a float instead
of an int and to be stored in the image runtime structure instead of the
image itself.
Pull Request: https://projects.blender.org/blender/blender/pulls/119877
In the tiled compositor ensure_delta() can be called from multiple threads,
but without any threading synchronization. This worked fine when the node
only supported absolute transform: multiple threads would do the same work
and assign delta to the same values.
With the addition of relative transform in #115947 a code which adjusts
previously calculated delta was added, leading to possible double-applying
relative transform.
The solution is to avoid multiple threads modifying the same data by using
a double-locked check.
This issue does not happen in 4.2 (main branch) because it switched to full
frame compositor, which works differently.
Pull Request: https://projects.blender.org/blender/blender/pulls/119883
This patch ports the GLSL SMAA library to the CPU compositor in order to
unify the anti-aliasing behavior between the CPU and GPU compositor.
Additionally, the SMAA texture generator was removed since it is now
unused.
Previously, we used an external C++ library for SMAA anti-aliasing,
which is itself a port of the GLSL SMAA library. However, the code
structure and results of the library were different, which made it quite
difficult to match results between CPU and GPU, hence the decision to
port the library ourselves.
The port was performed through a complete copy of the library to C++,
retaining the same function and variable names, even if they are
different from Blender's naming conversions. The necessary code changes
were done to make it work in C++, including manually doing swizzling
which changes the code structure a bit.
Even after porting the library, there were still major differences
between CPU and GPU, due to different arithmetic precision. To fix this
some of the bilinear samplers used in branches and selections were
carefully changed to use point samplers to avoid discontinuities around
branches, also resulting in a nice performance improvement. Some slight
differences still exist due to different bilinear interpolation, but
they shall be looked into later once we have a baseline implementation.
The new implementation is slower than the existing implementation, most
likely due to the liberal use of bilinear interpolation, since it is
quite cheap on GPUs and the code even does more work to use bilinear
interpolation to avoid multiple texture fetches, except this causes a
slow down on CPUs. Some of those were alleviated as mentioned in the
previous section, but we can probably look into optimizing it further.
Pull Request: https://projects.blender.org/blender/blender/pulls/119414
This patch unifies the sRGB to Linear color space conversion between the
CPU and GPU compositors. This is because CPU uses an optimized path that
produces values that are very slightly off. To fix this, for the GPU, we
do the conversion CPU side instead of doing it in a shader. Since images
are cached, the performance implications are not significant.
Another added benefit is that we no longer get differences due to the
order of alpha pre-multiplication and sRGB conversion, demonstrated in
#114305. And we no longer require any preprocessing of the images.
This patch adds some new utilities to the Image Buffer module to assign
float, byte, and compressed buffers along with their color spaces. It
also adds an ownership flag to compressed data. Those were added as a
way to facilitate the implementation.
Pull Request: https://projects.blender.org/blender/blender/pulls/118624
This patch adds a utility function for nearest interpolation with
clamped boundaries and normalized coordinates to the MemoryBuffer class.
Similar to the GLSL texture() function.
This patch adds clamped boundaries variants of the nearest interpolation
functions in the BLI module. The naming convention used by the bilinear
functions were followed.
Needed by #119414.
Pull Request: https://projects.blender.org/blender/blender/pulls/119732