This patch optimizes the Distance Dilate code by limiting the search
window to areas within the image, avoiding fallback image sampling and
looping over redundant pixels. Gives a 1.8x improvement.
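
As a rough illustration of the idea, the sketch below clamps the search
window to the image bounds before looping. It is not the compositor's
actual code: `distance_dilate` and its flat single-channel layout are
assumptions made for the example.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical helper, not the compositor's code: dilate a single-channel
// image by taking the maximum of all pixels within `radius` of each pixel.
std::vector<float> distance_dilate(const std::vector<float> &image,
                                   const int width,
                                   const int height,
                                   const int radius)
{
  assert(int(image.size()) == width * height);
  assert(radius >= 0);
  std::vector<float> result(image.size());
  for (int y = 0; y < height; y++) {
    for (int x = 0; x < width; x++) {
      // Clamp the search window to the image, so no out-of-bounds pixel is
      // visited and no fallback sampling is needed.
      const int start_x = std::max(0, x - radius);
      const int end_x = std::min(width - 1, x + radius);
      const int start_y = std::max(0, y - radius);
      const int end_y = std::min(height - 1, y + radius);

      float value = image[y * width + x];
      for (int j = start_y; j <= end_y; j++) {
        for (int i = start_x; i <= end_x; i++) {
          const int dx = i - x;
          const int dy = j - y;
          // Only pixels inside the circle of the given radius contribute.
          if (dx * dx + dy * dy <= radius * radius) {
            value = std::max(value, image[j * width + i]);
          }
        }
      }
      result[y * width + x] = value;
    }
  }
  return result;
}
```

Near the image borders the clamped window is smaller than the full
(2 * radius + 1)^2 window, which is where the saved work comes from.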
The GPU and new CPU compositors leak memory when an output is connected
to multiple inputs inside the same pixel operation. That's because nodes
do not know that their multiple outgoing links are in fact going to the
same operation, so their initial reference count is more than the actual
reference count.
To fix this, we keep track of the reference count of pixel operation
inputs and release the inputs based on that reference count.
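
A minimal sketch of that bookkeeping, assuming hypothetical `Result` and
`PixelOperation` types that only mimic the behavior described above:

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical stand-in for a result whose buffer is freed once its
// reference count drops to zero.
struct Result {
  int reference_count = 0;
  bool is_allocated = true;

  void release()
  {
    assert(reference_count > 0);
    if (--reference_count == 0) {
      is_allocated = false;
    }
  }
};

// Hypothetical pixel operation. Links that come from the same outside
// output share a single operation input.
class PixelOperation {
  std::map<std::string, Result *> inputs_;
  std::map<std::string, int> input_reference_counts_;

 public:
  // Called once per link that enters the operation.
  void add_link(const std::string &output_name, Result &result)
  {
    inputs_[output_name] = &result;
    input_reference_counts_[output_name]++;
  }

  int inputs_count() const
  {
    return int(inputs_.size());
  }

  // Release every input as many times as it was linked, so the count that
  // the output node computed from its outgoing links reaches zero.
  void release_inputs()
  {
    for (const auto &item : inputs_) {
      const int count = input_reference_counts_.at(item.first);
      for (int i = 0; i < count; i++) {
        item.second->release();
      }
    }
  }
};
```

With two links from one output into the operation, the output starts with
a reference count of 2. Releasing the single shared input only once would
leave the count at 1, which is the leak described above.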
This patch specializes the symmetric separable variable size blur code
for different types. Additionally, now-unused generic type functions
were removed, and unused GPU specializations were removed since they
are no longer free due to CPU support. Gives a 2x improvement.
This patch specializes the symmetric separable blur code for different
types. Additionally, now-unused generic type functions were removed, and
unused float2 specialization was removed since it is no longer free due
to CPU support. Gives a 2x improvement.
Move the Gamma Correction pass of blur nodes into its own algorithm to
avoid code duplication and optimize pixel access, since gamma is
currently applied for each pixel in the filter window. Gives a 15%
improvement.
Pull Request: https://projects.blender.org/blender/blender/pulls/131480
Optimize pixel access in the new CPU compositor by specializing pixel
load and store for the type of the result that is being loaded or
stored. Gives up to 10% improvement.
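
The general technique can be sketched as follows. `Buffer`, `float4`,
`load_pixel_generic` and `load_pixel` are hypothetical names for this
example, not the compositor's API.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct float4 {
  float x, y, z, w;
};

// Hypothetical result buffer that stores `channels` floats per pixel.
struct Buffer {
  int width;
  int channels;
  std::vector<float> data;
};

// Generic path: the channel count is only known at run time, so every
// access loops over a variable number of channels.
inline void load_pixel_generic(const Buffer &buffer, int x, int y, float *r_pixel)
{
  const std::size_t index = (std::size_t(y) * buffer.width + x) * buffer.channels;
  for (int i = 0; i < buffer.channels; i++) {
    r_pixel[i] = buffer.data[index + i];
  }
}

// Specialized path: the pixel type is a template parameter, so the channel
// count is a compile-time constant and the access is a fixed-size load.
template<typename T> T load_pixel(const Buffer &buffer, int x, int y);

template<> inline float load_pixel<float>(const Buffer &buffer, int x, int y)
{
  assert(buffer.channels == 1);
  return buffer.data[std::size_t(y) * buffer.width + x];
}

template<> inline float4 load_pixel<float4>(const Buffer &buffer, int x, int y)
{
  assert(buffer.channels == 4);
  const float *pixel = &buffer.data[(std::size_t(y) * buffer.width + x) * 4];
  return float4{pixel[0], pixel[1], pixel[2], pixel[3]};
}
```

Callers that already know the result type can call `load_pixel<float4>`
directly and skip the per-access channel loop.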
Pull Request: https://projects.blender.org/blender/blender/pulls/131441
The new CPU compositor ignores group inputs that are unlinked. This
patch fixes that by considering all origin sockets in multi-function
procedures, be it an input or an output.
The Transform node in the new CPU compositor crashes in background mode
because of a call to GPU_max_texture_size when the GPU module is not
initialized. To fix this, restrict this call to GPU devices and use 2^16
as an upper limit for CPU.
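
A sketch of the guard, assuming a hypothetical `max_result_size` helper
and `Device` enum. The callback stands in for the GPU query and is only
invoked on the GPU path.

```cpp
#include <cassert>
#include <functional>

enum class Device { CPU, GPU };

// Hypothetical helper. `query_gpu_limit` stands in for a GPU query such as
// GPU_max_texture_size and must only be called when the GPU module is
// initialized, which this sketch assumes is the case for GPU devices.
inline int max_result_size(const Device device,
                           const std::function<int()> &query_gpu_limit)
{
  if (device == Device::GPU) {
    const int limit = query_gpu_limit();
    assert(limit > 0);
    return limit;
  }
  // CPU: never touch the GPU module, use 2^16 as the upper limit.
  return 1 << 16;
}
```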
Implements #130836:
interpolate_*_wrapmode_fl take InterpWrapMode wrap_u and wrap_v arguments.
U and V coordinate axes can have different wrap modes: clamp/extend,
border/zero, wrap/repeat.
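
To illustrate per-axis wrapping, here is a nearest-neighbor sketch. The
`WrapMode` enum, `wrap_coordinate` and `sample_nearest` are hypothetical
names for this example and are not the interpolate_*_wrapmode_fl API.

```cpp
#include <cassert>
#include <vector>

// Hypothetical names; the actual enum and functions may differ.
enum class WrapMode { Extend, Border, Repeat };

// Map an integer coordinate to a valid index in [0, size), or return -1 to
// signal that the sample is outside the image and should read as zero.
inline int wrap_coordinate(const int coordinate, const int size, const WrapMode mode)
{
  assert(size > 0);
  switch (mode) {
    case WrapMode::Extend:
      // Clamp to the nearest edge pixel.
      return coordinate < 0 ? 0 : (coordinate >= size ? size - 1 : coordinate);
    case WrapMode::Repeat: {
      // Tile the image; fix up the sign of the C++ remainder for negatives.
      const int remainder = coordinate % size;
      return remainder < 0 ? remainder + size : remainder;
    }
    case WrapMode::Border:
      return (coordinate < 0 || coordinate >= size) ? -1 : coordinate;
  }
  return -1;
}

// Nearest-neighbor lookup with independent wrap modes for the U and V axes.
inline float sample_nearest(const std::vector<float> &image,
                            const int width,
                            const int height,
                            const int u,
                            const int v,
                            const WrapMode wrap_u,
                            const WrapMode wrap_v)
{
  const int x = wrap_coordinate(u, width, wrap_u);
  const int y = wrap_coordinate(v, height, wrap_v);
  if (x < 0 || y < 0) {
    return 0.0f;
  }
  return image[y * width + x];
}
```

For example, repeating along U while extending along V tiles an image
horizontally and smears its top and bottom rows vertically.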
Note that this removes an inconsistency where cubic interpolation
returned zero for samples completely outside the image while all other
functions did not, which also did not match the function documentation.
Use the new functions in the new compositor CPU backend.
Possible performance impact for other places (e.g. VSE): measured on
4K resolution, transformed (scaled and rotated) 4K EXR image:
- Nearest filter: no change,
- Bilinear filter: no change,
- Cubic BSpline filter: slight performance decrease, IMB_transform
19.5 -> 20.7 ms (Ryzen 5950X, VS2022). Feels acceptable.
Pull Request: https://projects.blender.org/blender/blender/pulls/130893
We can't use the `threshold` uniform name in Metal. Uniforms are defines
in Metal, and `threshold` is also used as a local variable in one of the
library files, so the name clash causes an error. Change the name to
`color_threshold` as a fix.
This implements the proposal from #124512. For that it contains the following
changes:
* Remove the global override of `new`/`delete` that was active when
`WITH_CXX_GUARDEDALLOC` was enabled.
* Always use `MEM_CXX_CLASS_ALLOC_FUNCS` where it is currently used. This used
to be guarded by `WITH_CXX_GUARDEDALLOC` in some but not all cases. This means
that a few classes which didn't use our guarded allocator by default before,
are now using it.
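
The difference between a global override and per-class allocation
functions can be sketched like this. The `tracked` allocator and the
`TRACKED_CLASS_ALLOC_FUNCS` macro are hypothetical stand-ins written for
this example; they are not the definition of `MEM_CXX_CLASS_ALLOC_FUNCS`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>

// Hypothetical tracking allocator standing in for a guarded allocator.
namespace tracked {

inline int &live_blocks()
{
  static int count = 0;
  return count;
}

inline void *allocate(const std::size_t size)
{
  void *ptr = std::malloc(size ? size : 1);
  if (!ptr) {
    throw std::bad_alloc();
  }
  live_blocks()++;
  return ptr;
}

inline void deallocate(void *ptr)
{
  if (ptr) {
    live_blocks()--;
    std::free(ptr);
  }
}

}  // namespace tracked

// Hypothetical macro: declares class-level operator new/delete, so only
// classes that opt in are routed through the tracking allocator.
#define TRACKED_CLASS_ALLOC_FUNCS \
 public: \
  void *operator new(std::size_t size) { return tracked::allocate(size); } \
  void operator delete(void *ptr) { tracked::deallocate(ptr); } \
  void *operator new[](std::size_t size) { return tracked::allocate(size); } \
  void operator delete[](void *ptr) { tracked::deallocate(ptr); }

struct Node {
  int value = 0;
  TRACKED_CLASS_ALLOC_FUNCS
};
```

Here `new Node()` goes through the tracking allocator, while `new int`
and allocations made by third-party code keep using the default global
`new`/`delete`.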
Pull Request: https://projects.blender.org/blender/blender/pulls/130181
NOTE: This also required some changes to the Cycles code itself, which
now directly includes `BKE_image.hh` instead of declaring a few
prototypes of these functions in its `blender/utils.h` header (due to
C++ function name mangling, this was not working anymore).
Pull Request: https://projects.blender.org/blender/blender/pulls/130174