Move all header file into namespace.
Unnecessary namespaces was removed from implementations file.
Part of forward declarations in header was moved in the top part
of file just to do not have a lot of separate namespaces.
Pull Request: https://projects.blender.org/blender/blender/pulls/121637
It was meant to be included into the previous commit in the area,
but was forgotten due to some technicalities.
Also remove the DisplaceSimpleOperation, which is now not used.
Pull Request: https://projects.blender.org/blender/blender/pulls/121580
The setting was only affecting some of the blur operations, which
does not typically results in a measurable performance boost in real
compositor setups.
For the simplicity of settings on user level remove setting which
potentially makes compositor output worse, without much benefit.
There are better ways to gain performance, like compositing on a
lower resolution, exposing "preview" as an input to the node tree
(similar to the geometry nodes) etc.
Pull Request: https://projects.blender.org/blender/blender/pulls/121576
This is because sse2neon.h might be used to emulate SSE intrinsics
on ARM64 architecture, and it uses some preprocessor which is not
available for C language when using MSVC.
The old-style math file math_matrix.c uses this header, so needed
to become C++. Simple rename did not work since there is a new math
utility math_matrix.cc exists. Following some existing convention
the math_matrix.c is renamed to math_matrix_c.cc. Eventually all the
code should switch to use C++ style math, and the C style removed,
so it seems reasonable to not mix old and new style of API in the
same file.
There should be no functional changes.
Pull Request: https://projects.blender.org/blender/blender/pulls/121335
This patch matches the size of the Fast Gaussian mode of blur with the
standard Gaussian mode. The sigma value was computed as half the radius,
while it should be third of the radius, since Blender's Gaussian
function is truncated at 3 of the standard deviation of the unit
Gaussian. The patch include versioning to adjust the size of existing
files.
Pull Request: https://projects.blender.org/blender/blender/pulls/121211
The compositor used to have a feature that would calculate tiles for the viewer based on a custom order. Since the removal of the tile based compositor, this code is unused.
Pull Request: https://projects.blender.org/blender/blender/pulls/121176
The File Output node writes single elements as full images in 4.1, while
such values were skipped in 4.0. This included invalid outputs, for
instance, when the Render Layers node does not have a result for the
selected view layer. Which would then just write an image with an
arbitrary color.
To fix this, we detect single element values and skip writing file
outputs for them.
Pull Request: https://projects.blender.org/blender/blender/pulls/120749
This patch replaces the is_set_operation flag with the
is_constant_operation flag to allow input constants to propagate
through the node tree using the constant folder.
The Vector Blur implementation slightly differs between CPU and GPU due
to different precision in the sqrt2 constant, so use the float variant
in CPU to make it closer to GPU.
This patch ports the GPU Vector Blur node to the CPU, which is in turn
ported from EEVEE. This is a breaking change since it produces different
motion blur results that are more similar to EEVEE's motion blur.
Further, the Curved, Minimum, and Maximum options were removed on the
user level since they are not used in the new implementation.
There are no significant changes to the code, except in the max velocity
computation as well as the velocity dilation passes. The GPU code uses
atomic indirection buffers, while the CPU runs single threaded for the
dilation pass, since it is a fast pass anyways. However, we impose
artificial constraints on the precision of the dilation process for
compatibility with the atomic implementation.
There are still tiny differences between CPU and GPU that I haven't been
able to solve, but I shall solve them in a later patch.
Pull Request: https://projects.blender.org/blender/blender/pulls/120135
This patch unifies the Defocus node between the CPU and GPU compositors.
Both nodes now use a variable sized bokeh kernel which is always odd
sized for a center pixel guarantee. Further the CPU implementation now
properly handles half pixel offsets when doing interpolation, and always
sets the threshold to zero similar to the GPU implementation.
This patch unifies the CPU and GPU implementation for the Bokeh Image
node. The bokeh is now evaluated at the center of pixels and allows
different sizes for the output kernel.
The Sun Beams node is off by half a pixel. That's because we add a half
pixel offset to the initial coordinates, but then sample the texture
with the half pixel. To fix thus, use the texture_bilinear_extend
utility to sample the image, which takes care of the half pixel offset.
The Plane Track and Corner Pin nodes are off by half a pixel because
their mask is computed at the pixel corners as opposed to their center,
where this patch changes the behavior to the latter.
The Directional Blur node is off by half a pixel because it transforms
the pixels at their corner as opposed to their center, where this patch
changes the behavior to the latter.
The Movie Distort node is off by half a pixel because it evaluates the
distortion at the corner of pixels as opposed to their center, where
this patch changes the behavior to the latter.
This patch refactors the backdrop offset to be stored as a float instead
of an int and to be stored in the image runtime structure instead of the
image itself.
Pull Request: https://projects.blender.org/blender/blender/pulls/119877
In the tiled compositor ensure_delta() can be called from multiple threads,
but without any threading synchronization. This worked fine when the node
only supported absolute transform: multiple threads would do the same work
and assign delta to the same values.
With the addition of relative transform in #115947 a code which adjusts
previously calculated delta was added, leading to possible double-applying
relative transform.
The solution is to avoid multiple threads modifying the same data by using
a double-locked check.
This issue does not happen in 4.2 (main branch) because it switched to full
frame compositor, which works differently.
Pull Request: https://projects.blender.org/blender/blender/pulls/119883
This patch ports the GLSL SMAA library to the CPU compositor in order to
unify the anti-aliasing behavior between the CPU and GPU compositor.
Additionally, the SMAA texture generator was removed since it is now
unused.
Previously, we used an external C++ library for SMAA anti-aliasing,
which is itself a port of the GLSL SMAA library. However, the code
structure and results of the library were different, which made it quite
difficult to match results between CPU and GPU, hence the decision to
port the library ourselves.
The port was performed through a complete copy of the library to C++,
retaining the same function and variable names, even if they are
different from Blender's naming conversions. The necessary code changes
were done to make it work in C++, including manually doing swizzling
which changes the code structure a bit.
Even after porting the library, there were still major differences
between CPU and GPU, due to different arithmetic precision. To fix this
some of the bilinear samplers used in branches and selections were
carefully changed to use point samplers to avoid discontinuities around
branches, also resulting in a nice performance improvement. Some slight
differences still exist due to different bilinear interpolation, but
they shall be looked into later once we have a baseline implementation.
The new implementation is slower than the existing implementation, most
likely due to the liberal use of bilinear interpolation, since it is
quite cheap on GPUs and the code even does more work to use bilinear
interpolation to avoid multiple texture fetches, except this causes a
slow down on CPUs. Some of those were alleviated as mentioned in the
previous section, but we can probably look into optimizing it further.
Pull Request: https://projects.blender.org/blender/blender/pulls/119414
This patch adds clamped boundaries variants of the nearest interpolation
functions in the BLI module. The naming convention used by the bilinear
functions were followed.
Needed by #119414.
Pull Request: https://projects.blender.org/blender/blender/pulls/119732
Previously, all information outside the render area did not get considered because transform operations such as translation and rotation might cause the output area to be smaller.
In this patch, the whole image gets flipped regardless of the output area, which makes the behavior consistent with the GPU compositor.
Pull Request: https://projects.blender.org/blender/blender/pulls/119278
Rendering with F12 causes some inconsistencies between canvases and areas of interest. This is due to some corrections of composite node canvas to fit to the render size, which are not necessary for the Viewer Node. This patch fixes the issue by considering the input image as a whole before translating.
Note: this is still consistent with immediate realization of transform nodes in GPU compositor, where nodes are to be evaluated from left to right.
Pull Request: https://projects.blender.org/blender/blender/pulls/119276
This patch implements the GPU Bloom glare for the CPU compositor, and
adds a new option for it, leaving the Fog Glow option unimplemented once
again for the GPU compositor.
Pull Request: https://projects.blender.org/blender/blender/pulls/119128
This patch unifies the anti-aliasing of plane deforms between the CPU
and GPU compositors. The CPU used a multi-sample approach, where the
mask was computed 8 times with a jitter, then averaged to get smooth
edges. The GPU relied on the anisotropic filtering with zero boundaries
to smooth the edges.
Furthermore, the CPU implementation ignored the anti-aliasing for the
deformed image and also relied anisotropic filtering like the GPU, so
its outputs were different.
To unify both implementation, we use the existing SMAA anti-aliasing
algorithm instead, and use the anti-aliased mask for the image output as
well. This affects both the Corner Pin and Plane Deform nodes.
A consequence of this change for the Plane Deform node is that motion
blur will appear to have less samples, that's because it was sampled
8-times more in the previous implementation. But users can just increase
the samples in the node to account for that.
Pull Request: https://projects.blender.org/blender/blender/pulls/118853
This patch unifies the variable size blur between the CPU and GPU
compositor. The difference is due to how weights are computed and used.
The CPU computed a nested array of weights for every possible size, that
is, from size 1 to the base size. Then, it assumed the kernel was
separable and reconstructed a 2D kernel by selecting two 1D weights
array and multiplying them for every pixel of the blur window.
The GPU on the other hand computes a single quadrant of the 2D weights
kernel and sampled it directly in the blur window. We favor the GPU
implementation since it makes no assumptions about the separability of
the weights kernel and since the CPU has no performance advantage even
with the assumption in place.
Pull Request: https://projects.blender.org/blender/blender/pulls/118834
New ("fullframe") CPU compositor backend is being used now, and all the code
related to "tiled" CPU compositor is just never used anymore. The new backend
is faster, uses less memory, better matches GPU compositor, etc.
TL;DR: 20 thousand lines of code gone.
This commit:
- Removes various bits and pieces related to "tiled" compositor (execution
groups, one-pixel-at-a-time node processing, read/write buffer operations
related to node execution groups).
- "GPU" (OpenCL) execution device, that was only used by several nodes of
the tiled compositor.
- With that, remove CLEW external library too, since nothing within Blender
uses OpenCL directly anymore.
Pull Request: https://projects.blender.org/blender/blender/pulls/118819
Blender would crash if the input of the Classic Kuwahara node is
translated. This is due to an out of bound access due to the miss-use of
IndexRange, where it was assumed to have a start-end constructor, while
it was in fact a start-size one. Fix this by computing the size from the
area and supplying it to the constructor.
This patch unifies the Keying node between the CPU and GPU. The
difference was due to imprecision when computing pixel saturation in the
CPU code. This typically happens when the pixel is gray-scale, and
should have a zero saturation, but it came out negative due to
imprecision. To fix this, reorder the mix expression so that the minimum
and maximum values are differenced first.
Ideally, the code shouldn't have such abrupt conditionals and should be
more robust to tiny imprecisions, but that would break the behavior and
should be evaluated separately.
Pull Request: https://projects.blender.org/blender/blender/pulls/118625