The Glare node shifts the color of the highlights when the threshold is
high. That's because the thresholding algorithm simply subtracts the
threshold from the RGB data, which does not preserve the hue of the
color.
To fix this, we do the thresholding only on the luminance of the color
in HSV color space. This eliminates the color shifting and also helps to
smooth the edges of the highlights.
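A minimal sketch of the idea, not the exact Blender kernel: treating the
maximum RGB component as the HSV value, thresholding it, and scaling the
color uniformly preserves hue and saturation:
```cpp
#include <algorithm>

struct Color {
  float r, g, b;
};

/* Threshold on the HSV value (the maximum component) only. Scaling all
 * channels by the same factor preserves hue and saturation. */
Color threshold_highlights(const Color &color, const float threshold)
{
  const float value = std::max({color.r, color.g, color.b});
  if (value <= threshold) {
    return {0.0f, 0.0f, 0.0f};
  }
  const float scale = (value - threshold) / value;
  return {color.r * scale, color.g * scale, color.b * scale};
}
```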
This is a breaking change, but it is more of a fix than a change of
behavior.
Pull Request: https://projects.blender.org/blender/blender/pulls/122570
This change fixes:
- The constant folder uses a rect of size INT_MIN..INT_MAX, which
  overflows an integer when calculating the size.
- Calculating an offset inside the input can lead to integer overflow
  for big resolutions.
- texture_bilinear_extend() and texture_nearest_extend() do not seem to
  handle single-element buffers correctly.
- Kuwahara divides by zero when the input size is 0.
Pull Request: https://projects.blender.org/blender/blender/pulls/122674
The issue is visible when adding an assert in the delta accessors of
the TranslateOperation operation (get_delta_x and get_delta_y), and
rendering compositor-nodes-desintegrate-wipe-01.blend (either command
line or F12, doesn't matter).
It seems that under certain circumstances the system might skip
determining the area of interest. For such cases, ensure the delta is
computed at the beginning of the threaded code.
Pull Request: https://projects.blender.org/blender/blender/pulls/122686
This patch also fixes a crash when image input of `Stabilize2DNode` is unconnected and interpolation is set to bicubic.
Differences from the GPU compositor:
- ~1px difference is observed due to different rounding of domain size vs. canvas size. This difference is constant for all image sizes.
- If image input is unconnected but other inputs are, CPU compositor doesn't consider the operation to be constant, whereas GPU compositor still outputs a constant result.
Pull Request: https://projects.blender.org/blender/blender/pulls/122288
This patch adds support for timing GPU compositor executions. This was
previously not possible since there was no mechanism to measure GPU
calls directly, which is still the case. However, since 2cf8b5c4e1, we
now flush GPU calls immediately for interactive editing, so we can
measure the GPU evaluation on the host. This is not a very accurate
method, but it is better than having no timing information. Therefore,
timing is only implemented for interactive editing.
This is different from the CPU implementation in that it measures the
total evaluation time, including any preprocessing of the inputs like
implicit type conversion as well as things like previews.
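A minimal sketch of the host-side measurement, assuming the evaluation
flushes its GPU work before returning; the function name and shape are
illustrative, not the actual compositor API:
```cpp
#include <chrono>
#include <functional>

/* Measures wall-clock time of an evaluation on the host. This only
 * approximates GPU time when the callable flushes its GPU commands
 * before returning, as is the case for interactive edits. */
static double time_evaluation(const std::function<void()> &evaluate)
{
  const auto start = std::chrono::steady_clock::now();
  evaluate();
  const auto end = std::chrono::steady_clock::now();
  return std::chrono::duration<double>(end - start).count();
}
```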
The profiling implementation was moved to the realtime compositor since
the compositor module is optional.
Pull Request: https://projects.blender.org/blender/blender/pulls/122230
The issue was caused by an uninitialized ibuf_ that was used to set the
non-color space. For now, limit the tweaks to the image data-block,
which is always ensured. This makes it consistent with the GPU
compositor, which does not have access to the ibuf at the moment the
meta-data is checked.
Ideally we do need to set the color space, but it needs to happen
consistently and in a thread-safe manner. The way the CPU compositor
releases the ibuf right after acquisition does not feel safe, so the
less we rely on it, the better.
The issue originates from the change of the default view transform from
Filmic to AgX, which does slightly different clipping and clips the
color to black if there are any negative values.
This change implements the idea of skipping the view transform for the
Viewer node when it is connected to the Pick output of the Cryptomatte
node. It actually goes a bit deeper than this: any operation can tag its
result as non-color data, and the Viewer node will respect that.
This is achieved by passing some extra meta-data along the evaluation
pipeline. For the CPU compositor it is done via MetaData, and for the
GPU compositor it is done as part of Result.
Connecting any other node between the Viewer and Cryptomatte's Pick will
cause the result to be treated as color values, and color management to
be applied. Connecting Pick to the Composite output will also consider
it as color, since there is no concept of a non-color-managed render
result.
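A hypothetical sketch of the tagging mechanism; the actual fields live
in the compositor's MetaData (CPU) and Result (GPU) types and are likely
named differently:
```cpp
/* Extra meta-data passed along the evaluation pipeline. */
struct MetaData {
  /* True for non-color data such as the Cryptomatte Pick output.
   * Hypothetical name for illustration. */
  bool is_non_color_data = false;
};

/* Producer side: the Cryptomatte node tags its Pick result. */
void tag_pick_output(MetaData &meta_data)
{
  meta_data.is_non_color_data = true;
}

/* Consumer side: the Viewer node skips the view transform for tagged
 * results. Any node in-between produces fresh meta-data, so the tag is
 * lost and color management applies again. */
bool viewer_applies_view_transform(const MetaData &meta_data)
{
  return !meta_data.is_non_color_data;
}
```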
Alternative approaches were tested, including:
- Doing negative value clamping at the viewer node.
  It does not work for the legacy Cryptomatte node, as it needs access
  to the original, unmodified Pick result.
- Changing the order of components, and storing the ID in another
  channel.
  Using either the Green or Blue channel might work for some view
  transforms, but it does not work for AgX.
  Using the Alpha channel seemingly works better, but it has different
  issues caused by the fact that the display transform de-associates
  alpha, leading to over-exposed regions which are hard to see in the
  file from the report. It might also lead to issues similar to the
  initial report with other objects or view transforms.
- Using positive values in the Pick channel.
  It does make things visible, but they are all white due to the nature
  of how AgX works, making it not so useful as a result.
Pull Request: https://projects.blender.org/blender/blender/pulls/122177
Conversion of the compositor node tree to operations is done in a job
thread, and the main thread might modify the image data-block at the
same time.
This change fixes it by making the compositor use acquire/release
semantics for the image data-block, and by making the image lock its
render result, preventing other threads from modifying it.
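A sketch of the acquire/release pattern using Blender's existing image
buffer API; the actual compositor call sites, header names, and error
handling may differ:
```cpp
#include "BKE_image.h" /* BKE_image.hh in newer Blender versions. */

/* Acquire the image buffer so other threads cannot free it while the
 * compositor reads from it, then release it when done. */
void read_image_pixels(Image *image, ImageUser *image_user)
{
  void *lock = nullptr;
  ImBuf *ibuf = BKE_image_acquire_ibuf(image, image_user, &lock);
  if (ibuf != nullptr) {
    /* ... read pixel data safely here ... */
    BKE_image_release_ibuf(image, ibuf, lock);
  }
}
```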
Ref #121761
Pull Request: https://projects.blender.org/blender/blender/pulls/122105
Image's render result might get freed from another thread while the
compositor is running.
Add a utility function which invokes a callback on the image's stamp
data from a thread-guarded block.
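A hypothetical sketch of the shape of such a utility; the actual
function name and guarded types in Blender differ:
```cpp
#include <functional>
#include <mutex>

/* Illustrative stand-ins for Blender's actual types. */
struct StampData {
  /* ... render meta-data ... */
};

struct GuardedImage {
  std::mutex render_result_mutex;
  StampData *stamp_data = nullptr;
};

/* Invokes the callback on the stamp data while the guard is held, so
 * the render result cannot be freed by another thread mid-access. */
void foreach_stamp_data(GuardedImage &image,
                        const std::function<void(const StampData &)> &callback)
{
  std::lock_guard<std::mutex> lock(image.render_result_mutex);
  if (image.stamp_data != nullptr) {
    callback(*image.stamp_data);
  }
}
```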
Ref #118337, #121761
Pull Request: https://projects.blender.org/blender/blender/pulls/121907
The image operation's get_im_buf() function was not thread-safe:
- It had a TOCTOU issue around calculating multi-layer indices and
  requesting the image buffer load.
- It accessed render result, render layer and pass pointers without
any thread guards.
This change moves all the logic needed to access the image buffer
into a single function with proper guards around the access. The
result is user-counted, so it is usable in a thread even if another
thread modifies the image.
There is still a potential TOCTOU issue in the compositor, since the
image is acquired twice: once from init_execution() and once from
determine_canvas(). It could cause issues if the image resolution
changes between these calls. This is still to be looked into.
Ref #118337, #121761
This avoids references to data which can potentially be freed from the
main thread while the compositor job is running.
There is still some direct access to RenderResult and access to its
layers and passes in the operation implementation, but it is all
internal and will be worked on later. The purpose of this patch is to
avoid unsafe pointers in the API of the operation.
Should be no functional changes.
Ref #118337, #121761
There are a couple of goals achieved with this change:
- The logic itself is de-duplicated between the Image and Cryptomatte
nodes.
- The logic which accesses render results, images, etc. is more local
  to the place where it is used. Currently this does not matter too
  much, but it allows the access to be properly guarded for thread
  safety.
Ref #118337, #121761
Allows modifying the user without worrying about storing/restoring old
values, potentially resolving threading conflicts.
Should be no changes on the user level.
Ref #118337, #121761
This patch implements the Fog Glow Glare node by porting the CPU
implementation, so it is not GPU accelerated and is not expected to be
realtime. However, after d4bf23771d, it is now fast enough to be usable,
see that commit for more information on the implementation.
The only difference is that the kernel part of the convolution is cached
in the realtime compositor, so it should be about 30% faster than CPU
for interactive editing.
In the future, this implementation will be replaced by a proper GPU
implementation, likely based on VkFFT.
Optimize the Fog Glow glare code by making sure TBB is used for
threading, that only the needed space is used for the frequency domain,
and that the TLS storage is only loaded once for every threaded
invocation.
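A sketch of the TLS pattern, assuming TBB; the per-thread storage is
fetched once per range invocation instead of once per element:
```cpp
#include <tbb/blocked_range.h>
#include <tbb/enumerable_thread_specific.h>
#include <tbb/parallel_for.h>

/* Illustrative per-thread scratch space for the convolution. */
struct Scratch {
  /* ... frequency domain buffers ... */
};

void convolve_rows(const int rows)
{
  tbb::enumerable_thread_specific<Scratch> tls;
  tbb::parallel_for(
      tbb::blocked_range<int>(0, rows),
      [&](const tbb::blocked_range<int> &range) {
        /* Load the TLS storage once per invocation, not once per row. */
        Scratch &scratch = tls.local();
        (void)scratch;
        for (int row = range.begin(); row < range.end(); row++) {
          /* ... process one row using the per-thread scratch ... */
        }
      });
}
```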
The Sun Beams node produces NaNs when the ray length option is zero.
This is due to zero division in the code, which we avoid by skipping
computation altogether when the ray length is zero.
This patch optimizes the Fog Glow Glare node to be about 25x faster for
4K images. This is mainly achieved by utilizing the FFTW library and
multi-threading support code. Further improvements are still possible by
caching kernels, but the CPU compositor does not support caching yet.
The old Hartley transform was removed, so the node no longer works when
FFTW is disabled as a build-time option, much like the OIDN node. A new
BLI library was introduced for FFTW; it includes some helper routines
relevant for FFTW, as well as an initialization routine that sets up
multithreading using TBB and ensures thread safety.
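A sketch of the kind of initialization such a routine performs, using
standard FFTW threading calls (available since FFTW 3.3.5); the real
helper's names differ, and routing the worker loop through TBB is
omitted here:
```cpp
#include <fftw3.h>

/* Must run before any FFTW plan is created. */
void initialize_fftw_threading(const int num_threads)
{
  /* Enable the threaded FFTW implementation. */
  fftwf_init_threads();
  /* Plans created after this call use the given number of threads. */
  fftwf_plan_with_nthreads(num_threads);
  /* The planner is not thread-safe by default; serialize plan creation
   * so multiple threads can create plans concurrently. */
  fftwf_make_planner_thread_safe();
}
```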
Build system support for threaded FFTW was also added, which defines the
relevant variables to detect threading support and adds the relevant
libraries.
We do not currently have the threaded FFTW libs in our precompiled libs,
so the threading code is disabled until the libs land in the coming
weeks. So currently, the code is only about 9x faster.
The only functional change is that the kernel is now odd-sized, which
should produce more accurate results, but the final result is almost
identical and the difference is mostly undetectable.
The plan is to port this to the GPU as well, similar to how we implement
OIDN, until we have a GPU FFT implementation. The GPU compositor can
also do caching, so it should be faster, being able to compute a 4K
image in under half a second.
Pull Request: https://projects.blender.org/blender/blender/pulls/121653
Effectively, make the GPU compositor available without the need to
enable an experimental feature set.
The compositor device is now exposed in the Performance panel of
Render Buttons. It is also still available in the compositor's
N-panel, together with some other options which are more about how
editing works, and not exactly related to render performance.
Pull Request: https://projects.blender.org/blender/blender/pulls/121398
Move all header files into the namespace.
Unnecessary namespaces were removed from the implementation files.
Some forward declarations in headers were moved to the top of the file
to avoid having a lot of separate namespaces.
Pull Request: https://projects.blender.org/blender/blender/pulls/121637
This allows exposing these settings in the Performance panel in the
render buttons. It also moves compositor-specific options away from the
generic node tree structure.
For backwards compatibility, the options are still present in the DNA
for bNodeTree. This is to minimize the impact on the Studio, which has
used the GPU compositor for a while now. They can be removed in a future
release.
There are no functional changes expected.
Pull Request: https://projects.blender.org/blender/blender/pulls/121583
It was meant to be included in the previous commit in the area, but was
forgotten due to some technicalities.
Also remove the DisplaceSimpleOperation, which is now unused.
Pull Request: https://projects.blender.org/blender/blender/pulls/121580
The setting only affected some of the blur operations, and it does not
typically result in a measurable performance boost in real compositor
setups.
For the simplicity of user-level settings, remove this setting, which
potentially makes the compositor output worse without much benefit.
There are better ways to gain performance, like compositing at a lower
resolution, exposing "preview" as an input to the node tree (similar to
the geometry nodes), etc.
Pull Request: https://projects.blender.org/blender/blender/pulls/121576
There are a few issues with the logic and implementation of this option:
- While the first pass is faster in terms of wall-clock time, it often
  does not give artists usable results, as the final look is very
  different from what it is expected to be.
- It is not supported by the GPU compositor.
- It is based on static rules derived from the node type, rather than
  on the actual computational complexity.
The performance settings are planned to be moved to the RenderData, and
it is not ideal to carry such limited functionality over to more places.
There are better approaches to quickly providing approximated results,
which we can look into later.
Pull Request: https://projects.blender.org/blender/blender/pulls/121558
The node uses multiple transform operations, each using its own sampling. Using bicubic sampling in the translate node causes undesired offsets.
These changes were intentional (see test updates in `c4e1be73`), but the offsets were thought to be too small for users to notice.
Pull Request: https://projects.blender.org/blender/blender/pulls/121495
The Map UV and Displace nodes produce unexpected outputs. That's because
the derivatives computed for the anisotropic filter were computed in the
sampler's space, while they should be in texel space, as expected by the
textureGrad function.
The Corner Pin node as well as the Plane Track Deform nodes always
return a single color that appears to be the average of the input.
That's because the derivatives were computed in the sampler's space,
while they should be in texel space. Large derivatives meant that the
textureGrad function would always sample the lowest MIP level, hence the
constant average color.
This might be an issue with other uses of textureGrad in the compositor,
so their use should be investigated.
The Blur node takes too long to execute even though it is in a simple
configuration. That's because the CPU compositor uses variable size
blurring even if the size is constant. So ensure the input size is
actually variable before using variable size blurring.
This is because sse2neon.h might be used to emulate SSE intrinsics on
the ARM64 architecture, and it uses some preprocessor features which are
not available for the C language when using MSVC.
The old-style math file math_matrix.c uses this header, so it needed to
become C++. A simple rename did not work since a new math utility
math_matrix.cc already exists. Following an existing convention,
math_matrix.c is renamed to math_matrix_c.cc. Eventually all the code
should switch to the C++ style math and the C style should be removed,
so it seems reasonable to not mix the old and new API styles in the same
file.
There should be no functional changes.
Pull Request: https://projects.blender.org/blender/blender/pulls/121335
This patch implements the Fast Gaussian blur mode for the Realtime
Compositor. This is a faster but less accurate implementation of
Gaussian blur.
This is implemented as a recursive Gaussian blur algorithm based on the
general method outlined in the following paper:
Hale, Dave. "Recursive gaussian filters." CWP-546 (2006).
In particular, based on the table in Section 5 Conclusion, for very low
radius blur, we use a direct separable Gaussian convolution. For medium
blur radius, we use the fourth order IIR Deriche filter based on the
following paper:
Deriche, Rachid. Recursively implementating the Gaussian and its
derivatives. Diss. INRIA, 1993.
For high radius blur, we use the fourth order IIR Van Vliet filter based
on the following paper:
Van Vliet, Lucas J., Ian T. Young, and Piet W. Verbeek. "Recursive
Gaussian derivative filters." Proceedings. Fourteenth International
Conference on Pattern Recognition (Cat. No. 98EX170). Vol. 1. IEEE,
1998.
That's because direct convolution is faster and more accurate for very
low radius, the Deriche filter is more accurate for medium blur radius,
and Van Vliet is more accurate for high blur radius. The criteria
suggested by the paper are sigma value thresholds of 3 and 32 for the
Deriche and Van Vliet filters respectively, which we apply on the larger
of the two dimensions.
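A sketch of the selection criteria just described, applied on the larger
of the two dimensions:
```cpp
#include <algorithm>

enum class GaussianFilter {
  DirectConvolution, /* Very low radius: faster and more accurate. */
  Deriche,           /* Medium radius: fourth order IIR. */
  VanVliet,          /* High radius: fourth order IIR. */
};

GaussianFilter select_gaussian_filter(const float sigma_x, const float sigma_y)
{
  /* Apply the thresholds on the larger of the two dimensions. */
  const float sigma = std::max(sigma_x, sigma_y);
  if (sigma < 3.0f) {
    return GaussianFilter::DirectConvolution;
  }
  if (sigma < 32.0f) {
    return GaussianFilter::Deriche;
  }
  return GaussianFilter::VanVliet;
}
```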
Both the Deriche and Van Vliet filters are numerically unstable for high
blur radius. So we decompose the Van Vliet filter into a parallel bank
of smaller second order filters based on the method of partial fractions
discussed in the book:
Oppenheim, Alan V. Discrete-time signal processing. Pearson Education
India, 1999.
We leave the Deriche filter as is since it is only used for low radii
anyways.
Compared to the CPU implementation, this implementation is more
accurate, but less numerically stable, since CPU uses doubles, which is
not feasible for the GPU.
The only change of behavior between the CPU implementation and this one
is that this implementation uses the same radius, so Fast Gaussian will
match normal Gaussian, while the CPU implementation has a radius that is
1.5x the size of normal Gaussian. A patch to change the CPU behavior is
#121211.
Pull Request: https://projects.blender.org/blender/blender/pulls/120431
This patch matches the size of the Fast Gaussian mode of blur with the
standard Gaussian mode. The sigma value was computed as half the radius,
while it should be a third of the radius, since Blender's Gaussian
function is truncated at 3 standard deviations of the unit Gaussian. The
patch includes versioning to adjust the size of existing files.
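A minimal illustration of the corrected conversion:
```cpp
/* The kernel covers 3 standard deviations, so the sigma matching a
 * given blur radius is a third of the radius, not half of it. */
float sigma_from_radius(const float radius)
{
  return radius / 3.0f;
}
```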
Pull Request: https://projects.blender.org/blender/blender/pulls/121211
Blender crashes when a File Output node is created while the Use Preview
option of the render output settings is enabled. This is a regression
introduced in 931c188ce5, where the image format of the node was
initialized from the scene image settings, so the preview option was
carried over, yet it is not supported nor exposed to the user in the
File Output node. To fix this, ensure the file output code ignores
previews.
The compositor used to have a feature that would calculate tiles for the viewer based on a custom order. Since the removal of the tile-based compositor, this code is unused.
Pull Request: https://projects.blender.org/blender/blender/pulls/121176
Blender crashes when the user scales up an image to huge scales. This
is due to integer overflow when doing memory allocations. So we fix this
by using larger integers when doing math for memory allocations and
indexing.
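A minimal sketch of the overflow-safe math, assuming 4-channel float
pixels (an illustrative choice): promote to a 64-bit integer before
multiplying.
```cpp
#include <cstdint>

/* int * int would overflow for huge scales; promote first. */
int64_t buffer_size_in_bytes(const int width, const int height)
{
  return int64_t(width) * int64_t(height) * 4 * int64_t(sizeof(float));
}
```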
Note that while this solves the random crashes, users scaling up images
to huge scales will face things like the OOM killer activating or, at
best, significant slowdowns due to swapping.
It is not clear how to handle such cases, but something like a global
maximum size option that is set to 16k by default might be worth adding.
Pull Request: https://projects.blender.org/blender/blender/pulls/120921
The File Output node writes single elements as full images in 4.1, while
such values were skipped in 4.0. This included invalid outputs, for
instance, when the Render Layers node does not have a result for the
selected view layer, which would then just write an image with an
arbitrary color.
To fix this, we detect single element values and skip writing file
outputs for them.
Pull Request: https://projects.blender.org/blender/blender/pulls/120749
This patch replaces the is_set_operation flag with the
is_constant_operation flag to allow input constants to propagate
through the node tree using the constant folder.
This PR adds a context function to consider all buffer bindings
obsolete. This is in order to track missing binds and invalid lingering
states across `draw::Pass`es.
The functions `GPU_storagebuf_debug_unbind_all` and
`GPU_uniformbuf_debug_unbind_all` do nothing more than reset the
internal debug slot bits to zero. This is what the OpenGL backend does,
as it doesn't track the bindings themselves.
Other backends might have other ways to detect missing bindings. If not,
they should be implemented separately anyway.
I renamed the function to `debug_unbind_all` to
denote that it actually does something related to
debugging.
This also adds an SSBO binding check for OpenGL, as it was also missing.
#### Future
This error checking logic is pretty much backend agnostic. While it
would be nice to move it to the `gpu::Context` level, we don't have the
resources for that now.
Pull Request: https://projects.blender.org/blender/blender/pulls/120716