Commit Graph

2077 Commits

Author SHA1 Message Date
Campbell Barton
7f7648c6ed Cleanup: spelling in code comments & minor edits
- Use uppercase NOTE: tags.
- Correct bNote -> bNode.
- Use colon after parameters.
- Use doxy-style doc-strings.
2024-06-06 09:55:13 +10:00
Omar Emara
4f21b26675 Fix #121639: Glare shifts the colors of highlights
The Glare node shifts the color of the highlights when the threshold is
high. That's because the thresholding algorithm simply subtracts the
threshold from the RGB data, which is not expected to retain the same hue
of the color.

To fix this, we do the thresholding only on the luminance of the color
in HSV color space. This eliminates the color shifting and also helps to
smooth the edges of the highlights.

This is a breaking change, but it is more of a fix rather than a change
of behavior.

Pull Request: https://projects.blender.org/blender/blender/pulls/122570
2024-06-04 20:59:32 +02:00
Sergey Sharybin
44ce0e01cd Fix #122669: Compositor: Corrupted output and occasional crash
This change fixes:
- Constant folder uses rect of size INT_MIN .. INT_MAX, which overflows integer
  when calculating size.
- Calculation of an offset inside of input can lead to integer overflow for big
  resolutions.
- texture_bilinear_extend() and texture_nearest_extend() do not seem to handle
  single element buffers correctly.
- Kuwahara has division of zero when the input size is 0.

Pull Request: https://projects.blender.org/blender/blender/pulls/122674
2024-06-04 15:48:48 +02:00
Sergey Sharybin
5401a69627 Fix: Compositor translate uses un-initialized delta
The issue is visible when adding an assert in the delta accessors of
the TranslateOperation operation (get_delta_x and get_delta_y), and
rendering compositor-nodes-desintegrate-wipe-01.blend (either command
line or F12, doesn't matter).

Seems that under certain circumstances the system might skip determining
the area of interest. For such cases ensure delta from the beginning of
the threaded code.

Pull Request: https://projects.blender.org/blender/blender/pulls/122686
2024-06-04 09:43:34 +02:00
Habib Gahbiche
192d4dc9dc Compositor: implement bicubic interpolation for Stabilize2DNode
This patch also fixes a crash when image input of `Stabilize2DNode` is unconnected and interpolation is set to bicubic.

Differences to GPU compositor:
- ~1px difference is observed due to different rounding of domain size vs. canvas size. This difference is constant for all image sizes.
- If image input is unconnected but other inputs are, CPU compositor doesn't consider the operation to be constant, whereas GPU compositor still outputs a constant result.

Pull Request: https://projects.blender.org/blender/blender/pulls/122288
2024-05-30 17:38:41 +02:00
Campbell Barton
c5a27f011e Cleanup: spelling in comments 2024-05-29 12:49:07 +10:00
Omar Emara
0dca908191 Compositor: Add GPU per-node execution time report
This patch adds support for timing GPU compositor executions. This was
previously not possible since there was no mechanism to measure GPU
calls, which is still the case. However, since 2cf8b5c4e1, we now flush
GPU calls immediately for interactive editing, so we can now measure the
GPU evaluation on the host, which is not a very accurate method, but it
is better than having no timing information. Therefore, timing is only
implemented for interactive editing.

This is different from the CPU implementation in that it measures the
total evaluation time, including any preprocessing of the inputs like
implicit type conversion as well as things like previews.

The profiling implementation was moved to the realtime compositor since
the compositor module is optional.

Pull Request: https://projects.blender.org/blender/blender/pulls/122230
2024-05-28 08:13:46 +02:00
Sergey Sharybin
f4a7c42255 Fix crash in viewer node with quick toggle of frame in Compositor
The issue was caused by a non-initialized ibuf_ used to set the non-color
space to. For now limit the tweaks to the image data-block, which is always
ensured. This makes it consistent with the GPU compositor which does not
have access to ibuf at the moment when the meta-data is being checked.

Ideally we do need to set the color space, but it needs to happen consistently,
and in a thread-safe manner. The way how the CPU compositor releases the
ibuf right after acquisition does not feel safe, so less we rely on it is
better.
2024-05-24 19:26:50 +02:00
Sergey Sharybin
1b18e07232 Fix #121480: Cryptomatte shows some objects as black
The issue originates to the change in default view transform from Filmic
to AgX, which does slightly different clipping, and clips color to black
if there is any negative values.

This change implements an idea of skipping view transform for viewer
node when it is connected to the Pick output of the cryptomatte node.
It actually goes a bit deeper than this and any operation can tag its
result as a non-color data, and the viewer node will respect that.
It is achieved by passing some extra meta-data along the evaluation
pipeline. For the CPU compositor it is done via MetaData, and for the
GPU compositor it is done as part of Result.

Connecting any other node in-between of viewer and Cryptomatte's Pick
will treat the result as color values, and apply color management.

Connecting Pick to the Composite output will also consider it as color,
since there is no concept of non-color-managed render result.

An alternative approaches were tested, including:

- Doing negative value clamping at the viewer node.
  It does not work for legacy cryptomatte node, as it needs to have
  access to original non-modified Pick result.

- Change the order of components, and store ID in another channel.

  Using one of other of Green or Blue channels might work for some view
  transforms, but it does not work for AgX.

  Using Alpha channel seemingly works better, but it is has different
  issues caused by the fact that display transform de-associates alpha,
  leading to over-exposed regions which are hard to see in the file from
  the report. And might lead to the similar issues as the initial report
  with other objects or view transforms.

- Use positive values in the Pick channel.

  It does make things visible, but they are all white due to the nature
  of how AgX works, making it not so useful as a result.

Pull Request: https://projects.blender.org/blender/blender/pulls/122177
2024-05-24 17:25:57 +02:00
Sergey Sharybin
cccc1ea0eb Fix non-thread-safe access to multilayer image in Compositor
Conversion of compositor node tree to operation is done in a job thread,
and the main thread might modify the image data-block at the same time.

This change fixes it by making it so compositor uses acquire/release
semantic for the image data-block, and making it so the image locks its
render result, preventing other threads from modifying it.

Ref #121761

Pull Request: https://projects.blender.org/blender/blender/pulls/122105
2024-05-22 18:15:19 +02:00
Sergey Sharybin
a12bd38853 Fix: Non-thread-safe access to metadata in Compositor
Image's render result might get freed from another thread while the
compositor is running.

Add an utility function which invokes callback on the image's stamp
data from a thread-guarded block.

Ref #118337, #121761

Pull Request: https://projects.blender.org/blender/blender/pulls/121907
2024-05-21 17:29:59 +02:00
Sergey Sharybin
58e0ca99a9 Fix: Non-thread-safe access to image in Compositor
Image operation's get_im_buf() function was not thread-safe:
- It had TOCTOU issue around calculating multi-layer indices and
  requesting to load the image buffer.
- It accessed render result, render layer and pass pointers without
  any thread guards.

This change moves all the logic needed to access the image buffer
into a single function with proper guards around the access. The
result is user-counted, so it is usable in a thread even if another
thread modifies the image.

The is still potential TOCTOU in the compositor since the image is
acquired twice: once from init_execution(), and once from the
determine_canvas(). It could cause issues if image resolution is
changed between these calls. It is still to be looked into.

Ref #118337, #121761
2024-05-21 17:29:51 +02:00
Sergey Sharybin
fb6b759513 Cleanup: Avoid reference of RenderLayer and RenderPass in multilayer operation
This is avoids reference to data which can potentially be freed from the main
thread while the compositor job is running.

There is still some direct access to RenderResult and access to its layers
and passes in the operation implementation, but is is all internal and will
be worked on later. The purpose of this patch is to avoid unsafe pointers in
the API of the operation.

Should be no functional changes.

Ref #118337, #121761
2024-05-21 17:29:51 +02:00
Sergey Sharybin
def1e8154e Cleanup: Move multi-layer view logic to the operation
There are a couple of goals achieved with this change:

- The logic itself is de-duplicated between the Image and Cryptomatte
  nodes.

- The logic which accesses render results, images, etc is more local
  to the place where it needs to be used. Currently it does not matter
  too much, but it allows to properly guard the access to be thread
  safe.

Ref #118337, #121761
2024-05-21 17:29:51 +02:00
Sergey Sharybin
9f28189c28 Cleanup: Store ImageUser by value in the image operation
Allows to modify the user without worrying to store/restore old values,
potentially resolving threading conflicts.

Should be no changes on user level.

Ref #118337, #121761
2024-05-21 17:29:51 +02:00
Omar Emara
f0c379e1d3 Realtime Compositor: Implement Fog Glow Glare node
This patch implements the Fog Glow Glare node by porting the CPU
implementation, so it is not GPU accelerated and is not expected to be
realtime. However, after d4bf23771d, it is now fast enough to be usable,
see that commit for more information on the implementation.

The only difference is that the kernel part of the convolution is cached
in the realtime compositor, so it should be about 30% faster than CPU
for interactive editing.

In the future, this implementation will be replaced by a proper GPU
implementation, likely based on VkFFT.
2024-05-21 18:11:01 +03:00
Omar Emara
429c3f3c7f Compositor: Optimize Fog Glow glare code
Optimize the Fog Glow glare code by making sure TBB is used for
threading, it only uses the needed space for the frequency domain, and
only load the TLD storage once for every threaded invocation.
2024-05-21 17:49:23 +03:00
Omar Emara
9fd37cf31b Fix #122005: Sun Beams node produces NaNs
The Sun Beams node produces NaNs when the ray length option is zero.
This is due to zero division in the code, which we avoid by skipping
computation altogether when the ray length is zero.
2024-05-21 09:53:15 +03:00
Campbell Barton
096eed9d7f Cleanup: spelling in comments 2024-05-20 10:23:54 +10:00
Omar Emara
d4bf23771d Compositor: Optimize Fog Glow Glare node
This patches optimizes the Fog Glow Glare node to be about 25x faster
for 4K images. This is mainly achieved by utilizing the FFTW library and
multi-threading support code. Further improvements are still possible by
caching kernels, but the CPU compositor does not support caching yet.

The old Hartley transform was removed, so the node no longer works when
FFTW is disabled as a build time option, much like the OIDN node. A new
BLI library was introduced for FFTW, it includes some helper routines
relevant for FFTW as well as an initialization routine that sets up
multithreading using TBB as well as thread safety.

Build system support for threaded FFTW was also added, which defines the
relevant variables to detect threading support as well as add the
relevant libraries.

We do not currently have the threaded FFTW libs in our precompiled libs,
so the threading code is disabled until the libs lands in the coming
weeks. So currently, the code is only about 9x faster.

The only functional change is that the kernel is now odd sized, which
should produce more accurate results, but the final result is almost
identical and mostly undetectable.

The plan is to port this to the GPU as well similar to how we implement
OIDN until we have a GPU FFT implementation. GPU compositor can also do
caching, so it should be faster, being able to compute a 4K image in
under half a second.

Pull Request: https://projects.blender.org/blender/blender/pulls/121653
2024-05-17 12:45:21 +02:00
Sergey Sharybin
727a90a0f1 Compositor: Make GPU compositor an official feature
Effectively, make GPU compositor available without need to enable
an experimental feature set.

The compositor device is now exposed in the Performance panel of
Render Buttons. It is also still available in the compositor's
N-panel, together with some other options which are more about how
editing works, and not exactly related to render performance.

Pull Request: https://projects.blender.org/blender/blender/pulls/121398
2024-05-14 15:49:20 +02:00
Iliya Katueshenock
75d17b1db5 Cleanup: Move BKE_node to namespace
Move all header file into namespace.
Unnecessary namespaces was removed from implementations file.
Part of forward declarations in header was moved in the top part
of file just to do not have a lot of separate namespaces.

Pull Request: https://projects.blender.org/blender/blender/pulls/121637
2024-05-13 16:07:12 +02:00
Sergey Sharybin
7b4232e8aa Compositor: Move Execution Mode and Precision from bNodeTree to Scene
This allows to expose these settings in the Performance panel in the
render buttons. Also moves compositor-specific options away from the
generic node tree structure.

For the backwards-compatibility the options are still present in the
DNA for the bNodeTree. This is to minimize the impact on the Studio
which has used the GPU compositor for a while now. They can be
removed in a future release.

There is no functional changes expected.

Pull Request: https://projects.blender.org/blender/blender/pulls/121583
2024-05-10 18:08:33 +02:00
Omar Emara
b42e973320 Cleanup: Clarify Fog Glow Glare code 2024-05-09 11:44:21 +03:00
Sergey Sharybin
149e547de6 Compositor: Remove quality setting from DNA and UI
It was meant to be included into the previous commit in the area,
but was forgotten due to some technicalities.

Also remove the DisplaceSimpleOperation, which is now not used.

Pull Request: https://projects.blender.org/blender/blender/pulls/121580
2024-05-08 16:56:32 +02:00
Sergey Sharybin
9532ea3f8c Compositor: Remove Render/Edit Quality setting
The setting was only affecting some of the blur operations, which
does not typically results in a measurable performance boost in real
compositor setups.

For the simplicity of settings on user level remove setting which
potentially makes compositor output worse, without much benefit.

There are better ways to gain performance, like compositing on a
lower resolution, exposing "preview" as an input to the node tree
(similar to the geometry nodes) etc.

Pull Request: https://projects.blender.org/blender/blender/pulls/121576
2024-05-08 16:41:23 +02:00
Sergey Sharybin
6db8091f45 Compositor: Remove Two Pass option
There are few issues with the logic and implementation of this option:

- While the first pass is faster in the terms of a wall-clock time, it
  is often not giving usable results to artists, as the final look of
  the result is so much different from what it is expected to be.

- It is not supported by the GPU compositor.

- It is based on some static rules based on the node type, rather than
  on the apparent computational complexity.

The performance settings are planned to be moved to the RenderData, and
it is unideal to carry on such limited functionality to more places. There
are better approaches to quickly provide approximated results, which we can
look into later.

Pull Request: https://projects.blender.org/blender/blender/pulls/121558
2024-05-08 14:34:11 +02:00
Habib Gahbiche
6f2afc7390 Fix #121436: Compositor: 2d Stabilization translates image when set to bicubic
The node uses multiple transform operations, each using its own sampling. Using bicubic sampling in the translate node causes undesired offsets.

These changes were intentional (see test updates in `c4e1be73`), but the offsets were thought to be too small for users to notice.

Pull Request: https://projects.blender.org/blender/blender/pulls/121495
2024-05-07 21:08:28 +02:00
Omar Emara
73aea76202 Fix: Map UV and Displace nodes produce wrong outputs
The Map UV and Displace nodes produce unexpected outputs. That's because
the derivatives computed for anisotropic filter were computed in the
sampler's space, while it should be in texel space, as expected bu the
textureGrad function.
2024-05-06 15:26:20 +03:00
Omar Emara
7f222d0fe2 Fix #121305: Corner Pin node returns single color
The Corner Pin node as well as the Plane Track Deform nodes always
return a single color that appears to be the average of the input.
That's because the derivatives were computed in the sampler's space,
while they should be in texel space. Large derivatives meant that the
textureGrad function would always sample the lowest MIP level, hence the
constant average color.

This might be an issue with other uses of textureGrad in the compositor,
so their use should be investigated.
2024-05-06 13:36:03 +03:00
Omar Emara
9a3c7290a9 Fix #121447: Simple Blur node takes too long
The Blur node takes too long to execute even though it is in a simple
configuration. That's because the CPU compositor uses variable size
blurring even if the size is constant. So ensure the input size is
actually variable before using variable size blurring.
2024-05-06 10:46:05 +03:00
Campbell Barton
9918488bb1 Cleanup: use uppercase tags, following own style guide 2024-05-03 11:33:21 +10:00
Sergey Sharybin
1b0012d51c Refactor: Require C++ for users of BLI_simd.h
This is because sse2neon.h might be used to emulate SSE intrinsics
on ARM64 architecture, and it uses some preprocessor which is not
available for C language when using MSVC.

The old-style math file math_matrix.c uses this header, so needed
to become C++. Simple rename did not work since there is a new math
utility math_matrix.cc exists. Following some existing convention
the math_matrix.c is renamed to math_matrix_c.cc. Eventually all the
code should switch to use C++ style math, and the C style removed,
so it seems reasonable to not mix old and new style of API in the
same file.

There should be no functional changes.

Pull Request: https://projects.blender.org/blender/blender/pulls/121335
2024-05-02 16:22:19 +02:00
Campbell Barton
530999fb76 Cleanup: add missing SPDX-FileCopyrightText 2024-05-02 16:46:26 +10:00
Habib Gahbiche
483c854612 Compositor: implement interpolation methods for Translate node
Compositor: Expose interpolation methods Nearest, Bilinear and Bicubic to the user for translate node

This is part of #119592.

Pull Request: https://projects.blender.org/blender/blender/pulls/119603
2024-05-01 14:44:01 +02:00
Omar Emara
74300f88a0 Cleanup: Missing include in recursive blur algorithm 2024-05-01 12:08:08 +03:00
Omar Emara
382131fef2 Realtime Compositor: Implement Fast Gaussian blur
This patch implements the Fast Gaussian blur mode for the Realtime
Compositor. This is a faster but less accurate implementation of
Gaussian blur.

This is implemented as a recursive Gaussian blur algorithm based on the
general method outlined in the following paper:

 Hale, Dave. "Recursive gaussian filters." CWP-546 (2006).

In particular, based on the table in Section 5 Conclusion, for very low
radius blur, we use a direct separable Gaussian convolution. For medium
blur radius, we use the fourth order IIR Deriche filter based on the
following paper:

  Deriche, Rachid. Recursively implementating the Gaussian and its
  derivatives. Diss. INRIA, 1993.

For high radius blur, we use the fourth order IIR Van Vliet filter based
on the following paper:

  Van Vliet, Lucas J., Ian T. Young, and Piet W. Verbeek. "Recursive
  Gaussian derivative filters." Proceedings. Fourteenth International
  Conference on Pattern Recognition (Cat. No. 98EX170). Vol. 1. IEEE,
  1998.

That's because direct convolution is faster and more accurate for very
low radius, while the Deriche filter is more accurate for medium blur
radius, while Van Vliet is more accurate for high blur radius. The
criteria suggested by the paper is a sigma value threshold of 3 and 32
for the Deriche and Van Vliet filters respectively, which we apply on
the larger of the two dimensions.

Both the Deriche and Van Vliet filters are numerically unstable for high
blur radius. So we decompose the Van Vliet filter into a parallel bank
of smaller second order filters based on the method of partial fractions
discussed in the book:

  Oppenheim, Alan V. Discrete-time signal processing. Pearson Education
  India, 1999.

We leave the Deriche filter as is since it is only used for low radii
anyways.

Compared to the CPU implementation, this implementation is more
accurate, but less numerically stable, since CPU uses doubles, which is
not feasible for the GPU.

The only change of behavior between CPU and this implementation is that
this implementation uses the same radius, so Fast Gaussian will match
normal Gaussian, while the CPU implementation has a radius that is 1.5x
the size of normal Gaussian. A patch to change the CPU behavior #121211.

Pull Request: https://projects.blender.org/blender/blender/pulls/120431
2024-05-01 09:57:30 +02:00
Omar Emara
f595345f52 Compositor: Match size of Fast Gaussian with Gaussian
This patch matches the size of the Fast Gaussian mode of blur with the
standard Gaussian mode. The sigma value was computed as half the radius,
while it should be third of the radius, since Blender's Gaussian
function is truncated at 3 of the standard deviation of the unit
Gaussian. The patch include versioning to adjust the size of existing
files.

Pull Request: https://projects.blender.org/blender/blender/pulls/121211
2024-05-01 09:01:53 +02:00
Omar Emara
98043d3fc3 Fix #121264: Crash when File Output has preview
Blender crashes when a File Output was created when the Use Preview
option of the render output settings was enabled. This is a regression
introduced in 931c188ce5, where the image format of the node was
initialized from the scene image settings, so the preview option was
carried over, yet it is not supported nor exposed to the user in the
File Output node. To fix this, ensure the file output code will ignore
previews.
2024-04-30 17:44:02 +03:00
Aaron Carlisle
f5157b00a9 Compositor: Remove left over code from tile based compositor
The compositor used to have a feature that would calculate tiles for the viewer based on a custom order. Since the removal of the tile based compositor, this code is unused.

Pull Request: https://projects.blender.org/blender/blender/pulls/121176
2024-04-30 13:47:45 +02:00
Omar Emara
0eacf3cfef Cleanup: Clarify variable names in Fast Gaussian 2024-04-29 15:15:37 +03:00
Omar Emara
3d2a2a9bd2 Cleanup: Remove unused FastGaussianBlurValueOperation 2024-04-29 14:52:18 +03:00
Campbell Barton
c0def6c93d Cleanup: spelling in comments 2024-04-27 11:58:02 +10:00
Campbell Barton
019d3ef939 Cleanup: spelling in comments 2024-04-24 10:48:45 +10:00
Omar Emara
614e5ecf7d Fix #120857: Blender crashes for huge compositor images
Blender crashes when the user scales up an image into huge scales. This
is due to integer overflow when doing memory allocations. So we fix this
by using larger integers when doing math on memory allocations and
indexing.

Note that while this solves the random crashes, users scaling up images
to huge scales will face things like the OOM killer activating or at
best significant slow downs due to swapping.

It is not clear how to handle such cases, but something like a global
maximum size option that is set to 16k by default might be worth adding.

Pull Request: https://projects.blender.org/blender/blender/pulls/120921
2024-04-23 09:44:33 +02:00
Habib Gahbiche
f1d4859e2a Compositor: implement bicubic spline for CPU compositor
Affected nodes:
- Rotate
- Translate
- Stabilize 2D node

Pull Request: https://projects.blender.org/blender/blender/pulls/120886
2024-04-22 20:58:43 +02:00
Campbell Barton
fd589fdca4 Cleanup: various non functional C++ changes 2024-04-20 13:46:14 +10:00
Omar Emara
dd321a47a6 Fix #120715: File Output node writes empty outputs
The File Output node writes single elements as full images in 4.1, while
such values were skipped in 4.0. This included invalid outputs, for
instance, when the Render Layers node does not have a result for the
selected view layer. Which would then just write an image with an
arbitrary color.

To fix this, we detect single element values and skip writing file
outputs for them.

Pull Request: https://projects.blender.org/blender/blender/pulls/120749
2024-04-19 12:38:31 +02:00
Omar Emara
5e3932595f Compositor: Allow constant folding of set operations
This patch replaces the is_set_operation flag with the
is_constant_operation flag to allow input constants to propagate
through the node tree using the constant folder.
2024-04-19 12:34:43 +02:00
Clément Foucault
f2ae04db10 GPU: Implement missing UBO/SSBO bind tracking
This PR adds a context function to consider all
buffer bindings obsolete. This is in order to
track missing binds and invalid lingering states
accross `draw::Pass`es.

The functions `GPU_storagebuf_debug_unbind_all`
and `GPU_uniformbuf_debug_unbind_all` do nothing
more than resetting the internal debug slot bits
to zero. This is what OpenGL backend does as it
doesn't track the bindings themselves.

Other backends might have other way to detect
missing bindings. If not they should be
implemented separately anyway.

I renamed the function to `debug_unbind_all` to
denote that it actually does something related to
debugging.

This also add SSBO binding check for OpenGL as it
was also missing.

#### Future

This error checking logic is pretty much backend
agnostic. While it would be nice to move it at
`gpu::Context` level, we don't have the resources
for that now.

Pull Request: https://projects.blender.org/blender/blender/pulls/120716
2024-04-17 11:06:39 +02:00