Commit Graph

44 Commits

Author SHA1 Message Date
Omar Emara
317cf37680 Compositor: Implement CPU domain realization
This patch implements the domain realization algorithm for the new CPU
compositor. Only nearest interpolation with no wrapping is implemented
at the moment.

A new sampling method was added to the result class and some relevant
methods were moved into inline functions.
2024-10-11 12:12:24 +03:00
Omar Emara
c55edd1310 Compositor: Implement previews for new CPU compositor
This patch moves the preview computation code to its own algorithm and
implements it on the CPU as well to support the new CPU compositor.
2024-10-10 18:28:34 +03:00
Omar Emara
5930e0404a Compositor: Support Image node in new CPU compositor
This patch supports the Image node in the new CPU compositor.
2024-08-19 18:38:50 +03:00
Omar Emara
edc4d0d84a Compositor: Support CPU storage in result class
This patch adds support for CPU side buffers for the result class. A new
storage type member was added to identify the type of buffer storage,
and allocation will either allocate a GPU texture or a CPU buffer based
on the context's GPU usage.
2024-08-13 16:13:22 +03:00
Omar Emara
885db0986c Refactor: Remove concept of temporary result
Temporary results are essentially results with a default reference count
of 1, so we default to 1 for all results and set the initial reference
count differently as need.
2024-08-06 19:37:18 +03:00
Sergey Sharybin
89ededeec4 Cleanup: Code style 2024-08-02 14:41:33 +02:00
Bill-Spitzak
2042669e23 Refactor: Reformulate math in domain realization
Directly calculate the transformation matrix by multiplying and
inverting the Domain matrices. This removes a double-invert and
decomposition of the matrices so it should be more accurate, and I think
makes the math a lot easier to figure out.

This also moves the "bias" for Nearest to be done in the input space
rather than output. This should make it select the same pixels from the
input even if the image is rotated 180 degrees.

Co-authored-by: Bill Spitzak <bills@sidefx.com>
Pull Request: https://projects.blender.org/blender/blender/pulls/125543
2024-08-02 13:26:46 +02:00
Bill Spitzak
0d70481d15 Refactor: Optimize compositor realization shader
This patch optimizes the compositor realization shader by moving the
centring offset to the host code as opposed to the shader.
2024-07-25 18:58:29 +03:00
Bill Spitzak
ab51d879c3 Fix #124339: Compositor artifacts with 0.5 translation
The compositor translate node produces artifacts when its fractional
part is 0.5. That's because GPUs do round-to-even for nearest neighbour
sampling in case samples were at pixel boundaries.

To fix this, we bias translations by a small value to break the
rounding and ensure predictable rounding direction.
2024-07-24 19:22:00 +03:00
Omar Emara
74300f88a0 Cleanup: Missing include in recursive blur algorithm 2024-05-01 12:08:08 +03:00
Omar Emara
382131fef2 Realtime Compositor: Implement Fast Gaussian blur
This patch implements the Fast Gaussian blur mode for the Realtime
Compositor. This is a faster but less accurate implementation of
Gaussian blur.

This is implemented as a recursive Gaussian blur algorithm based on the
general method outlined in the following paper:

 Hale, Dave. "Recursive gaussian filters." CWP-546 (2006).

In particular, based on the table in Section 5 Conclusion, for very low
radius blur, we use a direct separable Gaussian convolution. For medium
blur radius, we use the fourth order IIR Deriche filter based on the
following paper:

  Deriche, Rachid. Recursively implementating the Gaussian and its
  derivatives. Diss. INRIA, 1993.

For high radius blur, we use the fourth order IIR Van Vliet filter based
on the following paper:

  Van Vliet, Lucas J., Ian T. Young, and Piet W. Verbeek. "Recursive
  Gaussian derivative filters." Proceedings. Fourteenth International
  Conference on Pattern Recognition (Cat. No. 98EX170). Vol. 1. IEEE,
  1998.

That's because direct convolution is faster and more accurate for very
low radius, while the Deriche filter is more accurate for medium blur
radius, while Van Vliet is more accurate for high blur radius. The
criteria suggested by the paper is a sigma value threshold of 3 and 32
for the Deriche and Van Vliet filters respectively, which we apply on
the larger of the two dimensions.

Both the Deriche and Van Vliet filters are numerically unstable for high
blur radius. So we decompose the Van Vliet filter into a parallel bank
of smaller second order filters based on the method of partial fractions
discussed in the book:

  Oppenheim, Alan V. Discrete-time signal processing. Pearson Education
  India, 1999.

We leave the Deriche filter as is since it is only used for low radii
anyways.

Compared to the CPU implementation, this implementation is more
accurate, but less numerically stable, since CPU uses doubles, which is
not feasible for the GPU.

The only change of behavior between CPU and this implementation is that
this implementation uses the same radius, so Fast Gaussian will match
normal Gaussian, while the CPU implementation has a radius that is 1.5x
the size of normal Gaussian. A patch to change the CPU behavior #121211.

Pull Request: https://projects.blender.org/blender/blender/pulls/120431
2024-05-01 09:57:30 +02:00
Campbell Barton
a00843d560 Cleanup: use const references instead of values 2024-04-05 11:31:53 +11:00
Hans Goudey
8b514bccd1 Cleanup: Move remaining GPU headers to C++
Pull Request: https://projects.blender.org/blender/blender/pulls/119807
2024-03-23 01:24:18 +01:00
Campbell Barton
e6cb2ca380 Cleanup: unused includes in source/blender/compositor
Remove 5 inclues.
2024-02-13 19:21:03 +11:00
Omar Emara
9743d6992c Compositor: Unify Variable Size Bokeh Blur node
This patch adjusts the Variable Size Bokeh Blur node such that it
matches between CPU and GPU. The GPU implementation is mostly followed
for the reasons stated below.

The first difference is a bug in the CPU implementation, where the upper
limit of the blur window is not considered, but the lower limit is.

The second difference is due to CPU ignoring outside pixels instead of
clamping them to border, which is done until an option is added to the
node to control the boundary condition.

The third difference is due to CPU ignoring the bounding box input.

The fourth difference is that CPU doesn't allow zero maximum blur
radius, which is a valid option.

The fifth difference is that the threshold option, which is only used
for the Defocus node, was considered in a greater than manner, while it
should be greater than or equal. Since the default threshold of one
should allow a blur size of one.

The GPU implementation now considers the maximum size of its input,
following the CPU implementation.

Pull Request: https://projects.blender.org/blender/blender/pulls/117947
2024-02-09 11:52:41 +01:00
Aras Pranckevicius
a705259b4b Cleanup: move imbuf .h files to .hh 2024-01-19 20:29:38 +01:00
Omar Emara
e055db6605 Realtime Compositor: Implement Defocus node
This patch implements the defocus node for the Realtime Compositor. The
implementation does not match the CPU compositor implementation, but
uses a new formulation that is more physically accurate and consistent
with Blender's render engines.

The existing CPU implementation is questionable starting from its circle
of confusion calculation, to the morphological operations applied on the
CoC radius, to ignoring the maximum CoC radius in the search kernel, and
ending with the threshold parameter used to reduce artifacts. Therefore,
it should be reimplemented along with this same implementation using a
more consistent methodology.

EEVEE and Workbench already have a GPU defocus method, which can be
ported to the compositor and used as the preview defocus algorithm.
While this implementation will be updated to be a more accurate method
that produces the same structure as the ported EEVEE implementation.

The new formulation ignores the threshold parameter for now, as well as
the preview parameter.

Pull Request: https://projects.blender.org/blender/blender/pulls/116391
2023-12-21 12:20:38 +01:00
Omar Emara
48d7d60c96 Realtime Compositor: Rewrite inpaint node
This patch rewrites the Inpaint node in the Realtime Compositor. The old
method suffered from discontinuities and singularities in the inpainting
regions. Furthermore, it ignored semi-transparent areas.

The new method is inspired by a two pass method described by the paper:

  Rosner, Jakub, et al. "Fast GPU-based image warping and inpainting for
  frame interpolation." International Conferences on Computer Graphics,
  Vision and Mathematics. 2010.

In particular, we first fill the inpainting region using jump flooding,
then we apply a variable size blur pass whose size is proportional to
the distance to the inpainting boundary. The smoothed region is then
mixed with the input using its alpha.

The new method is much closer to the Bertalmio-style diffusion-based
inpainting methods, and thus can more accurately close holes than
existing methods.

The aforementioned method requires variable size blur, which is quite
expensive for this use case, so a new implementation was added that
approximates the method using a separable implementation, which provides
a visually pleasing result assuming a sufficiently smooth radius field,
which is true for our case since the field is an SDF.

Fixes: #114422

Pull Request: https://projects.blender.org/blender/blender/pulls/114849
2023-11-22 13:23:02 +01:00
Omar Emara
474b6fa070 Realtime Compositor: Support full precision compositing
This patch adds support for full precision compositing for the Realtime
Compositor. A new precision option was added to the compositor to change
between half and full precision compositing, where the Auto option uses
half for the viewport compositor and the interactive render compositor,
while full is used for final renders.

The compositor context now need to implement the get_precision() method
to indicate its preferred precision. Intermediate results will be stored
using the context's precision, with a number of exceptions that can use
a different precision regardless of the context's precision. For
instance, summed area tables are always stored in full float results
even if the context specified half float. Conversely, jump flooding
tables are always stored in half integer results even if the context
specified full. The former requires full float while the latter has no
use for it.

Since shaders are created for a specific precision, we need two variants
of each compositor shader to account for the context's possible
precision. However, to avoid doubling the shader info count and reduce
boilerplate code and development time, an automated mechanism was
employed. A single shader info of whatever precision needs to be added,
then, at runtime, the shader info can be adjusted to change the
precision of the outputs. That shader variant is then cached in the
static cache manager for future processing-free shader retrieval.
Therefore, the shader manager was removed in favor of a cached shader
container in the static cache manager.

A number of utilities were added to make the creation of results as well as
the retrieval of shader with the target precision easier. Further, a
number of precision-specific shaders were removed in favor of more
generic ones that utilizes the aforementioned shader retrieval
mechanism.

Pull Request: https://projects.blender.org/blender/blender/pulls/113476
2023-11-08 08:32:00 +01:00
Omar Emara
1500a594ad Realtime Compositor: Immediately realize wrapped translations
This patch changes how wrapped translations are handled by the Realtime
Compositor. Previously, translations were always stored on the result
and delayed until automatically realized later. The wrapping status was
also stored to control this later automatic realization.

This patch changes that such that translations are immediately realized
for the axes that has enabled wrapping. Consequently, the image will not
get translated, but its content will, in a clip on one side, wrap on the
opposite side manner.

Another change is that wrapping information is no longer propagated to
future automatic realizations, so tilling or repeating an image is no
longer possible. An alternative method of repetition will be introduced
in a later patch.

Pull Request: https://projects.blender.org/blender/blender/pulls/113669
2023-11-02 12:39:41 +01:00
Omar Emara
3f49d286cc Fix #113864: Compositor translation is doubled
Translations in the realtime compositor are doubled where there is a
rotation or scale component.

That was due to applying translations even after domain realization. So
this patch restricts the translation to the case where domain
realization doesn't take place.
2023-10-25 16:28:09 +03:00
Omar Emara
444241fbcf Fix: GPU Compositor rotation transposes the image
The image is transposed when rotated in the GPU compositor. This was a
typo in the rotation bounding box equation, which is fixed in this
patch.
2023-10-12 15:22:22 +03:00
Omar Emara
e592763940 Realtime Compositor: Immediately realize transformations
This patch immediately realizes the scale and rotation components of
transformations at the point of transform nodes. The translate component is
still delayed and only realized when really needed to avoid clipping.

Transformed results are always realized in an expanded domain that avoids
clipping due to rotation or scaling. The size of the transformed domain is
clipped to the GPU texture size limit for now until we have support for huge
textures, that limit is typically 16k.

A potential optimization is to join all consecutive transform and realize
operations into a single realize operation.

Fixes #112332.

Pull Request: https://projects.blender.org/blender/blender/pulls/112332
2023-10-12 11:04:50 +02:00
Omar Emara
f217380128 Realtime Compositor: Use Int2 images in JFA algorithm
This patch changes the image type used in the Jump Flooding Algorithm to
be Int2 instead of Float4. That's because we used to store the distance
along with the texel location, which we no longer do, so we are left
with the 2D texel location only which can be stored in an Int2 image.

We no longer store the distance because it is not necessarily needed, it
introduces a sqrt in each of the JFA passes, and it is less precise due
to storage in 16F images. Developers should compute the distance in the
user shader instead.

This is a non-functional change, but results in less memory usage,
higher performance, and higher precision.

Pull Request: https://projects.blender.org/blender/blender/pulls/112941
2023-09-27 18:34:14 +02:00
Omar Emara
6b53fd5cf6 Realtime Compositor: Allow more result types
Previously, the Result class was reserved for inputs and outputs of
operations, so its allowed types were naturally those exposed to the
user. However, we now use the Result class internally for intermediate
results, so it now makes sense to expend the allowed types.

The types are now divided into two categories, those that are user
facing and need to be handled in implicit operations and those that
are internal and can be exempt from such handling. Internal types are
reserved for texture results, as the single value mechanism is only
useful for user facing results.

The patch merely adjusts the switch cases across the code base, adding
one new internal type as an example.

Pull Request: https://projects.blender.org/blender/blender/pulls/112414
2023-09-25 15:12:52 +02:00
Omar Emara
5008938a1c Realtime Compositor: Implement Double Edge Mask node
This patch implements the Double Edge Mask node for the Realtime
Compositor. The implementation is primarily based on the 1+JFA Jump
Flooding algorithm, which was also introduced in this commit.

Pull Request: https://projects.blender.org/blender/blender/pulls/112223
2023-09-25 08:35:42 +02:00
Campbell Barton
e955c94ed3 License Headers: Set copyright to "Blender Authors", add AUTHORS
Listing the "Blender Foundation" as copyright holder implied the Blender
Foundation holds copyright to files which may include work from many
developers.

While keeping copyright on headers makes sense for isolated libraries,
Blender's own code may be refactored or moved between files in a way
that makes the per file copyright holders less meaningful.

Copyright references to the "Blender Foundation" have been replaced with
"Blender Authors", with the exception of `./extern/` since these this
contains libraries which are more isolated, any changed to license
headers there can be handled on a case-by-case basis.

Some directories in `./intern/` have also been excluded:

- `./intern/cycles/` it's own `AUTHORS` file is planned.
- `./intern/opensubdiv/`.

An "AUTHORS" file has been added, using the chromium projects authors
file as a template.

Design task: #110784

Ref !110783.
2023-08-16 00:20:26 +10:00
Omar Emara
a79680b4d1 Fix: End of non void function error in previous commit 2023-07-19 15:51:36 +03:00
Omar Emara
817a1a5c8f Fix: End of non void function error in last commit 2023-07-19 15:20:01 +03:00
Omar Emara
940558f9ac Realtime Compositor: Implement Classic Kuwahara
This patch implements the Classic Kuwahara node for the Realtime Compositor.

A naive O(radius^2) implementation is used for radii up to 5 pixels, and a
constant O(1) implementation based on summed area tables is used for higher
radii at the cost of building and storing the tables.

This is different from the CPU implementation in that it computes the variance
as the average of the variance of each of the individual channels. This is done
to avoid computing yet another SAT table for luminance. The CPU implementation
will be adapted to match this in a future commit.

The SAT implementation is based on the algorithm described in:

Nehab, Diego, et al. "GPU-efficient recursive filtering and summed-area tables."

Additionally, the Result class now allows full precision texture allocation, which
was necessary for storing the SAT tables.

Pull Request: https://projects.blender.org/blender/blender/pulls/109292
2023-07-19 14:04:18 +02:00
Campbell Barton
85029da289 Cleanup: compiler warnings 2023-06-25 11:39:25 +10:00
Omar Emara
c9e6399fe1 Realtime Compositor: Implement Keying node
This patch implements the Keying node for the realtime compositor. To
ease the implementation, some morphological operators were moved into
algorithms and a mechanism to steal data between results was added to
the Result class.

Pull Request: https://projects.blender.org/blender/blender/pulls/108393
2023-06-24 13:02:33 +02:00
Sergey Sharybin
c1bc70b711 Cleanup: Add a copyright notice to files and use SPDX format
A lot of files were missing copyright field in the header and
the Blender Foundation contributed to them in a sense of bug
fixing and general maintenance.

This change makes it explicit that those files are at least
partially copyrighted by the Blender Foundation.

Note that this does not make it so the Blender Foundation is
the only holder of the copyright in those files, and developers
who do not have a signed contract with the foundation still
hold the copyright as well.

Another aspect of this change is using SPDX format for the
header. We already used it for the license specification,
and now we state it for the copyright as well, following the
FAQ:

    https://reuse.software/faq/
2023-05-31 16:19:06 +02:00
Omar Emara
1c3d67d4cc Realtime Compositor: Split cache into containers
This patch refactors the static cache manager to be split into multiple
smaller Cached Resources Containers. This is a non factional change, and
was done to simplify future implementations of cached resources as they
become more elaborate.
2023-04-25 13:20:00 +02:00
Omar Emara
b939b60c3f Realtime Compositor: Implement Z Combine node
This patch implements the Z Combine node for the realtime compositor.
The patch also extends the SMAA implementation to work with float
textures as a prerequisite to the Z Combine implementation. Moreover, a
mechanism for computing multi-output operations was implemented, in
which unneeded outputs will allocate a dummy 1x1 texture for a correct
shader invocation, then those dummy textures will be cleaned up by
calling a routine right after evaluation.

This is different from the CPU implementation in that the while combine
mask is anti-aliased, including the alpha mask, which is not considered
in the CPU case.

The node can be implemented as a GPU shader operation when the
anti-aliasing option is disabled, which is something we should do when
the evaluator allows nodes be executed as both standard and GPU shader
operations.

Pull Request: https://projects.blender.org/blender/blender/pulls/106637
2023-04-09 09:06:41 +02:00
Omar Emara
3d06905ac4 Realtime Compositor: Use luminance coefficients in SMAA
This patch utilizes luminance coefficients in the SMAA edge detection
phase to match the existing implementation in the CPU compositor.
2023-03-27 09:10:51 +02:00
Omar Emara
2f4a7d67b7 Realtime Compositor: Implement Anti-Aliasing node
This patch implements the Anti-Aliasing node by porting SMAA from
Workbench into a generic library that can be used by the realtime
compositor and potentially other users. SMAA was encapsulated in an
algorithm to prepare it for use by other nodes that require SMAA
support.

Pull Request: https://projects.blender.org/blender/blender/pulls/106114
2023-03-26 16:59:13 +02:00
Clément Foucault
164f591033 Cleanup: GPU: Rename some functions for consistency 2023-02-13 11:22:38 +01:00
Clément Foucault
8f44c37f5c Cleanup: Rename BLI_math_vec_types* files to BLI_math_vector_types
This is for the sake of consistency and clarity.
2023-01-06 20:09:51 +01:00
Omar Emara
fa27a5d066 Realtime Compositor: Implement Ghost Glare node
This patch implements the Ghost Glare node. It is implemented using
direct convolution as opposed to a recursive one, which produces
slightly different results---more accurate ones, however, since the
ghosts are attenuated where it matters, the difference is barely
visible and is acceptable as far as I can tell.

A possible performance improvement is to implement all passes in a
single shader dispatch, where an array of all scales and color
modulators is computed recursively on the host then used in the shader
to add all ghosts, avoiding usage of global memory and unnecessary
copies. This optimization will be implemented separately.

Differential Revision: https://developer.blender.org/D16641

Reviewed By: Clement Foucault
2022-12-09 16:50:52 +02:00
Omar Emara
58e25f11ae Realtime Compositor: Implement normalize node
This patch implements the normalize node for the realtime compositor.

Differential Revision: https://developer.blender.org/D16279

Reviewed By: Clement Foucault
2022-10-20 16:32:28 +02:00
Omar Emara
7f2cd2d969 Realtime Compositor: Implement Tone Map node
This patch implements the tone map node for the realtime compositor
based on the two papers:

Reinhard, Erik, et al. "Photographic tone reproduction for digital
images." Proceedings of the 29th annual conference on Computer graphics
and interactive techniques. 2002.

Reinhard, Erik, and Kate Devlin. "Dynamic range reduction inspired by
photoreceptor physiology." IEEE transactions on visualization and
computer graphics 11.1 (2005): 13-24.

The original implementation should be revisited later due to apparent
incompatibilities with the reference papers, which makes the operation
less useful.

Differential Revision: https://developer.blender.org/D16306

Reviewed By: Clement Foucault
2022-10-20 15:02:41 +02:00
Omar Emara
009acfa477 Cleanup: Add missing include for parallel reduction
The parallel reduction file didn't include its own header, which can
yield "no previous declaration" warnings. This patch includes the header
to fix the warning.
2022-10-11 16:22:14 +02:00
Omar Emara
0037411f55 Realtime Compositor: Implement parallel reduction
This patch implements generic parallel reduction for the realtime
compositor and implements the Levels operation as an example. This patch
also introduces the notion of a "Compositor Algorithm", which is a
reusable operation that can be used to construct other operations.

Differential Revision: https://developer.blender.org/D16184

Reviewed By: Clement Foucault
2022-10-11 13:22:52 +02:00