This patch implements the domain realization algorithm for the new CPU
compositor. Only nearest interpolation with no wrapping is implemented
at the moment.
A new sampling method was added to the result class and some relevant
methods were moved into inline functions.
This patch adds support for CPU side buffers for the result class. A new
storage type member was added to identify the type of buffer storage,
and allocation will either allocate a GPU texture or a CPU buffer based
on the context's GPU usage.
Temporary results are essentially results with a default reference count
of 1, so we default to 1 for all results and set the initial reference
count differently as needed.
Directly calculate the transformation matrix by multiplying and
inverting the Domain matrices. This removes a double-invert and
decomposition of the matrices so it should be more accurate, and I think
makes the math a lot easier to figure out.
This also moves the "bias" for Nearest to be done in the input space
rather than output. This should make it select the same pixels from the
input even if the image is rotated 180 degrees.
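
As a rough illustration of the direct calculation described above, the
following sketch assumes each domain carries a 3x3 affine matrix that
maps its pixel space into a shared space. The helper names are
hypothetical and are not taken from the patch.

```python
def multiply(a, b):
    """Multiply two 3x3 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def invert_affine(m):
    """Invert a 2D affine matrix [[a, b, tx], [c, d, ty], [0, 0, 1]]."""
    (a, b, tx), (c, d, ty) = m[0], m[1]
    det = a * d - b * c
    ia, ib, ic, id_ = d / det, -b / det, -c / det, a / det
    return [[ia, ib, -(ia * tx + ib * ty)],
            [ic, id_, -(ic * tx + id_ * ty)],
            [0.0, 0.0, 1.0]]


def output_to_input_matrix(input_domain, output_domain):
    # One invert and one multiply, no decomposition:
    # output pixel -> shared space -> input pixel.
    return multiply(invert_affine(input_domain), output_domain)


def transform_point(m, x, y):
    """Apply an affine matrix to a 2D point."""
    return (m[0][0] * x + m[0][1] * y + m[0][2],
            m[1][0] * x + m[1][1] * y + m[1][2])
```

For example, with an input domain translated by (2, 3) and an identity
output domain, output pixel (5, 5) reads from input pixel (3, 2).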
Co-authored-by: Bill Spitzak <bills@sidefx.com>
Pull Request: https://projects.blender.org/blender/blender/pulls/125543
The compositor translate node produces artifacts when its fractional
part is 0.5. That's because GPUs do round-to-even for nearest neighbour
sampling in case samples were at pixel boundaries.
To fix this, we bias translations by a small value to break the
rounding and ensure predictable rounding direction.
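
The effect can be reproduced on the CPU with a small sketch. Python's
built-in round() also rounds halves to even, so it is used here as a
stand-in for the GPU behavior described above. The bias value of 1e-4 is
an assumption chosen for illustration, not the value used by the patch.

```python
def nearest_texel(coordinate, bias=0.0):
    # round() uses round-half-to-even, like the sampling described above.
    return round(coordinate + bias)


# Sample positions that fall exactly on pixel boundaries, as happens
# with a translation whose fractional part is 0.5.
coordinates = [0.5, 1.5, 2.5, 3.5]

# Without a bias, texel 2 is selected twice and texels 1 and 3 are skipped.
unbiased = [nearest_texel(c) for c in coordinates]  # [0, 2, 2, 4]

# With a small bias, the rounding direction is the same for every sample.
biased = [nearest_texel(c, bias=1e-4) for c in coordinates]  # [1, 2, 3, 4]
```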
This patch implements the Fast Gaussian blur mode for the Realtime
Compositor. This is a faster but less accurate implementation of
Gaussian blur.
This is implemented as a recursive Gaussian blur algorithm based on the
general method outlined in the following paper:
Hale, Dave. "Recursive gaussian filters." CWP-546 (2006).
In particular, based on the table in Section 5 Conclusion, for very low
radius blur, we use a direct separable Gaussian convolution. For medium
blur radius, we use the fourth order IIR Deriche filter based on the
following paper:
Deriche, Rachid. Recursively implementating the Gaussian and its
derivatives. Diss. INRIA, 1993.
For high radius blur, we use the fourth order IIR Van Vliet filter based
on the following paper:
Van Vliet, Lucas J., Ian T. Young, and Piet W. Verbeek. "Recursive
Gaussian derivative filters." Proceedings. Fourteenth International
Conference on Pattern Recognition (Cat. No. 98EX170). Vol. 1. IEEE,
1998.
That's because direct convolution is faster and more accurate for very
low radius, the Deriche filter is more accurate for medium blur radius,
and Van Vliet is more accurate for high blur radius. The criteria
suggested by the paper are sigma value thresholds of 3 and 32 for the
Deriche and Van Vliet filters respectively, which we apply on the larger
of the two dimensions.
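
The selection logic can be sketched as follows. The function and method
names are hypothetical, and whether each threshold is inclusive is an
assumption, since the text above only gives the values 3 and 32.

```python
DERICHE_SIGMA_THRESHOLD = 3.0
VAN_VLIET_SIGMA_THRESHOLD = 32.0


def select_blur_method(sigma_x, sigma_y):
    """Pick a blur method from the larger of the two sigma values."""
    sigma = max(sigma_x, sigma_y)
    if sigma < DERICHE_SIGMA_THRESHOLD:
        return "direct_convolution"
    if sigma < VAN_VLIET_SIGMA_THRESHOLD:
        return "deriche"
    return "van_vliet"
```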
Both the Deriche and Van Vliet filters are numerically unstable for high
blur radius. So we decompose the Van Vliet filter into a parallel bank
of smaller second order filters based on the method of partial fractions
discussed in the book:
Oppenheim, Alan V. Discrete-time signal processing. Pearson Education
India, 1999.
We leave the Deriche filter as is since it is only used for low radii
anyway.
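
To show the idea behind the partial fractions decomposition, here is a
deliberately simplified sketch. It splits a cascade of two first order
recursive filters into a parallel bank of two first order filters,
whereas the patch works with second order sections of a fourth order
filter. The poles and helper names are illustrative assumptions only.

```python
def first_order(signal, pole, gain=1.0):
    """Causal recursive filter y[n] = gain * x[n] + pole * y[n - 1]."""
    output, previous = [], 0.0
    for sample in signal:
        previous = gain * sample + pole * previous
        output.append(previous)
    return output


def cascade(signal, pole_a, pole_b):
    """Serial form: H(z) = 1 / ((1 - a z^-1)(1 - b z^-1))."""
    return first_order(first_order(signal, pole_a), pole_b)


def parallel(signal, pole_a, pole_b):
    """Parallel form from partial fractions (requires a != b):
    H(z) = A / (1 - a z^-1) + B / (1 - b z^-1)."""
    gain_a = pole_a / (pole_a - pole_b)
    gain_b = pole_b / (pole_b - pole_a)
    branch_a = first_order(signal, pole_a, gain_a)
    branch_b = first_order(signal, pole_b, gain_b)
    return [x + y for x, y in zip(branch_a, branch_b)]
```

Both forms give the same impulse response up to rounding. In the
parallel form each branch runs independently on the input instead of
feeding one recursion into the next.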
Compared to the CPU implementation, this implementation is more
accurate, but less numerically stable, since CPU uses doubles, which is
not feasible for the GPU.
The only change of behavior between CPU and this implementation is that
this implementation uses the same radius, so Fast Gaussian will match
normal Gaussian, while the CPU implementation has a radius that is 1.5x
the size of normal Gaussian. See #121211 for a patch that changes the
CPU behavior.
Pull Request: https://projects.blender.org/blender/blender/pulls/120431
This patch adjusts the Variable Size Bokeh Blur node such that it
matches between CPU and GPU. The GPU implementation is mostly followed
for the reasons stated below.
The first difference is a bug in the CPU implementation, where the upper
limit of the blur window is not considered, but the lower limit is.
The second difference is due to CPU ignoring outside pixels instead of
clamping them to the border. Clamping is used until an option is added
to the node to control the boundary condition.
The third difference is due to CPU ignoring the bounding box input.
The fourth difference is that CPU doesn't allow zero maximum blur
radius, which is a valid option.
The fifth difference is that the threshold option, which is only used
for the Defocus node, was considered in a greater than manner, while it
should be greater than or equal, since the default threshold of one
should allow a blur size of one.
The GPU implementation now considers the maximum size of its input,
following the CPU implementation.
Pull Request: https://projects.blender.org/blender/blender/pulls/117947
This patch implements the defocus node for the Realtime Compositor. The
implementation does not match the CPU compositor implementation, but
uses a new formulation that is more physically accurate and consistent
with Blender's render engines.
The existing CPU implementation is questionable starting from its circle
of confusion calculation, to the morphological operations applied on the
CoC radius, to ignoring the maximum CoC radius in the search kernel, and
ending with the threshold parameter used to reduce artifacts. Therefore,
it should be reimplemented, along with this implementation, using a more
consistent methodology.
EEVEE and Workbench already have a GPU defocus method, which can be
ported to the compositor and used as the preview defocus algorithm,
while this implementation will be updated to be a more accurate method
that produces the same structure as the ported EEVEE implementation.
The new formulation ignores the threshold parameter for now, as well as
the preview parameter.
Pull Request: https://projects.blender.org/blender/blender/pulls/116391
This patch rewrites the Inpaint node in the Realtime Compositor. The old
method suffered from discontinuities and singularities in the inpainting
regions. Furthermore, it ignored semi-transparent areas.
The new method is inspired by a two pass method described by the paper:
Rosner, Jakub, et al. "Fast GPU-based image warping and inpainting for
frame interpolation." International Conferences on Computer Graphics,
Vision and Mathematics. 2010.
In particular, we first fill the inpainting region using jump flooding,
then we apply a variable size blur pass whose size is proportional to
the distance to the inpainting boundary. The smoothed region is then
mixed with the input using its alpha.
The new method is much closer to the Bertalmio-style diffusion-based
inpainting methods, and thus can more accurately close holes than
existing methods.
The aforementioned method requires variable size blur, which is quite
expensive for this use case, so a new implementation was added that
approximates the method using a separable implementation, which provides
a visually pleasing result assuming a sufficiently smooth radius field,
which is true for our case since the field is an SDF.
Fixes: #114422
Pull Request: https://projects.blender.org/blender/blender/pulls/114849
This patch adds support for full precision compositing for the Realtime
Compositor. A new precision option was added to the compositor to change
between half and full precision compositing, where the Auto option uses
half for the viewport compositor and the interactive render compositor,
while full is used for final renders.
The compositor context now needs to implement the get_precision() method
to indicate its preferred precision. Intermediate results will be stored
using the context's precision, with a number of exceptions that can use
a different precision regardless of the context's precision. For
instance, summed area tables are always stored in full float results
even if the context specified half float. Conversely, jump flooding
tables are always stored in half integer results even if the context
specified full. The former requires full float while the latter has no
use for it.
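
The rules above can be summarized in a small sketch. The enum, function
names, and the string tags for result kinds are hypothetical and only
mirror the behavior described in this message.

```python
from enum import Enum


class Precision(Enum):
    HALF = "half"
    FULL = "full"


def resolve_auto_precision(is_final_render):
    """Auto uses full precision for final renders and half precision for
    the viewport and interactive render compositors."""
    return Precision.FULL if is_final_render else Precision.HALF


def result_precision(context_precision, kind="generic"):
    """Precision of an intermediate result, including the exceptions."""
    if kind == "summed_area_table":
        return Precision.FULL  # Always full, regardless of the context.
    if kind == "jump_flooding_table":
        return Precision.HALF  # Always half, regardless of the context.
    return context_precision
```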
Since shaders are created for a specific precision, we need two variants
of each compositor shader to account for the context's possible
precision. However, to avoid doubling the shader info count and reduce
boilerplate code and development time, an automated mechanism was
employed. A single shader info of whatever precision needs to be added,
then, at runtime, the shader info can be adjusted to change the
precision of the outputs. That shader variant is then cached in the
static cache manager for future processing-free shader retrieval.
Therefore, the shader manager was removed in favor of a cached shader
container in the static cache manager.
A number of utilities were added to make the creation of results as well as
the retrieval of shaders with the target precision easier. Further, a
number of precision-specific shaders were removed in favor of more
generic ones that utilize the aforementioned shader retrieval
mechanism.
Pull Request: https://projects.blender.org/blender/blender/pulls/113476
This patch changes how wrapped translations are handled by the Realtime
Compositor. Previously, translations were always stored on the result
and delayed until automatically realized later. The wrapping status was
also stored to control this later automatic realization.
This patch changes that such that translations are immediately realized
for the axes that have wrapping enabled. Consequently, the image itself
will not get translated, but its content will, clipping on one side and
wrapping around to the opposite side.
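
A minimal sketch of this behavior, assuming integer translations and an
image stored as a list of rows. The function name and the returned
"remaining translation" are illustrative assumptions. The real
implementation operates on results, not Python lists.

```python
def realize_wrapped_translation(pixels, translation, wrap_x, wrap_y):
    """Shift the content along the wrapped axes and return the shifted
    image with the translation that is still delayed on the other axes."""
    height, width = len(pixels), len(pixels[0])
    dx, dy = translation
    shift_x = dx % width if wrap_x else 0
    shift_y = dy % height if wrap_y else 0
    wrapped = [
        [pixels[(y - shift_y) % height][(x - shift_x) % width]
         for x in range(width)]
        for y in range(height)
    ]
    remaining = (0 if wrap_x else dx, 0 if wrap_y else dy)
    return wrapped, remaining
```

For a single row [1, 2, 3, 4] translated by (1, 5) with wrapping enabled
on X only, the content becomes [4, 1, 2, 3] and the Y translation of 5
remains delayed.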
Another change is that wrapping information is no longer propagated to
future automatic realizations, so tiling or repeating an image is no
longer possible. An alternative method of repetition will be introduced
in a later patch.
Pull Request: https://projects.blender.org/blender/blender/pulls/113669
Translations in the realtime compositor are doubled where there is a
rotation or scale component.
That was due to applying translations even after domain realization. So
this patch restricts the translation to the case where domain
realization doesn't take place.
This patch immediately realizes the scale and rotation components of
transformations at the point of transform nodes. The translate component is
still delayed and only realized when really needed to avoid clipping.
Transformed results are always realized in an expanded domain that avoids
clipping due to rotation or scaling. Until we have support for huge textures,
the size of the transformed domain is clipped to the GPU texture size limit,
which is typically 16k.
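
The expanded domain can be sketched as the axis-aligned bounding box of
the scaled and rotated image. The order of operations (scale, then
rotate around the center), the rounding tolerance, and the constant name
are assumptions made for illustration.

```python
import math

# Assumption: stands in for the "typically 16k" limit mentioned above.
GPU_TEXTURE_SIZE_LIMIT = 16384


def transformed_domain_size(width, height, angle, scale_x, scale_y):
    """Size of the bounding box of a scaled, then rotated, rectangle,
    clipped to the texture size limit. The angle is in radians."""
    scaled_width = width * abs(scale_x)
    scaled_height = height * abs(scale_y)
    cosine, sine = abs(math.cos(angle)), abs(math.sin(angle))
    bounds_width = scaled_width * cosine + scaled_height * sine
    bounds_height = scaled_width * sine + scaled_height * cosine
    # A small tolerance avoids growing by a pixel due to float rounding.
    tolerance = 1e-6
    return (
        min(max(1, math.ceil(bounds_width - tolerance)), GPU_TEXTURE_SIZE_LIMIT),
        min(max(1, math.ceil(bounds_height - tolerance)), GPU_TEXTURE_SIZE_LIMIT),
    )
```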
A potential optimization is to join all consecutive transform and realize
operations into a single realize operation.
Fixes #112332.
Pull Request: https://projects.blender.org/blender/blender/pulls/112332
This patch changes the image type used in the Jump Flooding Algorithm to
be Int2 instead of Float4. That's because we used to store the distance
along with the texel location, which we no longer do, so we are left
with the 2D texel location only which can be stored in an Int2 image.
We no longer store the distance because it is not necessarily needed, it
introduces a sqrt in each of the JFA passes, and it is less precise due
to storage in 16F images. Developers should compute the distance in the
user shader instead.
This is a non-functional change, but results in less memory usage,
higher performance, and higher precision.
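
For reference, a plain CPU sketch of jump flooding that stores only the
2D location of the closest seed per pixel, with the distance computed by
the caller afterwards. This is a simplified illustration with
hypothetical names, not the shader code. Jump flooding is approximate in
general, so the stored seed is not guaranteed to be the closest one in
every configuration.

```python
import math


def jump_flood(width, height, seeds):
    """Return a grid where each pixel holds the (x, y) location of the
    closest seed found, or None if there are no seeds."""
    closest = [[None] * width for _ in range(height)]
    for x, y in seeds:
        closest[y][x] = (x, y)

    def squared_distance(x, y, texel):
        return (x - texel[0]) ** 2 + (y - texel[1]) ** 2

    # Start from the largest power of two below the larger dimension.
    step = 1
    while step * 2 < max(width, height):
        step *= 2

    while step >= 1:
        updated = [row[:] for row in closest]
        for y in range(height):
            for x in range(width):
                best = closest[y][x]
                for dy in (-step, 0, step):
                    for dx in (-step, 0, step):
                        nx, ny = x + dx, y + dy
                        if not (0 <= nx < width and 0 <= ny < height):
                            continue
                        candidate = closest[ny][nx]
                        if candidate is None:
                            continue
                        if best is None or (squared_distance(x, y, candidate)
                                            < squared_distance(x, y, best)):
                            best = candidate
                updated[y][x] = best
        closest = updated
        step //= 2
    return closest


def distance_to_closest_seed(closest, x, y):
    """The distance is computed by the user, not stored in the table."""
    seed = closest[y][x]
    return math.hypot(x - seed[0], y - seed[1])
```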
Pull Request: https://projects.blender.org/blender/blender/pulls/112941
Previously, the Result class was reserved for inputs and outputs of
operations, so its allowed types were naturally those exposed to the
user. However, we now use the Result class internally for intermediate
results, so it now makes sense to expand the allowed types.
The types are now divided into two categories, those that are user
facing and need to be handled in implicit operations and those that
are internal and can be exempt from such handling. Internal types are
reserved for texture results, as the single value mechanism is only
useful for user facing results.
The patch merely adjusts the switch cases across the code base, adding
one new internal type as an example.
Pull Request: https://projects.blender.org/blender/blender/pulls/112414
This patch implements the Double Edge Mask node for the Realtime
Compositor. The implementation is primarily based on the 1+JFA Jump
Flooding algorithm, which was also introduced in this commit.
Pull Request: https://projects.blender.org/blender/blender/pulls/112223
Listing the "Blender Foundation" as copyright holder implied the Blender
Foundation holds copyright to files which may include work from many
developers.
While keeping copyright on headers makes sense for isolated libraries,
Blender's own code may be refactored or moved between files in a way
that makes the per file copyright holders less meaningful.
Copyright references to the "Blender Foundation" have been replaced with
"Blender Authors", with the exception of `./extern/`, since it contains
libraries which are more isolated. Any changes to license headers there
can be handled on a case-by-case basis.
Some directories in `./intern/` have also been excluded:
- `./intern/cycles/`, its own `AUTHORS` file is planned.
- `./intern/opensubdiv/`.
An "AUTHORS" file has been added, using the Chromium project's authors
file as a template.
Design task: #110784
Ref !110783.
This patch implements the Classic Kuwahara node for the Realtime Compositor.
A naive O(radius^2) implementation is used for radii up to 5 pixels, and a
constant O(1) implementation based on summed area tables is used for higher
radii at the cost of building and storing the tables.
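
The constant time lookup can be sketched as follows. The helper names
are hypothetical, the tables are built serially here for clarity, and a
second table of squared values is assumed in order to get the variance.

```python
def build_sat(values):
    """Summed area table with one row and column of zero padding."""
    height, width = len(values), len(values[0])
    sat = [[0] * (width + 1) for _ in range(height + 1)]
    for y in range(height):
        for x in range(width):
            sat[y + 1][x + 1] = (values[y][x] + sat[y][x + 1]
                                 + sat[y + 1][x] - sat[y][x])
    return sat


def box_sum(sat, x0, y0, x1, y1):
    """Sum over the inclusive box [x0, x1] x [y0, y1] with four lookups,
    independent of the box size."""
    return (sat[y1 + 1][x1 + 1] - sat[y0][x1 + 1]
            - sat[y1 + 1][x0] + sat[y0][x0])


def box_mean_and_variance(sat, squared_sat, x0, y0, x1, y1):
    """Mean and variance of a box from the value and squared-value tables."""
    count = (x1 - x0 + 1) * (y1 - y0 + 1)
    mean = box_sum(sat, x0, y0, x1, y1) / count
    variance = box_sum(squared_sat, x0, y0, x1, y1) / count - mean * mean
    return mean, variance
```

A Kuwahara style filter would evaluate such a box for each quadrant
around a pixel and keep the mean of the quadrant with the lowest
variance.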
This is different from the CPU implementation in that it computes the variance
as the average of the variance of each of the individual channels. This is done
to avoid computing yet another SAT table for luminance. The CPU implementation
will be adapted to match this in a future commit.
The SAT implementation is based on the algorithm described in:
Nehab, Diego, et al. "GPU-efficient recursive filtering and summed-area tables."
Additionally, the Result class now allows full precision texture allocation, which
was necessary for storing the SAT tables.
Pull Request: https://projects.blender.org/blender/blender/pulls/109292
This patch implements the Keying node for the realtime compositor. To
ease the implementation, some morphological operators were moved into
algorithms and a mechanism to steal data between results was added to
the Result class.
Pull Request: https://projects.blender.org/blender/blender/pulls/108393
A lot of files were missing a copyright field in the header, and
the Blender Foundation contributed to them in the sense of bug
fixing and general maintenance.
This change makes it explicit that those files are at least
partially copyrighted by the Blender Foundation.
Note that this does not make it so the Blender Foundation is
the only holder of the copyright in those files, and developers
who do not have a signed contract with the foundation still
hold the copyright as well.
Another aspect of this change is using SPDX format for the
header. We already used it for the license specification,
and now we state it for the copyright as well, following the
FAQ:
https://reuse.software/faq/
This patch refactors the static cache manager to be split into multiple
smaller Cached Resources Containers. This is a non-functional change, and
was done to simplify future implementations of cached resources as they
become more elaborate.
This patch implements the Z Combine node for the realtime compositor.
The patch also extends the SMAA implementation to work with float
textures as a prerequisite to the Z Combine implementation. Moreover, a
mechanism for computing multi-output operations was implemented, in
which unneeded outputs will allocate a dummy 1x1 texture for a correct
shader invocation, then those dummy textures will be cleaned up by
calling a routine right after evaluation.
This is different from the CPU implementation in that the whole combine
mask is anti-aliased, including the alpha mask, which is not considered
in the CPU case.
The node can be implemented as a GPU shader operation when the
anti-aliasing option is disabled, which is something we should do when
the evaluator allows nodes to be executed as both standard and GPU shader
operations.
Pull Request: https://projects.blender.org/blender/blender/pulls/106637
This patch implements the Anti-Aliasing node by porting SMAA from
Workbench into a generic library that can be used by the realtime
compositor and potentially other users. SMAA was encapsulated in an
algorithm to prepare it for use by other nodes that require SMAA
support.
Pull Request: https://projects.blender.org/blender/blender/pulls/106114
This patch implements the Ghost Glare node. It is implemented using
direct convolution as opposed to a recursive one, which produces
slightly different, though more accurate, results. However, since the
ghosts are attenuated where it matters, the difference is barely
visible and is acceptable as far as I can tell.
A possible performance improvement is to implement all passes in a
single shader dispatch, where an array of all scales and color
modulators is computed recursively on the host then used in the shader
to add all ghosts, avoiding usage of global memory and unnecessary
copies. This optimization will be implemented separately.
Differential Revision: https://developer.blender.org/D16641
Reviewed By: Clement Foucault
This patch implements the normalize node for the realtime compositor.
Differential Revision: https://developer.blender.org/D16279
Reviewed By: Clement Foucault
This patch implements the tone map node for the realtime compositor
based on the two papers:
Reinhard, Erik, et al. "Photographic tone reproduction for digital
images." Proceedings of the 29th annual conference on Computer graphics
and interactive techniques. 2002.
Reinhard, Erik, and Kate Devlin. "Dynamic range reduction inspired by
photoreceptor physiology." IEEE transactions on visualization and
computer graphics 11.1 (2005): 13-24.
The original implementation should be revisited later due to apparent
incompatibilities with the reference papers, which make the operation
less useful.
Differential Revision: https://developer.blender.org/D16306
Reviewed By: Clement Foucault
The parallel reduction file didn't include its own header, which can
yield "no previous declaration" warnings. This patch includes the header
to fix the warning.
This patch implements generic parallel reduction for the realtime
compositor and implements the Levels operation as an example. This patch
also introduces the notion of a "Compositor Algorithm", which is a
reusable operation that can be used to construct other operations.
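
The general shape of such a reduction can be sketched on the CPU as
repeated pairwise passes, each halving the number of values. On the GPU
every pair in a pass would be handled by a separate invocation. The
function name and signature are hypothetical.

```python
import operator


def parallel_reduce(values, reduce_operator, identity):
    """Reduce values in pairwise passes. The operator must be associative
    and identity must be its neutral element."""
    current = list(values)
    while len(current) > 1:
        if len(current) % 2 == 1:
            current.append(identity)  # Pad so every value has a partner.
        current = [reduce_operator(current[i], current[i + 1])
                   for i in range(0, len(current), 2)]
    return current[0] if current else identity


# Statistics such as a sum or a maximum can be expressed this way.
total = parallel_reduce([1, 2, 3, 4, 5], operator.add, 0)
maximum = parallel_reduce([3, 9, 2], max, float("-inf"))
```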
Differential Revision: https://developer.blender.org/D16184
Reviewed By: Clement Foucault