Use a default case for blur type switch case, that's because adding all
types is not necessary, and I am trying to differentiate between switch
cases that need full type enumeration and those that do not.
This patch formalizes the process of data updates of single values.
Previously, this was done internally in the set_single_value method, but
there was a hack in multi-function procedure code to do the same. This
patch adds a new method for single value data updates and uses it
internally in set_single_value.
This patch refactors how values of unlinked sockets are provided to
nodes. Previously, the GPU node stack values were initialized at
construction time and linked by the node's compile methods. This meant
that we needed to handle implicit conversion if the socket is linked to
an unlinked node group input.
Alternatively, we now insert a GPU setter node for each unlinked socket
that carries the value and type of the origin socket, and let the GPU
code generator do implicit conversion at the shader level. This has
three advantages:
- It makes it easier to add new types since we no longer have to handle
those types in shader node code and it reduces code duplication.
- It makes the code more inline with how we implement multi-function
procedures. So refactoring is easier.
- It opens the door to implement things like implicit inputs, which will
be needed later for things like texture nodes.
Blender crashes when canceling a compositor job if a transform node is
used. This is because freeing shared data didn't reset data members, so
it still thinks it has allocated data, which will be double freed
causing crashes. To fix this, we simply clear data members if data is
still shared.
This replaces the deprecated DrawData mechanism by the
usage of the update timestamp `last_update`.
The compositor keeps the `last_update` value of the cached ID
and compares it with the value on the ID at the time of evaluation.
Rel #134690
Pull Request: https://projects.blender.org/blender/blender/pulls/134878
This patch refactors how single values are passed to MF operations by
utilizing generic pointers. This avoids boilerplate and makes it easier
to add new types.
This patch provides const variants to the cpu_data accessors that
returns GSpan. This is done to better const correctness. Some code need
non-const data even for read-only data, so we have to const cast those
for the moment.
This patch refactors GPU implicit conversion code by generating the name
of the shaders on the fly using fmt. This reduces boilerplate and makes
it easier to add new types.
This patch refactors the result class to replace proxy results with the
possibility of doing data sharing through a shared heap allocated data
reference count. This is more robust and simpler since proxy results no
longer need to be handled as a special case in a lot of the results
code. Additionally, it allows stronger const correctness since inputs to
operations can now be const.
This is somewhat similar to implicit sharing used in other parts of
Blender, so we can look into using that in the future.
Pull Request: https://projects.blender.org/blender/blender/pulls/135778
The general idea is to keep the 'old', C-style MEM_callocN signature, and slowly
replace most of its usages with the new, C++-style type-safer template version.
* `MEM_cnew<T>` allocation version is renamed to `MEM_callocN<T>`.
* `MEM_cnew_array<T>` allocation version is renamed to `MEM_calloc_arrayN<T>`.
* `MEM_cnew<T>` duplicate version is renamed to `MEM_dupallocN<T>`.
Similar templates type-safe version of `MEM_mallocN` will be added soon
as well.
Following discussions in !134452.
NOTE: For now static type checking in `MEM_callocN` and related are slightly
different for Windows MSVC. This compiler seems to consider structs using the
`DNA_DEFINE_CXX_METHODS` macro as non-trivial (likely because their default
copy constructors are deleted). So using checks on trivially
constructible/destructible instead on this compiler/system.
Pull Request: https://projects.blender.org/blender/blender/pulls/134771
This patch removes the use of auto_resource_location during shader
operation material compilation. Instead, we assign slot locations
explicitly. Output images are just assigned incremental indices, while
input samplers are also assigned incremental indices, but starting from
the number of textures in the material, because color bands might be
reserving slots already.
Pull Request: https://projects.blender.org/blender/blender/pulls/135453
This patch refactors how the compositor deals with outputs that are
allocated but not needed. Previously we allowed such allocations and
implicitly release them right after operations are computed. This is a
weak design, so we now require developers to only allocate outputs that
are actually needed and assert otherwise. The release mechanism is
therefore removed.
The Z Combine node asserts if one of its outputs are unused. That's
because we compete both outputs even if they are not needed. To fix
this, we skip outputs that are not needed by splitting the shaders to
compute each output independently.
This patch refactors static cache invalidation of images by tracking an
update count. Images now store a runtime update count that is updated
every time the image is tagged for update. Cached images store a copy of
the update count at the moment they were cached, and are invalidated if
if it changed.
Compared to #134878, this is simpler and more robust, since update IDs
are isolated to images only and not to the DEG update count. Though this
only supports images specifically because they are not covered by the
copy-on-evaluation system, which means #134878 will cause multiple
depsgraph to fight over images.
Pull Request: https://projects.blender.org/blender/blender/pulls/134905
The Keying node asserts if its Edges output is unused. That's because we
compete the outputs even if it not needed. To fix this, we skip that
output if not needed.
Replace `bNodeInstanceHash` with a `Map`. Move it to the node tree
runtime data. Simplify some code by removing the tag from the hash
value and collecting unused previews directly. Then just remove a
bunch of code that's now unused.
Note that texture node previews haven't been working for a while
anyway, and the experimental shader node previews seem to use
a different system (this one is a remnant of Blender Internal).
Pull Request: https://projects.blender.org/blender/blender/pulls/135310
The compositor asserts if an unsupported unavailable socket exists. This
assert should not exist, because the GPU material compiler will itself
gracefully handle such sockets when their type is GPU_NONE, which is
already the case. So remove the assert and add a note about the
behavior.
This patches removes common_math_utils includes from compositor shaders
and replaces them with math lib includes. This involves moving some
functions from that file to to the math lib files.
Pull Request: https://projects.blender.org/blender/blender/pulls/135157
This patch uses the BKE implicit conversion rules to implement implicit
conversion in multi-function procedure operations in the compositor.
Since conversions use ColorSceneLinear4f to represent colors instead of
float4, we need to insert extra implicit conversions around color
sockets.
Special attention was given to variable destruction in the
implementation because it is now possible for implicit variables to be
outputs, so we need to make sure they are not destructed.
Blender crashes when using the Denoise node. That's because the code
assumes normal input would have 4-channels, while this may not be the
case. To fix this, use the channels count from the result or the GPU
texture directly.
Some passes are now interpreted as vectors by the compositor Image node.
This is because it assumes 3-channel passes are always vector, but this
is not the case for passes that are RGB without an alpha channel. To fix
this, we also consider channel IDs to disambiguate the type of the pass.
Previously, the vector type in the compositor had 4-components to
accommodated float4 types, while the last component was ignored for the
rest of the vector types. But now that we have a dedicated type for
float4 in #134486. We can reduce that vector type to 3-components.
Pull Request: https://projects.blender.org/blender/blender/pulls/134570
The compositor previously overloaded the vector type to represent
multiple dimensions that are always stored in a 4D float vector. This
patch introduce a dedicated type for float4, leaving the vector type to
always represent a 3D vector, which will be done in a later commit.
This is not exposed to the user as a separate socket type with a
different color, it is only an internal type that uses the same vector
socket shape and color.
Since the vector socket represents both 4D and 3D vectors, code
generally assumes that such sockets represents 3D vectors, and the
developer is expected to set it to a 4D vector if needed in the node
operation constructor, or use the newly added skip_type_conversion flag
for nodes that do not care about types, like the File Output node.
Though this should be redundant once we add a dimension property for
vector sockets.
Pull Request: https://projects.blender.org/blender/blender/pulls/134486
The new CPU compositor in v4.4 is much slower than the old CPU
compositor in v4.3 on Windows. This is because MSVC does not inline many
of the core methods in the Result class of the compositor. To fix this,
we force inline those methods, adding a new macro for inlining methods
in the process, since the existing macro has the static keyword, which
only works for functions, not methods.
Pull Request: https://projects.blender.org/blender/blender/pulls/134748
The compositor leaks memory in certain setups where a transformed result
is linked to two inputs in the same pixel node. This happens due to an
overestimation in the computation of reference counts of those results.
Since pixel operations might share the same input for multiple links in
the node tree, the reference count should be corrected to take that
sharing into account.
This sharing was previously accounted for as part of releasing inputs in
the pixel operation, but input processors didn't take that into account,
so the realize on domain input processor would leak memory. So to fix
this, we correct the reference count at the evaluator level instead,
such that input processors can safely operate on the correct reference
count.
Pull Request: https://projects.blender.org/blender/blender/pulls/134666
This patch removes the compositor texture pool implementation which
relies on the DRW texture pool, and replaces it with the new texture
pool implementation from the GPU module.
Since the GPU module texture pool does not rely on the global DST, we
can use it for both the viewport compositor engine and the GPU
compositor, so the virtual texture pool implementation is removed and
the GPU texture pool is used directly.
The viewport compositor engine does not need to reset the pool because
that is done by the draw manager. But the GPU compositor needs to reset
the pool every evaluation. The pool is deleted directly after rendering
using the render pipeline or through RE_FreeUnusedGPUResources for the
interactive compositor.
Pull Request: https://projects.blender.org/blender/blender/pulls/134437
The Keying Screen and the Anti-Aliasing nodes produce bad results in the
viewport compositor. This is because their cached resources are
allocated from the GPU texture pool, so they might get overwritten.
To fix this, we make sure those are not allocated from the texture pool.
The symmetric separate blur operation in the compositor is two times
slower in 4.4 compared to 4.3. On Linux, this only happens when Blender
is compiled with GCC, because Clang inlines a small function that GCC
doesn't.
To fix this, we specialize an if statement using templates to help GCC
inline the function. This results in a 3.5 times faster execution.
Pull Request: https://projects.blender.org/blender/blender/pulls/134336
This patch refactors the ShaderNode class to be a concrete class that
is implemented in terms of the node type gpu_fn. This is done to make it
easier to reuse existing nodes in other parts of Blender.
Pull Request: https://projects.blender.org/blender/blender/pulls/134210
This patch refactors the Result class in the compositor to use
GMutableSpan and std::variant to wrap the result's data. This reduces
the complexity of the code and slightly optimizes performance. This will
also make it easier to add new types and interface with other code like
multi-function procedures.
Pull Request: https://projects.blender.org/blender/blender/pulls/134112