This patch implements the multi-function procedure operation for the new
CPU compositor, which is a concrete implementation of the PixelOperation
abstraction, much like ShaderOperation, but uses the FN system to more
efficiently evaluate a group of pixel-wise operations.
A few changes were made to FN to support this development. The multi-function
builder now allows retrieving the built function. A new builder method
construct_and_set_matching_fn_cb was added to allow using the SI_SO
builders with non-static functions, a few more SI_SO builders were added,
and a CPPType for float4 was added.
Additionally, the Gamma, Math, Brightness, and Normal nodes were
implemented as an example. The Math node implementation reused the
existing GN math node implementation, so the code was moved to a common
file.
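For reference, using the SI_SO builders for a pixel-wise function typically looks
like the sketch below; the node name and the math are purely illustrative and not
the actual code added here.
```Cpp
/* Minimal sketch of the SI_SO builder pattern, assuming the usual
 * `blender::fn::multi_function` namespace alias. Purely illustrative. */
#include <cmath>

#include "FN_multi_function_builder.hh"

namespace mf = blender::fn::multi_function;

static const mf::MultiFunction &get_example_gamma_function()
{
  /* Two single inputs (value and gamma), one single output. */
  static auto fn = mf::build::SI2_SO<float, float, float>(
      "Example Gamma",
      [](const float value, const float gamma) { return std::pow(value, gamma); });
  return fn;
}
```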
Reference #125968.
Pull Request: https://projects.blender.org/blender/blender/pulls/126988
This adds a new type of zone to Geometry Nodes that allows executing some nodes
for each element in a geometry.
## Features
* The `Selection` input allows iterating over a subset of elements on the set
domain.
* Fields passed into the input node are available as single values inside of the
zone.
* The input geometry can be split up into separate (completely independent)
geometries for each element (on all domains except face corner).
* New attributes can be created on the input geometry by outputting a single
value from each iteration.
* New geometries can be generated in each iteration.
* All of these geometries are joined to form the final output.
* Attributes from the input geometry are propagated to the output
geometries.
## Evaluation
The evaluation strategy is similar to the one used for repeat zones. Namely, it
dynamically builds a `lazy_function::Graph` once it knows how many iterations
are necessary. It contains a separate node for each iteration. The inputs for
each iteration are hardcoded into the graph. The outputs of each iteration are
passed to a separate lazy-function that reduces all the values down to the final
outputs. This final output can have a huge number of inputs and that is not
ideal for multi-threading yet, but that can still be improved in the future.
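Conceptually, the construction looks roughly like the sketch below; the helper names
(`zone_body_fn`, `reduce_fn`, `set_constant_inputs_for_iteration`) are hypothetical
placeholders and the lazy-function calls only approximate the actual API.
```Cpp
/* Rough conceptual sketch only; names marked as hypothetical do not exist
 * in the actual implementation. */
static void build_zone_graph_sketch(lf::Graph &graph,
                                    const lf::LazyFunction &zone_body_fn,
                                    const lf::LazyFunction &reduce_fn,
                                    const int iterations_num)
{
  /* One node per iteration, with its inputs hardcoded into the graph. */
  Vector<lf::FunctionNode *> iteration_nodes;
  for (const int i : IndexRange(iterations_num)) {
    lf::FunctionNode &node = graph.add_function(zone_body_fn);
    /* Hypothetical helper that fixes e.g. the iteration index and the
     * single-value field inputs for this iteration. */
    set_constant_inputs_for_iteration(graph, node, i);
    iteration_nodes.append(&node);
  }

  /* A separate reduce function receives the outputs of all iterations and
   * computes the final zone outputs (joined geometry, new attributes, ...). */
  lf::FunctionNode &reduce_node = graph.add_function(reduce_fn);
  for (const int i : iteration_nodes.index_range()) {
    graph.add_link(iteration_nodes[i]->output(0), reduce_node.input(i));
  }
}
```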
## Performance
There is a non-negligible amount of overhead for each iteration. The overhead is
way larger than the per-element overhead when just doing field evaluation.
Therefore, normal field evaluation should be preferred when possible. Some of this
overhead could be optimized away when a zone only does number crunching, but that
optimization is not implemented yet.
However, processing many small geometries (e.g. each hair of a character
separately) will likely **always be slower** than working on fewer larger
geometries. The additional flexibility you get by processing each element
separately comes at the cost that Blender can't optimize the operation as well.
For node groups that need to handle lots of geometry elements, we recommend
trying to design the node setup so that iteration over tiny sub-geometries is
not required.
The opposite point is true as well though. It can be faster to process many
medium-sized geometries in parallel than fewer very large geometries, because
of the additional multi-threading opportunities. The exact thresholds between
tiny, medium, and large geometries depend on many factors though.
Overall, this initial version of the new zone does not implement all
optimization opportunities yet, but the points mentioned above will still hold
true later.
Pull Request: https://projects.blender.org/blender/blender/pulls/127331
This continues the cmake modernization effort and introduces support for
allowing our optional dependencies to integrate properly. TBB is added
here as it's proven troublesome to maintain correctly.
Currently the only Blender project which uses the TBB headers directly
is `blenlib`. However, all downstream projects which require blenlib as
their dependency, and wish to properly make use of its threading
facilities, needed to define various TBB items in their own CMake files. Not
only is this unnecessary and arcane, but several projects didn't do it
and ended up both not using threading and producing ODR violations
along the way [1].
This PR makes TBB a modern dependency and exposes it PUBLIC'ly from
`blenlib`. All downstream projects which depend on blenlib will now
receive everything they require from TBB automatically. This includes
the `WITH_TBB` define, the headers, and the library itself.
[1] blender/blender@05241f47f5
Pull Request: https://projects.blender.org/blender/blender/pulls/124916
The depsgraph CoW mechanism is a bit of a misnomer. It creates an
evaluated copy for data-blocks regardless of whether the copy will
actually be written to. The point is to have physical separation between
original and evaluated data. This is in contrast to the commonly used
performance improvement of keeping a user count and copying data
implicitly when it needs to be changed. In Blender code we call this
"implicit sharing" instead. Importantly, the dependency graph has no
idea about the _actual_ CoW behavior in Blender.
Renaming this functionality in the depsgraph removes some of the
confusion that comes up when talking about this, and will hopefully
make the depsgraph less confusing to understand initially too. Wording
like "the evaluated copy" (as opposed to the original data-block) has
also become common anyway.
Pull Request: https://projects.blender.org/blender/blender/pulls/118338
Bundling many tests in a single binary reduces build time and disk space
usage, but is less convenient for running individual tests from the command
line, as filter flags need to be used.
This adds WITH_TESTS_SINGLE_BINARY to generate one executable file per
source file. Note that enabling this option requires a significant amount
of disk space.
Due to refactoring, the resulting ctest names are a bit different than
before. The number of tests is also a bit different depending on whether this
option is used, as one uses gtest discovery and the other is organized
purely by filename, which isn't always 1:1.
Co-authored-by: Sergey Sharybin <sergey@blender.org>
Pull Request: https://projects.blender.org/blender/blender/pulls/114604
Along with the 4.1 libraries upgrade, we are bumping the clang-format
version from 8-12 to 17. This affects quite a few files.
If not already the case, you may consider pointing your IDE to the
clang-format binary bundled with the Blender precompiled libraries.
This refactors `SocketValueVariant` with the following goals in mind:
* Support type erasure so that not all users of `SocketValueVariant` have
to know about all the types sockets can have (see the sketch after this list).
* Move towards supporting "rainbow sockets", which are sockets whose
type is only known at run-time.
* Reduce complexity when dealing with socket values in general. Previously,
one had to use `SocketValueVariantCPPType` a lot to manage uninitialized
memory. This is better abstracted away now.
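As a rough analogy for the type-erasure goal (this is not the actual
`SocketValueVariant` interface), type-erased storage lets calling code store and
retrieve a socket value without naming every possible socket type at compile time:
```Cpp
/* Conceptual analogy only; `SocketValueVariant` has its own interface and
 * also handles fields, not just plain values. */
#include <any>
#include <iostream>
#include <string>

int main()
{
  std::any socket_value = 0.5f; /* Store a float value. */
  std::cout << std::any_cast<float>(socket_value) << "\n";

  /* The same storage can later hold a value of a different type, which is
   * the kind of flexibility run-time-typed ("rainbow") sockets need. */
  socket_value = std::string("rainbow");
  std::cout << std::any_cast<std::string>(socket_value) << "\n";
  return 0;
}
```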
One related change that I didn't see coming at first was that I had to refactor
`set_default_remaining_outputs`, because a default-constructed `SocketValueVariant`
no longer contains any value. Previously, it was initialized to the zero-value of the
template parameter. Similarly, I had to change how implicit conversions are created,
because comparing the `CPPType` of linked sockets is not enough anymore to determine
whether a conversion is necessary.
We could potentially use `SocketValueVariant` for the remaining socket types in the
future as well. Not entirely sure if that helps yet. `SocketValueVariant` can easily be
adapted to make that work though. That would also justify the name
"SocketValueVariant" better.
Pull Request: https://projects.blender.org/blender/blender/pulls/116231
Due to changes in the build environment shader_builder wasn't able to
compile on macOS. This patch reverts several recent changes to CMake files.
* dbb2844ed9
* 94817f64b9
* 1b6cd937ff
The idea is that in the near future shader_builder will run on the buildbot as
part of any regular build, to ensure that changes to the CMake files don't break
shader_builder and go undetected for a few days.
Pull Request: https://projects.blender.org/blender/blender/pulls/115929
NDEBUG is part of the C standard and disables asserts. It is now the only
define used to decide whether asserts are enabled.
DEBUG was a Blender-specific define and has now been removed.
_DEBUG is a Visual Studio define for builds in Debug configuration.
Blender defines this for all platforms. This is still used in a few
places in the draw code, and in external libraries Bullet and Mantaflow.
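For illustration, the standard behavior this relies on:
```Cpp
#include <cassert>

/* The standard <cassert>/<assert.h> header honors NDEBUG: when NDEBUG is
 * defined (typically in release builds), assert() expands to a no-op and the
 * checked expression is not evaluated at all. */
int main()
{
  int value = 2;
  assert(value == 2); /* Active in debug builds, compiled away with NDEBUG. */
  return 0;
}
```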
Pull Request: https://projects.blender.org/blender/blender/pulls/115774
This struct is currently defined in the `functions` module but not actually used there. It's only used by the geometry nodes module, with an indirect dependency from blenkernel via simulation zone baking. This scope is problematic when adding grids as socket data, which should not be part of the functions module.
The `ValueOrField` struct is now moved to blenkernel, so it can be more easily extended to other kinds of data that might be passed around by geometry nodes sockets in the future. No functional changes.
Pull Request: https://projects.blender.org/blender/blender/pulls/115087
Nodes that are scheduled can be executed in any order in theory. So when
there are many scheduled nodes, it can be beneficial to start evaluating
them in parallel.
Note that it is not very common that many nodes are scheduled at the
same time in typical setups because the evaluator uses a depth-first heuristic
to decide in which order to evaluate nodes. It can happen more easily in
generated node trees though.
Also, this change only has an effect in practice if none of the scheduled nodes
uses multi-threading internally, as that would already trigger the use of multiple
threads in the graph executor.
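As a rough illustration of the idea (not the actual executor code, which also has
to deal with node states and re-entrancy), handing a batch of scheduled nodes to
the task system could look like this; `NodeHandle` and `execute_node` are
hypothetical placeholders:
```Cpp
#include "BLI_span.hh"
#include "BLI_task.hh" /* blender::threading::parallel_for */

/* Illustrative sketch only; `NodeHandle` and `execute_node` are hypothetical. */
static void run_scheduled_nodes(const blender::Span<NodeHandle> scheduled_nodes)
{
  blender::threading::parallel_for(
      scheduled_nodes.index_range(), 1, [&](const blender::IndexRange range) {
        for (const int i : range) {
          execute_node(scheduled_nodes[i]);
        }
      });
}
```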
The idea of remapping parameters of lazy-functions is useful
not only for the repeat zone. For example, it could be used for the
for-each zone as well.
Also, moving it to a more general place indicates that there is no
repeat-zone specific stuff in it.
The main goal of this refactor is to simplify how a geometry node group is executed.
Previously, there was duplicated logic that turned the lazy-function graph of a node
group into a single lazy-function. Now this is done only in one place and others can
just execute the lazy-function directly, without having to worry about the underlying graph.
Pull Request: https://projects.blender.org/blender/blender/pulls/112482
Goals of the refactor:
* Simplify adding (named) graph inputs and outputs.
* Add ability to refer to a graph input or output with an index.
* Get rid of the "dummy" terminology which doesn't really help.
Previously, one would add "dummy nodes" which then served as the input
and output nodes of the graph. Now one directly adds inputs and outputs
using `Graph.add_input` and `Graph.add_output`. There is one interface
node that contains all inputs and another one that contains all outputs.
Being able to refer to a graph input or output with an index makes it
more efficient to implement some algorithms. E.g. one could have a
bit span per socket that encodes which graph inputs that socket
depends on.
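A minimal usage sketch of the new interface (signatures simplified;
`some_lazy_function` is a placeholder):
```Cpp
/* Sketch only; exact signatures are simplified. */
static void build_example_graph(lf::Graph &graph, const lf::LazyFunction &some_lazy_function)
{
  /* Graph inputs/outputs are added directly instead of via "dummy" nodes. */
  lf::GraphInputSocket &value_input = graph.add_input(CPPType::get<float>(), "Value");
  lf::GraphOutputSocket &result_output = graph.add_output(CPPType::get<float>(), "Result");

  lf::FunctionNode &node = graph.add_function(some_lazy_function);
  graph.add_link(value_input, node.input(0));
  graph.add_link(node.output(0), result_output);
}
```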
Pull Request: https://projects.blender.org/blender/blender/pulls/112474
This is a lightweight solution for passing some extra context into
a lazy-function that is invoked by the graph executor.
The new functionality is used by #112421.
By storing a raw pointer instead of a `Span`, we save 16 bytes
per node state. I measured a ~5% speedup in my setup with
a simple repeat zone.
5c450aea05 added some additional asserts to check for valid
indices. Generally, index errors in this area quickly lead to wrong
behavior in geometry nodes.
There are many small allocations when the graph executor is
initialized (e.g. all the nodes/sockets have to be allocated). Those
were already combined into a few allocations by making use
of `LinearAllocator`. However, even better performance can be
achieved by making one larger allocation and then using
preprocessed offsets into that buffer.
I measured up to 20% speedup in geometry nodes with a simple
repeat zone.
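The technique itself is generic; a small self-contained sketch (not the actual
executor code) that precomputes offsets and then carves sub-ranges out of a
single buffer:
```Cpp
#include <cstddef>
#include <memory>
#include <new>

/* Generic illustration of the "one allocation + precomputed offsets" idea.
 * The state structs are placeholders for the real node/socket states. */
struct NodeState {
  int status = 0;
};
struct SocketState {
  int usage = 0;
};

struct BufferOffsets {
  size_t node_states = 0;
  size_t socket_states = 0;
  size_t total = 0;
};

static BufferOffsets compute_offsets(const size_t nodes_num, const size_t sockets_num)
{
  BufferOffsets offsets;
  offsets.node_states = 0;
  offsets.socket_states = offsets.node_states + nodes_num * sizeof(NodeState);
  offsets.total = offsets.socket_states + sockets_num * sizeof(SocketState);
  return offsets;
}

int main()
{
  const size_t nodes_num = 100;
  const size_t sockets_num = 400;
  const BufferOffsets offsets = compute_offsets(nodes_num, sockets_num);

  /* A single allocation replaces many small ones. */
  std::unique_ptr<char[]> buffer(new char[offsets.total]);
  NodeState *node_states = reinterpret_cast<NodeState *>(buffer.get() + offsets.node_states);
  SocketState *socket_states = reinterpret_cast<SocketState *>(buffer.get() +
                                                               offsets.socket_states);

  /* Construct the states in place at their precomputed offsets. */
  for (size_t i = 0; i < nodes_num; i++) {
    new (node_states + i) NodeState();
  }
  for (size_t i = 0; i < sockets_num; i++) {
    new (socket_states + i) SocketState();
  }
  return 0;
}
```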
Including <iostream> or similar headers is quite expensive, since it
also pulls in things like <locale> and so on. In many BLI headers,
iostreams are only used to implement some sort of "debug print",
or an operator<< for ostream.
Change some of the commonly used places to instead include <iosfwd>,
which is the standard way of forward-declaring iostreams related
classes, and move the actual debug-print / operator<< implementations
into .cc files.
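The resulting pattern boils down to this (file names illustrative):
```Cpp
/* some_type.hh: the header only needs the forward declarations from <iosfwd>. */
#include <iosfwd>

struct SomeType {
  int value;
};
std::ostream &operator<<(std::ostream &stream, const SomeType &value);

/* some_type.cc: the implementation file pulls in the full <ostream>. */
#include <ostream>

std::ostream &operator<<(std::ostream &stream, const SomeType &value)
{
  return stream << "SomeType(" << value.value << ")";
}
```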
This is not done for templated classes though (it would be possible
to provide explicit operator<< instantiations somewhere in the
source file, but that would lead to hard-to-figure-out linker errors
whenever someone adds a different template type). There, where
possible, I changed the full <iostream> include to only the needed
<ostream> part.
For Span<T>, I just removed print_as_lines since it's not used by
anything. It could be moved into a .cc file using a similar approach
as above if needed.
Doing a full Blender build, the include counts change this way:
- <iostream> 1986 -> 978
- <sstream> 2880 -> 925
It does not affect the total build time much though, mostly because
towards the end of the build there are just several CPU cores finishing
compiling OpenVDB-related source files.
Pull Request: https://projects.blender.org/blender/blender/pulls/111046