38 Commits

Author SHA1 Message Date
Jacques Lucke
b7a1325c3c BLI: use blender::Mutex by default which wraps tbb::mutex
This patch adds a new `BLI_mutex.hh` header which adds `blender::Mutex` as alias
for either `tbb::mutex` or `std::mutex` depending on whether TBB is enabled.

Description copied from the patch:
```
/**
 * blender::Mutex should be used as the default mutex in Blender. It implements a subset of the API
 * of std::mutex but has overall better guaranteed properties. It can be used with RAII helpers
 * like std::lock_guard. However, it is not compatible with e.g. std::condition_variable. So one
 * still has to use std::mutex for that case.
 *
 * The mutex provided by TBB has these properties:
 * - It's as fast as a spin-lock in the non-contended case, i.e. when no other thread is trying to
 *   lock the mutex at the same time.
 * - In the contended case, it spins a couple of times but then blocks to avoid draining system
 *   resources by spinning for a long time.
 * - It's only 1 byte large, compared to e.g. 40 bytes when using the std::mutex of GCC. This makes
 *   it more feasible to have many smaller mutexes which can improve scalability of algorithms
 *   compared to using fewer larger mutexes. Also it just reduces "memory slop" across Blender.
 * - It is *not* a fair mutex, i.e. it's not guaranteed that a thread will ever be able to lock the
 *   mutex when there is always more than one thread trying to lock it. In the majority of
 *   cases, using a fair mutex just causes extra overhead without any benefit. std::mutex is not
 *   guaranteed to be fair either.
 */
 ```
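
The aliasing described above can be sketched roughly as follows. This is a minimal illustration, not the contents of `BLI_mutex.hh`: the `WITH_TBB` macro name, the `<tbb/mutex.h>` include path, and the `Counter` and `count_to` helpers are assumptions made for the example.

```cpp
#include <cassert>
#include <mutex>

#ifdef WITH_TBB /* Assumed name of the build flag. */
#  include <tbb/mutex.h>
#endif

/* Pick the mutex type once, so that calling code does not care which one is used. */
#ifdef WITH_TBB
using Mutex = tbb::mutex;
#else
using Mutex = std::mutex;
#endif

/* Hypothetical example type that protects a value with the aliased mutex. */
struct Counter {
  Mutex mutex;
  int value = 0;

  void increment()
  {
    /* RAII helpers like std::lock_guard work with either mutex type. */
    std::lock_guard<Mutex> lock(mutex);
    value++;
  }
};

/* Increments a fresh counter `n` times and returns the final value. */
int count_to(const int n)
{
  assert(n >= 0);
  Counter counter;
  for (int i = 0; i < n; i++) {
    counter.increment();
  }
  return counter.value;
}
```

Code that needs `std::condition_variable` would still declare a `std::mutex` directly, as the comment above notes.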

The performance benchmark suggests that the impact is negligible in almost
all cases. The only benchmarks that show interesting behavior are the ones
testing foreach zones in Geometry Nodes. These tests are explicitly testing
overhead, which I still have to reduce over time. So it's not unexpected that
changing the mutex has an impact there. What's interesting is that on macOS the
performance improves a lot while on Linux it gets worse. Since that overhead
should eventually be removed almost entirely, I don't really consider that
blocking.

Links:
* Documentation of different mutex flavors in TBB:
  https://www.intel.com/content/www/us/en/docs/onetbb/developer-guide-api-reference/2021-12/mutex-flavors.html
* Older implementation of a similar mutex by me:
  https://archive.blender.org/developer/differential/0016/0016711/index.html
* Interesting read regarding how a mutex can be this small:
  https://webkit.org/blog/6161/locking-in-webkit/

Pull Request: https://projects.blender.org/blender/blender/pulls/138370
2025-05-07 04:53:16 +02:00
Jacques Lucke
b6342a7e94 Cleanup: simplify allocating buffers for a CPPType
This reduces verbosity when using `LinearAllocator` or `ResourceScope` to
allocate values for a `CPPType`. Now, this is simplified and one also does not
have to manually add a destructor call anymore.

Pull Request: https://projects.blender.org/blender/blender/pulls/137685
2025-04-17 22:01:07 +02:00
Jacques Lucke
7f1a99e862 Refactor: BLI: Make some CPPType properties public instead of using methods
This makes accessing these properties more convenient. Since we only ever have
const references to `CPPType`, there isn't really a benefit to using methods to
avoid mutation.

Pull Request: https://projects.blender.org/blender/blender/pulls/137482
2025-04-14 17:48:17 +02:00
Jacques Lucke
8ec9c62d3e Geometry Nodes: add Closures and Bundles behind experimental feature flag
This implements bundles and closures which are described in more detail in this
blog post: https://code.blender.org/2024/11/geometry-nodes-workshop-october-2024/

tl;dr:
* Bundles are containers that allow storing multiple socket values in a single
  value. Each value in the bundle is identified by a name. Bundles can be
  nested.
* Closures are functions that are created with the Closure Zone and can be
  evaluated with the Evaluate Closure node.
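
To make the two concepts more concrete, here is a small sketch in plain C++. It is only an analogy under stated assumptions: `SocketValue`, `Bundle`, `Closure` and the helper functions are hypothetical names for this example and do not reflect how the patch implements them.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <memory>
#include <string>
#include <variant>

struct Bundle;

/* A socket value in this sketch: a float, a string, or a nested bundle. */
using SocketValue = std::variant<float, std::string, std::shared_ptr<Bundle>>;

/* A bundle stores multiple values, each identified by a name. */
struct Bundle {
  std::map<std::string, SocketValue> items;
};

/* A closure is a function from input values to output values. */
using Closure = std::function<Bundle(const Bundle &)>;

/* Bundles can be nested: "Settings" is itself a bundle. */
Bundle make_nested_example()
{
  Bundle inner;
  inner.items["Radius"] = 0.5f;
  Bundle outer;
  outer.items["Name"] = std::string("Sphere");
  outer.items["Settings"] = std::make_shared<Bundle>(inner);
  return outer;
}

/* Creates a closure that captures `factor`, similar to a value captured by a zone. */
Closure make_scale_closure(const float factor)
{
  return [factor](const Bundle &inputs) {
    assert(inputs.items.count("Value") == 1);
    const float value = std::get<float>(inputs.items.at("Value"));
    Bundle outputs;
    outputs.items["Result"] = value * factor;
    return outputs;
  };
}

/* Plays the role of an "Evaluate Closure" step. */
float evaluate_scale(const float factor, const float value)
{
  Bundle inputs;
  inputs.items["Value"] = value;
  const Bundle outputs = make_scale_closure(factor)(inputs);
  return std::get<float>(outputs.items.at("Result"));
}
```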

To use the patch, the `Bundle and Closure Nodes` experimental feature has to be
enabled. This is necessary, because these features are not fully done yet and
still need iterations to improve the workflow before they can be officially
released. These iterations are easier to do in `main` than in a separate branch
though. That's because this patch is quite large and somewhat prone to merge
conflicts. Also, other work we want to do depends on this.

This adds the following new nodes:
* Combine Bundle: can pack multiple values into one.
* Separate Bundle: extracts values from a bundle.
* Closure Zone: outputs a closure zone for use in the `Evaluate Closure` node.
* Evaluate Closure: evaluates the passed in closure.

Things that will be added soon after this lands:
* Fields in bundles and closures. The way this is done changes with #134811, so
  I'd rather implement this once both are in `main`.
* UI features for keeping sockets in sync (right now there are warnings only).

One bigger issue is the limited support for laziness. For example, all inputs of
a Combine Bundle node will be evaluated, even if they are not all needed. The
same is true for all captured values of a closure. This is a deeper limitation
that needs to be resolved at some point. This will likely be done after an
initial version of this patch is done.

Pull Request: https://projects.blender.org/blender/blender/pulls/128340
2025-04-03 15:44:06 +02:00
Hans Goudey
b2bbb0a8fb Cleanup: Remove unused includes in functions module
Pull Request: https://projects.blender.org/blender/blender/pulls/133455
2025-01-22 23:29:33 +01:00
Jacques Lucke
5ee60600a3 Geometry Nodes: improve debug graph for repeat zone
This makes some labels more self-explanatory in the graph generated with
`graph.to_dot()`.
2024-09-12 18:20:52 +02:00
Hans Goudey
81a63153d0 Depsgraph: Rename "copy-on-write" to "copy-on-evaluation"
The depsgraph CoW mechanism is a bit of a misnomer. It creates an
evaluated copy for data-blocks regardless of whether the copy will
actually be written to. The point is to have physical separation between
original and evaluated data. This is in contrast to the commonly used
performance improvement of keeping a user count and copying data
implicitly when it needs to be changed. In Blender code we call this
"implicit sharing" instead. Importantly, the dependency graph has no
idea about the _actual_ CoW behavior in Blender.
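
The difference between the two mechanisms can be sketched as follows. This is a simplified illustration: `std::shared_ptr` is used as a stand-in for a user count, and `SharedData`, `ensure_mutable` and `make_evaluated_copy` are hypothetical names, not Blender API.

```cpp
#include <cassert>
#include <memory>
#include <vector>

using SharedData = std::shared_ptr<std::vector<int>>;

/* Implicit sharing: copy only if the data has other users and is about to change. */
std::vector<int> &ensure_mutable(SharedData &data)
{
  assert(data != nullptr);
  if (data.use_count() > 1) {
    data = std::make_shared<std::vector<int>>(*data);
  }
  return *data;
}

/* Copy-on-evaluation: always create a separate copy, whether or not it gets written to. */
SharedData make_evaluated_copy(const SharedData &original)
{
  assert(original != nullptr);
  return std::make_shared<std::vector<int>>(*original);
}
```

In the first function the copy depends on the user count; in the second it is unconditional, which is the behavior the new name describes.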

Renaming this functionality in the depsgraph removes some of the
confusion that comes up when talking about this, and will hopefully
make the depsgraph less confusing to understand initially too. Wording
like "the evaluated copy" (as opposed to the original data-block) has
also become common anyway.

Pull Request: https://projects.blender.org/blender/blender/pulls/118338
2024-02-19 15:54:08 +01:00
Brecht Van Lommel
e06561a27a Build: replace Blender specific DEBUG by standard NDEBUG
NDEBUG is part of the C standard and disables asserts. Only this will
now be used to decide if asserts are enabled.

DEBUG was a Blender specific define, that has now been removed.

_DEBUG is a Visual Studio define for builds in Debug configuration.
Blender defines this for all platforms. This is still used in a few
places in the draw code, and in external libraries Bullet and Mantaflow.
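
As a short reminder of how `NDEBUG` interacts with `assert`, here is a minimal sketch. The function names are made up for this example.

```cpp
#include <cassert>

/* Returns true if `assert` is active in this translation unit. */
bool asserts_enabled()
{
#ifdef NDEBUG
  return false;
#else
  return true;
#endif
}

/* The assert below is compiled out entirely when NDEBUG is defined. */
int checked_divide(const int a, const int b)
{
  assert(b != 0);
  return a / b;
}
```

Code that previously checked the Blender specific `DEBUG` define would check `#ifndef NDEBUG` instead.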

Pull Request: https://projects.blender.org/blender/blender/pulls/115774
2023-12-06 16:05:14 +01:00
Campbell Barton
137f8dd7bc Cleanup: spelling in comments
2023-10-10 09:44:57 +11:00
Jacques Lucke
7bd509f73a Functions: enable multi-threading when many nodes are scheduled at once
Nodes that are scheduled can be executed in any order in theory. So when
there are many scheduled nodes, it can be beneficial to start evaluating
them in parallel.

Note that it is not very common that many nodes are scheduled at the
same time in typical setups because the evaluator uses a depth-first heuristic
to decide in which order to evaluate nodes. It can happen more easily in
generated node trees though.

Also, this change only has an effect in practice if none of the scheduled nodes
uses multi-threading internally, as this would also trigger the use of multiple
threads in the graph executor.
2023-10-08 16:21:23 +02:00
Jacques Lucke
62e2cc0ad0 Geometry Nodes: refactor geometry nodes execution interface
The main goal of this refactor is to simplify how a geometry node group is executed.
Previously, there was duplicated logic that turned the lazy-function graph of a node
group into a single lazy-function. Now this is done only in one place and others can
just execute the lazy-function directly, without having to worry about the underlying graph.

Pull Request: https://projects.blender.org/blender/blender/pulls/112482
2023-09-17 19:09:45 +02:00
Jacques Lucke
4db6a22c72 Functions: use array indexing instead of VectorSet in graph executor
This avoids the need to build the VectorSet and array
indexing is generally faster than a hash table lookup.
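
The idea can be illustrated with a small sketch. It assumes that every node knows its own index in the graph; `Node`, `NodeState` and both lookup functions are hypothetical, and `std::unordered_map` stands in for the hash-based container.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_map>
#include <vector>

struct Node {
  int index_in_graph;
};

struct NodeState {
  int value = 0;
};

/* Before: the node's position is found with a hash table lookup. */
int lookup_with_hash(const std::unordered_map<const Node *, int> &index_by_node,
                     const std::vector<NodeState> &states,
                     const Node &node)
{
  const int index = index_by_node.at(&node);
  return states[static_cast<std::size_t>(index)].value;
}

/* After: the node carries its index, so the state is found by plain array indexing. */
int lookup_with_index(const std::vector<NodeState> &states, const Node &node)
{
  assert(node.index_in_graph >= 0);
  const std::size_t index = static_cast<std::size_t>(node.index_in_graph);
  assert(index < states.size());
  return states[index].value;
}
```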
2023-09-17 14:27:01 +02:00
Jacques Lucke
2a5f3bd1cc Functions: refactor lazy-function graph interface
Goals of the refactor:
* Simplify adding (named) graph inputs and outputs.
* Add ability to refer to a graph input or output with an index.
* Get rid of the "dummy" terminology which doesn't really help.

Previously, one would add "dummy nodes" which can then serve as input
and output nodes of the graph. Now one directly adds input and outputs
using `Graph.add_input` and `Graph.add_output`. There is one interface
node that contains all inputs and another one that contains all outputs.

Being able to refer to a graph input or output with an index makes it
more efficient to implement some algorithms. E.g. one could have a
bit span for a socket that contains all the information what graph
inputs this socket depends on.
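
A rough sketch of why indices are convenient here. Only the method names `add_input` and `add_output` come from the description above; their signatures, the rest of this `Graph` class, and the fixed 64-bit `InputMask` (instead of a real bit span) are assumptions for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

/* Hypothetical, heavily simplified graph that only tracks its interface. */
class Graph {
  std::vector<std::string> input_names_;
  std::vector<std::string> output_names_;

 public:
  /* Adds a named graph input and returns its index. */
  int add_input(std::string name)
  {
    input_names_.push_back(std::move(name));
    return static_cast<int>(input_names_.size()) - 1;
  }

  /* Adds a named graph output and returns its index. */
  int add_output(std::string name)
  {
    output_names_.push_back(std::move(name));
    return static_cast<int>(output_names_.size()) - 1;
  }

  const std::string &input_name(const int index) const
  {
    assert(index >= 0 && index < static_cast<int>(input_names_.size()));
    return input_names_[static_cast<std::size_t>(index)];
  }
};

/* One bit per graph input: bit i is set if a socket depends on graph input i. */
using InputMask = std::uint64_t;

InputMask with_input(const InputMask mask, const int input_index)
{
  assert(input_index >= 0 && input_index < 64);
  return mask | (InputMask(1) << input_index);
}

bool depends_on(const InputMask mask, const int input_index)
{
  assert(input_index >= 0 && input_index < 64);
  return (mask & (InputMask(1) << input_index)) != 0;
}
```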

Pull Request: https://projects.blender.org/blender/blender/pulls/112474
2023-09-17 13:54:09 +02:00
Jacques Lucke
54fd33d783 Functions: support wrapping lazy-function node execute function
This is a lightweight solution to passing in some extra context into
a lazy-function that is invoked by the graph executor.
The new functionality is used by #112421.
2023-09-16 18:50:54 +02:00
Jacques Lucke
93f8d55473 Function: add assert to detect invalid side effect nodes early
2023-09-16 18:44:58 +02:00
Jacques Lucke
bd414cdbda Functions: reduce memory usage in node state
By storing a raw pointer instead of a `Span`, we save 16 bytes
per node state. I measured a ~5% speedup in my setup with
a simple repeat zone.
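
The size difference can be sketched as follows. The struct layouts are assumptions for illustration (a span modeled as pointer plus size, and two arrays per node state), not the actual definitions in the graph executor.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

/* Simplified stand-in for a span: a pointer plus an element count. */
template<typename T> struct SpanSketch {
  T *data;
  std::int64_t size;
};

struct InputState {
  bool was_ready_for_execution = false;
};

struct OutputState {
  bool has_been_computed = false;
};

struct NodeStateWithSpans {
  SpanSketch<InputState> inputs;
  SpanSketch<OutputState> outputs;
};

struct NodeStateWithPointers {
  InputState *inputs;
  OutputState *outputs;
};

/* Bytes saved per node state by dropping the two stored sizes. */
std::size_t bytes_saved_per_node_state()
{
  return sizeof(NodeStateWithSpans) - sizeof(NodeStateWithPointers);
}

/* Without a stored size, the count has to come from elsewhere, e.g. the node itself. */
InputState &input_state(const NodeStateWithPointers &node_state,
                        const int index,
                        const int inputs_num)
{
  static_cast<void>(inputs_num); /* Only used by the assert. */
  assert(index >= 0 && index < inputs_num);
  return node_state.inputs[index];
}
```

Because the raw pointer no longer carries a size, index checks like the assert above have to be written explicitly.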

5c450aea05 added some additional asserts to check for valid
indices. Generally, index-errors in this area lead to wrong
behaviors of geometry nodes very quickly.
2023-09-16 12:30:23 +02:00
Jacques Lucke
60c65ab13b Functions: better pack socket state structs
This reduces the amount of used memory.
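
As a generic illustration of struct packing (the members below are invented and not the actual socket state), reordering members from largest to smallest alignment removes padding:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

/* Members ordered so that padding is needed after each bool. */
struct SocketStateLoose {
  bool has_value;
  void *value;
  bool was_requested;
  std::int32_t users;
};

/* Same members, ordered from largest to smallest alignment. */
struct SocketStatePacked {
  void *value;
  std::int32_t users;
  bool has_value;
  bool was_requested;
};

static_assert(sizeof(SocketStatePacked) <= sizeof(SocketStateLoose),
              "Reordering should never make the struct larger here.");

/* Bytes saved per socket state; the exact number depends on the platform ABI. */
std::size_t bytes_saved_per_socket_state()
{
  return sizeof(SocketStateLoose) - sizeof(SocketStatePacked);
}

/* Example of using the packed struct. */
void set_value(SocketStatePacked &state, void *value)
{
  assert(value != nullptr);
  state.value = value;
  state.has_value = true;
}
```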
2023-09-16 12:11:08 +02:00
Jacques Lucke
c74a309209 Functions: combine allocations in lazy function graph executor
There are many small allocations when the graph executor is
initialized (e.g. all the node/sockets have to be allocated). Those
were already combined into a few allocations by making use
of `LinearAllocator`. However, even better performance can be
achieved by making one larger allocation and then using
preprocessed offsets into that buffer.

I measured up to 20% speedup in geometry nodes with a simple
repeat zone.
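
The technique can be sketched like this: the layout is computed once, and each initialization then performs a single allocation and constructs the objects at the precomputed offsets. All type and function names are hypothetical, and plain `std::malloc` is used here only to keep the example self-contained.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>

struct NodeState {
  int missing_inputs = 0;
};

struct SocketState {
  bool has_value = false;
};

/* Offsets into one buffer. Computed once and reused for every initialization. */
struct BufferLayout {
  std::size_t node_states_offset;
  std::size_t socket_states_offset;
  std::size_t total_size;
};

/* Rounds `offset` up to a multiple of `alignment` (must be a power of two). */
std::size_t align_up(const std::size_t offset, const std::size_t alignment)
{
  assert(alignment > 0 && (alignment & (alignment - 1)) == 0);
  return (offset + alignment - 1) & ~(alignment - 1);
}

BufferLayout compute_layout(const std::size_t nodes_num, const std::size_t sockets_num)
{
  BufferLayout layout;
  layout.node_states_offset = 0;
  layout.socket_states_offset = align_up(nodes_num * sizeof(NodeState), alignof(SocketState));
  layout.total_size = layout.socket_states_offset + sockets_num * sizeof(SocketState);
  return layout;
}

/* One allocation that holds all node and socket states. */
struct ExecutorStorage {
  void *buffer;
  NodeState *node_states;
  SocketState *socket_states;
};

/* Assumes a non-empty layout, so that a null result from malloc means failure. */
ExecutorStorage allocate_storage(const BufferLayout &layout,
                                 const std::size_t nodes_num,
                                 const std::size_t sockets_num)
{
  assert(layout.total_size > 0);
  ExecutorStorage storage;
  /* malloc returns memory that is suitably aligned for both types. */
  storage.buffer = std::malloc(layout.total_size);
  assert(storage.buffer != nullptr);
  char *bytes = static_cast<char *>(storage.buffer);
  storage.node_states = reinterpret_cast<NodeState *>(bytes + layout.node_states_offset);
  storage.socket_states = reinterpret_cast<SocketState *>(bytes + layout.socket_states_offset);
  for (std::size_t i = 0; i < nodes_num; i++) {
    new (storage.node_states + i) NodeState();
  }
  for (std::size_t i = 0; i < sockets_num; i++) {
    new (storage.socket_states + i) SocketState();
  }
  return storage;
}

/* Both state types are trivially destructible, so freeing the buffer is enough. */
void free_storage(ExecutorStorage &storage)
{
  std::free(storage.buffer);
  storage.buffer = nullptr;
}
```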
2023-09-16 11:38:40 +02:00
Aras Pranckevicius
acbd952abf Cleanup: fewer iostreams related includes from BLI/BKE headers
Including <iostream> or similar headers is quite expensive, since it
also pulls in things like <locale> and so on. In many BLI headers,
iostreams are only used to implement some sort of "debug print",
or an operator<< for ostream.

Change some of the commonly used places to instead include <iosfwd>,
which is the standard way of forward-declaring iostreams related
classes, and move the actual debug-print / operator<< implementations
into .cc files.
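
A minimal sketch of that pattern, shown in a single listing for brevity. The `Index` type, the file names in the comments and `to_debug_string` are hypothetical.

```cpp
/* --- index.hh (hypothetical header): only needs the forward declarations. --- */
#include <iosfwd>

struct Index {
  int value;
};

std::ostream &operator<<(std::ostream &stream, const Index &index);

/* --- index.cc (hypothetical source file): pulls in the full stream headers. --- */
#include <ostream>
#include <sstream>
#include <string>

std::ostream &operator<<(std::ostream &stream, const Index &index)
{
  return stream << "Index(" << index.value << ")";
}

/* Helper that formats a value through the operator above. */
std::string to_debug_string(const Index &index)
{
  std::ostringstream stream;
  stream << index;
  return stream.str();
}
```

Files that include only the header no longer pay for `<iostream>` unless they print something themselves.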

This is not done for templated classes though (it would be possible
to provide explicit operator<< instantiations somewhere in the
source file, but that would lead to hard-to-figure-out linker errors
whenever someone would add a different template type). There, where
possible, I changed from full <iostream> include to only the needed
<ostream> part.

For Span<T>, I just removed print_as_lines since it's not used by
anything. It could be moved into a .cc file using a similar approach
as above if needed.

Doing a full Blender build changes include counts this way:
- <iostream> 1986 -> 978
- <sstream> 2880 -> 925

It does not affect the total build time much though, mostly because
towards the end of it there are just several CPU cores finishing
compiling OpenVDB related source files.

Pull Request: https://projects.blender.org/blender/blender/pulls/111046
2023-08-16 09:51:37 +02:00
Campbell Barton
e955c94ed3 License Headers: Set copyright to "Blender Authors", add AUTHORS
Listing the "Blender Foundation" as copyright holder implied the Blender
Foundation holds copyright to files which may include work from many
developers.

While keeping copyright on headers makes sense for isolated libraries,
Blender's own code may be refactored or moved between files in a way
that makes the per file copyright holders less meaningful.

Copyright references to the "Blender Foundation" have been replaced with
"Blender Authors", with the exception of `./extern/` since this
contains libraries which are more isolated; any changes to license
headers there can be handled on a case-by-case basis.

Some directories in `./intern/` have also been excluded:

- `./intern/cycles/` its own `AUTHORS` file is planned.
- `./intern/opensubdiv/`.

An "AUTHORS" file has been added, using the Chromium project's authors
file as a template.

Design task: #110784

Ref !110783.
2023-08-16 00:20:26 +10:00
Jacques Lucke
201a442750 Functions: improve default debug names in lazy function graph executor
2023-06-16 10:53:11 +02:00
Sergey Sharybin
c1bc70b711 Cleanup: Add a copyright notice to files and use SPDX format
A lot of files were missing a copyright field in the header, even though
the Blender Foundation contributed to them in the sense of bug
fixing and general maintenance.

This change makes it explicit that those files are at least
partially copyrighted by the Blender Foundation.

Note that this does not make it so the Blender Foundation is
the only holder of the copyright in those files, and developers
who do not have a signed contract with the foundation still
hold the copyright as well.

Another aspect of this change is using SPDX format for the
header. We already used it for the license specification,
and now we state it for the copyright as well, following the
FAQ:

    https://reuse.software/faq/
2023-05-31 16:19:06 +02:00
Jacques Lucke
8ba9d7b67a Functions: improve handling of thread-local data in lazy functions
The main goal here is to reduce the number of times thread-local data has
to be looked up using e.g. `EnumerableThreadSpecific.local()`. While this
isn't a bottleneck in many cases, it is when the action performed on the local
data is very short and that happens very often (e.g. logging used sockets
during geometry nodes evaluation).

The solution is to simply pass the thread-local data as parameter to many
functions that use it, instead of looking it up in those functions which
generally is more costly.

The lazy-function graph executor now only looks up the local data if
it knows that it might be on a new thread, otherwise it uses the local data
retrieved earlier.

Alongside `UserData` there is `LocalUserData` now. This allows users
of the lazy-function evaluation (such as geometry nodes) to have custom
thread-local data that is passed to all the lazy-functions automatically.
This is used for logging now.
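
The core idea, reduced to a single-threaded sketch: look the local data up once and pass it along, instead of looking it up inside every small function. `LocalDataStore` is a hypothetical stand-in for a thread-specific container that only counts lookups; none of these names are Blender API.

```cpp
#include <cassert>
#include <vector>

struct LocalData {
  std::vector<int> logged_sockets;
};

/* Stand-in for a thread-specific container. It counts how often `local()` is called. */
struct LocalDataStore {
  LocalData data;
  int lookups = 0;

  LocalData &local()
  {
    lookups++;
    return data;
  }
};

/* Before: every call looks up the local data itself. */
void log_socket_with_lookup(LocalDataStore &store, const int socket)
{
  store.local().logged_sockets.push_back(socket);
}

/* After: the caller passes the local data as a parameter. */
void log_socket(LocalData &local_data, const int socket)
{
  local_data.logged_sockets.push_back(socket);
}

/* Number of lookups when each log call does its own lookup. */
int lookups_when_looking_up_each_time(const int sockets_num)
{
  assert(sockets_num >= 0);
  LocalDataStore store;
  for (int i = 0; i < sockets_num; i++) {
    log_socket_with_lookup(store, i);
  }
  return store.lookups;
}

/* Number of lookups when the local data is retrieved once and passed along. */
int lookups_when_passing_local_data(const int sockets_num)
{
  assert(sockets_num >= 0);
  LocalDataStore store;
  LocalData &local_data = store.local();
  for (int i = 0; i < sockets_num; i++) {
    log_socket(local_data, i);
  }
  return store.lookups;
}
```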
2023-05-09 13:13:52 +02:00
Campbell Barton
6859bb6e67 Cleanup: format (with BraceWrapping::AfterControlStatement "MultiLine")
2023-05-02 09:37:49 +10:00
Sergey Sharybin
d32d787f5f Clang-Format: Allow empty functions to be single-line
For example

```
OIIOOutputDriver::~OIIOOutputDriver()
{
}
```

becomes

```
OIIOOutputDriver::~OIIOOutputDriver() {}
```

Saves quite some vertical space, which is especially handy for
constructors.

Pull Request: https://projects.blender.org/blender/blender/pulls/105594
2023-03-29 16:50:54 +02:00
Jacques Lucke
73a2c79c07 Functions: free memory of unused sockets earlier
During geometry nodes evaluation some sockets can be determined
to be unused, for example based on the condition input in a switch node.
Once a socket is determined to be unused, that information has to be
propagated backwards through the tree to free any memory that may
have been reserved for those sockets already. This already happened before
this commit, but in a less ideal way.

Determining that sockets are unused early is good because it helps with
memory reuse and avoids copy-on-write copies caused by shared data.
Now, nodes that are scheduled because an output became unused have
priority over nodes scheduled for other reasons.
2023-01-08 21:09:33 +01:00
Campbell Barton
14fc02f91d Cleanup: spelling in comments
2023-01-06 14:00:36 +11:00
Jacques Lucke
3819a9b15a Fix T103614: crash during geometry nodes evaluation with tbb disabled
2023-01-05 15:36:39 +01:00
Jacques Lucke
83f519b7c1 Functions: initialize node storage and default values on first execution
Previously, this happened when the "node task" first runs, which might
not actually execute the node if there are missing inputs. Deferring the
allocation of storage and default inputs allows for better memory reuse
later (currently the memory is not reused).
2023-01-04 18:46:50 +01:00
Jacques Lucke
0bc0e3f9f7 Fix: geometry nodes crashes with large trees
This was an oversight in rBdba2d828462ae22de5.
The evaluator uses multiple threads to initialize node states
but it is still in single threaded mode.
`get_main_or_local_allocator` did not return the right allocator
in this case.
2023-01-02 18:34:01 +01:00
Jacques Lucke
dba2d82846 Geometry Nodes: avoid using enumerable thread specific on single thread
The geometry nodes evaluator supports "lazy threading", i.e. it starts out
single-threaded. But when it determines that multi-threading can be
beneficial, it switches to multi-threaded mode.

Now it only creates an enumerable-thread-specific if it is actually using
multiple threads. This results in a 6% speedup in my test file with many
node groups and math nodes.
2022-12-29 21:05:58 +01:00
Jacques Lucke
b6ca942e47 Functions: support cycles in lazy-function graph
Lazy-function graphs are now evaluated properly even if they contain
cycles. Note that cycles are only ok if there is no data dependency cycle.
For example, a node might output something that is fed back into itself.
As long as the output can be computed without the input that it feeds into,
everything is ok.

The code that builds the graph is responsible for making sure that there
are no actual data dependencies.
2022-12-29 16:39:40 +01:00
Jacques Lucke
0ebb7ab41f Geometry Nodes: disable unreachable nodes in evaluator
Nodes that were not connected to any output could still impact performance.
While they were never executed, sometimes their inputs could keep references
to geometries that other nodes want to modify. That caused unnecessary geometry
copies, because a geometry can only be modified if it is not shared.

Now, inputs that will never be used are tagged accordingly and they will never
have references to geometries that others might want to modify.
2022-11-16 14:26:11 +01:00
Jacques Lucke
edcce2c073 Cleanup: correct inverted variable name
2022-11-16 13:19:23 +01:00
Campbell Barton
5517c848bd Cleanup: spelling in comments
2022-09-21 12:00:01 +10:00
Jacques Lucke
5c81d3bd46 Geometry Nodes: improve evaluator with lazy threading
In large node setups the threading overhead was sometimes very significant.
That's especially true when most nodes do very little work.

This commit improves the scheduling by not using multi-threading in many
cases unless it's likely that it will be worth it. For more details see the comments
in `BLI_lazy_threading.hh`.

Differential Revision: https://developer.blender.org/D15976
2022-09-20 11:08:05 +02:00
Campbell Barton
f78219c9a8 Cleanup: spelling in comments
2022-09-13 18:03:09 +10:00
Jacques Lucke
4130f1e674 Geometry Nodes: new evaluation system
This refactors the geometry nodes evaluation system. No changes for the
user are expected. At a high level the goals are:
* Support using geometry nodes outside of the geometry nodes modifier.
* Support using the evaluator infrastructure for other purposes like field evaluation.
* Support more nodes, especially when many of them are disabled behind switch nodes.
* Support doing preprocessing on node groups.

For more details see T98492.

There are fairly detailed comments in the code, but here is a high level overview
for how it works now:
* There is a new "lazy-function" system. It is similar in spirit to the multi-function
  system but with different goals. Instead of optimizing throughput for highly
  parallelizable work, this system is designed to compute only the data that is actually
  necessary. What data is necessary can be determined dynamically during evaluation.
  Many lazy-functions can be composed in a graph to form a new lazy-function, which can
  again be used in a graph etc.
* Each geometry node group is converted into a lazy-function graph prior to evaluation.
  To evaluate geometry nodes, one then just has to evaluate that graph. Node groups are
  no longer inlined into their parents.
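
The "compute only what is necessary" idea can be illustrated with a tiny sketch of a lazy switch. This is not the lazy-function API; `LazyInput` and `lazy_switch` are hypothetical names, and real lazy-functions request inputs through a params object rather than calling a callback.

```cpp
#include <cassert>
#include <functional>

/* An input whose value is only computed when it is requested. */
struct LazyInput {
  std::function<int()> compute;
  int evaluations = 0;

  int get()
  {
    assert(compute != nullptr);
    evaluations++;
    return compute();
  }
};

/* Only the input selected by `condition` is requested; the other is never computed. */
int lazy_switch(const bool condition, LazyInput &if_false, LazyInput &if_true)
{
  return condition ? if_true.get() : if_false.get();
}
```

Which input is needed is decided during evaluation, which is the property that distinguishes this from evaluating all inputs up front.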

The next step for the evaluation system is to reduce the use of threads in some situations
to avoid overhead. Many small node groups don't benefit from multi-threading at all.
This is much easier to do now because not everything has to be inlined in one huge
node tree anymore.

Differential Revision: https://developer.blender.org/D15914
2022-09-13 08:44:32 +02:00