This adds a `VArray::from_std_func` next to the existing `VArray::from_func`. It
does almost the same but inserts some type erasure by using `std::function`
which avoids the need to instantiate the full virtual array implementation. This
is a bit slower at run-time, but in practice there are cases where it doesn't
matter. Currently, this patch reduces binary size by ~35kb. Not too much, but
still good for such a simple change.
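A minimal standalone sketch of the idea (the class and function names below are simplified stand-ins, not the actual Blender implementation): a templated factory instantiates a new implementation per callable type, while the `std::function` based variant reuses a single instantiation per element type.
```
#include <cstdint>
#include <functional>
#include <memory>
#include <utility>

/* Simplified stand-in for the virtual array implementation interface. */
template<typename T> class ExampleVArrayImpl {
 public:
  virtual ~ExampleVArrayImpl() = default;
  virtual T get(int64_t index) const = 0;
};

/* A `from_func`-style factory: a full implementation class is instantiated
 * for every distinct callable type F, duplicating code per call site. */
template<typename T, typename F>
std::unique_ptr<ExampleVArrayImpl<T>> example_from_func(F fn)
{
  class FuncImpl : public ExampleVArrayImpl<T> {
    F fn_;

   public:
    FuncImpl(F fn) : fn_(std::move(fn)) {}
    T get(int64_t index) const override { return fn_(index); }
  };
  return std::make_unique<FuncImpl>(std::move(fn));
}

/* A `from_std_func`-style factory: the callable type is erased first, so only
 * one implementation per element type T exists, at the cost of an extra
 * indirection through std::function at run-time. */
template<typename T>
std::unique_ptr<ExampleVArrayImpl<T>> example_from_std_func(std::function<T(int64_t)> fn)
{
  return example_from_func<T>(std::move(fn));
}
```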
Pull Request: https://projects.blender.org/blender/blender/pulls/146011
Previously, `VArrayImpl` had a `materialize` and `materialize_to_uninitialized`
function. Now both are merged into one with an additional `bool
dst_is_uninitialized` parameter. The same is done for the
`materialize_compressed` method, as well as for `GVArrayImpl`.
While this kind of merging is typically not ideal, it reduces the binary size by
~200kb while being basically free performance wise. The cost of this predictable
boolean check is expected to be negligible even if only very few indices are
materialized. Additionally, in most cases, this parameter does not even have to
be checked, because for trivial types it does not matter if the destination
array is already initialized or not when overwriting it.
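A hedged sketch of what the merged method can look like (simplified, not the actual `VArrayImpl` code): one virtual function with the flag, where the branch disappears entirely for trivial element types.
```
#include <cstdint>
#include <new>
#include <type_traits>

/* Simplified stand-in for a virtual array implementation. */
template<typename T> class ExampleVArrayImpl {
 public:
  virtual ~ExampleVArrayImpl() = default;
  virtual T get(int64_t index) const = 0;

  /* Previously `materialize` and `materialize_to_uninitialized` were two
   * separate virtual methods with nearly identical bodies. */
  virtual void materialize(int64_t size, T *dst, bool dst_is_uninitialized) const
  {
    for (int64_t i = 0; i < size; i++) {
      if constexpr (std::is_trivial_v<T>) {
        /* For trivial types it does not matter whether the destination is
         * initialized, so the flag is never even checked. */
        dst[i] = this->get(i);
      }
      else if (dst_is_uninitialized) {
        /* Construct into raw memory. */
        new (dst + i) T(this->get(i));
      }
      else {
        /* Overwrite an existing value. */
        dst[i] = this->get(i);
      }
    }
  }
};
```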
It saves this much binary size because quite a few implementations are
generated with e.g. `VArray::from_func`, and a lot of code was duplicated for
each instantiation.
This changes only the actual `(G)VArrayImpl`, but not the `VArray` and `GVArray`
API which is typically used to work with virtual arrays.
Pull Request: https://projects.blender.org/blender/blender/pulls/145144
Cleanup to avoid unnecessary copies of `VArray`. This requires ref-qualified
overloads of the dereference operator of the attribute reader, as well as some
move operators and constructor overloads.
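A small sketch of the ref-qualifier technique, with a standard container standing in for `VArray` (the reader class here is hypothetical, not the actual attribute reader):
```
#include <utility>
#include <vector>

/* Hypothetical reader that owns a potentially expensive-to-copy array. */
struct ExampleAttributeReader {
  std::vector<float> varray;

  /* Lvalue overload: the reader stays alive, so hand out a reference. */
  const std::vector<float> &operator*() const & { return varray; }

  /* Rvalue overload: the reader is a temporary, so the payload can be
   * moved out instead of copied. */
  std::vector<float> operator*() && { return std::move(varray); }
};

/* `*make_reader()` (a temporary) now moves; `*reader` (an lvalue) hands out a
 * reference instead of copying. */
```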
Pull Request: https://projects.blender.org/blender/blender/pulls/118437
Along with the 4.1 libraries upgrade, we are bumping the clang-format
version from 8-12 to 17. This affects quite a few files.
If you have not done so already, consider pointing your IDE to the
clang-format binary bundled with the Blender precompiled libraries.
This PR implements an initial drawing tool that can already be used for testing.
While this is not fully feature complete (compared to the current grease pencil draw tool), the following is already implemented:
* Pressure support for radius and opacity.
* Material color and vertex color support.
* New active smoothing algorithm based on curve fitting.
* Simplify algorithm as a post-process step.
Some deliberate limitations include:
* The drawing plane is always the front plane. Drawing on surfaces is also not supported.
The current approach has not been optimized for performance yet. The goal was to have a straightforward implementation
first and then focus on performance later.
There are numerous parameters in the code that are hard-coded for now. These should be exposed at some point, potentially as user settings.
Pull Request: https://projects.blender.org/blender/blender/pulls/110093
Including <iostream> or similar headers is quite expensive, since it
also pulls in things like <locale> and so on. In many BLI headers,
iostreams are only used to implement some sort of "debug print",
or an operator<< for ostream.
Change some of the commonly used places to instead include <iosfwd>,
which is the standard way of forward-declaring iostreams related
classes, and move the actual debug-print / operator<< implementations
into .cc files.
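The pattern looks roughly like this (file and type names are made up for illustration):
```
/* some_type.hh -- only <iosfwd> is needed for the declaration. */
#include <iosfwd>

struct SomeType {
  int value;
};
std::ostream &operator<<(std::ostream &stream, const SomeType &value);

/* some_type.cc -- the definition pulls in the full <ostream>. */
#include <ostream>

std::ostream &operator<<(std::ostream &stream, const SomeType &value)
{
  return stream << "SomeType(" << value.value << ")";
}
```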
This is not done for templated classes though (it would be possible
to provide explicit operator<< instantiations somewhere in the
source file, but that would lead to hard-to-figure-out linker errors
whenever someone adds a different template type). There, where
possible, I changed from full <iostream> include to only the needed
<ostream> part.
For Span<T>, I just removed print_as_lines since it's not used by
anything. It could be moved into a .cc file using a similar approach
as above if needed.
This changes the include counts over a full Blender build as follows:
- <iostream> 1986 -> 978
- <sstream> 2880 -> 925
It does not affect the total build time much though, mostly because
towards the end of the build there are just several CPU cores finishing
compiling OpenVDB related source files.
Pull Request: https://projects.blender.org/blender/blender/pulls/111046
Listing the "Blender Foundation" as copyright holder implied the Blender
Foundation holds copyright to files which may include work from many
developers.
While keeping copyright on headers makes sense for isolated libraries,
Blender's own code may be refactored or moved between files in a way
that makes the per file copyright holders less meaningful.
Copyright references to the "Blender Foundation" have been replaced with
"Blender Authors", with the exception of `./extern/` since these this
contains libraries which are more isolated, any changed to license
headers there can be handled on a case-by-case basis.
Some directories in `./intern/` have also been excluded:
- `./intern/cycles/`: its own `AUTHORS` file is planned.
- `./intern/opensubdiv/`.
An "AUTHORS" file has been added, using the chromium projects authors
file as a template.
Design task: #110784
Ref !110783.
A lot of files were missing a copyright field in the header, even though
the Blender Foundation contributed to them in the sense of bug fixing
and general maintenance.
This change makes it explicit that those files are at least
partially copyrighted by the Blender Foundation.
Note that this does not make it so the Blender Foundation is
the only holder of the copyright in those files, and developers
who do not have a signed contract with the foundation still
hold the copyright as well.
Another aspect of this change is using SPDX format for the
header. We already used it for the license specification,
and now we state it for the copyright as well, following the
FAQ:
https://reuse.software/faq/
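For illustration, a file header following the REUSE/SPDX convention then carries both tags (the exact year and copyright holder vary per file):
```
/* SPDX-FileCopyrightText: 2023 Blender Foundation
 *
 * SPDX-License-Identifier: GPL-2.0-or-later */
```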
Goals of this refactor:
* Reduce memory consumption of `IndexMask`. The old `IndexMask` uses an
`int64_t` for each index which is more than necessary in pretty much all
practical cases currently. Using `int32_t` might still become limiting
in the future in case we use this to index e.g. byte buffers larger than
a few gigabytes. We also don't want to template `IndexMask`, because
that would cause a split in the "ecosystem", or everything would have to
be implemented twice or templated.
* Allow for more multi-threading. The old `IndexMask` contains a single
array. This is generally good, but has the problem that it is hard to fill
from multiple threads when the final size is not known from the beginning.
This is commonly the case when e.g. converting an array of bool to an
index mask. Currently, this kind of code only runs on a single thread.
* Allow for efficient set operations like join, intersect and difference.
It should be possible to multi-thread those operations.
* It should be possible to iterate over an `IndexMask` very efficiently.
The most important part of that is to avoid all memory access when iterating
over continuous ranges. For some core nodes (e.g. math nodes), we generate
optimized code for the cases of irregular index masks and simple index ranges.
To achieve these goals, a few compromises had to be made:
* Slicing of the mask (at specific indices) and random element access is
`O(log #indices)` now, but with a low constant factor. It should be possible
to split a mask into n approximately equally sized parts in `O(n)` though,
making the time per split `O(1)`.
* Using range-based for loops does not work well when iterating over a nested
data structure like the new `IndexMask`. Therefore, `foreach_*` functions with
callbacks have to be used (see the usage sketch below). To avoid extra code
complexity at the call site, the `foreach_*` methods support multi-threading
out of the box.
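A hedged usage sketch (assuming a `foreach_index`-style method on the new mask; the exact names and signatures are in `BLI_index_mask.hh`):
```
#include "BLI_index_mask.hh"
#include "BLI_span.hh"

/* Instead of a range-based for loop over the mask, a callback is passed in.
 * The implementation is then free to process the underlying segments on
 * multiple threads. */
static void fill_selected(const blender::IndexMask &mask, blender::MutableSpan<float> dst)
{
  mask.foreach_index([&](const int64_t i) { dst[i] = 1.0f; });
}
```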
The new data structure splits an `IndexMask` into an arbitrary number of ordered
`IndexMaskSegment`. Each segment can contain at most `2^14 = 16384` indices. The
indices within a segment are stored as `int16_t`. Each segment has an additional
`int64_t` offset which allows storing arbitrary `int64_t` indices. This approach
has the main benefits that segments can be processed/constructed individually on
multiple threads without a serial bottleneck. Also it reduces the memory
requirements significantly.
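A conceptual sketch of that storage scheme, using plain standard containers rather than the actual classes:
```
#include <cstdint>
#include <vector>

/* Each segment stores small 16-bit indices relative to a 64-bit offset, so
 * arbitrary 64-bit indices stay representable while most of the storage is
 * only 2 bytes per index. */
struct ConceptualSegment {
  int64_t offset;
  std::vector<int16_t> indices; /* Sorted, at most 2^14 = 16384 entries. */
};

/* A mask is an ordered list of such segments; segments can be built and
 * processed independently on different threads. */
struct ConceptualIndexMask {
  std::vector<ConceptualSegment> segments;
};

static int64_t index_in_segment(const ConceptualSegment &segment, const int64_t i)
{
  return segment.offset + segment.indices[i];
}
```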
For more details see comments in `BLI_index_mask.hh`.
I did a few tests to verify that the data structure generally improves
performance and does not cause regressions:
* Our field evaluation benchmarks take about as much time as before. This is to be
expected, because we already made sure that e.g. add node evaluation is
vectorized. The important thing here is to check that the changes to the way we
iterate over the indices still allow for auto-vectorization.
* Memory usage by a mask is about 1/4 of what it was before in the average case.
That's mainly caused by the switch from `int64_t` to `int16_t` for indices.
In the worst case, the memory requirements can be larger when there are many
indices that are very far away. However, when they are far away from each other,
that indicates that there aren't many indices in total. In common cases, memory
usage can be way lower than 1/4 of before, because sub-ranges use static memory.
* For some more specific numbers I benchmarked `IndexMask::from_bools` in
`index_mask_from_selection` on 10,000,000 elements at various probabilities for
`true` at every index:
```
Probability Old New
0 4.6 ms 0.8 ms
0.001 5.1 ms 1.3 ms
0.2 8.4 ms 1.8 ms
0.5 15.3 ms 3.0 ms
0.8 20.1 ms 3.0 ms
0.999 25.1 ms 1.7 ms
1 13.5 ms 1.1 ms
```
Pull Request: https://projects.blender.org/blender/blender/pulls/104629
For example
```
OIIOOutputDriver::~OIIOOutputDriver()
{
}
```
becomes
```
OIIOOutputDriver::~OIIOOutputDriver() {}
```
Saves quite some vertical space, which is especially handy for
constructors.
Pull Request: https://projects.blender.org/blender/blender/pulls/105594
Straightforward port. I took the opportunity to remove some C vector
functions (e.g. copy_v2_v2).
This makes some changes to DRWView to accommodate the alignment
requirements of the float4x4 type.
This abstraction is rarely used. It could be replaced by some more
general "query" API in the future. For now it's easier to just compare
pointers in the Set Position node where this was used.
This is possible now, because mesh positions are stored as flat `float3`
arrays (previously, they were stored as `MVert` with some other data
interleaved).
This makes `GVArrayImpl` and `VArrayImpl` more similar.
Only passing the pointer instead of the span also increases
efficiency a little bit. The downside is that a few asserts had
to be removed as well. However, in practice the same asserts
are in place at a higher level as well (in `VArrayCommon`).
This refactors how devirtualization is done in general and how
multi-functions use it.
* The old `Devirtualizer` class has been removed in favor of a simpler
solution. It is also more general in the sense that it is not coupled
with `IndexMask` and `VArray`. Instead there is a function that has
inputs which control how different types are devirtualized. The
new implementation is currently less general with regard to the number
of parameters it supports. This can be changed in the future, but
does not seem necessary now and would make the code less obvious.
* Devirtualizers for different types are now defined in their respective
headers.
* The multi-function builder works with the `GVArray` stored in `MFParams`
directly now, instead of first converting it to a `VArray<T>`. This reduces
some constant overhead, which makes the multi-function slightly
faster. This is only noticeable when very few elements are processed though.
No functional changes or performance regressions are expected.
This is the conventional way of dealing with unused arguments in C++,
since it works on all compilers.
Regex find and replace: `UNUSED\((\w+)\)` -> `/*$1*/`
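For example (hypothetical function, the pattern is what matters):
```
/* Before: */
// void example_exec(struct wmOperator *UNUSED(op));

/* After: */
// void example_exec(struct wmOperator * /*op*/);
```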
Previously the base virtual array implementation optimized for
common cases where data is stored as spans or single values.
However, that didn't make sense when there are already
sub-classes that handle those cases specifically. Instead,
implement the faster materialize methods for each class.
Now, if the base class is reached, it means no optimizations
for avoiding virtual function call overhead are used.
Differential Revision: https://developer.blender.org/D15549
When the curve type attribute doesn't exist, there is no reason to
create an array for it only to fill the default value, which will add
overhead to subsequent "add" operations. I added a "get_if_single"
method to virtual array to simplify this check. Also use the existing
functions for filling curve types.
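A hedged usage sketch (assuming `get_if_single` returns an `std::optional`-like value; the function and constants here are illustrative):
```
#include <cstdint>
#include <optional>

#include "BLI_virtual_array.hh"

/* If the virtual array is internally just a single value, the check is O(1)
 * and no per-element array has to be created or filled. */
static bool all_curve_types_equal(const blender::VArray<int8_t> &types, const int8_t expected)
{
  if (const std::optional<int8_t> single = types.get_if_single()) {
    return *single == expected;
  }
  /* Otherwise fall back to looking at the individual elements. */
  for (const int64_t i : types.index_range()) {
    if (types[i] != expected) {
      return false;
    }
  }
  return true;
}
```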
Differential Revision: https://developer.blender.org/D15560
`GSpan` and spans based on virtual arrays were not default constructible
before, which made them hard to use sometimes. It's generally fine for
spans to be empty.
The main thing to keep in mind is that the type pointer in `GSpan` may
be null now. Generally, code receiving spans as input can assume that
the type is not null, but sometimes a null type may be valid. The old `type()`
method that returned a reference to the type still exists. It asserts when the
type is null.
This commit reduces the number of function calls through function
pointers in `blender::Any` when the stored type is trivial.
Furthermore, this marks some classes as trivial which we know to be
trivial but which the compiler cannot deduce as such (the standard currently
says that any class with a virtual destructor is non-trivial). Under some
circumstances we know that final child classes are trivial though.
This allows for some optimizations.
Also see https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2018/p1077r0.html.
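One way such an opt-in escape hatch can look (the trait name and classes below are illustrative, not the exact ones used in the BLI headers):
```
#include <cstdint>
#include <type_traits>

/* Falls back to the compiler's answer, but can be specialized for final
 * classes that are known to be safe to copy/destroy trivially even though
 * a virtual destructor formally makes them non-trivial. */
template<typename T> inline constexpr bool is_trivial_extended_v = std::is_trivial_v<T>;

struct ExampleVArrayImplBase {
  virtual ~ExampleVArrayImplBase() = default;
};

/* A final subclass that only wraps a pointer and a size. */
struct ExampleSpanImpl final : public ExampleVArrayImplBase {
  const float *data = nullptr;
  int64_t size = 0;
};

/* Opt in: a container like `Any` can now skip the function-pointer based
 * copy/destruct path for this type. */
template<> inline constexpr bool is_trivial_extended_v<ExampleSpanImpl> = true;
```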
This reduces the amount of code, and improves performance a bit by
doing more with fewer virtual method calls.
Differential Revision: https://developer.blender.org/D15293
This implements the new way to attach curves to a mesh surface using
a uv map (based on the recent discussion in T95776).
The curves data block now not only stores a reference to the surface object
but also a name of a uv map on that object. Having a uv map is optional
for most operations, but it will be required later for animation (when the
curves are supposed to be deformed based on deformation of the surface).
The "Empty Hair" operator in the Add menu sets the uv map name automatically
if possible. It's possible to start working without a uv map and to attach the
curves to a uv map later on. It's also possible to reattach the curves to a new
uv map using the "Curves > Snap to Nearest Surface" operator in curves sculpt
mode.
Note that the implementation of the reverse lookup from uv to a position on the
surface is trivial and inefficient for now. A more efficient data structure will be
implemented separately soon.
Differential Revision: https://developer.blender.org/D15125
My benchmark, which spends most of its time preparing function parameters,
takes `250 ms` now, down from `510 ms` before. This is mainly achieved by
doing less unnecessary work and by giving the compiler more inlined
code to optimize.
* Reserve correct vector sizes and use the unchecked `append` function (see the sketch after this list).
* Construct `GVArray` parameters directly in the vector, instead of
moving/copying them in the vector afterwards.
* Inline some constructors, because that allows the compiler to understand
what is happening, resulting in less code.
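A rough illustration of the first two points, with `std::vector` standing in for Blender's own `Vector` type (which has analogous, slightly different calls):
```
#include <cstdint>
#include <vector>

struct HypotheticalParam {
  const void *data;
  int64_t size;
  HypotheticalParam(const void *data, int64_t size) : data(data), size(size) {}
};

static std::vector<HypotheticalParam> build_params(const int64_t params_num)
{
  std::vector<HypotheticalParam> params;
  /* Reserving up front removes the capacity checks/reallocations below. */
  params.reserve(params_num);
  for (int64_t i = 0; i < params_num; i++) {
    /* Construct the parameter directly in the vector instead of creating
     * it first and moving/copying it in afterwards. */
    params.emplace_back(nullptr, 0);
  }
  return params;
}
```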
This probably has negligible impact on the user experience currently,
because there are other bottlenecks.
Differential Revision: https://developer.blender.org/D15009
Goals:
* Better high level control over where devirtualization occurs. There is always
a trade-off between performance and compile-time/binary-size.
* Simplify using array devirtualization.
* Better performance for cases where devirtualization wasn't used before.
Many geometry nodes accept fields as inputs. Internally, that means that the
execution functions have to accept so called "virtual arrays" as inputs. Those
can be e.g. actual arrays, just single values, or lazily computed arrays.
Due to these different possible virtual arrays implementations, access to
individual elements is slower than it would be if everything was just a normal
array (access goes through a virtual function call). For more complex execution
functions, this overhead does not matter, but for small functions (like a simple
addition) it very much does. The virtual function call also prevents the compiler
from doing some optimizations (e.g. loop unrolling and using SIMD instructions).
The solution is to "devirtualize" the virtual arrays for small functions where the
overhead is measurable. Essentially, the function is generated many times with
different array types as input. Then there is a run-time dispatch that calls the
best implementation. We have been doing devirtualization in e.g. math nodes
for a long time already. This patch just generalizes the concept and makes it
easier to control. It also makes it easier to investigate the different trade-offs
when it comes to devirtualization.
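A hedged sketch of the general pattern (the node function is made up; the `VArray` query methods are assumed to behave as named):
```
#include <cstdint>

#include "BLI_span.hh"
#include "BLI_virtual_array.hh"

/* Tiny adapter that behaves like an array repeating one value. */
template<typename T> struct SingleValueArray {
  T value;
  const T &operator[](int64_t /*index*/) const { return value; }
};

/* The inner loop is written once as a template... */
template<typename InArray>
static void add_one_impl(const InArray &src, blender::MutableSpan<int> dst)
{
  for (const int64_t i : dst.index_range()) {
    /* With a span or single value the compiler can inline and vectorize
     * this; with a plain VArray every access is a virtual call. */
    dst[i] = src[i] + 1;
  }
}

/* ...and a run-time dispatch picks the most specialized instantiation. */
static void add_one(const blender::VArray<int> &src, blender::MutableSpan<int> dst)
{
  if (src.is_span()) {
    add_one_impl(src.get_internal_span(), dst);
  }
  else if (src.is_single()) {
    add_one_impl(SingleValueArray<int>{src.get_internal_single()}, dst);
  }
  else {
    add_one_impl(src, dst);
  }
}
```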
Nodes that we've optimized using devirtualization before didn't get a speedup.
However, a couple of nodes that didn't use devirtualization before are using it now.
Those got a 2-4x speedup in common cases:
* Map Range
* Random Value
* Switch
* Combine XYZ
Differential Revision: https://developer.blender.org/D14628
Methods which override a base class's virtual methods are expected to
be marked with `override`. This also gives developers a better idea
about what is going on.
This does two things:
* Introduce new `materialize_compressed` methods. Those are used
when the dst array should not have any gaps (illustrated below).
* Add materialize methods in various classes where they were missing
(and therefore caused overhead, because slower fallbacks had to be used).
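A conceptual illustration of the difference, using plain standard types rather than the actual virtual array API:
```
#include <cstdint>
#include <vector>

/* `materialize` writes each selected element at its original index (gaps in
 * between are left untouched), while `materialize_compressed` packs the
 * selected elements densely at the front of the destination. */
static void materialize_both(const std::vector<float> &src,
                             const std::vector<int64_t> &mask,
                             std::vector<float> &dst_sparse,
                             std::vector<float> &dst_compressed)
{
  for (int64_t pos = 0; pos < int64_t(mask.size()); pos++) {
    const int64_t i = mask[pos];
    dst_sparse[i] = src[i];       /* Destination index == source index. */
    dst_compressed[pos] = src[i]; /* Destination index == position in the mask. */
  }
}
```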
Use a shorter/simpler license convention, which stops the header from taking
so much space.
Follow the SPDX license specification: https://spdx.org/licenses
- C/C++/objc/objc++
- Python
- Shell Scripts
- CMake, GNUmakefile
While most of the source tree has been included
- `./extern/` was left out.
- `./intern/cycles` & `./intern/atomic` are also excluded because they
use different header conventions.
doc/license/SPDX-license-identifiers.txt has been added to list all used
SPDX identifiers.
See P2788 for the script that automated these edits.
Reviewed By: brecht, mont29, sergey
Ref D14069
This adds `blender::is_same_any_v`, which is almost the same as
`std::is_same_v`. The difference is that it allows for checking multiple
types at the same time.
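A minimal sketch of how such a trait can be implemented with a C++17 fold expression (the actual definition lives in the BLI headers):
```
#include <type_traits>

namespace sketch {
/* True if T is the same type as any of the types in Args. */
template<typename T, typename... Args>
inline constexpr bool is_same_any_v = (std::is_same_v<T, Args> || ...);
}  // namespace sketch

static_assert(sketch::is_same_any_v<int, float, int, double>);
static_assert(!sketch::is_same_any_v<int, float, double>);
static_assert(!sketch::is_same_any_v<int>); /* No candidates: always false. */
```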
Differential Revision: https://developer.blender.org/D13673