Using ClangBuildAnalyzer on the whole Blender build, it was pointing
out that BLI_math.h is the heaviest "header hub" (i.e. non tiny file
that is included a lot).
However, there's very little (actually zero) source files in Blender
that need "all the math" (base, colors, vectors, matrices,
quaternions, intersection, interpolation, statistics, solvers and
time). A common use case is source files needing just vectors, or
just vectors & matrices, or just colors etc. Actually, 181 files
were including the whole math thing without needing it at all.
This change removes BLI_math.h completely, and instead in all the
places that need it, includes BLI_math_vector.h or BLI_math_color.h
and so on.
Change from that:
- BLI_math_color.h was included 1399 times -> now 408 (took 114.0sec
to parse -> now 36.3sec)
- BLI_simd.h 1403 -> 418 (109.7sec -> 34.9sec).
Full rebuild of Blender (Apple M1, Xcode, RelWithDebInfo) is not
affected much (342sec -> 334sec). Most of benefit would be when
someone's changing BLI_simd.h or BLI_math_color.h or similar files,
that now there's 3x fewer files result in a recompile.
Pull Request #110944
The root cause was a classic fixed-size epsilon issue. The code that
checked if an fcurve was effectively flat, and thus shouldn't be
normalized, used a fixed-size epsilon that was reasonable for values
close-ish to zero, but didn't work well for values >= 1.0.
This patch addresses the issue by introducing a new function
`ulp_diff_ff()` that robustly computes the number of floating point
steps between two floats, and using that to ensure that a minimum
number of representable floats exist between the min/max values
of the curve. This approach scales appropriately up and down to
both huge and tiny values.
This patch also updates the existing `compare_ff_relative()` function
to use the new robust ulps code for the ulps-based part of its
comparison, resolving an issue documented in its unit tests where
it behaved poorly for values close to zero.
Pull Request: https://projects.blender.org/blender/blender/pulls/110796
This formats code that is disabled using `#if 0`. Formatting was achieved
by temporarily changing `#if 0` to `#if 1 /*something*/`, then formatting,
and then changing it back to `#if 0`.
There's quite a few libraries that depend on dna_type_offsets.h
but had gotten to it by just adding the folder that contains it to
their includes INC section without declaring a dependency to
bf_dna in the LIB section.
which occasionally lead to the lib building before bf_dna and the
header being missing, while this generally gets fixed in CMake by
adding bf_dna to the LIB section of the lib, however until last
week all libraries in the LIB section were linked as INTERFACE so
adding it in there did not resolve the build issue.
To make things still build, we sprinkled add_dependencies wherever
we needed it to force a build order.
This diff :
Declares public include folders for the bf_dna target so there's
no more fudging the INC section required to get to them.
Removes all dna related paths from the INC section for all
libraries.
Adds an alias target bf:dna to signify it has been updated to
modern cmake
Declares a dependency on bf::dna for all libraries that require it
Removes (almost) all calls to add_dependencies for bf_dna
Future work:
Because of the manual dependency management that was done, there is
now some "clutter" with libs depending on bf_dna that realistically
don't. Example bf_intern_opencolorio itself has no dependency on
bf_dna at all, doesn't need it, doesn't use it. However the
dna include folder had been added to it in the past since bf_blenlib
uses dna headers in some of its public headers and
bf_intern_opencolorio does use those blenlib headers.
Given bf_blenlib now correctly declares the dependency on bf_dna
as public bf_intern_opencolorio will get the dna header directory
automatically from CMake, hence some cleanup could be done for
bf_intern_opencolorio
Because 99% of the changes in this diff have been automated, this diff
does not seek to address these issues as there is no easy way to
determine why a certain dependency is in place. A developer will have
to make a pass a this at some later point in time. As I'd rather not
mix automated and manual labour.
There are a few libraries that could not be automatically processed
(ie bf_blendthumb) that also will need this manual look-over.
Pull Request: https://projects.blender.org/blender/blender/pulls/109835
This introduces an alias target `bf::intern::atomic` for
`bf_intern_atomic`. This has the following benefits:
- Any target name with `::` in it will be recognized as an actual
target by cmake, rather than a library name it may not know about.
and will be validated by cmake to exist. Which means if you make
a typo in the LIB section, CMake will error out telling you it
doesn't know about this specific target rather than passing it on
to the build system, where you'll either get build or linker errors
because of said typo.
- Given there is quite a cleanup still to do in the build system,
it won't always be obvious which targets have been updated to
modern targets and which still need to be done. Having a namespaced
target name is a good indicator there.
Pull Request: https://projects.blender.org/blender/blender/pulls/109784
String search & replace is a higher level function (unlike BLI_string.h)
which handlers lower level replacements for printing and string copying.
Also use BLI_string_* prefix (matching other utilities).
This makes it possible to use BLI_string in Blender's internal utilities
without depending on DynStr, MemArena... etc.
Is not visible on any of the officially platforms, as everywhere
SSE2 is available (on Apple Silicon via sse2neon).
Only got noticed by some intermittent issue during development
which made BLI_HAVE_SSE2 unaccessible.
Seems that transpose was done a bit wrong. Not sure if worth trying
to fold the equation into C++ types, as that requires extra memory
transfers for transpose. Opted for a more naive folding, which
avoids extra copies.
Added a regression test for it, verified against numpy, the BLI
SSE2 implementation.
This is the minimal change required to start using modern CMake in the
blender build system. This change is designed to allow small
incremental changes to the build system rather than doing it in one
big bang which would be unmaintainable (for me)
The biggest functional change is, previously all libraries in the
`LIB` section of a `blender_add_lib` call had the `INTERFACE` scope,
which is rarely, if ever the correct scope. This diff changes this to
`PRIVATE`
Concrete implications of this diff :
The `LIB`, `INC` and `INC_SYS` sections of an `blender_add_lib` call
now allow scoping keywords (`PUBLIC`, `PRIVATE,` `INTERFACE`) to
declare the scope of the dependency.
Right now the only library using any modern cmake is
`bf_intern_atomic` which is an header only interface library that will
just advertise its include directories.
This allows us to clean up any `CMakeLists.txt` that adds
`../../../intern/atomic` to its `INC` section to remove it in `INC` by
adding a `PRIVATE bf_intern_atomic` to the `LIB` section.
Pull Request: https://projects.blender.org/blender/blender/pulls/107858
- When string join truncates, break out of the outer loop too.
- Use BLI_string_len_array when calculating the array size for
BLI_string_join_array_by_sep_charN.
- Add tests.
This includes square root and reciprocal, and their safe versions.
For the reciprocal use name rcp, which matches Cycles and allows
to implement the same function for per-element operation on matrices.
Pull Request: https://projects.blender.org/blender/blender/pulls/108705
IndexMask::complement() is often used in geometry processing
algorithms when a selection needs to be inverted, mostly just in
curves code so far.
Instead of reusing `from_predicate` and lookup in the source mask,
scan the mask once, inserting segments between the original indices.
Theoretically this improves the performance from O(N*log(N)) to O(N).
But with the small constant offset of the former, the improvement is
generally just 3-4 times faster. However in some cases like empty
and full masks, the new code takes constant time.

Pull Request: https://projects.blender.org/blender/blender/pulls/108331
A lot of files were missing copyright field in the header and
the Blender Foundation contributed to them in a sense of bug
fixing and general maintenance.
This change makes it explicit that those files are at least
partially copyrighted by the Blender Foundation.
Note that this does not make it so the Blender Foundation is
the only holder of the copyright in those files, and developers
who do not have a signed contract with the foundation still
hold the copyright as well.
Another aspect of this change is using SPDX format for the
header. We already used it for the license specification,
and now we state it for the copyright as well, following the
FAQ:
https://reuse.software/faq/
Goals of this refactor:
* Reduce memory consumption of `IndexMask`. The old `IndexMask` uses an
`int64_t` for each index which is more than necessary in pretty much all
practical cases currently. Using `int32_t` might still become limiting
in the future in case we use this to index e.g. byte buffers larger than
a few gigabytes. We also don't want to template `IndexMask`, because
that would cause a split in the "ecosystem", or everything would have to
be implemented twice or templated.
* Allow for more multi-threading. The old `IndexMask` contains a single
array. This is generally good but has the problem that it is hard to fill
from multiple-threads when the final size is not known from the beginning.
This is commonly the case when e.g. converting an array of bool to an
index mask. Currently, this kind of code only runs on a single thread.
* Allow for efficient set operations like join, intersect and difference.
It should be possible to multi-thread those operations.
* It should be possible to iterate over an `IndexMask` very efficiently.
The most important part of that is to avoid all memory access when iterating
over continuous ranges. For some core nodes (e.g. math nodes), we generate
optimized code for the cases of irregular index masks and simple index ranges.
To achieve these goals, a few compromises had to made:
* Slicing of the mask (at specific indices) and random element access is
`O(log #indices)` now, but with a low constant factor. It should be possible
to split a mask into n approximately equally sized parts in `O(n)` though,
making the time per split `O(1)`.
* Using range-based for loops does not work well when iterating over a nested
data structure like the new `IndexMask`. Therefor, `foreach_*` functions with
callbacks have to be used. To avoid extra code complexity at the call site,
the `foreach_*` methods support multi-threading out of the box.
The new data structure splits an `IndexMask` into an arbitrary number of ordered
`IndexMaskSegment`. Each segment can contain at most `2^14 = 16384` indices. The
indices within a segment are stored as `int16_t`. Each segment has an additional
`int64_t` offset which allows storing arbitrary `int64_t` indices. This approach
has the main benefits that segments can be processed/constructed individually on
multiple threads without a serial bottleneck. Also it reduces the memory
requirements significantly.
For more details see comments in `BLI_index_mask.hh`.
I did a few tests to verify that the data structure generally improves
performance and does not cause regressions:
* Our field evaluation benchmarks take about as much as before. This is to be
expected because we already made sure that e.g. add node evaluation is
vectorized. The important thing here is to check that changes to the way we
iterate over the indices still allows for auto-vectorization.
* Memory usage by a mask is about 1/4 of what it was before in the average case.
That's mainly caused by the switch from `int64_t` to `int16_t` for indices.
In the worst case, the memory requirements can be larger when there are many
indices that are very far away. However, when they are far away from each other,
that indicates that there aren't many indices in total. In common cases, memory
usage can be way lower than 1/4 of before, because sub-ranges use static memory.
* For some more specific numbers I benchmarked `IndexMask::from_bools` in
`index_mask_from_selection` on 10.000.000 elements at various probabilities for
`true` at every index:
```
Probability Old New
0 4.6 ms 0.8 ms
0.001 5.1 ms 1.3 ms
0.2 8.4 ms 1.8 ms
0.5 15.3 ms 3.0 ms
0.8 20.1 ms 3.0 ms
0.999 25.1 ms 1.7 ms
1 13.5 ms 1.1 ms
```
Pull Request: https://projects.blender.org/blender/blender/pulls/104629
Covers the macro ARRAY_SIZE() and STRNCPY.
The problem this change is aimed to solve it to provide cross-platform
compiler-independent safe way pf ensuring that the functions are used
correctly.
The type safety was only ensured for GCC and only for C. The C++
language and Clang compiler would not have detected issues of passing
bare pointer to neither of those macros.
Now the STRNCPY() will only accept a bounded array as the destination
argument, on any compiler.
The ARRAY_SIZE as well, but there are a bit more complications to it
in terms of transparency of the change.
In one place the ARRAY_SIZE was used on float3 type. This worked in the
old code because the type implements subscript operator, and the type
consists of 3 floats. One would argue this is somewhat hidden/implicit
behavior, which better be avoided. So an in-lined value of 3 is used now
there.
Another place is the ARRAY_SIZE used to define a bounded array of the
size which matches bounded array which is a member of a struct. While
the ARRAY_SIZE provides proper size in this case, the compiler does not
believe that the value is known at compile time and errors out with a
message that construction of variable-size arrays is not supported.
Solved by converting the field to std::array<> and adding dedicated
utility to get size of std::array at compile time. There might be a
better way of achieving the same result, or maybe the approach is
fine and just need to find a better place for such utility.
Surely, more macro from the BLI_string.h can be covered with the C++
inlined functions, but need to start somewhere.
There are also quite some changes to ensure the C linkage is not
enforced by code which includes the headers.
Pull Request: https://projects.blender.org/blender/blender/pulls/108041
Add modulo function `mod_f_positive(f, n)` that returns a positive result,
regardless of the sign of `f`.
For example, `mod_f_positive(-0.1, 1.0)` returns `0.9`, whereas the
standard `fmodf()` function would return `-0.1`.
This is useful for rewrapping values to a specific interval.