Previously, passing a falsy `std::function` (i.e. an empty one) or a null
function pointer resulted in a `FunctionRef` that was truthy and thus
appeared callable. Now, when the passed-in callable is falsy, the resulting
`FunctionRef` is too.
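A minimal sketch of the new behavior:
```cpp
#include <functional>

#include "BLI_function_ref.hh"

void example()
{
  std::function<void()> empty_fn; /* Falsy: holds no callable. */
  blender::FunctionRef<void()> ref = empty_fn;
  if (ref) {
    /* Previously this branch was taken even though calling `ref` was invalid.
     * With this change, an empty `std::function` yields a falsy `FunctionRef`,
     * so the call below is skipped. */
    ref();
  }
}
```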
Pull Request: https://projects.blender.org/blender/blender/pulls/128823
The paths `C:` (on WIN32) and `/` (on other systems) were detected as
relative. This meant `BLI_path_abs_from_cwd` would make both paths
relative to the CWD. Also remove a duplicate call to `BLI_path_is_unc`.
In addition to float<->half functions to convert one number (#127708), add
float_to_half_array and half_to_float_array functions:
- On x64, this uses an SSE2 4-wide implementation to do the conversion
(2x faster half->float, 4x faster float->half compared to scalar).
- There's also an AVX2 code path that uses the hardware F16C instructions
(8-wide), to be used when/if the Blender codebase starts to be built
for AVX2 (it is not yet today).
- On arm64, this uses NEON VCVT instructions to do the conversion.
Use these functions in the Vulkan buffer/texture conversion code. Time taken to
convert a float->half texture while viewing an EXR file in the image space (22M
numbers to convert): 39.7ms -> 10.1ms (it would be 6.9ms if building for AVX2).
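A hypothetical usage sketch (the namespace and exact signature are assumptions,
not copied from the patch):
```cpp
#include <cstdint>
#include <vector>

#include "BLI_math_half.hh"

void convert_texture(const std::vector<float> &src)
{
  std::vector<uint16_t> dst(src.size());
  /* Converts the whole buffer at once, letting the implementation pick the
   * SSE2/NEON (or, in the future, AVX2/F16C) wide code path internally. */
  blender::math::float_to_half_array(src.data(), dst.data(), src.size());
}
```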
Pull Request: https://projects.blender.org/blender/blender/pulls/127838
The Blender codebase had two ways to convert half (FP16) to float (FP32):
- BLI_math_bits.h half_to_float. Out of the 64k possible half values, it converts
4096 of them incorrectly. Mostly denormals and NaNs, which is perhaps not too
relevant. But more importantly, it converts half zero to the float 0.000030517578,
which is not ideal.
- Functions in Vulkan vk_data_conversion.hh. These convert 2046 possible
half values incorrectly.
The function to convert float (FP32) to half (FP16) was in Vulkan
vk_data_conversion.hh, and it got a bunch of possible inputs wrong, presumably
because it did not do the proper "round to nearest even" that CPU/GPU hardware does.
This PR:
- Adds BLI_math_half.hh with float_to_half and half_to_float functions.
- Adds documentation and test coverage.
- When compiling on ARM NEON, uses hardware VCVT instructions.
- Removes the incorrect half_to_float from BLI_math_bits.h and replaces the single
usage of it in View3D color picking with the new function.
- Changes the Vulkan FP32<->FP16 conversion code to use the new functions, fixing
correctness issues (makes the eevee_next_bsdf_vulkan test pass). This makes it
faster too.
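For context, a scalar reference of what a correct half -> float conversion
involves (my own illustration, not the actual BLI_math_half.hh code):
```cpp
#include <cstdint>
#include <cstring>

/* Reference scalar half -> float conversion. Half layout: 1 sign bit,
 * 5 exponent bits (bias 15), 10 mantissa bits. */
static float half_to_float_reference(const uint16_t h)
{
  const uint32_t sign = uint32_t(h >> 15) << 31;
  const uint32_t exponent = (h >> 10) & 0x1f;
  const uint32_t mantissa = h & 0x3ff;
  uint32_t bits;
  if (exponent == 0) {
    if (mantissa == 0) {
      bits = sign; /* Signed zero maps to signed zero (the old code got this wrong). */
    }
    else {
      /* Half denormal: the value is mantissa * 2^-24, renormalized by float math. */
      float value = float(mantissa) * (1.0f / 16777216.0f);
      std::memcpy(&bits, &value, sizeof(bits));
      bits |= sign;
    }
  }
  else if (exponent == 31) {
    bits = sign | 0x7f800000u | (mantissa << 13); /* Infinity or NaN. */
  }
  else {
    /* Normal number: re-bias the exponent (15 -> 127) and widen the mantissa. */
    bits = sign | ((exponent + 112u) << 23) | (mantissa << 13);
  }
  float result;
  std::memcpy(&result, &bits, sizeof(result));
  return result;
}
```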
Pull Request: https://projects.blender.org/blender/blender/pulls/127708
I've done this a few times and would have benefited from a utility
function for it; apparently it's done in a few more places too. The
utilities aren't multithreaded for now: it doesn't seem important,
and multithreading often happens at a different level of the call
stack anyway.
Pull Request: https://projects.blender.org/blender/blender/pulls/127517
This patch optimizes `IndexMask::from_bits` by making use of the fact that many
bits can be processed at once and one does not have to look at every bit
individually in many cases. Bits are stored as an array of `BitInt` (aka
`uint64_t`), so we can process at least 64 bits at a time. On some platforms we
can also make use of SIMD and process up to 128 bits at once. This can
significantly improve performance if all bits are set/unset.
As a byproduct, this patch also optimizes `IndexMask::from_bools` which is now
implemented in terms of `IndexMask::from_bits`. The conversion from bools to
bits has been optimized significantly too by using SIMD intrinsics.
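A simplified sketch of the word-at-a-time idea (illustrative only, not the
actual implementation):
```cpp
#include <cstdint>
#include <vector>

struct Range {
  int64_t start;
  int64_t size;
};

/* 64-bit words that are all zeros or all ones are handled without looking at
 * individual bits; only mixed words need a per-bit loop. */
void ranges_from_bits(const uint64_t *words, const int64_t words_num, std::vector<Range> &r_ranges)
{
  for (int64_t i = 0; i < words_num; i++) {
    const uint64_t word = words[i];
    const int64_t offset = i * 64;
    if (word == 0) {
      continue; /* No set bits in this word. */
    }
    if (word == ~uint64_t(0)) {
      r_ranges.push_back({offset, 64}); /* Entire word is set: one range, no bit loop. */
      continue;
    }
    for (int bit = 0; bit < 64; bit++) {
      if (word & (uint64_t(1) << bit)) {
        r_ranges.push_back({offset + bit, 1});
      }
    }
  }
}
```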
Pull Request: https://projects.blender.org/blender/blender/pulls/126888
This changes how the lazy-loading and unloading of volume grids works. With that
it should also fix #124164.
The cache is now moved to a deeper and more global level. This allows reloadable
volume grids to be unloaded automatically when a memory limit is reached. The
previous system for automatically unloading grids only worked in fairly specific
cases and also did not work all that well with caching (parts of) volume
sequences.
At its core, this patch adds a general cache system in `BLI_memory_cache.hh`. It
has a simple interface of the form `get(key, compute_if_not_cached_fn) ->
value`. To avoid growing the cache indefinitely, it uses the new
`BLI_memory_counter.hh` API to detect when the cache size limit is reached. In
this case it can automatically free some cached values. Currently, this uses an
LRU system, where the items that have not been used in a while are removed
first. Other heuristics could be implemented too, but especially for caches that
load files from disk this works well already.
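A hypothetical usage sketch of that interface (the function name, key type and
loader below are illustrative placeholders, not the actual API):
```cpp
/* Hypothetical usage of the `get(key, compute_if_not_cached_fn)` interface.
 * `memory_cache::get`, `FileLoadKey` and `load_grid_from_disk` are illustrative
 * placeholders, not the exact names in BLI_memory_cache.hh. */
auto grid = blender::memory_cache::get(FileLoadKey(file_path), [&]() {
  /* Only invoked when the value is not cached (or was freed again to stay under
   * the memory limit). The result is stored in the cache with LRU eviction. */
  return load_grid_from_disk(file_path);
});
```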
The new memory cache is used internally by `volume_grid_file_cache.cc` for
loading individual volume grids and their simplified variants. It could
potentially also be used to cache which grids are stored in a file.
Additionally, it could serve as a caching layer in more places, like loading
bakes or the import geometry nodes. It's not clear yet whether this will need an
extension to the API, which is currently fairly minimal.
To allow different systems to use the same memory cache, it has to support
arbitrary identifiers for the cached data. Therefore, this patch also introduces
`GenericKey`, which is an abstract base class for any kind of key that is
comparable, hashable and copyable.
The implementation of the cache currently relies on a new `ConcurrentMap`
data-structure which is a thin wrapper around `tbb::concurrent_hash_map` with a
fallback implementation for when `tbb` is not available. This data structure
allows concurrent reads and writes to the cache. Note that adding data to the
cache is still serialized because of the memory counting.
The size of the cache depends on the `memory_cache_limit` property that's
already shown in the user preferences. While it has a generic name, it's
currently only used by the VSE, which uses the `MEM_CacheLimiter` API. That API
has a similar purpose but seems to be less automatic, less thread-safe, and
unaware of implicit sharing. It also seems to be designed in a way where one is
expected to create multiple "cache limiters", each of which has its own limit.
Longer term, we should probably strive towards unifying these systems, which
seems feasible but is a bit out of scope right now. While it's not ideal that
these cache systems don't share a memory limit, that's essentially what we
already have for all cache systems in Blender, so it's nothing new.
Some tests for lazy-loading had to be removed because this behavior is more
implicit now and is not as easily observable from the outside.
Pull Request: https://projects.blender.org/blender/blender/pulls/126411
This introduces `MemoryCount`, which can be used across multiple
`MemoryCounter` instances. Generally, a `MemoryCount` is expected to live
longer (e.g. over the entire lifetime of a cache), while a `MemoryCounter`
is expected to only exist while the memory is actually being counted.
We often have the situation where it would be good if we could easily estimate
the memory usage of some value (e.g. a mesh, or volume). Examples of where we
ran into this in the past:
* Undo step size.
* Caching of volume grids.
* Caching of loaded geometries for import geometry nodes.
Generally, most caching systems would benefit from knowing how much memory they
currently use, to make better decisions about which data to free and when. The
goal of this patch is to introduce a simple, general API to count memory usage
that is independent of any specific caching system. I'm doing this to "fix" the
chicken-and-egg problem that caches need to know the memory usage, but we don't
really need to count memory usage until it is used for caches. Implementing
caching and memory counting at the same time makes both harder than
implementing them one after another.
The main difficulty with counting memory usage is that some memory may be shared
using implicit sharing. We want to avoid double counting such memory. How
exactly shared memory is treated depends a bit on the use case, so no specific
assumptions are made about that in the API. The gathered memory usage is not
expected to be exact. It's expected to be a decent approximation. It's neither a
lower nor an upper bound unless specified by some specific type. Cache systems
generally build on top of heuristics to decide when to free what anyway.
There are two sides to this API:
1. Get the amount of memory used by one or more values. This side is used by
caching systems and/or systems that want to present the used memory to the
user.
2. Tell the caller how much memory is used. This side is used by all kinds of
types that can report their memory usage such as meshes.
```cpp
/* Get how much memory is used by two meshes together. */
MemoryCounter memory;
mesh_a->count_memory(memory);
mesh_b->count_memory(memory);
int64_t bytes_used = memory.counted_bytes();

/* Tell the caller how much memory is used. */
void Mesh::count_memory(blender::MemoryCounter &memory) const
{
  memory.add_shared(this->runtime->face_offsets_sharing_info,
                    this->face_offsets().size_in_bytes());
  /* Forward memory counting to lower level types. This should be fairly common. */
  CustomData_count_memory(this->vert_data, this->verts_num, memory);
}

void CustomData_count_memory(const CustomData &data,
                             const int totelem,
                             blender::MemoryCounter &memory)
{
  for (const CustomDataLayer &layer : Span{data.layers, data.totlayer}) {
    memory.add_shared(layer.sharing_info, [&](blender::MemoryCounter &shared_memory) {
      /* Not quite correct for all types, but this is only a rough approximation anyway. */
      const int64_t elem_size = CustomData_get_elem_size(&layer);
      shared_memory.add(totelem * elem_size);
    });
  }
}
```
Pull Request: https://projects.blender.org/blender/blender/pulls/126295
Covers OS detection, CPU architecture, bitness, and compiler family.
The goal of this change is to provide easier-to-use and easier-to-remember
checks for these things. For example, with this change code like
```
#ifdef _WIN32
...
#elif defined(__APPLE__) || defined(__FreeBSD__) || defined(__NetBSD__) || \
defined(__OpenBSD__)
..
#endif
```
becomes
```
#if OS_WIN
...
#elif OS_MAC || OS_BSD
...
#endif
```
The code is originally based on build_config.h from Chromium, which was
first modified for Libmv, then adapted to some other projects, and is now
adopted for Blender itself.
The checks rely on `-Wundef` to provide a hint when an include is missing
prior to the platform-specific checks.
This change only introduces the possibility of cleaner checks and does not
start the actual refactor.
Pull Request: https://projects.blender.org/blender/blender/pulls/118908
Previously, 4e9e44ad made changes to allow GSpan and GMutableSpan to not
have a type when they are empty. This commit mirrors that change in the
conversion from GArray to both span types, and when converting to an
actual typed Span<> or MutableSpan<> via typed().
Fixes #125013
Pull Request: https://projects.blender.org/blender/blender/pulls/125018
Reference identifiers instead of "above" in code comments, as these
tend to become outdated. Even when declarations are removed, it's at
least clear that the reference no longer exists instead of referring to
whatever currently happens to be above the declaration.
It's also straightforward to search history for a removed identifier.
Corrected 4 cases of references to things that were no longer above
the doc-strings. Noticed other references which look to be incorrect
but need further investigation.
Fix conversion of the identity quaternion to axis-angle. Basically,
if the length of the imaginary part (the vector component) is zero, it is
incorrect to normalize it; the simple identity rotation should be returned instead.
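An illustrative sketch of the fixed logic (not the actual Blender function):
```cpp
#include <cmath>

struct AxisAngle {
  float axis[3];
  float angle;
};

/* Convert a unit quaternion (w, x, y, z) to axis-angle, returning the identity
 * rotation when the imaginary part has zero length instead of normalizing it. */
AxisAngle quat_to_axis_angle(const float q[4])
{
  const float im_len = std::sqrt(q[1] * q[1] + q[2] * q[2] + q[3] * q[3]);
  if (im_len == 0.0f) {
    /* Identity quaternion: any unit axis works, with a zero angle. */
    return {{0.0f, 1.0f, 0.0f}, 0.0f};
  }
  const float angle = 2.0f * std::atan2(im_len, q[0]);
  return {{q[1] / im_len, q[2] / im_len, q[3] / im_len}, angle};
}
```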
Pull Request: https://projects.blender.org/blender/blender/pulls/119762
All official Blender platforms use the SIMD code path, with pow() approximations
for 2.4 and 1/2.4 powers. The non-SIMD code path would only be used on other
platforms like PowerPC etc. Make that fallback scalar code path use the same
math approximation for consistency. This is part of #121312.
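For reference, the standard (non-approximated) scalar sRGB -> linear transfer
function that both code paths approximate:
```cpp
#include <cmath>

/* Standard sRGB -> linear transfer function (reference definition, not the
 * approximated version used by the SIMD/scalar code paths). */
float srgb_to_linear(const float c)
{
  if (c < 0.04045f) {
    return (c < 0.0f) ? 0.0f : c * (1.0f / 12.92f);
  }
  return std::pow((c + 0.055f) * (1.0f / 1.055f), 2.4f);
}
```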
This also makes the srgb_to_linearrgb_v3_v3 and linearrgb_to_srgb_v3_v3 functions
non-inlined. They are 50-100 CPU instructions, and thus hardly good candidates
for forced inlining into each and every call site.
Also, _bli_math_blend_sse now uses an actual SSE4 blend instruction instead of
doing it in a roundabout way.
Pull Request: https://projects.blender.org/blender/blender/pulls/121368
Factors the utility function "does the path contain any hidden file/folder
components?" out of the innards of the UI code (edit_file,
is_hidden_dot_filename) into a BLI function, BLI_path_has_hidden_component,
and then:
- Adds unit test coverage to it, which uncovered some inconsistencies.
- Fixes the behavior inconsistencies:
- A path component that is just a dot (.) was not considered hidden,
unless it was the first folder component (now this is fixed).
- A path component that ended in a tilde (~) was considered hidden,
unless it was the first folder component (now this is fixed).
- Speeds up the function by not doing several recursive scans over the
string; instead all the logic is done in a single string scan (see the
sketch below).
Synthetic HasHiddenComponents_Performance test: 6.0s -> 1.1s for 50M calls
on Mac M1 Max.
A more real-world test (setup as in #120494): out of the whole build_catalog_tree
time, the time taken by BLI_path_has_hidden_component drops from 37% down
to 28%.
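A simplified sketch of the single-scan approach (illustrative, not the actual
BLI_path_has_hidden_component code; the exact rules for dot and tilde
components follow the list above):
```cpp
/* Walk the path once and test each component as soon as it ends, instead of
 * repeatedly re-scanning the string. The hidden rules (leading dot, trailing
 * tilde) are simplified here. */
bool path_has_hidden_component(const char *path)
{
  const char *component = path;
  for (const char *p = path;; p++) {
    const bool at_separator = (*p == '/' || *p == '\\' || *p == '\0');
    if (at_separator) {
      if (p > component) {
        if (component[0] == '.' || p[-1] == '~') {
          return true; /* Hidden component found, no need to scan further. */
        }
      }
      if (*p == '\0') {
        return false;
      }
      component = p + 1;
    }
  }
}
```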
Pull Request: https://projects.blender.org/blender/blender/pulls/120541
Previously, this conversion would often result in invalid quaternions or
hit an assert in `normalized_to_quat_fast`. It's not ideal performance-wise
to convert to euler as an intermediate step, but it seems to be
the easiest solution for now. Extracting rotations from matrices shouldn't
be done all that often anyway.
Pull Request: https://projects.blender.org/blender/blender/pulls/120568
The `IndexMask` class already had a static function `from_union`.
This adds two new functions `from_difference` and `from_intersection`
as well as tests for each of them.
It also uses `from_intersection` in two grease pencil utility functions.
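A hypothetical usage sketch (the argument order of the new functions is an
assumption):
```cpp
#include "BLI_index_mask.hh"

void example()
{
  using namespace blender;
  IndexMaskMemory memory;
  const IndexMask a = IndexMask::from_indices<int>({1, 2, 3, 4}, memory);
  const IndexMask b = IndexMask::from_indices<int>({3, 4, 5}, memory);
  /* Indices that are in `a` but not in `b`: {1, 2}. */
  const IndexMask difference = IndexMask::from_difference(a, b, memory);
  /* Indices that are in both `a` and `b`: {3, 4}. */
  const IndexMask intersection = IndexMask::from_intersection(a, b, memory);
  (void)difference;
  (void)intersection;
}
```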
Pull Request: https://projects.blender.org/blender/blender/pulls/120419
Previously, the hull's edges were simply iterated over, causing the
rotating calipers to step over points 4x as many times as needed.
Avoid this by adding angle-stepping logic that maps all angles to a
single quadrant, reducing the checks needed to advance the calipers
to each new angle. This gives a ~1.4x speedup to the AABB fitting logic.
Also add a test for octagon shapes to ensure axis-aligned edges work
as expected.
Begin by testing the edge between indices [0, 1]
instead of [last, 0]. This only ever makes a difference as a tie breaker,
where [0, 1] is now prioritized.
This minor change simplifies further optimizations.
This is intended to be used in the new exact mesh boolean algorithm by @howardt.
The new `BLI_fixed_width_int.hh` header provides types like `Int256` and
`UInt256` which are like e.g. `uint64_t` but with higher precision. The code
supports many different integer sizes.
The following operations are supported:
* Addition
* Subtraction
* Multiplication
* Comparisons
* Negation
* Conversion to and from other number types
* Conversion to and from string (based on `GMP`)
Division is not implemented. It could be implemented, but it's more complex and
is not required for the new mesh boolean algorithm.
Some alternatives to having a custom implementation have been discussed in
https://devtalk.blender.org/t/fixed-length-multiprecision-arithmetic/29189/.
Generally, the implementation is fairly straightforward. The main complexity is
the addition/multiplication algorithm, which isn't too complicated either. It's
nice to have control over this part, as it allows us to optimize the code more
if necessary. Also, from what I understand, we might be able to benefit from
some special cases like multiplying a large integer with a smaller one.
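As an illustration of that core part, a simplified multi-limb addition with
carry propagation (my own sketch, not the actual BLI_fixed_width_int.hh code):
```cpp
#include <array>
#include <cstdint>

/* Multi-limb addition with carry propagation, limbs stored least significant
 * first. Overflow past the most significant limb wraps around, like the
 * built-in unsigned types. */
template<int LimbsNum>
std::array<uint64_t, LimbsNum> add_limbs(const std::array<uint64_t, LimbsNum> &a,
                                         const std::array<uint64_t, LimbsNum> &b)
{
  std::array<uint64_t, LimbsNum> result;
  uint64_t carry = 0;
  for (int i = 0; i < LimbsNum; i++) {
    const uint64_t partial = a[i] + b[i];
    const uint64_t overflow_1 = uint64_t(partial < a[i]);
    result[i] = partial + carry;
    const uint64_t overflow_2 = uint64_t(result[i] < partial);
    /* At most one of the two additions can overflow per limb. */
    carry = overflow_1 | overflow_2;
  }
  return result;
}
```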
I tried a few different ways to optimize this already, but so far the normal
compiler optimization turned out to work best. I'm not sure whether the same is
true on Windows though, as it doesn't have native support for an `int128`,
which helps the compiler understand what I'm doing. Alternatives I tried so far
are using intrinsics directly (mainly `_addcarry_u64` and similar), writing
inline assembly manually, and copying the assembly output from the compiler. I
assume the assembly implementation didn't help because it prohibited other
compiler optimizations.
Pull Request: https://projects.blender.org/blender/blender/pulls/119528
This patch adds clamped-boundary variants of the nearest interpolation
functions in the BLI module. The naming convention used by the bilinear
functions was followed.
Needed by #119414.
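An illustrative sketch of what "clamped boundaries" means for nearest
interpolation (not the actual BLI function or its signature):
```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>

/* Out-of-range sample positions are clamped to the nearest edge texel,
 * instead of wrapping around or returning zero. */
float nearest_interpolation_clamped(const float *buffer,
                                    const int width,
                                    const int height,
                                    const float u,
                                    const float v)
{
  const int x = std::clamp(int(std::floor(u)), 0, width - 1);
  const int y = std::clamp(int(std::floor(v)), 0, height - 1);
  return buffer[int64_t(y) * width + x];
}
```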
Pull Request: https://projects.blender.org/blender/blender/pulls/119732
The `IndexMask` data structure was designed to allow us to implement set
operations like `union`, `intersection` and `difference` efficiently
(2cfcb8b0b8). This patch adds an evaluator for
arbitrary expressions involving the mentioned operations. The evaluator makes
use of the design of the `IndexMask` data structure to be quite efficient.
In some common cases, the evaluator runs in constant time, so it's very fast
even if the mask contains many millions of indices. If possible, the evaluator
works on entire segments at once instead of looking at the individual indices.
This results in a very low constant factor even when the evaluation time is
linear. If the evaluator has to look at the individual indices to be able to
perform the operation, it can make use of multi-threading.
The evaluation consists of the following steps:
1. A coarse evaluation that looks at entire segments at once.
2. All segments that couldn't be fully evaluated by the coarse evaluation are
evaluated exactly by looking at the actual indices. There are two evaluators
for this case: one is based on `std::set_union` etc.; the other first
converts the index masks to bit spans, then does bit operations to evaluate
the expression, and then converts the bits back into indices. Depending on
the expression, one or the other can be more efficient.
3. Construct an index mask from the evaluated segments.
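A simplified sketch of that bit-based fallback (my own illustration, assuming
segments hold at most 16384 indices; not the actual evaluator code):
```cpp
#include <cstdint>
#include <vector>

/* Indices within one segment become a bit mask, the set operation turns into
 * one bitwise instruction per 64-bit word, and the result bits are converted
 * back into indices. */
std::vector<int16_t> intersect_segment(const std::vector<int16_t> &a,
                                       const std::vector<int16_t> &b)
{
  constexpr int words_num = 16384 / 64;
  uint64_t bits_a[words_num] = {};
  uint64_t bits_b[words_num] = {};
  for (const int16_t index : a) {
    bits_a[index >> 6] |= uint64_t(1) << (index & 63);
  }
  for (const int16_t index : b) {
    bits_b[index >> 6] |= uint64_t(1) << (index & 63);
  }
  std::vector<int16_t> result;
  for (int word = 0; word < words_num; word++) {
    const uint64_t bits = bits_a[word] & bits_b[word]; /* The intersection. */
    for (int bit = 0; bit < 64; bit++) {
      if (bits & (uint64_t(1) << bit)) {
        result.push_back(int16_t(word * 64 + bit));
      }
    }
  }
  return result;
}
```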
Showing the performance of the evaluator is kind of difficult because it highly
depends on the input data. Comparing the performance to something that does not
short-circuit when there are full ranges is meaningless, because one can
construct an example where the new evaluator is arbitrarily faster. I'm still
working on a case where performance can be compared to e.g. using
`std::set_union`. Such a comparison is only fair when the input data is
constructed so that the new evaluator can't short-circuit.
One of the main remaining bottlenecks is the calls to `slice_content` on large
index masks. I think the impact of those can still be reduced.
We are not using this evaluator much yet, except through `IndexMask::complement`
calls. I intend to use it when I get to refactoring the field evaluator for
geometry nodes to optimize the evaluation of selections.
Pull Request: https://projects.blender.org/blender/blender/pulls/117805
This adds a new special-purpose container data structure that can be
used to gather many elements into many (potentially small) lists efficiently.
I originally worked on this data structure because I might want to use it
in #118772. However, it's also already useful in the geometry nodes logger.
I'm measuring a 10-20% speed improvement in my many-math-nodes file
when I enable logging for all sockets (not just the ones that are currently visible).
Pull Request: https://projects.blender.org/blender/blender/pulls/118774
Unless you're very familiar with `IndexRange`, it's often hard to know what
e.g. `IndexRange(10, 15)` means. Without more context, one could think
that it means `10-14`, `10-15` or `10-24`. This patch adds named constructors
to `IndexRange` to make the behavior more obvious when writing and when
reading the code. With those one can use `IndexRange::from_begin_end(10, 15)`,
`IndexRange::from_begin_end_inclusive(10, 15)` or `IndexRange::from_begin_size(10, 15)`
respectively. While being a bit more verbose, the explicitness makes code easier to
understand and also allows abstracting away some common index computations.
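For example, as described above:
```cpp
#include "BLI_index_range.hh"

using blender::IndexRange;

/* Each of these spells out which range is meant: */
const IndexRange a = IndexRange::from_begin_end(10, 15);           /* Indices 10-14. */
const IndexRange b = IndexRange::from_begin_end_inclusive(10, 15); /* Indices 10-15. */
const IndexRange c = IndexRange::from_begin_size(10, 15);          /* Indices 10-24. */
```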
The old unnamed constructor that takes a begin and size is not removed by this patch,
as that would make the patch significantly bigger. I think it's reasonable to generally
use the named constructors going forward and to change the existing usages of the
old constructor over time.
Pull Request: https://projects.blender.org/blender/blender/pulls/118606
Add some comments that clarify that `StringRefNull` can be compared with
other `StringRefNull` and with `StringRef` instances.
Add unit tests that cover these cases.
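A small usage sketch of the documented comparisons:
```cpp
#include "BLI_string_ref.hh"

void example()
{
  using blender::StringRef;
  using blender::StringRefNull;

  const StringRefNull a = "hello";
  const StringRefNull b = "hello";
  const StringRef c = "hello";

  /* StringRefNull compared with StringRefNull. */
  const bool same_null = (a == b);
  /* StringRefNull compared with StringRef. */
  const bool same_mixed = (a == c);
  (void)same_null;
  (void)same_mixed;
}
```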
Pull Request: https://projects.blender.org/blender/blender/pulls/118515