This adds three new functions to find the first 0 or 1 bit in an arbitrarily
long bit span:
```cpp
blender::bits::find_first_0_index(BitSpan) -> std::optional<int64_t>
blender::bits::find_first_1_index(BitSpan) -> std::optional<int64_t>
blender::bits::find_first_1_index_expr(Expr, BitSpans...) -> std::optional<int64_t>
```
The two first ones are implemented in terms of the third. The `*_expr` variant
allows e.g. finding the first set bit when ORing two bit spans together without
computing the entire intermediate result first. Or it can be used to find the
first index where two bit spans are different.
Pull Request: https://projects.blender.org/blender/blender/pulls/134923
This reimplements the CSV parser used by the (still experimental) Import CSV
node.
Reliability is improved by:
* Properly handling quoted fields.
* Unit tests.
* Generalizing the parser to be able to handle customized delimiter, quote and
escape characters (those are not exposed in the node yet though).
* More accurate detection of column types by actually taking all values of a
column into account instead of only the first row.
Performance is improved by designing the parser in a way that supports
multi-threaded parsing. I'm measuring about 5x performance improvement which
mainly comes from multi-threading. Some files I wanted to use for benchmarking
didn't load in the version that's in `main` but do load fine with this new
version.
The implementation is now split up into two parts:
1. A general CSV parser in `blenlib` that manages splitting a buffer into
records and their fields.
2. Application specific parsing of fields into e.g. floats and integers which
remains in `io/csv/importer`.
This separation simplifies unit testing and makes the core code more reusable.
Pull Request: https://projects.blender.org/blender/blender/pulls/134715
Previously, there was a `StringRef.copy` method which would copy the string into
the given buffer. However, it was not defined for the case when the buffer was
too small. It moved the responsibility of making sure the buffer is large enough
to the caller.
Unfortunately, in practice that easily hides bugs in builds without asserts
which don't come up in testing much. Now, the method is replaced with
`StringRef.copy_utf8_truncated` which has much more well defined semantics and
also makes sure that the string remains valid utf-8.
This also renames `unsafe_copy` to `copy_unsafe` to make the naming more similar
to `copy_utf8_truncated`.
Pull Request: https://projects.blender.org/blender/blender/pulls/133677
Before, it was only possible to apply modifiers through a multires modifier if
they were deform-only. The Geometry Nodes modifier is of course not deform-only.
However, often one can build a node setup, that only deforms and does nothing
else.
To make it possible to apply the Geometry Nodes modifier in such cases the
following things had to be done:
* Update `BKE_modifier_deform_verts` to work with modifiers that implement
`modify_geometry_set` instead of `deform_verts`.
* Add error handling for the case when `modify_geometry_set` does more than just
deformation.
* Allow the Geometry Nodes modifier to be applied through a multi-res modifier.
Two new utility types (`ArrayState` and `MeshTopologyState`) have been
introduced to allow for efficient and accurate checking whether the topology has
been modified. In common cases, they can detect that the topology has not been
changed in constant time, but they fall back to linear time checking if it's not
immediately obvious that the topology has not been changed.
This works with the example files from #98559 and #97603.
Pull Request: https://projects.blender.org/blender/blender/pulls/131904
This can be useful when going from a selection of curves to the corresponding
selection of points.
The simple implementation given here should work quite well in the majority of
cases. There is still optimization potential for some cases involving masks with
many gaps. The implementation is also single threaded for now. Using
multi-threading is possible, but should ideally be done in some way to still
lets us exploit the fact that the index ranges are already in sorted order.
Pull Request: https://projects.blender.org/blender/blender/pulls/133323
When using clangd or running clang-tidy on headers there are
currently many errors. These are noisy in IDEs, make auto fixes
impossible, and break features like code completion, refactoring
and navigation.
This makes source/blender headers work by themselves, which is
generally the goal anyway. But #includes and forward declarations
were often incomplete.
* Add #includes and forward declarations
* Add IWYU pragma: export in a few places
* Remove some unused #includes (but there are many more)
* Tweak ShaderCreateInfo macros to work better with clangd
Some types of headers still have errors, these could be fixed or
worked around with more investigation. Mostly preprocessor
template headers like NOD_static_types.h.
Note that that disabling WITH_UNITY_BUILD is required for clangd to
work properly, otherwise compile_commands.json does not contain
the information for the relevant source files.
For more details see the developer docs:
https://developer.blender.org/docs/handbook/tooling/clangd/
Pull Request: https://projects.blender.org/blender/blender/pulls/132608
Before, it was only possible to clear the entire cache at once or to rely on
automatic clearing when it gets full. This patch adds the ability to remove
cached data based on a predicate function.
This is useful for #124369 for partially invalidating the cache for some files.
Pull Request: https://projects.blender.org/blender/blender/pulls/132605
This adds a new `Map.lookup_try` method that returns a `std::optional<Value>`
for a given key. If the key is not in the map, `std::nullopt` is returned. If it
is in the map, then a copy of the value is returned. Note that the copy is
necessary because `std::optional` can't contain reference types.
This method helps a lot in #132219 to reduce boilerplate.
Pull Request: https://projects.blender.org/blender/blender/pulls/132278
This enables moving elements form one vector to another.
Usually this is being doing by extending a vector with the content
from the secondary vector and then clearing the secondary vector.
However sometimes this being performed to transfer ownership of managed elements,
if elements are copied from the secondary vector, but not cleared, this could
lead to 2 vectors to share ownership of objects.
Pull Request: https://projects.blender.org/blender/blender/pulls/131560
Add `bool MutableSpan<T>::contains(const T &value)` function, which is an
exact copy of the function from the `Span<T>` class. This makes it possible
to call `.contains()` directly on the `MutableSpan`, instead of having to
call `.as_span()` first.
Note that the `contains_ptr()` function was already copied from `Span` to
`MutableSpan` before.
No functional changes to the affected code, just the removal of the now
no longer necessary `.to_span()` call.
Pull Request: https://projects.blender.org/blender/blender/pulls/131149
Remove assert statement from make_update.py, making it ready for any
architecture.
Add riscv 32, 64 and 128 bit cpu architecture with little/big endian
to Blender's BLI_build_config.h and Libmv's build_config.h
Tested (to compile) on riscv64 little endian machine.
Pull Request: https://projects.blender.org/blender/blender/pulls/130920
Implements #130836:
interpolate_*_wrapmode_fl take InterpWrapMode wrap_u and wrap_v arguments.
U and V coordinate axes can have different wrap modes: clamp/extend,
border/zero, wrap/repeat.
Note that this removes inconsistency where cubic interpolation was
returning zero for samples completely outside the image, but all other
functions were not, and the behavior was not matching the function
documentation either.
Use the new functions in the new compositor CPU backend.
Possible performance impact for other places (e.g. VSE): measured on
4K resolution, transformed (scaled and rotated) 4K EXR image:
- Nearest filter: no change,
- Bilinear filter: no change,
- Cubic BSpline filter: slight performance decrease, IMB_transform
19.5 -> 20.7 ms (Ryzen 5950X, VS2022). Feels acceptable.
Pull Request: https://projects.blender.org/blender/blender/pulls/130893
This adds support for constructing a `Map` directly from key-value-pairs.
If the same key exists multiple times, only the first one is taken into account.
Note: the keys and values are copied into the map (not moved). This is because
an `std::initializer_list` is read-only. If move-behavior is needed, it is better to
use the existing `map.add*` functions.
```cpp
Map<int, std::string> map = {{1, "where"}, {3, "when"}, {5, "why"}};
````
Pull Request: https://projects.blender.org/blender/blender/pulls/130470
Cleanup to simplify code by using common terms like _first_ and _last_ in context
of predicate applying in a range just like we do for spans and range containers.
Pull Request: https://projects.blender.org/blender/blender/pulls/130380
This extends the `Vector` API to support transfering ownership of a memory
buffer to and from a `Vector`. This reduces the need for unnecessary copies in
some cases which converting between data-structures. A new
`VectorSet::extract_vector` method is added that takes O(1) time. Previously,
this was only possible in O(n) time by copying the entire array.
The `Vector::release` method can be used to e.g. build a vector with the C++
container, but then extract it for use in DNA data.
Pull Request: https://projects.blender.org/blender/blender/pulls/129736
The length checking wasn't accounting for null bytes within multi-byte
sequences and could step over the null bytes.
For BLI_strlen_utf8 this could result in an out of bounds read.
In practice most UTF8 data is validated so the extra checks
are mainly to prevent errors on invalid or corrupt UTF8 text.
Previously, passing a falsy `std::function` (i.e. it is empty) or a null
function pointer would result in a `FunctionRef` which is thruthy and thus
appeared callable. Now, when the passed in callable is falsy, the resulting
`FunctionRef` will be too.
Pull Request: https://projects.blender.org/blender/blender/pulls/128823
The paths `C:` (on WIN32) and `/` on other systems was detecting as
relative. This meant `BLI_path_abs_from_cwd` would make both paths
CWD relative. Also remove duplicate call to BLI_path_is_unc.
In addition to float<->half functions to convert one number (#127708), add
float_to_half_array and half_to_float_array functions:
- On x64, this uses SSE2 4-wide implementation to do the conversion
(2x faster half->float, 4x faster float->half compared to scalar),
- There's also an AVX2 codepath that uses CPU hardware F16C instructions
(8-wide), to be used when/if blender codebase will start to be built
for AVX2 (today it is not yet).
- On arm64, this uses NEON VCVT instructions to do the conversion.
Use these functions in Vulkan buffer/texture conversion code. Time taken to
convert float->half texture while viewing EXR file in image space (22M
numbers to convert): 39.7ms -> 10.1ms (would be 6.9ms if building for AVX2)
Pull Request: https://projects.blender.org/blender/blender/pulls/127838
Blender codebase had two ways to convert half (FP16) to float (FP32):
- BLI_math_bits.h half_to_float. Out of 64k possible half values, it converts
4096 of them incorrectly. Mostly denormals and NaNs, which is perhaps not too
relevant. But more importantly, it converts half zero to float 0.000030517578
which does not sound ideal.
- Functions in Vulkan vk_data_conversion.hh. This one converts 2046 possible
half values incorrectly.
Function to convert float (FP32) to half (FP16) was in Vulkan
vk_data_conversion.hh, and it got a bunch of possible inputs wrong. I guess it
did not do proper "round to nearest even" that CPU/GPU hardware does.
This PR:
- Adds BLI_math_half.hh with float_to_half and half_to_float functions.
- Documentation and test coverage.
- When compiling on ARM NEON, use hardware VCVT instructions.
- Removes the incorrect half_to_float from BLI_math_bits.h and replaces single
usage of it in View3D color picking to use the new function.
- Changes Vulkan FP32<->FP16 conversion code to use the new functions, to fix
correctness issues (makes eevee_next_bsdf_vulkan test pass). This makes it
faster too.
Pull Request: https://projects.blender.org/blender/blender/pulls/127708
I've done this a few times and would have benefited from a utility
function for it, apparently it's done in a few more places too. The
utilities aren't multithreaded for now, it doesn't seem important
and often multithreading happens at a different level of the call
stack anyway.
Pull Request: https://projects.blender.org/blender/blender/pulls/127517
This patch optimizes `IndexMask::from_bits` by making use of the fact that many
bits can be processed at once and one does not have to look at every bit
individual in many cases. Bits are stored as array of `BitInt` (aka `uint64_t`).
So we can process at least 64 bits at a time. On some platforms we can also make
use of SIMD and process up to 128 bits at once. This can significantly improve
performance if all bits are set/unset.
As a byproduct, this patch also optimizes `IndexMask::from_bools` which is now
implemented in terms of `IndexMask::from_bits`. The conversion from bools to
bits has been optimized significantly too by using SIMD intrinsics.
Pull Request: https://projects.blender.org/blender/blender/pulls/126888
This changes how the lazy-loading and unloading of volume grids works. With that
it should also fix#124164.
The cache is now moved to a deeper and more global level. This allows reloadable
volume grids to be unloaded automatically when a memory limit is reached. The
previous system for automatically unloading grids only worked in fairly specific
cases and also did not work all that well with caching (parts of) volume
sequences.
At its core, this patch adds a general cache system in `BLI_memory_cache.hh`. It
has a simple interface of the form `get(key, compute_if_not_cached_fn) ->
value`. To avoid growing the cache indefinitly, it uses the new
`BLI_memory_counter.hh` API to detect when the cache size limit is reached. In
this case it can automatically free some cached values. Currently, this uses an
LRU system, where the items that have not been used in a while are removed
first. Other heuristics can be implemented too, but especially for caches for
loading files from disk this works well already.
The new memory cache is internally used by `volume_grid_file_cache.cc` for
loading individual volume grids and their simplified variants. It could
potentially also be used to cache which grids are stored in a file.
Additionally, it can potentially also be used as caching layer in more places
like loading bakes or in import geometry nodes. It's not clear yet whether this
will need an extension to the API which currently is fairly minimal.
To allow different systems to use the same memory cache, it has to support
arbitrary identifiers for the cached data. Therefore, this patch also introduces
`GenericKey`, which is an abstract base class for any kind of key that is
comparable, hashable and copyable.
The implementation of the cache currently relies on a new `ConcurrentMap`
data-structure which is a thin wrapper around `tbb::concurrent_hash_map` with a
fallback implementation for when `tbb` is not available. This data structure
allows concurrent reads and writes to the cache. Note that adding data to the
cache is still serialized because of the memory counting.
The size of the cache depends on the `memory_cache_limit` property that's
already shown in the user preferences. While it has a generic name, it's
currently only used by the VSE which is currently using the `MEM_CacheLimiter`
API which has a similar purpose but seems to be less automatic, thread-safe and
also has no idea of implicit-sharing. It also seems to be designed in a way
where one is expected to create multiple "cache limiters" each of which has its
own limit. Longer term, we should probably strive towards unifying these
systems, which seems feasible but a bit out of scope right now. While it's not
ideal that these cache systems don't use a shared memory limit, it's essentially
what we already have for all cache systems in Blender, so it's nothing new.
Some tests for lazy-loading had to be removed because this behavior is more
implicit now and is not as easily observable from the outside.
Pull Request: https://projects.blender.org/blender/blender/pulls/126411
This introduces `MemoryCount` which can be used across multiple
`MemoryCounter`. Generally, `MemoryCount` is expected to live
longer (e.g. over the entire life-time of a cache), while `MemoryCounter`
is expected to only exists when actually counting the memory.