Previously, there was a `StringRef.copy` method which would copy the string into
the given buffer. However, it was not defined for the case when the buffer was
too small. It moved the responsibility of making sure the buffer is large enough
to the caller.
Unfortunately, in practice that easily hides bugs in builds without asserts
which don't come up in testing much. Now, the method is replaced with
`StringRef.copy_utf8_truncated` which has much more well defined semantics and
also makes sure that the string remains valid utf-8.
This also renames `unsafe_copy` to `copy_unsafe` to make the naming more similar
to `copy_utf8_truncated`.
Pull Request: https://projects.blender.org/blender/blender/pulls/133677
Before, it was only possible to apply modifiers through a multires modifier if
they were deform-only. The Geometry Nodes modifier is of course not deform-only.
However, often one can build a node setup, that only deforms and does nothing
else.
To make it possible to apply the Geometry Nodes modifier in such cases the
following things had to be done:
* Update `BKE_modifier_deform_verts` to work with modifiers that implement
`modify_geometry_set` instead of `deform_verts`.
* Add error handling for the case when `modify_geometry_set` does more than just
deformation.
* Allow the Geometry Nodes modifier to be applied through a multi-res modifier.
Two new utility types (`ArrayState` and `MeshTopologyState`) have been
introduced to allow for efficient and accurate checking whether the topology has
been modified. In common cases, they can detect that the topology has not been
changed in constant time, but they fall back to linear time checking if it's not
immediately obvious that the topology has not been changed.
This works with the example files from #98559 and #97603.
Pull Request: https://projects.blender.org/blender/blender/pulls/131904
This can be useful when going from a selection of curves to the corresponding
selection of points.
The simple implementation given here should work quite well in the majority of
cases. There is still optimization potential for some cases involving masks with
many gaps. The implementation is also single threaded for now. Using
multi-threading is possible, but should ideally be done in some way to still
lets us exploit the fact that the index ranges are already in sorted order.
Pull Request: https://projects.blender.org/blender/blender/pulls/133323
When using clangd or running clang-tidy on headers there are
currently many errors. These are noisy in IDEs, make auto fixes
impossible, and break features like code completion, refactoring
and navigation.
This makes source/blender headers work by themselves, which is
generally the goal anyway. But #includes and forward declarations
were often incomplete.
* Add #includes and forward declarations
* Add IWYU pragma: export in a few places
* Remove some unused #includes (but there are many more)
* Tweak ShaderCreateInfo macros to work better with clangd
Some types of headers still have errors, these could be fixed or
worked around with more investigation. Mostly preprocessor
template headers like NOD_static_types.h.
Note that that disabling WITH_UNITY_BUILD is required for clangd to
work properly, otherwise compile_commands.json does not contain
the information for the relevant source files.
For more details see the developer docs:
https://developer.blender.org/docs/handbook/tooling/clangd/
Pull Request: https://projects.blender.org/blender/blender/pulls/132608
Before, it was only possible to clear the entire cache at once or to rely on
automatic clearing when it gets full. This patch adds the ability to remove
cached data based on a predicate function.
This is useful for #124369 for partially invalidating the cache for some files.
Pull Request: https://projects.blender.org/blender/blender/pulls/132605
This adds a new `Map.lookup_try` method that returns a `std::optional<Value>`
for a given key. If the key is not in the map, `std::nullopt` is returned. If it
is in the map, then a copy of the value is returned. Note that the copy is
necessary because `std::optional` can't contain reference types.
This method helps a lot in #132219 to reduce boilerplate.
Pull Request: https://projects.blender.org/blender/blender/pulls/132278
This enables moving elements form one vector to another.
Usually this is being doing by extending a vector with the content
from the secondary vector and then clearing the secondary vector.
However sometimes this being performed to transfer ownership of managed elements,
if elements are copied from the secondary vector, but not cleared, this could
lead to 2 vectors to share ownership of objects.
Pull Request: https://projects.blender.org/blender/blender/pulls/131560
Add `bool MutableSpan<T>::contains(const T &value)` function, which is an
exact copy of the function from the `Span<T>` class. This makes it possible
to call `.contains()` directly on the `MutableSpan`, instead of having to
call `.as_span()` first.
Note that the `contains_ptr()` function was already copied from `Span` to
`MutableSpan` before.
No functional changes to the affected code, just the removal of the now
no longer necessary `.to_span()` call.
Pull Request: https://projects.blender.org/blender/blender/pulls/131149
Remove assert statement from make_update.py, making it ready for any
architecture.
Add riscv 32, 64 and 128 bit cpu architecture with little/big endian
to Blender's BLI_build_config.h and Libmv's build_config.h
Tested (to compile) on riscv64 little endian machine.
Pull Request: https://projects.blender.org/blender/blender/pulls/130920
Implements #130836:
interpolate_*_wrapmode_fl take InterpWrapMode wrap_u and wrap_v arguments.
U and V coordinate axes can have different wrap modes: clamp/extend,
border/zero, wrap/repeat.
Note that this removes inconsistency where cubic interpolation was
returning zero for samples completely outside the image, but all other
functions were not, and the behavior was not matching the function
documentation either.
Use the new functions in the new compositor CPU backend.
Possible performance impact for other places (e.g. VSE): measured on
4K resolution, transformed (scaled and rotated) 4K EXR image:
- Nearest filter: no change,
- Bilinear filter: no change,
- Cubic BSpline filter: slight performance decrease, IMB_transform
19.5 -> 20.7 ms (Ryzen 5950X, VS2022). Feels acceptable.
Pull Request: https://projects.blender.org/blender/blender/pulls/130893
This adds support for constructing a `Map` directly from key-value-pairs.
If the same key exists multiple times, only the first one is taken into account.
Note: the keys and values are copied into the map (not moved). This is because
an `std::initializer_list` is read-only. If move-behavior is needed, it is better to
use the existing `map.add*` functions.
```cpp
Map<int, std::string> map = {{1, "where"}, {3, "when"}, {5, "why"}};
````
Pull Request: https://projects.blender.org/blender/blender/pulls/130470
Cleanup to simplify code by using common terms like _first_ and _last_ in context
of predicate applying in a range just like we do for spans and range containers.
Pull Request: https://projects.blender.org/blender/blender/pulls/130380
This extends the `Vector` API to support transfering ownership of a memory
buffer to and from a `Vector`. This reduces the need for unnecessary copies in
some cases which converting between data-structures. A new
`VectorSet::extract_vector` method is added that takes O(1) time. Previously,
this was only possible in O(n) time by copying the entire array.
The `Vector::release` method can be used to e.g. build a vector with the C++
container, but then extract it for use in DNA data.
Pull Request: https://projects.blender.org/blender/blender/pulls/129736
The length checking wasn't accounting for null bytes within multi-byte
sequences and could step over the null bytes.
For BLI_strlen_utf8 this could result in an out of bounds read.
In practice most UTF8 data is validated so the extra checks
are mainly to prevent errors on invalid or corrupt UTF8 text.
Previously, passing a falsy `std::function` (i.e. it is empty) or a null
function pointer would result in a `FunctionRef` which is thruthy and thus
appeared callable. Now, when the passed in callable is falsy, the resulting
`FunctionRef` will be too.
Pull Request: https://projects.blender.org/blender/blender/pulls/128823
The paths `C:` (on WIN32) and `/` on other systems was detecting as
relative. This meant `BLI_path_abs_from_cwd` would make both paths
CWD relative. Also remove duplicate call to BLI_path_is_unc.
In addition to float<->half functions to convert one number (#127708), add
float_to_half_array and half_to_float_array functions:
- On x64, this uses SSE2 4-wide implementation to do the conversion
(2x faster half->float, 4x faster float->half compared to scalar),
- There's also an AVX2 codepath that uses CPU hardware F16C instructions
(8-wide), to be used when/if blender codebase will start to be built
for AVX2 (today it is not yet).
- On arm64, this uses NEON VCVT instructions to do the conversion.
Use these functions in Vulkan buffer/texture conversion code. Time taken to
convert float->half texture while viewing EXR file in image space (22M
numbers to convert): 39.7ms -> 10.1ms (would be 6.9ms if building for AVX2)
Pull Request: https://projects.blender.org/blender/blender/pulls/127838
Blender codebase had two ways to convert half (FP16) to float (FP32):
- BLI_math_bits.h half_to_float. Out of 64k possible half values, it converts
4096 of them incorrectly. Mostly denormals and NaNs, which is perhaps not too
relevant. But more importantly, it converts half zero to float 0.000030517578
which does not sound ideal.
- Functions in Vulkan vk_data_conversion.hh. This one converts 2046 possible
half values incorrectly.
Function to convert float (FP32) to half (FP16) was in Vulkan
vk_data_conversion.hh, and it got a bunch of possible inputs wrong. I guess it
did not do proper "round to nearest even" that CPU/GPU hardware does.
This PR:
- Adds BLI_math_half.hh with float_to_half and half_to_float functions.
- Documentation and test coverage.
- When compiling on ARM NEON, use hardware VCVT instructions.
- Removes the incorrect half_to_float from BLI_math_bits.h and replaces single
usage of it in View3D color picking to use the new function.
- Changes Vulkan FP32<->FP16 conversion code to use the new functions, to fix
correctness issues (makes eevee_next_bsdf_vulkan test pass). This makes it
faster too.
Pull Request: https://projects.blender.org/blender/blender/pulls/127708
I've done this a few times and would have benefited from a utility
function for it, apparently it's done in a few more places too. The
utilities aren't multithreaded for now, it doesn't seem important
and often multithreading happens at a different level of the call
stack anyway.
Pull Request: https://projects.blender.org/blender/blender/pulls/127517
This patch optimizes `IndexMask::from_bits` by making use of the fact that many
bits can be processed at once and one does not have to look at every bit
individual in many cases. Bits are stored as array of `BitInt` (aka `uint64_t`).
So we can process at least 64 bits at a time. On some platforms we can also make
use of SIMD and process up to 128 bits at once. This can significantly improve
performance if all bits are set/unset.
As a byproduct, this patch also optimizes `IndexMask::from_bools` which is now
implemented in terms of `IndexMask::from_bits`. The conversion from bools to
bits has been optimized significantly too by using SIMD intrinsics.
Pull Request: https://projects.blender.org/blender/blender/pulls/126888
This changes how the lazy-loading and unloading of volume grids works. With that
it should also fix#124164.
The cache is now moved to a deeper and more global level. This allows reloadable
volume grids to be unloaded automatically when a memory limit is reached. The
previous system for automatically unloading grids only worked in fairly specific
cases and also did not work all that well with caching (parts of) volume
sequences.
At its core, this patch adds a general cache system in `BLI_memory_cache.hh`. It
has a simple interface of the form `get(key, compute_if_not_cached_fn) ->
value`. To avoid growing the cache indefinitly, it uses the new
`BLI_memory_counter.hh` API to detect when the cache size limit is reached. In
this case it can automatically free some cached values. Currently, this uses an
LRU system, where the items that have not been used in a while are removed
first. Other heuristics can be implemented too, but especially for caches for
loading files from disk this works well already.
The new memory cache is internally used by `volume_grid_file_cache.cc` for
loading individual volume grids and their simplified variants. It could
potentially also be used to cache which grids are stored in a file.
Additionally, it can potentially also be used as caching layer in more places
like loading bakes or in import geometry nodes. It's not clear yet whether this
will need an extension to the API which currently is fairly minimal.
To allow different systems to use the same memory cache, it has to support
arbitrary identifiers for the cached data. Therefore, this patch also introduces
`GenericKey`, which is an abstract base class for any kind of key that is
comparable, hashable and copyable.
The implementation of the cache currently relies on a new `ConcurrentMap`
data-structure which is a thin wrapper around `tbb::concurrent_hash_map` with a
fallback implementation for when `tbb` is not available. This data structure
allows concurrent reads and writes to the cache. Note that adding data to the
cache is still serialized because of the memory counting.
The size of the cache depends on the `memory_cache_limit` property that's
already shown in the user preferences. While it has a generic name, it's
currently only used by the VSE which is currently using the `MEM_CacheLimiter`
API which has a similar purpose but seems to be less automatic, thread-safe and
also has no idea of implicit-sharing. It also seems to be designed in a way
where one is expected to create multiple "cache limiters" each of which has its
own limit. Longer term, we should probably strive towards unifying these
systems, which seems feasible but a bit out of scope right now. While it's not
ideal that these cache systems don't use a shared memory limit, it's essentially
what we already have for all cache systems in Blender, so it's nothing new.
Some tests for lazy-loading had to be removed because this behavior is more
implicit now and is not as easily observable from the outside.
Pull Request: https://projects.blender.org/blender/blender/pulls/126411
This introduces `MemoryCount` which can be used across multiple
`MemoryCounter`. Generally, `MemoryCount` is expected to live
longer (e.g. over the entire life-time of a cache), while `MemoryCounter`
is expected to only exists when actually counting the memory.
We often have the situation where it would be good if we could easily estimate
the memory usage of some value (e.g. a mesh, or volume). Examples of where we
ran into this in the past:
* Undo step size.
* Caching of volume grids.
* Caching of loaded geometries for import geometry nodes.
Generally, most caching systems would benefit from the ability to know how much
memory they currently use to make better decisions about which data to free and
when. The goal of this patch is to introduce a simple general API to count the
memory usage that is independent of any specific caching system. I'm doing this
to "fix" the chicken and egg problem that caches need to know the memory usage,
but we don't really need to count the memory usage without using it for caches.
Implementing caching and memory counting at the same time make both harder than
implementing them one after another.
The main difficulty with counting memory usage is that some memory may be shared
using implicit sharing. We want to avoid double counting such memory. How
exactly shared memory is treated depends a bit on the use case, so no specific
assumptions are made about that in the API. The gathered memory usage is not
expected to be exact. It's expected to be a decent approximation. It's neither a
lower nor an upper bound unless specified by some specific type. Cache systems
generally build on top of heuristics to decide when to free what anyway.
There are two sides to this API:
1. Get the amount of memory used by one or more values. This side is used by
caching systems and/or systems that want to present the used memory to the
user.
2. Tell the caller how much memory is used. This side is used by all kinds of
types that can report their memory usage such as meshes.
```cpp
/* Get how much memory is used by two meshes together. */
MemoryCounter memory;
mesh_a->count_memory(memory);
mesh_b->count_memory(memory);
int64_t bytes_used = memory.counted_bytes();
/* Tell the caller how much memory is used. */
void Mesh::count_memory(blender::MemoryCounter &memory) const
{
memory.add_shared(this->runtime->face_offsets_sharing_info,
this->face_offsets().size_in_bytes());
/* Forward memory counting to lower level types. This should be fairly common. */
CustomData_count_memory(this->vert_data, this->verts_num, memory);
}
void CustomData_count_memory(const CustomData &data,
const int totelem,
blender::MemoryCounter &memory)
{
for (const CustomDataLayer &layer : Span{data.layers, data.totlayer}) {
memory.add_shared(layer.sharing_info, [&](blender::MemoryCounter &shared_memory) {
/* Not quite correct for all types, but this is only a rough approximation anyway. */
const int64_t elem_size = CustomData_get_elem_size(&layer);
shared_memory.add(totelem * elem_size);
});
}
}
```
Pull Request: https://projects.blender.org/blender/blender/pulls/126295
Covers OS detection, CPU architecture, bitness, and compiler family.
The goal of this change is to provide easier to use and remember checks
for these things. For example, with this change code like
```
#ifdef _WIN32
...
#elif defined(__APPLE__) || defined(__FreeBSD__) || defined(__NetBSD__) || \
defined(__OpenBSD__)
..
#endif
```
becomes
```
#if OS_WIN
...
#elif OS_MAC || OS_BSD
...
#endif
```
The code is originally based on build_config.h from Chromium, which was
first modified for Libmv, then to some other projects, and now is
adopted for Blender itself.
The checks are relying on the -Wundef to provide hint of cases when an
include is missing prior to the platform-specific checks.
This change only introduces possibility of cleaner checks and does not
start actual refactor.
Pull Request: https://projects.blender.org/blender/blender/pulls/118908