Commit Graph

426 Commits

Author SHA1 Message Date
Brecht Van Lommel
98b3b36411 Refactor: Build: Add bf::dependencies::eigen target
To make adding a dependeny on TBB easier.

Additional changes:
* Using LIB for libmv tests, as it now brings in includes
* Removing Eigen header listing in iTaSC

Pull Request: https://projects.blender.org/blender/blender/pulls/136865
2025-04-02 16:50:46 +02:00
Jorn Visser
b1bb1d9815 Build: Make the FFTW threads library required to use FFTW
This is done because the library is necessary to make certain FFTW
functions thread safe, see #136557 as well.

Also pass each library variable separately to
`find_package_handle_standard_args` instead of as a list, as otherwise
it won't correctly detect if `libfftw3f` or `libfftw3f_threads` is
missing. This is because CMake considers a value false if it contains
`-NOTFOUND` at the end, but not if it's in the middle. For example,
CMake considers
`.../libfftw3f.a;.../libfftw3f_threads.a;FFTW3_LIBRARY_D-NOTFOUND` to be
false, but
`.../libfftw3f.a;FFTW3_LIBRARY_THREADS_F-NOTFOUND;.../libfftw3.a` to be
true.

---

I noticed that some other find modules also have the same list issue. I guess it was done this way to make CMake print all the found libraries instead of only the first.

Pull Request: https://projects.blender.org/blender/blender/pulls/136692
2025-03-31 14:42:35 +02:00
Jacques Lucke
202db40afb Refactor: move uvproject code from blenlib to blenkernel
I'm moving this for two (related) reasons:
* It depends a lot on the specifics of `Camera` and `Object` data-blocks.
* It links `Object::object_to_world()` which is not an inline function and thus
  easily leads to linker errors. It mostly seems like luck that this is not
  breaking our build due to early dead code elimination when linking binaries
  which use the blenlib static library such as `msgfmt`.

I found this while working on a compilation tool which would not be as lucky and
has a linker error because of the dependence on `Object::object_to_world`.

Pull Request: https://projects.blender.org/blender/blender/pulls/136547
2025-03-26 20:51:57 +01:00
Jacques Lucke
ac2cd6c1ef Geometry Nodes: make CSV parser more reliable and faster
This reimplements the CSV parser used by the (still experimental) Import CSV
node.

Reliability is improved by:
* Properly handling quoted fields.
* Unit tests.
* Generalizing the parser to be able to handle customized delimiter, quote and
  escape characters (those are not exposed in the node yet though).
* More accurate detection of column types by actually taking all values of a
  column into account instead of only the first row.

Performance is improved by designing the parser in a way that supports
multi-threaded parsing. I'm measuring about 5x performance improvement which
mainly comes from multi-threading. Some files I wanted to use for benchmarking
didn't load in the version that's in `main` but do load fine with this new
version.

The implementation is now split up into two parts:
1. A general CSV parser in `blenlib` that manages splitting a buffer into
   records and their fields.
2. Application specific parsing of fields into e.g. floats and integers which
   remains in `io/csv/importer`.

This separation simplifies unit testing and makes the core code more reusable.

Pull Request: https://projects.blender.org/blender/blender/pulls/134715
2025-02-19 11:10:59 +01:00
Brecht Van Lommel
e2e1984e60 Refactor: Convert remainder of blenlib to C++
A few headers like BLI_math_constants.h and BLI_utildefines.h keep working
for C code, for remaining makesdna and userdef defaults code in C.

Pull Request: https://projects.blender.org/blender/blender/pulls/134406
2025-02-12 23:01:08 +01:00
Brecht Van Lommel
ec0fc49fb8 Cleanup: Remove unused BLI_blenlib.h 2025-01-31 17:03:18 +01:00
Jacques Lucke
619d9e4e01 Fix #98559: support applying Geometry Nodes through multires modifier
Before, it was only possible to apply modifiers through a multires modifier if
they were deform-only. The Geometry Nodes modifier is of course not deform-only.
However, often one can build a node setup, that only deforms and does nothing
else.

To make it possible to apply the Geometry Nodes modifier in such cases the
following things had to be done:
* Update `BKE_modifier_deform_verts` to work with modifiers that implement
  `modify_geometry_set` instead of `deform_verts`.
* Add error handling for the case when `modify_geometry_set` does more than just
  deformation.
* Allow the Geometry Nodes modifier to be applied through a multi-res modifier.

Two new utility types (`ArrayState` and `MeshTopologyState`) have been
introduced to allow for efficient and accurate checking whether the topology has
been modified. In common cases, they can detect that the topology has not been
changed in constant time, but they fall back to linear time checking if it's not
immediately obvious that the topology has not been changed.

This works with the example files from #98559 and #97603.

Pull Request: https://projects.blender.org/blender/blender/pulls/131904
2025-01-23 17:34:30 +01:00
Brecht Van Lommel
920e709069 Refactor: Make header files more clangd and clang-tidy friendly
When using clangd or running clang-tidy on headers there are
currently many errors. These are noisy in IDEs, make auto fixes
impossible, and break features like code completion, refactoring
and navigation.

This makes source/blender headers work by themselves, which is
generally the goal anyway. But #includes and forward declarations
were often incomplete.

* Add #includes and forward declarations
* Add IWYU pragma: export in a few places
* Remove some unused #includes (but there are many more)
* Tweak ShaderCreateInfo macros to work better with clangd

Some types of headers still have errors, these could be fixed or
worked around with more investigation. Mostly preprocessor
template headers like NOD_static_types.h.

Note that that disabling WITH_UNITY_BUILD is required for clangd to
work properly, otherwise compile_commands.json does not contain
the information for the relevant source files.

For more details see the developer docs:
https://developer.blender.org/docs/handbook/tooling/clangd/

Pull Request: https://projects.blender.org/blender/blender/pulls/132608
2025-01-07 12:39:13 +01:00
Hans Goudey
70a86b14c0 Cleanup: Repace BLI_range.h with Bounds<float>
Pull Request: https://projects.blender.org/blender/blender/pulls/132181
2024-12-20 21:05:40 +01:00
Hans Goudey
0e15649bf8 Cleanup: Remove unused dynlib files
This has been unused since the game engine was removed.

Pull Request: https://projects.blender.org/blender/blender/pulls/132186
2024-12-20 20:38:40 +01:00
Hans Goudey
31964ef5ca Cleanup: Move BLI_kdopbvh to C++
Pull Request: https://projects.blender.org/blender/blender/pulls/132031
2024-12-17 21:04:55 +01:00
Campbell Barton
5dd67c6e1c Cleanup: sort CMake path lists 2024-12-09 09:18:50 +11:00
Hans Goudey
77af97d5b9 Cleanup: Move some blenlib files to C++
Pull Request: https://projects.blender.org/blender/blender/pulls/131319
2024-12-05 14:36:01 +01:00
Germano Cavalcante
1633766c2e Refactor: move blenlib system_win32 to C++ 2024-11-07 17:33:27 -03:00
Hans Goudey
5ed65e6608 Cleanup: Remove unused grease pencil V2 update cache, DLRB tree
See #123468.

Pull Request: https://projects.blender.org/blender/blender/pulls/129729
2024-11-02 22:18:10 +01:00
Campbell Barton
381898b6dc Refactor: move BLI_path_util header to C++, rename to BLI_path_utils
Move to a C++ header to allow C++ features to be used there,
use the "utils" suffix as it's preferred for new files.

Ref !128147
2024-09-26 21:13:39 +10:00
Campbell Barton
36edfe04e0 Refactor: move blenlib tempfile to C++ 2024-09-26 09:39:35 +10:00
Jesse Yurkovich
447cde140d Cleanup: Remove unused BLI_array macro implementation
Remove unused files.

Pull Request: https://projects.blender.org/blender/blender/pulls/127962
2024-09-22 00:53:14 +02:00
Campbell Barton
427be373f7 Cleanup: sort cmake file lists 2024-09-21 16:26:43 +10:00
Aras Pranckevicius
92544d6d76 BLI: add float<->half conversion functions with correct math, use in Vulkan
Blender codebase had two ways to convert half (FP16) to float (FP32):

- BLI_math_bits.h half_to_float. Out of 64k possible half values, it converts
  4096 of them incorrectly. Mostly denormals and NaNs, which is perhaps not too
  relevant. But more importantly, it converts half zero to float 0.000030517578
  which does not sound ideal.
- Functions in Vulkan vk_data_conversion.hh. This one converts 2046 possible
  half values incorrectly.

Function to convert float (FP32) to half (FP16) was in Vulkan
vk_data_conversion.hh, and it got a bunch of possible inputs wrong. I guess it
did not do proper "round to nearest even" that CPU/GPU hardware does.

This PR:

- Adds BLI_math_half.hh with float_to_half and half_to_float functions.
    - Documentation and test coverage.
    - When compiling on ARM NEON, use hardware VCVT instructions.
- Removes the incorrect half_to_float from BLI_math_bits.h and replaces single
  usage of it in View3D color picking to use the new function.
- Changes Vulkan FP32<->FP16 conversion code to use the new functions, to fix
  correctness issues (makes eevee_next_bsdf_vulkan test pass). This makes it
  faster too.

Pull Request: https://projects.blender.org/blender/blender/pulls/127708
2024-09-18 13:15:00 +02:00
Aras Pranckevicius
806b0e8379 BLI: improve 2/3/4d vector codegen for debug or asserts-enabled builds
Majority of math operations on VecBase<> were implemented by calling into an
indexing operator, sometimes coupled with unroll<Size> template.

When compiler optimizations are off (e.g. Debug build), or when asserts are on
(e.g. usual "developer" setup), this resulted in codegen that is very
sub-optimal. Especially if these vector types are used a lot, e.g. when
scaling down a screenshot for saving as a thumbnail into the blend file.

Address that by explicit code paths for 4,3,2 dimensional vectors, that
avoids both the unroll<> template and indexing operator. To avoid repeated long
typo-prone code, do that with C preprocessor :( -- however all of the
preprocessor innards are in a separate file BLI_math_vector_unroll.hh so they
do not get into the way much.

Scaling down a screenshot to the blend file thumbnail, while saving the blend
file, on my machine: (4K screen resolution, Ryzen 5950X, VS2022 build), which
involves two calls to IMB_scale which uses float4 for pixel operations:

- Release with asserts off (what ships to users): no change at 9.4ms
- Release with asserts on ("developer" setup): 38.1ms -> 9.4ms
- Debug: 226ms -> 64ms
- Debug w/ ASAN: 314ms -> 78ms

Pull Request: https://projects.blender.org/blender/blender/pulls/127577
2024-09-16 13:06:16 +02:00
Hans Goudey
13f179a9c0 Cleanup: Add utility function to sum offset indices group sizes
I've done this a few times and would have benefited from a utility
function for it, apparently it's done in a few more places too. The
utilities aren't multithreaded for now, it doesn't seem important
and often multithreading happens at a different level of the call
stack anyway.

Pull Request: https://projects.blender.org/blender/blender/pulls/127517
2024-09-12 20:28:35 +02:00
Jonas Holzman
fe93de1a91 Obj-C Refactor: Port BLI_delete_soft from objc_* runtime calls to proper Obj-C
Port the macOS version of the `BLI_delete_soft` function from raw
runtime `objc_*` calls function to proper Objective-C for increased
readability and long-term maintainability. This new function is placed
in a new `intern/fileops_apple.mm` file, analogous to the existing
`intern/storage_apple.mm` file.

Pull Request: https://projects.blender.org/blender/blender/pulls/126766
2024-09-03 12:08:20 +02:00
Jacques Lucke
66adedbd78 BLI: optimize constructing IndexMask from bits and bools
This patch optimizes `IndexMask::from_bits` by making use of the fact that many
bits can be processed at once and one does not have to look at every bit
individual in many cases. Bits are stored as array of `BitInt` (aka `uint64_t`).
So we can process at least 64 bits at a time. On some platforms we can also make
use of SIMD and process up to 128 bits at once. This can significantly improve
performance if all bits are set/unset.

As a byproduct, this patch also optimizes `IndexMask::from_bools` which is now
implemented in terms of `IndexMask::from_bits`. The conversion from bools to
bits has been optimized significantly too by using SIMD intrinsics.

Pull Request: https://projects.blender.org/blender/blender/pulls/126888
2024-08-29 12:15:33 +02:00
Campbell Barton
b76fcc3a1f Cleanup: add missing headers to CMake's file listing 2024-08-23 10:19:53 +10:00
Campbell Barton
07b11206eb Cleanup: sort cmake file lists 2024-08-23 10:19:53 +10:00
Campbell Barton
08d5eb8f9c Cleanup: cmake formatting 2024-08-21 23:20:34 +10:00
Jacques Lucke
354a097ce0 Volumes: improve file cache and unloading
This changes how the lazy-loading and unloading of volume grids works. With that
it should also fix #124164.

The cache is now moved to a deeper and more global level. This allows reloadable
volume grids to be unloaded automatically when a memory limit is reached. The
previous system for automatically unloading grids only worked in fairly specific
cases and also did not work all that well with caching (parts of) volume
sequences.

At its core, this patch adds a general cache system in `BLI_memory_cache.hh`. It
has a simple interface of the form `get(key, compute_if_not_cached_fn) ->
value`. To avoid growing the cache indefinitly, it uses the new
`BLI_memory_counter.hh` API to detect when the cache size limit is reached. In
this case it can automatically free some cached values. Currently, this uses an
LRU system, where the items that have not been used in a while are removed
first. Other heuristics can be implemented too, but especially for caches for
loading files from disk this works well already.

The new memory cache is internally used by `volume_grid_file_cache.cc` for
loading individual volume grids and their simplified variants. It could
potentially also be used to cache which grids are stored in a file.
Additionally, it can potentially also be used as caching layer in more places
like loading bakes or in import geometry nodes. It's not clear yet whether this
will need an extension to the API which currently is fairly minimal.

To allow different systems to use the same memory cache, it has to support
arbitrary identifiers for the cached data. Therefore, this patch also introduces
`GenericKey`, which is an abstract base class for any kind of key that is
comparable, hashable and copyable.

The implementation of the cache currently relies on a new `ConcurrentMap`
data-structure which is a thin wrapper around `tbb::concurrent_hash_map` with a
fallback implementation for when `tbb` is not available. This data structure
allows concurrent reads and writes to the cache. Note that adding data to the
cache is still serialized because of the memory counting.

The size of the cache depends on the `memory_cache_limit` property that's
already shown in the user preferences. While it has a generic name, it's
currently only used by the VSE which is currently using the `MEM_CacheLimiter`
API which has a similar purpose but seems to be less automatic, thread-safe and
also has no idea of implicit-sharing. It also seems to be designed in a way
where one is expected to create multiple "cache limiters" each of which has its
own limit. Longer term, we should probably strive towards unifying these
systems, which seems feasible but a bit out of scope right now. While it's not
ideal that these cache systems don't use a shared memory limit, it's essentially
what we already have for all cache systems in Blender, so it's nothing new.

Some tests for lazy-loading had to be removed because this behavior is more
implicit now and is not as easily observable from the outside.

Pull Request: https://projects.blender.org/blender/blender/pulls/126411
2024-08-19 20:39:32 +02:00
Jacques Lucke
a8667aa03f Core: introduce MemoryCounter API
We often have the situation where it would be good if we could easily estimate
the memory usage of some value (e.g. a mesh, or volume). Examples of where we
ran into this in the past:
* Undo step size.
* Caching of volume grids.
* Caching of loaded geometries for import geometry nodes.

Generally, most caching systems would benefit from the ability to know how much
memory they currently use to make better decisions about which data to free and
when. The goal of this patch is to introduce a simple general API to count the
memory usage that is independent of any specific caching system. I'm doing this
to "fix" the chicken and egg problem that caches need to know the memory usage,
but we don't really need to count the memory usage without using it for caches.
Implementing caching and memory counting at the same time make both harder than
implementing them one after another.

The main difficulty with counting memory usage is that some memory may be shared
using implicit sharing. We want to avoid double counting such memory. How
exactly shared memory is treated depends a bit on the use case, so no specific
assumptions are made about that in the API. The gathered memory usage is not
expected to be exact. It's expected to be a decent approximation. It's neither a
lower nor an upper bound unless specified by some specific type. Cache systems
generally build on top of heuristics to decide when to free what anyway.

There are two sides to this API:
1. Get the amount of memory used by one or more values. This side is used by
   caching systems and/or systems that want to present the used memory to the
   user.
2. Tell the caller how much memory is used. This side is used by all kinds of
   types that can report their memory usage such as meshes.

```cpp
/* Get how much memory is used by two meshes together. */
MemoryCounter memory;
mesh_a->count_memory(memory);
mesh_b->count_memory(memory);
int64_t bytes_used = memory.counted_bytes();

/* Tell the caller how much memory is used. */
void Mesh::count_memory(blender::MemoryCounter &memory) const
{
  memory.add_shared(this->runtime->face_offsets_sharing_info,
                    this->face_offsets().size_in_bytes());

  /* Forward memory counting to lower level types. This should be fairly common. */
  CustomData_count_memory(this->vert_data, this->verts_num, memory);
}

void CustomData_count_memory(const CustomData &data,
                             const int totelem,
                             blender::MemoryCounter &memory)
{
  for (const CustomDataLayer &layer : Span{data.layers, data.totlayer}) {
    memory.add_shared(layer.sharing_info, [&](blender::MemoryCounter &shared_memory) {
      /* Not quite correct for all types, but this is only a rough approximation anyway. */
      const int64_t elem_size = CustomData_get_elem_size(&layer);
      shared_memory.add(totelem * elem_size);
    });
  }
}
```

Pull Request: https://projects.blender.org/blender/blender/pulls/126295
2024-08-15 10:54:21 +02:00
Iliya Katueshenock
4839a86984 Cleanup: BLI: Merge files
Deduplicate IndexRange implementation files.

Pull Request: https://projects.blender.org/blender/blender/pulls/126169
2024-08-13 20:26:55 +02:00
Sergey Sharybin
baf9691959 BLI: Add easy and portable way of platform-specific checks
Covers OS detection, CPU architecture, bitness, and compiler family.

The goal of this change is to provide easier to use and remember checks
for these things. For example, with this change code like

```
  #ifdef _WIN32
  ...
  #elif defined(__APPLE__) || defined(__FreeBSD__) || defined(__NetBSD__) ||     \
      defined(__OpenBSD__)
  ..
  #endif

```

becomes

```
  #if OS_WIN
  ...
  #elif OS_MAC || OS_BSD
  ...
  #endif
```

The code is originally based on build_config.h from Chromium, which was
first modified for Libmv, then to some other projects, and now is
adopted for Blender itself.

The checks are relying on the -Wundef to provide hint of cases when an
include is missing prior to the platform-specific checks.

This change only introduces possibility of cleaner checks and does not
start actual refactor.

Pull Request: https://projects.blender.org/blender/blender/pulls/118908
2024-08-06 14:33:53 +02:00
Jesse Yurkovich
ec4fc2d34a CMake: Modernize the optional TBB dependency
This continues the cmake modernization effort and introduces support for
allowing our optional dependencies to integrate properly. TBB is added
here as it's proven troublesome to maintain correctly.

Currently the only Blender project which uses the TBB headers directly
is `blenlib`.  However, all downstream projects which require blenlib as
their dependency, and wish to properly make use of its threading
facilities, needed to define various TBB items in their CMake files. Not
only is this unnecessary and arcane, but several projects didn't do this
and ended up not using threading as well as producing ODR violations
along the way[1].

This PR makes TBB a modern dependency and exposes it PUBLIC'ly from
`blenlib`.  All downstream projects which depend on blenlib will now
receive everything they require from TBB automatically. This includes
the `WITH_TBB` define, the headers, and the library itself.

[1] blender/blender@05241f47f5

Pull Request: https://projects.blender.org/blender/blender/pulls/124916
2024-07-19 23:30:56 +02:00
Miguel Pozo
74224b25a5 GPU: Add GPU_shader_batch_create_from_infos
This is the first commit of the several required to support
subprocess-based parallel compilation on OpenGL.
This provides the base API and implementation, and exposes the max
subprocesses setting on the UI, but it's not used by any code yet.

More information and the rest of the code can be found in #121925.

This one includes:
- A new `GPU_shader_batch` API that allows requesting the compilation
  of multiple shaders at once, allowing GPU backed to compile them in
  parallel and asynchronously without blocking the Blender UI.
- A virtual `ShaderCompiler` class that backends can use to add their
  own implementation.
- A `ShaderCompilerGeneric` class that implements synchronous/blocking
  compilation of batches for backends that don't have their own
  implementation yet.
- A `GLShaderCompiler` that supports parallel compilation using
  subprocesses.
- A new `BLI_subprocess` API, including IPC (required for the
  `GLShaderCompiler` implementation).
- The implementation of the subprocess program in
  `GPU_compilation_subprocess`.
- A new `Max Shader Compilation Subprocesses` option in
  `Preferences > System > Memory & Limits` to enable parallel shader
  compilation and the max number of subprocesses to allocate (each
  subprocess has a relatively high memory footprint).

Implementation Overview:
There's a single `GLShaderCompiler` shared by all OpenGL contexts.
This class stores a pool of up to `GCaps.max_parallel_compilations`
subprocesses that can be used for compilation.
Each subprocess has a shared memory pool used for sending the shader
source code from the main Blender process and for receiving the already
compiled shader binary from the subprocess. This is synchronized using
a series of shared semaphores.
The subprocesses maintain a shader cache on disk inside a
`BLENDER_SHADER_CACHE` folder at the OS temporary folder.
Shaders that fail to compile are tried to be compiled again locally for
proper error reports.
Hanged subprocesses are currently detected using a timeout of 30s.

Pull Request: https://projects.blender.org/blender/blender/pulls/122232
2024-06-05 18:45:57 +02:00
Omar Emara
d4bf23771d Compositor: Optimize Fog Glow Glare node
This patches optimizes the Fog Glow Glare node to be about 25x faster
for 4K images. This is mainly achieved by utilizing the FFTW library and
multi-threading support code. Further improvements are still possible by
caching kernels, but the CPU compositor does not support caching yet.

The old Hartley transform was removed, so the node no longer works when
FFTW is disabled as a build time option, much like the OIDN node. A new
BLI library was introduced for FFTW, it includes some helper routines
relevant for FFTW as well as an initialization routine that sets up
multithreading using TBB as well as thread safety.

Build system support for threaded FFTW was also added, which defines the
relevant variables to detect threading support as well as add the
relevant libraries.

We do not currently have the threaded FFTW libs in our precompiled libs,
so the threading code is disabled until the libs lands in the coming
weeks. So currently, the code is only about 9x faster.

The only functional change is that the kernel is now odd sized, which
should produce more accurate results, but the final result is almost
identical and mostly undetectable.

The plan is to port this to the GPU as well similar to how we implement
OIDN until we have a GPU FFT implementation. GPU compositor can also do
caching, so it should be faster, being able to compute a 4K image in
under half a second.

Pull Request: https://projects.blender.org/blender/blender/pulls/121653
2024-05-17 12:45:21 +02:00
Campbell Barton
ab6e00bd7d Cleanup: sort cmake file lists 2024-05-06 09:20:57 +10:00
Sergey Sharybin
1b0012d51c Refactor: Require C++ for users of BLI_simd.h
This is because sse2neon.h might be used to emulate SSE intrinsics
on ARM64 architecture, and it uses some preprocessor which is not
available for C language when using MSVC.

The old-style math file math_matrix.c uses this header, so needed
to become C++. Simple rename did not work since there is a new math
utility math_matrix.cc exists. Following some existing convention
the math_matrix.c is renamed to math_matrix_c.cc. Eventually all the
code should switch to use C++ style math, and the C style removed,
so it seems reasonable to not mix old and new style of API in the
same file.

There should be no functional changes.

Pull Request: https://projects.blender.org/blender/blender/pulls/121335
2024-05-02 16:22:19 +02:00
Jacques Lucke
8d13a9608b BLI: generalize task size hints for parallel_for
This integrates the functionality for `parallel_for_weighted` from 9a3ceb79de
into `parallel_for`. This reduces the number of entry points to the threading
API and also makes it easier to build higher level threading primitives. For
example, `IndexMask.foreach_*` may use `parallel_for` if a `GrainSize` is
provided, but can't use `parallel_for_weighted` easily without duplicating a
fair amount of code.

The default behavior of `parallel_for` does not change. However, now one can
optionally pass in `TaskSizeHints` as the last parameter. This can be used to
specify the size of individual tasks relative to each other and relative to the
grain size. This helps scheduling more equally sized tasks which generally
improves performance because threads are used more effectively.

One generally does not construct `TaskSizeHints` manually, but calls either
`threading::individual_task_sizes` or `threading::accumulated_task_sizes`. Both
allow specifying individual task sizes, but the latter should be used when the
combined size of consecutive tasks can be computed in O(1) time. This allows
splitting up the work more efficiently. It can often be used in conjunction with
`OffsetIndices`.

Pull Request: https://projects.blender.org/blender/blender/pulls/121127
2024-04-29 23:55:22 +02:00
Jacques Lucke
51f8bf53b2 Geometry Nodes: use xxhash for compute context hash
Previously, md5 was used which is significantly slower. In almost all cases
this does not have a significant performance impact in practice. However,
it's possible to build geometry nodes setups that become a few percent
faster ( by combining lots of cheap node groups). Using xxhash instead of
md5 should never be slower.

Pull Request: https://projects.blender.org/blender/blender/pulls/120225
2024-04-03 20:11:09 +02:00
Campbell Barton
937776b555 Cleanup: sort CMake file lists 2024-04-01 16:48:44 +11:00
Jacques Lucke
7314c86869 BLI: add fixed width integer type
This is intended to be used in the new exact mesh boolean algorithm by @howardt.
The new `BLI_fixed_width_int.hh` header provides types like `Int256` and
`UInt256` which are like e.g. `uint64_t` but with higher precision. The code
supports many different integer sizes.

The following operations are supported:
* Addition
* Subtraction
* Multiplication
* Comparisons
* Negation
* Conversion to and from other number types
* Conversion to and from string (based on `GMP`)

Division is not implemented. It could be implemented, but it's more complex and
is not required for the new mesh boolean algorithm.

Some alternatives to having a custom implementation have been discussed in
https://devtalk.blender.org/t/fixed-length-multiprecision-arithmetic/29189/.

Generally, the implementation is fairly straight forward. The main complexity is
the addition/multiplication algorithm which isn't too complicated. It's nice to
have control over this part as it allows us to optimize the code more if
necessary. Also, from what I understand, we might be able to benefit from some
special cases like multiplying a large integer with a smaller one.

I tried some different ways to optimize this already, but so far the normal
compiler optimization turned out to work best. Not sure if the same is true on
windows though, as it doesn't have native support for an `int128` which helps
the compiler understand what I'm doing. Alternatives I tried so far are using
intrinsics directly (mainly `_addcarry_u64` and similar), writing inline
assembly manually and copying the assembly output from the compiler. I assume
the assembly implementation didn't help for me because it prohibited other
compiler optimizations.

Pull Request: https://projects.blender.org/blender/blender/pulls/119528
2024-03-25 23:39:42 +01:00
Jacques Lucke
ee1fa8e1ca BLI: support set operations on index masks
The `IndexMask` data structure was designed to allow us to implement set
operations like `union`, `intersection` and `difference` efficiently
(2cfcb8b0b8). This patch adds an evaluator for
arbitrary expressions involving the mentioned operations. The evaluator makes
use of the design of the `IndexMask` data structure to be quite efficient.

In some common cases, the evaluator runs in constant time. So it's very fast
even if the mask contains many millions of indices. If possible the evaluator
works on entire segments at once instead of looking at the individual indices.
This results in a very low constant factor even if the evaluation time is
linear. If the evaluator has to look at the individual indices to be able to
perform the operation, it can make use of multi-threading.

The evaluation consists of the following steps:
1. A coarse evaluation that looks at entire segments at once.
2. All segments that couldn't be fully evaluated by the coarse evaluation are
   evaluated exactly by looking at the actual indices. There are two evaluators
   for this case. One that is based on `std::set_union` etc. The other one first
   converts the index masks to bit spans, then does bit operations to evaluate
   the expression, and then converts the bits back into indices. Depending on
   the expression, one or the other can be more efficient.
3. Construct an index mask from the evaluated segments.

Showing the performance of the evaluator is kind of difficult because it highly
depends on the input data. Comparing the performance to something that does not
short-circuit when there are full ranges is meaningless, because one can
construct an example where the new evaluator is arbitrarily faster. I'm still
working on a case where performance can be compared to e.g. using
`std::set_union`. This comparison is only fair when the input data when
constructing a case where the new evaluator can't short-circuit.

One of the main remaining bottlenecks are the calls to `slice_content` on large
index masks. I think the impact of those can still be reduced.

We are not using this evaluator much yet, except through `IndexMask::complement`
calls. I intend to use it when I get to refactoring the field evaluator for
geometry nodes to optimize the evaluation of selections.

Pull Request: https://projects.blender.org/blender/blender/pulls/117805
2024-03-17 09:52:32 +01:00
Campbell Barton
9796805bb8 Cleanup: sort CMake source files 2024-03-07 13:29:09 +11:00
Hans Goudey
139607dd26 Cleanup: Move BLI_bitmap_draw_2d.h to C++ 2024-03-05 10:28:17 -05:00
Hans Goudey
164eb3c25b Cleanup: Move lasso utility files to C++ 2024-03-05 10:23:11 -05:00
Jacques Lucke
fe2a47b5a7 BLI: add chunked list data structure that uses linear allocator
This adds a new special purpose container data structure that can be
used to gather many elements into many (potentially small) lists efficiently.

I originally worked on this data structure because I might want to use it
in #118772. However, also it's useful in the geometry nodes logger already.
I'm measuring a 10-20% speed improvement in my many-math-nodes file
when I enable logging for all sockets (not just the ones that are currently visible).

Pull Request: https://projects.blender.org/blender/blender/pulls/118774
2024-02-28 22:22:21 +01:00
Jacques Lucke
1e20f06c21 BLI: add utility to simplify creating proper random access iterator
The difficulty of implementing this iterator is that it requires lots of operator
overloads which are usually very simple to implement, but result in a lot of code.
The goal of this patch is to abstract the common parts so that it becomes easier
to implement random accessor iterators. Many algorithms can work more
efficiently with random access iterators than with other iterator types.

Also see https://en.cppreference.com/w/cpp/iterator/random_access_iterator

Pull Request: https://projects.blender.org/blender/blender/pulls/118113
2024-02-17 20:59:45 +01:00
Campbell Barton
7747b8c944 Fix convexhull_2d_test for macOS & re-enable the test
Use EXPECT_NEAR instead of EXPECT_EQ to account for a differences in
atan2 implementation on macOS, more generally relying on exact
float comparison for tests is error prone.
2024-02-13 14:07:26 +11:00
Hans Goudey
1394907474 Cleanup: Move uvproject.c to C++ 2024-02-12 20:43:24 -05:00
Campbell Barton
fb81bbaa60 Tests: disable BLI_convexhull_2d_test which fails on macOS 2024-02-13 00:34:44 +11:00
Campbell Barton
b91918564d Tests: add tests for convexhull_2d
Move BLI_convexhull_aabb_fit_points_2d to a public function to be able
to compare compare fitting one convex hull with a simple reference
method.

One test is disabled as it exposes an error in convex hull calculation
which needs further investigation.
2024-02-12 20:17:19 +11:00