Commit Graph

5169 Commits

Author SHA1 Message Date
Jacques Lucke
b99c1abc3a BLI: speedup memory bandwidth bound tasks by reducing threading
This improves performance by **reducing** the number of threads used for tasks
that require high memory bandwidth.

This works because the underlying hardware has a certain maximum memory
bandwidth. If a few threads already use it up, any additional threads that
want to access a lot of memory just cause more contention, which actually
slows things down. By reducing the number of threads that can perform such
tasks, the remaining threads are also not tied up doing work that they can't
do efficiently. Ideally, there is enough scheduled work that the remaining
threads can do more compute-intensive tasks instead.

To use this new functionality, one has to put the parallel code in question into
a `threading::memory_bandwidth_bound_task(...)` block. Additionally, one also
has to provide a (very) rough approximation for how many bytes are accessed. If
the number is low, the number of threads shouldn't be reduced because it's
likely that all touched memory can be in L3 cache which generally has a much
higher bandwidth than main memory.

The exact number of threads that are allowed to do bandwidth bound tasks at the
same time is generally highly context and hardware dependent. It's also not
really possible to measure reliably because it depends on so many static and
dynamic factors. The thread count is now hardcoded to 8. It seems that this many
threads are easily capable of maxing out the bandwidth capacity.
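The commit does not show how `threading::memory_bandwidth_bound_task` works internally. The sketch below is a standalone illustration of the idea only. It caps the number of concurrently running bandwidth-bound tasks at a fixed limit and skips the cap when the approximate byte count is small. `BandwidthLimiter`, `run_bandwidth_bound`, `kCacheThresholdBytes` (and its 8 MiB value) and `peak_concurrency` are hypothetical names, not Blender API. A scheduler-integrated version would let waiting threads pick up other work. In this sketch, they simply block.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <cstdint>
#include <mutex>
#include <thread>
#include <vector>

/* Hypothetical limiter: at most `max_concurrent` bandwidth-bound tasks run at once. */
class BandwidthLimiter {
 public:
  explicit BandwidthLimiter(const int max_concurrent) : available_(max_concurrent) {}

  template<typename Fn> void run(Fn &&fn)
  {
    std::unique_lock<std::mutex> lock(mutex_);
    cv_.wait(lock, [this] { return available_ > 0; });
    --available_;
    lock.unlock();
    /* Return the slot even if `fn` throws. */
    struct Release {
      BandwidthLimiter &self;
      ~Release()
      {
        {
          std::lock_guard<std::mutex> guard(self.mutex_);
          ++self.available_;
        }
        self.cv_.notify_one();
      }
    } release{*this};
    fn();
  }

 private:
  std::mutex mutex_;
  std::condition_variable cv_;
  int available_;
};

/* Hypothetical threshold: smaller working sets probably fit in L3 cache. */
constexpr int64_t kCacheThresholdBytes = int64_t(8) * 1024 * 1024;

template<typename Fn>
void run_bandwidth_bound(BandwidthLimiter &limiter, const int64_t approximate_bytes, Fn &&fn)
{
  if (approximate_bytes < kCacheThresholdBytes) {
    fn(); /* Cache-sized work: don't restrict the thread count. */
    return;
  }
  limiter.run(fn);
}

/* Demo: `num_threads` threads each run one large task; returns the peak number
 * of tasks that were running at the same time. */
int peak_concurrency(const int num_threads, const int limit)
{
  BandwidthLimiter limiter(limit);
  std::atomic<int> running{0};
  std::atomic<int> peak{0};
  std::vector<std::thread> threads;
  for (int i = 0; i < num_threads; i++) {
    threads.emplace_back([&] {
      run_bandwidth_bound(limiter, int64_t(1) << 30, [&] {
        const int now = ++running;
        int prev = peak.load();
        while (prev < now && !peak.compare_exchange_weak(prev, now)) {
        }
        std::this_thread::sleep_for(std::chrono::milliseconds(2));
        --running;
      });
    });
  }
  for (std::thread &thread : threads) {
    thread.join();
  }
  return peak.load();
}
```

With 24 threads and a limit of 8, at most 8 large tasks overlap. Tasks under the threshold bypass the limiter entirely.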

With this technique I can measure surprisingly good performance improvements:
* Generating a 3000x3000 grid: 133ms -> 103ms.
* Generating a mesh line with 100'000'000 vertices: 212ms -> 189ms.
* Realize mesh instances resulting in ~27'000'000 vertices: 460ms -> 305ms.

In all of these cases, only 8 instead of 24 threads are used. The remaining
threads are idle in these cases, but they could do other work if available.

Pull Request: https://projects.blender.org/blender/blender/pulls/118939
2024-03-19 18:23:56 +01:00
Campbell Barton
38dc888d7f Cleanup: use ELEM macro, remove redundant "struct" 2024-03-19 14:17:47 +11:00
Jacques Lucke
ee1fa8e1ca BLI: support set operations on index masks
The `IndexMask` data structure was designed to allow us to implement set
operations like `union`, `intersection` and `difference` efficiently
(2cfcb8b0b8). This patch adds an evaluator for
arbitrary expressions involving the mentioned operations. The evaluator makes
use of the design of the `IndexMask` data structure to be quite efficient.

In some common cases, the evaluator runs in constant time. So it's very fast
even if the mask contains many millions of indices. If possible, the evaluator
works on entire segments at once instead of looking at the individual indices.
This results in a very low constant factor even if the evaluation time is
linear. If the evaluator has to look at the individual indices to be able to
perform the operation, it can make use of multi-threading.
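The following shows why working on whole segments can take constant time. If both operands are a single contiguous range, their intersection is again a range. It is computed from the endpoints alone, no matter how many indices it contains. `Range` and `intersect` are hypothetical. Actual `IndexMask` segments are more involved than a single range.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

/* Hypothetical half-open index range [begin, end). */
struct Range {
  int64_t begin;
  int64_t end;
  int64_t size() const { return end - begin; }
};

/* O(1): only the endpoints are inspected, never the individual indices. */
Range intersect(const Range a, const Range b)
{
  const int64_t begin = std::max(a.begin, b.begin);
  const int64_t end = std::min(a.end, b.end);
  /* Disjoint inputs produce an empty range instead of a negative size. */
  return Range{begin, std::max(begin, end)};
}
```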

The evaluation consists of the following steps:
1. A coarse evaluation that looks at entire segments at once.
2. All segments that couldn't be fully evaluated by the coarse evaluation are
   evaluated exactly by looking at the actual indices. There are two evaluators
   for this case. One that is based on `std::set_union` etc. The other one first
   converts the index masks to bit spans, then does bit operations to evaluate
   the expression, and then converts the bits back into indices. Depending on
   the expression, one or the other can be more efficient.
3. Construct an index mask from the evaluated segments.
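The two exact evaluators from step 2 can be sketched for the sample expression "(a union b) minus c" on plain sorted index vectors. This is a hypothetical, single-threaded illustration; `to_bits`, `to_indices` and the `evaluate_*` functions are not Blender API. The first version uses `std::set_union` and `std::set_difference`. The second converts the inputs to bits, applies word-wise bit operations and converts the result back.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <iterator>
#include <vector>

using Indices = std::vector<int64_t>; /* Sorted, unique. */
using Bits = std::vector<uint64_t>;

/* Variant 1: standard sorted-range algorithms. */
Indices evaluate_with_set_ops(const Indices &a, const Indices &b, const Indices &c)
{
  Indices united;
  std::set_union(a.begin(), a.end(), b.begin(), b.end(), std::back_inserter(united));
  Indices result;
  std::set_difference(
      united.begin(), united.end(), c.begin(), c.end(), std::back_inserter(result));
  return result;
}

/* Set one bit per index; `universe` is one past the largest possible index. */
Bits to_bits(const Indices &indices, const int64_t universe)
{
  Bits bits(size_t((universe + 63) / 64), 0);
  for (const int64_t i : indices) {
    bits[size_t(i / 64)] |= uint64_t(1) << (i % 64);
  }
  return bits;
}

/* Convert set bits back into sorted indices. */
Indices to_indices(const Bits &bits)
{
  Indices result;
  for (size_t word = 0; word < bits.size(); word++) {
    for (int bit = 0; bit < 64; bit++) {
      if (bits[word] & (uint64_t(1) << bit)) {
        result.push_back(int64_t(word * 64 + size_t(bit)));
      }
    }
  }
  return result;
}

/* Variant 2: evaluate the whole expression with word-wise bit operations. */
Indices evaluate_with_bits(const Indices &a, const Indices &b, const Indices &c, const int64_t universe)
{
  const Bits bits_a = to_bits(a, universe);
  const Bits bits_b = to_bits(b, universe);
  const Bits bits_c = to_bits(c, universe);
  Bits out(bits_a.size());
  for (size_t w = 0; w < out.size(); w++) {
    out[w] = (bits_a[w] | bits_b[w]) & ~bits_c[w];
  }
  return to_indices(out);
}
```

Which variant wins depends on the expression and the density of the indices, as the commit notes.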

Showing the performance of the evaluator is difficult because it highly
depends on the input data. Comparing the performance to something that does not
short-circuit when there are full ranges is meaningless, because one can
construct an example where the new evaluator is arbitrarily faster. I'm still
working on a case where performance can be compared to e.g. using
`std::set_union`. This comparison is only fair when constructing a case where
the new evaluator can't short-circuit.

The calls to `slice_content` on large index masks are one of the main
remaining bottlenecks. I think their impact can still be reduced.

We are not using this evaluator much yet, except through `IndexMask::complement`
calls. I intend to use it when I get to refactoring the field evaluator for
geometry nodes to optimize the evaluation of selections.

Pull Request: https://projects.blender.org/blender/blender/pulls/117805
2024-03-17 09:52:32 +01:00
Hans Goudey
b5082f6640 Refactor: Simplify BLI_serialize.hh for asset indexer
- Remove the unnecessary `ContainerValue` from the class hierarchy
- Construct `StringValue` with a `std::string` by value to avoid copies
- Remove some indirection by using type names directly instead of aliases
- Use utility methods to look up/append specific data types for arrays/dicts
- Simplify conversion from unique_ptr to shared_ptr
- Avoid use of `new` and `delete`
- Avoid creating maps of all elements in a vector for a single lookup
2024-03-13 14:52:57 -04:00
Campbell Barton
e33f5e36ac Cleanup: spacing around C-style comment blocks 2024-03-09 23:40:57 +11:00
Omar Emara
a444a5eeba Fix: Byte interpolation with clamped boundary returns zero
The byte BLI image interpolation function with clamped boundary returns
zero for out-of-bounds pixels. This is the same issue as #119164, but for
byte interpolation.

Pull Request: https://projects.blender.org/blender/blender/pulls/119173
2024-03-08 07:50:01 +01:00
Campbell Barton
f3e0e39df5 Cleanup: use const pointers where camera data isn't modified 2024-03-08 17:15:08 +11:00
Hans Goudey
744f3b2823 Cleanup: Grammar in comments: Fix uses of "own"
"Own" (the adjective) cannot be used on its own. It should be combined
with something like "its own", "our own", "her own", or "the object's own".
It also shouldn't be used by itself to mean something like "separate".

Also, "its own" is correct instead of "it's own", which misuses the
contraction of "it is".
2024-03-07 16:23:35 -05:00
Omar Emara
5ab0cc8e74 Fix: Interpolation with clamped boundary returns zero
The BLI image interpolation function with clamped boundary returns zero
for out-of-bounds pixels. That's because the neighbour pixel wrapping
condition disregarded the border template argument. To fix this, only
handle that condition in border mode.
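The following is a simplified 1D sketch of this class of bug. With a clamped boundary, an out-of-bounds neighbour should be replaced by the nearest edge pixel. Only border mode should substitute zero. `Boundary`, `fetch` and `sample_linear` are hypothetical names. The sketch assumes pixel centres at integer coordinates. Blender's actual functions are 2D and structured differently.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

/* Hypothetical boundary modes for this sketch. */
enum class Boundary { Border, Clamp };

/* Fetch a pixel; only Boundary::Border returns zero outside the image. */
float fetch(const std::vector<float> &row, int x, const Boundary boundary)
{
  const int width = int(row.size());
  if (x < 0 || x >= width) {
    if (boundary == Boundary::Border) {
      return 0.0f;
    }
    x = (x < 0) ? 0 : width - 1; /* Clamp to the nearest edge pixel. */
  }
  return row[size_t(x)];
}

/* Linear interpolation between the two neighbouring pixels of `u`. */
float sample_linear(const std::vector<float> &row, const float u, const Boundary boundary)
{
  const float fx = std::floor(u);
  const int x0 = int(fx);
  const float t = u - fx;
  return (1.0f - t) * fetch(row, x0, boundary) + t * fetch(row, x0 + 1, boundary);
}
```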

Pull Request: https://projects.blender.org/blender/blender/pulls/119164
2024-03-07 15:34:42 +01:00
Anthony Roberts
445fd42c61 Windows: Add ARM64 support
* Only works on machines with a Qualcomm Snapdragon 8cx Gen3 or above.
  Older-generation devices are not and will not be supported due to
  driver issues.
* Requires VS2022 for building.
* Uses new MSVC preprocessor for sse2neon compatibility.
* SIMD is not enabled, waiting on conversion of blenlib to C++.

Ref #119126

Pull Request: https://projects.blender.org/blender/blender/pulls/117036
2024-03-06 16:14:34 +01:00
Campbell Barton
d686699316 Cleanup: various non-functional C++ changes 2024-03-06 14:47:29 +11:00
Hans Goudey
5993c517bd Cleanup: Use C++ Array, Span, int2 for lasso coords 2024-03-05 11:29:04 -05:00
Hans Goudey
139607dd26 Cleanup: Move BLI_bitmap_draw_2d.h to C++ 2024-03-05 10:28:17 -05:00
Hans Goudey
164eb3c25b Cleanup: Move lasso utility files to C++ 2024-03-05 10:23:11 -05:00
Campbell Barton
c789a938d9 Cleanup: remove temporary directory creation 2024-03-05 09:54:49 +11:00
Campbell Barton
5af4987456 Merge branch 'blender-v4.1-release' 2024-03-04 12:21:50 +11:00
Campbell Barton
51126fab33 BLI_tempfile: ensure the temporary directory is absolute
While unreported, nothing prevented CWD-relative temporary directories
from being used. Resolve asserts & errors if the CWD changes at
run-time.
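The following is a minimal sketch of the idea using C++17 `std::filesystem`; Blender's own path utilities differ. A relative directory is resolved against the CWD once, when it is set, so that later CWD changes no longer affect which directory is used. `make_tempdir_absolute` is a hypothetical helper.

```cpp
#include <cassert>
#include <filesystem>

namespace fs = std::filesystem;

/* Hypothetical helper: store an absolute, normalized path so that the result
 * no longer depends on the CWD at the time the directory is used. */
fs::path make_tempdir_absolute(const fs::path &dir)
{
  /* fs::absolute() prepends the *current* working directory to relative paths. */
  const fs::path absolute = dir.is_absolute() ? dir : fs::absolute(dir);
  return absolute.lexically_normal();
}
```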
2024-03-04 12:20:44 +11:00
Campbell Barton
1b514659ca Cleanup: minor changes to temp directory API
- Pass null instead of an empty string to BKE_tempdir_init
  because the string isn't meant to be used.
- Never pass null to BLI_temp_directory_path_copy_if_valid
  (the caller must check).
- Additional comments for which checks are performed & why
  from discussion about #95411.
2024-03-04 11:42:02 +11:00
casey bianco-davis
3d136d0d00 BLI: Add support for non-square matrix multiplication.
Adds support for multiplying non-square matrices of differing sizes.
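The essence of non-square multiplication is that an R x K matrix times a K x C matrix yields an R x C matrix, with the inner dimension K shared. The row-major `Mat` template below is a hypothetical sketch, not Blender's matrix type, whose template parameters and memory layout differ. Mismatched inner dimensions simply fail to compile.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

/* Hypothetical row-major matrix with `Rows` rows and `Cols` columns. */
template<typename T, std::size_t Rows, std::size_t Cols> struct Mat {
  std::array<std::array<T, Cols>, Rows> v{};
};

/* (Rows x Inner) * (Inner x Cols) -> (Rows x Cols). */
template<typename T, std::size_t Rows, std::size_t Inner, std::size_t Cols>
Mat<T, Rows, Cols> operator*(const Mat<T, Rows, Inner> &a, const Mat<T, Inner, Cols> &b)
{
  Mat<T, Rows, Cols> result;
  for (std::size_t i = 0; i < Rows; i++) {
    for (std::size_t j = 0; j < Cols; j++) {
      T sum = T(0);
      for (std::size_t k = 0; k < Inner; k++) {
        sum += a.v[i][k] * b.v[k][j];
      }
      result.v[i][j] = sum;
    }
  }
  return result;
}
```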

Co-authored-by: Clément Foucault <foucault.clem@gmail.com>
Pull Request: https://projects.blender.org/blender/blender/pulls/115783
2024-03-03 16:26:04 +01:00
Campbell Barton
da2ac8ee92 Merge branch 'blender-v4.1-release' 2024-02-29 22:04:23 +11:00
Campbell Barton
c19cdc343f Fix assert with temporary directories beginning with "//"
- Skip leading forward slashes when setting the temp directory.
- Add a utility function to set the temporary directory
  which is used for the user preferences & environment variables.

This issue was raised by #95411 where "//" resolves to "/",
then asserts when passed to Blender's file-system functions.
However, the crash referenced in this report looks to be caused
by Collada failing to write to the temporary directory which
can be handled separately.

Ref !118872
2024-02-29 22:01:44 +11:00
Hans Goudey
d338261c55 Cleanup: Pass Span by value
Also pass Span instead of `const Array &`
and use parentheses for BLI includes.
2024-02-27 23:09:54 -05:00
Iliya Katueshenock
849279b8f1 Cleanup: Collapsible brackets in macros
Fix collapsible brackets in Notepad++.
2024-02-27 21:51:41 +01:00
Campbell Barton
5db2a842c0 Unbreak build with GLIBC pre 2.28
Also de-duplicate rename logic for Linux & other UNIX systems.
2024-02-26 10:15:54 +11:00
Jacques Lucke
9a3ceb79de BLI: add weighted parallel for function
The standard `threading::parallel_for` function tries to split the range into
uniformly sized subranges. This is great if each element takes approximately
the same amount of time to compute.

However, there are also situations where the time required to do the work for
a single index differs significantly between different indices. In such a case,
it's better to split the tasks into segments while taking the size of each task into
account.

This patch implements `threading::parallel_for_weighted` which allows passing
in an additional callback that returns the size of each task.
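The following hypothetical sketch shows one simple way to split by weight; `split_by_weight`, `parallel_for_weighted_sketch` and `grain_weight` are not Blender API. It walks the indices once, accumulating weights, and closes a chunk whenever the accumulated weight reaches a grain size. Expensive indices therefore end up in smaller chunks. The runner starts one `std::thread` per chunk only to stay self-contained. A real implementation would hand the chunks to a task scheduler.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <thread>
#include <utility>
#include <vector>

using Chunk = std::pair<int64_t, int64_t>; /* [begin, end) */

/* Greedily cut [0, size) into contiguous chunks whose summed weight reaches
 * roughly `grain_weight`. */
std::vector<Chunk> split_by_weight(const int64_t size,
                                   const std::function<int64_t(int64_t)> &weight_fn,
                                   const int64_t grain_weight)
{
  std::vector<Chunk> chunks;
  int64_t begin = 0;
  int64_t accumulated = 0;
  for (int64_t i = 0; i < size; i++) {
    accumulated += weight_fn(i);
    if (accumulated >= grain_weight) {
      chunks.emplace_back(begin, i + 1);
      begin = i + 1;
      accumulated = 0;
    }
  }
  if (begin < size) {
    chunks.emplace_back(begin, size); /* Leftover tail with less weight. */
  }
  return chunks;
}

/* Hypothetical runner: one thread per chunk, joined before returning. */
void parallel_for_weighted_sketch(const int64_t size,
                                  const std::function<int64_t(int64_t)> &weight_fn,
                                  const int64_t grain_weight,
                                  const std::function<void(Chunk)> &fn)
{
  std::vector<std::thread> threads;
  for (const Chunk &chunk : split_by_weight(size, weight_fn, grain_weight)) {
    threads.emplace_back(fn, chunk);
  }
  for (std::thread &thread : threads) {
    thread.join();
  }
}
```

For example, with six indices where index 0 weighs 10 and the others weigh 1, a grain weight of 4 puts index 0 in a chunk of its own.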

Pull Request: https://projects.blender.org/blender/blender/pulls/118348
2024-02-25 15:01:05 +01:00
Campbell Barton
91895bf806 Unbreak build with GLIBC pre 2.28
Also de-duplicate rename logic for Linux & other UNIX systems.
2024-02-25 22:56:22 +11:00
Sebastian Parborg
8aed44471e Merge branch 'blender-v4.1-release' 2024-02-22 14:28:04 +01:00
Sebastian Parborg
b4610f8fc0 Fix #116049, #117754: Renaming fails on Linux with certain filesystems
Not all filesystems on Linux support the RENAME_NOREPLACE flag.
If we get an EINVAL return value, retry with a non-atomic operation.

RENAME_NOREPLACE was introduced in 050d48edfc, so this is a regression
fix as well.
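The following sketches the described fallback for Linux with glibc 2.28 or newer, where `renameat2` and `RENAME_NOREPLACE` are declared in `<stdio.h>`. `rename_no_replace` is a hypothetical name. Treating `ENOSYS` like `EINVAL` is an extra assumption of this sketch. The fallback is not atomic: another process could create the target between the existence check and `rename()`.

```cpp
#include <cassert>
#include <cerrno>
#include <cstdio>
#include <fcntl.h>
#include <unistd.h>

/* Rename `from` to `to`, failing with EEXIST instead of overwriting `to`. */
int rename_no_replace(const char *from, const char *to)
{
  if (renameat2(AT_FDCWD, from, AT_FDCWD, to, RENAME_NOREPLACE) == 0) {
    return 0;
  }
  /* EINVAL: the file system doesn't support the flag.
   * ENOSYS (assumption of this sketch): the kernel lacks renameat2. */
  if (errno != EINVAL && errno != ENOSYS) {
    return -1; /* E.g. EEXIST when `to` already exists. */
  }
  /* Non-atomic fallback: check for an existing target, then do a plain rename. */
  if (access(to, F_OK) == 0) {
    errno = EEXIST;
    return -1;
  }
  return std::rename(from, to);
}
```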

Pull Request: https://projects.blender.org/blender/blender/pulls/118571
2024-02-22 14:23:54 +01:00
Jacques Lucke
50709ca253 BLI: add named constructors for IndexRange
Unless you're very familiar with `IndexRange`, it's often hard to know what
e.g. `IndexRange(10, 15)` means. Without more context, one could think
that it means `10-14`, `10-15` or `10-24`. This patch adds named constructors
to `IndexRange` to make the behavior more obvious when writing and when
reading the code. With those one can use `IndexRange::from_begin_end(10, 15)`,
`IndexRange::from_begin_end_inclusive(10, 15)` or `IndexRange::from_begin_size(10, 15)`
respectively. While being a bit more verbose, the explicitness makes code easier to
understand and also allows abstracting away some common index computations.

The old unnamed constructor that takes a begin and size is not removed by this patch,
as that would make the patch significantly bigger. I think it's reasonable to generally
use the named constructors going forward and to change the existing usages of the
old constructor over time.
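The following minimal stand-in shows what the three named constructors mean for the arguments `(10, 15)`. `IndexRangeSketch` is hypothetical. It only mirrors the constructor names from this commit and is not Blender's `IndexRange`.

```cpp
#include <cassert>
#include <cstdint>

/* Hypothetical stand-in, not Blender's IndexRange: covers [start, start + size). */
class IndexRangeSketch {
 public:
  /* Same meaning as the old unnamed constructor: (10, 15) covers 10-24. */
  static IndexRangeSketch from_begin_size(const int64_t begin, const int64_t size)
  {
    return IndexRangeSketch(begin, size);
  }
  /* End is exclusive: (10, 15) covers 10-14. */
  static IndexRangeSketch from_begin_end(const int64_t begin, const int64_t end)
  {
    return IndexRangeSketch(begin, end - begin);
  }
  /* End is inclusive: (10, 15) covers 10-15. */
  static IndexRangeSketch from_begin_end_inclusive(const int64_t begin, const int64_t last)
  {
    return IndexRangeSketch(begin, last - begin + 1);
  }

  int64_t first() const { return start_; }
  int64_t last() const { return start_ + size_ - 1; }
  int64_t size() const { return size_; }

 private:
  IndexRangeSketch(const int64_t start, const int64_t size) : start_(start), size_(size) {}
  int64_t start_;
  int64_t size_;
};
```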

Pull Request: https://projects.blender.org/blender/blender/pulls/118606
2024-02-22 12:57:10 +01:00
Campbell Barton
d4aedd89d0 Cleanup: spelling in comments 2024-02-22 22:40:46 +11:00
Julian Eisel
99673edd85 Cleanup: Add method to get UUID as std::string
Avoids having to use the C-style `BLI_uuid_format()` function with
manual buffer management, and makes it easy to get a `std::string` from
a UUID.
2024-02-20 15:20:11 +01:00
Jacques Lucke
148cad93e3 BLI: simplify creating index masks from group ids
Pull Request: https://projects.blender.org/blender/blender/pulls/118498
2024-02-20 13:18:16 +01:00
Sybren A. Stüvel
1ee414feb0 Cleanup: avoid compiler warning when USE_BRUTE_FORCE_ASSERT is undefined
Avoid 'unused variable' compiler warning when `USE_BRUTE_FORCE_ASSERT` is
not defined, in release mode builds.

No functional changes.
2024-02-19 17:19:12 +01:00
Brecht Van Lommel
0f2064bc3b Revert changes from main commits that were merged into blender-v4.1-release
The last good commit was 4bf6a2e564.
2024-02-19 15:59:59 +01:00
Hans Goudey
81a63153d0 Depsgraph: Rename "copy-on-write" to "copy-on-evaluation"
The depsgraph CoW mechanism is a bit of a misnomer. It creates an
evaluated copy for data-blocks regardless of whether the copy will
actually be written to. The point is to have physical separation between
original and evaluated data. This is in contrast to the commonly used
performance improvement of keeping a user count and copying data
implicitly when it needs to be changed. In Blender code we call this
"implicit sharing" instead. Importantly, the dependency graph has no
idea about the _actual_ CoW behavior in Blender.
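For contrast, implicit sharing in the sense described above can be sketched with a reference-counted buffer. The buffer is copied only when a shared copy is about to be modified. `SharedFloats` is a hypothetical illustration built on `std::shared_ptr`, not Blender's implicit sharing API. It is not thread-safe as written, because `use_count()` is only a hint under concurrency.

```cpp
#include <cassert>
#include <memory>
#include <utility>
#include <vector>

/* Hypothetical implicitly shared float buffer: copies of a handle share one
 * buffer, which is duplicated only when a shared handle is written to. */
class SharedFloats {
 public:
  explicit SharedFloats(std::vector<float> data)
      : data_(std::make_shared<std::vector<float>>(std::move(data)))
  {
  }

  const std::vector<float> &read() const
  {
    return *data_;
  }

  std::vector<float> &write()
  {
    if (data_.use_count() > 1) {
      /* Other handles still use the buffer: copy it before modifying. */
      data_ = std::make_shared<std::vector<float>>(*data_);
    }
    return *data_;
  }

  bool shares_with(const SharedFloats &other) const
  {
    return data_ == other.data_;
  }

 private:
  std::shared_ptr<std::vector<float>> data_;
};
```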

Renaming this functionality in the depsgraph removes some of the
confusion that comes up when talking about this, and will hopefully
make the depsgraph less confusing to understand initially too. Wording
like "the evaluated copy" (as opposed to the original data-block) has
also become common anyway.

Pull Request: https://projects.blender.org/blender/blender/pulls/118338
2024-02-19 15:54:08 +01:00
Campbell Barton
14b5912eee Cleanup: quiet C4551 warning for MSVC 2024-02-19 09:34:41 +11:00
Campbell Barton
5ae0b0c7f4 Cleanup: use the term "sincos" in convexhull_2d for clarity
The 2D vector calculated from edge vectors represents sin & cos which
wasn't obvious.
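Dividing an edge vector by its length yields (cos a, sin a) for the edge's angle a to the X axis, which is why the normalized edge vector is a "sincos" pair. That pair can rotate points so the edge becomes axis-aligned without calling any trigonometric function. `Vec2`, `edge_sincos` and `unrotate` are hypothetical. This sketch stores cos in `x` and sin in `y`, which may not match the order used in `convexhull_2d`.

```cpp
#include <cassert>
#include <cmath>

struct Vec2 {
  double x, y;
};

/* Normalizing an edge vector gives (cos(a), sin(a)) for the edge's angle `a`
 * to the X axis (cos in `x`, sin in `y`). Assumes a non-degenerate edge. */
Vec2 edge_sincos(const Vec2 a, const Vec2 b)
{
  const double dx = b.x - a.x;
  const double dy = b.y - a.y;
  const double len = std::hypot(dx, dy);
  return {dx / len, dy / len};
}

/* Rotate `p` by -a, which makes the edge parallel to the X axis. */
Vec2 unrotate(const Vec2 p, const Vec2 sincos)
{
  const double c = sincos.x;
  const double s = sincos.y;
  return {p.x * c + p.y * s, -p.x * s + p.y * c};
}
```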
2024-02-16 14:26:51 +11:00
Campbell Barton
503d56e2c8 Cleanup: use const variables in convexhull_2d_sorted for clarity 2024-02-16 14:26:49 +11:00
Campbell Barton
5c87dfd269 Cleanup: use BLI_time_ prefix for time functions
Also use the term "now" instead of "check" for clarity.
2024-02-15 13:15:56 +11:00
Hans Goudey
61e61ce0e1 Cleanup: Use Span instead of Vector const reference
Span is preferable since it's agnostic of the source container,
makes it clearer that there is no ownership, is 8 bytes smaller,
and can be passed by value.
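The following minimal illustration shows those properties: a pointer-plus-size view that several container types convert to, passed by value. `SpanSketch` and `sum` are hypothetical stand-ins, not Blender's `Span`.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

/* Hypothetical non-owning view: just a pointer and a size. */
template<typename T> class SpanSketch {
 public:
  SpanSketch(const T *data, const int64_t size) : data_(data), size_(size) {}
  SpanSketch(const std::vector<T> &vec) : data_(vec.data()), size_(int64_t(vec.size())) {}
  template<std::size_t N>
  SpanSketch(const std::array<T, N> &arr) : data_(arr.data()), size_(int64_t(N))
  {
  }

  const T *begin() const { return data_; }
  const T *end() const { return data_ + size_; }
  int64_t size() const { return size_; }

 private:
  const T *data_;
  int64_t size_;
};

/* Takes the view by value: works for any contiguous source and owns nothing. */
int sum(const SpanSketch<int> values)
{
  int total = 0;
  for (const int v : values) {
    total += v;
  }
  return total;
}
```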
2024-02-14 17:23:01 -05:00
Hans Goudey
1c0f374ec3 Object: Move transform matrices to runtime struct
The `object_to_world` and `world_to_object` matrices are set during
depsgraph evaluation, calculated from the object's animated location,
rotation, scale, parenting, and constraints. It's confusing and
unnecessary to store them with the original data in DNA.

This commit moves them to `ObjectRuntime` and moves the matrices to
use the C++ `float4x4` type, giving the potential for simplified code
using the C++ abstractions. The matrices are accessible with functions
on `Object` directly since they are used so commonly. For write
access, however, the runtime struct has to be used directly.

The inverse `world_to_object` matrix is often calculated before it's
used, even though it's calculated as part of depsgraph evaluation.
Long term we might not want to store this in `ObjectRuntime` at all,
and just calculate it on demand. Or at least we should remove the
redundant calculations. That should be done separately though.

Pull Request: https://projects.blender.org/blender/blender/pulls/118210
2024-02-14 16:14:49 +01:00
Campbell Barton
aa6ab9caf9 Cleanup: various non-functional changes for C++ 2024-02-14 13:56:58 +11:00
Campbell Barton
1111dff0a6 Tests: improvements to BLI_convexhull_2d_test
The convex hull tests included a reference AABB-fitting function for
comparison which was used to validate the optimized implementation.
This wasn't great as it depended on matching exact return values and
didn't test that the AABB-fitting logic worked usefully.

Replace this with a more general test that creates random polygons with
known bounds, applies a random rotation & translation, then uses
AABB-fitting to un-rotate the points, passing when the bounds are no
larger than the size of the generated input.

Details:

- Make BLI_convexhull_aabb_fit_hull_2d a static function again as it was
  only exposed for tests. Use BLI_convexhull_aabb_fit_points_2d instead.
- Remove brute force reference implementation from tests,
  moving this to an assertion within convexhull_2d
  (disabled by default since it's quite slow).
2024-02-14 13:42:14 +11:00
Campbell Barton
3f8cd44485 Cleanup: move BLI_strict_flags.h last, note that it should be kept last
Also add a note in the header why it should be kept last.
2024-02-14 13:40:31 +11:00
Germano Cavalcante
c9bd326255 Merge branch 'blender-v4.1-release' 2024-02-13 20:35:52 -03:00
Germano Cavalcante
c6e229d3e4 Fix #118221: Snap to Edge with Constraint Plane shifts out of plane
The intersection needs to be calculated with the plane passing through
the snap pivot.
2024-02-13 20:35:08 -03:00
Hans Goudey
9cf304160b BLI: Add missing overrides to some generic virtual array implementations
The lack of these functions in the "single trivial value" and "sliced
GVArray" implementations caused some code to fall back to the base
class functions. Those are much slower since they involve a virtual
function call per element. For example, this changed the runtime of
creating a new boolean attribute set to "true" on one million faces
from 3.4 ms to 0.35 ms.
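The following simplified sketch shows why a missing override is slow. The base class provides a generic fallback that makes one virtual call per element. A single-value implementation can instead fill the output directly. `VArraySketch`, `SingleValueSketch`, `SingleValueNoOverride` and `materialize` are hypothetical names, not the actual `GVArray` interface. The timings above are not reproduced by this sketch.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

/* Hypothetical virtual array interface (not Blender's GVArray API). */
class VArraySketch {
 public:
  virtual ~VArraySketch() = default;
  virtual int64_t size() const = 0;
  virtual int get(int64_t index) const = 0;

  /* Generic fallback: one virtual call per element. */
  virtual void materialize(int *dst) const
  {
    for (int64_t i = 0; i < this->size(); i++) {
      dst[i] = this->get(i);
    }
  }
};

/* Every element has the same value; overriding `materialize` replaces the
 * per-element virtual calls with a single fill. */
class SingleValueSketch final : public VArraySketch {
 public:
  SingleValueSketch(const int value, const int64_t size) : value_(value), size_(size) {}
  int64_t size() const override { return size_; }
  int get(int64_t /*index*/) const override { return value_; }
  void materialize(int *dst) const override { std::fill(dst, dst + size_, value_); }

 private:
  int value_;
  int64_t size_;
};

/* Same data without the override: callers end up in the slow fallback. */
class SingleValueNoOverride final : public VArraySketch {
 public:
  SingleValueNoOverride(const int value, const int64_t size) : value_(value), size_(size) {}
  int64_t size() const override { return size_; }
  int get(int64_t /*index*/) const override { return value_; }

 private:
  int value_;
  int64_t size_;
};
```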

Pull Request: https://projects.blender.org/blender/blender/pulls/118161
2024-02-13 19:59:58 +01:00
Germano Cavalcante
1dd163c2f7 Fix: build error with 'WITH_CXX_GUARDEDALLOC' 2024-02-13 10:59:56 -03:00
Jacques Lucke
cd0e41c73e BLI: improve printing of IndexMask
The new printed format is like this: `(Size: 503 | 0-499, 555, 699, 900)`.
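The following sketch produces that format from a plain sorted index vector, collapsing consecutive runs into `begin-end`. `format_mask` is hypothetical. Printing any run of two or more indices as a range is an assumption, since the commit only shows the single example above.

```cpp
#include <cassert>
#include <cstdint>
#include <sstream>
#include <string>
#include <vector>

/* Format sorted, unique indices as "(Size: N | a-b, c, ...)". */
std::string format_mask(const std::vector<int64_t> &indices)
{
  std::ostringstream out;
  out << "(Size: " << indices.size() << " | ";
  size_t i = 0;
  while (i < indices.size()) {
    /* Extend `j` to the end of the current run of consecutive indices. */
    size_t j = i;
    while (j + 1 < indices.size() && indices[j + 1] == indices[j] + 1) {
      j++;
    }
    if (i > 0) {
      out << ", ";
    }
    out << indices[i];
    if (j > i) {
      out << "-" << indices[j];
    }
    i = j + 1;
  }
  out << ")";
  return out.str();
}
```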
2024-02-13 12:33:48 +01:00
Jacques Lucke
bce1edc2bd BLI: add IndexMask.shift method 2024-02-13 12:33:48 +01:00