25 Commits

Author SHA1 Message Date
Brecht Van Lommel
841ae6e8ab Fix part of #131933: Crash with playback of deforming subdivision surface
The `ForeachContext` in `deform_coarse_vertices` does not use TLS but still has
a `func_free` callback set. Change the task API to allow this.

Pull Request: https://projects.blender.org/blender/blender/pulls/132498
2025-01-02 12:21:56 +01:00
Campbell Barton
b93ddf30e9 Unbreak build WITH_TBB=OFF 2024-04-30 12:12:02 +10:00
Jacques Lucke
8d13a9608b BLI: generalize task size hints for parallel_for
This integrates the functionality for `parallel_for_weighted` from 9a3ceb79de
into `parallel_for`. This reduces the number of entry points to the threading
API and also makes it easier to build higher level threading primitives. For
example, `IndexMask.foreach_*` may use `parallel_for` if a `GrainSize` is
provided, but can't use `parallel_for_weighted` easily without duplicating a
fair amount of code.

The default behavior of `parallel_for` does not change. However, now one can
optionally pass in `TaskSizeHints` as the last parameter. This can be used to
specify the size of individual tasks relative to each other and relative to the
grain size. This helps schedule more equally sized tasks, which generally
improves performance because threads are used more effectively.

One generally does not construct `TaskSizeHints` manually, but calls either
`threading::individual_task_sizes` or `threading::accumulated_task_sizes`. Both
allow specifying individual task sizes, but the latter should be used when the
combined size of consecutive tasks can be computed in O(1) time. This allows
splitting up the work more efficiently. It can often be used in conjunction with
`OffsetIndices`.
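
A minimal sketch of why accumulated sizes help, assuming an offsets array in the style of `OffsetIndices` (one more entry than there are tasks). The helper `split_by_accumulated_sizes` is hypothetical and is not the code added by this commit; it only shows that when the combined size of consecutive tasks is an O(1) subtraction, each chunk boundary can be found with a binary search instead of a linear scan.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical helper for illustration only, not part of Blender's API.
// Task i has size offsets[i + 1] - offsets[i], so the combined size of the
// tasks in [a, b) is offsets[b] - offsets[a], which is an O(1) lookup.
// Returns half-open task ranges; each range ends as soon as its combined
// size reaches `grain_size` (the last range may be smaller).
std::vector<std::pair<int64_t, int64_t>> split_by_accumulated_sizes(
    const std::vector<int64_t> &offsets, const int64_t grain_size)
{
  assert(!offsets.empty() && grain_size > 0);
  std::vector<std::pair<int64_t, int64_t>> ranges;
  const int64_t tasks_num = int64_t(offsets.size()) - 1;
  int64_t begin = 0;
  while (begin < tasks_num) {
    // Binary search for the first boundary whose accumulated size reaches the target.
    const int64_t target = offsets[begin] + grain_size;
    const auto it = std::lower_bound(offsets.begin() + begin + 1, offsets.end(), target);
    const int64_t end = (it == offsets.end()) ? tasks_num : int64_t(it - offsets.begin());
    ranges.emplace_back(begin, end);
    begin = end;
  }
  return ranges;
}
```

With task sizes 1, 1, 8, 1, 1 (offsets `{0, 1, 2, 10, 11, 12}`) and a grain size of 4, this yields the ranges `[0, 3)` and `[3, 5)`, so the expensive third task closes the first chunk.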

Pull Request: https://projects.blender.org/blender/blender/pulls/121127
2024-04-29 23:55:22 +02:00
Campbell Barton
57dd9c21d3 Cleanup: spelling in comments 2024-03-21 10:02:53 +11:00
Jacques Lucke
b99c1abc3a BLI: speedup memory bandwidth bound tasks by reducing threading
This improves performance by **reducing** the number of threads used for tasks
which require a high memory bandwidth.

This works because the underlying hardware has a certain maximum memory
bandwidth. If that is used up by a few threads already, any additional threads
wanting to use a lot of memory will just cause more contention which actually
slows things down. By reducing the number of threads that can perform certain
tasks, the remaining threads are also not locked up doing work that they can't
do efficiently. It's best if there is enough scheduled work so that these threads
can do more compute-intensive tasks instead.

To use this new functionality, one has to put the parallel code in question into
a `threading::memory_bandwidth_bound_task(...)` block. Additionally, one also
has to provide a (very) rough approximation for how many bytes are accessed. If
the number is low, the number of threads shouldn't be reduced because it's
likely that all touched memory can be in L3 cache which generally has a much
higher bandwidth than main memory.

The exact number of threads that are allowed to do bandwidth bound tasks at the
same time is generally highly context and hardware dependent. It's also not
really possible to measure reliably because it depends on so many static and
dynamic factors. The thread count is now hardcoded to 8. It seems that this many
threads are easily capable of maxing out the bandwidth capacity.
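
The decision described above can be sketched as a small function. This is an illustration under stated assumptions, not the implementation from this commit: the name `threads_for_task` and the 8 MiB cache threshold are hypothetical, and only the limit of 8 threads comes from the text.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// The limit of 8 threads is taken from the commit message.
constexpr int bandwidth_bound_threads_max = 8;
// Assumed threshold for "likely fits in L3 cache"; the value is hypothetical.
constexpr int64_t cache_friendly_bytes_max = 8 * 1024 * 1024;

// Hypothetical helper: how many threads a task should use, given a rough
// estimate of the bytes it touches and the number of available threads.
int threads_for_task(const int64_t approximate_bytes_touched, const int available_threads)
{
  assert(available_threads > 0);
  if (approximate_bytes_touched <= cache_friendly_bytes_max) {
    // Small working set: memory bandwidth is unlikely to be the bottleneck.
    return available_threads;
  }
  // Large working set: extra threads would mostly add contention.
  return std::min(available_threads, bandwidth_bound_threads_max);
}
```

On a machine with 24 threads, a task touching about 1 GiB would be capped at 8 threads in this sketch, while a task touching a few kilobytes keeps all 24.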

With this technique I can measure surprisingly good performance improvements:
* Generating a 3000x3000 grid: 133ms -> 103ms.
* Generating a mesh line with 100'000'000 vertices: 212ms -> 189ms.
* Realize mesh instances resulting in ~27'000'000 vertices: 460ms -> 305ms.

In all of these cases, only 8 instead of 24 threads are used. The remaining
threads are idle in these cases, but they could do other work if available.

Pull Request: https://projects.blender.org/blender/blender/pulls/118939
2024-03-19 18:23:56 +01:00
Jacques Lucke
9a3ceb79de BLI: add weighted parallel for function
The standard `threading::parallel_for` function tries to split the range into
uniformly sized subranges. This is great if each element takes approximately
the same amount of time to compute.

However, there are also situations where the time required to do the work for
a single index differs significantly between different indices. In such a case,
it's better to split the tasks into segments while taking the size of each task into
account.

This patch implements `threading::parallel_for_weighted` which allows passing
in an additional callback that returns the size of each task.
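
As a rough illustration of the idea (not the code from this patch), the sketch below groups consecutive indices into segments using a size callback. The name `split_weighted` and its signature are assumptions made for this example.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical helper for illustration only, not part of Blender's API.
// Walks the indices [0, total) and closes a segment as soon as the summed
// task sizes reach `grain_size`. Returns half-open index ranges.
std::vector<std::pair<int64_t, int64_t>> split_weighted(
    const int64_t total,
    const int64_t grain_size,
    const std::function<int64_t(int64_t)> &task_size)
{
  assert(total >= 0 && grain_size > 0);
  std::vector<std::pair<int64_t, int64_t>> segments;
  int64_t begin = 0;
  int64_t accumulated = 0;
  for (int64_t i = 0; i < total; i++) {
    accumulated += task_size(i);
    if (accumulated >= grain_size) {
      segments.emplace_back(begin, i + 1);
      begin = i + 1;
      accumulated = 0;
    }
  }
  if (begin < total) {
    // Remaining indices that did not fill a whole segment.
    segments.emplace_back(begin, total);
  }
  return segments;
}
```

With six tasks of sizes 1 to 6 and a grain size of 6, the segments are `[0, 3)`, `[3, 5)` and `[5, 6)`: cheap indices are batched together while the most expensive one gets its own segment.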

Pull Request: https://projects.blender.org/blender/blender/pulls/118348
2024-02-25 15:01:05 +01:00
Campbell Barton
de18b629f0 Cleanup: unused includes in source/blender/blenlib
Remove 30 includes.
2024-02-13 11:07:14 +11:00
Campbell Barton
611930e5a8 Cleanup: use std::min/max instead of MIN2/MAX2 macros 2023-11-07 16:33:19 +11:00
Campbell Barton
5fbcb4c27e Cleanup: remove spaces from commented arguments
Also use local enums for `MA_BM_*` in versioning code.
2023-09-22 12:21:18 +10:00
Campbell Barton
e955c94ed3 License Headers: Set copyright to "Blender Authors", add AUTHORS
Listing the "Blender Foundation" as copyright holder implied the Blender
Foundation holds copyright to files which may include work from many
developers.

While keeping copyright on headers makes sense for isolated libraries,
Blender's own code may be refactored or moved between files in a way
that makes the per file copyright holders less meaningful.

Copyright references to the "Blender Foundation" have been replaced with
"Blender Authors", with the exception of `./extern/` since this
contains libraries which are more isolated; any changes to license
headers there can be handled on a case-by-case basis.

Some directories in `./intern/` have also been excluded:

- `./intern/cycles/` its own `AUTHORS` file is planned.
- `./intern/opensubdiv/`.

An "AUTHORS" file has been added, using the chromium projects authors
file as a template.

Design task: #110784

Ref !110783.
2023-08-16 00:20:26 +10:00
Sergey Sharybin
c1bc70b711 Cleanup: Add a copyright notice to files and use SPDX format
A lot of files were missing a copyright field in the header, and
the Blender Foundation contributed to them in the sense of bug
fixing and general maintenance.

This change makes it explicit that those files are at least
partially copyrighted by the Blender Foundation.

Note that this does not make it so the Blender Foundation is
the only holder of the copyright in those files, and developers
who do not have a signed contract with the foundation still
hold the copyright as well.

Another aspect of this change is using SPDX format for the
header. We already used it for the license specification,
and now we state it for the copyright as well, following the
FAQ:

    https://reuse.software/faq/
2023-05-31 16:19:06 +02:00
Jacques Lucke
f6d824bca6 BLI: move tbb part of parallel_for to implementation file
Previously, `tbb::parallel_for` was instantiated every time `threading::parallel_for`
was used. However, when actual parallelism is used, the overhead of a function
call is negligible. Therefore it is possible to move that part out of the header
without causing noticeable performance regressions.

This reduces the size of the Blender binary from 308.2 to 303.5 MB, which is
a reduction of about 1.5%.
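
The general pattern can be sketched as follows. This is a simplified single-file illustration, not the actual Blender code: the names `parallel_for_sketch` and `parallel_for_impl` are hypothetical, `std::function` is used for type erasure as an assumption, and a serial loop stands in for the scheduler call so the example has no TBB dependency.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <functional>

// Would live in the implementation file, so it is compiled only once.
// The serial loop below is a stand-in for the call into the task scheduler.
void parallel_for_impl(const int64_t begin,
                       const int64_t end,
                       const int64_t grain_size,
                       const std::function<void(int64_t, int64_t)> &fn)
{
  for (int64_t i = begin; i < end; i += grain_size) {
    fn(i, std::min(i + grain_size, end));
  }
}

// Would live in the header. The template stays thin: it handles the small
// case inline and otherwise forwards a type-erased callback, so the heavy
// scheduling code is not instantiated again for every call site.
template<typename Fn>
inline void parallel_for_sketch(const int64_t begin,
                                const int64_t end,
                                const int64_t grain_size,
                                const Fn &fn)
{
  assert(grain_size > 0);
  if (end <= begin) {
    return;
  }
  if (end - begin <= grain_size) {
    // Too little work to split; call directly without type erasure.
    fn(begin, end);
    return;
  }
  parallel_for_impl(begin, end, grain_size, fn);
}
```

The extra indirect call only happens on the path that would actually split the work, which is where its cost is small relative to the work being scheduled.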
2023-05-21 13:31:32 +02:00
Hans Goudey
97746129d5 Cleanup: replace UNUSED macro with commented args in C++ code
This is the conventional way of dealing with unused arguments in C++,
since it works on all compilers.

Regex find and replace: `UNUSED\((\w+)\)` -> `/*$1*/`
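
For illustration, the same find and replace can be reproduced with `std::regex`. The helper name `comment_out_unused` is made up for this example; the pattern and the replacement string are the ones quoted above.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical helper: applies the find-and-replace from the commit message
// to a piece of source text, turning UNUSED(name) into a commented-out name.
std::string comment_out_unused(const std::string &source)
{
  static const std::regex unused_macro(R"(UNUSED\((\w+)\))");
  // In the default ECMAScript format syntax, $1 is the captured argument name.
  return std::regex_replace(source, unused_macro, "/*$1*/");
}
```

For example, `void f(int UNUSED(x), float y)` becomes `void f(int /*x*/, float y)`.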
2022-10-03 17:38:16 -05:00
Jacques Lucke
5c81d3bd46 Geometry Nodes: improve evaluator with lazy threading
In large node setups the threading overhead was sometimes very significant.
That's especially true when most nodes do very little work.

This commit improves the scheduling by not using multi-threading in many
cases unless it's likely that it will be worth it. For more details see the comments
in `BLI_lazy_threading.hh`.

Differential Revision: https://developer.blender.org/D15976
2022-09-20 11:08:05 +02:00
Campbell Barton
c434782e3a File headers: SPDX License migration
Use a shorter/simpler license convention, which stops the header taking so
much space.

Follow the SPDX license specification: https://spdx.org/licenses

- C/C++/objc/objc++
- Python
- Shell Scripts
- CMake, GNUmakefile

Most of the source tree has been included, with these exceptions:

- `./extern/` was left out.
- `./intern/cycles` & `./intern/atomic` are also excluded because they
  use different header conventions.

doc/license/SPDX-license-identifiers.txt has been added to list all
used SPDX identifiers.

See P2788 for the script that automated these edits.

Reviewed By: brecht, mont29, sergey

Ref D14069
2022-02-11 09:14:36 +11:00
Campbell Barton
8e8a6b80cf Cleanup: replace BLI_assert(!"text") with BLI_assert_msg(0, "text")
This shows the text as part of the assertion message.
2021-07-15 18:29:01 +10:00
Brecht Van Lommel
fcc844f8fb BLI: use explicit task isolation, no longer part of parallel operations
After looking into task isolation issues with Sergey, we couldn't find the
reason behind the deadlocks that we are getting in T87938 and a Sprite Fright
file involving motion blur renders.

There is no apparent place where we are adding or waiting on tasks in a task group
from different isolation regions, which is what is known to cause problems. Yet
it still hangs. Either we do not understand some limitation of TBB isolation,
or there is a bug in TBB, but we could not figure it out.

Instead the idea is to use isolation only where we know we need it: when
holding a mutex lock and then doing some multithreaded operation within that
locked region. Three places where we do this now:
* Generated images
* Cached BVH tree building
* OpenVDB lazy grid loading

Compared to the more automatic approach previously used, there is the downside
that it is easy to miss places where we need isolation. Yet doing it more
automatically is also causing unexpected issues and bugs that we found no
solution for, so this seems better.

Patch implemented by Sergey and me.

Differential Revision: https://developer.blender.org/D11603
2021-06-15 17:28:44 +02:00
Brecht Van Lommel
677e63d518 TBB: fix deprecation warnings with newer TBB versions
* USD and OpenVDB headers use deprecated TBB headers, suppress all deprecation
  warnings there since we have no control over them.
* For our own TBB includes, use the individual headers rather than the tbb.h that
  includes everything to avoid warnings, rather than suppressing all.

This is in anticipation of the TBB 2020 upgrade in D10359. Ref D10361.
2021-02-10 19:32:24 +01:00
Sybren A. Stüvel
958df2ed1b Cleanup: Clang-Tidy, modernize-deprecated-headers
No functional changes.
2020-12-04 11:28:09 +01:00
Sybren A. Stüvel
16732def37 Cleanup: Clang-Tidy modernize-use-nullptr
Replace `NULL` with `nullptr` in C++ code.

No functional changes.
2020-11-06 18:08:25 +01:00
Jacques Lucke
4a5389816b Clang-Tidy: enable readability-named-parameter 2020-07-03 17:07:13 +02:00
Brecht Van Lommel
183ba284f2 Cleanup: make guarded memory allocation always thread safe
Previously this would be enabled when threads were used, but threads are now
basically always in use so there is no point. Further, this is only needed for
guarded allocation with --debug-memory which is not performance critical.
2020-05-20 01:03:05 +02:00
Brecht Van Lommel
33fc42bd65 Merge branch 'blender-v2.83-release' 2020-05-20 00:46:15 +02:00
Jeroen Bakker
08ac4d3d71 Fix T76553: Blender Freezes When Playing Back Animation
In some cases Blender could freeze. When a thread is blocked (waiting for other tasks to complete), the scheduler can let it perform a different task. If that task wants a write lock on something that is already read-locked further up the stack, a deadlock occurs.

For task pools every task is isolated. For range tasks the inner loop will be isolated. The implementation is limited as isolation in TBB uses functors which are tricky to add to a C API. We decided to start with a simple approach and adapt where we need to.

During testing we arrived at this setup as it was reliable (we weren't able to make it freeze or crash) and didn't have a noticeable performance impact.

Reviewed By: Brecht van Lommel

Differential Revision: https://developer.blender.org/D7688
2020-05-14 13:54:16 +02:00
Brecht Van Lommel
d8a3f3595a Task: Use TBB as Task Scheduler
This patch enables TBB as the default task scheduler. TBB stands for Threading Building Blocks and is developed by Intel. The library contains several threading patterns. This patch maps Blender's `BLI_task_*` functions to their TBB counterparts. After this patch we can add more patterns. A promising one is TBB:graph that can be used for depsgraph, draw manager and compositor.

Performance changes depend on the actual hardware. It was tested on different hardware, from laptops to workstations, and we didn't detect any performance regression.
* Linux Xeon E5-2699 v4: FPS goes from 12 to 17 using Spring's 04_010_A.anim.blend.
* AMD Ryzen Threadripper 2990WX 32-Core: animation playback goes from 9.5-10.5 FPS to 13.0-14.0 FPS on Agent 327, 10_03_B.anim.blend.

Reviewed By: brecht, sergey

Differential Revision: https://developer.blender.org/D7475
2020-04-30 08:09:21 +02:00