test/source/blender/blenlib/BLI_enumerable_thread_specific.hh
Jacques Lucke b7a1325c3c BLI: use blender::Mutex by default which wraps tbb::mutex
This patch adds a new `BLI_mutex.hh` header that provides `blender::Mutex` as an alias
for either `tbb::mutex` or `std::mutex`, depending on whether TBB is enabled.

Description copied from the patch:
```
/**
 * blender::Mutex should be used as the default mutex in Blender. It implements a subset of the API
 * of std::mutex but has overall better guaranteed properties. It can be used with RAII helpers
 * like std::lock_guard. However, it is not compatible with e.g. std::condition_variable. So one
 * still has to use std::mutex for that case.
 *
 * The mutex provided by TBB has these properties:
 * - It's as fast as a spin-lock in the non-contended case, i.e. when no other thread is trying to
 *   lock the mutex at the same time.
 * - In the contended case, it spins a couple of times but then blocks to avoid draining system
 *   resources by spinning for a long time.
 * - It's only 1 byte large, compared to e.g. 40 bytes when using the std::mutex of GCC. This makes
 *   it more feasible to have many smaller mutexes which can improve scalability of algorithms
 *   compared to using fewer larger mutexes. Also it just reduces "memory slop" across Blender.
 * - It is *not* a fair mutex, i.e. it's not guaranteed that a thread will ever be able to lock the
 *   mutex when there are always more than one threads that try to lock it. In the majority of
 *   cases, using a fair mutex just causes extra overhead without any benefit. std::mutex is not
 *   guaranteed to be fair either.
 */
```
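
For illustration, the alias header essentially boils down to the following sketch. This is not the actual patch contents, and the exact TBB include path depends on the (one)TBB version Blender is built against:

```
/* Hypothetical sketch of BLI_mutex.hh, not the actual patch contents. */
#pragma once

#ifdef WITH_TBB
/* The exact header name depends on the (one)TBB version in use. */
#  include <oneapi/tbb/mutex.h>
#else
#  include <mutex>
#endif

namespace blender {

#ifdef WITH_TBB
using Mutex = tbb::mutex;
#else
using Mutex = std::mutex;
#endif

}  // namespace blender
```

Since both underlying types satisfy the standard Lockable requirements, the alias works with RAII helpers such as `std::lock_guard`, which is exactly how `local()` in the header below uses it (`std::lock_guard lock{mutex_};`).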

The performance benchmarks suggest that the impact is negligible in almost
all cases. The only benchmarks that show interesting behavior are the ones
testing foreach zones in Geometry Nodes. These tests explicitly measure
overhead, which I still have to reduce over time, so it's not unexpected that
changing the mutex has an impact there. What's interesting is that on macOS the
performance improves a lot, while on Linux it gets worse. Since that overhead
should eventually be removed almost entirely, I don't consider this to be
blocking.

Links:
* Documentation of different mutex flavors in TBB:
  https://www.intel.com/content/www/us/en/docs/onetbb/developer-guide-api-reference/2021-12/mutex-flavors.html
* Older implementation of a similar mutex by me:
  https://archive.blender.org/developer/differential/0016/0016711/index.html
* Interesting read regarding how a mutex can be this small:
  https://webkit.org/blog/6161/locking-in-webkit/

Pull Request: https://projects.blender.org/blender/blender/pulls/138370
2025-05-07 04:53:16 +02:00


/* SPDX-FileCopyrightText: 2023 Blender Authors
 *
 * SPDX-License-Identifier: GPL-2.0-or-later */

/** \file
 * \ingroup bli
 */

#pragma once

#ifdef WITH_TBB
/* Quiet top level deprecation message, unrelated to API usage here. */
#  if defined(WIN32) && !defined(NOMINMAX)
/* TBB includes Windows.h which will define min/max macros causing issues
 * when we try to use std::min and std::max later on. */
#    define NOMINMAX
#    define TBB_MIN_MAX_CLEANUP
#  endif
#  include <tbb/enumerable_thread_specific.h>
#  ifdef WIN32
/* We cannot keep this defined, since other parts of the code deal with this on their own, leading
 * to multiple define warnings unless we un-define this, however we can only undefine this if we
 * were the ones that made the definition earlier. */
#    ifdef TBB_MIN_MAX_CLEANUP
#      undef NOMINMAX
#    endif
#  endif
#else
#  include <atomic>
#  include <functional>

#  include "BLI_map.hh"
#  include "BLI_mutex.hh"
#endif

#include "BLI_utility_mixins.hh"

namespace blender::threading {

#ifndef WITH_TBB
namespace enumerable_thread_specific_utils {
inline std::atomic<int> next_id = 0;
inline thread_local int thread_id = next_id.fetch_add(1, std::memory_order_relaxed);
}  // namespace enumerable_thread_specific_utils
#endif /* !WITH_TBB */

/**
 * This is mainly a wrapper for `tbb::enumerable_thread_specific`. The wrapper is needed because we
 * want to be able to build without tbb.
 *
 * More features of the tbb version can be wrapped when they are used.
 */
template<typename T> class EnumerableThreadSpecific : NonCopyable, NonMovable {
#ifdef WITH_TBB

 private:
  tbb::enumerable_thread_specific<T> values_;

 public:
  using iterator = typename tbb::enumerable_thread_specific<T>::iterator;

  EnumerableThreadSpecific() = default;

  template<typename F> EnumerableThreadSpecific(F initializer) : values_(std::move(initializer)) {}

  T &local()
  {
    return values_.local();
  }

  iterator begin()
  {
    return values_.begin();
  }

  iterator end()
  {
    return values_.end();
  }

#else /* WITH_TBB */

 private:
  Mutex mutex_;
  /* Maps thread ids to their corresponding values. The values are not embedded in the map, so that
   * their addresses do not change when the map grows. */
  Map<int, std::reference_wrapper<T>> values_;
  Vector<std::unique_ptr<T>> owned_values_;
  std::function<void(void *)> initializer_;

 public:
  using iterator = typename Map<int, std::reference_wrapper<T>>::MutableValueIterator;

  EnumerableThreadSpecific() : initializer_([](void *buffer) { new (buffer) T(); }) {}

  template<typename F>
  EnumerableThreadSpecific(F initializer)
      : initializer_([=](void *buffer) { new (buffer) T(initializer()); })
  {
  }

  T &local()
  {
    const int thread_id = enumerable_thread_specific_utils::thread_id;
    std::lock_guard lock{mutex_};
    return values_.lookup_or_add_cb(thread_id, [&]() {
      T *value = (T *)::operator new(sizeof(T));
      initializer_(value);
      owned_values_.append(std::unique_ptr<T>{value});
      return std::reference_wrapper<T>{*value};
    });
  }

  iterator begin()
  {
    return values_.values().begin();
  }

  iterator end()
  {
    return values_.values().end();
  }

#endif /* WITH_TBB */
};

}  // namespace blender::threading
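
For context (not part of the header above), here is a hypothetical usage sketch: each thread accumulates into its own `local()` value without locking, and the per-thread results are combined afterwards in a single-threaded pass. The use of `threading::parallel_for` from `BLI_task.hh` is an assumption about the calling code, purely for illustration.

```
/* Hypothetical usage sketch, not part of the header above. */
#include "BLI_enumerable_thread_specific.hh"
#include "BLI_task.hh"

static int64_t sum_of_squares(const int64_t n)
{
  using namespace blender;

  /* Explicit initializer so every per-thread value starts at zero in both the
   * TBB and the non-TBB build. */
  threading::EnumerableThreadSpecific<int64_t> partial_sums([]() { return int64_t(0); });

  threading::parallel_for(IndexRange(n), 2048, [&](const IndexRange range) {
    /* Each worker thread gets its own accumulator, so no locking is needed here. */
    int64_t &local_sum = partial_sums.local();
    for (const int64_t i : range) {
      local_sum += i * i;
    }
  });

  /* Combine the per-thread partial sums single-threaded. */
  int64_t total = 0;
  for (const int64_t &value : partial_sums) {
    total += value;
  }
  return total;
}
```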