This is intended to be used in the new exact mesh boolean algorithm by @howardt. The new `BLI_fixed_width_int.hh` header provides types like `Int256` and `UInt256` which are like e.g. `uint64_t` but with higher precision. The code supports many different integer sizes. The following operations are supported: * Addition * Subtraction * Multiplication * Comparisons * Negation * Conversion to and from other number types * Conversion to and from string (based on `GMP`) Division is not implemented. It could be implemented, but it's more complex and is not required for the new mesh boolean algorithm. Some alternatives to having a custom implementation have been discussed in https://devtalk.blender.org/t/fixed-length-multiprecision-arithmetic/29189/. Generally, the implementation is fairly straight forward. The main complexity is the addition/multiplication algorithm which isn't too complicated. It's nice to have control over this part as it allows us to optimize the code more if necessary. Also, from what I understand, we might be able to benefit from some special cases like multiplying a large integer with a smaller one. I tried some different ways to optimize this already, but so far the normal compiler optimization turned out to work best. Not sure if the same is true on windows though, as it doesn't have native support for an `int128` which helps the compiler understand what I'm doing. Alternatives I tried so far are using intrinsics directly (mainly `_addcarry_u64` and similar), writing inline assembly manually and copying the assembly output from the compiler. I assume the assembly implementation didn't help for me because it prohibited other compiler optimizations. Pull Request: https://projects.blender.org/blender/blender/pulls/119528
27 lines
676 B
C++
27 lines
676 B
C++
/* SPDX-FileCopyrightText: 2024 Blender Authors
|
|
*
|
|
* SPDX-License-Identifier: GPL-2.0-or-later */
|
|
|
|
#pragma once
|
|
|
|
#include "BLI_utildefines.h"
|
|
|
|
namespace blender {
|
|
|
|
template<class Fn, size_t... I> void unroll_impl(Fn fn, std::index_sequence<I...> /*indices*/)
|
|
{
|
|
(fn(I), ...);
|
|
}
|
|
|
|
/**
|
|
* Variadic templates are used to unroll loops manually. This helps GCC avoid branching during math
|
|
* operations and makes the code generation more explicit and predictable. Unrolling should always
|
|
* be worth it because the vector size is expected to be small.
|
|
*/
|
|
template<int N, class Fn> void unroll(Fn fn)
|
|
{
|
|
unroll_impl(fn, std::make_index_sequence<N>());
|
|
}
|
|
|
|
} // namespace blender
|