Fix #97685: very inaccurate mean computation in Attribute Statistic node

Numerical instability mainly comes from adding values together which have very different magnitude. There are algorithms to keep the error small like "Kahan Summation", however those are also slower because of additional code in the hot loop. This patch implements a simpler approach that is slightly less accurate, but still seems to solve the cases that people commonly run into while being simpler and faster. The approach is to simply compute a couple partial sums first, and to add those up in the end. The individual partial sums can also be computed in parallel. Care has to be taken to maintain determinism with floating point values. If accuracy is still not enough for some use cases, we can revisit this later and e.g. use doubles or a better summation algorithm. Pull Request: https://projects.blender.org/blender/blender/pulls/132759
2025-01-17 12:20:42 +01:00
parent 987003d456
commit aec72c0a9a
1 changed files with 19 additions and 1 deletions
--- a/source/blender/nodes/geometry/nodes/node_geo_attribute_statistic.cc
+++ b/source/blender/nodes/geometry/nodes/node_geo_attribute_statistic.cc
@@ -103,7 +103,25 @@ static void node_gather_link_searches(GatherLinkSearchOpParams &params)

 template<typename T> static T compute_sum(const Span<T> data)
 {
-  return std::accumulate(data.begin(), data.end(), T());
+  /* Explicitly splitting work into chunks for a couple of reasons:
+   * - Improve numerical stability. While there are even more stable algorithms (e.g. Kahan
+   *   summation), they also add more complexity to the hot code path. So far, this simple approach
+   *   seems to solve the common issues people run into.
+   * - Support computing the sum using multiple threads.
+   * - Ensure deterministic results even with floating point numbers.
+   */
+  constexpr int64_t chunk_size = 1024;
+  const int64_t chunks_num = divide_ceil_ul(data.size(), chunk_size);
+  Array<T> partial_sums(chunks_num);
+  threading::parallel_for(partial_sums.index_range(), 1, [&](const IndexRange range) {
+    for (const int64_t i : range) {
+      const int64_t start = i * chunk_size;
+      const Span<T> chunk = data.slice_safe(start, chunk_size);
+      const T partial_sum = std::accumulate(chunk.begin(), chunk.end(), T());
+      partial_sums[i] = partial_sum;
+    }
+  });
+  return std::accumulate(partial_sums.begin(), partial_sums.end(), T());
 }

 static float compute_variance(const Span<float> data, const float mean)