Majority of math operations on VecBase<> were implemented by calling into an
indexing operator, sometimes coupled with unroll<Size> template.
When compiler optimizations are off (e.g. Debug build), or when asserts are on
(e.g. usual "developer" setup), this resulted in codegen that is very
sub-optimal. Especially if these vector types are used a lot, e.g. when
scaling down a screenshot for saving as a thumbnail into the blend file.
Address that by explicit code paths for 4,3,2 dimensional vectors, that
avoids both the unroll<> template and indexing operator. To avoid repeated long
typo-prone code, do that with C preprocessor :( -- however all of the
preprocessor innards are in a separate file BLI_math_vector_unroll.hh so they
do not get into the way much.
Scaling down a screenshot to the blend file thumbnail, while saving the blend
file, on my machine: (4K screen resolution, Ryzen 5950X, VS2022 build), which
involves two calls to IMB_scale which uses float4 for pixel operations:
- Release with asserts off (what ships to users): no change at 9.4ms
- Release with asserts on ("developer" setup): 38.1ms -> 9.4ms
- Debug: 226ms -> 64ms
- Debug w/ ASAN: 314ms -> 78ms
Pull Request: https://projects.blender.org/blender/blender/pulls/127577