This optimizes the move constructor for `blender::Vector` when all of the
following are true:
* The source and destination vectors have exactly the same type.
* The stored type is trivial.
* The inline buffer is at most 32 bytes large (this threshold is a heuristic
that could be changed).
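As a rough illustration, the three conditions above can be expressed as a compile-time predicate. This is only a sketch: `SmallVec`, `max_full_copy_bytes`, and `can_copy_whole_inline_buffer` are hypothetical names made up for this example, and the actual checks in `blender::Vector` may be written differently.

```cpp
#include <cstddef>
#include <string>
#include <type_traits>

/* Hypothetical stand-in that only carries the compile-time properties of a
 * small-buffer vector type. It is not blender::Vector. */
template<typename T, std::size_t InlineCapacity> struct SmallVec {
  using value_type = T;
  static constexpr std::size_t inline_bytes = sizeof(T) * InlineCapacity;
};

/* The 32 byte threshold is the heuristic mentioned above. */
inline constexpr std::size_t max_full_copy_bytes = 32;

/* True when all three conditions hold: same type, trivial element type and a
 * small inline buffer. */
template<typename Src, typename Dst>
inline constexpr bool can_copy_whole_inline_buffer =
    std::is_same_v<Src, Dst> && std::is_trivial_v<typename Dst::value_type> &&
    Dst::inline_bytes <= max_full_copy_bytes;

/* Same type, trivial elements, at most 32 bytes. */
static_assert(can_copy_whole_inline_buffer<SmallVec<void *, 4>, SmallVec<void *, 4>>);
/* Different inline capacity means a different type. */
static_assert(!can_copy_whole_inline_buffer<SmallVec<void *, 2>, SmallVec<void *, 4>>);
/* std::string is not trivial. */
static_assert(!can_copy_whole_inline_buffer<SmallVec<std::string, 1>, SmallVec<std::string, 1>>);
/* 64 bytes exceed the threshold. */
static_assert(!can_copy_whole_inline_buffer<SmallVec<char, 64>, SmallVec<char, 64>>);
```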
The basic idea of the optimization is that under these circumstances one can
just copy the entire inline buffer over instead of only copying the part that
is in use based on the vector size. While that can mean that more bytes have to
be copied, the machine code that does the copying can be more efficient because
it needs fewer branches and the copy size is known at compile time.
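The following is a minimal sketch of that idea, not the actual `blender::Vector` code. `TinyVector` and all of its members are hypothetical names for a simplified small-buffer vector that only supports trivial element types. The point of interest is the `std::memcpy` with a compile-time constant size in the move constructor.

```cpp
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <type_traits>
#include <utility>

/* Hypothetical, simplified small-buffer vector. It is not blender::Vector. */
template<typename T, std::size_t N> class TinyVector {
  static_assert(N > 0);
  static_assert(std::is_trivial_v<T>, "sketch only handles trivial types");
  static_assert(sizeof(T) * N <= 32, "sketch only handles small inline buffers");

  T *data_;
  std::size_t size_ = 0;
  std::size_t capacity_ = N;
  T inline_buffer_[N];

  void grow()
  {
    const std::size_t new_capacity = capacity_ * 2;
    T *new_data = static_cast<T *>(std::malloc(sizeof(T) * new_capacity));
    if (new_data == nullptr) {
      std::abort();
    }
    std::memcpy(new_data, data_, sizeof(T) * size_);
    if (data_ != inline_buffer_) {
      std::free(data_);
    }
    data_ = new_data;
    capacity_ = new_capacity;
  }

 public:
  TinyVector() : data_(inline_buffer_) {}

  TinyVector(TinyVector &&other) noexcept : size_(other.size_), capacity_(other.capacity_)
  {
    if (other.data_ == other.inline_buffer_) {
      /* Copy the whole inline buffer. The size is a compile-time constant, so
       * no loop or branch on `size_` is needed. Unused slots may contain
       * arbitrary bytes, which is harmless for trivial types. A size-dependent
       * copy would use `sizeof(T) * other.size_` here instead. */
      std::memcpy(inline_buffer_, other.inline_buffer_, sizeof(inline_buffer_));
      data_ = inline_buffer_;
    }
    else {
      /* Heap case: steal the allocation, the inline buffer is not touched. */
      data_ = other.data_;
    }
    other.data_ = other.inline_buffer_;
    other.size_ = 0;
    other.capacity_ = N;
  }

  ~TinyVector()
  {
    if (data_ != inline_buffer_) {
      std::free(data_);
    }
  }

  void append(const T &value)
  {
    if (size_ == capacity_) {
      grow();
    }
    data_[size_++] = value;
  }

  std::size_t size() const
  {
    return size_;
  }

  const T &operator[](std::size_t index) const
  {
    return data_[index];
  }
};
```

Moving with `TinyVector<void *, 4> b(std::move(a));` takes the fixed-size copy path as long as `a` holds at most four elements. With more elements the heap allocation is handed over instead.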
The performance impact is measurable. Note that the speedup depends on how many
elements are in the vector and thus how many elements of the inline buffer are
used. The following table shows the move construction performance of a
`Vector<void *, 4>`. Starting at 5 elements, the performance doesn't change much
anymore, because the elements no longer fit into the inline buffer, so it is
just ignored.
| Elements | Old | New |
|----------|------|------|
| 0 | 20.3 | 14.6 |
| 1 | 22.7 | 21.5 |
| 2 | 36.4 | 21.6 |
| 3 | 36.4 | 21.5 |
| 4 | 36.5 | 21.6 |
| 5 | 21.4 | 21.1 |
| 6 | 21.3 | 21.1 |
| 7 | 21.4 | 21.1 |
| 8 | 21.5 | 21.0 |
| 9 | 21.4 | 20.9 |
| 10 | 21.3 | 20.9 |
The binary size stays effectively unchanged (less than 2 KB change).
Pull Request: https://projects.blender.org/blender/blender/pulls/131841