Restore the performance lost by removing `reserve` in ff4d5b6f04, but
in a better way. First build offsets so we know where in the result each
source's data should go, then copy the data in parallel.
Joining geometries containing 1 million instances each took 194 ms
before the change and only 15.6 ms afterwards.
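
The offsets-then-copy idea described above can be illustrated with a minimal, standalone sketch. This is not the code from the commit: `join_parallel` is a hypothetical helper, plain `std::vector<int>` stands in for instance data, and one `std::thread` per source is used only for brevity where a real implementation would presumably hand the work to a task scheduler.

```cpp
#include <algorithm>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical helper (not Blender API): concatenate all sources into one vector.
std::vector<int> join_parallel(const std::vector<std::vector<int>> &sources)
{
  // Step 1: build offsets. offsets[i] is where source i starts in the result,
  // and offsets.back() is the total number of elements.
  std::vector<std::size_t> offsets(sources.size() + 1, 0);
  for (std::size_t i = 0; i < sources.size(); i++) {
    offsets[i + 1] = offsets[i] + sources[i].size();
  }

  // Allocate the result once, at its final size.
  std::vector<int> result(offsets.back());

  // Step 2: every source writes to its own disjoint range of the result,
  // so the copies can run concurrently without synchronization.
  // Assumption: one thread per source, chosen only to keep the sketch short.
  std::vector<std::thread> workers;
  workers.reserve(sources.size());
  for (std::size_t i = 0; i < sources.size(); i++) {
    workers.emplace_back([&sources, &offsets, &result, i]() {
      std::copy(sources[i].begin(), sources[i].end(), result.data() + offsets[i]);
    });
  }
  for (std::thread &worker : workers) {
    worker.join();
  }
  return result;
}
```

Because the destination ranges are known up front, no thread ever appends to or resizes the result, which is what makes the copy step safe to parallelize.
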
Pull Request: https://projects.blender.org/blender/blender/pulls/113886