I observed a 4-5x performance improvement (from 50 ms to 12 ms) with five million points, though the exact gain will depend on the hardware. In the future, we may want to disable the parallelization in `parallel_invoke` when there is only a small number of points.

Differential Revision: https://developer.blender.org/D14590
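The idea of skipping parallelization for small inputs could look roughly like the sketch below. This is not Blender's `parallel_invoke`. It swaps in standard `std::async` for illustration, and `invoke_maybe_parallel`, `kMinPointsForThreading`, `Sums`, and `sum_coordinates` are hypothetical names. The threshold value is an assumption and would need to be measured on real hardware.

```cpp
#include <cassert>
#include <cstddef>
#include <future>
#include <numeric>
#include <utility>
#include <vector>

// Hypothetical threshold: below this many points, threading overhead is
// assumed to outweigh the benefit. The value is illustrative, not measured.
constexpr std::size_t kMinPointsForThreading = 10000;

// Hypothetical stand-in for a parallel_invoke with a use_threading switch:
// runs both callables concurrently when use_threading is true, else serially.
template<typename F1, typename F2>
void invoke_maybe_parallel(const bool use_threading, F1 &&f1, F2 &&f2)
{
  if (!use_threading) {
    f1();
    f2();
    return;
  }
  // Run f1 on another thread while the calling thread runs f2.
  std::future<void> task = std::async(std::launch::async, std::forward<F1>(f1));
  f2();
  task.get();  // Wait for f1 and rethrow any exception it raised.
}

// Example workload: sum the x and y coordinates of a point set.
struct Sums {
  double x = 0.0;
  double y = 0.0;
};

Sums sum_coordinates(const std::vector<double> &xs, const std::vector<double> &ys)
{
  assert(xs.size() == ys.size());
  Sums sums;
  // Only pay for threading when there are enough points.
  const bool use_threading = xs.size() >= kMinPointsForThreading;
  invoke_maybe_parallel(
      use_threading,
      [&]() { sums.x = std::accumulate(xs.begin(), xs.end(), 0.0); },
      [&]() { sums.y = std::accumulate(ys.begin(), ys.end(), 0.0); });
  return sums;
}
```

Each task writes a different member of `Sums`, so the two threads never touch the same memory. Passing the decision as a boolean keeps the threshold policy at the call site, where the point count is known.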