The maximum of 256 particles per task was outdated and led to too much thread contention. Instead, define a low fixed number of tasks per thread.

On an i7-7700HQ, creating 4 million particles went down from 31s to 4s.

Thanks to Oscar Abad, Sav Martin, Zebus3d, Sebastián Barschkis and Martin Felke for testing and advice.

Differential Revision: https://developer.blender.org/D4910
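
For illustration only, a minimal sketch of the two partitioning schemes described above. This is not the code from the patch: the function names, the value of 4 tasks per thread, and the thread count of 8 are assumptions chosen for the example.

```python
def split_by_max_size(total_items, max_items_per_task=256):
    """Old scheme: cap the items per task, so the task count grows with the input."""
    return [
        (start, min(start + max_items_per_task, total_items))
        for start in range(0, total_items, max_items_per_task)
    ]


def split_into_tasks(total_items, num_threads, tasks_per_thread=4):
    """New scheme: fix the task count relative to the thread count.

    Returns contiguous (start, end) ranges, with end exclusive.
    tasks_per_thread=4 is a hypothetical value, not taken from the patch.
    """
    if total_items <= 0:
        return []
    max_tasks = max(1, num_threads * tasks_per_thread)
    # Never create more tasks than there are items.
    num_tasks = min(max_tasks, total_items)
    base, extra = divmod(total_items, num_tasks)
    ranges = []
    start = 0
    for i in range(num_tasks):
        # Spread the remainder over the first `extra` tasks.
        size = base + (1 if i < extra else 0)
        ranges.append((start, start + size))
        start += size
    return ranges


if __name__ == "__main__":
    total = 4_000_000
    print(len(split_by_max_size(total)))    # task count under the old scheme
    print(len(split_into_tasks(total, 8)))  # task count under the new scheme
```

With these assumed values, the old scheme schedules thousands of small tasks for 4 million particles, while the new one schedules a few large tasks per thread, so threads go back to the shared task queue far less often.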