The maximum of 256 particles per task was outdated and led to too much thread
contention. Instead, define a low, fixed number of tasks per thread.
On an i7-7700HQ, creating 4 million particles went down from 31s to 4s.
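As a rough illustration of the idea (not the code from this revision), the
task count can be derived from the thread count instead of from a fixed
particle cap. The names TASKS_PER_THREAD, TaskRange, task_count and task_range
are hypothetical, and the value 4 is an assumed placeholder:

```c
#include <assert.h>
#include <stddef.h>

/* Assumed placeholder: how many tasks each worker thread should receive. */
#define TASKS_PER_THREAD 4

/* Half-open range of particle indices handled by one task. */
typedef struct TaskRange {
  size_t begin; /* first particle index, inclusive */
  size_t end;   /* last particle index, exclusive */
} TaskRange;

/* Number of tasks: a fixed multiple of the thread count,
 * but never more than one task per particle. */
size_t task_count(size_t num_particles, size_t num_threads)
{
  assert(num_threads > 0);
  size_t tasks = num_threads * TASKS_PER_THREAD;
  if (tasks > num_particles) {
    tasks = num_particles;
  }
  return tasks;
}

/* Particle range of task `index` out of `num_tasks`.
 * The remainder is spread over the first tasks, one extra particle each. */
TaskRange task_range(size_t num_particles, size_t num_tasks, size_t index)
{
  assert(index < num_tasks);
  const size_t base = num_particles / num_tasks;
  const size_t extra = num_particles % num_tasks;
  TaskRange range;
  range.begin = index * base + (index < extra ? index : extra);
  range.end = range.begin + base + (index < extra ? 1 : 0);
  return range;
}
```

With a cap of 256 particles per task, 4 million particles are split into
15625 tasks. With 8 threads and the assumed 4 tasks per thread, the sketch
above yields 32 tasks of 125000 particles each.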
Thanks to Oscar Abad, Sav Martin, Zebus3d, Sebastián Barschkis and Martin Felke
for testing and advice.
Differential Revision: https://developer.blender.org/D4910