Push a task to `TaskPool` when more than `MIN_PRIMS_PER_THREAD` primitives are to be processed. The nodes are rearranged in a depth-first order when copied to the device.
Tested with the scene in #105550 on an Apple M1 Ultra (20 cores), about 11x speedup.
Pull Request: https://projects.blender.org/blender/blender/pulls/105862