Pushing multiple nodes at once reduces the time threads spend waiting for each other to release the lock on the nodes map, and it evens out the amount of work per thread, since each thread can iterate over just the nodes that need data stored. I observed a 2.6% speedup on the benchmark file from #118145 (0.59s to 0.57s).
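To illustrate the idea, here is a minimal Python sketch, not the code from this change. `NodeStore`, `push_one`, `push_many`, and `run_workers` are hypothetical names invented for the example, and it assumes the nodes map is a plain dictionary guarded by a single lock. It compares taking the lock once per node with taking it once per batch, after filtering down to the nodes that actually have data to store and splitting them evenly across threads.

```python
import threading


class NodeStore:
    """Hypothetical stand-in for a shared nodes map guarded by one lock."""

    def __init__(self):
        self._lock = threading.Lock()
        self.nodes = {}
        self.lock_acquisitions = 0  # counted while holding the lock

    def push_one(self, key, data):
        # One lock round-trip per node.
        with self._lock:
            self.lock_acquisitions += 1
            self.nodes[key] = data

    def push_many(self, items):
        # One lock round-trip for the whole batch of (key, data) pairs.
        with self._lock:
            self.lock_acquisitions += 1
            self.nodes.update(items)


def run_workers(store, work, num_threads, batched):
    # Keep only the nodes that need data stored, then deal them out
    # round-robin so every thread gets a similar share.
    pending = [(key, data) for key, data in work if data is not None]
    chunks = [pending[i::num_threads] for i in range(num_threads)]

    def worker(chunk):
        if batched:
            store.push_many(chunk)
        else:
            for key, data in chunk:
                store.push_one(key, data)

    threads = [threading.Thread(target=worker, args=(c,)) for c in chunks]
    for t in threads:
        t.start()
    for t in threads:
        t.join()


# Half of the 1000 nodes carry data; the rest have nothing to store.
work = [(i, f"data-{i}" if i % 2 == 0 else None) for i in range(1000)]

per_node = NodeStore()
run_workers(per_node, work, num_threads=4, batched=False)

batched = NodeStore()
run_workers(batched, work, num_threads=4, batched=True)

print(per_node.lock_acquisitions, batched.lock_acquisitions)
```

Both stores end up with the same contents, but the batched version takes the lock once per thread instead of once per node. The sketch only counts lock acquisitions; it does not reproduce the timing reported above.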