Each time the face bounds are used after they're initially computed, we
recompute the center of the bounds. We only use the actual bounds to
calculate the bounds of each node to decide how it should be split.
This commit stores the bounds centers instead, and uses the full bounds
only as the value type for the parallel reduction.
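
As a rough illustration of the idea (not the actual code in this commit),
the sketch below computes each face's bounds once, keeps only the center,
and folds the full bounds into a running total. The names `float3`,
`Bounds`, `Face`, `merge`, `center`, `face_bounds` and `calc_face_centers`
are all hypothetical, and a serial loop stands in for the parallel
reduction.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

/* Hypothetical minimal types, for illustration only. */
struct float3 {
  float x, y, z;
};
struct Bounds {
  float3 min, max;
};
struct Face {
  float3 a, b, c; /* Triangle corners. */
};

/* Union of two bounding boxes. */
inline Bounds merge(const Bounds &a, const Bounds &b)
{
  return {{std::min(a.min.x, b.min.x), std::min(a.min.y, b.min.y), std::min(a.min.z, b.min.z)},
          {std::max(a.max.x, b.max.x), std::max(a.max.y, b.max.y), std::max(a.max.z, b.max.z)}};
}

/* Midpoint of a bounding box. */
inline float3 center(const Bounds &b)
{
  return {(b.min.x + b.max.x) * 0.5f, (b.min.y + b.max.y) * 0.5f, (b.min.z + b.max.z) * 0.5f};
}

/* Bounding box of a single triangle. */
inline Bounds face_bounds(const Face &f)
{
  return merge(merge({f.a, f.a}, {f.b, f.b}), {f.c, f.c});
}

/* Store one center per face and return the bounds of all faces.
 * The per-face bounds are temporaries: only the centers are kept.
 * A serial loop is used here in place of a parallel reduction. */
Bounds calc_face_centers(const std::vector<Face> &faces, std::vector<float3> &r_centers)
{
  assert(!faces.empty());
  r_centers.resize(faces.size());
  Bounds total = face_bounds(faces[0]);
  for (std::size_t i = 0; i < faces.size(); i++) {
    const Bounds bounds = face_bounds(faces[i]);
    r_centers[i] = center(bounds);
    total = merge(total, bounds);
  }
  return total;
}
```

Later steps that previously derived a center from stored bounds could then
read `r_centers[i]` directly, and the per-face array holds one `float3`
rather than two.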
In a test with a 16 million face grid on a Ryzen 7840U, I observed a
1.28x speedup in BVH build time, from 1072 ms to 836 ms.
I didn't apply a similar change to BVH building for multires grids because
it's not clear the same bottleneck exists there, given the lower ratio of
"primitives" (grids) to final subdivided vertices.