On a Ryzen 3700x, this ended up 2.5x faster than before. More
benchmarking details are included in the differential revision.
For smaller grids, all this should do is increase the
code size a bit, and add a few more if statements.
Differential Revision: https://developer.blender.org/D13617