Updating the node bounds right after deforming the node's vertices is
faster because the position data is still fresh in the CPU caches.
Updating them later means all the other nodes are processed in the
meantime, which will evict that position data from the caches.
This gives a 1.13x speedup in the brush benchmark timing, from
0.495s to 0.438s on a Ryzen 9 7950X (best of 5 runs).
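
A simplified sketch of the reordering follows. `Node`, `Bounds`,
`deform_node` and `update_node_bounds` are hypothetical stand-ins, not
Blender's actual PBVH API. The sketch assumes each vertex belongs to
exactly one node and uses a constant offset in place of a real brush.
Both versions produce the same positions and bounds. Only the order of
the work changes, so each node's positions are read back while they
are likely still cached.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical, simplified stand-ins (not Blender's real types).
using float3 = std::array<float, 3>;
constexpr float inf = std::numeric_limits<float>::infinity();

struct Bounds {
  float3 min{inf, inf, inf};
  float3 max{-inf, -inf, -inf};
};

struct Node {
  std::vector<int> verts;  // Indices into the shared position array.
  Bounds bounds;
};

// Stand-in for a brush deformation: offsets every vertex of the node.
static void deform_node(const Node &node, std::vector<float3> &positions, const float3 &offset)
{
  for (const int v : node.verts) {
    assert(static_cast<std::size_t>(v) < positions.size());
    for (int i = 0; i < 3; i++) {
      positions[v][i] += offset[i];
    }
  }
}

// Recomputes the axis-aligned bounds of one node from its vertices.
static void update_node_bounds(Node &node, const std::vector<float3> &positions)
{
  Bounds bounds;
  for (const int v : node.verts) {
    for (int i = 0; i < 3; i++) {
      bounds.min[i] = std::min(bounds.min[i], positions[v][i]);
      bounds.max[i] = std::max(bounds.max[i], positions[v][i]);
    }
  }
  node.bounds = bounds;
}

// Before: deform all nodes, then recompute all bounds in a second pass.
// A node's positions may be evicted from cache before they are read again.
void deform_then_update_bounds(std::vector<Node> &nodes,
                               std::vector<float3> &positions,
                               const float3 &offset)
{
  for (const Node &node : nodes) {
    deform_node(node, positions, offset);
  }
  for (Node &node : nodes) {
    update_node_bounds(node, positions);
  }
}

// After: recompute each node's bounds immediately after deforming it.
void deform_and_update_bounds(std::vector<Node> &nodes,
                              std::vector<float3> &positions,
                              const float3 &offset)
{
  for (Node &node : nodes) {
    deform_node(node, positions, offset);
    update_node_bounds(node, positions);
  }
}
```

With many large nodes, the first version reads every node's positions
twice with all other nodes processed in between. The second version
touches each node's data in one short burst.
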
As part of this change, update tagging has moved entirely into each
brush implementation. This continues the process of making each brush
more independent.
Part of #118145.
Pull Request: https://projects.blender.org/blender/blender/pulls/127536