Make Subsampling 3x3 filter twice faster (on 4K UHD resolution,
Windows/VS2022/Ryzen5950X: 52.7ms -> 28.3ms), by reformulating how it works:
Conceptually Subsampling filter is a box filter: it sums up N source image
pixels, computes their average and outputs the result. Critical thing is,
that should be done in premultiplied space so that colors from fully or
mostly transparent regions do not "override" opaque colors.
Previously, when operating on byte images, the code achieved this by always
working on byte values, doing "progressively smaller" lerp into byte color
result, taking care of premultiplication and again storing the "straight"
alpha for each sample being processed. This meant that for each sample, there
are 3 divisions involved! This also led to some precision loss, since for all
9 samples all the intermediate results would only be stored at byte precision.
Reformulate that by simply accumulating the premultiplied color as a float.
This gets rid of all divisions, except the last step when said float needs to
be written back into a byte color.
The unit test results have a tiny difference, since now it is arguably better
(as per above, previously it was having some precision loss).
Pull Request: https://projects.blender.org/blender/blender/pulls/117125