Speed up the Color Balance VSE strip modifier in two ways:
- Generally, use a much lower-overhead parallel_for, with a smaller
grain size (32 image rows instead of the 64 used before). This is
what makes the "float" variant faster.
- For the "byte" variant, build a precalculated lookup table instead
of doing all the math per pixel. The existing code *almost* did
this, except the lookup table lived in a code path that was never
used. Since all of this operates on premultiplied values, the
lookup table has 1024 entries instead of 256, so that
semitransparent pixels get more precision for the in-between
values. This LUT accounts for the main speedup of the "byte"
variant.
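
A rough sketch of the LUT idea for the "byte" variant, not the actual
code from the PR. All names here (LUT_SIZE, BalanceParams, color_balance,
make_lut, apply_lut_to_pixel) are hypothetical, the lift/gamma/gain
formula is an assumed stand-in for the real per-channel math, and the
pixel is assumed to be straight-alpha RGBA bytes. Requires C++17 for
std::clamp.

```cpp
#include <algorithm>
#include <array>
#include <cmath>
#include <cstdint>

// Hypothetical names throughout; this is not Blender's actual API.
constexpr int LUT_SIZE = 1024;

struct BalanceParams {
  float lift = 1.0f;
  float gamma = 1.0f;
  float gain = 1.0f;
};

// Assumed stand-in for the per-channel color balance math.
inline float color_balance(float v, const BalanceParams &p)
{
  float x = ((v - 1.0f) * p.lift + 1.0f) * p.gain;
  x = std::max(x, 0.0f);
  return std::pow(x, p.gamma);
}

// Evaluate the curve once per LUT entry instead of once per pixel channel.
std::array<float, LUT_SIZE> make_lut(const BalanceParams &p)
{
  std::array<float, LUT_SIZE> lut;
  for (int i = 0; i < LUT_SIZE; i++) {
    lut[i] = color_balance(float(i) / float(LUT_SIZE - 1), p);
  }
  return lut;
}

// Apply the LUT to one straight-alpha RGBA byte pixel, in place.
void apply_lut_to_pixel(uint8_t px[4], const std::array<float, LUT_SIZE> &lut)
{
  const float alpha = px[3] / 255.0f;
  if (alpha == 0.0f) {
    return; // Fully transparent: nothing to balance.
  }
  for (int c = 0; c < 3; c++) {
    // The premultiplied value can fall between the 256 byte levels,
    // which is why the table is indexed with 1024 steps.
    const float premul = (px[c] / 255.0f) * alpha;
    const int index = std::clamp(int(premul * (LUT_SIZE - 1) + 0.5f), 0, LUT_SIZE - 1);
    // Convert back to straight alpha before storing as a byte.
    const float straight = std::clamp(lut[index] / alpha, 0.0f, 1.0f);
    px[c] = uint8_t(straight * 255.0f + 0.5f);
  }
}
```

With a 256-entry table, a premultiplied value such as color * alpha
would snap to the nearest byte level before the curve is applied; the
1024-entry table keeps roughly two extra bits for those in-between
values.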
Calculating Color Balance at 4K resolution, times in milliseconds:
- PC (Ryzen 5950X), PNG (byte): 22.2 -> 2.9 ms, EXR (float): 20.1 -> 15.2 ms
- Mac (M1 Max), PNG (byte): 28.9 -> 7.5 ms, EXR (float): 21.8 -> 8.5 ms
More timing details are in the PR.
Pull Request: https://projects.blender.org/blender/blender/pulls/127121