Both were largely or completely single-threaded.
They are used in various places. Measured in the VSE compositor
modifier branch (!139634), applying a default "do nothing" compositor
modifier to a 1080p image (on Ryzen 5950X):
51.4ms -> 12.2ms
Details about IMB_byte_from_float:
- No longer allocate a full new float buffer; instead do all work in
a small job-local buffer (32KB, half of a typical L1 cache).
- The previous code did un-premultiply + OCIO + premultiply
+ un-premultiply again. That is pointless; just do
un-premultiply once.
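
The chunked approach above can be sketched roughly as follows. This is a
minimal illustration, not the actual Blender code: `byte_from_float_range`,
`color_transform_stub` and `kChunkPixels` are hypothetical names, and the
color transform is an identity stand-in for the OCIO step. The assumption is
that each threaded job would call the function on its own pixel range.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>

// Hypothetical chunk size: 32KB worth of floats, i.e. 2048 RGBA pixels.
constexpr std::size_t kChunkPixels = 32 * 1024 / (4 * sizeof(float));

// Stand-in for the color space transform (assumption: the real code goes
// through OCIO). Identity here so that the sketch stays self-contained.
inline void color_transform_stub(float * /*rgba*/, std::size_t /*pixel_count*/) {}

inline std::uint8_t unit_float_to_byte(float v)
{
  v = std::min(std::max(v, 0.0f), 1.0f);
  return static_cast<std::uint8_t>(std::lround(v * 255.0f));
}

// Convert premultiplied float RGBA pixels in [begin, end) to straight-alpha
// byte RGBA, going through a small scratch buffer instead of a full-size one.
void byte_from_float_range(const float *src,
                           std::uint8_t *dst,
                           std::size_t begin,
                           std::size_t end)
{
  assert(begin <= end);
  std::array<float, kChunkPixels * 4> scratch; // Job-local, lives on the stack.

  for (std::size_t start = begin; start < end; start += kChunkPixels) {
    const std::size_t count = std::min(kChunkPixels, end - start);
    std::copy_n(src + start * 4, count * 4, scratch.begin());

    // Un-premultiply exactly once, before the color transform.
    for (std::size_t i = 0; i < count; i++) {
      float *p = &scratch[i * 4];
      const float alpha = p[3];
      if (alpha != 0.0f && alpha != 1.0f) {
        const float inv = 1.0f / alpha;
        p[0] *= inv;
        p[1] *= inv;
        p[2] *= inv;
      }
    }

    color_transform_stub(scratch.data(), count);

    // Quantize the chunk straight into the destination byte buffer.
    for (std::size_t i = 0; i < count * 4; i++) {
      dst[start * 4 + i] = unit_float_to_byte(scratch[i]);
    }
  }
}
```

Since the scratch buffer is reused for every chunk, the working set of a job
stays at 32KB regardless of image size.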
Details about IMB_float_from_byte / IMB_float_from_byte_ex:
- Remove incorrect code around "allocate float buffer outside of image
buffer" since it was not actually true to begin with.
- Inside the threaded part, do color space conversion and premultiply
in one pass per scanline, so that data stays in CPU caches more.
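
The per-scanline fusion can be sketched as below. This is only an
illustration under assumptions, not the actual Blender code:
`float_from_byte_scanline` and `float_from_byte_rows` are hypothetical names,
and the sRGB decode is a stand-in for whatever color space conversion the
image needs. The assumption is that each threaded job gets a range of rows.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>

// Stand-in color space conversion (assumption): sRGB decode to linear.
inline float srgb_to_linear(float c)
{
  return (c <= 0.04045f) ? c / 12.92f : std::pow((c + 0.055f) / 1.055f, 2.4f);
}

// Convert one scanline of straight-alpha byte RGBA to premultiplied linear
// float RGBA. Conversion and premultiply happen while the pixel is hot in
// cache, instead of in two separate passes over the whole image.
void float_from_byte_scanline(const std::uint8_t *src, float *dst, std::size_t width)
{
  for (std::size_t x = 0; x < width; x++) {
    const std::uint8_t *in = src + x * 4;
    float *out = dst + x * 4;
    const float alpha = in[3] / 255.0f;
    for (int c = 0; c < 3; c++) {
      out[c] = srgb_to_linear(in[c] / 255.0f) * alpha;
    }
    out[3] = alpha;
  }
}

// Work done by one job: process rows [y_begin, y_end) of the image.
void float_from_byte_rows(const std::uint8_t *src,
                          float *dst,
                          std::size_t width,
                          std::size_t y_begin,
                          std::size_t y_end)
{
  assert(y_begin <= y_end);
  for (std::size_t y = y_begin; y < y_end; y++) {
    float_from_byte_scanline(src + y * width * 4, dst + y * width * 4, width);
  }
}
```

A scheduler would then split the image height into row ranges and call
`float_from_byte_rows` once per range.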
Pull Request: https://projects.blender.org/blender/blender/pulls/145716