After doing regular movie frame decoding, there's a "postprocess" step for
each incoming frame, that does deinterlacing if needed, then YUV->RGB
conversion, then vertical image flip and additional interlace filtering if
needed. While this postprocess step is not the "heavy" part of movie
playback, it still takes 2-3ms per each 1080p resolution input frame that
is being played.
This PR does two things:
- Similar to #116008, uses multi-threaded `sws_scale` to do YUV->RGB
conversion.
- Reintroduces "do vertical flip while converting to RGB", where possible.
That was removed in 2ed73fc97e due to issues on arm64 platform, and
theory that negative strides passed to sws_scale is not an officially
supported usage.
My take on the last point: negative strides to sws_scale is a fine and
supported usage, just ffmpeg had a bug specifically on arm64 where they
were accidentally not respected. They fixed that for ffmpeg 6.0, and
backported it to all versions back to 3.4.13 -- you would not backport
something to 10 releases unless that was an actual bug fix!
I have tested the glitch_480p.mp4 that was originally attached to the
bug report #94237 back then, and it works fine both on x64 (Windows)
and arm64 (Mac).
Timings, ffmpeg_postprocess cost for a single 1920x1080 resolution movie
strip inside VSE:
- Windows/VS2022 Ryzen 5950X: 3.04ms -> 1.18ms
- Mac/clang15 M1 Max: 1.10ms -> 0.71ms
Pull Request: https://projects.blender.org/blender/blender/pulls/116309