Wipe effect was completely single threaded. So multi-thread that.
Additionally, simplify math done in Clock wipe code; asin() + hypot()
+ some checks is just atan2() really.
Applying Wipe to 4K UHD sequencer output, on Windows Ryzen 5950X:
- Single/Double: 99.1 -> 9.3 ms (10.6x faster)
- Iris: 153.3 -> 12.3 ms (12.4x faster)
- Clock: 301.4 -> 14.5 ms (20.8x faster)
The same on Mac M1 Max:
- Single: 74.5 -> 13.4 ms (5.6x faster)
- Iris: 84.2 -> 14.3 ms (5.9x faster)
- Clock: 185.3 -> 18.8 ms (9.8x faster)
Pull Request: https://projects.blender.org/blender/blender/pulls/115837