Its unlikely you want to do short -> int, int -> float etc, conversion during swapping (if its needed we could have a non type checking macro).
Double that the optimized assembler outbut using SWAP() remains unchanged from before.
This exposed quite a few places where redundant type conversion was going on.
Also remove curve.c's swapdata() and replace its use with swap_v3_v3()