Detected when testing mr_elephant on an Intel HD520. When the velocity buffer was copied with the copy shader, the number of dispatched workgroups could exceed the maximum the device supports. This PR fixes this by having each thread copy multiple vertices when the dispatch size cannot cover all the vertices.

Pull Request: https://projects.blender.org/blender/blender/pulls/120915
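The fix clamps the dispatch to the device limit and lets each thread handle more than one vertex. One common way to do this is a grid-stride loop: invocation `i` copies vertices `i`, `i + N`, `i + 2N`, and so on, where `N` is the total number of invocations in the dispatch. The Python sketch below emulates that loop serially. It is not the Blender shader. The function names, `local_size` and `max_groups` are hypothetical, and the PR text does not say whether it uses strided or contiguous per-thread ranges.

```python
import math


def dispatch_size(vertex_count: int, local_size: int, max_groups: int) -> int:
    """Workgroups to dispatch, clamped to a (hypothetical) device limit."""
    return min(math.ceil(vertex_count / local_size), max_groups)


def copy_vertices(src: list, dst: list, local_size: int, max_groups: int) -> int:
    """Emulate a compute dispatch that copies src into dst.

    Returns the number of workgroups that would be dispatched.
    """
    groups = dispatch_size(len(src), local_size, max_groups)
    total_invocations = groups * local_size
    # Run every invocation of the dispatch one after another.
    for global_id in range(total_invocations):
        # Grid-stride loop: when the clamped dispatch has fewer invocations
        # than vertices, each invocation copies several vertices.
        for v in range(global_id, len(src), total_invocations):
            dst[v] = src[v]
    return groups


if __name__ == "__main__":
    src = list(range(10_000))
    dst = [None] * len(src)
    # 10000 / 64 would need 157 groups, but the assumed limit is 16.
    groups = copy_vertices(src, dst, local_size=64, max_groups=16)
    print(groups, dst == src)
```

With the limit in place, 16 groups of 64 invocations still copy all 10,000 vertices, because each invocation loops over roughly 10 of them. A strided layout keeps neighboring invocations on neighboring vertices, which usually helps memory coalescing on GPUs.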