Since my temporary-buffer commit (about a month ago), the OpenCL device had been zeroing the wrong buffer. This produced completely wrong filtered feature passes and therefore significantly lower-quality results than on the CPU and CUDA devices.