This patch implements the Fast Gaussian blur mode for the Realtime
Compositor. This is a faster but less accurate implementation of
Gaussian blur.
This is implemented as a recursive Gaussian blur algorithm based on the
general method outlined in the following paper:
Hale, Dave. "Recursive gaussian filters." CWP-546 (2006).
In particular, based on the table in Section 5 Conclusion, for very low
radius blur, we use a direct separable Gaussian convolution. For medium
blur radius, we use the fourth order IIR Deriche filter based on the
following paper:
Deriche, Rachid. Recursively implementating the Gaussian and its
derivatives. Diss. INRIA, 1993.
For high radius blur, we use the fourth order IIR Van Vliet filter based
on the following paper:
Van Vliet, Lucas J., Ian T. Young, and Piet W. Verbeek. "Recursive
Gaussian derivative filters." Proceedings. Fourteenth International
Conference on Pattern Recognition (Cat. No. 98EX170). Vol. 1. IEEE,
1998.
That's because direct convolution is faster and more accurate for very
low radius, while the Deriche filter is more accurate for medium blur
radius, while Van Vliet is more accurate for high blur radius. The
criteria suggested by the paper is a sigma value threshold of 3 and 32
for the Deriche and Van Vliet filters respectively, which we apply on
the larger of the two dimensions.
Both the Deriche and Van Vliet filters are numerically unstable for high
blur radius. So we decompose the Van Vliet filter into a parallel bank
of smaller second order filters based on the method of partial fractions
discussed in the book:
Oppenheim, Alan V. Discrete-time signal processing. Pearson Education
India, 1999.
We leave the Deriche filter as is since it is only used for low radii
anyways.
Compared to the CPU implementation, this implementation is more
accurate, but less numerically stable, since CPU uses doubles, which is
not feasible for the GPU.
The only change of behavior between CPU and this implementation is that
this implementation uses the same radius, so Fast Gaussian will match
normal Gaussian, while the CPU implementation has a radius that is 1.5x
the size of normal Gaussian. A patch to change the CPU behavior #121211.
Pull Request: https://projects.blender.org/blender/blender/pulls/120431