The idea is simply to pre-compute fitting and parameterization
in the bssrdf_setup() function and re-use the values in both
sample() and eval().
The only trick is where to store the pre-calculated values and
the answer is inside of ShaderClosure->custom{1,2,3}. There's
no memory bump here because we now simply re-use padding fields
for the pre-calculated values. Similar trick we can do for other
BSDFs.
Seems to give nice speedup up to 7% here on my desktop with
Core i7 CPU, SSE4.1 kernel.