Turns out, using the wrong output buffer (the global one instead of the per-thread one) puts the data into the wrong buffer! Fixes T102672