This happened because warm_shader_specialization was not called for the specialization actually used: samples_len was only updated right before accumulation. Fixed by calling update_sample_table before the warm shader call. Also avoid having the default filter request the 4 sample specialization. This avoids the stall altogether.
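
The ordering issue can be illustrated with a small standalone sketch. This is a toy model, not the actual code: FilmSketch, its members, and the two stalls_with_* helpers are hypothetical, and the function bodies only mimic the roles that update_sample_table and warm_shader_specialization play in the description above.

```cpp
#include <set>

// Hypothetical stand-in for the pass that owns samples_len.
// Nothing here mirrors the real implementation.
struct FilmSketch {
  int samples_len = 1;   // Arbitrary value left over from an earlier state.
  std::set<int> warmed;  // Specializations compiled ahead of time.
  int stalls = 0;        // Blocking compilations hit at accumulation time.

  // Plays the role of update_sample_table: refresh the sample count.
  void update_sample_table(int new_len) { samples_len = new_len; }

  // Plays the role of warm_shader_specialization: precompile the
  // variant matching the *current* samples_len.
  void warm_shader_specialization() { warmed.insert(samples_len); }

  // Accumulation stalls if the variant it needs was never warmed.
  void accumulate() {
    if (warmed.count(samples_len) == 0) {
      stalls++;
      warmed.insert(samples_len);
    }
  }
};

// Old order: warm first, then update, so the stale variant is warmed.
int stalls_with_old_order(int len) {
  FilmSketch film;
  film.warm_shader_specialization();
  film.update_sample_table(len);
  film.accumulate();
  return film.stalls;
}

// New order: update first, so the warmed variant is the one used.
int stalls_with_new_order(int len) {
  FilmSketch film;
  film.update_sample_table(len);
  film.warm_shader_specialization();
  film.accumulate();
  return film.stalls;
}
```

In this model the old order stalls whenever the updated sample count differs from the stale one, while the new order never does.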