The problem was that textures were assigned to different slots on different draw calls, which caused the driver to specialize (patch) the shader for each assignment. As a result, the shader was recompiled again and again until every slot assignment in use had been seen.
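One way to avoid this kind of churn is to keep the sampler-to-slot assignment stable, so every draw call presents the driver with the same layout. The following is only a minimal sketch of that idea, not the code from the original project: `StableSlotMap`, `slotFor`, and the sampler names are hypothetical, and the actual binding calls of the graphics API are left out on purpose.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>

// Hypothetical helper: gives every sampler name one fixed texture slot,
// so each draw call binds that sampler to the same unit.
class StableSlotMap {
public:
    explicit StableSlotMap(std::size_t maxSlots) : maxSlots_(maxSlots) {}

    // Returns the slot already assigned to `sampler`, or assigns the
    // next free slot the first time the name is seen.
    std::size_t slotFor(const std::string& sampler) {
        auto it = slots_.find(sampler);
        if (it != slots_.end()) {
            return it->second;
        }
        assert(slots_.size() < maxSlots_ && "out of texture slots");
        const std::size_t slot = slots_.size();
        slots_.emplace(sampler, slot);
        return slot;
    }

    // Number of slots handed out so far.
    std::size_t used() const { return slots_.size(); }

private:
    std::size_t maxSlots_;
    std::unordered_map<std::string, std::size_t> slots_;
};
```

In a renderer, the value returned by `slotFor("albedo")` would be used both when setting the sampler uniform and when binding the texture (in OpenGL, for example, via `glUniform1i` and `glActiveTexture(GL_TEXTURE0 + slot)`), so the assignment never changes between draw calls.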