The sorted_paths_array kernel performs poorly on Arc B570. Relying on local sorting and partitioning instead gives a 25% overall rendering speedup on the Agent 327 Barbershop scene, with no regression in shade_surface. On Arc A770, it still gives a 2% speedup when rendering Barbershop.

Pull Request: https://projects.blender.org/blender/blender/pulls/140308