Copies between a device and non-USM host memory have a large overhead. Using the prepare/release API avoids it, as described in the oneAPI GPU optimization guide:
https://www.intel.com/content/www/us/en/docs/oneapi/optimization-guide-gpu/2025-0/optimizing-data-transfers.html

This currently translates to an overall rendering speedup of 4-5% on my Arc B580 in most scenes.

Pull Request: https://projects.blender.org/blender/blender/pulls/132859