test2

Author	SHA1	Message	Date
Christoph Neuhauser	72f098248d	Cycles: Add Vulkan/oneAPI graphics interop This PR adds Vulkan/oneAPI graphics interop to Cycles. Just like for CUDA and HIP interop, persistent memory mapping is used, as there could potentially be some overhead of continuously mapping/unmapping buffers. Pull Request: https://projects.blender.org/blender/blender/pulls/144442	2025-10-06 18:16:56 +02:00
Brecht Van Lommel	2615cecf10	Refactor: Cycles: Align log levels with CLOG. WORK -> DEBUG; DEBUG, STATS -> TRACE. Pull Request: https://projects.blender.org/blender/blender/pulls/144490	2025-08-18 20:22:44 +02:00
Brecht Van Lommel	73fe848e07	Fix: Cycles log levels conflict with macros on some platforms In particular DEBUG, but prefix all of them to be sure. Pull Request: https://projects.blender.org/blender/blender/pulls/141749	2025-07-10 19:44:14 +02:00
Brecht Van Lommel	fb4e3c8167	Refactor: Cycles: Remove distinction between severity and verbosity Only use LOG() and LOG_IS_ON() macros, no more VLOG_. Pull Request: https://projects.blender.org/blender/blender/pulls/140244	2025-07-09 20:59:24 +02:00
Xavier Hallade	2df163a648	Fix: Cycles low performance with scenes with many shaders on Arc B570 The performance of the sorted_paths_array kernel on B570 is problematic. Relying on local sorting+partitioning instead gives a 25% overall rendering speedup and no regression in shade_surface when rendering Agent 327 Barbershop scene. On Arc A770, it still gives a 2% speedup when rendering Barbershop. Pull Request: https://projects.blender.org/blender/blender/pulls/140308	2025-06-18 08:21:19 +02:00
Brecht Van Lommel	c87a269021	Fix #133953 : Cycles oneAPI texture randomly renders black * Do oneAPI copy optimization as part of host memory alloc and free, so it is properly released before host memory is freed. * Synchronize after loading texture info, like CUDA and HIP. https://projects.blender.org/blender/blender/pulls/134412	2025-02-13 19:58:56 +01:00
Brecht Van Lommel	cd3d3b2646	Refactor: Cycles: Delay load_texture_info() to enqueue Doing it immediately after moving textures to the host is less efficient, and interacts in confusing ways. Pull Request: https://projects.blender.org/blender/blender/pulls/132912	2025-01-29 14:12:06 +01:00
Brecht Van Lommel	9971648783	Refactor: Cycles: Replace new/delete by unique_ptr, in simple cases Pull Request: https://projects.blender.org/blender/blender/pulls/132361	2025-01-03 10:23:30 +01:00
Brecht Van Lommel	dd51c8660b	Refactor: Cycles: Add const keyword where possible, using clang-tidy Check was misc-const-correctness, combined with readability-isolate-declaration as suggested by the docs. Temporarily clang-format "QualifierAlignment: Left" was used to get consistency with the prevailing order of keywords. Pull Request: https://projects.blender.org/blender/blender/pulls/132361	2025-01-03 10:23:20 +01:00
Brecht Van Lommel	d0c2e68e5f	Refactor: Cycles: Automated clang-tidy fixups in Cycles * Use .empty() and .data() * Use nullptr instead of 0 * No else after return * Simple class member initialization * Add override for virtual methods * Include C++ instead of C headers * Remove some unused includes * Use default constructors * Always use braces * Consistent names in definition and declaration * Change typedef to using Pull Request: https://projects.blender.org/blender/blender/pulls/132361	2025-01-03 10:22:55 +01:00
Xavier Hallade	cee4ad4518	Refactor: Cycles: oneAPI: Simplify num_concurrent_states() Deduplicated code by reusing num_concurrent_busy_states().	2024-07-18 15:46:17 +02:00
Xavier Hallade	c8421a0007	Cycles: set num_sort_partition_elements to 65536 for simd16+ Intel GPUs. Intel(R) Data Center GPU Max greatly benefits from this change since its bigger simd width leads to a greater execution divergence.	2024-07-18 15:15:00 +02:00
Xavier Hallade	4d4f8bbfe4	Cycles: set num_sort_partition_elements to 8192 for oneAPI The default value of 65536 wasn't optimal on Intel GPUs, switching to 8192 gives a 0 to 15% performance improvement depending on the scenes.	2024-01-31 17:25:34 +01:00
Nikita Sirgienko	abab47a805	Cycles: oneAPI: Refactoring of local size choice logic	2023-08-22 19:04:16 +02:00
Campbell Barton	c12994612b	License headers: use SPDX-FileCopyrightText in intern/cycles	2023-06-14 16:53:23 +10:00
Nikita Sirgienko	bafd82c9c1	Cycles: oneAPI: use local memory for faster shader sorting Co-authored-by: Stefan Werner <stefan.werner@intel.com> Pull Request: https://projects.blender.org/blender/blender/pulls/107994	2023-05-17 11:07:57 +02:00
Michael Jones	8dd7b5b26b	Cycles: Metal integrator state size tuning This patch tunes the integrator state sizing for Metal (`num_concurrent_states` and `num_concurrent_busy_states`). On all GPU architectures, we adjust the busy:total states ratio to be 1:4, which gives better rendering performance than the previous 1:16 ratio (independent of total state count). This gives a small performance uplift (e.g. 2-3% on M1 Ultra). Additionally, for M2 architectures, we double the overall state size if there is available headroom. Inclusive of the first change, we can expect an uplift of close to 10% in future, as this results in larger dispatch sizes and minimises work submission overheads. In order to make an accurate determination of available headroom, we defer the calculation of `num_concurrent_states` and `num_concurrent_busy_states` until the time of integrator state allocation (i.e. after all of the scene data has been allocated). We also refactor `alloc_integrator_soa` to calculate an exact single-state-size in a first pass, right before allocating the integrator SoA buffers in a second pass. Reviewed By: brecht Differential Revision: https://developer.blender.org/D16313	2022-10-24 17:14:33 +01:00
Xavier Hallade	7eeeaec6da	Cycles: use direct linking for oneAPI backend This is a minimal set of changes, allowing a lot of cleanup that can happen afterward as it allows sycl method and objects to be used outside of kernel.cpp. Reviewed By: brecht, sergey Differential Revision: https://developer.blender.org/D15397	2022-10-07 09:50:05 +02:00
Nikita Sirgienko	2ead05d738	Cycles: Add optional per-kernel performance statistics When verbose level 4 is enabled, Blender prints kernel performance data for Cycles on GPU backends (except Metal, which doesn't use debug_enqueue_* methods) for groups of kernels. These changes introduce a new CYCLES_DEBUG_PER_KERNEL_PERFORMANCE environment variable to allow getting timings for each kernel separately and not grouped with others. This is done by adding explicit synchronization after each kernel execution. Differential Revision: https://developer.blender.org/D15971	2022-09-27 22:15:00 +02:00
Xavier Hallade	d706d0460c	Cycles oneAPI: simplify num_concurrent_states selection The number of Execution Units and resident "threads" (simd width * threads per EUs) are now exposed and used to select the number of states using a simplified heuristic.	2022-07-27 09:45:33 +02:00
Xavier Hallade	a02992f131	Cycles: Add support for rendering on Intel GPUs using oneAPI This patch adds a new Cycles device with similar functionality to the existing GPU devices. Kernel compilation and runtime interaction happen via oneAPI DPC++ compiler and SYCL API. This implementation is primarily focusing on Intel® Arc™ GPUs and other future Intel GPUs. The first supported drivers are 101.1660 on Windows and 22.10.22597 on Linux. The necessary tools for compilation are: - A SYCL compiler such as oneAPI DPC++ compiler or https://github.com/intel/llvm - Intel® oneAPI Level Zero, which is used for low level device queries: https://github.com/oneapi-src/level-zero - To optionally generate prebuilt graphics binaries: Intel® Graphics Compiler All are included in Linux precompiled libraries on svn: https://svn.blender.org/svnroot/bf-blender/trunk/lib The same goes for Windows precompiled binaries, except for the graphics compiler, available as "Intel® Graphics Offline Compiler for OpenCL™ Code" from https://www.intel.com/content/www/us/en/developer/articles/tool/oneapi-standalone-components.html, whose path can be set as OCLOC_INSTALL_DIR. Being based on the open SYCL standard, this implementation could also be extended to run on other compatible non-Intel hardware in the future. Reviewed By: sergey, brecht Differential Revision: https://developer.blender.org/D15254 Co-authored-by: Nikita Sirgienko <nikita.sirgienko@intel.com> Co-authored-by: Stefan Werner <stefan.werner@intel.com>	2022-06-29 12:58:04 +02:00

21 Commits