test2

Author	SHA1	Message	Date
Brecht Van Lommel	ecd54ba4e4	Cycles: Metal graphics interop This is trivial with unified memory, and avoids one memory copy. Pull Request: https://projects.blender.org/blender/blender/pulls/137363	2025-04-28 11:38:56 +02:00
Michael Jones	c23c4ae6ba	Cycles: Fix issue affecting Metal kernel profiling (normally disabled) This issue only affects profiling mode (`CYCLES_METAL_PROFILING=1`). There's a modest limit to the number of concurrent counter sampling buffers per device, so instead of creating one per device queue, we create one per device that can be reused by successive device queues. Authored by Emma Liu. Pull Request: https://projects.blender.org/blender/blender/pulls/136248	2025-03-21 12:47:15 +01:00
Michael Jones	584f19a5af	Cycles: Apple Silicon tidy: Remove non-UMA codepaths (v2) This PR removes a bunch of dead code following #123551 (removal of AMD and Intel GPU support). It is safe to assume that UMA will be available, so a lot of codepaths that dealt with copying between CPU and GPU are now just clutter. Pull Request: https://projects.blender.org/blender/blender/pulls/136146	2025-03-19 12:53:01 +01:00
Brecht Van Lommel	ab3204e251	Revert "Cycles: Apple Silicon tidy: Remove non-UMA codepaths" This reverts commit `1a93dfe4fc`. This is hitting asserts in the tests, revert until it's fixed. Ref #136117	2025-03-18 20:37:23 +01:00
Michael Jones	1a93dfe4fc	Cycles: Apple Silicon tidy: Remove non-UMA codepaths This PR removes a bunch of dead code following #123551 (removal of AMD and Intel GPU support). It is safe to assume that UMA will be available, so a lot of codepaths that dealt with copying between CPU and GPU are now just clutter. Pull Request: https://projects.blender.org/blender/blender/pulls/136117	2025-03-18 19:09:25 +01:00
Brecht Van Lommel	dd51c8660b	Refactor: Cycles: Add const keyword where possible, using clang-tidy Check was misc-const-correctness, combined with readability-isolate-declaration as suggested by the docs. Temporarily clang-format "QualifierAlignment: Left" was used to get consistency with the prevailing order of keywords. Pull Request: https://projects.blender.org/blender/blender/pulls/132361	2025-01-03 10:23:20 +01:00
Brecht Van Lommel	d0c2e68e5f	Refactor: Cycles: Automated clang-tidy fixups in Cycles * Use .empty() and .data() * Use nullptr instead of 0 * No else after return * Simple class member initialization * Add override for virtual methods * Include C++ instead of C headers * Remove some unused includes * Use default constructors * Always use braces * Consistent names in definition and declaration * Change typedef to using Pull Request: https://projects.blender.org/blender/blender/pulls/132361	2025-01-03 10:22:55 +01:00
Michael Jones	5a29be3c75	Cycles: Fix #116243 , #122022 - MetalRT live viewport stability issues This PR fixes live viewport stability issues on Mac when MetalRT is enabled. There were two sources of instability: 1) `MTLAccelerationStructure` instances were not being correctly retained meaning that use-after-free crashes could occur following a geometry sync. 2) `MTLIntersectionFunctionTable` objects could be unsafely shared between multiple `MetalDeviceQueue` instances (in this case, `setBuffer` being the unsafe mutation) The solution to 2 involves creating a new `MetalDispatchPipeline` type which is strictly used by only 1 `MetalDeviceQueue` instance. Pull Request: https://projects.blender.org/blender/blender/pulls/124055	2024-07-08 16:18:34 +02:00
Michael Jones	9b833fdeba	Cycles: Use more accurate GPU counter timestamps for profiling in Metal This PR replaces the existing CPU wall-clock based profiling mechanism with more precise GPU counter based timestamps. As before, it is enabled by setting the env var `CYCLES_METAL_PROFILING=1`. Original implementation by Morteza Mostajabodaveh. Pull Request: https://projects.blender.org/blender/blender/pulls/121208	2024-04-29 15:25:32 +02:00
Stefan Werner	31d55e87f9	Cycles: Metal support for OpenImageDenoise This is supported on Apple Silicon GPUs and macOS 13.0+. Co-authored-by: Stefan Werner <stefan.werner@intel.com> Co-authored-by: Attila Afra <attila.t.afra@intel.com> Pull Request: https://projects.blender.org/blender/blender/pulls/116124	2024-02-06 21:13:23 +01:00
Campbell Barton	c12994612b	License headers: use SPDX-FileCopyrightText in intern/cycles	2023-06-14 16:53:23 +10:00
Michael Jones	654e1e901b	Cycles: Use local atomics for faster shader sorting (enabled on Metal) This patch adds two new kernels: SORT_BUCKET_PASS and SORT_WRITE_PASS. These replace PREFIX_SUM and SORTED_PATHS_ARRAY on supported devices (currently implemented on Metal, but will be trivial to enable on the other backends). The new kernels exploit sort partitioning (see D15331) by sorting each partition separately using local atomics. This can give an overall render speedup of 2-3% depending on architecture. As before, we fall back to the original non-partitioned sorting when the shader count is "too high". Reviewed By: brecht Differential Revision: https://developer.blender.org/D16909	2023-02-06 11:18:26 +00:00
Michael Jones	8dd7b5b26b	Cycles: Metal integrator state size tuning This patch tunes the integrator state sizing for Metal (`num_concurrent_states` and `num_concurrent_busy_states`). On all GPUs architecture, we adjust the busy:total states ratio to be 1:4 which gives better rendering performance than the previous 1:16 ratio (independent of total state count). This gives a small performance uplift (e.g. 2-3% on M1 Ultra). Additionally for M2 architectures, we double the overall state size if there is available headroom. Inclusive of the first change, we can expect uplift of close to 10% in future, as this results in larger dispatch sizes and minimises work submission overheads. In order to make an accurate determination of available headroom, we defer the calculation of `num_concurrent_states` and `num_concurrent_busy_states` until the time of integrator state allocation (i.e. after all of the scene data has been allocated). We also refactor `alloc_integrator_soa` to calculate an exact single-state-size in a first pass, right before allocating the integrator SoA buffers in a second pass. Reviewed By: brecht Differential Revision: https://developer.blender.org/D16313	2022-10-24 17:14:33 +01:00
Brecht Van Lommel	523bbf7065	Cycles: generalize shader sorting / locality heuristic to all GPU devices This was added for Metal, but also gives good results with CUDA and OptiX. Also enable it for future Apple GPUs instead of only M1 and M2, since this has been shown to help across multiple GPUs so the better bet seems to enable rather than disable it. Also moves some of the logic outside of the Metal device code, and always enables the code in the kernel since other devices don't do dynamic compile. Time per sample with OptiX + RTX A6000: new old barbershop_interior 0.0730s 0.0727s bmw27 0.0047s 0.0053s classroom 0.0428s 0.0464s fishy_cat 0.0102s 0.0108s junkshop 0.0366s 0.0395s koro 0.0567s 0.0578s monster 0.0206s 0.0223s pabellon 0.0158s 0.0174s sponza 0.0088s 0.0100s spring 0.1267s 0.1280s victor 0.0524s 0.0531s wdas_cloud 0.0817s 0.0816s Ref D15331, T87836	2022-07-15 13:42:47 +02:00
Michael Jones	4b1d315017	Cycles: Improve cache usage on Apple GPUs by chunking active indices This patch partitions the active indices into chunks prior to sorting by material in order to tradeoff some material coherence for better locality. On Apple Silicon GPUs (particularly higher end M1-family GPUs), we observe overall render time speedups of up to 15%. The partitioning is implemented by repeating the range of `shader_sort_key` for each partition, and encoding a "locator" key which distributes the indices into sorted chunks. Reviewed By: brecht Differential Revision: https://developer.blender.org/D15331	2022-07-14 14:26:18 +01:00
Michael Jones	19e0b60f3e	Cycles: MetalDeviceQueue - capture of multiple dispatches, and some tidying This patch adds a new mode of gpu capture (env var `CYCLES_DEBUG_METAL_CAPTURE_SAMPLES`) to capture a block of dispatches between "reset" calls. It also fixes member data naming inconsistencies and adds some missing OS version checks. Screenshot showing .gputrace capture in Xcode 14.0 beta (using `CYCLES_DEBUG_METAL_CAPTURE_SAMPLES="1"` and `CYCLES_DEBUG_METAL_CAPTURE_LIMIT="10"`): {F13155703} Reviewed By: sergey, brecht Differential Revision: https://developer.blender.org/D15179	2022-06-13 13:42:07 +01:00
Michael Jones	4412e14708	Cycles: Useful Metal backend debug & profiling functionality This patch adds some useful debugging & profiling env vars to the Metal backend: - `CYCLES_METAL_PROFILING`: output a per-kernel timing report at the end of the render - `CYCLES_METAL_DEBUG`: enable per-dispatch tracing (very verbose) - `CYCLES_DEBUG_METAL_CAPTURE_KERNEL`: enable programatic .gputrace capture for a specified kernel index Here's an example of the timing report with `CYCLES_METAL_PROFILING` enabled: ``` --------------------------------------------------------------------------------------------------- Kernel name Total threads Dispatches Avg. T/D Time Time% --------------------------------------------------------------------------------------------------- integrator_init_from_camera 657,407,232 161 4,083,274 0.24s 0.51% integrator_intersect_closest 1,629,288,440 681 2,392,494 15.18s 32.12% integrator_intersect_shadow 751,652,291 470 1,599,260 5.80s 12.28% integrator_shade_background 304,612,074 263 1,158,220 1.16s 2.45% integrator_shade_surface 1,159,764,041 676 1,715,627 20.57s 43.52% integrator_shade_shadow 598,885,847 418 1,432,741 1.27s 2.69% integrator_queued_paths_array 2,969,650,130 805 3,689,006 0.35s 0.74% integrator_queued_shadow_paths_array 593,936,619 379 1,567,115 0.14s 0.29% integrator_terminated_paths_array 22,205,417 155 143,260 0.05s 0.10% integrator_sorted_paths_array 2,517,140,043 676 3,723,579 1.65s 3.50% integrator_compact_paths_array 648,912,748 155 4,186,533 0.03s 0.07% integrator_compact_states 20,872,687 155 134,662 0.14s 0.29% integrator_terminated_shadow_paths_array 374,100,675 438 854,111 0.16s 0.33% integrator_compact_shadow_paths_array 503,768,657 438 1,150,156 0.05s 0.10% integrator_compact_shadow_states 37,664,941 202 186,460 0.23s 0.50% integrator_reset 25,165,824 6 4,194,304 0.06s 0.12% film_convert_combined_half_rgba 3,110,400 6 518,400 0.00s 0.01% prefix_sum 676 676 1 0.19s 0.40% --------------------------------------------------------------------------------------------------- 6,760 47.27s 100.00% --------------------------------------------------------------------------------------------------- ``` Reviewed By: brecht Differential Revision: https://developer.blender.org/D15044	2022-06-07 11:08:39 +01:00
Sergey Sharybin	eccc9d8eba	Cleanup: Remove unused function in Cycles queue Noticed while looking into oneAPI patch. Seems to be unused, without clear indication why/when it might be needed. Removing the function simplifies adding the new backend. Differential Revision: https://developer.blender.org/D14652	2022-04-19 10:32:07 +02:00
Brecht Van Lommel	9cfc7967dd	Cycles: use SPDX license headers * Replace license text in headers with SPDX identifiers. * Remove specific license info from outdated readme.txt, instead leave details to the source files. * Add list of SPDX license identifiers used, and corresponding license texts. * Update copyright dates while we're at it. Ref D14069, T95597	2022-02-11 17:47:34 +01:00
Michael Jones	e23b54a59f	Cycles: Fix OS version warnings This patch suppresses OS version warnings and hides currently unsupported Metal GPUs when enumerating devices. Reviewed By: brecht Differential Revision: https://developer.blender.org/D13506	2021-12-08 15:08:12 +00:00
Michael Jones	9558fa5196	Cycles: Metal host-side code This patch adds the Metal host-side code: - Add all core host-side Metal backend files (device_impl, queue, etc) - Add MetalRT BVH setup files - Integrate with Cycles device enumeration code - Revive `path_source_replace_includes` in util/path (required for MSL compilation) This patch also includes a couple of small kernel-side fixes: - Add an implementation of `lgammaf` for Metal [Nemes, Gergő (2010), "New asymptotic expansion for the Gamma function", Archiv der Mathematik](https://users.renyi.hu/~gergonemes/) - include "work_stealing.h" inside the Metal context class because it accesses state now Ref T92212 Reviewed By: brecht Maniphest Tasks: T92212 Differential Revision: https://developer.blender.org/D13423	2021-12-07 15:52:21 +00:00

21 Commits