test2

Author	SHA1	Message	Date
Brecht Van Lommel	ecd54ba4e4	Cycles: Metal graphics interop This is trivial with unified memory, and avoids one memory copy. Pull Request: https://projects.blender.org/blender/blender/pulls/137363	2025-04-28 11:38:56 +02:00
Brecht Van Lommel	f506564a47	Cleanup: Unused argument compiler warning	2025-03-23 21:01:25 +01:00
Michael Jones	c23c4ae6ba	Cycles: Fix issue affecting Metal kernel profiling (normally disabled) This issue only affects profiling mode (`CYCLES_METAL_PROFILING=1`). There's a modest limit to the number of concurrent counter sampling buffers per device, so instead of creating one per device queue, we create one per device that can be reused by successive device queues. Authored by Emma Liu. Pull Request: https://projects.blender.org/blender/blender/pulls/136248	2025-03-21 12:47:15 +01:00
Michael Jones	584f19a5af	Cycles: Apple Silicon tidy: Remove non-UMA codepaths (v2) This PR removes a bunch of dead code following #123551 (removal of AMD and Intel GPU support). It is safe to assume that UMA will be available, so a lot of codepaths that dealt with copying between CPU and GPU are now just clutter. Pull Request: https://projects.blender.org/blender/blender/pulls/136146	2025-03-19 12:53:01 +01:00
Brecht Van Lommel	ab3204e251	Revert "Cycles: Apple Silicon tidy: Remove non-UMA codepaths" This reverts commit `1a93dfe4fc`. This is hitting asserts in the tests, revert until it's fixed. Ref #136117	2025-03-18 20:37:23 +01:00
Michael Jones	1a93dfe4fc	Cycles: Apple Silicon tidy: Remove non-UMA codepaths This PR removes a bunch of dead code following #123551 (removal of AMD and Intel GPU support). It is safe to assume that UMA will be available, so a lot of codepaths that dealt with copying between CPU and GPU are now just clutter. Pull Request: https://projects.blender.org/blender/blender/pulls/136117	2025-03-18 19:09:25 +01:00
Brecht Van Lommel	21a90f26b6	Cleanup: Fix C++20 deprecation warnings in Cycles Pull Request: https://projects.blender.org/blender/blender/pulls/134338	2025-02-11 16:42:03 +01:00
Brecht Van Lommel	dd51c8660b	Refactor: Cycles: Add const keyword where possible, using clang-tidy Check was misc-const-correctness, combined with readability-isolate-declaration as suggested by the docs. Temporarily clang-format "QualifierAlignment: Left" was used to get consistency with the prevailing order of keywords. Pull Request: https://projects.blender.org/blender/blender/pulls/132361	2025-01-03 10:23:20 +01:00
Brecht Van Lommel	d0c2e68e5f	Refactor: Cycles: Automated clang-tidy fixups in Cycles * Use .empty() and .data() * Use nullptr instead of 0 * No else after return * Simple class member initialization * Add override for virtual methods * Include C++ instead of C headers * Remove some unused includes * Use default constructors * Always use braces * Consistent names in definition and declaration * Change typedef to using Pull Request: https://projects.blender.org/blender/blender/pulls/132361	2025-01-03 10:22:55 +01:00
Brecht Van Lommel	3c2a6fbb9c	Refactor: Cycles: Use nullptr instead of NULL Pull Request: https://projects.blender.org/blender/blender/pulls/132361	2025-01-03 10:22:43 +01:00
Jason Fielder	7fbc9e9428	Fix: Metal: Memory leaks identified by Instruments and Xcode memory graph. Running Xcode memory graphs and the Instruments tools revealed memory leaks caused, in the main, by over-retained objects. This removes the unnecessary 'retains' and adds some asserts to guard against over-retaining in the future. There are a few memory leaks remaining involving PyUnicode_DecodeUTF8 but I am unable to identify the cause of these at this time. Authored by Apple: James McCarthy Pull Request: https://projects.blender.org/blender/blender/pulls/129117	2024-11-01 11:56:51 +01:00
Weizhen Huang	4a4270d73c	Merge branch 'blender-v4.2-release'	2024-07-08 16:19:41 +02:00
Michael Jones	5a29be3c75	Cycles: Fix #116243 , #122022 - MetalRT live viewport stability issues This PR fixes live viewport stability issues on Mac when MetalRT is enabled. There were two sources of instability: 1) `MTLAccelerationStructure` instances were not being correctly retained meaning that use-after-free crashes could occur following a geometry sync. 2) `MTLIntersectionFunctionTable` objects could be unsafely shared between multiple `MetalDeviceQueue` instances (in this case, `setBuffer` being the unsafe mutation) The solution to 2 involves creating a new `MetalDispatchPipeline` type which is strictly used by only 1 `MetalDeviceQueue` instance. Pull Request: https://projects.blender.org/blender/blender/pulls/124055	2024-07-08 16:18:34 +02:00
Alaska	c8340cf754	Cycles: Remove AMD and Intel GPU support from Metal backend This is because with the addition of new features to Cycles, these GPUs experienced significant performance regressions and bugs, all stemming from bugs in the Metal GPU driver/compiler. The only reasonable way to work around these issues was to disable parts of Cycles code on these GPUs to avoid the driver/compiler bugs. This resulted in increased development time maintaining these platforms while being unable to deliver feature parity with other GPU backends. It has been decided that this development time is better spent maintaining platforms that are still actively maintained by hardware/software vendors, and so AMD and Intel GPU support will be removed from the Metal backend for Cycles. Pull Request: https://projects.blender.org/blender/blender/pulls/123551	2024-06-26 17:16:20 +02:00
Michael Jones	5508b41a40	Cycles: MetalRT optimisations (scene_intersect_shadow + random_walk) This PR contains optimisations and a general tidy-up of the MetalRT backend. - Currently `scene_intersect` is used for both normal and (opaque) shadow rays, however the usage patterns are different enough to warrant specialisation. Shadow intersection tests (flagged with `PATH_RAY_SHADOW_OPAQUE`) only need a bool result, but need a larger "self" payload in order to exclude hits against target lights. By specialising we can minimise the payload size in each case (which helps performance) and avoid some dynamic branching. This PR introduces a new `scene_intersect_shadow` function which is specialised in Metal, and currently redirects to `scene_intersect` in the other backends. - Currently `scene_intersect_local` is implemented for worst-case payload requirements as demanded by `subsurface_disk` (where `max_hits` is 4). The random_walk case only demands 1 hit result which we can retrieve directly from the intersector object (rather than stashing it in the payload). By specialising, we significantly reduce the payload size for random_walk queries, which has a big impact on performance. Additionally, we only need to use a custom intersection function for the first ray test in a random walk (for self-primitive filtering), so this PR forces faster `opaque` intersection testing for all but the first random walk test. - Currently `scene_intersect_volume` has a lot of redundant code to handle non-triangle primitives despite volumes only being enclosed by trimeshes. This PR removes this code. Additionally, this PR tidies up the convoluted intersection function linking code, removes some redundant intersection handlers, and uses more consistent naming of intersection functions. On an M3 MacBook Pro, these changes give 2-3% performance increase on typical scenes with opaque trimesh materials (e.g. barbershop, classroom, junkshop), but can give over 15% performance increase for certain scenes using random walk SSS (e.g. monster). Pull Request: https://projects.blender.org/blender/blender/pulls/121397	2024-05-10 16:38:02 +02:00
Michael Jones	9b833fdeba	Cycles: Use more accurate GPU counter timestamps for profiling in Metal This PR replaces the existing CPU wall-clock based profiling mechanism with more precise GPU counter based timestamps. As before, it is enabled by setting the env var `CYCLES_METAL_PROFILING=1`. Original implementation by Morteza Mostajabodaveh. Pull Request: https://projects.blender.org/blender/blender/pulls/121208	2024-04-29 15:25:32 +02:00
Brecht Van Lommel	36c11ee482	Fix #118514 : Cycles MetalRT crash with empty scene Pull Request: https://projects.blender.org/blender/blender/pulls/118907	2024-02-29 17:28:13 +01:00
Raul Fernandez	324ff4ddef	macOS: Remove unnecessary checks now that minimum version is macOS 11.2 The minimum macOS version is now 11.2, so we no longer need to check for lower API versions. Pull Request: https://projects.blender.org/blender/blender/pulls/118388	2024-02-16 19:03:23 +01:00
Stefan Werner	31d55e87f9	Cycles: Metal support for OpenImageDenoise This is supported on Apple Silicon GPUs and macOS 13.0+. Co-authored-by: Stefan Werner <stefan.werner@intel.com> Co-authored-by: Attila Afra <attila.t.afra@intel.com> Pull Request: https://projects.blender.org/blender/blender/pulls/116124	2024-02-06 21:13:23 +01:00
Campbell Barton	617f7b76df	Cleanup: comment block formatting	2024-01-08 11:31:43 +11:00
Brecht Van Lommel	d377ef2543	Clang Format: bump to version 17 Along with the 4.1 libraries upgrade, we are bumping the clang-format version from 8-12 to 17. This affects quite a few files. If not already the case, you may consider pointing your IDE to the clang-format binary bundled with the Blender precompiled libraries.	2024-01-03 13:38:14 +01:00
Michael Jones	4e3ee4f026	Cycles: Fix animation hangs/crashes in Metal due to leaking temp objects This PR adds `@autoreleasepool` blocks around functions that have been observed to create hidden temporary NSObjects, and eventually cause command buffer failures. A couple of allocations needed to be tweaked in order to maintain correct retain/release behaviour. This PR also fixes the command buffer error text to show more useful information.	2023-10-24 23:20:16 +01:00
Michael Jones	1c1c6ac457	Cycles: Fix last failing unit test (T39823) on MetalRT This PR fixes T39823, the sole failing unit test when running with MetalRT. It does so by implementing and binding a missing intersection handler (`__anyhit__cycles_metalrt_volume_test_tri`) which is required for `scene_intersect_volume` (as used by `integrator_volume_stack_update_for_subsurface`) to work as intended. This scene exposed the error as it uses subsurface scattering on a sphere which is intersected by volume. Pull Request: https://projects.blender.org/blender/blender/pulls/112876	2023-09-25 22:41:27 +02:00
Michael Jones	6c98cb73ac	Cycles: Use new MetalRT curve primitives for 3D curves and ribbons This patch updates the experimental MetalRT code path to use new [curve primitives](https://developer.apple.com/videos/play/wwdc2023/10128/) which were recently added in macOS 14. This replaces the previous custom box intersection implementation, allowing the driver to better optimise curve acceleration structures for the GPU. On existing hardware, this can speed up MetalRT renders by up to 40% for scenes that use hair / curve primitives extensively. The MetalRT option will only be available on macOS >= 14, and requires Xcode >= 15 to build (otherwise the option will be compiled out). Authored by Marco Giordano, Michael Jones, and Jason Fielder --- Before / after render times (M1 Max MacBook Pro, macOS 14 beta, MetalRT enabled):
```
                 Custom box intersection  MetalRT curve primitives  Speedup
fishy_cat                          111.5                      80.5     1.39
koro                               114.4                      86.7     1.32
sinosauropteryx                    291.8                     279.2     1.05
spring                             142.3                     142.2     1.00
victor                             442.7                     347.7     1.27
```
--- Pull Request: https://projects.blender.org/blender/blender/pulls/111795	2023-09-13 16:02:49 +02:00
Sergey Sharybin	bad41885db	Cleanup: Mark unused function arguments as such A lot of such cases got discovered since the recent change to Clang's compiler flags for C++. Pull Request: https://projects.blender.org/blender/blender/pulls/109732	2023-07-05 12:02:06 +02:00
Campbell Barton	c12994612b	License headers: use SPDX-FileCopyrightText in intern/cycles	2023-06-14 16:53:23 +10:00
Sergey Sharybin	ba3f26fac5	Cycles: light and shadow linking With light linking, lights can be set to affect only specific objects in the scene. Shadow linking additionally gives control over which objects act as shadow blockers for a light. Usage: https://wiki.blender.org/wiki/Reference/Release_Notes/4.0/Cycles Implementation: https://wiki.blender.org/wiki/Source/Render/Cycles/LightLinking Ref #104972 Co-authored-by: Brecht Van Lommel <brecht@blender.org>	2023-05-24 14:11:47 +02:00
Campbell Barton	6859bb6e67	Cleanup: format (with BraceWrapping::AfterControlStatement "MultiLine")	2023-05-02 09:37:49 +10:00
Michael Jones	5f61eca7af	Cycles: Exploit non-uniform threadgroup sizes on Metal This patch replaces `dispatchThreadgroups` with `dispatchThreads` which takes care of non-uniform threadgroup bounds. This allows us to remove the bounds guards in the integrator kernel entry points. Pull Request: https://projects.blender.org/blender/blender/pulls/106217	2023-03-29 21:46:11 +02:00
Campbell Barton	b3625e6bfd	Cleanup: comment blocks	2023-03-09 10:39:49 +11:00
Brecht Van Lommel	02c2970983	Cycles: add NanoVDB support for Metal on Apple Silicon Contributed by Yulia Kuznetcova at Apple. NanoVDB is patched to add the address spaces required by Metal. We hope that in the future Metal will support the generic address space. For AMD and Intel this is currently not available since it causes a performance regression also on scenes without volumes. Pull Request #104837	2023-02-21 15:03:52 +01:00
Michael Jones	2d994de77c	Cycles: MetalRT optimisation for subsurface intersection queries This patch optimises subsurface intersection queries on MetalRT. Currently intersect_local traverses from the scene root, retrospectively discarding all non-local hits. Using a lookup of bottom level acceleration structures, we can explicitly query only the relevant instance. On M1 Max, with MetalRT selected, this can give a render speedup of 15-20% for scenes like Monster which make heavy use of subsurface scattering. Patch authored by Marco Giordano. Reviewed By: brecht Differential Revision: https://developer.blender.org/D17153	2023-02-06 19:12:29 +00:00
Michael Jones	654e1e901b	Cycles: Use local atomics for faster shader sorting (enabled on Metal) This patch adds two new kernels: SORT_BUCKET_PASS and SORT_WRITE_PASS. These replace PREFIX_SUM and SORTED_PATHS_ARRAY on supported devices (currently implemented on Metal, but will be trivial to enable on the other backends). The new kernels exploit sort partitioning (see D15331) by sorting each partition separately using local atomics. This can give an overall render speedup of 2-3% depending on architecture. As before, we fall back to the original non-partitioned sorting when the shader count is "too high". Reviewed By: brecht Differential Revision: https://developer.blender.org/D16909	2023-02-06 11:18:26 +00:00
Michael Jones	08b3426df9	Cycles: Occupancy tuning for new higher end M2 machines This patch adds occupancy tuning for the newly announced high-end M2 machines, giving 10-15% render speedup over a pre-tuned build. Reviewed By: brecht Differential Revision: https://developer.blender.org/D17037	2023-01-19 17:56:40 +00:00
Michael Jones	a7cc6e015c	Cycles: Additional Metal kernel specialisation exposed through UI This patch adds a new "Kernel Optimization Level" dropdown menu to control Metal kernel specialisation. Currently this defaults to "full" optimisation, on the assumption that the changes proposed in D16371 will address usability concerns around app responsiveness and shader cache housekeeping. Reviewed By: brecht Differential Revision: https://developer.blender.org/D16514	2023-01-04 23:36:52 +00:00
Michael Jones	77c3e67d3d	Cycles: Improved render start/stop responsiveness on Metal All kernel specialisation is now performed in the background regardless of kernel type, meaning that the first render will be visible a few seconds sooner. The only exception is during benchmark warm up, in which case we wait for all kernels to be cached. When stopping a render, we call a new `cancel()` method on the device which causes any outstanding compilation work to be cancelled, and we destroy the device in a detached thread so that any stale queued compilations can be safely purged without blocking the UI for longer than necessary. Reviewed By: brecht Differential Revision: https://developer.blender.org/D16371	2023-01-04 16:00:53 +00:00
Michael Jones	8dd7b5b26b	Cycles: Metal integrator state size tuning This patch tunes the integrator state sizing for Metal (`num_concurrent_states` and `num_concurrent_busy_states`). On all GPU architectures, we adjust the busy:total states ratio to be 1:4 which gives better rendering performance than the previous 1:16 ratio (independent of total state count). This gives a small performance uplift (e.g. 2-3% on M1 Ultra). Additionally for M2 architectures, we double the overall state size if there is available headroom. Inclusive of the first change, we can expect uplift of close to 10% in future, as this results in larger dispatch sizes and minimises work submission overheads. In order to make an accurate determination of available headroom, we defer the calculation of `num_concurrent_states` and `num_concurrent_busy_states` until the time of integrator state allocation (i.e. after all of the scene data has been allocated). We also refactor `alloc_integrator_soa` to calculate an exact single-state-size in a first pass, right before allocating the integrator SoA buffers in a second pass. Reviewed By: brecht Differential Revision: https://developer.blender.org/D16313	2022-10-24 17:14:33 +01:00
Brecht Van Lommel	523bbf7065	Cycles: generalize shader sorting / locality heuristic to all GPU devices This was added for Metal, but also gives good results with CUDA and OptiX. Also enable it for future Apple GPUs instead of only M1 and M2, since this has been shown to help across multiple GPUs so the better bet seems to enable rather than disable it. Also moves some of the logic outside of the Metal device code, and always enables the code in the kernel since other devices don't do dynamic compile. Time per sample with OptiX + RTX A6000:
                     new      old
barbershop_interior  0.0730s  0.0727s
bmw27                0.0047s  0.0053s
classroom            0.0428s  0.0464s
fishy_cat            0.0102s  0.0108s
junkshop             0.0366s  0.0395s
koro                 0.0567s  0.0578s
monster              0.0206s  0.0223s
pabellon             0.0158s  0.0174s
sponza               0.0088s  0.0100s
spring               0.1267s  0.1280s
victor               0.0524s  0.0531s
wdas_cloud           0.0817s  0.0816s
Ref D15331, T87836	2022-07-15 13:42:47 +02:00
Michael Jones	4b1d315017	Cycles: Improve cache usage on Apple GPUs by chunking active indices This patch partitions the active indices into chunks prior to sorting by material in order to tradeoff some material coherence for better locality. On Apple Silicon GPUs (particularly higher end M1-family GPUs), we observe overall render time speedups of up to 15%. The partitioning is implemented by repeating the range of `shader_sort_key` for each partition, and encoding a "locator" key which distributes the indices into sorted chunks. Reviewed By: brecht Differential Revision: https://developer.blender.org/D15331	2022-07-14 14:26:18 +01:00
Brecht Van Lommel	9b6e86ace1	Cycles: stop Metal rendering on command buffer error If there is an error we should stop rendering, instead of finishing with a wrong render result or reporting a wrong benchmark time. Ref T96519 Differential Revision: https://developer.blender.org/D15287	2022-06-24 16:51:56 +02:00
Brecht Van Lommel	ff1883307f	Cleanup: renaming and consistency for kernel data * Rename "texture" to "data array". This has not used textures for a long time, there are just global memory arrays now. (On old CUDA GPUs there was a cache for textures but not global memory, so we used to put all data in textures.) * For CUDA and HIP, put globals in KernelParams struct like other devices. * Drop __ prefix for data array names, no possibility for naming conflict now that these are in a struct.	2022-06-20 12:30:48 +02:00
Brecht Van Lommel	2c1bffa286	Cleanup: add verbose logging category names instead of numbers And use them more consistently than before.	2022-06-17 14:08:14 +02:00
Michael Jones	19e0b60f3e	Cycles: MetalDeviceQueue - capture of multiple dispatches, and some tidying This patch adds a new mode of gpu capture (env var `CYCLES_DEBUG_METAL_CAPTURE_SAMPLES`) to capture a block of dispatches between "reset" calls. It also fixes member data naming inconsistencies and adds some missing OS version checks. Screenshot showing .gputrace capture in Xcode 14.0 beta (using `CYCLES_DEBUG_METAL_CAPTURE_SAMPLES="1"` and `CYCLES_DEBUG_METAL_CAPTURE_LIMIT="10"`): {F13155703} Reviewed By: sergey, brecht Differential Revision: https://developer.blender.org/D15179	2022-06-13 13:42:07 +01:00
Sergey Sharybin	0fddff027e	Cleanup: Unused but set variable in Cycles Metal profiler	2022-06-09 10:20:26 +02:00
Michael Jones	4412e14708	Cycles: Useful Metal backend debug & profiling functionality This patch adds some useful debugging & profiling env vars to the Metal backend: - `CYCLES_METAL_PROFILING`: output a per-kernel timing report at the end of the render - `CYCLES_METAL_DEBUG`: enable per-dispatch tracing (very verbose) - `CYCLES_DEBUG_METAL_CAPTURE_KERNEL`: enable programmatic .gputrace capture for a specified kernel index Here's an example of the timing report with `CYCLES_METAL_PROFILING` enabled:
```
-------------------------------------------------------------------------------------------------
Kernel name                                 Total threads  Dispatches   Avg. T/D    Time    Time%
-------------------------------------------------------------------------------------------------
integrator_init_from_camera                   657,407,232         161  4,083,274   0.24s    0.51%
integrator_intersect_closest                1,629,288,440         681  2,392,494  15.18s   32.12%
integrator_intersect_shadow                   751,652,291         470  1,599,260   5.80s   12.28%
integrator_shade_background                   304,612,074         263  1,158,220   1.16s    2.45%
integrator_shade_surface                    1,159,764,041         676  1,715,627  20.57s   43.52%
integrator_shade_shadow                       598,885,847         418  1,432,741   1.27s    2.69%
integrator_queued_paths_array               2,969,650,130         805  3,689,006   0.35s    0.74%
integrator_queued_shadow_paths_array          593,936,619         379  1,567,115   0.14s    0.29%
integrator_terminated_paths_array              22,205,417         155    143,260   0.05s    0.10%
integrator_sorted_paths_array               2,517,140,043         676  3,723,579   1.65s    3.50%
integrator_compact_paths_array                648,912,748         155  4,186,533   0.03s    0.07%
integrator_compact_states                      20,872,687         155    134,662   0.14s    0.29%
integrator_terminated_shadow_paths_array      374,100,675         438    854,111   0.16s    0.33%
integrator_compact_shadow_paths_array         503,768,657         438  1,150,156   0.05s    0.10%
integrator_compact_shadow_states               37,664,941         202    186,460   0.23s    0.50%
integrator_reset                               25,165,824           6  4,194,304   0.06s    0.12%
film_convert_combined_half_rgba                 3,110,400           6    518,400   0.00s    0.01%
prefix_sum                                            676         676          1   0.19s    0.40%
-------------------------------------------------------------------------------------------------
                                                                6,760             47.27s  100.00%
-------------------------------------------------------------------------------------------------
```
Reviewed By: brecht Differential Revision: https://developer.blender.org/D15044	2022-06-07 11:08:39 +01:00
Brecht Van Lommel	610619c203	Merge branch 'blender-v3.2-release'	2022-05-31 17:35:16 +02:00
Brecht Van Lommel	f2cd7e08fe	Fix Cycles MNEE not working for Metal Move MNEE to own kernel, separate from shader ray-tracing. This does introduce the limitation that a shader can't use both MNEE and AO/bevel, but that seems like the better trade-off for now. We can experiment with bigger kernel organization changes later. Differential Revision: https://developer.blender.org/D15070	2022-05-31 17:24:43 +02:00
Michael Jones	007184bcf2	Enable inlining on Apple Silicon. Use new process-wide ShaderCache in order to safely re-enable binary archives This patch is the same as D14763, but with a fix for unit test failures caused by ShaderCache fetch logic not working in the non-MetalRT case:
```
diff --git a/intern/cycles/device/metal/kernel.mm b/intern/cycles/device/metal/kernel.mm
index ad268ae7057..6aa1a56056e 100644
--- a/intern/cycles/device/metal/kernel.mm
+++ b/intern/cycles/device/metal/kernel.mm
@@ -203,9 +203,12 @@ bool kernel_has_intersection(DeviceKernel device_kernel)

   /* metalrt options */
   request.pipeline->use_metalrt = device->use_metalrt;
-  request.pipeline->metalrt_hair = device->kernel_features & KERNEL_FEATURE_HAIR;
-  request.pipeline->metalrt_hair_thick = device->kernel_features & KERNEL_FEATURE_HAIR_THICK;
-  request.pipeline->metalrt_pointcloud = device->kernel_features & KERNEL_FEATURE_POINTCLOUD;
+  request.pipeline->metalrt_hair = device->use_metalrt &&
+                                   (device->kernel_features & KERNEL_FEATURE_HAIR);
+  request.pipeline->metalrt_hair_thick = device->use_metalrt &&
+                                         (device->kernel_features & KERNEL_FEATURE_HAIR_THICK);
+  request.pipeline->metalrt_pointcloud = device->use_metalrt &&
+                                         (device->kernel_features & KERNEL_FEATURE_POINTCLOUD);

   {
     thread_scoped_lock lock(cache_mutex);
@@ -225,9 +228,9 @@ bool kernel_has_intersection(DeviceKernel device_kernel)

   /* metalrt options */
   bool use_metalrt = device->use_metalrt;
-  bool metalrt_hair = device->kernel_features & KERNEL_FEATURE_HAIR;
-  bool metalrt_hair_thick = device->kernel_features & KERNEL_FEATURE_HAIR_THICK;
-  bool metalrt_pointcloud = device->kernel_features & KERNEL_FEATURE_POINTCLOUD;
+  bool metalrt_hair = use_metalrt && (device->kernel_features & KERNEL_FEATURE_HAIR);
+  bool metalrt_hair_thick = use_metalrt && (device->kernel_features & KERNEL_FEATURE_HAIR_THICK);
+  bool metalrt_pointcloud = use_metalrt && (device->kernel_features & KERNEL_FEATURE_POINTCLOUD);

   MetalKernelPipeline *best_pipeline = nullptr;
   for (auto &pipeline : collection) {
```
Reviewed By: brecht Differential Revision: https://developer.blender.org/D14923	2022-05-11 16:20:59 +01:00
Brecht Van Lommel	52a5f68562	Revert "Cycles: Enable inlining on Apple Silicon for 1.1x speedup" This reverts commit `b82de02e7c`. It is causing crashes in various regression tests. Ref D14763	2022-04-28 00:46:43 +02:00
Michael Jones	b82de02e7c	Cycles: Enable inlining on Apple Silicon for 1.1x speedup This is a stripped down version of D14645 without the scene specialisation optimisations. The two major changes in this patch are: - Enables more aggressive inlining on Apple Silicon resulting in a 1.1x speedup and 10% reduction in spill, at the cost of longer pipeline build times - Revival of shader binary archives through a new ShaderCache which is shared between MetalDevice instances using the same physical MTLDevice. This mitigates the extra compile times via explicit caching (rather than, as before, relying on the implicit system shader cache which can be purged without notice) Reviewed By: brecht Differential Revision: https://developer.blender.org/D14763	2022-04-26 22:17:16 +01:00
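
The chunked sorting described in commit 4b1d315017 above ("repeating the range of `shader_sort_key` for each partition") can be pictured with a small sketch. This is not the Cycles kernel code: the helper name `partitioned_sort_key`, its parameters, and the exact formula are assumptions made for illustration. The idea is that sorting by the combined key keeps states from the same chunk together while still grouping by shader inside each chunk.

```cpp
#include <cstdint>

// Hypothetical helper (not from the Cycles source).
// state_index:      slot of the path state in the integrator state array
// shader_sort_key:  per-state shader key, assumed to be < num_shader_keys
// partition_size:   number of consecutive states per chunk (assumed > 0)
// num_shader_keys:  size of the shader key range, repeated once per chunk
inline uint32_t partitioned_sort_key(uint32_t state_index,
                                     uint32_t shader_sort_key,
                                     uint32_t partition_size,
                                     uint32_t num_shader_keys)
{
  // The "locator" part: which chunk this state falls into.
  const uint32_t partition = state_index / partition_size;
  // Offset the shader key into that chunk's copy of the key range.
  return partition * num_shader_keys + shader_sort_key;
}
```

With 8 shader keys and chunks of 1024 states, every state in the second chunk gets a key in the range 8 to 15, so it sorts after all states of the first chunk regardless of shader.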

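Commit 5f61eca7af above replaces `dispatchThreadgroups` with `dispatchThreads` so that kernel entry points no longer need bounds guards. The arithmetic behind that can be sketched as below; the function names are hypothetical and only model the thread counts, they do not call Metal. With uniform threadgroups the grid is rounded up to a whole number of groups, and the surplus threads are the ones a bounds guard would have to reject.

```cpp
#include <cstddef>

// Hypothetical helpers for illustration; threads_per_group is assumed > 0.

// Number of uniform threadgroups needed to cover work_size items (ceiling division).
inline std::size_t num_threadgroups(std::size_t work_size, std::size_t threads_per_group)
{
  return (work_size + threads_per_group - 1) / threads_per_group;
}

// Threads launched beyond work_size when every threadgroup must be full-sized.
inline std::size_t excess_threads(std::size_t work_size, std::size_t threads_per_group)
{
  return num_threadgroups(work_size, threads_per_group) * threads_per_group - work_size;
}
```

For 1000 work items and 256 threads per group, uniform dispatch launches 4 groups (1024 threads), leaving 24 threads past the end of the range; a non-uniform dispatch sized to exactly 1000 threads has none.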

55 Commits