griefith/test

Author	SHA1	Message	Date
Stefan Werner	31d55e87f9	Cycles: Metal support for OpenImageDenoise This is supported on Apple Silicon GPUs and macOS 13.0+. Co-authored-by: Stefan Werner <stefan.werner@intel.com> Co-authored-by: Attila Afra <attila.t.afra@intel.com> Pull Request: https://projects.blender.org/blender/blender/pulls/116124	2024-02-06 21:13:23 +01:00
Brecht Van Lommel	d015e98ee6	Fix Cycles ASAN error with boolean kernel arguments	2023-12-12 13:27:36 +01:00
Campbell Barton	c12994612b	License headers: use SPDX-FileCopyrightText in intern/cycles	2023-06-14 16:53:23 +10:00
Sergey Sharybin	d32d787f5f	Clang-Format: Allow empty functions to be single-line For example ``` OIIOOutputDriver::~OIIOOutputDriver() { } ``` becomes ``` OIIOOutputDriver::~OIIOOutputDriver() {} ``` Saves quite some vertical space, which is especially handy for constructors. Pull Request: https://projects.blender.org/blender/blender/pulls/105594	2023-03-29 16:50:54 +02:00
Michael Jones	654e1e901b	Cycles: Use local atomics for faster shader sorting (enabled on Metal) This patch adds two new kernels: SORT_BUCKET_PASS and SORT_WRITE_PASS. These replace PREFIX_SUM and SORTED_PATHS_ARRAY on supported devices (currently implemented on Metal, but will be trivial to enable on the other backends). The new kernels exploit sort partitioning (see D15331) by sorting each partition separately using local atomics. This can give an overall render speedup of 2-3% depending on architecture. As before, we fall back to the original non-partitioned sorting when the shader count is "too high". Reviewed By: brecht Differential Revision: https://developer.blender.org/D16909	2023-02-06 11:18:26 +00:00
Michael Jones	8dd7b5b26b	Cycles: Metal integrator state size tuning This patch tunes the integrator state sizing for Metal (`num_concurrent_states` and `num_concurrent_busy_states`). On all GPUs architecture, we adjust the busy:total states ratio to be 1:4 which gives better rendering performance than the previous 1:16 ratio (independent of total state count). This gives a small performance uplift (e.g. 2-3% on M1 Ultra). Additionally for M2 architectures, we double the overall state size if there is available headroom. Inclusive of the first change, we can expect uplift of close to 10% in future, as this results in larger dispatch sizes and minimises work submission overheads. In order to make an accurate determination of available headroom, we defer the calculation of `num_concurrent_states` and `num_concurrent_busy_states` until the time of integrator state allocation (i.e. after all of the scene data has been allocated). We also refactor `alloc_integrator_soa` to calculate an exact single-state-size in a first pass, right before allocating the integrator SoA buffers in a second pass. Reviewed By: brecht Differential Revision: https://developer.blender.org/D16313	2022-10-24 17:14:33 +01:00
Campbell Barton	6d1d1bf2b1	Cleanup: spelling in comments Also add missing task ID.	2022-09-28 09:41:31 +10:00
Nikita Sirgienko	2ead05d738	Cycles: Add optional per-kernel performance statistics When verbose level 4 is enabled, Blender prints kernel performance data for Cycles on GPU backends (except Metal that doesn't use debug_enqueue_* methods) for groups of kernels. These changes introduce a new CYCLES_DEBUG_PER_KERNEL_PERFORMANCE environment variable to allow getting timings for each kernels separately and not grouped with others. This is done by adding explicit synchronization after each kernel execution. Differential Revision: https://developer.blender.org/D15971	2022-09-27 22:15:00 +02:00
Brecht Van Lommel	523bbf7065	Cycles: generalize shader sorting / locality heuristic to all GPU devices This was added for Metal, but also gives good results with CUDA and OptiX. Also enable it for future Apple GPUs instead of only M1 and M2, since this has been shown to help across multiple GPUs so the better bet seems to enable rather than disable it. Also moves some of the logic outside of the Metal device code, and always enables the code in the kernel since other devices don't do dynamic compile. Time per sample with OptiX + RTX A6000: new old barbershop_interior 0.0730s 0.0727s bmw27 0.0047s 0.0053s classroom 0.0428s 0.0464s fishy_cat 0.0102s 0.0108s junkshop 0.0366s 0.0395s koro 0.0567s 0.0578s monster 0.0206s 0.0223s pabellon 0.0158s 0.0174s sponza 0.0088s 0.0100s spring 0.1267s 0.1280s victor 0.0524s 0.0531s wdas_cloud 0.0817s 0.0816s Ref D15331, T87836	2022-07-15 13:42:47 +02:00
Michael Jones	4b1d315017	Cycles: Improve cache usage on Apple GPUs by chunking active indices This patch partitions the active indices into chunks prior to sorting by material in order to tradeoff some material coherence for better locality. On Apple Silicon GPUs (particularly higher end M1-family GPUs), we observe overall render time speedups of up to 15%. The partitioning is implemented by repeating the range of `shader_sort_key` for each partition, and encoding a "locator" key which distributes the indices into sorted chunks. Reviewed By: brecht Differential Revision: https://developer.blender.org/D15331	2022-07-14 14:26:18 +01:00
Sergey Sharybin	eccc9d8eba	Cleanup: Remove unused function in Cycles queue Noticed while looking into oneAPI patch. Seems to be unused, without clear indication why/when it might be needed. Removing the function simplifies adding the new backend. Differential Revision: https://developer.blender.org/D14652	2022-04-19 10:32:07 +02:00
Brecht Van Lommel	f130d4f211	Cleanup: fix various typos Contributed by luzpaz. Differential Revision: https://developer.blender.org/D14203	2022-03-07 17:28:39 +01:00
Brecht Van Lommel	9cfc7967dd	Cycles: use SPDX license headers * Replace license text in headers with SPDX identifiers. * Remove specific license info from outdated readme.txt, instead leave details to the source files. * Add list of SPDX license identifiers used, and corresponding license texts. * Update copyright dates while we're at it. Ref D14069, T95597	2022-02-11 17:47:34 +01:00
Patrick Mours	8393ccd076	Cycles: Add OptiX temporal denoising support Enables the `bpy.ops.cycles.denoise_animation()` operator again and modifies it to support temporal denoising with OptiX. This requires renders that were done with both the "Vector" and "Denoising Data" passes. Differential Revision: https://developer.blender.org/D11442	2022-01-05 15:58:36 +01:00
Michael Jones	98a5c924fc	Cycles: Metal readiness: Specify DeviceQueue::enqueue arg types This patch adds new arg-type parameters to `DeviceQueue::enqueue` and its overrides. This is in preparation for the Metal backend which needs this information for correct argument encoding. Ref T92212 Reviewed By: brecht Maniphest Tasks: T92212 Differential Revision: https://developer.blender.org/D13357	2021-11-29 14:56:06 +00:00
Brecht Van Lommel	fd25e883e2	Cycles: remove prefix from source code file names Remove prefix of filenames that is the same as the folder name. This used to help when #includes were using individual files, but now they are always relative to the cycles root directory and so the prefixes are redundant. For patches and branches, git merge and rebase should be able to detect the renames and move over code to the right file.	2021-10-26 15:37:04 +02:00

16 Commits