test2

Author	SHA1	Message	Date
Campbell Barton	58ea0e051f	Cleanup: spelling in comments	2023-11-09 09:54:28 +11:00
Campbell Barton	6bba008325	Cleanup: format	2023-11-09 09:34:49 +11:00
Michael Jones	051ce95628	Cycles: Use Metal Program Scope Global Built-ins on macOS >= 14.0 This PR simplifies the kernel entrypoints by using [Metal Program Scope Global Built-ins](https://developer.apple.com/metal/Metal-Shading-Language-Specification.pdf) when available (macOS >= 14.0). Pull Request: https://projects.blender.org/blender/blender/pulls/114535	2023-11-07 11:20:16 +01:00
Michael Jones	1c1c6ac457	Cycles: Fix last failing unit test (T39823) on MetalRT This PR fixes T39823, the sole failing unit test when running with MetalRT. It does so by implementing and binding a missing intersection handler (`__anyhit__cycles_metalrt_volume_test_tri`) which is required for `scene_intersect_volume` (as used by `integrator_volume_stack_update_for_subsurface`) to work as intended. This scene exposed the error as it uses subsurface scattering on a sphere which is intersected by volume. Pull Request: https://projects.blender.org/blender/blender/pulls/112876	2023-09-25 22:41:27 +02:00
Campbell Barton	b7f3e0d84e	Cleanup: spelling & punctuation in comments Also remove some unhelpful/redundant comments.	2023-09-14 13:25:24 +10:00
Harley Acheson	092b568a90	Cleanup: Make format Formatting changes resulting from Make Format	2023-09-13 11:03:43 -07:00
Michael Jones	6c98cb73ac	Cycles: Use new MetalRT curve primitives for 3D curves and ribbons This patch updates the experimental MetalRT code path to use new [curve primitives](https://developer.apple.com/videos/play/wwdc2023/10128/) which were recently added in macOS 14. This replaces the previous custom box intersection implementation, allowing the driver to better optimise curve acceleration structures for the GPU. On existing hardware, this can speed up MetalRT renders by up to 40% for scenes that use hair / curve primitives extensively. The MetalRT option will only be available on macOS >= 14, and requires Xcode >= 15 to build (otherwise the option will be compiled out). Authored by Marco Giordano, Michael Jones, and Jason Fielder --- Before / after render times (M1 Max MacBook Pro, macOS 14 beta, MetalRT enabled): ``` Scene | Custom box intersection | MetalRT curve primitives | Speedup; fishy_cat | 111.5 | 80.5 | 1.39; koro | 114.4 | 86.7 | 1.32; sinosauropteryx | 291.8 | 279.2 | 1.05; spring | 142.3 | 142.2 | 1.00; victor | 442.7 | 347.7 | 1.27 ``` --- Pull Request: https://projects.blender.org/blender/blender/pulls/111795	2023-09-13 16:02:49 +02:00
Campbell Barton	9e41eccc6e	Cleanup: spelling in comments	2023-09-08 17:12:29 +10:00
Sergey Sharybin	7e4a51329b	Fix shadow linking for Cycles Metal RT The shadow intersection kernels need to perform extra checks to see whether an object is really considered a blocker. Pull Request: https://projects.blender.org/blender/blender/pulls/112012	2023-09-06 15:25:30 +02:00
Campbell Barton	1f01a64403	Cleanup: spelling in comments	2023-09-06 14:23:01 +10:00
Sergey Sharybin	71b4a97cbc	Refactor: De-duplicate Metal RT self intersection checks Use the common BVH utilities header for this. Added a special type qualifier ccl_ray_data which is defined to ccl_private for all platforms but Metal. On Metal it is defined to ray_data. The tricky part is that the BVH utilities are wrapped into the Metal context class. In some of the BVH functions the context had already been constructed, but this wasn't done in all the callbacks. From a quick render test of the Junkshop benchmark scene there is no render time difference. No functional changes are expected. Pull Request: https://projects.blender.org/blender/blender/pulls/111967	2023-09-05 17:21:49 +02:00
Sergey Sharybin	7365f0b094	Cleanup: Cover .metal files with `make format` Pull Request: https://projects.blender.org/blender/blender/pulls/111930	2023-09-05 09:59:47 +02:00
Sergey Sharybin	c59c97c947	Cleanup: Ensure correct order of headers in Metal kernel Explicitly split into groups of headers, so that clang-format does not ruin the required order of headers.	2023-09-05 09:59:41 +02:00
Campbell Barton	0caf227530	License headers: use SPDX-FileCopyrightText for .inl and .osl files	2023-08-04 13:24:17 +10:00
Campbell Barton	c12994612b	License headers: use SPDX-FileCopyrightText in intern/cycles	2023-06-14 16:53:23 +10:00
Michael Jones	d0467a277a	Cycles: MetalRT: Don't apply local object transform if it is baked This patch fixes the failing shader/bevel unit test when MetalRT is enabled. The ray was being transformed into local object space even when the SD_OBJECT_TRANSFORM_APPLIED flag was set. Pull Request: https://projects.blender.org/blender/blender/pulls/107292	2023-04-24 15:20:21 +02:00
Michael Jones	5f61eca7af	Cycles: Exploit non-uniform threadgroup sizes on Metal This patch replaces `dispatchThreadgroups` with `dispatchThreads` which takes care of non-uniform threadgroup bounds. This allows us to remove the bounds guards in the integrator kernel entry points. Pull Request: https://projects.blender.org/blender/blender/pulls/106217	2023-03-29 21:46:11 +02:00
Michael Jones	944a5854c6	Cycles: Fix MetalRT shadow all hit bug This patch fixes a MetalRT issue where viable shadow hits are discounted based on the false assumption that hits are ordered by distance. With this patch, the following unit tests now pass: - openvdb smoke - shadow catcher pt transparent lamp only 0.8 - shadow catcher pt transparent lamp only 1.0 Pull Request: https://projects.blender.org/blender/blender/pulls/106276	2023-03-29 20:20:07 +02:00
Julian Eisel	30e517c3ca	Merge branch 'blender-v3.5-release'	2023-03-15 13:07:26 +01:00
Michael Jones	089e8a1887	Cycles: Fix Metal API validation error (use uint instead of ushort) This PR fixes an error that is given when Metal API validation is enabled. The compute grid can exceed 65536 threads so `ushort` is not sufficient for `metal_grid_id [[threadgroup_position_in_grid]]`. This PR also fixes OS version warnings ([Cycles Metal: Unguarded access to newer macOS features #105630](https://projects.blender.org/blender/blender/issues/105630)) Pull Request: https://projects.blender.org/blender/blender/pulls/105763	2023-03-14 22:05:55 +01:00
William Leeson	6c03339e48	Cycles: reduce mesh memory usage by unflattening This improves mesh upload speeds and reduces the size of the scene data, which allows larger scenes to be rendered. The meshes in Cycles are currently stored as flattened meshes, where each triangle is stored as a set of 3 vertices. Flattening writes out the vertices in a list according to the index buffer. This uses a lot of memory and for current hardware does not provide a noticeable benefit. This change unflattens the mesh by using the mesh's vertex and index buffers directly and skipping the flattening step. This allows for larger scenes and also reduces the size of the meshes. Further, it decreases the amount of time it takes to upload the data to a GPU. This is especially important when multiple GPUs are used in a single machine. Pull Request #105173	2023-02-27 10:39:19 +01:00
Brecht Van Lommel	02c2970983	Cycles: add NanoVDB support for Metal on Apple Silicon Contributed by Yulia Kuznetcova at Apple. NanoVDB is patched to add address spaces required by Metal. We hope that in the future Metal will support the generic address space. For AMD and Intel this is currently not available since it causes a performance regression also on scenes without volumes. Pull Request #104837	2023-02-21 15:03:52 +01:00
Michael Jones	2d994de77c	Cycles: MetalRT optimisation for subsurface intersection queries This patch optimises subsurface intersection queries on MetalRT. Currently intersect_local traverses from the scene root, retrospectively discarding all non-local hits. Using a lookup of bottom level acceleration structures, we can explicitly query only the relevant instance. On M1 Max, with MetalRT selected, this can give a render speedup of 15-20% for scenes like Monster which make heavy use of subsurface scattering. Patch authored by Marco Giordano. Reviewed By: brecht Differential Revision: https://developer.blender.org/D17153	2023-02-06 19:12:29 +00:00
Michael Jones	654e1e901b	Cycles: Use local atomics for faster shader sorting (enabled on Metal) This patch adds two new kernels: SORT_BUCKET_PASS and SORT_WRITE_PASS. These replace PREFIX_SUM and SORTED_PATHS_ARRAY on supported devices (currently implemented on Metal, but will be trivial to enable on the other backends). The new kernels exploit sort partitioning (see D15331) by sorting each partition separately using local atomics. This can give an overall render speedup of 2-3% depending on architecture. As before, we fall back to the original non-partitioned sorting when the shader count is "too high". Reviewed By: brecht Differential Revision: https://developer.blender.org/D16909	2023-02-06 11:18:26 +00:00
Campbell Barton	79c82fc1c5	Cleanup: trailing space	2023-01-31 15:49:04 +11:00
Hallam Roberts	a501a2dbff	Images: add mirror extension type This adds a new mirror image extension type for shaders and geometry nodes (next to the existing repeat, extend and clip options). See D16432 for a more detailed explanation of `wrap_mirror`. This also adds a new sampler flag `GPU_SAMPLER_MIRROR_REPEAT`. It acts as a modifier to `GPU_SAMPLER_REPEAT`, so any `REPEAT` flag must be set for the `MIRROR` flag to have an effect. Differential Revision: https://developer.blender.org/D16432	2022-12-14 19:27:29 +01:00
Michael Jones	b0e2e45496	Cycles: Enable MetalRT pointclouds & other fixes Code authored by Marco Giordano. This fixes pointcloud rendering on MetalRT and some other subtle MetalRT bugs: - Incorrect kernel hashing - Missing specialisation constants - Incorrect visibility filtering - Missing null pointer check Reviewed By: brecht Differential Revision: https://developer.blender.org/D16499	2022-11-14 16:39:18 +00:00
Patrick Mours	e6b38deb9d	Cycles: Add basic support for using OSL with OptiX This patch generalizes the OSL support in Cycles to include GPU device types and adds an implementation for that in the OptiX device. There are some caveats still, including simplified texturing due to lack of OIIO on the GPU and a few missing OSL intrinsics. Note that this is incomplete and missing an update to the OSL library before being enabled! The implementation is already committed now to simplify further development. Maniphest Tasks: T101222 Differential Revision: https://developer.blender.org/D15902	2022-11-09 15:30:21 +01:00
Lukas Stockner	e2a93e9c7c	Fix T94136: Cycles: No Hair Shadows with Transparent BSDF	2022-10-20 04:47:21 +02:00
Morteza Mostajab	e6902d19a0	Cycles: Allow Intel GPUs under Metal Known Issues: - Command buffer failures when using binary archives (binary archives are disabled for Intel GPUs as a workaround) - Wrong texture sampler being applied (to be addressed in the future) Ref T92212 Reviewed By: brecht Maniphest Tasks: T92212 Differential Revision: https://developer.blender.org/D16253	2022-10-19 17:09:38 +01:00
Michael Jones	2b88ee50fb	Cycles: Tweak inlining policy on Metal This patch optimises the Metal inlining policy. It gives a small speedup (2-3% on M1 Max) with no notable compilation slowdown vs what is already in master. Previously noted compilation slowdowns (as reported in T100102) were caused by forcing inlining for `ccl_device`, but we get better rendering perf by relying on compiler heuristics in these cases. Reviewed By: brecht Differential Revision: https://developer.blender.org/D16081	2022-09-27 17:01:28 +01:00
Brecht Van Lommel	6d08ba8a50	Fix T100824: Cycles GPU render broken on macOS 13 Beta and Apple silicon The recent revert of Apple silicon inlining changes to avoid long compile times worked on macOS 12, but in macOS 13 Beta it results in render errors. This may be a compiler bug and may get fixed in time, but try to be on the safe side and ensure Blender 3.3.0 works regardless. This brings part of the inlining back, which brings improved performance but also longer compile times again. Compile time is around 2min now, where the previous full inlining was about 5-7min. Patch by Michael Jones. Differential Revision: https://developer.blender.org/D15897	2022-09-06 19:11:52 +02:00
Brecht Van Lommel	9961aae1e6	Merge branch 'blender-v3.3-release'	2022-08-18 20:31:34 +02:00
Brecht Van Lommel	e11c899e71	Cycles: disable Metal inlining optimization on Apple GPUs This gave a 1.1x speedup, however it also leads to very long compile times that make it seem like Blender has stopped working. This can be brought back in the future behind an option that users can explicitly enable. Fix T100102 Ref D14923, D14763, T92212	2022-08-18 20:01:29 +02:00
Brecht Van Lommel	3aeacb9ab3	Merge branch 'blender-v3.3-release'	2022-08-15 13:53:42 +02:00
Brecht Van Lommel	c2c019dda8	Fix Cycles MetalRT compile error	2022-08-13 19:55:38 +02:00
Brecht Van Lommel	1988665c3c	Cleanup: make vector types make/print functions consistent between CPU and GPU Now all the same ones are available on CPU and GPU, which was previously not possible due to lack of operator overloading in OpenCL. Print functions are no-ops on some GPUs. Ref D15535	2022-08-09 16:07:23 +02:00
Brecht Van Lommel	fa514564b0	Fix T99201: Cycles render difference with 3D hair curves between OptiX and Embree It should consistently use the Cycles primitive ID for self intersection detection, not the one from the OptiX or Embree acceleration structure. Differential Revision: https://developer.blender.org/D15632	2022-08-05 15:03:47 +02:00
Brecht Van Lommel	38af5b0501	Cycles: switch Cycles triangle barycentric convention to match Embree/OptiX Simplifies intersection code a little and slightly improves precision regarding self intersection. The parametric texture coordinate in shader nodes is still the same as before for compatibility.	2022-07-27 21:03:33 +02:00
Brecht Van Lommel	4cf6524731	Fix Cycles Metal build errors after recent changes float8 is a reserved type in Metal, but is not implemented. So rename to float8_t for now. Also move back intersection handlers to kernel.metal, they can't be in the class that encapsulates the other Metal kernel functions.	2022-07-26 00:17:37 +02:00
Brecht Van Lommel	7a74d91e32	Cleanup: move device BVH code to kernel/device/*/bvh.h Having the OptiX/Embree/MetalRT implementations all in one file with many #ifdefs became too confusing. Instead split it up per device, and also move it together with device specific hit/filter/intersect functions and associated data types.	2022-07-25 16:34:22 +02:00
Brecht Van Lommel	484ad31653	Cycles: simplify handling of ray distance in GPU rendering All our intersection functions now work with unnormalized ray direction, which means we no longer need to transform ray distance between world and object space, they can all remain in world space. There doesn't seem to be any real performance difference one way or the other, but it does simplify the code.	2022-07-25 13:27:40 +02:00
Brecht Van Lommel	5152c7c152	Cycles: refactor rays to have start and end distance, fix precision issues For transparency, volume and light intersection rays, adjust these distances rather than the ray start position. This way we increment the start distance by the smallest possible float increment to avoid self intersections, and can be sure it works because the distance compared against will be exactly the same as before, due to the ray start position and direction remaining the same. Fix T98764, T96537, hair ray tracing precision issues. Differential Revision: https://developer.blender.org/D15455	2022-07-15 18:46:24 +02:00
Brecht Van Lommel	bb376da6df	Fix Cycles MetalRT error after recent specialization changes	2022-07-15 18:28:13 +02:00
Michael Jones	da4ef05e4d	Cycles: Apple Silicon optimization to specialize intersection kernels The Metal backend now compiles and caches a second set of kernels which are optimized for scene contents, enabled for Apple Silicon. The implementation supports doing this both for intersection and shading kernels. However this is currently only enabled for intersection kernels that are quick to compile, and already give a good speedup. Enabling this for shading kernels would be faster still, however this also causes long wait times and would need a good user interface to control this. M1 Max samples per minute (macOS 13.0): Scene | PSO_GENERIC | PSO_SPECIALIZED_INTERSECT | PSO_SPECIALIZED_SHADE; barbershop_interior | 83.4 | 89.5 | 93.7; bmw27 | 1486.1 | 1671.0 | 1825.8; classroom | 175.2 | 196.8 | 206.3; fishy_cat | 674.2 | 704.3 | 719.3; junkshop | 205.4 | 212.0 | 257.7; koro | 310.1 | 336.1 | 342.8; monster | 376.7 | 418.6 | 424.1; pabellon | 273.5 | 325.4 | 339.8; sponza | 830.6 | 929.6 | 1142.4; victor | 86.7 | 96.4 | 96.3; wdas_cloud | 111.8 | 112.7 | 183.1. Code contributed by Jason Fielder, Morteza Mostajabodaveh and Michael Jones Differential Revision: https://developer.blender.org/D14645	2022-07-15 13:40:04 +02:00
Brecht Van Lommel	ff1883307f	Cleanup: renaming and consistency for kernel data * Rename "texture" to "data array". This has not used textures for a long time, there are just global memory arrays now. (On old CUDA GPUs there was a cache for textures but not global memory, so we used to put all data in textures.) * For CUDA and HIP, put globals in KernelParams struct like other devices. * Drop __ prefix for data array names, no possibility for naming conflict now that these are in a struct.	2022-06-20 12:30:48 +02:00
Michael Jones	007184bcf2	Enable inlining on Apple Silicon. Use new process-wide ShaderCache in order to safely re-enable binary archives This patch is the same as D14763, but with a fix for unit test failures caused by ShaderCache fetch logic not working in the non-MetalRT case: ``` diff --git a/intern/cycles/device/metal/kernel.mm b/intern/cycles/device/metal/kernel.mm index ad268ae7057..6aa1a56056e 100644 --- a/intern/cycles/device/metal/kernel.mm +++ b/intern/cycles/device/metal/kernel.mm @@ -203,9 +203,12 @@ bool kernel_has_intersection(DeviceKernel device_kernel) /* metalrt options */ request.pipeline->use_metalrt = device->use_metalrt; - request.pipeline->metalrt_hair = device->kernel_features & KERNEL_FEATURE_HAIR; - request.pipeline->metalrt_hair_thick = device->kernel_features & KERNEL_FEATURE_HAIR_THICK; - request.pipeline->metalrt_pointcloud = device->kernel_features & KERNEL_FEATURE_POINTCLOUD; + request.pipeline->metalrt_hair = device->use_metalrt && + (device->kernel_features & KERNEL_FEATURE_HAIR); + request.pipeline->metalrt_hair_thick = device->use_metalrt && + (device->kernel_features & KERNEL_FEATURE_HAIR_THICK); + request.pipeline->metalrt_pointcloud = device->use_metalrt && + (device->kernel_features & KERNEL_FEATURE_POINTCLOUD); { thread_scoped_lock lock(cache_mutex); @@ -225,9 +228,9 @@ bool kernel_has_intersection(DeviceKernel device_kernel) /* metalrt options */ bool use_metalrt = device->use_metalrt; - bool metalrt_hair = device->kernel_features & KERNEL_FEATURE_HAIR; - bool metalrt_hair_thick = device->kernel_features & KERNEL_FEATURE_HAIR_THICK; - bool metalrt_pointcloud = device->kernel_features & KERNEL_FEATURE_POINTCLOUD; + bool metalrt_hair = use_metalrt && (device->kernel_features & KERNEL_FEATURE_HAIR); + bool metalrt_hair_thick = use_metalrt && (device->kernel_features & KERNEL_FEATURE_HAIR_THICK); + bool metalrt_pointcloud = use_metalrt && (device->kernel_features & KERNEL_FEATURE_POINTCLOUD); MetalKernelPipeline *best_pipeline = nullptr; for (auto &pipeline : collection) { ``` Reviewed By: brecht Differential Revision: https://developer.blender.org/D14923	2022-05-11 16:20:59 +01:00
Brecht Van Lommel	52a5f68562	Revert "Cycles: Enable inlining on Apple Silicon for 1.1x speedup" This reverts commit `b82de02e7c`. It is causing crashes in various regression tests. Ref D14763	2022-04-28 00:46:43 +02:00
Michael Jones	b82de02e7c	Cycles: Enable inlining on Apple Silicon for 1.1x speedup This is a stripped down version of D14645 without the scene specialisation optimisations. The two major changes in this patch are: - Enables more aggressive inlining on Apple Silicon resulting in a 1.1x speedup and 10% reduction in spill, at the cost of longer pipeline build times - Revival of shader binary archives through a new ShaderCache which is shared between MetalDevice instances using the same physical MTLDevice. This mitigates the extra compile times via explicit caching (rather than, as before, relying on the implicit system shader cache which can be purged without notice) Reviewed By: brecht Differential Revision: https://developer.blender.org/D14763	2022-04-26 22:17:16 +01:00
Stefan Werner	65dcb5ebd3	Cycles: Semantically separate 2D and 3D texture objects Currently there are no functional changes. Preparing for an upcoming oneAPI integration where such separation in types is needed.	2022-04-01 19:44:31 +02:00
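
Commit 5152c7c152 in the log above describes avoiding self intersection by incrementing a ray's start distance "by the smallest possible float increment" instead of moving the ray origin. A minimal C++ sketch of that idea follows. The `RaySketch` struct and `continue_past_hit` function are hypothetical names chosen for illustration, and using `std::nextafter` is an assumption about how such an increment could be done; this is not the Cycles code.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Hypothetical ray record for illustration; not the Cycles ray type.
struct RaySketch {
  float tmin;  // start distance along the ray
  float tmax;  // end distance along the ray
};

// Continue a ray past a hit found at distance `t_hit`.
// The origin and direction stay untouched; only the start distance moves to
// the next representable float above `t_hit`, so a later query that compares
// hit distances against `tmin` skips the surface that was just hit.
inline RaySketch continue_past_hit(RaySketch ray, float t_hit)
{
  assert(t_hit >= ray.tmin && t_hit <= ray.tmax);
  ray.tmin = std::nextafter(t_hit, std::numeric_limits<float>::infinity());
  return ray;
}
```

Because the origin and direction are unchanged, a second intersection query computes the same distance for the same surface, and that distance now falls just below `tmin`.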

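
Commit 089e8a1887 in the log above notes that `ushort` is too narrow for `metal_grid_id` once the compute grid grows large enough. The effect can be reproduced on the CPU with a small C++ sketch. `global_index` is a hypothetical helper written for this illustration, not a function from the Cycles kernels, and the way it flattens the index is an assumption about what a typical entry point does.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: flatten a threadgroup position and a thread position
// inside that group into one global work index.
// GroupIndexT models the type the kernel declares for the threadgroup
// position: uint16_t stands in for `ushort`, uint32_t for `uint`.
template<typename GroupIndexT>
inline uint32_t global_index(uint32_t threadgroup_position,
                             uint32_t threads_per_group,
                             uint32_t thread_in_group)
{
  assert(thread_in_group < threads_per_group);
  // Narrowing happens here when GroupIndexT is only 16 bits wide.
  const GroupIndexT group = static_cast<GroupIndexT>(threadgroup_position);
  return static_cast<uint32_t>(group) * threads_per_group + thread_in_group;
}
```

With a 16-bit type, threadgroup position 65536 wraps to 0, so two different threadgroups would compute the same work index. A 32-bit type keeps them distinct.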

68 Commits
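
Commit 5f61eca7af in the log above replaces `dispatchThreadgroups` with `dispatchThreads` so that the bounds guards in the integrator entry points can be removed. The arithmetic behind those guards can be sketched in C++ as follows. Both function names are hypothetical and only model the counting; they are not part of Metal or Cycles.

```cpp
#include <cassert>
#include <cstdint>

// Number of whole threadgroups needed to cover `work_size` items when the
// dispatch size must be given in threadgroups (rounds up).
inline uint32_t threadgroups_to_cover(uint32_t work_size, uint32_t threads_per_group)
{
  assert(threads_per_group > 0);
  return (work_size + threads_per_group - 1) / threads_per_group;
}

// Threads launched beyond `work_size` under whole-threadgroup dispatch.
// Each of these needs an early-out guard in the kernel. When the dispatch
// size is given in threads instead, no such padding threads are launched.
inline uint32_t padding_threads(uint32_t work_size, uint32_t threads_per_group)
{
  return threadgroups_to_cover(work_size, threads_per_group) * threads_per_group - work_size;
}
```

For example, 1000 work items with 256 threads per threadgroup need 4 threadgroups, which launches 24 threads that have no work item and would otherwise have to exit early.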