test2

Author	SHA1	Message	Date
Brecht Van Lommel	52a5f68562	Revert "Cycles: Enable inlining on Apple Silicon for 1.1x speedup" This reverts commit `b82de02e7c`. It is causing crashes in various regression tests. Ref D14763	2022-04-28 00:46:43 +02:00
Michael Jones	b82de02e7c	Cycles: Enable inlining on Apple Silicon for 1.1x speedup This is a stripped down version of D14645 without the scene specialisation optimisations. The two major changes in this patch are: - Enables more aggressive inlining on Apple Silicon resulting in a 1.1x speedup and 10% reduction in spill, at the cost of longer pipeline build times - Revival of shader binary archives through a new ShaderCache which is shared between MetalDevice instances using the same physical MTLDevice. This mitigates the extra compile times via explicit caching (rather than, as before, relying on the implicit system shader cache which can be purged without notice) Reviewed By: brecht Differential Revision: https://developer.blender.org/D14763	2022-04-26 22:17:16 +01:00
Brecht Van Lommel	8d2da45f98	Revert "Fix Cycles HIP assuming warp size 32" This reverts commit `390b9f1305`. It seems to break things on Linux for unknown reasons, so leave it out for now. A solution to this will be required for Vega cards though.	2022-04-20 18:09:23 +02:00
Stefan Werner	65dcb5ebd3	Cycles: Semantically separate 2D and 3D texture objects Currently there are no functional changes. Preparing for an upcoming oneAPI integration where such separation in types is needed.	2022-04-01 19:44:31 +02:00
Stefan Werner	9c6dff70c8	Cycles: Introduce postfix for kernel body definition Increases flexibility of code-generation for kernel entry points. Currently no functional changes, preparing for integration with oneAPI.	2022-04-01 19:44:02 +02:00
Brecht Van Lommel	51380b9346	Fix Cycles Metal build error and GCC warning after recent changes Function overloading of make_float4() doesn't work since it's a macro, just don't do this minor cleanup then.	2022-03-23 23:25:31 +01:00
Kévin Dietrich	d84b4becd3	Fix compile error on GCC Explicit template specialization has to happen outside of class definition (some compilers are more lenient). Since it is not possible to specialize the method without also specializing the enclosing class for all of its possible types, the method is moved outside of the class, and specialized there.	2022-03-23 22:01:32 +01:00
Ethan-Hall	4e56e738a8	Cycles: optimize CPU texture sampler interpolation Use templates to optimize the CPU texture sampler to interpolate using float for single component datatypes instead of using float4 for all types. Differential Revision: https://developer.blender.org/D14424	2022-03-23 20:06:12 +01:00
Ethan-Hall	4abb8a14a2	Cycles: make 3D texture sampling at boundaries more similar to GPU CPU code for cubic interpolation with clip texture extension only performed texture interpolation inside the range of [0,1]. As a result, even though the volume's color is sampled using cubic interpolation, the boundary is not being interpolated. The GPU appears was interpolating samples that span the clip boundary softening the edge, which the CPU now does also. This commit also includes refactoring of 2D and 3D texture sampling in preparation of adding new extension modes. Differential Revision: https://developer.blender.org/D14295	2022-03-21 16:38:13 +01:00
Brecht Van Lommel	390b9f1305	Fix Cycles HIP assuming warp size 32 In HIP these masks are 64 bit, while in CUDA only 32 bit.	2022-03-16 18:05:48 +01:00
Brecht Van Lommel	076079454f	Cleanup: remove some unused Cycles GPU code To make porting to other architectures easier, clarifying that this does not need to be supported. The unused parallel_reduce implementation assumed warp size 32, but is easy to update if we ever need it in the future.	2022-03-16 18:05:08 +01:00
Ethan-Hall	3902bebf18	Cycles: make smart interpolation fallback to cubic for GPU Matching CPU and Eevee behavior. Differential Revision: https://developer.blender.org/D14296	2022-03-11 18:27:58 +01:00
Brecht Van Lommel	a9a05d5597	Merge branch 'blender-v3.1-release'	2022-02-15 01:05:47 +01:00
Brecht Van Lommel	facd9d8268	Cleanup: clang-format	2022-02-15 01:05:25 +01:00
Brecht Van Lommel	35c261dfcf	Merge branch 'blender-v3.1-release'	2022-02-11 23:58:41 +01:00
Michael Jones	27d3140b13	Cycles: Fix Metal kernel compilation for AMD GPUs Workaround for a compilation issue preventing kernels compiling for AMD GPUs: Avoid problematic use of templates on Metal by making `gpu_parallel_active_index_array` a wrapper macro, and moving `blocksize` to be a macro parameter. Reviewed By: brecht Differential Revision: https://developer.blender.org/D14081	2022-02-11 22:52:48 +00:00
Brecht Van Lommel	9cfc7967dd	Cycles: use SPDX license headers * Replace license text in headers with SPDX identifiers. * Remove specific license info from outdated readme.txt, instead leave details to the source files. * Add list of SPDX license identifiers used, and corresponding license texts. * Update copyright dates while we're at it. Ref D14069, T95597	2022-02-11 17:47:34 +01:00
William Leeson	ae44070341	Cycles: explicitly skip self-intersection Remember the last intersected primitive and skip any intersections with the same primitive. Ref D12954	2022-01-26 17:51:05 +01:00
Brecht Van Lommel	d68ce0e475	Cycles: add pointcloud implementation for Metal RT This is not currently working, with an internal compiler error. However we are currently using BVH2 instead of Metal RT. So this has no effect for users, it's being committed to avoid the code getting outdated. Ref T92573, T92212 Differential Revision: https://developer.blender.org/D13632	2022-01-21 14:42:27 +01:00
Brecht Van Lommel	0cf2fafd81	Fix T94050, T94570, T94527: Cycles Bevel and AO nodes not working with Metal Workaround what may be a compiler bug, solution found by Michael Jones.	2022-01-13 10:40:41 +01:00
Michael Jones	efe3d60a2c	Cycles: Fix Metal build This patch fixes a couple of new Metal kernel compilation errors: 1) a kernel parameter count overflow, and 2) missing address space qualifiers. Reviewed By: brecht Differential Revision: https://developer.blender.org/D13763	2022-01-07 16:19:31 +00:00
Patrick Mours	8393ccd076	Cycles: Add OptiX temporal denoising support Enables the `bpy.ops.cycles.denoise_animation()` operator again and modifies it to support temporal denoising with OptiX. This requires renders that were done with both the "Vector" and "Denoising Data" passes. Differential Revision: https://developer.blender.org/D11442	2022-01-05 15:58:36 +01:00
Brecht Van Lommel	e2e7f7ea52	Fix Cycles OptiX crash with 3D curves after point cloud changes Includes refactoring to reduce the number of bits taken by primitive types, so they more easily fit in the OptiX limit.	2021-12-20 14:14:43 +01:00
Brecht Van Lommel	35b1e9fc3a	Cycles: pointcloud rendering This add support for rendering of the point cloud object in Blender, as a native geometry type in Cycles that is more memory and time efficient than instancing sphere meshes. This can be useful for rendering sand, water splashes, particles, motion graphics, etc. Points are currently always rendered as spheres, with backface culling. More shapes are likely to be added later, but this is the most important one and can be customized with shaders. For CPU rendering the Embree primitive is used, for GPU there is our own intersection code. Motion blur is suppored. Volumes inside points are not currently supported. Implemented with help from: * Kévin Dietrich: Alembic procedural integration * Patrick Mourse: OptiX integration * Josh Whelchel: update for cycles-x changes Ref T92573 Differential Revision: https://developer.blender.org/D9887	2021-12-16 20:54:04 +01:00
Michael Jones	9558fa5196	Cycles: Metal host-side code This patch adds the Metal host-side code: - Add all core host-side Metal backend files (device_impl, queue, etc) - Add MetalRT BVH setup files - Integrate with Cycles device enumeration code - Revive `path_source_replace_includes` in util/path (required for MSL compilation) This patch also includes a couple of small kernel-side fixes: - Add an implementation of `lgammaf` for Metal [Nemes, Gergő (2010), "New asymptotic expansion for the Gamma function", Archiv der Mathematik](https://users.renyi.hu/~gergonemes/) - include "work_stealing.h" inside the Metal context class because it accesses state now Ref T92212 Reviewed By: brecht Maniphest Tasks: T92212 Differential Revision: https://developer.blender.org/D13423	2021-12-07 15:52:21 +00:00
Campbell Barton	ac447ba1a3	Cleanup: clang-format, trailing space	2021-11-30 10:15:17 +11:00
Brecht Van Lommel	4fac3be146	Fix Cycles OptiX doing a bit too much work for almost opaque curve shadows Found in D13353, likely has no significant impact in performance.	2021-11-29 18:41:37 +01:00
Michael Jones	f613c4c095	Cycles: MetalRT support (kernel side) This patch adds MetalRT support to Cycles kernel code. It is mostly additive in nature or confined to Metal-specific code, however there are a few areas where this interacts with other code: - MetalRT closely follows the Optix implementation, and in some cases (notably handling of transforms) it makes sense to extend Optix special-casing to MetalRT. For these generalisations we now have `__KERNEL_GPU_RAYTRACING__` instead of `__KERNEL_OPTIX__`. - MetalRT doesn't support primitive offsetting (as with `primitiveIndexOffset` in Optix), so we define and populate a new kernel texture, `__object_prim_offset`, containing per-object primitive / curve-segment offsets. This is referenced and applied in MetalRT intersection handlers. - Two new BVH layout enum values have been added: `BVH_LAYOUT_METAL` and `BVH_LAYOUT_MULTI_METAL_EMBREE` for XPU mode). Some host-side enum case handling has been updated where it is trivial to do so. Ref T92212 Reviewed By: brecht Maniphest Tasks: T92212 Differential Revision: https://developer.blender.org/D13353	2021-11-29 15:20:26 +00:00
Michael Jones	eb7827e797	Cycles: Fix film convert address space mismatch on Metal This patch fixes an address space mismatch in the film convert kernels on Metal. The `film_get_pass_pixel_...` functions take a `ccl_private` result pointer, but the film convert kernels pass a `ccl_global` memory pointer. Specialising the pass-fetch functions with templates results in compilation errors on Visual Studio, so instead this patch just adds an intermediate local on Metal. Reviewed By: brecht Differential Revision: https://developer.blender.org/D13350	2021-11-26 13:58:48 +00:00
William Leeson	c49d2cbe92	Merge branch 'blender-v3.0-release' to bring in D13042: Fix performance decrease with Scrambling Distance on	2021-11-25 09:41:03 +01:00
Alaska	b41c72b710	Fix performance decrease with Scrambling Distance on With the current code in master, scrambling distance is enabled on non-hardware accelerated ray tracing devices see a measurable performance decrease when compared scrambling distance on vs off. From testing, this performance decrease comes from the large tile sizes scheduled in `tile.cpp`. This patch attempts to address the performance decrease by using different algorithms to calculate the tile size for devices with hardware accelerated ray traversal and devices without. Large tile sizes for hardware accelerated devices and small tile sizes for others. Most of this code is based on proposals from @brecht and @leesonw Reviewed By: brecht, leesonw Differential Revision: https://developer.blender.org/D13042	2021-11-25 09:32:26 +01:00
Patrick Mours	7a97e925fd	Cycles: Add support for building with OptiX 7.4 SDK and use built-in catmull-rom curve type Some enum names were changed/removed in OptiX 7.4, so some changes are necessary to make things compile still. In addition, OptiX 7.4 also adds built-in support for catmull-rom curves, so it is no longer necessary to convert the catmull-rom data to cubic bsplines first, and has endcaps disabled by default now, so can remove the special handling via any-hit programs that filtered them out before. Differential Revision: https://developer.blender.org/D13351	2021-11-24 16:33:04 +01:00
Brecht Van Lommel	1b94c53aa6	Cleanup: fix typos in comments and docs Contributed by luzpaz. Differential Revision: https://developer.blender.org/D10447	2021-11-19 13:02:16 +01:00
Michael Jones	d1f944c186	Cycles: declare constants at program scope on Metal MSL requires that constant address space literals be declared at program scope. This patch moves the `blackbody_table_r/g/b` and `cie_colour_match` constants into separate files so they can be declared at the appropriate scope. Ref T92212 Differential Revision: https://developer.blender.org/D13241	2021-11-18 14:38:05 +01:00
Michael Jones	d19e35873f	Cycles: several small fixes and additions for MSL This patch contains many small leftover fixes and additions that are required for Metal-enablement: - Address space fixes and a few other small compile fixes - Addition of missing functionality to the Metal adapter headers - Addition of various scattered `__KERNEL_METAL__` blocks (e.g. for atomic support & maths functions) Ref T92212 Differential Revision: https://developer.blender.org/D13263	2021-11-18 14:38:02 +01:00
Brecht Van Lommel	063ad8635e	Cycles: reduce triangle memory usage with packed_float3 Depends on D13243 Differential Revision: https://developer.blender.org/D13244	2021-11-17 17:29:41 +01:00
Brecht Van Lommel	9937d5379c	Cycles: add packed_float3 type for storage Introduce a packed_float3 type for smaller storage that is exactly 3 floats, instead of 4. For computation float3 is still used since it can use SIMD instructions. Ref T92212 Differential Revision: https://developer.blender.org/D13243	2021-11-17 17:29:41 +01:00
Michael Jones	64003fa4b0	Cycles: Adapt volumetric lambda functions to work on MSL This patch adapts the existing volumetric read/write lambda functions for Metal. Lambda expressions are not supported on MSL, so two new macros `VOLUME_READ_LAMBDA` and `VOLUME_WRITE_LAMBDA` have been defined with a default implementation which, on Metal, is overridden to use inline function objects. This patch also removes the last remaining mention of the now-unused `ccl_addr_space`. Ref T92212 Reviewed By: leesonw Maniphest Tasks: T92212 Differential Revision: https://developer.blender.org/D13234	2021-11-16 13:42:23 +00:00
Campbell Barton	1143bf281a	Cleanup: spelling in comments, comment block formatting	2021-11-13 13:07:13 +11:00
Campbell Barton	acc800d24d	Cleanup: clang-format	2021-11-13 12:47:18 +11:00
Sergey Sharybin	ce395c84a3	Merge branch 'blender-v3.0-release'	2021-11-11 15:29:35 +01:00
Sergey Sharybin	d26d3cfe19	Fix T92868: Cycles catcher with transparency crashes The issue was caused by splitting happening twice. Fixed by checking for split flag which is assigned to the both states during split. The tricky part was to write catcher data at the moment of split: the transparency and shadow catcher sample count is to be accumulated at that point. Now it is happening in the `intersect_closest` kernel. The downside is that render buffer is to be passed to the kernel, but the benefit is that extra split bounce check is not needed now. Had to move the passes write to shadow catcher header, since include of `film/passes.h` causes all the fun of requirement to have BSDF data structures available. Differential Revision: https://developer.blender.org/D13177	2021-11-11 15:21:35 +01:00
Brecht Van Lommel	3fa86f4b28	Merge branch 'blender-v3.0-release'	2021-11-10 20:19:09 +01:00
Brecht Van Lommel	6b0008129e	Fix T92972: Cycles HIP wrong render display after a recent refactor It's unclear why this fails. Maybe the size of half4 is not the expected 8 bytes and adjacent pixels are overwritten. Or there is some bug in the HIP compiler writing a struct into global memory, which we probably don't do elsewhere in the kernel. Thanks to Thomas, William and Jeroen for helping investigate this.	2021-11-10 20:03:07 +01:00
Patrick Mours	f565620435	Fix T92985: CUDA errors with Cycles film convert kernels rB3a4c8f406a3a3bf0627477c6183a594fa707a6e2 changed the macros that create the film convert kernel entry points, but in the process accidentally changed the parameter definition to one of those (which caused CUDA launch and misaligned address errors) and changed the implementation as well. This restores the correct implementation from before. In addition, the `ccl_gpu_kernel_threads` macro did not work as intended and caused the generated launch bounds to end up with an incorrect input for the second parameter (it was set to "thread_num_registers", rather than the result of the block number calculation). I'm not entirely sure why, as the macro definition looked sound to me. Decided to simply go with two separate macros instead, to simplify and solve this. Also changed how state is captured with the `ccl_gpu_kernel_lambda` macro slightly, to avoid a compiler warning (expression has no effect) that otherwise occurred. Maniphest Tasks: T92985 Differential Revision: https://developer.blender.org/D13175	2021-11-10 15:49:50 +01:00
Michael Jones	3a4c8f406a	Cycles: Adapt shared kernel/device/gpu layer for MSL This patch adapts the shared kernel entrypoints so that they can be compiled as MSL (Metal Shading Language). Where possible, the adaptations avoid changes in common code. In MSL, kernel function inputs are explicitly bound to resources. In the case of argument buffers, we declare a struct containing the kernel arguments, accessible via device pointer. This differs from CUDA and HIP where kernel function arguments are declared as traditional C-style function parameters. This patch adapts the entrypoints declared in kernel.h so that they can be translated via a new `ccl_gpu_kernel_signature` macro into the required parameter struct + kernel entrypoint pairing for MSL. MSL buffer attribution must be applied to function parameters or non-static class data members. To allow universal access to the integrator state, kernel data, and texture fetch adapters, we wrap all of the shared kernel code in a `MetalKernelContext` class. This is achieved by bracketing the appropriate kernel headers with "context_begin.h" and "context_end.h" on Metal. When calling deeper into the kernel code, we must reference the context class (e.g. `context.integrator_init_from_camera`). This extra prefixing is performed by a set of defines in "context_end.h". These will require explicit maintenance if entrypoints change. We invite discussion on more maintainable ways to enforce correctness. Lambda expressions are not supported on MSL, so a new `ccl_gpu_kernel_lambda` macro generates an inline function object and optionally capturing any required state. This yields the same behaviour. This approach is applied to all parallel_... implementations which are templated by operation. The lambda expressions in the film_convert... kernels don't adapt cleanly to use function objects. However, these entrypoints can be macro-generated more concisely to avoid lambda expressions entirely, instead relying on constant folding to handle the pixel/channel conversions. A separate implementation of `gpu_parallel_active_index_array` is provided for Metal to workaround some subtle differences in SIMD width, and also to encapsulate some required thread parameters which must be declared as explicit entrypoint function parameters. Ref T92212 Reviewed By: brecht Maniphest Tasks: T92212 Differential Revision: https://developer.blender.org/D13109	2021-11-09 21:43:10 +00:00
Patrick Mours	440a3475b8	Cycles: Improve OptiX denoising with dark images and fix crash when denoiser is destroyed Adds a pass before denoising that calculates the intensity of the image, which can be passed into the OptiX denoiser for more optimal results for very dark or very bright images. In addition this also fixes a crash that sometimes occurred on exit. The OptiX denoiser object has to be destroyed before the OptiX device context object (since it references that). But in C++ the destructor function of a class is called before its fields are destructed, so "~OptiXDevice" was always called before "OptiXDevice::~Denoiser" and therefore "optixDeviceContextDestroy" was called before "optixDenoiserDestroy", hence the crash. Differential Revision: https://developer.blender.org/D13160	2021-11-09 14:49:00 +01:00
Brecht Van Lommel	97ff37bf54	Cycles: perform CPU film reading in the kernel, to use AVX2 half conversion Adds a bunch of CPU kernel function to process on row of pixels, and use those instead of calling unoptimized implementations. Fixes T92598	2021-11-05 22:04:36 +01:00
Brecht Van Lommel	fd25e883e2	Cycles: remove prefix from source code file names Remove prefix of filenames that is the same as the folder name. This used to help when #includes were using individual files, but now they are always relative to the cycles root directory and so the prefixes are redundant. For patches and branches, git merge and rebase should be able to detect the renames and move over code to the right file.	2021-10-26 15:37:04 +02:00
Brecht Van Lommel	d7d40745fa	Cycles: changes to source code folders structure * Split render/ into scene/ and session/. The scene/ folder now contains the scene and its nodes. The session/ folder contains the render session and associated data structures like drivers and render buffers. * Move top level kernel headers into new folders kernel/camera/, kernel/film/, kernel/light/, kernel/sample/, kernel/util/ * Move integrator related kernel headers into kernel/integrator/ * Move OSL shaders from kernel/shaders/ to kernel/osl/shaders/ For patches and branches, git merge and rebase should be able to detect the renames and move over code to the right file.	2021-10-26 15:36:39 +02:00

1 2

72 Commits