Commit Graph

80 Commits

Author SHA1 Message Date
Michael Jones
4412e14708 Cycles: Useful Metal backend debug & profiling functionality
This patch adds some useful debugging & profiling env vars to the Metal backend:

- `CYCLES_METAL_PROFILING`: output a per-kernel timing report at the end of the render
- `CYCLES_METAL_DEBUG`: enable per-dispatch tracing (very verbose)
- `CYCLES_DEBUG_METAL_CAPTURE_KERNEL`: enable programatic .gputrace capture for a specified kernel index

Here's an example of the timing report with `CYCLES_METAL_PROFILING` enabled:

```
---------------------------------------------------------------------------------------------------
Kernel name                                 Total threads   Dispatches     Avg. T/D    Time   Time%
---------------------------------------------------------------------------------------------------
integrator_init_from_camera                   657,407,232          161    4,083,274   0.24s   0.51%
integrator_intersect_closest                1,629,288,440          681    2,392,494  15.18s  32.12%
integrator_intersect_shadow                   751,652,291          470    1,599,260   5.80s  12.28%
integrator_shade_background                   304,612,074          263    1,158,220   1.16s   2.45%
integrator_shade_surface                    1,159,764,041          676    1,715,627  20.57s  43.52%
integrator_shade_shadow                       598,885,847          418    1,432,741   1.27s   2.69%
integrator_queued_paths_array               2,969,650,130          805    3,689,006   0.35s   0.74%
integrator_queued_shadow_paths_array          593,936,619          379    1,567,115   0.14s   0.29%
integrator_terminated_paths_array              22,205,417          155      143,260   0.05s   0.10%
integrator_sorted_paths_array               2,517,140,043          676    3,723,579   1.65s   3.50%
integrator_compact_paths_array                648,912,748          155    4,186,533   0.03s   0.07%
integrator_compact_states                      20,872,687          155      134,662   0.14s   0.29%
integrator_terminated_shadow_paths_array      374,100,675          438      854,111   0.16s   0.33%
integrator_compact_shadow_paths_array         503,768,657          438    1,150,156   0.05s   0.10%
integrator_compact_shadow_states               37,664,941          202      186,460   0.23s   0.50%
integrator_reset                               25,165,824            6    4,194,304   0.06s   0.12%
film_convert_combined_half_rgba                 3,110,400            6      518,400   0.00s   0.01%
prefix_sum                                            676          676            1   0.19s   0.40%
---------------------------------------------------------------------------------------------------
                                                                 6,760               47.27s 100.00%
---------------------------------------------------------------------------------------------------
```

Reviewed By: brecht

Differential Revision: https://developer.blender.org/D15044
2022-06-07 11:08:39 +01:00
Brecht Van Lommel
610619c203 Merge branch 'blender-v3.2-release' 2022-05-31 17:35:16 +02:00
Brecht Van Lommel
f2cd7e08fe Fix Cycles MNEE not working for Metal
Move MNEE to own kernel, separate from shader ray-tracing. This does introduce
the limitation that a shader can't use both MNEE and AO/bevel, but that seems
like the better trade-off for now.

We can experiment with bigger kernel organization changes later.

Differential Revision: https://developer.blender.org/D15070
2022-05-31 17:24:43 +02:00
Jacques Lucke
25d216724b Cleanup: make format 2022-05-24 15:53:16 +02:00
Patrick Mours
a8c81ffa83 Cycles: Add half precision float support for volumes with NanoVDB
This patch makes it possible to change the precision with which to
store volume data in the NanoVDB data structure (as float, half, or
using variable bit quantization) via the previously unused precision
field in the volume data block.
It makes it possible to further reduce memory usage during
rendering, at a slight cost to the visual detail of a volume.

Differential Revision: https://developer.blender.org/D10023
2022-05-23 19:08:01 +02:00
Sergey Sharybin
698e394e7e Merge branch 'blender-v3.2-release' 2022-05-23 16:02:52 +02:00
Sergey Sharybin
9bb4bf5748 Fix missing 64bit casts when calculating Cycles render buffer offset
Found those missing casts while looking into a crash report made in
the Blender Chat. Was unable to reproduce the crash, but the casts
should totally be there to avoid integer overflow.
2022-05-23 15:59:52 +02:00
Michael Jones
007184bcf2 Enable inlining on Apple Silicon. Use new process-wide ShaderCache in order to safely re-enable binary archives
This patch is the same as D14763, but with a fix for unit test failures caused by ShaderCache fetch logic not working in the non-MetalRT case:

```
diff --git a/intern/cycles/device/metal/kernel.mm b/intern/cycles/device/metal/kernel.mm
index ad268ae7057..6aa1a56056e 100644
--- a/intern/cycles/device/metal/kernel.mm
+++ b/intern/cycles/device/metal/kernel.mm
@@ -203,9 +203,12 @@ bool kernel_has_intersection(DeviceKernel device_kernel)

   /* metalrt options */
   request.pipeline->use_metalrt = device->use_metalrt;
-  request.pipeline->metalrt_hair = device->kernel_features & KERNEL_FEATURE_HAIR;
-  request.pipeline->metalrt_hair_thick = device->kernel_features & KERNEL_FEATURE_HAIR_THICK;
-  request.pipeline->metalrt_pointcloud = device->kernel_features & KERNEL_FEATURE_POINTCLOUD;
+  request.pipeline->metalrt_hair = device->use_metalrt &&
+                                   (device->kernel_features & KERNEL_FEATURE_HAIR);
+  request.pipeline->metalrt_hair_thick = device->use_metalrt &&
+                                         (device->kernel_features & KERNEL_FEATURE_HAIR_THICK);
+  request.pipeline->metalrt_pointcloud = device->use_metalrt &&
+                                         (device->kernel_features & KERNEL_FEATURE_POINTCLOUD);

   {
     thread_scoped_lock lock(cache_mutex);
@@ -225,9 +228,9 @@ bool kernel_has_intersection(DeviceKernel device_kernel)

   /* metalrt options */
   bool use_metalrt = device->use_metalrt;
-  bool metalrt_hair = device->kernel_features & KERNEL_FEATURE_HAIR;
-  bool metalrt_hair_thick = device->kernel_features & KERNEL_FEATURE_HAIR_THICK;
-  bool metalrt_pointcloud = device->kernel_features & KERNEL_FEATURE_POINTCLOUD;
+  bool metalrt_hair = use_metalrt && (device->kernel_features & KERNEL_FEATURE_HAIR);
+  bool metalrt_hair_thick = use_metalrt && (device->kernel_features & KERNEL_FEATURE_HAIR_THICK);
+  bool metalrt_pointcloud = use_metalrt && (device->kernel_features & KERNEL_FEATURE_POINTCLOUD);

   MetalKernelPipeline *best_pipeline = nullptr;
   for (auto &pipeline : collection) {

```

Reviewed By: brecht

Differential Revision: https://developer.blender.org/D14923
2022-05-11 16:20:59 +01:00
Brecht Van Lommel
52a5f68562 Revert "Cycles: Enable inlining on Apple Silicon for 1.1x speedup"
This reverts commit b82de02e7c. It is causing
crashes in various regression tests.

Ref D14763
2022-04-28 00:46:43 +02:00
Michael Jones
b82de02e7c Cycles: Enable inlining on Apple Silicon for 1.1x speedup
This is a stripped down version of D14645 without the scene specialisation optimisations.

The two major changes in this patch are:

- Enables more aggressive inlining on Apple Silicon resulting in a 1.1x speedup and 10% reduction in spill, at the cost of longer pipeline build times
- Revival of shader binary archives through a new ShaderCache which is shared between MetalDevice instances using the same physical MTLDevice. This mitigates the extra compile times via explicit caching (rather than, as before, relying on the implicit system shader cache which can be purged without notice)

Reviewed By: brecht

Differential Revision: https://developer.blender.org/D14763
2022-04-26 22:17:16 +01:00
Brecht Van Lommel
8d2da45f98 Revert "Fix Cycles HIP assuming warp size 32"
This reverts commit 390b9f1305. It seems to
break things on Linux for unknown reasons, so leave it out for now. A solution
to this will be required for Vega cards though.
2022-04-20 18:09:23 +02:00
Stefan Werner
65dcb5ebd3 Cycles: Semantically separate 2D and 3D texture objects
Currently there are no functional changes.

Preparing for an upcoming oneAPI integration where such separation
in types is needed.
2022-04-01 19:44:31 +02:00
Stefan Werner
9c6dff70c8 Cycles: Introduce postfix for kernel body definition
Increases flexibility of code-generation for kernel entry points.

Currently no functional changes, preparing for integration with oneAPI.
2022-04-01 19:44:02 +02:00
Brecht Van Lommel
51380b9346 Fix Cycles Metal build error and GCC warning after recent changes
Function overloading of make_float4() doesn't work since it's a macro, just
don't do this minor cleanup then.
2022-03-23 23:25:31 +01:00
Kévin Dietrich
d84b4becd3 Fix compile error on GCC
Explicit template specialization has to happen outside of class
definition (some compilers are more lenient). Since it is not possible to
specialize the method without also specializing the enclosing class for
all of its possible types, the method is moved outside of the class, and
specialized there.
2022-03-23 22:01:32 +01:00
Ethan-Hall
4e56e738a8 Cycles: optimize CPU texture sampler interpolation
Use templates to optimize the CPU texture sampler to interpolate using
float for single component datatypes instead of using float4 for all types.

Differential Revision: https://developer.blender.org/D14424
2022-03-23 20:06:12 +01:00
Ethan-Hall
4abb8a14a2 Cycles: make 3D texture sampling at boundaries more similar to GPU
CPU code for cubic interpolation with clip texture extension only performed
texture interpolation inside the range of [0,1]. As a result, even though the
volume's color is sampled using cubic interpolation, the boundary is not
being interpolated. The GPU appears was interpolating samples that span the
clip boundary softening the edge, which the CPU now does also.

This commit also includes refactoring of 2D and 3D texture sampling in
preparation of adding new extension modes.

Differential Revision: https://developer.blender.org/D14295
2022-03-21 16:38:13 +01:00
Brecht Van Lommel
390b9f1305 Fix Cycles HIP assuming warp size 32
In HIP these masks are 64 bit, while in CUDA only 32 bit.
2022-03-16 18:05:48 +01:00
Brecht Van Lommel
076079454f Cleanup: remove some unused Cycles GPU code
To make porting to other architectures easier, clarifying that this does not
need to be supported. The unused parallel_reduce implementation assumed warp
size 32, but is easy to update if we ever need it in the future.
2022-03-16 18:05:08 +01:00
Ethan-Hall
3902bebf18 Cycles: make smart interpolation fallback to cubic for GPU
Matching CPU and Eevee behavior.

Differential Revision: https://developer.blender.org/D14296
2022-03-11 18:27:58 +01:00
Brecht Van Lommel
a9a05d5597 Merge branch 'blender-v3.1-release' 2022-02-15 01:05:47 +01:00
Brecht Van Lommel
facd9d8268 Cleanup: clang-format 2022-02-15 01:05:25 +01:00
Brecht Van Lommel
35c261dfcf Merge branch 'blender-v3.1-release' 2022-02-11 23:58:41 +01:00
Michael Jones
27d3140b13 Cycles: Fix Metal kernel compilation for AMD GPUs
Workaround for a compilation issue preventing kernels compiling for AMD GPUs: Avoid problematic use of templates on Metal by making `gpu_parallel_active_index_array` a wrapper macro, and moving `blocksize` to be a macro parameter.

Reviewed By: brecht

Differential Revision: https://developer.blender.org/D14081
2022-02-11 22:52:48 +00:00
Brecht Van Lommel
9cfc7967dd Cycles: use SPDX license headers
* Replace license text in headers with SPDX identifiers.
* Remove specific license info from outdated readme.txt, instead leave details
  to the source files.
* Add list of SPDX license identifiers used, and corresponding license texts.
* Update copyright dates while we're at it.

Ref D14069, T95597
2022-02-11 17:47:34 +01:00
William Leeson
ae44070341 Cycles: explicitly skip self-intersection
Remember the last intersected primitive and skip any intersections with the
same primitive.

Ref D12954
2022-01-26 17:51:05 +01:00
Brecht Van Lommel
d68ce0e475 Cycles: add pointcloud implementation for Metal RT
This is not currently working, with an internal compiler error. However
we are currently using BVH2 instead of Metal RT. So this has no effect for
users, it's being committed to avoid the code getting outdated.

Ref T92573, T92212

Differential Revision: https://developer.blender.org/D13632
2022-01-21 14:42:27 +01:00
Brecht Van Lommel
0cf2fafd81 Fix T94050, T94570, T94527: Cycles Bevel and AO nodes not working with Metal
Workaround what may be a compiler bug, solution found by Michael Jones.
2022-01-13 10:40:41 +01:00
Michael Jones
efe3d60a2c Cycles: Fix Metal build
This patch fixes a couple of new Metal kernel compilation errors: 1) a kernel parameter count overflow, and 2) missing address space qualifiers.

Reviewed By: brecht

Differential Revision: https://developer.blender.org/D13763
2022-01-07 16:19:31 +00:00
Patrick Mours
8393ccd076 Cycles: Add OptiX temporal denoising support
Enables the `bpy.ops.cycles.denoise_animation()` operator again and modifies it to support
temporal denoising with OptiX. This requires renders that were done with both the "Vector"
and "Denoising Data" passes.

Differential Revision: https://developer.blender.org/D11442
2022-01-05 15:58:36 +01:00
Brecht Van Lommel
e2e7f7ea52 Fix Cycles OptiX crash with 3D curves after point cloud changes
Includes refactoring to reduce the number of bits taken by primitive types,
so they more easily fit in the OptiX limit.
2021-12-20 14:14:43 +01:00
Brecht Van Lommel
35b1e9fc3a Cycles: pointcloud rendering
This add support for rendering of the point cloud object in Blender, as a native
geometry type in Cycles that is more memory and time efficient than instancing
sphere meshes. This can be useful for rendering sand, water splashes, particles,
motion graphics, etc.

Points are currently always rendered as spheres, with backface culling. More
shapes are likely to be added later, but this is the most important one and can
be customized with shaders.

For CPU rendering the Embree primitive is used, for GPU there is our own
intersection code. Motion blur is suppored. Volumes inside points are not
currently supported.

Implemented with help from:
* Kévin Dietrich: Alembic procedural integration
* Patrick Mourse: OptiX integration
* Josh Whelchel: update for cycles-x changes

Ref T92573

Differential Revision: https://developer.blender.org/D9887
2021-12-16 20:54:04 +01:00
Michael Jones
9558fa5196 Cycles: Metal host-side code
This patch adds the Metal host-side code:

- Add all core host-side Metal backend files (device_impl, queue, etc)
- Add MetalRT BVH setup files
- Integrate with Cycles device enumeration code
- Revive `path_source_replace_includes` in util/path (required for MSL compilation)

This patch also includes a couple of small kernel-side fixes:

- Add an implementation of `lgammaf` for Metal [Nemes, Gergő (2010), "New asymptotic expansion for the Gamma function", Archiv der Mathematik](https://users.renyi.hu/~gergonemes/)
- include "work_stealing.h" inside the Metal context class because it accesses state now

Ref T92212

Reviewed By: brecht

Maniphest Tasks: T92212

Differential Revision: https://developer.blender.org/D13423
2021-12-07 15:52:21 +00:00
Campbell Barton
ac447ba1a3 Cleanup: clang-format, trailing space 2021-11-30 10:15:17 +11:00
Brecht Van Lommel
4fac3be146 Fix Cycles OptiX doing a bit too much work for almost opaque curve shadows
Found in D13353, likely has no significant impact in performance.
2021-11-29 18:41:37 +01:00
Michael Jones
f613c4c095 Cycles: MetalRT support (kernel side)
This patch adds MetalRT support to Cycles kernel code. It is mostly additive in nature or confined to Metal-specific code, however there are a few areas where this interacts with other code:

- MetalRT closely follows the Optix implementation, and in some cases (notably handling of transforms) it makes sense to extend Optix special-casing to MetalRT. For these generalisations we now have `__KERNEL_GPU_RAYTRACING__` instead of `__KERNEL_OPTIX__`.
- MetalRT doesn't support primitive offsetting (as with `primitiveIndexOffset` in Optix), so we define and populate a new kernel texture, `__object_prim_offset`, containing per-object primitive / curve-segment offsets. This is referenced and applied in MetalRT intersection handlers.
- Two new BVH layout enum values have been added: `BVH_LAYOUT_METAL` and `BVH_LAYOUT_MULTI_METAL_EMBREE` for XPU mode). Some host-side enum case handling has been updated where it is trivial to do so.

Ref T92212

Reviewed By: brecht

Maniphest Tasks: T92212

Differential Revision: https://developer.blender.org/D13353
2021-11-29 15:20:26 +00:00
Michael Jones
eb7827e797 Cycles: Fix film convert address space mismatch on Metal
This patch fixes an address space mismatch in the film convert kernels on Metal. The `film_get_pass_pixel_...` functions take a `ccl_private` result pointer, but the film convert kernels pass a `ccl_global` memory pointer. Specialising the pass-fetch functions with templates results in compilation errors on Visual Studio, so instead this patch just adds an intermediate local on Metal.

Reviewed By: brecht

Differential Revision: https://developer.blender.org/D13350
2021-11-26 13:58:48 +00:00
William Leeson
c49d2cbe92 Merge branch 'blender-v3.0-release' to bring in D13042:
Fix performance decrease with Scrambling Distance on
2021-11-25 09:41:03 +01:00
Alaska
b41c72b710 Fix performance decrease with Scrambling Distance on
With the current code in master, scrambling distance is enabled on non-hardware accelerated ray tracing devices see a measurable performance decrease when compared scrambling distance on vs off. From testing, this performance decrease comes from the large tile sizes scheduled in `tile.cpp`.

This patch attempts to address the performance decrease by using different algorithms to calculate the tile size for devices with hardware accelerated ray traversal and devices without. Large tile sizes for hardware accelerated devices and small tile sizes for others.

Most of this code is based on proposals from @brecht and @leesonw

Reviewed By: brecht, leesonw

Differential Revision: https://developer.blender.org/D13042
2021-11-25 09:32:26 +01:00
Patrick Mours
7a97e925fd Cycles: Add support for building with OptiX 7.4 SDK and use built-in catmull-rom curve type
Some enum names were changed/removed in OptiX 7.4, so some changes are necessary to
make things compile still.
In addition, OptiX 7.4 also adds built-in support for catmull-rom curves, so it is no longer
necessary to convert the catmull-rom data to cubic bsplines first, and has endcaps disabled
by default now, so can remove the special handling via any-hit programs that filtered them
out before.

Differential Revision: https://developer.blender.org/D13351
2021-11-24 16:33:04 +01:00
Brecht Van Lommel
1b94c53aa6 Cleanup: fix typos in comments and docs
Contributed by luzpaz.

Differential Revision: https://developer.blender.org/D10447
2021-11-19 13:02:16 +01:00
Michael Jones
d1f944c186 Cycles: declare constants at program scope on Metal
MSL requires that constant address space literals be declared at program
scope. This patch moves the `blackbody_table_r/g/b` and `cie_colour_match`
constants into separate files so they can be declared at the appropriate scope.

Ref T92212

Differential Revision: https://developer.blender.org/D13241
2021-11-18 14:38:05 +01:00
Michael Jones
d19e35873f Cycles: several small fixes and additions for MSL
This patch contains many small leftover fixes and additions that are
required for Metal-enablement:

- Address space fixes and a few other small compile fixes
- Addition of missing functionality to the Metal adapter headers
- Addition of various scattered `__KERNEL_METAL__` blocks (e.g. for
  atomic support & maths functions)

Ref T92212

Differential Revision: https://developer.blender.org/D13263
2021-11-18 14:38:02 +01:00
Brecht Van Lommel
063ad8635e Cycles: reduce triangle memory usage with packed_float3
Depends on D13243

Differential Revision: https://developer.blender.org/D13244
2021-11-17 17:29:41 +01:00
Brecht Van Lommel
9937d5379c Cycles: add packed_float3 type for storage
Introduce a packed_float3 type for smaller storage that is exactly 3
floats, instead of 4. For computation float3 is still used since it can
use SIMD instructions.

Ref T92212

Differential Revision: https://developer.blender.org/D13243
2021-11-17 17:29:41 +01:00
Michael Jones
64003fa4b0 Cycles: Adapt volumetric lambda functions to work on MSL
This patch adapts the existing volumetric read/write lambda functions for Metal. Lambda expressions are not supported on MSL, so two new macros `VOLUME_READ_LAMBDA` and `VOLUME_WRITE_LAMBDA` have been defined with a default implementation which, on Metal, is overridden to use inline function objects.

This patch also removes the last remaining mention of the now-unused `ccl_addr_space`.

Ref T92212

Reviewed By: leesonw

Maniphest Tasks: T92212

Differential Revision: https://developer.blender.org/D13234
2021-11-16 13:42:23 +00:00
Campbell Barton
1143bf281a Cleanup: spelling in comments, comment block formatting 2021-11-13 13:07:13 +11:00
Campbell Barton
acc800d24d Cleanup: clang-format 2021-11-13 12:47:18 +11:00
Sergey Sharybin
ce395c84a3 Merge branch 'blender-v3.0-release' 2021-11-11 15:29:35 +01:00
Sergey Sharybin
d26d3cfe19 Fix T92868: Cycles catcher with transparency crashes
The issue was caused by splitting happening twice.

Fixed by checking for split flag which is assigned to the both states
during split.

The tricky part was to write catcher data at the moment of split: the
transparency and shadow catcher sample count is to be accumulated at
that point. Now it is happening in the `intersect_closest` kernel.
The downside is that render buffer is to be passed to the kernel, but
the benefit is that extra split bounce check is not needed now.

Had to move the passes write to shadow catcher header, since include
of `film/passes.h` causes all the fun of requirement to have BSDF
data structures available.

Differential Revision: https://developer.blender.org/D13177
2021-11-11 15:21:35 +01:00