Commit Graph

1019 Commits

Author SHA1 Message Date
Michael Jones
4412e14708 Cycles: Useful Metal backend debug & profiling functionality
This patch adds some useful debugging & profiling env vars to the Metal backend:

- `CYCLES_METAL_PROFILING`: output a per-kernel timing report at the end of the render
- `CYCLES_METAL_DEBUG`: enable per-dispatch tracing (very verbose)
- `CYCLES_DEBUG_METAL_CAPTURE_KERNEL`: enable programatic .gputrace capture for a specified kernel index

Here's an example of the timing report with `CYCLES_METAL_PROFILING` enabled:

```
---------------------------------------------------------------------------------------------------
Kernel name                                 Total threads   Dispatches     Avg. T/D    Time   Time%
---------------------------------------------------------------------------------------------------
integrator_init_from_camera                   657,407,232          161    4,083,274   0.24s   0.51%
integrator_intersect_closest                1,629,288,440          681    2,392,494  15.18s  32.12%
integrator_intersect_shadow                   751,652,291          470    1,599,260   5.80s  12.28%
integrator_shade_background                   304,612,074          263    1,158,220   1.16s   2.45%
integrator_shade_surface                    1,159,764,041          676    1,715,627  20.57s  43.52%
integrator_shade_shadow                       598,885,847          418    1,432,741   1.27s   2.69%
integrator_queued_paths_array               2,969,650,130          805    3,689,006   0.35s   0.74%
integrator_queued_shadow_paths_array          593,936,619          379    1,567,115   0.14s   0.29%
integrator_terminated_paths_array              22,205,417          155      143,260   0.05s   0.10%
integrator_sorted_paths_array               2,517,140,043          676    3,723,579   1.65s   3.50%
integrator_compact_paths_array                648,912,748          155    4,186,533   0.03s   0.07%
integrator_compact_states                      20,872,687          155      134,662   0.14s   0.29%
integrator_terminated_shadow_paths_array      374,100,675          438      854,111   0.16s   0.33%
integrator_compact_shadow_paths_array         503,768,657          438    1,150,156   0.05s   0.10%
integrator_compact_shadow_states               37,664,941          202      186,460   0.23s   0.50%
integrator_reset                               25,165,824            6    4,194,304   0.06s   0.12%
film_convert_combined_half_rgba                 3,110,400            6      518,400   0.00s   0.01%
prefix_sum                                            676          676            1   0.19s   0.40%
---------------------------------------------------------------------------------------------------
                                                                 6,760               47.27s 100.00%
---------------------------------------------------------------------------------------------------
```

Reviewed By: brecht

Differential Revision: https://developer.blender.org/D15044
2022-06-07 11:08:39 +01:00
Patrick Mours
5c6053ccb1 Fix misaligned address error when rendering 3D curves in the viewport with Cycles and OptiX 7.4
Acceleration structures in the viewport default to building with the fast
build flag, but the intersection program used for curves was queried with
the fast trace flag. The resulting mismatch caused an exception in the
intersection kernel. Since it's difficult to predict whether dynamic or static
acceleration structures are going to be built at the time of kernel loading,
this fixes the mismatch by always using the fast trace flag for curves.
2022-06-03 12:24:13 +02:00
Campbell Barton
61a7e5be18 Cleanup: '*' prefix C-comment blocks 2022-06-01 15:38:48 +10:00
Brecht Van Lommel
610619c203 Merge branch 'blender-v3.2-release' 2022-05-31 17:35:16 +02:00
Brecht Van Lommel
f2cd7e08fe Fix Cycles MNEE not working for Metal
Move MNEE to own kernel, separate from shader ray-tracing. This does introduce
the limitation that a shader can't use both MNEE and AO/bevel, but that seems
like the better trade-off for now.

We can experiment with bigger kernel organization changes later.

Differential Revision: https://developer.blender.org/D15070
2022-05-31 17:24:43 +02:00
Patrick Mours
a8c81ffa83 Cycles: Add half precision float support for volumes with NanoVDB
This patch makes it possible to change the precision with which to
store volume data in the NanoVDB data structure (as float, half, or
using variable bit quantization) via the previously unused precision
field in the volume data block.
It makes it possible to further reduce memory usage during
rendering, at a slight cost to the visual detail of a volume.

Differential Revision: https://developer.blender.org/D10023
2022-05-23 19:08:01 +02:00
Campbell Barton
427a2c920a Cleanup: spelling in comments, capitalize tags
Also add missing task-ID reference & remove colon after \note as it
doesn't render properly in doxygen.
2022-05-13 09:29:25 +10:00
Michael Jones
007184bcf2 Enable inlining on Apple Silicon. Use new process-wide ShaderCache in order to safely re-enable binary archives
This patch is the same as D14763, but with a fix for unit test failures caused by ShaderCache fetch logic not working in the non-MetalRT case:

```
diff --git a/intern/cycles/device/metal/kernel.mm b/intern/cycles/device/metal/kernel.mm
index ad268ae7057..6aa1a56056e 100644
--- a/intern/cycles/device/metal/kernel.mm
+++ b/intern/cycles/device/metal/kernel.mm
@@ -203,9 +203,12 @@ bool kernel_has_intersection(DeviceKernel device_kernel)

   /* metalrt options */
   request.pipeline->use_metalrt = device->use_metalrt;
-  request.pipeline->metalrt_hair = device->kernel_features & KERNEL_FEATURE_HAIR;
-  request.pipeline->metalrt_hair_thick = device->kernel_features & KERNEL_FEATURE_HAIR_THICK;
-  request.pipeline->metalrt_pointcloud = device->kernel_features & KERNEL_FEATURE_POINTCLOUD;
+  request.pipeline->metalrt_hair = device->use_metalrt &&
+                                   (device->kernel_features & KERNEL_FEATURE_HAIR);
+  request.pipeline->metalrt_hair_thick = device->use_metalrt &&
+                                         (device->kernel_features & KERNEL_FEATURE_HAIR_THICK);
+  request.pipeline->metalrt_pointcloud = device->use_metalrt &&
+                                         (device->kernel_features & KERNEL_FEATURE_POINTCLOUD);

   {
     thread_scoped_lock lock(cache_mutex);
@@ -225,9 +228,9 @@ bool kernel_has_intersection(DeviceKernel device_kernel)

   /* metalrt options */
   bool use_metalrt = device->use_metalrt;
-  bool metalrt_hair = device->kernel_features & KERNEL_FEATURE_HAIR;
-  bool metalrt_hair_thick = device->kernel_features & KERNEL_FEATURE_HAIR_THICK;
-  bool metalrt_pointcloud = device->kernel_features & KERNEL_FEATURE_POINTCLOUD;
+  bool metalrt_hair = use_metalrt && (device->kernel_features & KERNEL_FEATURE_HAIR);
+  bool metalrt_hair_thick = use_metalrt && (device->kernel_features & KERNEL_FEATURE_HAIR_THICK);
+  bool metalrt_pointcloud = use_metalrt && (device->kernel_features & KERNEL_FEATURE_POINTCLOUD);

   MetalKernelPipeline *best_pipeline = nullptr;
   for (auto &pipeline : collection) {

```

Reviewed By: brecht

Differential Revision: https://developer.blender.org/D14923
2022-05-11 16:20:59 +01:00
Campbell Barton
12a1fa9cf4 Cleanup: format 2022-05-06 18:27:44 +10:00
Patrick Mours
6fa5d520b8 Cycles: Add support for parallel compilation of OptiX module
OptiX 7.4 adds support for splitting the costly creation of an OptiX
module into smaller tasks that can be executed in parallel on a
thread pool.
This is only really relevant for the "shader_raytrace" kernel variant
as the main one is small and compiles fast either way. It sheds of
a few seconds there (total gain is not massive currently, since it is
difficult for the compiler to split up the huge shading entry point
that is the primary one taking up time, but it is still measurable).

Differential Revision: https://developer.blender.org/D14845
2022-05-05 14:35:41 +02:00
Brecht Van Lommel
52a5f68562 Revert "Cycles: Enable inlining on Apple Silicon for 1.1x speedup"
This reverts commit b82de02e7c. It is causing
crashes in various regression tests.

Ref D14763
2022-04-28 00:46:43 +02:00
Michael Jones
b82de02e7c Cycles: Enable inlining on Apple Silicon for 1.1x speedup
This is a stripped down version of D14645 without the scene specialisation optimisations.

The two major changes in this patch are:

- Enables more aggressive inlining on Apple Silicon resulting in a 1.1x speedup and 10% reduction in spill, at the cost of longer pipeline build times
- Revival of shader binary archives through a new ShaderCache which is shared between MetalDevice instances using the same physical MTLDevice. This mitigates the extra compile times via explicit caching (rather than, as before, relying on the implicit system shader cache which can be purged without notice)

Reviewed By: brecht

Differential Revision: https://developer.blender.org/D14763
2022-04-26 22:17:16 +01:00
Sergey Sharybin
eccc9d8eba Cleanup: Remove unused function in Cycles queue
Noticed while looking into oneAPI patch.

Seems to be unused, without clear indication why/when it might be
needed. Removing the function simplifies adding the new backend.

Differential Revision: https://developer.blender.org/D14652
2022-04-19 10:32:07 +02:00
Brecht Van Lommel
2cb76a6c8d Cleanup: consistently use parallel_for without tbb namespace in Cycles 2022-04-18 19:14:36 +02:00
Brecht Van Lommel
2d472b70e5 Revert "Cycles: enable HIP for Vega and Vega II (Radeon 7) GPUs on Windows"
This is not currently working, reverting until the driver/compiler has a fix.

This reverts commit c46e58817c.
2022-04-12 19:18:58 +02:00
Brecht Van Lommel
0de0950ad5 Cycles: various Linux build fixes related to Hydra render delegate
* Add missing GLEW and hgiGL libraries for Hydra
* Fix wrong case sensitive include
* Fix link errors by adding external libs to static Hydra lib
* Work around weird Hydra link error with MAX_SAMPLES
* Use Embree by default for Hydra
* Sync external libs code with standalone
* Update version number to match Blender
* Remove unneeded CLEW/GLEW from test executable

None of this should affect Cycles in Blender.

Ref T96731
2022-04-07 19:52:53 +02:00
Brian Savery
c46e58817c Cycles: enable HIP for Vega and Vega II (Radeon 7) GPUs on Windows
Basic testing on windows only so far. Will need some testing on Linux as well
when the Linux enablement patch is ready.

Does not enable Vega APUs yet (which would be gfx902 or gfx90c).

Differential Revision: https://developer.blender.org/D14432
2022-03-24 01:12:45 +01:00
Patrick Mours
d350976ba0 Cycles: Add Hydra render delegate
This patch adds a Hydra render delegate to Cycles, allowing Cycles to be used for rendering
in applications that provide a Hydra viewport. The implementation was written from scratch
against Cycles X, for integration into the Blender repository to make it possible to continue
developing it in step with the rest of Cycles. For this purpose it follows the style of the rest of
the Cycles code and can be built with a CMake option
(`WITH_CYCLES_HYDRA_RENDER_DELEGATE=1`) similar to the existing standalone version
of Cycles.

Since Hydra render delegates need to be built against the exact USD version and other
dependencies as the target application is using, this is intended to be built separate from
Blender (`WITH_BLENDER=0` CMake option) and with support for library versions different
from what Blender is using. As such the CMake build scripts for Windows had to be modified
slightly, so that the Cycles Hydra render delegate can e.g. be built with MSVC 2017 again
even though Blender requires MSVC 2019 now, and it's possible to specify custom paths to
the USD SDK etc. The codebase supports building against the latest USD release 22.03 and all
the way back to USD 20.08 (with some limitations).

Reviewed By: brecht, LazyDodo

Differential Revision: https://developer.blender.org/D14398
2022-03-23 16:39:05 +01:00
Sergey Sharybin
8cdee3a6d4 Fix T93710: Artifacts denoising hi-res images using OPtiX
Caused by an integer overflow in the tiling utilities of OptiX SDK.

Seems for now it's easier to copy and modify code to our sources so
that we don't need to bump SDK version requirement (which might lead
to an increased driver requirement as well).

There are still some fixes needed from a newer driver to have such
denoising to work properly: Windows requires 511.79, Linux 510.54.

Thanks Patrick for investigation!

Differential Revision: https://developer.blender.org/D14300
2022-03-10 16:17:59 +01:00
Brecht Van Lommel
f130d4f211 Cleanup: fix various typos
Contributed by luzpaz.

Differential Revision: https://developer.blender.org/D14203
2022-03-07 17:28:39 +01:00
Germano Cavalcante
e1ec2d0251 Merge branch 'blender-v3.1-release' 2022-03-01 16:31:18 -03:00
Michael Jones
952a613d38 Cycles: Hide MetalRT checkbox for AMD GPUs
This patch hides the MetalRT checkbox for AMD GPUs, pending fixes for MetalRT argument encoding on AMD.

Reviewed By: brecht

Differential Revision: https://developer.blender.org/D14175
2022-03-01 16:05:47 +00:00
Jacques Lucke
472ddc6e27 Merge branch 'blender-v3.1-release' 2022-02-22 15:13:27 +01:00
Brecht Van Lommel
66f3545a0b Cleanup: compiler warning 2022-02-22 14:03:11 +01:00
Brecht Van Lommel
259f4e50ef Merge branch 'blender-v3.1-release' 2022-02-16 15:35:18 +01:00
Brecht Van Lommel
f059bdc823 Cycles: restore basic standalone GUI, now using SDL
GLUT does not support offscreen contexts, which is required for the new
display driver. So we use SDL instead. Note that this requires using a
system SDL package, the Blender precompiled SDL does not include the video
subsystem.

There is currently no text display support, instead info is printed to
the terminal. This would require adding an embedded font and GLSL shaders,
or using GUI library.

Another improvement to be made is supporting OpenColorIO display transforms,
right now we assume Rec.709 scene linear and display.

All OpenGL, GLEW and SDL code was move out of core cycles and into
app/opengl. This serves as a template for apps that want to integrate
Cycles interactive rendering, with a simple OpenGLDisplayDriver example.
In general this would be adapted to the graphics API and color management
used by the app.

Ref T91846
2022-02-16 15:30:43 +01:00
Brecht Van Lommel
ad53cb0b9d Merge branch 'blender-v3.1-release' 2022-02-11 19:44:27 +01:00
Michael Jones
40fce61a6a Cycles: enable Metal on AMD GPUs, set macOS minimum versions
* Apple Silicon support enabled on macOS 12.2+
* AMD support enabled on macOS 12.3+

This patch also fixes a device enumeration crash on certain AMD configs which
was caused by over-release of MTLDevice objects.

Differential Revision: https://developer.blender.org/D14090
2022-02-11 19:22:16 +01:00
Brecht Van Lommel
9cfc7967dd Cycles: use SPDX license headers
* Replace license text in headers with SPDX identifiers.
* Remove specific license info from outdated readme.txt, instead leave details
  to the source files.
* Add list of SPDX license identifiers used, and corresponding license texts.
* Update copyright dates while we're at it.

Ref D14069, T95597
2022-02-11 17:47:34 +01:00
Campbell Barton
c434782e3a File headers: SPDX License migration
Use a shorter/simpler license convention, stops the header taking so
much space.

Follow the SPDX license specification: https://spdx.org/licenses

- C/C++/objc/objc++
- Python
- Shell Scripts
- CMake, GNUmakefile

While most of the source tree has been included

- `./extern/` was left out.
- `./intern/cycles` & `./intern/atomic` are also excluded because they
  use different header conventions.

doc/license/SPDX-license-identifiers.txt has been added to list SPDX all
used identifiers.

See P2788 for the script that automated these edits.

Reviewed By: brecht, mont29, sergey

Ref D14069
2022-02-11 09:14:36 +11:00
Campbell Barton
1a705fa139 Cleanup: clang-format 2022-02-11 09:14:35 +11:00
Hans Goudey
29674d5e78 Merge branch 'blender-v3.1-release' 2022-02-10 11:34:20 -06:00
Michael Jones
a44366a642 Cycles: Expose "Use MetalRT" checkbox
For curve-heavy scenes, memory consumption regressed when we switched from MetalRT to bvh2. Allow users to opt in to MetalRT to workaround this.

Reviewed By: brecht

Differential Revision: https://developer.blender.org/D14071
2022-02-10 17:32:46 +00:00
Michael Jones
35dedc11d5 Fix T95477: Report error instead of crashing when Metal texture size limits exceeded.
Reviewed By: brecht

Differential Revision: https://developer.blender.org/D14074
2022-02-10 17:06:29 +00:00
Michael Jones
3d12dd59ce Cycles: Workaround for failing "bake" unit tests in Metal
Allocate "RenderBuffers" with MTLResourceStorageModeShared.

Reviewed By: brecht

Differential Revision: https://developer.blender.org/D14073
2022-02-10 17:05:13 +00:00
Michael Jones
410e4e7ce1 Workaround for T94142: Cycles Metal crash with simultaneous viewport and final render
Disable binary archives on Apple Silicon (issue stems from instancing multiple PSOs from the same binary archive). Pipeline creation still filters through the OS shader cache, mitigating any impact on setup times after the initial render.

Reviewed By: brecht

Differential Revision: https://developer.blender.org/D14072
2022-02-10 17:04:08 +00:00
Sergey Sharybin
d16e5babaf Merge branch 'blender-v3.1-release' 2022-02-10 14:12:35 +01:00
Sergey Sharybin
04d55038ee Fix size_t -> int -> size_t round trip in Cycles
There are two things achieved by this change:

- No possible downcast of size_t to int when calculating motion steps.
- Disambiguate call to `min()` which was for some reason considered
  ambiguous on 32bit platforms `min(int, unsigned int)`.
- Do the same for the `max()` call to keep them symmetrical.

On an implementation side the `min()` is defined for a fixed width
integer type to disambiguate uint from size_t on 32bit platforms,
and yet be able to use it for 32bit operands on 64bit platforms without
upcast.

This ended up in a bit bigger change as the conditional compile-in of
functions is easiest if the functions is templated. Making the functions
templated required to remove the other source of ambiguity which is
`algorithm.h` which was pulling min/max from std.

Now it is the `math.h` which is the source of truth for min/max.
It was only one place which was relying on `algorithm.h` for these
functions, hence the choice of `math.h` as the safest and least
intrusive.

Fixes 32bit platforms (such as i386) in Debian package build system.

Differential Revision: https://developer.blender.org/D14062
2022-02-10 12:39:41 +01:00
Campbell Barton
012e41fc8b Cleanup: use our own conventions for tags in comments 2022-01-31 10:49:59 +11:00
William Leeson
ae44070341 Cycles: explicitly skip self-intersection
Remember the last intersected primitive and skip any intersections with the
same primitive.

Ref D12954
2022-01-26 17:51:05 +01:00
Brecht Van Lommel
d68ce0e475 Cycles: add pointcloud implementation for Metal RT
This is not currently working, with an internal compiler error. However
we are currently using BVH2 instead of Metal RT. So this has no effect for
users, it's being committed to avoid the code getting outdated.

Ref T92573, T92212

Differential Revision: https://developer.blender.org/D13632
2022-01-21 14:42:27 +01:00
Michael Jones
f6c8a78ac6 Cycles: Fix bvh2 gen on Apple Silicon and use it to speed up renders
This patch fixes a correctness issue discovered in the `int4 select(...)` function on Apple Silicon machines, which causes bad bvh2 builds. Although the generated bvh2s give correct renders, the resulting runtime performance is terrible. This fix allows us to switch over to bvh2 on Apple Silicon giving a significant performance uplift for many of the standard benchmarking assets. It also fixes some unit test failures stemming from the use of MetalRT, and trivially enables the new pointcloud primitive.

Ref T92212

Reviewed By: brecht

Maniphest Tasks: T92212

Differential Revision: https://developer.blender.org/D13877
2022-01-20 15:37:49 +00:00
Campbell Barton
1d536c21dd Cleanup: clang-format 2022-01-20 11:59:20 +11:00
Michael Jones
17cab47ed1 Cycles: Fix T94736: Crash when modifying strength of world environment texture
This patch fixes crash T94736 on Metal in which the launch_params were not being updated to reflect destruction of MetalMem objects.

Reviewed By: brecht

Differential Revision: https://developer.blender.org/D13875
2022-01-19 18:17:37 +00:00
Brecht Van Lommel
a3deef6fff Fix Cycles CPU + GPU render not using CPU after recent changes
In some places the task scheduler was not initialized in time.
2022-01-13 10:40:41 +01:00
Brecht Van Lommel
ae28d90578 Fix T93350: Cycles renders shows black during rendering huge resolutions
The root of the issue is caused by Cycles ignoring OpenGL limitation on
the maximum resolution of textures: Cycles was allocating texture of the
final render resolution. It was exceeding limitation on certain GPUs and
driver.

The idea is simple: use multiple textures for the display, each of which
will fit into OpenGL limitations.

There is some code which allows the display driver to know when to start
the new tile. Also added some code to allow force graphics interop to be
re-created. The latter one ended up not used in the final version of the
patch, but it might be helpful for other drivers implementation.

The tile size is limited to 8K now as it is the safest size for textures
on many GPUs and OpenGL drivers.

This is an updated fix with a workaround for freezing with the NVIDIA
driver on Linux.

Differential Revision: https://developer.blender.org/D13385
2022-01-07 17:20:04 +01:00
Jagannadhan Ravi
361702f239 Fix T94310: Blender doesn't support with 128 threads well in Win11
Query TBB for the maximum allowed concurrency, which is free from a bug
in own concurrency detection code. One thing to keep in mind is that now
Cycles is limited by the number of threads in the TBB areana from which
Session is created. This isn't a problem for Blender since we do not limit
arena on Blender side. Could be something to watch out for in other Cycles
integrations.

Differential Revision: https://developer.blender.org/D13658
2022-01-07 11:37:30 +01:00
Patrick Mours
8393ccd076 Cycles: Add OptiX temporal denoising support
Enables the `bpy.ops.cycles.denoise_animation()` operator again and modifies it to support
temporal denoising with OptiX. This requires renders that were done with both the "Vector"
and "Denoising Data" passes.

Differential Revision: https://developer.blender.org/D11442
2022-01-05 15:58:36 +01:00
Brecht Van Lommel
35b1e9fc3a Cycles: pointcloud rendering
This add support for rendering of the point cloud object in Blender, as a native
geometry type in Cycles that is more memory and time efficient than instancing
sphere meshes. This can be useful for rendering sand, water splashes, particles,
motion graphics, etc.

Points are currently always rendered as spheres, with backface culling. More
shapes are likely to be added later, but this is the most important one and can
be customized with shaders.

For CPU rendering the Embree primitive is used, for GPU there is our own
intersection code. Motion blur is suppored. Volumes inside points are not
currently supported.

Implemented with help from:
* Kévin Dietrich: Alembic procedural integration
* Patrick Mourse: OptiX integration
* Josh Whelchel: update for cycles-x changes

Ref T92573

Differential Revision: https://developer.blender.org/D9887
2021-12-16 20:54:04 +01:00
Michael Jones
e688c927eb Fix T94022: Both options GPU/CPU checked under preferences cause viewport render crash. (ARM/Metal)
This fixes crash T94022 when selecting live viewport render with both GPU & CPU devices selected. It is caused by incorrect `KernelBVHLayout` assignment. Similar to `BVH_LAYOUT_MULTI_OPTIX` for Optix, this patch adds a `BVH_LAYOUT_MULTI_METAL` to correctly redirect to the correct Metal BVH layout type.

Reviewed By: brecht

Differential Revision: https://developer.blender.org/D13561
2021-12-13 22:34:48 +00:00