Commit Graph

1058 Commits

Author SHA1 Message Date
Nikita Sirgienko
82a5790d2a Cycles: oneAPI: Trigger compilation of used kernels only
JIT compilation of oneAPI kernels now happens during load stage
and proper message gets shown in the GUI during compilation.
Also, this implementation skips kernels that aren't needed for
the used scene, reducing overall (re)compilation time.
2022-10-10 16:38:11 +02:00
Xavier Hallade
3714d3c3ce Cycles: link oneAPI backend with debug version of sycl when in Debug
It fixes SYCL runtime issues in Debug builds that were due to mixing
Release and Debug MSVC runtimes.
This commit also removes specific handling of dpcpp compiler executable
to simplify the CMake implementation. Using it like clang++ works and
clang++ executable is also available from Intel oneAPI DPC++ compiler in
case it doesn't.
2022-10-07 16:14:50 +02:00
Xavier Hallade
7eeeaec6da Cycles: use direct linking for oneAPI backend
This is a minimal set of changes, allowing a lot of cleanup that can
happen afterward as it allows sycl method and objects to be used outside
of kernel.cpp.

Reviewed By: brecht, sergey

Differential Revision: https://developer.blender.org/D15397
2022-10-07 09:50:05 +02:00
Campbell Barton
6d1d1bf2b1 Cleanup: spelling in comments
Also add missing task ID.
2022-09-28 09:41:31 +10:00
Campbell Barton
72a7f107d8 Cleanup: format 2022-09-28 09:41:28 +10:00
Nikita Sirgienko
2ead05d738 Cycles: Add optional per-kernel performance statistics
When verbose level 4 is enabled, Blender prints kernel performance
data for Cycles on GPU backends (except Metal that doesn't use
debug_enqueue_* methods) for groups of kernels.
These changes introduce a new CYCLES_DEBUG_PER_KERNEL_PERFORMANCE
environment variable to allow getting timings for each kernels
separately and not grouped with others. This is done by adding
explicit synchronization after each kernel execution.

Differential Revision: https://developer.blender.org/D15971
2022-09-27 22:15:00 +02:00
Michael Jones
fc604a0be3 Cycles: Disable binary archives on macOS < 13.0
An bug with binary archives was fixed in macOS 13.0 which stops some spurious kernel recompilations. In older macOS versions, falling back on the system shader cache will prevent recompilations in most instances (this is the same behaviour as in Blender 3.1.x and 3.2.x).

Reviewed By: brecht

Differential Revision: https://developer.blender.org/D16082
2022-09-27 16:58:21 +01:00
Sebastian Herhoz
75a6d3abf7 Cycles: add Path Guiding on CPU through Intel OpenPGL
This adds path guiding features into Cycles by integrating Intel's Open Path
Guiding Library. It can be enabled in the Sampling > Path Guiding panel in the
render properties.

This feature helps reduce noise in scenes where finding a path to light is
difficult for regular path tracing.

The current implementation supports guiding directional sampling decisions on
surfaces, when the material contains a least one diffuse component, and in
volumes with isotropic and anisotropic Henyey-Greenstein phase functions.

On surfaces, the guided sampling decision is proportional to the product of
the incident radiance and the normal-oriented cosine lobe and in volumes it
is proportional to the product of the incident radiance and the phase function.

The incident radiance field of a scene is learned and updated during rendering
after each per-frame rendering iteration/progression.

At the moment, path guiding is only supported by the CPU backend. Support for
GPU backends will be added in future versions of OpenPGL.

Ref T92571

Differential Revision: https://developer.blender.org/D15286
2022-09-27 15:56:32 +02:00
Sergey Sharybin
3c2c296130 Fix compilation error on Windows after recent change 2022-09-13 11:52:11 +02:00
Patrick Mours
a45c36efae Cycles: Make OSL implementation independent from SVM
Cleans up the file structure to be more similar to that of the SVM
and also makes it possible to build kernels with OSL support, but
without having to include SVM support.

This patch was split from D15902.

Differential Revision: https://developer.blender.org/D15949
2022-09-13 10:59:28 +02:00
Sergey Sharybin
602cca671e Cycles: Include reason the oneAPI library could not be loaded
Additionally, just stick to a pure error stating. Such messages
are aimed for developers and it is rather implied that oneAPI
rendering will be disabled.
2022-09-13 10:52:18 +02:00
Josh Whelchel
74477149dd Fix T100845: wrong Cycles OptiX runtime compilation include path
Causing OptiX kernel build errors on Arch Linux.

Differential Revision: https://developer.blender.org/D15891
2022-09-06 16:11:12 +02:00
Nikita Sirgienko
e1fbb4ce89 Merge branch 'blender-v3.3-release' 2022-09-06 15:39:12 +02:00
Nikita Sirgienko
8b11ed392c Cycles: Fix crashes in oneAPI backend for scenes not fitting in dGPU memory
Differential Revision: https://developer.blender.org/D15889
2022-09-06 15:38:15 +02:00
Campbell Barton
6c6a53fad3 Cleanup: spelling in comments, formatting, move comments into headers 2022-09-06 16:25:20 +10:00
Brecht Van Lommel
74caf77361 Cycles: add option to specify OptiX runtime root directory
This allows individual users or Linux distributions to specify a directory
Cycles will automatically look for the OptiX include folder, to compile kernels
at runtime.

It is still possible to override this with the OPTIX_ROOT_DIR environment
variable at runtime.

Based on patch by Sebastian Parborg.

Ref D15792
2022-08-29 19:50:20 +02:00
Sebastian Parborg
8ffc11dbcb Cleanup OpenGL linking and related code after libepoxy merge
This cleans up the OpenGL build flags and linking.
It additionally also removes some dead code.

One of these dead code paths is WITH_X11_ALPHA which actually never was
active even with the build flag on. The call to use this was never
called because the default initializer for GHOST was set to have it off
per default. Nothing called this function with a boolean value to enable it.

These cleanups are needed to support true headless OpenGL rendering.
Without these cleanups libepoxy will fail to load the correct OpenGL
Libraries as we have already linked them to the blender binary.

Reviewed By: Brecht, Campbell, Jeroen

Differential Revision: http://developer.blender.org/D15554
2022-08-15 16:47:20 +02:00
Christian Rauch
a296b8f694 GPU: replace GLEW with libepoxy
With libepoxy we can choose between EGL and GLX at runtime, as well as
dynamically open EGL and GLX libraries without linking to them.

This will make it possible to build with Wayland, EGL, GLVND support while
still running on systems that only have X11, GLX and libGL. It also paves
the way for headless rendering through EGL.

libepoxy is a new library dependency, and is included in the precompiled
libraries. GLEW is no longer a dependency, and WITH_SYSTEM_GLEW was removed.

Includes contributions by Brecht Van Lommel, Ray Molenkamp, Campbell Barton
and Sergey Sharybin.

Ref T76428

Differential Revision: https://developer.blender.org/D15291
2022-08-15 16:10:29 +02:00
Patrick Mours
79787bf8e1 Cycles: Improve denoiser update performance when rendering with multiple GPUs
This patch causes the render buffers to be copied to the denoiser
device only once before denoising and output/display is then fed
from that single buffer on the denoiser device. That way usually all
but one copy (from all the render devices to the denoiser device)
can be eliminated, provided that the denoiser device is also the
display device (in which case interop is used to update the display).
As such this patch also adds some logic that tries to ensure the
chosen denoiser device is the same as the display device.

Differential Revision: https://developer.blender.org/D15657
2022-08-12 16:00:54 +02:00
Xavier Hallade
d706d0460c Cycles oneAPI: simplify num_concurrent_states selection
The number of Execution Units and resident "threads" (simd width * threads
per EUs) are now exposed and used to select the number of states using
a simplified heuristic.
2022-07-27 09:45:33 +02:00
Brecht Van Lommel
f26aa186b2 Cleanup: remove __KERNEL_CPU__
This was tested in some places to check if code was being compiled for the
CPU, however this is only defined in the kernel. Checking __KERNEL_GPU__
always works.
2022-07-25 17:43:35 +02:00
Brecht Van Lommel
011d3c75a7 Cleanup: compiler warning 2022-07-15 15:20:53 +02:00
Brecht Van Lommel
523bbf7065 Cycles: generalize shader sorting / locality heuristic to all GPU devices
This was added for Metal, but also gives good results with CUDA and OptiX.
Also enable it for future Apple GPUs instead of only M1 and M2, since this has
been shown to help across multiple GPUs so the better bet seems to enable
rather than disable it.

Also moves some of the logic outside of the Metal device code, and always
enables the code in the kernel since other devices don't do dynamic compile.

Time per sample with OptiX + RTX A6000:
                                         new                  old
barbershop_interior                      0.0730s              0.0727s
bmw27                                    0.0047s              0.0053s
classroom                                0.0428s              0.0464s
fishy_cat                                0.0102s              0.0108s
junkshop                                 0.0366s              0.0395s
koro                                     0.0567s              0.0578s
monster                                  0.0206s              0.0223s
pabellon                                 0.0158s              0.0174s
sponza                                   0.0088s              0.0100s
spring                                   0.1267s              0.1280s
victor                                   0.0524s              0.0531s
wdas_cloud                               0.0817s              0.0816s

Ref D15331, T87836
2022-07-15 13:42:47 +02:00
Michael Jones
da4ef05e4d Cycles: Apple Silicon optimization to specialize intersection kernels
The Metal backend now compiles and caches a second set of kernels which are
optimized for scene contents, enabled for Apple Silicon.

The implementation supports doing this both for intersection and shading
kernels. However this is currently only enabled for intersection kernels that
are quick to compile, and already give a good speedup. Enabling this for
shading kernels would be faster still, however this also causes a long wait
times and would need a good user interface to control this.

M1 Max samples per minute (macOS 13.0):

                    PSO_GENERIC  PSO_SPECIALIZED_INTERSECT  PSO_SPECIALIZED_SHADE

barbershop_interior       83.4	            89.5                   93.7
bmw27                   1486.1	          1671.0                 1825.8
classroom                175.2	           196.8                  206.3
fishy_cat                674.2	           704.3                  719.3
junkshop                 205.4	           212.0                  257.7
koro                     310.1	           336.1                  342.8
monster                  376.7	           418.6                  424.1
pabellon                 273.5	           325.4                  339.8
sponza                   830.6	           929.6                 1142.4
victor                    86.7              96.4                   96.3
wdas_cloud               111.8	           112.7                  183.1

Code contributed by Jason Fielder, Morteza Mostajabodaveh and Michael Jones

Differential Revision: https://developer.blender.org/D14645
2022-07-15 13:40:04 +02:00
Brecht Van Lommel
79da7f2a8f Cycles: refactor to move part of KernelData definition to template header
To be used for specialization on Metal in a following commit, turning these
members into compile time constants.

Ref D14645
2022-07-15 13:40:04 +02:00
Michael Jones
4b1d315017 Cycles: Improve cache usage on Apple GPUs by chunking active indices
This patch partitions the active indices into chunks prior to sorting by material in order to tradeoff some material coherence for better locality. On Apple Silicon GPUs (particularly higher end M1-family GPUs), we observe overall render time speedups of up to 15%. The partitioning is implemented by repeating the range of `shader_sort_key` for each partition, and encoding a "locator" key which distributes the indices into sorted chunks.

Reviewed By: brecht

Differential Revision: https://developer.blender.org/D15331
2022-07-14 14:26:18 +01:00
Xavier Hallade
41c10ac84a Cycles: fix support for multiple Intel GPUs
Identical Intel GPUs ended up with the same id.
Added PCI BDF to the id to make it unique.
2022-07-01 11:20:00 +02:00
Campbell Barton
b6c28002ac Cleanup: spelling in comments 2022-06-30 12:14:22 +10:00
Xavier Hallade
a02992f131 Cycles: Add support for rendering on Intel GPUs using oneAPI
This patch adds a new Cycles device with similar functionality to the
existing GPU devices.  Kernel compilation and runtime interaction happen
via oneAPI DPC++ compiler and SYCL API.

This implementation is primarly focusing on Intel® Arc™ GPUs and other
future Intel GPUs.  The first supported drivers are 101.1660 on Windows
and 22.10.22597 on Linux.

The necessary tools for compilation are:
- A SYCL compiler such as oneAPI DPC++ compiler or
  https://github.com/intel/llvm
- Intel® oneAPI Level Zero which is used for low level device queries:
  https://github.com/oneapi-src/level-zero
- To optionally generate prebuilt graphics binaries: Intel® Graphics
  Compiler All are included in Linux precompiled libraries on svn:
  https://svn.blender.org/svnroot/bf-blender/trunk/lib The same goes for
  Windows precompiled binaries but for the graphics compiler, available
  as "Intel® Graphics Offline Compiler for OpenCL™ Code" from
  https://www.intel.com/content/www/us/en/developer/articles/tool/oneapi-standalone-components.html,
  for which path can be set as OCLOC_INSTALL_DIR.

Being based on the open SYCL standard, this implementation could also be
extended to run on other compatible non-Intel hardware in the future.

Reviewed By: sergey, brecht

Differential Revision: https://developer.blender.org/D15254

Co-authored-by: Nikita Sirgienko <nikita.sirgienko@intel.com>
Co-authored-by: Stefan Werner <stefan.werner@intel.com>
2022-06-29 12:58:04 +02:00
Sayak Biswas
abfa09752f Cycles: enable Vega GPU/APU support
Enables Vega and Vega II GPUs as well as Vega APU, using changes in HIP code
to support 64-bit waves and a new HIP SDK version.

Tested with Radeon WX9100, Radeon VII GPUs and Ryzen 7 PRO 5850U with Radeon
Graphics APU.

Ref T96740, T91571

Differential Revision: https://developer.blender.org/D15242
2022-06-28 18:35:43 +02:00
Brecht Van Lommel
9b6e86ace1 Cycles: stop Metal rendering on command buffer error
If there is an error we should stop rendering, instead of finishing with a
wrong render result or reporting a wrong benchmark time.

Ref T96519

Differential Revision: https://developer.blender.org/D15287
2022-06-24 16:51:56 +02:00
Brecht Van Lommel
a5ff46e0fc Cleanup: make format 2022-06-23 19:28:39 +02:00
Michael Jones
d8e9647ae2 Cycles: Add diagnostic tracing of MTLLibrary compilation time
Reviewed By: sergey

Differential Revision: https://developer.blender.org/D15268
2022-06-23 10:06:20 +01:00
Michael Jones
532b33973b Cycles: Tidy of KernelData patchup code
Reviewed By: sergey

Differential Revision: https://developer.blender.org/D15267
2022-06-22 22:38:00 +01:00
Michael Jones
328a911379 Cycles: Distinguish Apple GPUs by core count
This patch suffixes Apple GPU device names with `(GPU - # cores)` so that variant GPUs with the same chipset can be distinguished. Currently benchmark scores for these M1 family GPUs are being incorrectly merged:

- M1: 7 or 8 cores
- M1 Pro: 14 or 16 cores
- M1 Max: 24 or 32 cores
- M1 Ultra: 48 or 64 cores

Reviewed By: brecht, sergey

Differential Revision: https://developer.blender.org/D15257
2022-06-22 22:32:56 +01:00
Brecht Van Lommel
ff1883307f Cleanup: renaming and consistency for kernel data
* Rename "texture" to "data array". This has not used textures for a long time,
  there are just global memory arrays now. (On old CUDA GPUs there was a cache
  for textures but not global memory, so we used to put all data in textures.)
* For CUDA and HIP, put globals in KernelParams struct like other devices.
* Drop __ prefix for data array names, no possibility for naming conflict now that
  these are in a struct.
2022-06-20 12:30:48 +02:00
Brecht Van Lommel
2c1bffa286 Cleanup: add verbose logging category names instead of numbers
And use them more consistently than before.
2022-06-17 14:08:14 +02:00
Michael Jones
19e0b60f3e Cycles: MetalDeviceQueue - capture of multiple dispatches, and some tidying
This patch adds a new mode of gpu capture (env var `CYCLES_DEBUG_METAL_CAPTURE_SAMPLES`) to capture a block of dispatches between "reset" calls. It also fixes member data naming inconsistencies and adds some missing OS version checks.

Screenshot showing .gputrace capture in Xcode 14.0 beta (using `CYCLES_DEBUG_METAL_CAPTURE_SAMPLES="1"` and `CYCLES_DEBUG_METAL_CAPTURE_LIMIT="10"`):

{F13155703}

Reviewed By: sergey, brecht

Differential Revision: https://developer.blender.org/D15179
2022-06-13 13:42:07 +01:00
Sergey Sharybin
0fddff027e Cleanup: Unused but set variable in Cycles Metal profiler 2022-06-09 10:20:26 +02:00
Michael Jones
4412e14708 Cycles: Useful Metal backend debug & profiling functionality
This patch adds some useful debugging & profiling env vars to the Metal backend:

- `CYCLES_METAL_PROFILING`: output a per-kernel timing report at the end of the render
- `CYCLES_METAL_DEBUG`: enable per-dispatch tracing (very verbose)
- `CYCLES_DEBUG_METAL_CAPTURE_KERNEL`: enable programatic .gputrace capture for a specified kernel index

Here's an example of the timing report with `CYCLES_METAL_PROFILING` enabled:

```
---------------------------------------------------------------------------------------------------
Kernel name                                 Total threads   Dispatches     Avg. T/D    Time   Time%
---------------------------------------------------------------------------------------------------
integrator_init_from_camera                   657,407,232          161    4,083,274   0.24s   0.51%
integrator_intersect_closest                1,629,288,440          681    2,392,494  15.18s  32.12%
integrator_intersect_shadow                   751,652,291          470    1,599,260   5.80s  12.28%
integrator_shade_background                   304,612,074          263    1,158,220   1.16s   2.45%
integrator_shade_surface                    1,159,764,041          676    1,715,627  20.57s  43.52%
integrator_shade_shadow                       598,885,847          418    1,432,741   1.27s   2.69%
integrator_queued_paths_array               2,969,650,130          805    3,689,006   0.35s   0.74%
integrator_queued_shadow_paths_array          593,936,619          379    1,567,115   0.14s   0.29%
integrator_terminated_paths_array              22,205,417          155      143,260   0.05s   0.10%
integrator_sorted_paths_array               2,517,140,043          676    3,723,579   1.65s   3.50%
integrator_compact_paths_array                648,912,748          155    4,186,533   0.03s   0.07%
integrator_compact_states                      20,872,687          155      134,662   0.14s   0.29%
integrator_terminated_shadow_paths_array      374,100,675          438      854,111   0.16s   0.33%
integrator_compact_shadow_paths_array         503,768,657          438    1,150,156   0.05s   0.10%
integrator_compact_shadow_states               37,664,941          202      186,460   0.23s   0.50%
integrator_reset                               25,165,824            6    4,194,304   0.06s   0.12%
film_convert_combined_half_rgba                 3,110,400            6      518,400   0.00s   0.01%
prefix_sum                                            676          676            1   0.19s   0.40%
---------------------------------------------------------------------------------------------------
                                                                 6,760               47.27s 100.00%
---------------------------------------------------------------------------------------------------
```

Reviewed By: brecht

Differential Revision: https://developer.blender.org/D15044
2022-06-07 11:08:39 +01:00
Patrick Mours
5c6053ccb1 Fix misaligned address error when rendering 3D curves in the viewport with Cycles and OptiX 7.4
Acceleration structures in the viewport default to building with the fast
build flag, but the intersection program used for curves was queried with
the fast trace flag. The resulting mismatch caused an exception in the
intersection kernel. Since it's difficult to predict whether dynamic or static
acceleration structures are going to be built at the time of kernel loading,
this fixes the mismatch by always using the fast trace flag for curves.
2022-06-03 12:24:13 +02:00
Campbell Barton
61a7e5be18 Cleanup: '*' prefix C-comment blocks 2022-06-01 15:38:48 +10:00
Brecht Van Lommel
610619c203 Merge branch 'blender-v3.2-release' 2022-05-31 17:35:16 +02:00
Brecht Van Lommel
f2cd7e08fe Fix Cycles MNEE not working for Metal
Move MNEE to own kernel, separate from shader ray-tracing. This does introduce
the limitation that a shader can't use both MNEE and AO/bevel, but that seems
like the better trade-off for now.

We can experiment with bigger kernel organization changes later.

Differential Revision: https://developer.blender.org/D15070
2022-05-31 17:24:43 +02:00
Patrick Mours
a8c81ffa83 Cycles: Add half precision float support for volumes with NanoVDB
This patch makes it possible to change the precision with which to
store volume data in the NanoVDB data structure (as float, half, or
using variable bit quantization) via the previously unused precision
field in the volume data block.
It makes it possible to further reduce memory usage during
rendering, at a slight cost to the visual detail of a volume.

Differential Revision: https://developer.blender.org/D10023
2022-05-23 19:08:01 +02:00
Campbell Barton
427a2c920a Cleanup: spelling in comments, capitalize tags
Also add missing task-ID reference & remove colon after \note as it
doesn't render properly in doxygen.
2022-05-13 09:29:25 +10:00
Michael Jones
007184bcf2 Enable inlining on Apple Silicon. Use new process-wide ShaderCache in order to safely re-enable binary archives
This patch is the same as D14763, but with a fix for unit test failures caused by ShaderCache fetch logic not working in the non-MetalRT case:

```
diff --git a/intern/cycles/device/metal/kernel.mm b/intern/cycles/device/metal/kernel.mm
index ad268ae7057..6aa1a56056e 100644
--- a/intern/cycles/device/metal/kernel.mm
+++ b/intern/cycles/device/metal/kernel.mm
@@ -203,9 +203,12 @@ bool kernel_has_intersection(DeviceKernel device_kernel)

   /* metalrt options */
   request.pipeline->use_metalrt = device->use_metalrt;
-  request.pipeline->metalrt_hair = device->kernel_features & KERNEL_FEATURE_HAIR;
-  request.pipeline->metalrt_hair_thick = device->kernel_features & KERNEL_FEATURE_HAIR_THICK;
-  request.pipeline->metalrt_pointcloud = device->kernel_features & KERNEL_FEATURE_POINTCLOUD;
+  request.pipeline->metalrt_hair = device->use_metalrt &&
+                                   (device->kernel_features & KERNEL_FEATURE_HAIR);
+  request.pipeline->metalrt_hair_thick = device->use_metalrt &&
+                                         (device->kernel_features & KERNEL_FEATURE_HAIR_THICK);
+  request.pipeline->metalrt_pointcloud = device->use_metalrt &&
+                                         (device->kernel_features & KERNEL_FEATURE_POINTCLOUD);

   {
     thread_scoped_lock lock(cache_mutex);
@@ -225,9 +228,9 @@ bool kernel_has_intersection(DeviceKernel device_kernel)

   /* metalrt options */
   bool use_metalrt = device->use_metalrt;
-  bool metalrt_hair = device->kernel_features & KERNEL_FEATURE_HAIR;
-  bool metalrt_hair_thick = device->kernel_features & KERNEL_FEATURE_HAIR_THICK;
-  bool metalrt_pointcloud = device->kernel_features & KERNEL_FEATURE_POINTCLOUD;
+  bool metalrt_hair = use_metalrt && (device->kernel_features & KERNEL_FEATURE_HAIR);
+  bool metalrt_hair_thick = use_metalrt && (device->kernel_features & KERNEL_FEATURE_HAIR_THICK);
+  bool metalrt_pointcloud = use_metalrt && (device->kernel_features & KERNEL_FEATURE_POINTCLOUD);

   MetalKernelPipeline *best_pipeline = nullptr;
   for (auto &pipeline : collection) {

```

Reviewed By: brecht

Differential Revision: https://developer.blender.org/D14923
2022-05-11 16:20:59 +01:00
Campbell Barton
12a1fa9cf4 Cleanup: format 2022-05-06 18:27:44 +10:00
Patrick Mours
6fa5d520b8 Cycles: Add support for parallel compilation of OptiX module
OptiX 7.4 adds support for splitting the costly creation of an OptiX
module into smaller tasks that can be executed in parallel on a
thread pool.
This is only really relevant for the "shader_raytrace" kernel variant
as the main one is small and compiles fast either way. It sheds of
a few seconds there (total gain is not massive currently, since it is
difficult for the compiler to split up the huge shading entry point
that is the primary one taking up time, but it is still measurable).

Differential Revision: https://developer.blender.org/D14845
2022-05-05 14:35:41 +02:00
Brecht Van Lommel
52a5f68562 Revert "Cycles: Enable inlining on Apple Silicon for 1.1x speedup"
This reverts commit b82de02e7c. It is causing
crashes in various regression tests.

Ref D14763
2022-04-28 00:46:43 +02:00