7166 Commits

Author SHA1 Message Date
Brecht Van Lommel
3407ed5f9b Cleanup: change internal Cycles compact BVH default to match UI 2022-07-18 15:34:13 +02:00
Brecht Van Lommel
5152c7c152 Cycles: refactor rays to have start and end distance, fix precision issues
For transparency, volume and light intersection rays, adjust these distances
rather than the ray start position. This way we increment the start distance
by the smallest possible float increment to avoid self intersections, and can
be sure it works, as the distance compared against will be exactly the same as
before, due to the ray start position and direction remaining the same.

Fix T98764, T96537, hair ray tracing precision issues.

Differential Revision: https://developer.blender.org/D15455
2022-07-15 18:46:24 +02:00
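
The start-distance adjustment described in 5152c7c152 can be illustrated with a minimal sketch. This is not the Cycles kernel code: `next_float32_up` and `advance_ray_start` are hypothetical helpers, and the names `tmin`/`tmax` for the start and end distance are assumptions. The sketch only shows the idea that the start distance moves by one float32 step while the ray origin and direction stay untouched.

```python
import struct


def next_float32_up(x: float) -> float:
    """Return the next float32 above x (rounded to float32), for finite x >= 0."""
    if x < 0.0:
        raise ValueError("sketch only handles non-negative distances")
    # "x + 0.0" turns -0.0 into +0.0 so the bit trick below stays valid.
    bits = struct.unpack("<I", struct.pack("<f", x + 0.0))[0]
    # For non-negative floats, the next representable value is the next
    # integer bit pattern.
    return struct.unpack("<f", struct.pack("<I", bits + 1))[0]


def advance_ray_start(tmin: float, tmax: float, hit_t: float):
    """Hypothetical helper: continue a ray just past a hit at distance hit_t.

    Only the start distance changes; the caller keeps the same origin and
    direction, so distances along the ray remain directly comparable.
    """
    if not (tmin <= hit_t <= tmax):
        raise ValueError("hit distance must lie within [tmin, tmax]")
    return next_float32_up(hit_t), tmax
```

For example, `advance_ray_start(0.0, 10.0, 2.5)` returns a start distance one float32 step above 2.5 and leaves the end distance at 10.0.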
Brecht Van Lommel
bb376da6df Fix Cycles MetalRT error after recent specialization changes 2022-07-15 18:28:13 +02:00
Brecht Van Lommel
011d3c75a7 Cleanup: compiler warning 2022-07-15 15:20:53 +02:00
Brecht Van Lommel
523bbf7065 Cycles: generalize shader sorting / locality heuristic to all GPU devices
This was added for Metal, but also gives good results with CUDA and OptiX.
Also enable it for future Apple GPUs instead of only M1 and M2, since this has
been shown to help across multiple GPUs, so the better bet seems to be to
enable it rather than disable it.

Also moves some of the logic outside of the Metal device code, and always
enables the code in the kernel since other devices don't do dynamic compile.

Time per sample with OptiX + RTX A6000:
                                         new                  old
barbershop_interior                      0.0730s              0.0727s
bmw27                                    0.0047s              0.0053s
classroom                                0.0428s              0.0464s
fishy_cat                                0.0102s              0.0108s
junkshop                                 0.0366s              0.0395s
koro                                     0.0567s              0.0578s
monster                                  0.0206s              0.0223s
pabellon                                 0.0158s              0.0174s
sponza                                   0.0088s              0.0100s
spring                                   0.1267s              0.1280s
victor                                   0.0524s              0.0531s
wdas_cloud                               0.0817s              0.0816s

Ref D15331, T87836
2022-07-15 13:42:47 +02:00
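
To read the OptiX table in 523bbf7065 as relative speedups, the per-scene ratio of old to new time per sample can be computed directly. The numbers below are copied from that table; the `speedup` and `speedups` helpers are illustrative names, not part of Cycles.

```python
# Time per sample in seconds as (new, old), copied from the table above.
TIMES = {
    "barbershop_interior": (0.0730, 0.0727),
    "bmw27": (0.0047, 0.0053),
    "classroom": (0.0428, 0.0464),
    "fishy_cat": (0.0102, 0.0108),
    "junkshop": (0.0366, 0.0395),
    "koro": (0.0567, 0.0578),
    "monster": (0.0206, 0.0223),
    "pabellon": (0.0158, 0.0174),
    "sponza": (0.0088, 0.0100),
    "spring": (0.1267, 0.1280),
    "victor": (0.0524, 0.0531),
    "wdas_cloud": (0.0817, 0.0816),
}


def speedup(new: float, old: float) -> float:
    """Ratio of old to new time per sample; above 1.0 means the new code is faster."""
    return old / new


def speedups(times):
    """Map each scene name to its old/new speedup ratio."""
    return {scene: speedup(new, old) for scene, (new, old) in times.items()}
```

By this measure most scenes in the table improve, while barbershop_interior and wdas_cloud come out marginally slower.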
Michael Jones
da4ef05e4d Cycles: Apple Silicon optimization to specialize intersection kernels
The Metal backend now compiles and caches a second set of kernels which are
optimized for scene contents, enabled for Apple Silicon.

The implementation supports doing this both for intersection and shading
kernels. However this is currently only enabled for intersection kernels that
are quick to compile, and already give a good speedup. Enabling this for
shading kernels would be faster still, however this also causes long wait
times and would need a good user interface to control this.

M1 Max samples per minute (macOS 13.0):

                     PSO_GENERIC  PSO_SPECIALIZED_INTERSECT  PSO_SPECIALIZED_SHADE

barbershop_interior         83.4                       89.5                   93.7
bmw27                     1486.1                     1671.0                 1825.8
classroom                  175.2                      196.8                  206.3
fishy_cat                  674.2                      704.3                  719.3
junkshop                   205.4                      212.0                  257.7
koro                       310.1                      336.1                  342.8
monster                    376.7                      418.6                  424.1
pabellon                   273.5                      325.4                  339.8
sponza                     830.6                      929.6                 1142.4
victor                      86.7                       96.4                   96.3
wdas_cloud                 111.8                      112.7                  183.1

Code contributed by Jason Fielder, Morteza Mostajabodaveh and Michael Jones

Differential Revision: https://developer.blender.org/D14645
2022-07-15 13:40:04 +02:00
Michael Jones
5653c5fcdd Cycles: keep track of SVM nodes used in kernels
To be used for specialization in Metal, to automatically leave out unused nodes
from the kernel.

Ref D14645
2022-07-15 13:40:04 +02:00
Brecht Van Lommel
79da7f2a8f Cycles: refactor to move part of KernelData definition to template header
To be used for specialization on Metal in a following commit, turning these
members into compile time constants.

Ref D14645
2022-07-15 13:40:04 +02:00
Damien Picard
2e70d5cb98 Render: camera depth of field support for armature bone targets
This is useful when using an armature as a camera rig, to avoid creating and
targeting an empty object.

Differential Revision: https://developer.blender.org/D7012
2022-07-15 13:40:04 +02:00
Brecht Van Lommel
b8ffd43bd2 Cleanup: make format 2022-07-15 13:40:04 +02:00
Olivier Maury
1b5db02a02 Fix Cycles MNEE wrong results with area light spread
When the solve is successful, the light sample needs to be updated since the
effective shading point is now on the last refractive interface. Spread was
not taken into account, creating false caustics.

Differential Revision: https://developer.blender.org/D15449
2022-07-14 16:36:38 +02:00
Brecht Van Lommel
28c3739a9b Cleanup: replace state flow macros in the kernel with functions 2022-07-14 16:36:38 +02:00
Brecht Van Lommel
5539fb3121 Cycles: add presets to the Performance panel
With choices Default, Lower Memory and Faster Render. For convenience, and
to help communicate what the various settings do.

Differential Revision: https://developer.blender.org/D15446
2022-07-14 16:36:38 +02:00
Michael Jones
4b1d315017 Cycles: Improve cache usage on Apple GPUs by chunking active indices
This patch partitions the active indices into chunks prior to sorting by material in order to trade off some material coherence for better locality. On Apple Silicon GPUs (particularly higher end M1-family GPUs), we observe overall render time speedups of up to 15%. The partitioning is implemented by repeating the range of `shader_sort_key` for each partition, and encoding a "locator" key which distributes the indices into sorted chunks.

Reviewed By: brecht

Differential Revision: https://developer.blender.org/D15331
2022-07-14 14:26:18 +01:00
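
The chunked sorting described in 4b1d315017 can be sketched as follows. This is a simplified illustration, not the Cycles implementation: `locator_key`, `sort_active_indices`, `num_shader_keys` and `partition_size` are hypothetical names, and the exact key encoding used in the kernel may differ. The sketch assumes the partition is derived from the path index and that the shader key range is repeated once per partition.

```python
def locator_key(index: int, shader_sort_key: int,
                num_shader_keys: int, partition_size: int) -> int:
    """Combine partition and shader key so sorting stays within each chunk."""
    if not (0 <= shader_sort_key < num_shader_keys):
        raise ValueError("shader_sort_key out of range")
    partition = index // partition_size
    # The shader key range is repeated for every partition.
    return partition * num_shader_keys + shader_sort_key


def sort_active_indices(active_indices, shader_keys,
                        num_shader_keys: int, partition_size: int):
    """Sort indices by shader within each partition (stable sort)."""
    return sorted(
        active_indices,
        key=lambda i: locator_key(i, shader_keys[i],
                                  num_shader_keys, partition_size),
    )
```

With a partition size covering all indices this reduces to a plain sort by shader key. Smaller partitions keep neighbouring indices closer together, giving up some material coherence in exchange for locality.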
Campbell Barton
2d04012e57 Cleanup: spelling in comments
Also remove duplicate comments in bmesh_log.h, caused by automated
comment relocation in [0].

[0]: c4e041da23
2022-07-14 22:02:52 +10:00
Xavier Hallade
5f09440d5a Cycles: Make not-compact BVH the default for embree
Measurements showed, on average, a 1.08x speedup for a 1.04x increase in
memory usage, which is an acceptable trade-off for a default setting,
although discoverability of such settings influencing memory usage could
be improved.

Reviewed By: brecht

Differential Revision: https://developer.blender.org/D15429
2022-07-12 18:40:14 +02:00
Xavier Hallade
47dd42485e Cycles: fix and enable JIT oneAPI CentOS7 builds for drivers 23570+
The current specific CentOS7 workaround we have for AoT, which is to
disable __FAST_MATH__ by using -fhonor-nans, now also fixes the
compilation issue for JIT since at least driver 23570.
2022-07-12 15:55:32 +02:00
Brecht Van Lommel
6e426259b4 Fix T99218: light group add button should be disabled when name is empty
Previously it was inactive but still clickable.

Ref D15316
2022-07-11 14:02:38 +02:00
Brecht Van Lommel
8159e0a666 Curves: use consistent default radius for Cycles, Eevee, Set Curve Radius node
To avoid Cycles not showing any hair by default, and to avoid very slow render
due to many overlaps with the previous 1 meter default in the node.

Fixes T97584, T99319

Differential Revision: https://developer.blender.org/D15405
2022-07-08 16:21:32 +02:00
Xavier Hallade
0f50ae131f Cycles: enable oneAPI in Linux release builds
with a very high min-driver version requirement, placeholder until JIT
CentOS runtime compilation issue gets fixed in a defined version.
min-driver version check can be worked around by setting
CYCLES_ONEAPI_ALL_DEVICES environment variable.
2022-07-08 15:39:13 +02:00
Xavier Hallade
190ad73590 Cycles oneAPI: Remove direct dependency on Level-Zero
We used it only to access device id for explicitly allowing Arc GPUs.
It made the backend require ze_loader.dll which could be problematic if
we end up using direct linking.
I've replaced filtering based on PCI device id by using other HW properties
instead (EUs, threads per EU), that are now available through Level-Zero.
2022-07-06 18:55:38 +02:00
Xavier Hallade
debb233787 Cleanup: fix comments in oneAPI kernel.cpp 2022-07-06 18:55:38 +02:00
Nikita Sirgienko
0df574b55e Cycles: Improve occupancy for Intel GPUs
Initially the oneAPI implementation waited after each memory
operation, even if there was no need for this. Now the implementation
waits only when it is really necessary. This improved
performance noticeably for some scenes and slightly for the rest.
2022-07-06 17:26:23 +02:00
Xavier Hallade
41c10ac84a Cycles: fix support for multiple Intel GPUs
Identical Intel GPUs ended up with the same id.
Added PCI BDF to the id to make it unique.
2022-07-01 11:20:00 +02:00
Xavier Hallade
0554537c3c Cleanup: add missing license headers in Cycles oneAPI implementation 2022-07-01 10:13:07 +02:00
Brecht Van Lommel
fbcc00d10d Fix broken Cycles performance benchmark after recent logging changes
Ensure full render report is printed with default verbosity.
2022-06-30 19:51:50 +02:00
Andrii Symkin
f00d9e80ae Cycles: add more math functions for float4
Add more math functions for float4 to make them on par with float3 ones. It
makes it possible to change the types of float3 variables to float4 without
additional work.

Differential Revision: https://developer.blender.org/D15318
2022-06-30 16:25:21 +02:00
Campbell Barton
feeb8310c8 Cleanup: format 2022-06-30 12:14:23 +10:00
Campbell Barton
b6c28002ac Cleanup: spelling in comments 2022-06-30 12:14:22 +10:00
Xavier Hallade
a02992f131 Cycles: Add support for rendering on Intel GPUs using oneAPI
This patch adds a new Cycles device with similar functionality to the
existing GPU devices.  Kernel compilation and runtime interaction happen
via oneAPI DPC++ compiler and SYCL API.

This implementation is primarily focusing on Intel® Arc™ GPUs and other
future Intel GPUs.  The first supported drivers are 101.1660 on Windows
and 22.10.22597 on Linux.

The necessary tools for compilation are:
- A SYCL compiler such as oneAPI DPC++ compiler or
  https://github.com/intel/llvm
- Intel® oneAPI Level Zero which is used for low level device queries:
  https://github.com/oneapi-src/level-zero
- To optionally generate prebuilt graphics binaries: Intel® Graphics
  Compiler

All are included in Linux precompiled libraries on svn:
https://svn.blender.org/svnroot/bf-blender/trunk/lib
The same goes for Windows precompiled binaries, except for the graphics
compiler, which is available as "Intel® Graphics Offline Compiler for
OpenCL™ Code" from
https://www.intel.com/content/www/us/en/developer/articles/tool/oneapi-standalone-components.html,
for which the path can be set as OCLOC_INSTALL_DIR.

Being based on the open SYCL standard, this implementation could also be
extended to run on other compatible non-Intel hardware in the future.

Reviewed By: sergey, brecht

Differential Revision: https://developer.blender.org/D15254

Co-authored-by: Nikita Sirgienko <nikita.sirgienko@intel.com>
Co-authored-by: Stefan Werner <stefan.werner@intel.com>
2022-06-29 12:58:04 +02:00
Brecht Van Lommel
c257443192 Fix Cycles assert with mix weights outside of 0..1 range
This could result in wrong skipping of SVM nodes in the graph. Now make the
logic consistent with the clamping in the OSL implementation and constant
folding.

Thanks to Christophe Hery for finding the problem and providing the fix.
2022-06-28 19:13:57 +02:00
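
The kind of consistency c257443192 refers to can be sketched like this. It is an illustration only: `mix_weights` and `branches_to_evaluate` are hypothetical helpers, not the SVM compiler's code, and the sketch assumes a branch of a mix may be skipped when its clamped weight is zero.

```python
def mix_weights(fac: float):
    """Clamp the mix factor to 0..1 and return the two branch weights."""
    fac = min(max(fac, 0.0), 1.0)
    return 1.0 - fac, fac


def branches_to_evaluate(fac: float):
    """Decide which inputs need evaluating, using the clamped weights.

    Deciding from the same clamped values that are used for mixing keeps
    the skip logic consistent for factors outside the 0..1 range.
    """
    weight1, weight2 = mix_weights(fac)
    return weight1 > 0.0, weight2 > 0.0
```

With a factor of 1.5, for instance, the clamped weights are (0.0, 1.0), so only the second input needs evaluating.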
Sayak Biswas
abfa09752f Cycles: enable Vega GPU/APU support
Enables Vega and Vega II GPUs as well as Vega APU, using changes in HIP code
to support 64-bit waves and a new HIP SDK version.

Tested with Radeon WX9100, Radeon VII GPUs and Ryzen 7 PRO 5850U with Radeon
Graphics APU.

Ref T96740, T91571

Differential Revision: https://developer.blender.org/D15242
2022-06-28 18:35:43 +02:00
Brecht Van Lommel
9b6e86ace1 Cycles: stop Metal rendering on command buffer error
If there is an error we should stop rendering, instead of finishing with a
wrong render result or reporting a wrong benchmark time.

Ref T96519

Differential Revision: https://developer.blender.org/D15287
2022-06-24 16:51:56 +02:00
Brecht Van Lommel
a5ff46e0fc Cleanup: make format 2022-06-23 19:28:39 +02:00
Xavier Hallade
633c2f07da Cycles: switch primitive.h inline hints to forceinline
This change helps decrease Intel GPU binaries compile time by 5-10
minutes without impacting other backends.

Reviewed By: sergey, brecht

Differential Revision: http://developer.blender.org/D15273
2022-06-23 18:36:48 +02:00
Andrii Symkin
c2a2f3553a Cycles: unify math functions names
This patch unifies the names of math functions for different data types and uses
overloading instead. The goal is to make it possible to swap out all the float3
variables containing RGB data with something else, with as few as possible
changes to the code. It's a requirement for future spectral rendering patches.

Differential Revision: https://developer.blender.org/D15276
2022-06-23 15:02:53 +02:00
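
The motivation in c2a2f3553a, one function name that works across data types, can be illustrated outside C++ as well. The sketch below uses Python's `functools.singledispatch` as a stand-in for C++ overloading; the `Float3` and `Float4` classes and the `average` and `luminance_like` functions are hypothetical and not taken from the Cycles sources.

```python
from dataclasses import dataclass
from functools import singledispatch


@dataclass(frozen=True)
class Float3:
    x: float
    y: float
    z: float


@dataclass(frozen=True)
class Float4:
    x: float
    y: float
    z: float
    w: float


@singledispatch
def average(v) -> float:
    raise TypeError(f"average() not implemented for {type(v).__name__}")


@average.register
def _(v: Float3) -> float:
    return (v.x + v.y + v.z) / 3.0


@average.register
def _(v: Float4) -> float:
    return (v.x + v.y + v.z + v.w) / 4.0


def luminance_like(color) -> float:
    """Caller code uses one name and does not care which type it receives."""
    return average(color)
```

Because callers only use the shared name, changing a variable from `Float3` to `Float4` requires no change in `luminance_like`.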
Michael Jones
d8e9647ae2 Cycles: Add diagnostic tracing of MTLLibrary compilation time
Reviewed By: sergey

Differential Revision: https://developer.blender.org/D15268
2022-06-23 10:06:20 +01:00
Michael Jones
532b33973b Cycles: Tidy of KernelData patchup code
Reviewed By: sergey

Differential Revision: https://developer.blender.org/D15267
2022-06-22 22:38:00 +01:00
Michael Jones
328a911379 Cycles: Distinguish Apple GPUs by core count
This patch suffixes Apple GPU device names with `(GPU - # cores)` so that variant GPUs with the same chipset can be distinguished. Currently benchmark scores for these M1 family GPUs are being incorrectly merged:

- M1: 7 or 8 cores
- M1 Pro: 14 or 16 cores
- M1 Max: 24 or 32 cores
- M1 Ultra: 48 or 64 cores

Reviewed By: brecht, sergey

Differential Revision: https://developer.blender.org/D15257
2022-06-22 22:32:56 +01:00
Brecht Van Lommel
ff1883307f Cleanup: renaming and consistency for kernel data
* Rename "texture" to "data array". This has not used textures for a long time,
  there are just global memory arrays now. (On old CUDA GPUs there was a cache
  for textures but not global memory, so we used to put all data in textures.)
* For CUDA and HIP, put globals in KernelParams struct like other devices.
* Drop __ prefix for data array names, no possibility for naming conflict now that
  these are in a struct.
2022-06-20 12:30:48 +02:00
Brecht Van Lommel
2c1bffa286 Cleanup: add verbose logging category names instead of numbers
And use them more consistently than before.
2022-06-17 14:08:14 +02:00
Brecht Van Lommel
24246d9870 Cleanup: replace uint4 by AttributeMap struct 2022-06-17 14:08:14 +02:00
Michael Jones
19e0b60f3e Cycles: MetalDeviceQueue - capture of multiple dispatches, and some tidying
This patch adds a new mode of gpu capture (env var `CYCLES_DEBUG_METAL_CAPTURE_SAMPLES`) to capture a block of dispatches between "reset" calls. It also fixes member data naming inconsistencies and adds some missing OS version checks.

Screenshot showing .gputrace capture in Xcode 14.0 beta (using `CYCLES_DEBUG_METAL_CAPTURE_SAMPLES="1"` and `CYCLES_DEBUG_METAL_CAPTURE_LIMIT="10"`):

{F13155703}

Reviewed By: sergey, brecht

Differential Revision: https://developer.blender.org/D15179
2022-06-13 13:42:07 +01:00
Sergey Sharybin
0fddff027e Cleanup: Unused but set variable in Cycles Metal profiler 2022-06-09 10:20:26 +02:00
Aaron Carlisle
a632260828 Cleanup: Removed unused variable 2022-06-08 22:28:46 -04:00
Michael Jones
4412e14708 Cycles: Useful Metal backend debug & profiling functionality
This patch adds some useful debugging & profiling env vars to the Metal backend:

- `CYCLES_METAL_PROFILING`: output a per-kernel timing report at the end of the render
- `CYCLES_METAL_DEBUG`: enable per-dispatch tracing (very verbose)
- `CYCLES_DEBUG_METAL_CAPTURE_KERNEL`: enable programmatic .gputrace capture for a specified kernel index

Here's an example of the timing report with `CYCLES_METAL_PROFILING` enabled:

```
---------------------------------------------------------------------------------------------------
Kernel name                                 Total threads   Dispatches     Avg. T/D    Time   Time%
---------------------------------------------------------------------------------------------------
integrator_init_from_camera                   657,407,232          161    4,083,274   0.24s   0.51%
integrator_intersect_closest                1,629,288,440          681    2,392,494  15.18s  32.12%
integrator_intersect_shadow                   751,652,291          470    1,599,260   5.80s  12.28%
integrator_shade_background                   304,612,074          263    1,158,220   1.16s   2.45%
integrator_shade_surface                    1,159,764,041          676    1,715,627  20.57s  43.52%
integrator_shade_shadow                       598,885,847          418    1,432,741   1.27s   2.69%
integrator_queued_paths_array               2,969,650,130          805    3,689,006   0.35s   0.74%
integrator_queued_shadow_paths_array          593,936,619          379    1,567,115   0.14s   0.29%
integrator_terminated_paths_array              22,205,417          155      143,260   0.05s   0.10%
integrator_sorted_paths_array               2,517,140,043          676    3,723,579   1.65s   3.50%
integrator_compact_paths_array                648,912,748          155    4,186,533   0.03s   0.07%
integrator_compact_states                      20,872,687          155      134,662   0.14s   0.29%
integrator_terminated_shadow_paths_array      374,100,675          438      854,111   0.16s   0.33%
integrator_compact_shadow_paths_array         503,768,657          438    1,150,156   0.05s   0.10%
integrator_compact_shadow_states               37,664,941          202      186,460   0.23s   0.50%
integrator_reset                               25,165,824            6    4,194,304   0.06s   0.12%
film_convert_combined_half_rgba                 3,110,400            6      518,400   0.00s   0.01%
prefix_sum                                            676          676            1   0.19s   0.40%
---------------------------------------------------------------------------------------------------
                                                                 6,760               47.27s 100.00%
---------------------------------------------------------------------------------------------------
```

Reviewed By: brecht

Differential Revision: https://developer.blender.org/D15044
2022-06-07 11:08:39 +01:00
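
When reading the timing report in 4412e14708, the "Avg. T/D" column appears to be the total thread count divided by the number of dispatches, rounded down. That reading is an assumption based on the numbers shown, not on the profiler source. The sketch below checks it for a few rows copied from the report; `avg_threads_per_dispatch` is a hypothetical helper.

```python
# (total threads, dispatches, reported Avg. T/D), copied from the report above.
ROWS = {
    "integrator_init_from_camera": (657_407_232, 161, 4_083_274),
    "integrator_intersect_closest": (1_629_288_440, 681, 2_392_494),
    "integrator_shade_surface": (1_159_764_041, 676, 1_715_627),
    "integrator_reset": (25_165_824, 6, 4_194_304),
    "prefix_sum": (676, 676, 1),
}


def avg_threads_per_dispatch(total_threads: int, dispatches: int) -> int:
    """Average threads per dispatch, rounded down to a whole thread."""
    return total_threads // dispatches


def rows_matching_report(rows):
    """Return the kernel names whose reported average matches the formula."""
    return [
        name
        for name, (threads, dispatches, reported) in rows.items()
        if avg_threads_per_dispatch(threads, dispatches) == reported
    ]
```

All five sampled rows match under this assumption, which makes the column a quick way to spot kernels that are launched often with little work per launch, such as `prefix_sum`.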
Campbell Barton
263371dc4e Cleanup: spelling in comments, additional white space 2022-06-07 15:01:03 +10:00
Brecht Van Lommel
da45c12bef Merge branch 'blender-v3.2-release' 2022-06-03 19:02:46 +02:00
Patrick Mours
34f94a02f3 Fix use of OpenGL interop breaking in Hydra viewports that do not support it
Rendering directly to a resource using OpenGL interop and Hgi
doesn't work in Houdini, since it never uses the resulting resource
(it does not call `HdRenderBuffer::GetResource`). But since doing
that simultaneously disables mapping (`HdRenderBuffer::Map` is
not implemented then), nothing was displayed. To fix this, keep
track of whether a Hydra viewport does support displaying a Hgi
resource directly, by checking whether
`HdRenderBuffer::GetResource` is ever called and only enable use
of OpenGL interop if that is the case.

Differential Revision: https://developer.blender.org/D15090
2022-06-03 18:56:30 +02:00
Dalai Felinto
e7156be86e Merge remote-tracking branch 'origin/blender-v3.2-release' 2022-06-03 16:13:51 +02:00