Commit Graph

1118 Commits

Author SHA1 Message Date
Michael Jones
482fb791ce Fix #105100: Metal using wrong kernels in multi-pass renders
This fixes issue [#105100](https://projects.blender.org/blender/blender/issues/105100) where multi-pass renders can be incorrect due to kernels using stale specialisation constants (e.g. when rendering Pokedstudio).

This patch adds a new group of md5 hashes (`global_defines_md5`) to track whether the injected block of #defines is stale and regenerate the source string as appropriate. It also renames the existing group of md5 hashes from `source_md5` to `kernels_md5` to clarify that these refer to a specific kernel set rather than just the source (which might build an arbitrarily large number of kernel sets).

Pull Request #105103
2023-02-23 11:07:28 +01:00
Brecht Van Lommel
6583acb880 Fix Cycles MetalRT access of macOS 11 features when unavailable
After recent changes in 2d994de.

Pull Request #104976
2023-02-21 12:03:21 +01:00
Brecht Van Lommel
6a0b1eae8c Fix #104097: re-enable Cycles AMD Vega support
The internal compiler error appears to be gone. Unclear why it appeared in the
first place and why it's gone now. Just random kernel code changes causing it.

Pull Request #104719
2023-02-13 22:53:08 +01:00
Campbell Barton
91346755ce Cleanup: use '#' prefix for issues instead of 'T'
Match the convention from Gitea instead of Phabricator's T for tasks.
2023-02-12 14:56:05 +11:00
Michael Jones (Apple)
01480229b1 Cycles: Fix MetalRT checkbox not hooked up to device on AMD
(Follow on from D17043)
On AMD Navi2 devices the MetalRT checkbox was not hooked up properly and had no effect. This patch fixes it.

Co-authored-by: Michael Jones <michael_p_jones@apple.com>
Pull Request #104520
2023-02-10 10:55:39 +01:00
Lucas Tadeu
a1282ab015 Fix Cycles debug build error after host falback changes
Introduced in dcfb6df9ce6.

Co-authored-by: Lucas Tadeu Teixeira <lucas@lucastadeu.com>

Pull Request #104454
2023-02-08 19:27:40 +01:00
Campbell Barton
a99022e22d Cleanup: spelling in comments 2023-02-07 14:17:01 +11:00
Nikita Sirgienko
6dcfb6df9c Cycles: Abstract host memory fallback for GPU devices
Host memory fallback in CUDA and HIP devices is almost identical.
We remove duplicated code and create a shared generic version that
other devices (oneAPI) will be able to use.

Reviewed By: brecht

Differential Revision: https://developer.blender.org/D17173
2023-02-06 22:19:32 +01:00
Michael Jones
2d994de77c Cycles: MetalRT optimisation for subsurface intersection queries
This patch optimises subsurface intersection queries on MetalRT. Currently intersect_local traverses from the scene root, retrospectively discarding all non-local hits. Using a lookup of bottom level acceleration structures, we can explicitly query only the relevant instance. On M1 Max, with MetalRT selected, this can give a render speedup of 15-20% for scenes like Monster which make heavy use of subsurface scattering.

Patch authored by Marco Giordano.

Reviewed By: brecht

Differential Revision: https://developer.blender.org/D17153
2023-02-06 19:12:29 +00:00
Patrick Mours
f2538c7173 Fix T104335: MNEE + OptiX OSL results in illegal address error
The OptiX pipeline created for OSL was missing sufficient continuation
stack to handle the MNEE ray generation program.
2023-02-06 15:06:52 +01:00
Michael Jones
654e1e901b Cycles: Use local atomics for faster shader sorting (enabled on Metal)
This patch adds two new kernels: SORT_BUCKET_PASS and SORT_WRITE_PASS. These replace PREFIX_SUM and SORTED_PATHS_ARRAY on supported devices (currently implemented on Metal, but will be trivial to enable on the other backends). The new kernels exploit sort partitioning (see D15331) by sorting each partition separately using local atomics. This can give an overall render speedup of 2-3% depending on architecture. As before, we fall back to the original non-partitioned sorting when the shader count is "too high".

Reviewed By: brecht

Differential Revision: https://developer.blender.org/D16909
2023-02-06 11:18:26 +00:00
Michael Jones
be0912a402 Cycles: Prevent use of both AMD and Intel Metal devices at same time
This patch removes the option to select both AMD and Intel GPUs on system that have both. Currently both devices will be selected by default which results in crashes and other poorly understood behaviour. This patch adds precedence for using any discrete AMD GPU over an integrated Intel one. This can be overridden with CYCLES_METAL_FORCE_INTEL.

Reviewed By: brecht

Differential Revision: https://developer.blender.org/D17166
2023-02-06 11:13:33 +00:00
Michael Jones
0a3df611e7 Fix T103393: Cycles: Undefine __LIGHT_TREE__ on Metal/AMD to fix perf
This patch fixes T103393 by undefining `__LIGHT_TREE__` on Metal/AMD as it has an unexpected & major impact on performance even when light trees are not in use.

Patch authored by Prakash Kamliya.

Reviewed By: brecht

Maniphest Tasks: T103393

Differential Revision: https://developer.blender.org/D17167
2023-02-06 11:12:34 +00:00
Campbell Barton
266d8de687 Cleanup: spelling in comments 2023-02-03 12:41:01 +11:00
Xavier Hallade
8afcecdf1f Cycles: update Intel Graphics compiler to 101.4032 on Windows
A noticeable (>5%) performance regression in oneAPI backend came with
a501a2dbff. Updating to latest graphics
compiler from driver 101.4032 fixes it.

I've tested it with current min-supported drivers and it runs well but
since compatibility of graphics compiler with older drivers isn't
guaranteed, I'm also bumping the min-supported driver versions.

If end-users consider latest drivers too fresh to switch to (version
isn't released as stable on Linux as of today but should be before
Blender 3.5 release), CYCLES_ONEAPI_ALL_DEVICES=1 env variable can be
used.

Intel Graphics Compiler on Linux will be updated in a later commit
so we can then close D16984.

Reviewed By: sergey, LazyDodo
2023-01-23 19:36:34 +01:00
Brecht Van Lommel
8e56ded86d Cycles: temporarily disable AMD Vega GPU rendering due to compiler bug
To make daily builds pass while we figure this out.

Ref T104097
2023-01-23 17:30:12 +01:00
Brecht Van Lommel
fe552bf236 Cleanup: make format 2023-01-19 22:48:05 +01:00
Michael Jones
e270a198a5 Cycles: Markup to disable specialisation of kernel data fields (Metal)
This patch adds markup to specify that certain kernel data constants should not be specialised. Currently it is used for `tabulated_sobol_sequence_size` and `sobol_index_mask` which change frequently based on the aa sample count, trash the shader cache, and have little bearing on performance.

Reviewed By: brecht

Differential Revision: https://developer.blender.org/D16968
2023-01-19 17:57:42 +00:00
Michael Jones
08b3426df9 Cycles: Occupancy tuning for new higher end M2 machines
This patch adds occupancy tuning for the newly announced high-end M2 machines, giving 10-15% render speedup over a pre-tuned build.

Reviewed By: brecht

Differential Revision: https://developer.blender.org/D17037
2023-01-19 17:56:40 +00:00
Brecht Van Lommel
a84a8a528d Cycles: remove SSE3 and AVX kernel optimization levels
While keeping SSE2, SSE4.1 and AVX2. This does not affect hardware support, it
only slightly reduces performance for some older CPUs.

To reduce maintenance cost and improve compile times.

Differential Revision: https://developer.blender.org/D16978
2023-01-16 17:53:36 +01:00
Campbell Barton
63c985e0f7 Cleanup: format 2023-01-09 18:56:54 +11:00
Campbell Barton
14fc02f91d Cleanup: spelling in comments 2023-01-06 14:00:36 +11:00
Brecht Van Lommel
87f7b630b5 Cleanup: make format 2023-01-05 19:43:19 +01:00
Michael Jones
a7cc6e015c Cycles: Additional Metal kernel specialisation exposed through UI
This patch adds a new "Kernel Optimization Level" dropdown menu to control Metal kernel specialisation. Currently this defaults to "full" optimisation, on the assumption that the changes proposed in D16371 will address usability concerns around app responsiveness and shader cache housekeeping.

Reviewed By: brecht

Differential Revision: https://developer.blender.org/D16514
2023-01-04 23:36:52 +00:00
Chris Blackbourn
496d736adc Cleanup: format 2023-01-05 11:21:51 +13:00
Jacques Lucke
2540a52f91 Cleanup: quiet unused parameter warning 2023-01-04 17:30:55 +01:00
Michael Jones
77c3e67d3d Cycles: Improved render start/stop responsiveness on Metal
All kernel specialisation is now performed in the background regardless of kernel type, meaning that the first render will be visible a few seconds sooner. The only exception is during benchmark warm up, in which case we wait for all kernels to be cached. When stopping a render, we call a new `cancel()` method on the device which causes any outstanding compilation work to be cancelled, and we destroy the device in a detached thread so that any stale queued compilations can be safely purged without blocking the UI for longer than necessary.

Reviewed By: brecht

Differential Revision: https://developer.blender.org/D16371
2023-01-04 16:00:53 +00:00
Nikita Sirgienko
858fffc2df Cycles: oneAPI: add support for SYCL host task
This functionality is related only to debugging of SYCL implementation
via single-threaded CPU execution and is disabled by default.
Host device has been deprecated in SYCL 2020 spec and we removed it
in 305b92e05f.
Since this is still very useful for debugging, we're restoring a
similar functionality here through SYCL 2020 Host Task.
2023-01-03 20:47:24 +01:00
Campbell Barton
2ac6e26c25 Cleanup: cmake formatting 2022-12-17 13:33:27 +11:00
Patrick Mours
a8530d31c2 Fix T103258: Deleting a shader with OptiX OSL results in an illegal address error
Materials without connections to the output node would crash with OSL
in OptiX, since the Cycles `OSLCompiler` generates an empty shader
group reference for them, which resulted in the OptiX device
implementation setting an empty SBT entry for the corresponding direct
callables, which then crashed when calling those direct callables was
attempted in `osl_eval_nodes`. This fixes that by setting the SBT entries
for empty shader groups to a dummy direct callable that does nothing.
2022-12-16 15:41:21 +01:00
Patrick Mours
c9eb583460 Fix T103257: Enabling or disabling viewport denoising while using OptiX OSL results in an error
Switching viewport denoising causes kernels to be reloaded with a new
feature mask, which would destroy the existing OptiX pipelines. But OSL
kernels were not reloaded as well, leaving the shading pipeline
uninitialized and therefore causing an error when it is later attempted to
execute it. This fixes that by ensuring OSL kernels are always reloaded
when the normal kernels are too.
2022-12-16 14:04:03 +01:00
Hallam Roberts
a501a2dbff Images: add mirror extension type
This adds a new mirror image extension type for shaders and
geometry nodes (next to the existing repeat, extend and clip
options).

See D16432 for a more detailed explanation of `wrap_mirror`.

This also adds a new sampler flag `GPU_SAMPLER_MIRROR_REPEAT`.
It acts as a modifier to `GPU_SAMPLER_REPEAT`, so any `REPEAT`
flag must be set for the `MIRROR` flag to have an effect.

Differential Revision: https://developer.blender.org/D16432
2022-12-14 19:27:29 +01:00
Thomas Dinges
54aec4629e Cleanup: Remove unused code in Cycles
* preempt_attr was copied from CUDA, but not used in HIP.
* Remove shadowed variable before conditional in EnvironmentTextureNode code.

Differential Revision: https://developer.blender.org/D16741
2022-12-12 18:15:41 +01:00
Michael Jones
2dc51fccb8 Fix T101787, T102786. Cycles: Improved out-of-memory messaging on Metal
This patch adds a new `max_working_set_exceeded()` check on Metal so that we can display a "System is out of GPU memory" message to the user. Without this, we get obtuse "CommandBuffer failed" errors at render time due to exceeding the size limit of resident resources.

Likely fix for T101787 & T102786.

Reviewed By: brecht

Differential Revision: https://developer.blender.org/D16713
2022-12-07 13:56:21 +00:00
Weizhen Huang
ee89f213de Cycles: improve many lights sampling using light tree
Uses a light tree to more effectively sample scenes with many lights. This can
significantly reduce noise, at the cost of a somewhat longer render time per
sample.

Light tree sampling is enabled by default. It can be disabled in the Sampling >
Lights panel. Scenes using light clamping or ray visibility tricks may render
different as these are biased techniques that depend on the sampling strategy.

The implementation is currently disabled on AMD HIP. This is planned to be fixed
before the release.

Implementation by Jeffrey Liu, Weizhen Huang, Alaska and Brecht Van Lommel.

Ref T77889
2022-12-05 16:09:03 +01:00
Brecht Van Lommel
009f7de619 Cleanup: use better matching integer types for graphics interop handle
Ref D16042
2022-12-01 15:55:48 +01:00
Nikita Sirgienko
f07b09da27 Cycles: Improve oneAPI backend support for non-Intel platforms 2022-11-25 17:46:59 +01:00
Nikita Sirgienko
412642865d Cleanup: Resolve a warning for the ambiguity on the parenthesis in oneAPI code
No functional changes.
2022-11-24 18:05:02 +01:00
Patrick Mours
a859837cde Cleanup: Move OptiX denoiser code from device into denoiser class
Cycles already treats denoising fairly separate in its code, with a
dedicated `Denoiser` base class used to describe denoising
behavior. That class has been fully implemented for OIDN
(`denoiser_oidn.cpp`), but for OptiX was mostly empty
(`denoiser_optix.cpp`) and denoising was instead implemented in
the OptiX device. That meant denoising code was split over various
files and directories, making it a bit awkward to work with. This
patch moves the OptiX denoising implementation into the existing
`OptiXDenoiser` class, so that everything is in one place. There are
no functional changes, code has been mostly moved as-is. To
retain support for potential other denoiser implementations based
on a GPU device in the future, the `DeviceDenoiser` base class was
kept and slightly extended (and its file renamed to
`denoiser_gpu.cpp` to follow similar naming rules as
`path_trace_work_*.cpp`).

Differential Revision: https://developer.blender.org/D16502
2022-11-15 15:50:01 +01:00
Michael Jones
b0e2e45496 Cycles: Enable MetalRT pointclouds & other fixes
Code authored by Marco Giordano.

This fixes pointcloud rendering on MetalRT and some other subtle MetalRT bugs:
- Incorrect kernel hashing
- Missing specialisation constants
- Incorrect visibility filtering
- Missing null pointer check

Reviewed By: brecht

Differential Revision: https://developer.blender.org/D16499
2022-11-14 16:39:18 +00:00
Michael Jones
2c596319a4 Cycles: Cache only up to 5 kernels of each type on Metal
This patch adapts D14754 for the Metal backend. Kernels of the same type are already organised into subdirectories which simplifies type matching.

Reviewed By: brecht

Differential Revision: https://developer.blender.org/D16469
2022-11-11 18:10:29 +00:00
Patrick Mours
e6b38deb9d Cycles: Add basic support for using OSL with OptiX
This patch  generalizes the OSL support in Cycles to include GPU
device types and adds an implementation for that in the OptiX
device. There are some caveats still, including simplified texturing
due to lack of OIIO on the GPU and a few missing OSL intrinsics.

Note that this is incomplete and missing an update to the OSL
library before being enabled! The implementation is already
committed now to simplify further development.

Maniphest Tasks: T101222

Differential Revision: https://developer.blender.org/D15902
2022-11-09 15:30:21 +01:00
Chris Blackbourn
4b57bc4e5d Cleanup: format 2022-11-09 08:30:18 +13:00
Brecht Van Lommel
b539d425f0 Merge branch 'blender-v3.4-release' 2022-11-08 19:47:55 +01:00
Gon Solo
c306ccb67f Fix Cycles error with runtime compilation when there is no path to OptiX SDK
If no OPTIX_ROOT is set, nvcc fails to compile because there is a stray "-I"
in the arguments. Detect if the include path is empty and act accordingly.

Differential Revision: https://developer.blender.org/D16308
2022-11-08 19:40:57 +01:00
Michael Jones
74140d41b1 Cycles: Apple GPU threadgroup tuning
This patch tunes maximum threads-per-threadgroup and threads-per-block for faster renders on Apple GPUs. Appropriate tuning is selected based on the GPU architecture (M1 or M2). We see a benchmark uplift of around 5-10% on M1 family chips. Similar uplift is expected on M2 with upcoming OS changes. (Ref T101931)

Reviewed By: brecht

Maniphest Tasks: T101931

Differential Revision: https://developer.blender.org/D16299
2022-11-07 10:00:46 +00:00
Campbell Barton
6377d00a61 Cleanup: cmake comment line length 2022-11-03 12:11:08 +11:00
Xavier Hallade
454dd3f7f0 Cycles: fix up logic in oneAPI devices filtering
CYCLES_ONEAPI_ALL_DEVICES environment variable wasn't working as
intended after 305b92e05f.
2022-10-27 23:09:14 +02:00
Michael Jones
8dd7b5b26b Cycles: Metal integrator state size tuning
This patch tunes the integrator state sizing for Metal (`num_concurrent_states` and `num_concurrent_busy_states`).

On all GPUs architecture, we adjust the busy:total states ratio to be 1:4 which gives better rendering performance than the previous 1:16 ratio (independent of total state count). This gives a small performance uplift (e.g. 2-3% on M1 Ultra).

Additionally for M2 architectures, we double the overall state size if there is available headroom. Inclusive of the first change, we can expect uplift of close to 10% in future, as this results in larger dispatch sizes and minimises work submission overheads. In order to make an accurate determination of available headroom, we defer the calculation of `num_concurrent_states` and `num_concurrent_busy_states` until the time of integrator state allocation (i.e. after all of the scene data has been allocated). We also refactor `alloc_integrator_soa` to calculate an *exact* single-state-size in a first pass, right before allocating the integrator SoA buffers in a second pass.

Reviewed By: brecht

Differential Revision: https://developer.blender.org/D16313
2022-10-24 17:14:33 +01:00
Sergey Sharybin
2c108d5503 Avoid re-compilation of oneAPI AoT kernels when configuration changes
Buildbot infrastructure relies on the fact that it can enable and
disable `WITH_CYCLES_<COMPUTE>_BINARIES` without affecting speed of
incremental builds. This allows buildbot to skip GPU kernels when
doing CI regression tests which do not need GPU kernels, as well as
it allows to move GPU kernels compilation to a separate step where
all the resources are available to the GPU kernel builders.

For the oneAPI compute enabling and disabling AoT kernels has much
higher implications due to the kernels being a part of the device
implementation from the build target perspective.

This change makes it so different target names are used for JIT and
AoT configurations, which allows CMake to more fully benefit from
"caching" the compiled result.

The end goal of this change is to make it so sequential build of the
same code base on the buildbot happens super fast,

Blender binary still needs to be re-linked when the AOT of oneAPI
option is toggled, but that's already the case in the buildbot due
to the WITH_BUILDINFO.

Differential Revision: https://developer.blender.org/D16312
2022-10-21 17:17:51 +02:00