Embree 4.4 introduces an improvement in the Embree GPU
implementation by dropping shared memory usage in favor
of direct controllable memory transfers. This should allow
addressing several problems spotted in Blender regarding
multithreading and memory corruption when BVH and rendering
happen at the same time. However, to implement such
improvements, the API has changed for several functions, and
this commit adopts Blender code to these changes, making Blender
buildable and functional with all existing Embree 4.X
versions, before and after 4.4.
No functional changes in Blender behavior are expected if
using Embree versions below 4.4.
Pull Request: https://projects.blender.org/blender/blender/pulls/139061
* Add GraphicsInteropDevice to check if interop is possible with device
* Rename GraphcisInterop to GraphicsInteropBuffer
* Include display device type and memory size in GraphicsInteropBuffer
* Unnest graphics interop class to make forward declarations possible
Pull Request: https://projects.blender.org/blender/blender/pulls/137363
This commit adds some extra prints to terminal related to oneAPI driver
information in the situation that the driver version is considered
incompatible with the current version of Cycles.
Pull Request: https://projects.blender.org/blender/blender/pulls/137272
Instead of relying on the Intel extensions that may not be implemented,
we can use max_work_group_size until there is a better alternative.
Thanks to Codeplay for this proposal.
Co-authored-by: Georgi Mirazchiyski <georgi.mirazchiyski@codeplay.com>
This API is not properly implemented in other SYCL backends at the
moment and we don't want it to fail at runtime, so we conservatively
enable it only for Level-Zero.
Instead of returning 0 in case the Intel extension for getting the count
of Execution Units isn't available, we now use
sycl::info::device::max_compute_units.
We keep using the Intel extension in priority since it logically goes
with sycl::ext::intel::info::device::gpu_hw_threads_per_eu used in
get_max_num_threads_per_multiprocessor(), for which there is no
sycl::info::device::max_threads_per_compute_unit replacement yet.
Rewrite the ONEAPI Blender texture allocation code to make use of
1D images backed by linear USM memory. This increases parity
with the CUDA implementation and sets the ground work for enabling
host USM allocations in Blender. By enabling this functionality,
previously failing benchmarks are now passing.
Together with the previous commit, no functional changes are expected.
* Do oneAPI copy optimization as part of host memory alloc and free, so
it is properly released before host memory is freed.
* Synchronize after loading texture info, like CUDA and HIP.
https://projects.blender.org/blender/blender/pulls/134412
This may be used for device to do host memory allocation in a way that
is more efficient for copy the host memory to the device.
Also rename and group device memory allocation functions for clarity.
Pull Request: https://projects.blender.org/blender/blender/pulls/134412
The current usage of software-based texture operations in
the oneAPI implementation puts additional register pressure on
the GPU compiler during register allocation. And it also creates
code that requires maintenance. This commit is intended to address
this situation by utilizing a recently productized SYCL bindless
texture API to enable HW-based texture operations using
Intel GPUs' hardware sampler.
This currently translates to 1-11% rendering speedups (scene-specific)
on my Arc A770 and Arc B580. At the moment, there are small
performance regressions with NanoVDB texture operations on Arc B580
and small performance regressions in shade surface MNEE and Raytrace
kernels on Arc A770, but they look recoverable and will be handled
in the future.
Pull Request: https://projects.blender.org/blender/blender/pulls/133457
If one of the devices already used host happed memory but another not,
it would previously realloc both.
Thanks to Jorn Visser for investigating and finding this problem.
Pull Request: https://projects.blender.org/blender/blender/pulls/132912
Should be a bit more efficient, and it fixes host memory fallback bugs,
where host memory was incorrectly freed during re-copy. For the case
where memory should get reallocated on the host, a new mem_move_to_host
was added.
Thanks to Jorn Visser for investigating and finding this problem.
Pull Request: https://projects.blender.org/blender/blender/pulls/132912
Check was misc-const-correctness, combined with readability-isolate-declaration
as suggested by the docs.
Temporarily clang-format "QualifierAlignment: Left" was used to get consistency
with the prevailing order of keywords.
Pull Request: https://projects.blender.org/blender/blender/pulls/132361
* Use .empty() and .data()
* Use nullptr instead of 0
* No else after return
* Simple class member initialization
* Add override for virtual methods
* Include C++ instead of C headers
* Remove some unused includes
* Use default constructors
* Always use braces
* Consistent names in definition and declaration
* Change typedef to using
Pull Request: https://projects.blender.org/blender/blender/pulls/132361
Previously, in case of a failure during BVH transfer, when running out
of memory for example, we could get an error such as "BVH failed to
migrate to the GPU due to Embree library error (no error)", because
embree error status was actually reset before being queried.
This commit fixes its propagation.
Pull Request: https://projects.blender.org/blender/blender/pulls/129022
IGC 1.0.17384, ocloc 24.31.30508, which:
- add support for Battlemage and Lunar Lake GPUs
- recover from recent performance regression on Linux
- allow to drop older work-around
(9d5164d472) and need for a patched
version on Windows
- ocloc now needs "dg2,mtl" naming for fat binaries.
opencl-clang patches don't get applied anymore by igc build scripts
when llvm is not a git repository, hence I could also drop we can drop
current patch disabling patching.
I've only slightly pushed min-driver-version updates after carefull
testing, instead of jumping to the same version as ocloc as we use to.
Pull Request: https://projects.blender.org/blender/blender/pulls/127251
This new version of the graphics compiler solves a performance
regression on Arc, adds support for Battlemage and Lunar Lake GPUs, and
allows to drop older patch to build fat binaries with broad
compatibility.
This latter change requires using -device dg2,mtl naming instead of
passing architecture ids.
Pull Request: https://projects.blender.org/blender/blender/pulls/127371
Embree device pointer can end up being nullptr even when Embree on GPU is
expected to be used. Previous implementation overlooked this possibility,
leading to a completely silent fallback to the non-Hardware ray-tracing path,
this commit fixes it. We've noticed this as now Embree relies on a driver
component: https://github.com/intel/level-zero-raytracing-support that can
potentially be missing from a system.
Pull Request: https://projects.blender.org/blender/blender/pulls/124085
SYCL runtime currently relies on an internal driver behavior that will
break the driver version string returned by SYCL if it changes:
https://github.com/oneapi-src/unified-runtime/issues/1777
This will be fixed at SYCL runtime level but until we use a new enough
one, we need to add additional verifications to avoid blocking execution
on a driver that will change this internal behavior.
Pull Request: https://projects.blender.org/blender/blender/pulls/124084
Since #118841 there are more cases where Cycles would check for the
graphics interop support. This could lead to a crash when graphics
interop functions are called without having active graphics context.
This change makes it so there is no graphics interop calls when doing
headless render. In order to achieve this the device creation is now
aware of the headless mode.
Pull Request: https://projects.blender.org/blender/blender/pulls/122844
This enables scenes with all textures not fitting in GPU
memory to finally render. For scenes that are fitting,
no functional change or performance change is expected.
Pull Request: https://projects.blender.org/blender/blender/pulls/122385
This enables the new lazy module loading behavior introduced in OIDN 2.3,
without breaking compatibility with older versions of OIDN (using separate
code paths).
Also, the detection of OIDN support for devices is now much cleaner, and
devices do not need to be matched by PCI address or device name anymore.
Pull Request: https://projects.blender.org/blender/blender/pulls/121362
With drivers 101.4972 to 101.5085, some Arc and Meteor Lake devices
ignore the prebuilt GPU binaries and since the addition of Meteor Lake
binaries, fail caching newly generated ones on Windows.
This got fixed in drivers 101.5186 so it's preferable to require these
new drivers to be used.
NDEBUG is part of the C standard and disables asserts. Only this will
now be used to decide if asserts are enabled.
DEBUG was a Blender specific define, that has now been removed.
_DEBUG is a Visual Studio define for builds in Debug configuration.
Blender defines this for all platforms. This is still used in a few
places in the draw code, and in external libraries Bullet and Mantaflow.
Pull Request: https://projects.blender.org/blender/blender/pulls/115774
In order to speedup compilation, we upgrade IGC to 1.0.14828.26 along
with ocloc and the associated dependencies.
We also bump min-driver version accordingly to 26918.
Ref !114341
The first public Windows driver version with a higher number is
101.4824, so we bump the min-required driver version on Windows to this
one to ensure compatibility.