Some OpenXR platforms do not support OpenGL or Vulkan. To support these
platforms we use a bridge. Blender still renders in OpenGL/Vulkan, but
will copy the render result into a D3D11 swapchain.
OpenGL doesn this by importing the D3D11 swapchain into the OpenGL
context and perfor OpenGL calls to update the swapchain. However for
vulkan that could lead to construct 3 context for OpenXR
- Blender GPU Context
- OpenXR D3D Context
- New context that imports the Blender render result and the OpenXR
Swapchain image and copies them.
Due to Direct3D limitations importing into a vulkan context has known
issues (driver + extensions). Secondly we are not sure if we are running
on the same device as the OpenXR swapchain. The solution provided with
this PR is to only support CPU data transfers.
**SteamVR using d3d bridge**
SteamVR normally would use the Vulkan binding. But by changing the binding
priority in code you can make it select the D3D bridge.
<img width="1518" alt="Screenshot 2025-04-10 114534.png" src="attachments/f856bb2b-9ad5-4bb2-9cfd-a1412da9edd1">
It has been tested and validated to work using Mixed reality portal as well.
Pull Request: https://projects.blender.org/blender/blender/pulls/137264
Current Windows on ARM GPUs don't support
extenral memory. External memory is required
for OpenXR. So most likely OpenXR will not work
on these devices.
Most (read all) OpenXR platforms that support
vulkan also require external memory. So might
just be that those platforms won't work at all
on these devices.
In any case when not supported, the GHOST
OpenXR platform will use CPU for data transfer.
The goal is to reduce the affect of the fmod() used in the noise code,
which was initially reported in the comment:
https://projects.blender.org/blender/blender/pulls/119884#issuecomment-1258902
Basic idea is to benefit from SIMD vectorization on CPU.
Tested on Linux i9-11900K and macOS on M2 Ultra, in both cases performance
after this change is very close to what it could be with the fmod() commented
out (the call itself, `p = p + precision_correction`).
On macOS the penalty of fmod() was about 10%, on Linux it was closer to 30%
when built with GCC-13. With Linux builds from the buildbot it is more like 18%.
The optimization is only done for 3d and 4d noise. It might be possible to
gain some performance improvement for 1d and 2d cases, but the approach would
need to be different: we'd need to optimize scalar version fmodf(). Maybe
tricks with integer cast will be faster (since we are a bit optimistic in the
kernel and do not guarantee exact behavior in extreme cases such as NaN inputs).
Pull Request: https://projects.blender.org/blender/blender/pulls/137109
This PR add support to use a win32 handle to perform share render
result with the OpenXR vulkan instance. This is only possible when
the GPU matches. Otherwise a CPU roundtrip will be performed.
Pull Request: https://projects.blender.org/blender/blender/pulls/137093
Commit be63ebd961
This is causing issues with CUDA kernel compilation in some setups, even
though the builbot is ok. Since this isn't yet working for oneAPI anyway,
revert all the changes to Cycles kernel compilation for now.
Includes win32 specific extensions definitions when including
`vk_common.hh`. Inside `gpu_context.cc` vulkan needs to be
included before opengl, otherwise windows 10 builders will
report a warning.
```
[6421/7520] Building CXX object source\blender\gpu\CMakeFiles\bf_gpu.dir\intern\gpu_context.cc.obj
C:\Program Files (x86)\Windows Kits\10\include\10.0.19041.0\shared\minwindef.h(130): warning C4005: 'APIENTRY': macro redefinition
C:\Users\blender\git\blender-vexp\blender.git\lib\windows_x64\epoxy\include\epoxy/gl.h(59): note: see previous definition of 'APIENTRY'
```
Pull Request: https://projects.blender.org/blender/blender/pulls/137134
In the OpenXR/Vulkan specs it is mentioned that the swapchain layout
should be set to optimal color attachment. This wasn't the case and
could lead to validation errors.
Also fixes a memory size validation error. Not concerning as it was
abount already backed memory.
Pull Request: https://projects.blender.org/blender/blender/pulls/137143
This only worked for Ubuntu's discontinued Unity desktop.
Even though `libunity.so` can be used outside of Unity,
it's no longer actively developed.
Ref: !137128
When using SteamVR without starting SteamVR, the physical device cannot
be found. This isn't reported to the user nicely. This change will
tell the user why OpenXR couldn't be started.
Note that the physical device can be created when SteamVR has already
launched. OpenGL will start SteamVR automatically, but seems Vulkan
isn't able to do that.
Remove full-screen support from GHOST API's.
Note that this only had back-end implements for X11 and WIN32.
This was last used for the Game Engine to run games full-screen,
removing as it's unused and it doesn't seem likely to be used in the
future.
This doesn't impact making Blender full-screen from the window menu
which uses a window decoration setting.
Ref: !137050
Use VERBATIM to ensure spaces inside command line arguments don't get
escaped automatically.
On Linux and Windows the oneAPI kernel compilation still has problems.
There is an apparent bug with single quote escaping in add_custom_command
which means it's not easy to use VERBATIM.
At the moment MNEE locks up Cycles, or has rendering artifacts on
RDNA4 GPUs on WIndows.
This commit disables MNEE on that configuration until a fix
is avaliable.
Pull Request: https://projects.blender.org/blender/blender/pulls/136980
This fixes most "One Definition Rule" violations inside blender proper
resulting from duplicate structures of the same name. The fixes were
made similar to that of !135491. See also #120444 for how this has come
up in the past.
These were found by using the following compile options:
-flto=4 -Werror=odr -Werror=lto-type-mismatch -Werror=strict-aliasing
Note: There are still various ODR issues remaining that require
more / different fixes than what was done here.
Pull Request: https://projects.blender.org/blender/blender/pulls/136371
Implements a crash dialog for Windows.
The crash popup provides the following actions:
- Restart: reopen Blender from the last saved or auto-saved time
- Report a Bug: forward to Blender bug tracker
- View Crash Log: open the .txt file with the crash log
- Close: Closes without any further action
Pull Request: https://projects.blender.org/blender/blender/pulls/129974
Track the (per-thread) active `GHOST_Context`.
This is required to restore the active context after creating a new one
if the current context is unknown.
(Required for creating GPU contexts in the GPU module without depending
on the WM module)
Pull Request: https://projects.blender.org/blender/blender/pulls/136992
Current implementation uses a CPU roundtrip to transfer render result
to the Xr Swapchain. This PR adds support for sharing the render result
on Linux systems by using file descriptors.
To extend this solution to win32 or dx handles can be done by extending
the data transfer modes, register the correct extensions. When not
using the same GPU between Blender and OpenXR the CPU roundtrip
will still be used.
Solution has been validated with monado simulator and seems to be as
fast as OpenGL.
Performance can be improved by using GPU based synchronization.
Current API is limited as we cannot chain the different renders and
swapchains.
Pull Request: https://projects.blender.org/blender/blender/pulls/136933
Currently MetalRT interpolates transformation matrix on per-element basis
which leads to issues like #135659.
This change adds implementation of for decomposed (Scale/Rotate/Translate)
motion interpolation, matching behavior of BVH2 and other HW-RT.
This requires macOS 15 and Xcode 16 in order to use this interpolation.
On older platforms and compilers old interpolation is used.
Currently there is no changes on the user (by default) and it is only
available via CYCLES_METALRT_PCMI environment variable. This is because
there are some issues with complex motion paths that need to be looked
into. Having code available makes it easier to do further debugging.
Ref #135659
Authored by Emma Liu
Pull Request: https://projects.blender.org/blender/blender/pulls/136253
The current logic disables WITH_CYCLES_ONEAPI_BINARIES when ocloc is not
found, which is fine, but prevented building for other non-Intel SYCL
targets without (unnecessary) ocloc.
The fix here is to remove spir64_gen target when
WITH_CYCLES_ONEAPI_BINARIES is disabled, instead of forcing only spir64.
Improve the number of images that are requested in the swapchain. For
mailbox presentation mode we select tripple buffering. For V-Sync
presentation mode we select double buffering.
In future we might want to make the presentation mode and swapchain
images dynamic based on the actual action that the user is performing.
When doing action that benefit from low latency (paint cursor) we should
select mailbox, otherwise we can use fifo to reduce CPU/GPU usage and
safe some energy usage along the way. See #136923 for design task.
Pull Request: https://projects.blender.org/blender/blender/pulls/136922
It's better for performance to use a single thread pool for all areas of
Blender, and this gets us closer to that.
Bullet, Quadriflow, Mantaflow and Ceres still contain OpenMP code, but it
was already disabled.
On macOS, our OpenMP libraries are no longer compatible with the latest
Xcode 16.3. By removing OpenMP we no longer have to solve that problem.
OpenMP was disabled for bpy module builds on Windows ARM64, which also no
longer needs to be solved.
Pull Request: https://projects.blender.org/blender/blender/pulls/136865
Only the parallel sparse matrix code was updated. This is used by e.g.
LSCM and ABF unwrap, and performance seems about the same or better.
Parallel GEMM (dense matrix-matrix multiplication) is used by libmv,
for example in libmv_keyframe_selection_test for a 54 x 54 matrix.
However it appears to harm performance, removing parallelization makes
that test run 5x faster on a Apple M3 Max.
There has been no new Eigen release since 2021, however there is active
development in master and it includes support for a C++ thread pool for
GEMM. So we could upgrade, but the algorithm remains the same and
looking at the implementation it just does not seem designed for modern
many core CPUs. Unless the matrix is much larger, there's too much thread
synchronization overhead. So it does not seem useful to enable that
thread pool for us.
Pull Request: https://projects.blender.org/blender/blender/pulls/136865
To make adding a dependeny on TBB easier.
Additional changes:
* Using LIB for libmv tests, as it now brings in includes
* Removing Eigen header listing in iTaSC
Pull Request: https://projects.blender.org/blender/blender/pulls/136865
The goal is to be able to switch away from OpenMP by default.
In order to achieve this a new parallel_for() function is added,
which follows the same declaration as the one from TBB. For now
the simplest formulation is used where range is provided by start
and last indices (the last one is excluded from the range).
The side effect is that Libmv now expects the threading limits
to be imposed by the default thread area (or have an explicit
thread area with the limit in the caller). In a way this is
similar to other external libraries.
Pull Request: https://projects.blender.org/blender/blender/pulls/136834
MacOS is currently using one of our custom "hourglass" cursors during
busy times. This PR changes it to an OS-supplied cursor, a pointer
arrow with an animated blue spinner, meant for this purpose. Although
undocumented it has been used by many applications for many years.
Including Firefox for 11 years now.
Pull Request: https://projects.blender.org/blender/blender/pulls/136735
Reduce the register pressure and branching in the switch() by using
subclass and cast from void* to the base class.
This ensures intersection functions are not inlined multiple times,
bringing performance back.
Alternative could be to avoid functions (they are quite large) but
that only partially resolves the performance regression.
Pull Request: https://projects.blender.org/blender/blender/pulls/136823
This PR refactors the way how swapchains are used.
Allow scaling of the swapchain content to the actual resolution of the swapchain.
can reduce artefacts when resizing windows when supported.
When frame rate is to fast the previous implementation could use a semaphore
that were still in use, leading to unwanted stuttering on certain platforms. Waiting
when the rendering has finished (GHOST_Frame.submission_fence), before the
next image is acquired from the swap chain.
Mailbox has been disabled as it can calculate more frames then actually been
presented, leading to a lag and increased power usage on others.
Pull Request: https://projects.blender.org/blender/blender/pulls/136603
Instead of relying on the Intel extensions that may not be implemented,
we can use max_work_group_size until there is a better alternative.
Thanks to Codeplay for this proposal.
Co-authored-by: Georgi Mirazchiyski <georgi.mirazchiyski@codeplay.com>