Workaround for a compilation issue preventing kernels compiling for AMD GPUs: Avoid problematic use of templates on Metal by making `gpu_parallel_active_index_array` a wrapper macro, and moving `blocksize` to be a macro parameter.
Reviewed By: brecht
Differential Revision: https://developer.blender.org/D14081
* Apple Silicon support enabled on macOS 12.2+
* AMD support enabled on macOS 12.3+
This patch also fixes a device enumeration crash on certain AMD configs which
was caused by over-release of MTLDevice objects.
Differential Revision: https://developer.blender.org/D14090
The rbit instruction is only available starting with ARMv6T2 and
the register prefix is different from what AARCH64 uses.
Separate the 32 and 64 bit ARM branches, add missing ISA checks.
Made sure the code works as intended on macMini with Apple silicon,
and on Raspberry Pi 4 B running 32bit Raspbian OS.
Differential Revision: https://developer.blender.org/D14056
Instead of creating and destroying threads when starting and stopping renders,
keep a single thread alive for the duration of the session. This makes it so all
display driver OpenGL resource allocation and destruction can happen in the same
thread.
This was implemented as part of trying to solve another bug, but it did not
help. Still I prefer this behavior, to eliminate potential future issues wit
graphics drivers or with future Cycles display driver implementations.
Differential Revision: https://developer.blender.org/D14086
For reasons unclear, destroying and then recreating a vertex buffer in the
render OpenGL context is affecting the immediate mode vertex buffer in the
draw manager OpenGL context.
Instead just create a single vertex buffer and use it for the lifetime of
the render OpenGL context. There's not really any need to have a separate
one per tile as far as I can tell.
Differential Revision: https://developer.blender.org/D14084
When using a RGBA16 (`GL_RGBA16`, `DXGI_FORMAT_R16G16B16A16_UNORM`)
swapchain format with Quest 2, no image is presented to the headset.
This can occur when using the SteamVR runtime with an AMD graphics card
(ex. T95374).
Workaround is to move this format after the Quest 2-compatible RGBA16F
formats in the candidates list so that the RGBA16F formats are chosen
instead.
Reviewed By: Severin
Differential Revision: https://developer.blender.org/D14024
Crash was caused since the function pointers
`s_xrGetOpenGLGraphicsRequirementsKHR_fn`/
`s_xrGetD3D11GraphicsRequirementsKHR_fn` were static and were not
updated with the correct proc address after being set the first time.
As stated in the OpenXR spec: "function pointers returned by
xrGetInstanceProcAddr using one XrInstance may not be valid when used
with objects related to a different XrInstance".
Although it would seem reasonable that the proc address would not
change if the instance was the same (hence the `static XrInstance s_instance;`),
in testing, repeated calls to `xrGetInstanceProcAddress()`
with the same instance still can result in changes (at least for the
SteamVR runtime) so the workaround is to simply set the function pointers
every time, essentially trivializing their `static` designations.
Reviewed By: Severin
Maniphest Tasks: T94268
Differential Revision: https://developer.blender.org/D14023
For curve-heavy scenes, memory consumption regressed when we switched from MetalRT to bvh2. Allow users to opt in to MetalRT to workaround this.
Reviewed By: brecht
Differential Revision: https://developer.blender.org/D14071
Disable binary archives on Apple Silicon (issue stems from instancing multiple PSOs from the same binary archive). Pipeline creation still filters through the OS shader cache, mitigating any impact on setup times after the initial render.
Reviewed By: brecht
Differential Revision: https://developer.blender.org/D14072
There are two things achieved by this change:
- No possible downcast of size_t to int when calculating motion steps.
- Disambiguate call to `min()` which was for some reason considered
ambiguous on 32bit platforms `min(int, unsigned int)`.
- Do the same for the `max()` call to keep them symmetrical.
On an implementation side the `min()` is defined for a fixed width
integer type to disambiguate uint from size_t on 32bit platforms,
and yet be able to use it for 32bit operands on 64bit platforms without
upcast.
This ended up in a bit bigger change as the conditional compile-in of
functions is easiest if the functions is templated. Making the functions
templated required to remove the other source of ambiguity which is
`algorithm.h` which was pulling min/max from std.
Now it is the `math.h` which is the source of truth for min/max.
It was only one place which was relying on `algorithm.h` for these
functions, hence the choice of `math.h` as the safest and least
intrusive.
Fixes 32bit platforms (such as i386) in Debian package build system.
Differential Revision: https://developer.blender.org/D14062
There are two things achieved by this change:
- No possible downcast of size_t to int when calculating motion steps.
- Disambiguate call to min() which was for some reason considered
ambiguous on 32bit platforms `min(int, unsigned int)`.
On an implementation side the `min()` is defined for a fixed width
integer type to disambiguate uint from size_t on 32bit platforms,
and yet be able to use it for 32bit operands on 64bit platforms without
upcast.
Fixes 32bit platforms (such as i386) in Debian package build system.
Differential Revision: https://developer.blender.org/D13992
This patch reverts the normal behavior of the spotlights. In the last fix,
the returned normal of a spot light was equal to its direction. This broke
some texturing methods used by artists.
Differential Revision: https://developer.blender.org/D13991
Sometime throughout development some checks got lost during refactor.
This change makes it so that if OIDN is not supported on the current
CPU Cycles will report an error and stop rendering. This behavior is
similar to when an OptiX denoiser is requested and there is no OptiX
compatible device available.
The easiest way to verify this change is to force return false from
the `openimagedenoise_supported()`.
Fixes Cycles part of the T94127.
Differential Revision: https://developer.blender.org/D13944
Remove small ray offsets that were used to avoid self intersection, and leave
that to the newly added primitive object/prim comparison. These changes together
significantly reduce artifacts on small, large or far away objects.
The balance here is that overlapping primitives are not handled well and should
be avoided (though this was already an issue). The upside is that this is
something a user has control over, whereas the other artifacts had no good
manual solution in many cases.
There is a known issue where the Blender particle system generates overlapping
objects and in turn leads to render differences between CPU and GPU. This will
be addressed separately.
Differential Revision: https://developer.blender.org/D12954
This reverts commit 086f191169.
There was apparently a problem using APPEND which wasn't referenced
in the commit log.
Added comment noting the reason for the discrepancy.
The OSL image compilation step needed to be taught about the new UVTILE
format for UDIM textures.
A small missing feature from OIIO[1] means this is a bit uglier than it
needs to be. Once we update to a version of OIIO with the fix we can
remove the string replace part.
[1] 35cb6a83e2
Differential Revision: https://developer.blender.org/D13912
This was already done for APPLE & WIN32, which would
reference these libraries twice.
Now append BROTLI_LIBRARIES to FREETYPE_LIBRARIES when they're
required for linking.
No functional changes as all references to FREETYPE_LIBRARIES also
used BROTLI_LIBRARIES.
Only show options that are valid for the used device (CPU, GPU, Multi).
Note: The panel isn't shown for OPTIX anymore, unless Multi device is used.
Reference: https://developer.blender.org/D13592
Make the Embree RTC_SCENE_FLAG_COMPACT flag optional and enabled per default.
Disabling it makes CPU rendering a bit faster in some scenes at the cost of a higher memory usage.
Barbershop renders about 3% faster, victor about 4% on CPU with compact BVH disabled.
Differential Revision: https://developer.blender.org/D13592
With (center) position, radius and random value outputs.
Eevee does not yet support rendering point clouds, but an untested
implementation of this node was added for when it does.
Ref T92573
The UI team requested adding woff2 support to freetype.
this required a new dependency brotli.
This changes adds brotili to the builder and bumps
freetype to version 2.11.0
As freetype now depends on other libraries, for consistency
all use of ${FREETYPE_LIBRARY} in cmake has been updated to
use ${FREETYPE_LIBRARIES} adjustments have been made in the
windows platform file, all other platforms use cmake's
FindFreeType.cmake which already sets this variable.
reviewed by: brecht
Differential Revision: https://developer.blender.org/D13448
This is not currently working, with an internal compiler error. However
we are currently using BVH2 instead of Metal RT. So this has no effect for
users, it's being committed to avoid the code getting outdated.
Ref T92573, T92212
Differential Revision: https://developer.blender.org/D13632
This patch fixes a correctness issue discovered in the `int4 select(...)` function on Apple Silicon machines, which causes bad bvh2 builds. Although the generated bvh2s give correct renders, the resulting runtime performance is terrible. This fix allows us to switch over to bvh2 on Apple Silicon giving a significant performance uplift for many of the standard benchmarking assets. It also fixes some unit test failures stemming from the use of MetalRT, and trivially enables the new pointcloud primitive.
Ref T92212
Reviewed By: brecht
Maniphest Tasks: T92212
Differential Revision: https://developer.blender.org/D13877