This patch adds MetalRT support to Cycles kernel code. It is mostly additive in nature or confined to Metal-specific code, however there are a few areas where this interacts with other code:
- MetalRT closely follows the Optix implementation, and in some cases (notably handling of transforms) it makes sense to extend Optix special-casing to MetalRT. For these generalisations we now have `__KERNEL_GPU_RAYTRACING__` instead of `__KERNEL_OPTIX__`.
- MetalRT doesn't support primitive offsetting (as with `primitiveIndexOffset` in Optix), so we define and populate a new kernel texture, `__object_prim_offset`, containing per-object primitive / curve-segment offsets. This is referenced and applied in MetalRT intersection handlers.
- Two new BVH layout enum values have been added: `BVH_LAYOUT_METAL` and `BVH_LAYOUT_MULTI_METAL_EMBREE` for XPU mode). Some host-side enum case handling has been updated where it is trivial to do so.
Ref T92212
Reviewed By: brecht
Maniphest Tasks: T92212
Differential Revision: https://developer.blender.org/D13353
This patch adds new arg-type parameters to `DeviceQueue::enqueue` and its overrides. This is in preparation for the Metal backend which needs this information for correct argument encoding.
Ref T92212
Reviewed By: brecht
Maniphest Tasks: T92212
Differential Revision: https://developer.blender.org/D13357
This patch fixes an address space mismatch in the film convert kernels on Metal. The `film_get_pass_pixel_...` functions take a `ccl_private` result pointer, but the film convert kernels pass a `ccl_global` memory pointer. Specialising the pass-fetch functions with templates results in compilation errors on Visual Studio, so instead this patch just adds an intermediate local on Metal.
Reviewed By: brecht
Differential Revision: https://developer.blender.org/D13350
Was happening during rendering, causing visual artifacts when doing
CPU+GPU rendering, and giving different in-progress results on different
devices.
The root of the issue comes to the fact that math used in the approximate
shadow catcher calculation might have resulted in negative alpha channel,
and negative values for display are handled differently on CPU and GPU.
Such difference in handling is caused by an approximate conversion used on
the CPU for the performance reasons.
This change makes it so no negative alpha is generated by the approximate
shadow catcher. Not sure if we need some explicit clamping somewhere to
deal with possible negative values coming from somewhere else.
The shadow catcher cornell box tests are to be updated for the new code,
but the new result seems to be more accurate.
Differential Revision: https://developer.blender.org/D13354
Noticed when was looking into T93155. Steps to reproduce:
- Open the .blend file from the report
- Hit F12 to start rendering
- After some tiles were rendered hit Esc
The issue is caused by "sticky" cancel reported via Progress. This means
that once user hit Esc all further requests for cancel state will return
truth, which was preventing OIDN denoiser from completing the denoising
task.
Now only allow stopping the denoiser when interactive rendering requests
a very fast stopping.
Aiming the fix for 3.0 branch.
Differential Revision: https://developer.blender.org/D13352
With the current code in master, scrambling distance is enabled on non-hardware accelerated ray tracing devices see a measurable performance decrease when compared scrambling distance on vs off. From testing, this performance decrease comes from the large tile sizes scheduled in `tile.cpp`.
This patch attempts to address the performance decrease by using different algorithms to calculate the tile size for devices with hardware accelerated ray traversal and devices without. Large tile sizes for hardware accelerated devices and small tile sizes for others.
Most of this code is based on proposals from @brecht and @leesonw
Reviewed By: brecht, leesonw
Differential Revision: https://developer.blender.org/D13042
Some enum names were changed/removed in OptiX 7.4, so some changes are necessary to
make things compile still.
In addition, OptiX 7.4 also adds built-in support for catmull-rom curves, so it is no longer
necessary to convert the catmull-rom data to cubic bsplines first, and has endcaps disabled
by default now, so can remove the special handling via any-hit programs that filtered them
out before.
Differential Revision: https://developer.blender.org/D13351
21.Q4 is required, older version should not show devices in the preferences.
This adds a check for the file version of amdhip64.dll file during hipew
initialization.
Differential Revision: https://developer.blender.org/D13324
Use the correct device function (hipDeviceGet) for multi GPU setups, instead
of hipGetDevice which just returns the default device.
Differential Revision: https://developer.blender.org/D13323
BVH2 triangle intersection was broken on the GPU since packed floats can't
be loaded directly into SSE. The better long term solution for performance
would be to build a BVH2 for GPU and Embree for CPU, similar to what we do
for OptiX.
Fix problem with duplicated initial character when initiating or
switching to new windows. This is done by updating our copies of state
and modes from the new window when it receives WM_IME_SETCONTEXT
message. This problem and fix are only for the Windows platform.
* Rename "Auto Tiles" to "Use Tiling", it's not really automatic and
confusing with the old auto tile size add-on.
* Rename "Adaptive" scrambling distance to "Automatic", to avoid confusion
with adaptive sampling.
Fix problem with duplicated initial character when initiating or
switching to new windows. This is done by updating our copies of state
and modes from the new window when it receives WM_IME_SETCONTEXT
message. This problem and fix are only for the Windows platform.
* Rename "Auto Tiles" to "Use Tiling", it's not really automatic and
confusing with the old auto tile size add-on.
* Rename "Adaptive" scrambling distance to "Automatic", to avoid confusion
with adaptive sampling.
Happens when device runs out of memory and Cycles is moving some
textures to the host memory.
The delayed memory free for OptiX BVH was moving data from one
device_memory to another, leaving the original device memory in
an invalid state. This was ruining the allocation map in the CUDA
device which is using pointer to the device_memory.
This change makes it so the memory pointer is stolen from BVH
into the delayed memory free list.
Additionally, forbid copying and moving instances of device_memory
and added sanity checks in the device implementation.
Differential Revision: https://developer.blender.org/D13316
This fixes the the app crash happening when trying to render smoke as a dense
3D texture. The changes are related to matching up hipew with the actual HIP
headers.
Differential Revision: https://developer.blender.org/D13296
With very long ray distance, OptiX ends up traversing many BVH nodes due to
a feature that improves precision. However this causes very slow rendering.
We now avoid generating such long rays by rejecting the few samples that have
long ray distances and very low probability of being generated. This should not
meaningfully affect render results.
Thanks to Sergey and Patrick for the investigation.