2023-08-04 13:24:17 +10:00
/* SPDX-FileCopyrightText: 2021-2022 Blender Foundation
 *
 * SPDX-License-Identifier: Apache-2.0 */
Cycles: Adapt shared kernel/device/gpu layer for MSL
This patch adapts the shared kernel entrypoints so that they can be compiled as MSL (Metal Shading Language). Where possible, the adaptations avoid changes in common code.
In MSL, kernel function inputs are explicitly bound to resources. In the case of argument buffers, we declare a struct containing the kernel arguments, accessible via device pointer. This differs from CUDA and HIP where kernel function arguments are declared as traditional C-style function parameters. This patch adapts the entrypoints declared in kernel.h so that they can be translated via a new `ccl_gpu_kernel_signature` macro into the required parameter struct + kernel entrypoint pairing for MSL.
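The difference between the two calling conventions can be sketched in plain C++. This is only an illustration under stated assumptions, not the output of `ccl_gpu_kernel_signature`: the names `scale_c_style`, `ScaleArgs`, `scale_struct_style` and `both_styles_agree` are hypothetical, and a real MSL entrypoint would additionally carry resource binding attributes.

```cpp
#include <cassert>
#include <vector>

// CUDA/HIP style: kernel arguments are ordinary C-style function parameters.
inline void scale_c_style(float *data, const int n, const float factor)
{
  assert(n >= 0);
  for (int i = 0; i < n; i++) {
    data[i] *= factor;
  }
}

// MSL argument-buffer style, emulated in plain C++: the same arguments are
// packed into a struct (hypothetical name) ...
struct ScaleArgs {
  float *data;
  int n;
  float factor;
};

// ... and the entrypoint receives a single pointer to that struct.
inline void scale_struct_style(const ScaleArgs *args)
{
  scale_c_style(args->data, args->n, args->factor);
}

// Small helper that runs both forms on copies of the same input.
inline bool both_styles_agree(const std::vector<float> &input, const float factor)
{
  std::vector<float> a = input;
  std::vector<float> b = input;
  scale_c_style(a.data(), static_cast<int>(a.size()), factor);
  const ScaleArgs args = {b.data(), static_cast<int>(b.size()), factor};
  scale_struct_style(&args);
  return a == b;
}
```

Keeping the kernel body identical and changing only how the arguments arrive is what lets a single macro produce either form from one shared declaration.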
MSL buffer attribution must be applied to function parameters or non-static class data members. To allow universal access to the integrator state, kernel data, and texture fetch adapters, we wrap all of the shared kernel code in a `MetalKernelContext` class. This is achieved by bracketing the appropriate kernel headers with "context_begin.h" and "context_end.h" on Metal. When calling deeper into the kernel code, we must reference the context class (e.g. `context.integrator_init_from_camera`). This extra prefixing is performed by a set of defines in "context_end.h". These will require explicit maintenance if entrypoints change. We invite discussion on more maintainable ways to enforce correctness.
Lambda expressions are not supported in MSL, so a new `ccl_gpu_kernel_lambda` macro generates an inline function object, optionally capturing any required state, which yields the same behaviour. This approach is applied to all parallel_... implementations which are templated by operation. The lambda expressions in the film_convert... kernels don't adapt cleanly to function objects. However, these entrypoints can be macro-generated more concisely to avoid lambda expressions entirely, instead relying on constant folding to handle the pixel/channel conversions.
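As a rough illustration of the lambda replacement (not the actual expansion of `ccl_gpu_kernel_lambda`), the plain C++ sketch below writes a capturing lambda as an explicit function object. The names `count_active`, `QueuedKernelEquals` and `count_with_functor` are hypothetical and exist only for this example.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for a helper templated by operation: counts the
// indices in [0, num_states) for which `is_active(index)` returns true.
template<typename IsActiveOp>
int count_active(const int num_states, IsActiveOp is_active)
{
  assert(num_states >= 0);
  int count = 0;
  for (int i = 0; i < num_states; i++) {
    if (is_active(i)) {
      count++;
    }
  }
  return count;
}

// Hand-written function object equivalent to the lambda
//   [queued, kernel_index](const int i) { return queued[i] == kernel_index; }
// The captured state becomes explicit data members.
struct QueuedKernelEquals {
  const int *queued;
  int kernel_index;

  bool operator()(const int i) const
  {
    return queued[i] == kernel_index;
  }
};

// Helper that invokes the templated routine with the function object.
inline int count_with_functor(const std::vector<int> &queued, const int kernel_index)
{
  const QueuedKernelEquals op = {queued.data(), kernel_index};
  return count_active(static_cast<int>(queued.size()), op);
}
```

Because the templated helper only requires a callable, the same call site accepts either a lambda (where the language supports it) or the function object.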
A separate implementation of `gpu_parallel_active_index_array` is provided for Metal to work around some subtle differences in SIMD width, and also to encapsulate some required thread parameters which must be declared as explicit entrypoint function parameters.
Ref T92212
Reviewed By: brecht
Maniphest Tasks: T92212
Differential Revision: https://developer.blender.org/D13109
2021-11-09 21:30:46 +00:00
2022-07-25 21:16:34 +02:00
/* Metal kernel entry points. */
2023-09-04 16:42:27 +02:00
/* NOTE: Must come prior to other includes. */
#include "kernel/device/metal/compat.h"
#include "kernel/device/metal/globals.h"
2023-09-04 16:42:27 +02:00

/* NOTE: Must come prior to the kernel.h. */
2022-07-12 15:32:46 +02:00
#include "kernel/device/metal/function_constants.h"
2023-09-04 16:42:27 +02:00
#include "kernel/device/gpu/kernel.h"
2022-07-25 21:16:34 +02:00

/* MetalRT intersection handlers. */

#ifdef __METALRT__

/* Intersection return types. */

/* For a bounding box intersection function. */
struct BoundingBoxIntersectionResult {
  bool accept [[accept_intersection]];
  bool continue_search [[continue_search]];
  float distance [[distance]];
};

2023-09-13 16:02:49 +02:00
/* For a primitive intersection function. */
struct PrimitiveIntersectionResult {
2022-07-25 21:16:34 +02:00
  bool accept [[accept_intersection]];
  bool continue_search [[continue_search]];
};

2023-09-13 16:02:49 +02:00
enum { METALRT_HIT_TRIANGLE, METALRT_HIT_CURVE, METALRT_HIT_BOUNDING_BOX };
2022-07-25 21:16:34 +02:00

/* Hit functions. */

Cycles: MetalRT optimisations (scene_intersect_shadow + random_walk)
This PR contains optimisations and a general tidy-up of the MetalRT backend.
- Currently `scene_intersect` is used for both normal and (opaque) shadow rays, but the usage patterns are different enough to warrant specialisation. Shadow intersection tests (flagged with `PATH_RAY_SHADOW_OPAQUE`) only need a bool result, but need a larger "self" payload in order to exclude hits against target lights. By specialising we can minimise the payload size in each case (which helps performance) and avoid some dynamic branching. This PR introduces a new `scene_intersect_shadow` function which is specialised in Metal, and currently redirects to `scene_intersect` in the other backends.
- Currently `scene_intersect_local` is implemented for worst-case payload requirements as demanded by `subsurface_disk` (where `max_hits` is 4). The random_walk case only demands 1 hit result which we can retrieve directly from the intersector object (rather than stashing it in the payload). By specialising, we significantly reduce the payload size for random_walk queries, which has a big impact on performance. Additionally, we only need to use a custom intersection function for the first ray test in a random walk (for self-primitive filtering), so this PR forces faster `opaque` intersection testing for all but the first random walk test.
- Currently `scene_intersect_volume` has a lot of redundant code to handle non-triangle primitives despite volumes only being enclosed by trimeshes. This PR removes this code.
Additionally, this PR tidies up the convoluted intersection function linking code, removes some redundant intersection handlers, and uses more consistent naming of intersection functions.
On an M3 MacBook Pro, these changes give a 2-3% performance increase on typical scenes with opaque trimesh materials (e.g. barbershop, classroom, junkshop), but can give a performance increase of over 15% for certain scenes using random walk SSS (e.g. monster).
Pull Request: https://projects.blender.org/blender/blender/pulls/121397
2024-05-10 16:38:02 +02:00
[[intersection(triangle, triangle_data, curve_data)]] PrimitiveIntersectionResult
__intersection__local_tri_single_hit(
    ray_data MetalKernelContext::MetalRTIntersectionLocalPayload_single_hit &payload [[payload]],
    uint primitive_id [[primitive_id]])
{
  PrimitiveIntersectionResult result;
  result.continue_search = true;
  result.accept = (payload.self_prim != primitive_id);
  return result;
}

[[intersection(triangle,
               triangle_data,
               curve_data,
               METALRT_TAGS,
               extended_limits)]] PrimitiveIntersectionResult
__intersection__local_tri_single_hit_mblur(
    ray_data MetalKernelContext::MetalRTIntersectionLocalPayload_single_hit &payload [[payload]],
2024-12-03 20:24:36 +01:00
# if defined(__METALRT_MOTION__)
    uint object [[instance_id]],
# endif
    uint primitive_id [[primitive_id]])
{
  PrimitiveIntersectionResult result;
  result.continue_search = true;
2024-12-03 20:24:36 +01:00
# if defined(__METALRT_MOTION__)
  result.accept = (payload.self_prim != primitive_id) && (payload.self_object == object);
# else
  result.accept = (payload.self_prim != primitive_id);
2024-12-03 20:24:36 +01:00
# endif
  return result;
}

2022-07-25 21:16:34 +02:00
template<typename TReturn, uint intersection_type>
TReturn metalrt_local_hit(ray_data MetalKernelContext::MetalRTIntersectionLocalPayload &payload,
2023-09-13 16:02:49 +02:00
                          const uint prim,
2022-07-25 21:16:34 +02:00
                          const float2 barycentrics,
                          const float ray_tmax)
{
  TReturn result;

2023-09-13 11:03:43 -07:00
# ifdef __BVH_LOCAL__
  if (payload.self_prim == prim) {
2023-09-14 13:25:24 +10:00
    /* Only intersect with matching object and skip self-intersection. */
2022-07-25 21:16:34 +02:00
    result.accept = false;
    result.continue_search = true;
    return result;
  }

  const short max_hits = payload.max_hits;
  if (max_hits == 0) {
    /* Special case for when no hit information is requested, just report that something was hit */
    result.accept = true;
    result.continue_search = false;
    return result;
  }

  int hit = 0;
  if (payload.has_lcg_state) {
    for (short i = min(max_hits, short(payload.num_hits)) - 1; i >= 0; --i) {
      if (ray_tmax == payload.hit_t[i]) {
2022-07-25 21:16:34 +02:00
        result.accept = false;
        result.continue_search = true;
        return result;
      }
    }

    hit = payload.num_hits;
    if (hit < max_hits) {
      payload.num_hits++;
    }
    else {
      hit = lcg_step_uint(&payload.lcg_state) % payload.num_hits;
2022-07-25 21:16:34 +02:00
      if (hit >= max_hits) {
        result.accept = false;
        result.continue_search = true;
        return result;
      }
    }
  }
  else {
Cycles: MetalRT optimisations (scene_intersect_shadow + random_walk)
This PR contains optimisations and a general tidy-up of the MetalRT backend.
- Currently `scene_intersect` is used for both normal and (opaque) shadow rays, however the usage patterns are different enough to warrant specialisation. Shadow intersection tests (flagged with `PATH_RAY_SHADOW_OPAQUE`) only need a bool result, but need a larger "self" payload in order to exclude hits against target lights. By specialising we can minimise the payload size in each case (which is helps performance) and avoid some dynamic branching. This PR introduces a new `scene_intersect_shadow` function which is specialised in Metal, and currently redirects to `scene_intersect` in the other backends.
- Currently `scene_intersect_local` is implemented for worst-case payload requirements as demanded by `subsurface_disk` (where `max_hits` is 4). The random_walk case only demands 1 hit result which we can retrieve directly from the intersector object (rather than stashing it in the payload). By specialising, we significantly reduce the payload size for random_walk queries, which has a big impact on performance. Additionally, we only need to use a custom intersection function for the first ray test in a random walk (for self-primitive filtering), so this PR forces faster `opaque` intersection testing for all but the first random walk test.
- Currently `scene_intersect_volume` has a lot of redundant code to handle non-triangle primitives despite volumes only being enclosed by trimeshes. This PR removes this code.
Additionally, this PR tidies up the convoluted intersection function linking code, removes some redundant intersection handlers, and uses more consistent naming of intersection functions.
On an M3 MacBook Pro, these changes give a 2-3% performance increase on typical scenes with opaque trimesh materials (e.g. barbershop, classroom, junkshop), but can give over 15% performance increase for certain scenes using random walk SSS (e.g. monster).
Pull Request: https://projects.blender.org/blender/blender/pulls/121397
2024-05-10 16:38:02 +02:00
|
|
|
if (payload.num_hits && ray_tmax > payload.hit_t[0]) {
|
2022-07-25 21:16:34 +02:00
|
|
|
/* Record closest intersection only. Do not terminate ray here, since there is no guarantee
|
|
|
|
|
* about distance ordering in any-hit */
|
|
|
|
|
result.accept = false;
|
|
|
|
|
result.continue_search = true;
|
|
|
|
|
return result;
|
|
|
|
|
}
|
|
|
|
|
|
2024-05-10 16:38:02 +02:00
|
|
|
payload.num_hits = 1;
|
2022-07-25 21:16:34 +02:00
|
|
|
}
|
|
|
|
|
|
2024-05-10 16:38:02 +02:00
|
|
|
payload.hit_prim[hit] = prim;
|
|
|
|
|
payload.hit_t[hit] = ray_tmax;
|
|
|
|
|
payload.hit_u[hit] = barycentrics.x;
|
|
|
|
|
payload.hit_v[hit] = barycentrics.y;
|
2022-07-25 21:16:34 +02:00
|
|
|
|
|
|
|
|
/* Continue tracing (without this the trace call would return after the first hit) */
|
|
|
|
|
result.accept = false;
|
|
|
|
|
result.continue_search = true;
|
2023-09-04 16:44:27 +02:00
|
|
|
# endif
|
2024-05-10 16:38:02 +02:00
|
|
|
return result;
|
2022-07-25 21:16:34 +02:00
|
|
|
}
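The slot selection performed by `metalrt_local_hit` above can be illustrated on the CPU. The following is a minimal C++ sketch, not the Metal code itself: `LocalPayload`, `lcg_step`, `record_local_hit`, and the `multi_hit` flag are hypothetical names, the LCG constants are an assumption, and the condition that selects between the two branches lies outside this excerpt.

```cpp
#include <cassert>
#include <cstdint>

constexpr uint32_t kMaxLocalHits = 4;  // assumed capacity (the subsurface_disk case)

// Hypothetical stand-in for MetalRTIntersectionLocalPayload.
struct LocalPayload {
  uint32_t num_hits = 0;
  uint32_t lcg_state = 0;
  uint32_t hit_prim[kMaxLocalHits] = {};
  float hit_t[kMaxLocalHits] = {};
};

// Stand-in linear congruential generator (constants are an assumption).
inline uint32_t lcg_step(uint32_t &state)
{
  state = 1103515245u * state + 12345u;
  return state;
}

// Returns true if the hit was stored, false if it was rejected.
// In both cases the caller is expected to continue the search.
inline bool record_local_hit(
    LocalPayload &p, uint32_t max_hits, bool multi_hit, uint32_t prim, float t)
{
  assert(max_hits >= 1 && max_hits <= kMaxLocalHits);
  uint32_t slot = 0;
  if (multi_hit) {
    slot = p.num_hits;
    if (slot < max_hits) {
      /* Free slot available: append. */
      p.num_hits++;
    }
    else {
      /* All slots used: pick a pseudo-random slot to overwrite. */
      slot = lcg_step(p.lcg_state) % p.num_hits;
      if (slot >= max_hits) {
        return false;
      }
    }
  }
  else {
    /* Record the closest intersection only. */
    if (p.num_hits && t > p.hit_t[0]) {
      return false;
    }
    p.num_hits = 1;
  }
  p.hit_prim[slot] = prim;
  p.hit_t[slot] = t;
  return true;
}
```

With `multi_hit` set, the first `max_hits` hits fill the slots in order and later hits overwrite a pseudo-randomly chosen slot. Otherwise only the closest hit seen so far is kept in slot 0, regardless of the order in which hits are reported.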
|
|
|
|
|
|
2023-09-13 16:02:49 +02:00
|
|
|
[[intersection(triangle, triangle_data, curve_data)]] PrimitiveIntersectionResult
|
2024-05-10 16:38:02 +02:00
|
|
|
__intersection__local_tri(ray_data MetalKernelContext::MetalRTIntersectionLocalPayload &payload
|
|
|
|
|
[[payload]],
|
|
|
|
|
uint primitive_id [[primitive_id]],
|
|
|
|
|
float2 barycentrics [[barycentric_coord]],
|
|
|
|
|
float ray_tmax [[distance]])
|
2023-02-06 19:09:51 +00:00
|
|
|
{
|
2023-09-06 14:23:01 +10:00
|
|
|
/* instance_id (aka the user_id) has been removed. When this function is taken, SSS traversal
|
|
|
|
|
* has been optimized to start from a primitive acceleration structure instead of the root of the
|
|
|
|
|
* global AS. This means we will always be intersecting the correct object, so there is no need
|
|
|
|
|
* to check the user-id. */
|
2023-09-13 16:02:49 +02:00
|
|
|
return metalrt_local_hit<PrimitiveIntersectionResult, METALRT_HIT_TRIANGLE>(
|
2024-05-10 16:38:02 +02:00
|
|
|
payload, primitive_id, barycentrics, ray_tmax);
|
2023-02-06 19:09:51 +00:00
|
|
|
}
|
2024-05-10 16:38:02 +02:00
|
|
|
|
2023-09-13 11:03:43 -07:00
|
|
|
[[intersection(triangle,
|
|
|
|
|
triangle_data,
|
|
|
|
|
curve_data,
|
|
|
|
|
METALRT_TAGS,
|
|
|
|
|
extended_limits)]] PrimitiveIntersectionResult
|
2024-05-10 16:38:02 +02:00
|
|
|
__intersection__local_tri_mblur(
|
2022-07-25 21:16:34 +02:00
|
|
|
ray_data MetalKernelContext::MetalRTIntersectionLocalPayload &payload [[payload]],
|
|
|
|
|
uint primitive_id [[primitive_id]],
|
2024-12-03 20:24:36 +01:00
|
|
|
# if defined(__METALRT_MOTION__)
|
|
|
|
|
uint object [[instance_id]],
|
|
|
|
|
# endif
|
2022-07-25 21:16:34 +02:00
|
|
|
float2 barycentrics [[barycentric_coord]],
|
|
|
|
|
float ray_tmax [[distance]])
|
|
|
|
|
{
|
2024-12-03 20:24:36 +01:00
|
|
|
# if defined(__METALRT_MOTION__)
|
|
|
|
|
if (payload.self_object != object) {
|
|
|
|
|
PrimitiveIntersectionResult result;
|
|
|
|
|
result.continue_search = true;
|
|
|
|
|
result.accept = false;
|
|
|
|
|
return result;
|
|
|
|
|
}
|
|
|
|
|
# endif
|
|
|
|
|
|
2023-09-13 16:02:49 +02:00
|
|
|
return metalrt_local_hit<PrimitiveIntersectionResult, METALRT_HIT_TRIANGLE>(
|
2024-05-10 16:38:02 +02:00
|
|
|
payload, primitive_id, barycentrics, ray_tmax);
|
2023-02-06 19:09:51 +00:00
|
|
|
}
|
|
|
|
|
|
2022-07-25 21:16:34 +02:00
|
|
|
template<uint intersection_type>
|
2024-05-10 16:38:02 +02:00
|
|
|
bool metalrt_shadow_all_hit(
|
|
|
|
|
constant KernelParamsMetal &launch_params_metal,
|
|
|
|
|
ray_data MetalKernelContext::MetalRTIntersectionShadowAllPayload &payload,
|
|
|
|
|
uint object,
|
|
|
|
|
uint prim,
|
|
|
|
|
const float2 barycentrics,
|
|
|
|
|
const float ray_tmax,
|
|
|
|
|
const float t = 0.0f,
|
2024-12-29 17:32:00 +01:00
|
|
|
const ccl_private Ray *ray = nullptr)
|
2022-07-25 21:16:34 +02:00
|
|
|
{
|
2023-09-13 11:03:43 -07:00
|
|
|
# ifdef __SHADOW_RECORD_ALL__
|
2023-09-13 16:02:49 +02:00
|
|
|
float u = barycentrics.x;
|
|
|
|
|
float v = barycentrics.y;
|
2022-11-14 15:35:47 +00:00
|
|
|
const int prim_type = kernel_data_fetch(objects, object).primitive_type;
|
2023-09-13 16:02:49 +02:00
|
|
|
int type;
|
|
|
|
|
|
2023-09-13 11:03:43 -07:00
|
|
|
# ifdef __HAIR__
|
2023-09-13 16:02:49 +02:00
|
|
|
if constexpr (intersection_type == METALRT_HIT_CURVE) {
|
|
|
|
|
const KernelCurveSegment segment = kernel_data_fetch(curve_segments, prim);
|
|
|
|
|
type = segment.type;
|
|
|
|
|
prim = segment.prim;
|
2023-09-13 11:03:43 -07:00
|
|
|
|
2023-09-13 16:02:49 +02:00
|
|
|
/* Filter out curve end-caps. */
|
|
|
|
|
if (u == 0.0f || u == 1.0f) {
|
|
|
|
|
/* continue search */
|
|
|
|
|
return true;
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
if (type & PRIMITIVE_CURVE_RIBBON) {
|
|
|
|
|
MetalKernelContext context(launch_params_metal);
|
2024-12-26 17:53:55 +01:00
|
|
|
if (!context.curve_ribbon_accept(nullptr, u, t, ray, object, prim, type)) {
|
2022-11-14 15:35:47 +00:00
|
|
|
/* continue search */
|
|
|
|
|
return true;
|
|
|
|
|
}
|
2022-07-25 21:16:34 +02:00
|
|
|
}
|
|
|
|
|
}
|
2023-09-04 16:44:27 +02:00
|
|
|
# endif
|
2022-07-25 21:16:34 +02:00
|
|
|
|
2023-09-13 16:02:49 +02:00
|
|
|
if constexpr (intersection_type == METALRT_HIT_BOUNDING_BOX) {
|
|
|
|
|
/* Point. */
|
|
|
|
|
type = kernel_data_fetch(objects, object).primitive_type;
|
|
|
|
|
u = 0.0f;
|
|
|
|
|
v = 0.0f;
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
if constexpr (intersection_type == METALRT_HIT_TRIANGLE) {
|
|
|
|
|
type = prim_type;
|
|
|
|
|
}
|
|
|
|
|
|
2023-09-05 17:21:49 +02:00
|
|
|
MetalKernelContext context(launch_params_metal);
|
|
|
|
|
|
|
|
|
|
if (context.intersection_skip_self_shadow(payload.self, object, prim)) {
|
2022-08-05 14:35:39 +02:00
|
|
|
/* continue search */
|
|
|
|
|
return true;
|
|
|
|
|
}
|
|
|
|
|
|
2023-09-06 15:25:30 +02:00
|
|
|
# ifdef __SHADOW_LINKING__
|
|
|
|
|
if (context.intersection_skip_shadow_link(nullptr, payload.self, object)) {
|
|
|
|
|
/* continue search */
|
|
|
|
|
return true;
|
|
|
|
|
}
|
|
|
|
|
# endif
|
|
|
|
|
|
2023-09-04 16:44:27 +02:00
|
|
|
# ifndef __TRANSPARENT_SHADOWS__
|
2022-07-25 21:16:34 +02:00
|
|
|
/* No transparent shadows support compiled in, make opaque. */
|
|
|
|
|
payload.result = true;
|
|
|
|
|
/* terminate ray */
|
|
|
|
|
return false;
|
2023-09-04 16:44:27 +02:00
|
|
|
# else
|
2022-07-25 21:16:34 +02:00
|
|
|
short max_hits = payload.max_hits;
|
|
|
|
|
short num_hits = payload.num_hits;
|
|
|
|
|
short num_recorded_hits = payload.num_recorded_hits;
|
|
|
|
|
|
|
|
|
|
/* If no transparent shadows, all light is blocked and we can stop immediately. */
|
|
|
|
|
if (num_hits >= max_hits ||
|
2024-12-26 17:53:55 +01:00
|
|
|
!(context.intersection_get_shader_flags(nullptr, prim, type) & SD_HAS_TRANSPARENT_SHADOW))
|
2023-09-04 16:44:27 +02:00
|
|
|
{
|
2022-07-25 21:16:34 +02:00
|
|
|
payload.result = true;
|
|
|
|
|
/* terminate ray */
|
|
|
|
|
return false;
|
|
|
|
|
}
|
|
|
|
|
|
2023-09-13 11:03:43 -07:00
|
|
|
# ifdef __HAIR__
|
2022-07-25 21:16:34 +02:00
|
|
|
/* Always use baked shadow transparency for curves. */
|
2023-09-13 16:02:49 +02:00
|
|
|
if constexpr (intersection_type == METALRT_HIT_CURVE) {
|
2022-07-25 21:16:34 +02:00
|
|
|
float throughput = payload.throughput;
|
2022-10-20 04:38:50 +02:00
|
|
|
throughput *= context.intersection_curve_shadow_transparency(nullptr, object, prim, type, u);
|
2022-07-25 21:16:34 +02:00
|
|
|
payload.throughput = throughput;
|
|
|
|
|
payload.num_hits += 1;
|
|
|
|
|
|
|
|
|
|
if (throughput < CURVE_SHADOW_TRANSPARENCY_CUTOFF) {
|
|
|
|
|
/* Accept result and terminate if throughput is sufficiently low */
|
|
|
|
|
payload.result = true;
|
|
|
|
|
return false;
|
|
|
|
|
}
|
|
|
|
|
else {
|
|
|
|
|
return true;
|
|
|
|
|
}
|
|
|
|
|
}
|
2023-09-13 11:03:43 -07:00
|
|
|
# endif
|
2022-07-25 21:16:34 +02:00
|
|
|
|
|
|
|
|
payload.num_hits += 1;
|
|
|
|
|
payload.num_recorded_hits += 1;
|
|
|
|
|
|
|
|
|
|
uint record_index = num_recorded_hits;
|
|
|
|
|
|
|
|
|
|
const IntegratorShadowState state = payload.state;
|
|
|
|
|
|
|
|
|
|
const uint max_record_hits = min(uint(max_hits), INTEGRATOR_SHADOW_ISECT_SIZE);
|
|
|
|
|
if (record_index >= max_record_hits) {
|
|
|
|
|
/* If maximum number of hits reached, find a hit to replace. */
|
|
|
|
|
float max_recorded_t = INTEGRATOR_STATE_ARRAY(state, shadow_isect, 0, t);
|
|
|
|
|
uint max_recorded_hit = 0;
|
|
|
|
|
|
|
|
|
|
for (int i = 1; i < max_record_hits; i++) {
|
|
|
|
|
const float isect_t = INTEGRATOR_STATE_ARRAY(state, shadow_isect, i, t);
|
|
|
|
|
if (isect_t > max_recorded_t) {
|
|
|
|
|
max_recorded_t = isect_t;
|
|
|
|
|
max_recorded_hit = i;
|
|
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
if (ray_tmax >= max_recorded_t) {
|
2023-03-29 20:20:07 +02:00
|
|
|
/* Ray hits are not guaranteed to be ordered by distance so don't exit early here.
|
|
|
|
|
* Continue search. */
|
2022-07-25 21:16:34 +02:00
|
|
|
return true;
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
record_index = max_recorded_hit;
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
INTEGRATOR_STATE_ARRAY_WRITE(state, shadow_isect, record_index, u) = u;
|
|
|
|
|
INTEGRATOR_STATE_ARRAY_WRITE(state, shadow_isect, record_index, v) = v;
|
|
|
|
|
INTEGRATOR_STATE_ARRAY_WRITE(state, shadow_isect, record_index, t) = ray_tmax;
|
|
|
|
|
INTEGRATOR_STATE_ARRAY_WRITE(state, shadow_isect, record_index, prim) = prim;
|
|
|
|
|
INTEGRATOR_STATE_ARRAY_WRITE(state, shadow_isect, record_index, object) = object;
|
|
|
|
|
INTEGRATOR_STATE_ARRAY_WRITE(state, shadow_isect, record_index, type) = type;
|
2023-09-04 16:44:27 +02:00
|
|
|
|
2022-07-25 21:16:34 +02:00
|
|
|
/* Continue tracing. */
|
2023-09-04 16:44:27 +02:00
|
|
|
# endif /* __TRANSPARENT_SHADOWS__ */
|
|
|
|
|
# endif /* __SHADOW_RECORD_ALL__ */
|
2022-07-25 21:16:34 +02:00
|
|
|
|
|
|
|
|
return true;
|
|
|
|
|
}
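The record-replacement step in `metalrt_shadow_all_hit` above keeps only the closest hits once the record buffer is full. The following is a minimal C++ sketch of that idea under stated assumptions: `ShadowHit` and `record_closest_hit` are hypothetical names, and a `std::vector` stands in for the `shadow_isect` integrator state array.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical, reduced record; the kernel also stores u, v, object and type.
struct ShadowHit {
  float t;
  int prim;
};

// Returns true if the hit was recorded. The search continues either way,
// because hits are not guaranteed to arrive ordered by distance.
inline bool record_closest_hit(std::vector<ShadowHit> &hits, std::size_t capacity, ShadowHit hit)
{
  assert(capacity > 0);
  if (hits.size() < capacity) {
    /* Free slot available: append. */
    hits.push_back(hit);
    return true;
  }
  /* Buffer full: find the farthest recorded hit. */
  std::size_t farthest = 0;
  for (std::size_t i = 1; i < hits.size(); i++) {
    if (hits[i].t > hits[farthest].t) {
      farthest = i;
    }
  }
  /* The new hit is no closer than anything recorded: skip it. */
  if (hit.t >= hits[farthest].t) {
    return false;
  }
  hits[farthest] = hit;
  return true;
}
```

The linear scan mirrors the loop in the kernel. It is a reasonable choice when the capacity is small, as it is here, since a heap or sorted insert would add bookkeeping for little gain.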
|
|
|
|
|
|
2023-09-13 11:03:43 -07:00
|
|
|
[[intersection(triangle,
|
|
|
|
|
triangle_data,
|
|
|
|
|
curve_data,
|
|
|
|
|
METALRT_TAGS,
|
|
|
|
|
extended_limits)]] PrimitiveIntersectionResult
|
Cycles: MetalRT optimisations (scene_intersect_shadow + random_walk)
This PR contains optimisations and a general tidy-up of the MetalRT backend.
- Currently `scene_intersect` is used for both normal and (opaque) shadow rays, however the usage patterns are different enough to warrant specialisation. Shadow intersection tests (flagged with `PATH_RAY_SHADOW_OPAQUE`) only need a bool result, but need a larger "self" payload in order to exclude hits against target lights. By specialising we can minimise the payload size in each case (which is helps performance) and avoid some dynamic branching. This PR introduces a new `scene_intersect_shadow` function which is specialised in Metal, and currently redirects to `scene_intersect` in the other backends.
- Currently `scene_intersect_local` is implemented for worst-case payload requirements as demanded by `subsurface_disk` (where `max_hits` is 4). The random_walk case only demands 1 hit result which we can retrieve directly from the intersector object (rather than stashing it in the payload). By specialising, we significantly reduce the payload size for random_walk queries, which has a big impact on performance. Additionally, we only need to use a custom intersection function for the first ray test in a random walk (for self-primitive filtering), so this PR forces faster `opaque` intersection testing for all but the first random walk test.
- Currently `scene_intersect_volume` has a lot of redundant code to handle non-triangle primitives despite volumes only being enclosed by trimeshes. This PR removes this code.
Additionally, this PR tidies up the convoluted intersection function linking code, removes some redundant intersection handlers, and uses more consistent naming of intersection functions.
On a M3 MacBook Pro, these changes give 2-3% performance increase on typical scenes with opaque trimesh materials (e.g. barbershop, classroom junkshop), but can give over 15% performance increase for certain scenes using random walk SSS (e.g. monster).
Pull Request: https://projects.blender.org/blender/blender/pulls/121397
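The payload specialisation described in this commit message can be illustrated with a minimal host-side C++ sketch. It is not the Cycles or MetalRT code: the struct layouts, field names (`light_object`, `light_prim`) and the helpers `regular_hit`, `shadow_hit` and `example_usage` are hypothetical, chosen only to show why a shadow query carries a larger "self" payload yet can stop at the first accepted hit, while a regular query carries less state but keeps searching.

```cpp
// Hypothetical sketch: none of these names are the real Cycles/MetalRT API.
#include <cassert>
#include <cstdint>

// Decision returned by a custom intersection callback, mirroring the
// accept / continue_search pair used by the handlers in this file.
struct HitDecision {
  bool accept;
  bool continue_search;
};

// Assumed layout: a regular ray only needs to skip the primitive it left.
struct RegularPayload {
  std::uint32_t self_object;
  std::uint32_t self_prim;
};

// Assumed layout: an opaque shadow ray must also skip the target light,
// so its "self" data is larger.
struct ShadowPayload {
  std::uint32_t self_object;
  std::uint32_t self_prim;
  std::uint32_t light_object;
  std::uint32_t light_prim;
};

static_assert(sizeof(RegularPayload) < sizeof(ShadowPayload),
              "the specialised regular payload should be the smaller one");

// Regular rays: reject the originating primitive, otherwise accept the hit
// and keep searching so that a closer hit can still be found.
inline HitDecision regular_hit(const RegularPayload &p,
                               std::uint32_t object,
                               std::uint32_t prim)
{
  const bool is_self = (p.self_object == object && p.self_prim == prim);
  return {!is_self, true};
}

// Opaque shadow rays: only a yes/no answer is needed, so the first hit that
// is neither the origin nor the target light ends the search.
inline HitDecision shadow_hit(const ShadowPayload &p,
                              std::uint32_t object,
                              std::uint32_t prim)
{
  const bool is_self = (p.self_object == object && p.self_prim == prim);
  const bool is_light = (p.light_object == object && p.light_prim == prim);
  if (is_self || is_light) {
    return {false, true};
  }
  return {true, false};
}

// Small usage example exercising both paths.
inline void example_usage()
{
  const RegularPayload regular{1, 10};
  const ShadowPayload shadow{1, 10, 2, 20};

  assert(!regular_hit(regular, 1, 10).accept);          // self hit is skipped
  assert(regular_hit(regular, 1, 11).continue_search);  // closest-hit search goes on
  assert(!shadow_hit(shadow, 2, 20).accept);            // target light is skipped
  assert(!shadow_hit(shadow, 3, 5).continue_search);    // any occluder terminates
}
```

Splitting the two cases into separate payload types, instead of branching on a ray flag inside one handler, is the design choice the commit message motivates: each query pays only for the state it actually reads.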
2024-05-10 16:38:02 +02:00
|
|
|
__intersection__tri_shadow_all(
|
2022-07-25 21:16:34 +02:00
|
|
|
constant KernelParamsMetal &launch_params_metal [[buffer(1)]],
|
2024-05-10 16:38:02 +02:00
|
|
|
ray_data MetalKernelContext::MetalRTIntersectionShadowAllPayload &payload [[payload]],
|
2023-09-13 16:02:49 +02:00
|
|
|
const unsigned int object [[instance_id]],
|
|
|
|
|
const unsigned int primitive_id [[primitive_id]],
|
|
|
|
|
const uint primitive_id_offset [[user_instance_id]],
|
|
|
|
|
const float2 barycentrics [[barycentric_coord]],
|
|
|
|
|
const float ray_tmax [[distance]])
|
2022-07-25 21:16:34 +02:00
|
|
|
{
|
2023-09-13 16:02:49 +02:00
|
|
|
uint prim = primitive_id + primitive_id_offset;
|
2022-07-25 21:16:34 +02:00
|
|
|
|
2023-09-13 16:02:49 +02:00
|
|
|
PrimitiveIntersectionResult result;
|
2022-07-25 21:16:34 +02:00
|
|
|
result.continue_search = metalrt_shadow_all_hit<METALRT_HIT_TRIANGLE>(
|
|
|
|
|
launch_params_metal, payload, object, prim, barycentrics, ray_tmax);
|
|
|
|
|
result.accept = !result.continue_search;
|
|
|
|
|
return result;
|
|
|
|
|
}
|
|
|
|
|
|
2023-09-25 22:41:27 +02:00
|
|
|
[[intersection(triangle,
|
|
|
|
|
triangle_data,
|
|
|
|
|
curve_data,
|
|
|
|
|
METALRT_TAGS,
|
|
|
|
|
extended_limits)]] PrimitiveIntersectionResult
|
2024-05-10 16:38:02 +02:00
|
|
|
__intersection__volume_tri(constant KernelParamsMetal &launch_params_metal [[buffer(1)]],
|
|
|
|
|
ray_data MetalKernelContext::MetalRTIntersectionShadowPayload &payload
|
|
|
|
|
[[payload]],
|
|
|
|
|
const unsigned int object [[instance_id]],
|
|
|
|
|
const unsigned int primitive_id [[primitive_id]],
|
|
|
|
|
const uint primitive_id_offset [[user_instance_id]])
|
2023-09-25 22:41:27 +02:00
|
|
|
{
|
|
|
|
|
PrimitiveIntersectionResult result;
|
|
|
|
|
result.continue_search = true;
|
|
|
|
|
|
|
|
|
|
if ((kernel_data_fetch(object_flag, object) & SD_OBJECT_HAS_VOLUME) == 0) {
|
|
|
|
|
result.accept = false;
|
|
|
|
|
return result;
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
uint prim = primitive_id + primitive_id_offset;
|
|
|
|
|
MetalKernelContext context(launch_params_metal);
|
|
|
|
|
if (context.intersection_skip_self(payload.self, object, prim)) {
|
|
|
|
|
result.accept = false;
|
|
|
|
|
return result;
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
result.accept = true;
|
|
|
|
|
return result;
|
|
|
|
|
}
|
|
|
|
|
|
2022-07-25 21:16:34 +02:00
|
|
|
template<typename TReturnType, uint intersection_type>
|
|
|
|
|
inline TReturnType metalrt_visibility_test(
|
|
|
|
|
constant KernelParamsMetal &launch_params_metal,
|
|
|
|
|
ray_data MetalKernelContext::MetalRTIntersectionPayload &payload,
|
|
|
|
|
const uint object,
|
2022-08-13 16:53:30 +02:00
|
|
|
uint prim,
|
2023-09-13 16:02:49 +02:00
|
|
|
const float u,
|
|
|
|
|
const float t = 0.0f,
|
2024-12-29 17:32:00 +01:00
|
|
|
const ccl_private Ray *ray = nullptr)
|
2022-07-25 21:16:34 +02:00
|
|
|
{
|
|
|
|
|
TReturnType result;
|
|
|
|
|
|
2023-09-13 11:03:43 -07:00
|
|
|
# ifdef __HAIR__
|
2023-09-13 16:02:49 +02:00
|
|
|
if constexpr (intersection_type == METALRT_HIT_CURVE) {
|
2023-09-08 16:58:00 +10:00
|
|
|
/* Filter out curve end-caps. */
|
2022-07-25 21:16:34 +02:00
|
|
|
if (u == 0.0f || u == 1.0f) {
|
|
|
|
|
result.accept = false;
|
|
|
|
|
result.continue_search = true;
|
|
|
|
|
return result;
|
|
|
|
|
}
|
2023-09-13 11:03:43 -07:00
|
|
|
|
2023-09-13 16:02:49 +02:00
|
|
|
const KernelCurveSegment segment = kernel_data_fetch(curve_segments, prim);
|
|
|
|
|
int type = segment.type;
|
|
|
|
|
prim = segment.prim;
|
|
|
|
|
|
|
|
|
|
if (type & PRIMITIVE_CURVE_RIBBON) {
|
|
|
|
|
MetalKernelContext context(launch_params_metal);
|
2024-12-26 17:53:55 +01:00
|
|
|
if (!context.curve_ribbon_accept(nullptr, u, t, ray, object, prim, type)) {
|
2023-09-13 16:02:49 +02:00
|
|
|
result.accept = false;
|
|
|
|
|
result.continue_search = true;
|
|
|
|
|
return result;
|
|
|
|
|
}
|
|
|
|
|
}
|
2022-07-25 21:16:34 +02:00
|
|
|
}
|
2023-09-04 16:44:27 +02:00
|
|
|
# endif
|
2022-07-25 21:16:34 +02:00
|
|
|
|
2024-05-10 16:38:02 +02:00
|
|
|
if (payload.self_object == object && payload.self_prim == prim) {
|
|
|
|
|
result.accept = false;
|
|
|
|
|
result.continue_search = true;
|
|
|
|
|
return result;
|
|
|
|
|
}
|
|
|
|
|
result.accept = true;
|
|
|
|
|
result.continue_search = true;
|
|
|
|
|
return result;
|
|
|
|
|
}
|
2022-08-05 14:35:39 +02:00
|
|
|
|
2024-05-10 16:38:02 +02:00
|
|
|
template<typename TReturnType, uint intersection_type>
|
|
|
|
|
inline TReturnType metalrt_visibility_test_shadow(
|
|
|
|
|
constant KernelParamsMetal &launch_params_metal,
|
|
|
|
|
ray_data MetalKernelContext::MetalRTIntersectionShadowPayload &payload,
|
|
|
|
|
const uint object,
|
|
|
|
|
uint prim,
|
|
|
|
|
const float u,
|
|
|
|
|
const float t = 0.0f,
|
2024-12-29 17:32:00 +01:00
|
|
|
const ccl_private Ray *ray = nullptr)
|
2024-05-10 16:38:02 +02:00
|
|
|
{
|
|
|
|
|
TReturnType result;
|
2023-09-05 17:21:49 +02:00
|
|
|
|
2024-05-10 16:38:02 +02:00
|
|
|
# ifdef __HAIR__
|
|
|
|
|
if constexpr (intersection_type == METALRT_HIT_CURVE) {
|
|
|
|
|
/* Filter out curve end-caps. */
|
|
|
|
|
if (u == 0.0f || u == 1.0f) {
|
2023-09-06 15:25:30 +02:00
|
|
|
result.accept = false;
|
|
|
|
|
result.continue_search = true;
|
|
|
|
|
return result;
|
|
|
|
|
}
|
|
|
|
|
|
2024-05-10 16:38:02 +02:00
|
|
|
const KernelCurveSegment segment = kernel_data_fetch(curve_segments, prim);
|
|
|
|
|
int type = segment.type;
|
|
|
|
|
prim = segment.prim;
|
|
|
|
|
|
|
|
|
|
if (type & PRIMITIVE_CURVE_RIBBON) {
|
|
|
|
|
MetalKernelContext context(launch_params_metal);
|
2024-12-26 17:53:55 +01:00
|
|
|
if (!context.curve_ribbon_accept(nullptr, u, t, ray, object, prim, type)) {
|
2024-05-10 16:38:02 +02:00
|
|
|
result.accept = false;
|
|
|
|
|
result.continue_search = true;
|
|
|
|
|
return result;
|
|
|
|
|
}
|
2022-07-25 21:16:34 +02:00
|
|
|
}
|
|
|
|
|
}
|
2024-05-10 16:38:02 +02:00
|
|
|
# endif
|
|
|
|
|
|
|
|
|
|
MetalKernelContext context(launch_params_metal);
|
|
|
|
|
|
|
|
|
|
/* Shadow ray early termination. */
|
|
|
|
|
# ifdef __SHADOW_LINKING__
|
|
|
|
|
if (context.intersection_skip_shadow_link(nullptr, payload.self, object)) {
|
|
|
|
|
result.accept = false;
|
|
|
|
|
result.continue_search = true;
|
|
|
|
|
return result;
|
|
|
|
|
}
|
|
|
|
|
# endif
|
|
|
|
|
|
|
|
|
|
if (context.intersection_skip_self_shadow(payload.self, object, prim)) {
|
|
|
|
|
result.accept = false;
|
|
|
|
|
result.continue_search = true;
|
|
|
|
|
return result;
|
|
|
|
|
}
|
2022-07-25 21:16:34 +02:00
|
|
|
else {
|
2024-05-10 16:38:02 +02:00
|
|
|
result.accept = true;
|
|
|
|
|
result.continue_search = false;
|
|
|
|
|
return result;
|
2022-07-25 21:16:34 +02:00
|
|
|
}
|
|
|
|
|
|
|
|
|
|
result.accept = true;
|
|
|
|
|
result.continue_search = true;
|
|
|
|
|
return result;
|
|
|
|
|
}
|
|
|
|
|
|
2023-09-13 11:03:43 -07:00
|
|
|
[[intersection(triangle,
|
|
|
|
|
triangle_data,
|
|
|
|
|
curve_data,
|
|
|
|
|
METALRT_TAGS,
|
|
|
|
|
extended_limits)]] PrimitiveIntersectionResult
|
2024-05-10 16:38:02 +02:00
|
|
|
__intersection__tri(constant KernelParamsMetal &launch_params_metal [[buffer(1)]],
|
|
|
|
|
ray_data MetalKernelContext::MetalRTIntersectionPayload &payload [[payload]],
|
|
|
|
|
const unsigned int object [[instance_id]],
|
|
|
|
|
const uint primitive_id_offset [[user_instance_id]],
|
|
|
|
|
const unsigned int primitive_id [[primitive_id]])
|
2022-07-25 21:16:34 +02:00
|
|
|
{
|
Cycles: MetalRT optimisations (scene_intersect_shadow + random_walk)
This PR contains optimisations and a general tidy-up of the MetalRT backend.
- Currently `scene_intersect` is used for both normal and (opaque) shadow rays, however the usage patterns are different enough to warrant specialisation. Shadow intersection tests (flagged with `PATH_RAY_SHADOW_OPAQUE`) only need a bool result, but need a larger "self" payload in order to exclude hits against target lights. By specialising we can minimise the payload size in each case (which is helps performance) and avoid some dynamic branching. This PR introduces a new `scene_intersect_shadow` function which is specialised in Metal, and currently redirects to `scene_intersect` in the other backends.
- Currently `scene_intersect_local` is implemented for worst-case payload requirements as demanded by `subsurface_disk` (where `max_hits` is 4). The random_walk case only demands 1 hit result which we can retrieve directly from the intersector object (rather than stashing it in the payload). By specialising, we significantly reduce the payload size for random_walk queries, which has a big impact on performance. Additionally, we only need to use a custom intersection function for the first ray test in a random walk (for self-primitive filtering), so this PR forces faster `opaque` intersection testing for all but the first random walk test.
- Currently `scene_intersect_volume` has a lot of redundant code to handle non-triangle primitives despite volumes only being enclosed by trimeshes. This PR removes this code.
Additionally, this PR tidies up the convoluted intersection function linking code, removes some redundant intersection handlers, and uses more consistent naming of intersection functions.
On an M3 MacBook Pro, these changes give a 2-3% performance increase on typical scenes with opaque trimesh materials (e.g. barbershop, classroom, junkshop), but can give over 15% performance increase for certain scenes using random walk SSS (e.g. monster).
Pull Request: https://projects.blender.org/blender/blender/pulls/121397
2024-05-10 16:38:02 +02:00
|
|
|
PrimitiveIntersectionResult result;
|
|
|
|
|
result.continue_search = true;
|
|
|
|
|
result.accept = (payload.self_object != object ||
|
|
|
|
|
payload.self_prim != (primitive_id + primitive_id_offset));
|
2022-07-25 21:16:34 +02:00
|
|
|
return result;
|
|
|
|
|
}
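The accept rule in `__intersection__tri` above can be sketched in plain C++ as follows. This is only an illustrative sketch: `SelfPayload`, `HitResult`, `filter_self_hit` and `filter_self_hit_example` are hypothetical stand-ins introduced here, not the Cycles or Metal types. It shows that a candidate hit is rejected only when both the instance and the global primitive index (`primitive_id + primitive_id_offset`) match the primitive recorded in the payload.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the "self" part of the ray payload (not the real Cycles struct).
struct SelfPayload {
  uint32_t self_object;
  uint32_t self_prim;
};

// Hypothetical stand-in for PrimitiveIntersectionResult.
struct HitResult {
  bool accept;
  bool continue_search;
};

// Mirrors the accept rule of __intersection__tri: the hit is rejected only when
// both the object and the global primitive index match the payload's "self" entry.
inline HitResult filter_self_hit(const SelfPayload &payload,
                                 uint32_t object,
                                 uint32_t primitive_id,
                                 uint32_t primitive_id_offset)
{
  // The per-instance offset turns the local primitive index into a global one.
  const uint32_t prim = primitive_id + primitive_id_offset;

  HitResult result;
  result.continue_search = true;
  result.accept = (payload.self_object != object || payload.self_prim != prim);
  return result;
}

// Usage example: only the exact (object, prim) pair stored in the payload is filtered out.
inline void filter_self_hit_example()
{
  const SelfPayload payload{3, 10};
  assert(!filter_self_hit(payload, 3, 4, 6).accept); // same object, 4 + 6 == 10: rejected
  assert(filter_self_hit(payload, 2, 4, 6).accept);  // different object: accepted
  assert(filter_self_hit(payload, 3, 5, 6).accept);  // different primitive: accepted
}
```

Because the test uses `||`, a hit on the same object but a different primitive is still accepted, so only the exact self-intersection is skipped.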
|
|
|
|
|
|
2024-05-10 16:38:02 +02:00
|
|
|
[[intersection(triangle,
|
2023-09-13 11:03:43 -07:00
|
|
|
triangle_data,
|
|
|
|
|
curve_data,
|
|
|
|
|
METALRT_TAGS,
|
2024-05-10 16:38:02 +02:00
|
|
|
extended_limits)]] PrimitiveIntersectionResult
|
|
|
|
|
__intersection__tri_shadow(constant KernelParamsMetal &launch_params_metal [[buffer(1)]],
|
|
|
|
|
ray_data MetalKernelContext::MetalRTIntersectionShadowPayload &payload
|
|
|
|
|
[[payload]],
|
|
|
|
|
const unsigned int object [[instance_id]],
|
|
|
|
|
const uint primitive_id_offset [[user_instance_id]],
|
|
|
|
|
const unsigned int primitive_id [[primitive_id]])
|
2022-07-25 21:16:34 +02:00
|
|
|
{
|
2024-05-10 16:38:02 +02:00
|
|
|
uint prim = primitive_id + primitive_id_offset;
|
|
|
|
|
PrimitiveIntersectionResult result =
|
|
|
|
|
metalrt_visibility_test_shadow<PrimitiveIntersectionResult, METALRT_HIT_TRIANGLE>(
|
|
|
|
|
launch_params_metal, payload, object, prim, 0.0f);
|
2022-07-25 21:16:34 +02:00
|
|
|
return result;
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
/* Primitive intersection functions. */
|
|
|
|
|
|
2023-09-13 11:03:43 -07:00
|
|
|
[[intersection(
|
|
|
|
|
curve, triangle_data, curve_data, METALRT_TAGS, extended_limits)]] PrimitiveIntersectionResult
|
2023-09-13 16:02:49 +02:00
|
|
|
__intersection__curve(constant KernelParamsMetal &launch_params_metal [[buffer(1)]],
|
2023-09-13 11:03:43 -07:00
|
|
|
ray_data MetalKernelContext::MetalRTIntersectionPayload &payload [[payload]],
|
2023-09-13 16:02:49 +02:00
|
|
|
const uint object [[instance_id]],
|
|
|
|
|
const uint primitive_id [[primitive_id]],
|
|
|
|
|
const uint primitive_id_offset [[user_instance_id]],
|
2023-09-13 11:03:43 -07:00
|
|
|
float distance [[distance]],
|
2023-09-13 16:02:49 +02:00
|
|
|
const float3 ray_P [[origin]],
|
|
|
|
|
const float3 ray_D [[direction]],
|
|
|
|
|
float u [[curve_parameter]],
|
|
|
|
|
const float ray_tmin [[min_distance]],
|
|
|
|
|
const float ray_tmax [[max_distance]]
|
2024-04-30 12:56:22 +02:00
|
|
|
# if defined(__METALRT_MOTION__)
|
2023-09-13 11:03:43 -07:00
|
|
|
,
|
|
|
|
|
const float time [[time]]
|
2024-04-30 12:56:22 +02:00
|
|
|
# endif
|
2023-09-13 11:03:43 -07:00
|
|
|
)
|
2022-07-25 21:16:34 +02:00
|
|
|
{
|
2023-09-13 16:02:49 +02:00
|
|
|
uint prim = primitive_id + primitive_id_offset;
|
2022-07-25 21:16:34 +02:00
|
|
|
|
2023-09-13 16:02:49 +02:00
|
|
|
Ray ray;
|
|
|
|
|
ray.P = ray_P;
|
|
|
|
|
ray.D = ray_D;
|
2024-04-30 12:56:22 +02:00
|
|
|
# if defined(__METALRT_MOTION__)
|
2023-09-13 16:02:49 +02:00
|
|
|
ray.time = time;
|
2024-04-30 12:56:22 +02:00
|
|
|
# endif
|
2022-07-25 21:16:34 +02:00
|
|
|
|
2023-09-13 16:02:49 +02:00
|
|
|
PrimitiveIntersectionResult result =
|
2023-09-13 11:03:43 -07:00
|
|
|
metalrt_visibility_test<PrimitiveIntersectionResult, METALRT_HIT_CURVE>(
|
|
|
|
|
launch_params_metal, payload, object, prim, u, distance, &ray);
|
2022-07-25 21:16:34 +02:00
|
|
|
|
|
|
|
|
return result;
|
|
|
|
|
}
|
|
|
|
|
|
2023-09-13 11:03:43 -07:00
|
|
|
[[intersection(
|
|
|
|
|
curve, triangle_data, curve_data, METALRT_TAGS, extended_limits)]] PrimitiveIntersectionResult
|
|
|
|
|
__intersection__curve_shadow(constant KernelParamsMetal &launch_params_metal [[buffer(1)]],
|
|
|
|
|
ray_data MetalKernelContext::MetalRTIntersectionShadowPayload &payload
|
|
|
|
|
[[payload]],
|
|
|
|
|
const uint object [[instance_id]],
|
|
|
|
|
const uint primitive_id [[primitive_id]],
|
|
|
|
|
const uint primitive_id_offset [[user_instance_id]],
|
2024-05-10 16:38:02 +02:00
|
|
|
float distance [[distance]],
|
2023-09-13 11:03:43 -07:00
|
|
|
const float3 ray_P [[origin]],
|
|
|
|
|
const float3 ray_D [[direction]],
|
|
|
|
|
float u [[curve_parameter]],
|
2024-05-10 16:38:02 +02:00
|
|
|
const float ray_tmin [[min_distance]],
|
|
|
|
|
const float ray_tmax [[max_distance]]
|
2024-04-30 12:56:22 +02:00
|
|
|
# if defined(__METALRT_MOTION__)
|
2024-05-10 16:38:02 +02:00
|
|
|
,
|
|
|
|
|
const float time [[time]]
|
2024-04-30 12:56:22 +02:00
|
|
|
# endif
|
2024-05-10 16:38:02 +02:00
|
|
|
)
|
|
|
|
|
{
|
|
|
|
|
uint prim = primitive_id + primitive_id_offset;
|
|
|
|
|
|
|
|
|
|
Ray ray;
|
|
|
|
|
ray.P = ray_P;
|
|
|
|
|
ray.D = ray_D;
|
|
|
|
|
# if defined(__METALRT_MOTION__)
|
|
|
|
|
ray.time = time;
|
|
|
|
|
# endif
|
|
|
|
|
|
|
|
|
|
PrimitiveIntersectionResult result =
|
|
|
|
|
metalrt_visibility_test_shadow<PrimitiveIntersectionResult, METALRT_HIT_CURVE>(
|
|
|
|
|
launch_params_metal, payload, object, prim, u, distance, &ray);
|
|
|
|
|
|
|
|
|
|
return result;
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
[[intersection(
|
|
|
|
|
curve, triangle_data, curve_data, METALRT_TAGS, extended_limits)]] PrimitiveIntersectionResult
|
|
|
|
|
__intersection__curve_shadow_all(
|
|
|
|
|
constant KernelParamsMetal &launch_params_metal [[buffer(1)]],
|
|
|
|
|
ray_data MetalKernelContext::MetalRTIntersectionShadowAllPayload &payload [[payload]],
|
|
|
|
|
const uint object [[instance_id]],
|
|
|
|
|
const uint primitive_id [[primitive_id]],
|
|
|
|
|
const uint primitive_id_offset [[user_instance_id]],
|
|
|
|
|
const float3 ray_P [[origin]],
|
|
|
|
|
const float3 ray_D [[direction]],
|
|
|
|
|
float u [[curve_parameter]],
|
|
|
|
|
float t [[distance]],
|
|
|
|
|
# if defined(__METALRT_MOTION__)
|
|
|
|
|
const float time [[time]],
|
|
|
|
|
# endif
|
|
|
|
|
const float ray_tmin [[min_distance]],
|
|
|
|
|
const float ray_tmax [[max_distance]])
|
2022-07-25 21:16:34 +02:00
|
|
|
{
|
2023-09-13 16:02:49 +02:00
|
|
|
uint prim = primitive_id + primitive_id_offset;
|
2022-07-25 21:16:34 +02:00
|
|
|
|
2023-09-13 16:02:49 +02:00
|
|
|
PrimitiveIntersectionResult result;
|
2022-07-25 21:16:34 +02:00
|
|
|
|
2023-09-13 16:02:49 +02:00
|
|
|
Ray ray;
|
|
|
|
|
ray.P = ray_P;
|
|
|
|
|
ray.D = ray_D;
|
2024-04-30 12:56:22 +02:00
|
|
|
# if defined(__METALRT_MOTION__)
|
2023-09-13 16:02:49 +02:00
|
|
|
ray.time = time;
|
2024-04-30 12:56:22 +02:00
|
|
|
# endif
|
2022-07-25 21:16:34 +02:00
|
|
|
|
2023-09-13 16:02:49 +02:00
|
|
|
result.continue_search = metalrt_shadow_all_hit<METALRT_HIT_CURVE>(
|
|
|
|
|
launch_params_metal, payload, object, prim, float2(u, 0), ray_tmax, t, &ray);
|
|
|
|
|
result.accept = !result.continue_search;
|
2022-07-25 21:16:34 +02:00
|
|
|
|
|
|
|
|
return result;
|
|
|
|
|
}
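The `accept = !continue_search` pattern used by `__intersection__curve_shadow_all` above can be sketched in plain C++. This is a sketch under stated assumptions, not the Cycles implementation: the body of `metalrt_shadow_all_hit` is not shown in this file section, so `record_shadow_hit` below is a hypothetical recorder that is assumed to return `true` while traversal should keep going (a transparent hit was noted) and `false` once a blocking hit is found. `ShadowAllState`, `HitResult` and `shadow_all_result` are likewise invented for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for PrimitiveIntersectionResult.
struct HitResult {
  bool accept;
  bool continue_search;
};

// Hypothetical per-ray state: how many transparent hits have been recorded so far.
struct ShadowAllState {
  uint32_t max_hits;
  uint32_t num_hits;
};

// Assumed behaviour (not taken from the source): an opaque hit ends the search,
// a transparent hit is counted (up to max_hits) and the search continues.
inline bool record_shadow_hit(ShadowAllState &state, bool hit_is_opaque)
{
  if (hit_is_opaque) {
    return false; // blocker found: stop searching
  }
  if (state.num_hits < state.max_hits) {
    ++state.num_hits; // note the transparent hit
  }
  return true; // keep searching for further hits
}

// Same shape as the handler above: the hit is accepted exactly when the search stops.
inline HitResult shadow_all_result(ShadowAllState &state, bool hit_is_opaque)
{
  HitResult result;
  result.continue_search = record_shadow_hit(state, hit_is_opaque);
  result.accept = !result.continue_search;
  return result;
}
```

With this shape, a recorded transparent hit is never reported to the traversal as the closest hit, so later candidates along the ray are still visited; only a terminating hit is accepted.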
|
|
|
|
|
|
2023-09-04 16:44:27 +02:00
|
|
|
# ifdef __POINTCLOUD__
|
2024-05-10 16:38:02 +02:00
|
|
|
ccl_device_inline void metalrt_intersection_point_shadow_all(
|
2022-07-25 21:16:34 +02:00
|
|
|
constant KernelParamsMetal &launch_params_metal,
|
2024-05-10 16:38:02 +02:00
|
|
|
ray_data MetalKernelContext::MetalRTIntersectionShadowAllPayload &payload,
|
2022-07-25 21:16:34 +02:00
|
|
|
const uint object,
|
|
|
|
|
const uint prim,
|
|
|
|
|
const uint type,
|
|
|
|
|
const float3 ray_P,
|
|
|
|
|
const float3 ray_D,
|
|
|
|
|
float time,
|
|
|
|
|
const float ray_tmin,
|
|
|
|
|
const float ray_tmax,
|
|
|
|
|
thread BoundingBoxIntersectionResult &result)
|
|
|
|
|
{
|
|
|
|
|
Intersection isect;
|
|
|
|
|
isect.t = ray_tmax;
|
|
|
|
|
|
|
|
|
|
MetalKernelContext context(launch_params_metal);
|
|
|
|
|
if (context.point_intersect(
|
2024-12-26 17:53:55 +01:00
|
|
|
nullptr, &isect, ray_P, ray_D, ray_tmin, isect.t, object, prim, time, type))
|
2023-09-04 16:44:27 +02:00
|
|
|
{
|
2022-07-25 21:16:34 +02:00
|
|
|
result.continue_search = metalrt_shadow_all_hit<METALRT_HIT_BOUNDING_BOX>(
|
|
|
|
|
launch_params_metal, payload, object, prim, float2(isect.u, isect.v), ray_tmax);
|
|
|
|
|
result.accept = !result.continue_search;
|
|
|
|
|
|
|
|
|
|
if (result.accept) {
|
|
|
|
|
result.distance = isect.t;
|
|
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
|
2023-09-13 11:03:43 -07:00
|
|
|
[[intersection(bounding_box,
|
|
|
|
|
triangle_data,
|
|
|
|
|
curve_data,
|
|
|
|
|
METALRT_TAGS,
|
|
|
|
|
extended_limits)]] BoundingBoxIntersectionResult
|
2022-07-25 21:16:34 +02:00
|
|
|
__intersection__point(constant KernelParamsMetal &launch_params_metal [[buffer(1)]],
|
|
|
|
|
ray_data MetalKernelContext::MetalRTIntersectionPayload &payload [[payload]],
|
2023-09-13 16:02:49 +02:00
|
|
|
const uint object [[instance_id]],
|
2022-07-25 21:16:34 +02:00
|
|
|
const uint primitive_id [[primitive_id]],
|
2023-09-13 16:02:49 +02:00
|
|
|
const uint primitive_id_offset [[user_instance_id]],
|
2022-07-25 21:16:34 +02:00
|
|
|
const float3 ray_origin [[origin]],
|
|
|
|
|
const float3 ray_direction [[direction]],
|
2024-05-10 16:38:02 +02:00
|
|
|
# if defined(__METALRT_MOTION__)
|
2023-09-13 16:02:49 +02:00
|
|
|
const float time [[time]],
|
2024-05-10 16:38:02 +02:00
|
|
|
# endif
|
2022-07-25 21:16:34 +02:00
|
|
|
const float ray_tmin [[min_distance]],
|
|
|
|
|
const float ray_tmax [[max_distance]])
|
|
|
|
|
{
|
2023-09-13 16:02:49 +02:00
|
|
|
const uint prim = primitive_id + primitive_id_offset;
|
2022-07-25 21:16:34 +02:00
|
|
|
const int type = kernel_data_fetch(objects, object).primitive_type;
|
|
|
|
|
|
|
|
|
|
BoundingBoxIntersectionResult result;
|
|
|
|
|
result.accept = false;
|
|
|
|
|
result.continue_search = true;
|
|
|
|
|
result.distance = ray_tmax;
|
|
|
|
|
|
2024-05-10 16:38:02 +02:00
|
|
|
Intersection isect;
|
|
|
|
|
isect.t = ray_tmax;
|
2024-04-30 12:56:22 +02:00
|
|
|
|
2024-05-10 16:38:02 +02:00
|
|
|
# ifndef __METALRT_MOTION__
|
|
|
|
|
const float time = 0.0f;
|
2023-09-04 16:44:27 +02:00
|
|
|
# endif
|
2024-04-30 12:56:22 +02:00
|
|
|
|
2024-05-10 16:38:02 +02:00
|
|
|
MetalKernelContext context(launch_params_metal);
|
|
|
|
|
if (context.point_intersect(
|
2024-12-26 17:53:55 +01:00
|
|
|
nullptr, &isect, ray_origin, ray_direction, ray_tmin, isect.t, object, prim, time, type))
|
2024-05-10 16:38:02 +02:00
|
|
|
{
|
|
|
|
|
result = metalrt_visibility_test<BoundingBoxIntersectionResult, METALRT_HIT_BOUNDING_BOX>(
|
|
|
|
|
launch_params_metal, payload, object, prim, isect.u);
|
|
|
|
|
if (result.accept) {
|
|
|
|
|
result.distance = isect.t;
|
|
|
|
|
}
|
|
|
|
|
}
|
2022-07-25 21:16:34 +02:00
|
|
|
return result;
|
|
|
|
|
}
|
|
|
|
|
|
2024-05-10 16:38:02 +02:00
|
|
|
# endif /* __POINTCLOUD__ */
|
|
|
|
|
|
2023-09-13 11:03:43 -07:00
|
|
|
[[intersection(bounding_box,
|
|
|
|
|
triangle_data,
|
|
|
|
|
curve_data,
|
|
|
|
|
METALRT_TAGS,
|
|
|
|
|
extended_limits)]] BoundingBoxIntersectionResult
|
2022-07-25 21:16:34 +02:00
|
|
|
__intersection__point_shadow(constant KernelParamsMetal &launch_params_metal [[buffer(1)]],
|
|
|
|
|
ray_data MetalKernelContext::MetalRTIntersectionShadowPayload &payload
|
|
|
|
|
[[payload]],
|
2023-09-13 16:02:49 +02:00
|
|
|
const uint object [[instance_id]],
|
2022-07-25 21:16:34 +02:00
|
|
|
const uint primitive_id [[primitive_id]],
|
2023-09-13 16:02:49 +02:00
|
|
|
const uint primitive_id_offset [[user_instance_id]],
|
2022-07-25 21:16:34 +02:00
|
|
|
const float3 ray_origin [[origin]],
|
|
|
|
|
const float3 ray_direction [[direction]],
|
2024-04-30 12:56:22 +02:00
|
|
|
# if defined(__METALRT_MOTION__)
|
2023-09-13 16:02:49 +02:00
|
|
|
const float time [[time]],
|
2024-04-30 12:56:22 +02:00
|
|
|
# endif
|
2022-07-25 21:16:34 +02:00
|
|
|
const float ray_tmin [[min_distance]],
|
|
|
|
|
const float ray_tmax [[max_distance]])
|
|
|
|
|
{
|
2023-09-13 16:02:49 +02:00
|
|
|
const uint prim = primitive_id + primitive_id_offset;
|
2022-07-25 21:16:34 +02:00
|
|
|
const int type = kernel_data_fetch(objects, object).primitive_type;
|
|
|
|
|
|
|
|
|
|
BoundingBoxIntersectionResult result;
|
|
|
|
|
result.accept = false;
|
|
|
|
|
result.continue_search = true;
|
|
|
|
|
result.distance = ray_tmax;
|
|
|
|
|
|
2024-04-30 12:56:22 +02:00
|
|
|
# ifdef __POINTCLOUD__
|
|
|
|
|
|
2024-05-10 16:38:02 +02:00
|
|
|
Intersection isect;
|
|
|
|
|
isect.t = ray_tmax;
|
|
|
|
|
|
|
|
|
|
# ifndef __METALRT_MOTION__
|
|
|
|
|
const float time = 0.0f;
|
|
|
|
|
# endif
|
|
|
|
|
|
|
|
|
|
MetalKernelContext context(launch_params_metal);
|
|
|
|
|
if (context.point_intersect(
|
2024-12-26 17:53:55 +01:00
|
|
|
nullptr, &isect, ray_origin, ray_direction, ray_tmin, isect.t, object, prim, time, type))
|
2024-05-10 16:38:02 +02:00
|
|
|
{
|
|
|
|
|
result =
|
|
|
|
|
metalrt_visibility_test_shadow<BoundingBoxIntersectionResult, METALRT_HIT_BOUNDING_BOX>(
|
|
|
|
|
launch_params_metal, payload, object, prim, isect.u);
|
|
|
|
|
if (result.accept) {
|
|
|
|
|
result.distance = isect.t;
|
|
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
# endif /* __POINTCLOUD__ */
|
|
|
|
|
|
|
|
|
|
return result;
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
[[intersection(bounding_box,
|
|
|
|
|
triangle_data,
|
|
|
|
|
curve_data,
|
|
|
|
|
METALRT_TAGS,
|
|
|
|
|
extended_limits)]] BoundingBoxIntersectionResult
|
|
|
|
|
__intersection__point_shadow_all(
|
|
|
|
|
constant KernelParamsMetal &launch_params_metal [[buffer(1)]],
|
|
|
|
|
ray_data MetalKernelContext::MetalRTIntersectionShadowAllPayload &payload [[payload]],
|
|
|
|
|
const uint object [[instance_id]],
|
|
|
|
|
const uint primitive_id [[primitive_id]],
|
|
|
|
|
const uint primitive_id_offset [[user_instance_id]],
|
|
|
|
|
const float3 ray_origin [[origin]],
|
|
|
|
|
const float3 ray_direction [[direction]],
|
|
|
|
|
# if defined(__METALRT_MOTION__)
|
|
|
|
|
const float time [[time]],
|
|
|
|
|
# endif
|
|
|
|
|
const float ray_tmin [[min_distance]],
|
|
|
|
|
const float ray_tmax [[max_distance]])
|
|
|
|
|
{
|
|
|
|
|
const uint prim = primitive_id + primitive_id_offset;
|
|
|
|
|
const int type = kernel_data_fetch(objects, object).primitive_type;
|
|
|
|
|
|
|
|
|
|
BoundingBoxIntersectionResult result;
|
|
|
|
|
result.accept = false;
|
|
|
|
|
result.continue_search = true;
|
|
|
|
|
result.distance = ray_tmax;
|
|
|
|
|
|
|
|
|
|
# ifdef __POINTCLOUD__
|
|
|
|
|
|
|
|
|
|
metalrt_intersection_point_shadow_all(launch_params_metal,
|
|
|
|
|
payload,
|
|
|
|
|
object,
|
|
|
|
|
prim,
|
|
|
|
|
type,
|
|
|
|
|
ray_origin,
|
|
|
|
|
ray_direction,
|
2023-09-04 16:44:27 +02:00
|
|
|
# if defined(__METALRT_MOTION__)
|
2024-05-10 16:38:02 +02:00
|
|
|
time,
|
2023-09-04 16:44:27 +02:00
|
|
|
# else
|
2024-05-10 16:38:02 +02:00
|
|
|
0.0f,
|
2023-09-04 16:44:27 +02:00
|
|
|
# endif
|
2024-05-10 16:38:02 +02:00
|
|
|
ray_tmin,
|
|
|
|
|
ray_tmax,
|
|
|
|
|
result);
|
2022-07-25 21:16:34 +02:00
|
|
|
|
2024-04-30 12:56:22 +02:00
|
|
|
# endif /* __POINTCLOUD__ */
|
|
|
|
|
|
2022-07-25 21:16:34 +02:00
|
|
|
return result;
|
|
|
|
|
}
|
2024-04-30 12:56:22 +02:00
|
|
|
|
|
|
|
|
#endif /* __METALRT__ */
|