This PR fixes the (currently unused) scene-based selective feature compilation macros. These feature based macros haven't been used for a few years, and enabling them currently results in compilation errors.
The only functional change in this PR is in geom/primitive.h where undef-ing `__HAIR__` had exposed an inconsistency in how pointcloud attributes were being fetched. Using the more general `primitive_surface_attribute_float4` (instead of `curve_attribute_float4`) fixed a compilation error that occurred when rendering pointcloud unit test scenes with adaptive compilation enabled.
Pull Request: https://projects.blender.org/blender/blender/pulls/121216
fdc2962beb indirectly introduced a change
in inlining (light_tree_pdf started getting inlined) that led to a 5-10%
drop in performance for most scenes.
Dropping the noinline keyword for oneAPI device recovers it.
It however brings another performance regression to MNEE and Raytrace
kernels, that we'll look into separately.
This commit updates all defines, compiler flags and cleans up some code for unused CPU capabilities.
There should be no functional change, unless it's run on a CPU that supports sse41 but not sse42. It will fallback to the SSE2 kernel in this case.
In preparation for the new SSE4.2 minimum in Blender 4.2.
Pull Request: https://projects.blender.org/blender/blender/pulls/118043
Along with the 4.1 libraries upgrade, we are bumping the clang-format
version from 8-12 to 17. This affects quite a few files.
If not already the case, you may consider pointing your IDE to the
clang-format binary bundled with the Blender precompiled libraries.
The NanoVDB headers are not compatible with Metal due to missing address
space qualifiers. We currently have a big patch for NanoVDB header
files, which is difficult to update for OpenVDB 11. Instead extract a
few hundred lines of code from NanoVDB to do just what we need.
Pull Request: https://projects.blender.org/blender/blender/pulls/115992
This makes the GPU tricubic implementation more efficient. The dense
grid code implemented this in terms of trilinear lookups that are
hardware accelerated, but for NanoVDB this just causes unnecessary voxel
reads. Instead match the CPU code.
Pull Request: https://projects.blender.org/blender/blender/pulls/115992
OpenImageDenoise V2 comes with GPU support for various backends. This adds a new class, OIDNDenoiserGPU, in order to add this functionality into the existing Cycles post processing pipeline without having to change it much. OptiX and OIDN CPU denoising remain as they are. Rendering on a supported Intel GPU will automatically select the GPU denoiser.
Device support is initially limited to the oneAPI devices that are supported by Cycles, but can be extended.
Ref #115045
Co-authored-by: Stefan Werner <stefan.werner@intel.com>
Co-authored-by: Ray Molenkamp <github@lazydodo.com>
Pull Request: https://projects.blender.org/blender/blender/pulls/108314
The first problem was triangles with motion blur were all grouped into
one category without separating the ones with and without triangle
motion steps.
The second problem was HIP RT uses the generic motion triangle
intersection function and this function checks prim_visibility buffer.
HIP RT doesn't provide the buffer per primitive but passes it to HIP RT
core per instance.
The buffer name was changed to prim_visibility from visibility to be
the same as what Cycles uses but when the motion triangle intersection
function is called from HIP RT kernels, the instance id is passed to
the function instead of primitive id.
Pull Request: https://projects.blender.org/blender/blender/pulls/114555
The last good commit was 8474716abb.
After this commits from main were pushed to blender-v4.0-release. These are
being reverted.
Commits a4880576dc from to b26f176d1a that happend afterwards were meant for
4.0, and their contents is preserved.
Speckles and missing lights were experienced in scenes with Nishita Sky
Texture and a Sun Size smaller than 1.5°, such as in Lone Monk and Attic
scenes.
We previously worked around these by using a more precise
software implementation of cosine.
After recent changes in Cycles, it turns out this workaround isn't
currently needed.
This PR fixes T39823, the sole failing unit test when running with MetalRT. It does so by implementing and binding a missing intersection handler (`__anyhit__cycles_metalrt_volume_test_tri`) which is required for `scene_intersect_volume` (as used by `integrator_volume_stack_update_for_subsurface`) to work as intended. This scene exposed the error as it uses subsurface scattering on a sphere which is intersected by volume.
Pull Request: https://projects.blender.org/blender/blender/pulls/112876
This patch updates the experimental MetalRT code path to use new [curve primitives](https://developer.apple.com/videos/play/wwdc2023/10128/) which were recently added in macOS 14. This replaces the previous custom box intersection implementation, allowing the driver to better optimise curve acceleration structures for the GPU. On existing hardware, this can speed up MetalRT renders by up to 40% for scenes that use hair / curve primitives extensively.
The MetalRT option will only be available on macOS >= 14, and requires Xcode >= 15 to build (otherwise the option will be compiled out).
Authored by Marco Giordano, Michael Jones, and Jason Fielder
---
Before / after render times (M1 Max MacBook Pro, macOS 14 beta, MetalRT enabled):
```
Custom box intersection MetalRT curve primitives Speedup
fishy_cat 111.5 80.5 1.39
koro 114.4 86.7 1.32
sinosauropteryx 291.8 279.2 1.05
spring 142.3 142.2 1.00
victor 442.7 347.7 1.27
```
---
Pull Request: https://projects.blender.org/blender/blender/pulls/111795
The API for the kernels library is defined, there is no need to
export more than that. This change only affects linux since hidden
visiblity is the default on Windows.
Use the common BVH utilities header for this.
Added a special type qualifier ccl_ray_data which is defined to ccl_private
for all platforms but Metal. On Metal it is defined to ray_data.
The tricky part is that the BVH utilities are wrapped into the Metal context
class. In some of the BVH functions the context has been already constructed,
but it wasn't done in all the callbacks.
From a quick render tests of the Junkshop benchmark scene there is no render
time difference,
No functional changes are expected.
Pull Request: https://projects.blender.org/blender/blender/pulls/111967
<algorithm> header include is missing from some sycl headers, this will
be fixed upstream with https://github.com/intel/llvm/pull/10424,
meanwhile, we work around it by including it directly.
For repeat / extend / mirror mode, both wrap and read_clip functions did
the bounds check. Removing it improves performance between 0.5% and 1.5%
in the classroom scene in one test. Clip mode is unchanged.
Pull Request: https://projects.blender.org/blender/blender/pulls/109304