For repeat / extend / mirror mode, both wrap and read_clip functions did
the bounds check. Removing it improves performance between 0.5% and 1.5%
in the classroom scene in one test. Clip mode is unchanged.
Pull Request: https://projects.blender.org/blender/blender/pulls/109304
In Embree, tfar modification is taken into account by rtcIntersect1
only when hits are accepted. In order to overcome this, we now check
manually for a max_t value in the filter function.
Pull Request: https://projects.blender.org/blender/blender/pulls/108706
We should be recording only the N closest hits in case the number of
hits is exceeding the maximum allowed or the size of the hits stack.
Previously, some cases made it record hits beyond the furthest recorded
one due to lack of hit distance check.
NanoVDB headers have unused code using "double" type, which is not supported on Arc GPUs.
Recent DPC++ changes enforced runtime verifications:
7663dc201d
which prevents execution when such type has been present even if unused.
This is a solution to avoid double to be compiled at all, similar as how it is done for Metal.
HIP RT enables AMD hardware ray tracing on RDNA2 and above, and falls back to a
to shader implementation for older graphics cards. It offers an average 25%
sample rendering rate improvement in Cycles benchmarks, on a W6800 card.
The ray tracing feature functions are accessed through HIP RT SDK, available on
GPUOpen. HIP RT traversal functionality is pre-compiled in bitcode format and
shipped with the SDK.
This is not yet enabled as there are issues to be resolved, but landing the
code now makes testing and further changes easier.
Known limitations:
* Not working yet with current public AMD drivers.
* Visual artifact in motion blur.
* One of the buffers allocated for traversal has a static size. Allocating it
dynamically would reduce memory usage.
* This is for Windows only currently, no Linux support.
Co-authored-by: Brecht Van Lommel <brecht@blender.org>
Ref #105538
This patch fixes the failing shader/bevel unit test when MetalRT is enabled. The ray was being transformed into local object space even when the SD_OBJECT_TRANSFORM_APPLIED flag was set.
Pull Request: https://projects.blender.org/blender/blender/pulls/107292
When running oneAPI with AoT binaries, on hardware that's not compatible with
these, recompilation could have been missing from the kernels loading phase and
happen during execution instead.
These changes fixes it, any kernel compilation will now happen during the
kernels loading phase.
Updated Embree 4 library with GPU support is required for it to be
compiled - compatiblity with Embree 3 and Embree 4 without GPU support
is maintained.
Enabling hardware raytracing is an opt-in user setting for now.
Pull Request: https://projects.blender.org/blender/blender/pulls/106266
This patch replaces `dispatchThreadgroups` with `dispatchThreads` which takes care of non-uniform threadgroup bounds. This allows us to remove the bounds guards in the integrator kernel entry points.
Pull Request: https://projects.blender.org/blender/blender/pulls/106217
This patch fixes a MetalRT issue where viable shadow hits are discounted based on the false assumption that hits are ordered by distance. With this patch, the following unit tests now pass:
- openvdb smoke
- shadow catcher pt transparent lamp only 0.8
- shadow catcher pt transparent lamp only 1.0
Pull Request: https://projects.blender.org/blender/blender/pulls/106276
To improve mesh upload speeds and reduce the size of the scene data which allows larger scenes to be rendered.
The meshes in Cycles are currently stored as flattened meshes, where each triangle is stored as a set of 3 vertices. Unflattening writes out the vertices in a list according to the index buffer. This uses a lot of memory and for current hardware does not provide a noticeable benefit. This change unflattens the mesh by directly using the meshes vertex and index buffers directly and skips the unflattening. This change allows for larger scenes and also a reduction in the sizes of the meshes. Further it results in a decrease the amount of time it takes to upload the data to a GPU. This is especially important for when multiple GPUs are used in a single machine.
Pull Request #105173
Contributed by Yulia Kuznetcova at Apple.
NanoVDB is patched to give add address spaces required by Metal. We hope that
in the future Metal will support the generic address space.
For AMD and Intel this is currently not available since it causes a performance
regression also on scenes without volumes.
Pull Request #104837
This patch optimises subsurface intersection queries on MetalRT. Currently intersect_local traverses from the scene root, retrospectively discarding all non-local hits. Using a lookup of bottom level acceleration structures, we can explicitly query only the relevant instance. On M1 Max, with MetalRT selected, this can give a render speedup of 15-20% for scenes like Monster which make heavy use of subsurface scattering.
Patch authored by Marco Giordano.
Reviewed By: brecht
Differential Revision: https://developer.blender.org/D17153
This patch adds two new kernels: SORT_BUCKET_PASS and SORT_WRITE_PASS. These replace PREFIX_SUM and SORTED_PATHS_ARRAY on supported devices (currently implemented on Metal, but will be trivial to enable on the other backends). The new kernels exploit sort partitioning (see D15331) by sorting each partition separately using local atomics. This can give an overall render speedup of 2-3% depending on architecture. As before, we fall back to the original non-partitioned sorting when the shader count is "too high".
Reviewed By: brecht
Differential Revision: https://developer.blender.org/D16909
Also minor changes in comments:
- Reference BLENDER_HISTORY_FILE instead of the literal file-name
(simplifies looking up usage).
- Use usernames in tags, as noted in code-style.
Speckles and missing lights were experienced in scenes with Nishita Sky
Texture and a Sun Size smaller than 1.5°, such as in Lone Monk and Attic
scenes.
Increasing the precision of cosf fixes it.
While keeping SSE2, SSE4.1 and AVX2. This does not affect hardware support, it
only slightly reduces performance for some older CPUs.
To reduce maintenance cost and improve compile times.
Differential Revision: https://developer.blender.org/D16978