After the recent HIP SDK 6.3 update on Windows, the minimum GPU driver
required to use HIP in Cycles has increased.
This commit increases the required driver version listed in the UI and
adds a check to avoid showing HIP devices if they're below a certain
driver version number as they don't work properly.
Pull Request: https://projects.blender.org/blender/blender/pulls/134965
Previously point cloud rendering was disabled on the HIPRT backend due
to unexpected performance regressions introduce by it.
With the recent update to HIP SDK 6.3 and HIPRT 2.5, these performance
regressions have been resolved and so this commit re-enables
point cloud rendering on HIPRT.
Pull Request: https://projects.blender.org/blender/blender/pulls/134902
This change brings the following improvements on the user level
- Support of GPUs with gfx12 architecture
- New HIP-RT library which in addition to the gfx12 support brings
various bug-fixes.
The known limitation of gfx12 is that OpenImageDenoiser does not yet
support this GPU architecture. This means that while Cycles will use the
full advantage of the gfx12 (including hardware accelerated ray-tracing),
denoising will only be possible on CPU, or secondary gfx11 or below GPU.
This is something that requires a change in OIDN and it is to late to do
it for Blender 4.4, but it is something to look forward for Blender 4.5.
The gfx12 changes for the pre-compiled kernels is rather trivial,
so it comes together (in the same PR) as the bigger HIP-RT change.
On the development side this change brings the following improvements:
- One step compile and link (much simpler CMake rules)
- Embedding BVH binaries in hiprt dll (which makes it easier to package
and load, without relying on special path configuration)
Co-authored-by: Sahar Kashi <sahar.kashi@amd.com>
Co-authored-by: Sergey Sharybin <sergey@blender.org>
Co-authored-by: Brecht Van Lommel <brecht@blender.org>
Pull Request: https://projects.blender.org/blender/blender/pulls/133129
There is a bug in Embree that makes BVH updates crash. Disabling multithreaded
BVH updates after the initial BVH build appears to work around it, at the cost
of some performance.
This will not affect performance of the initial BVH build, transforming objects
or editing a single mesh. It will only affect performance when multiple smaller
meshes are edited together, as those can no longer have their BVH updated in
parallel or benefit from parallellization over many primitives.
Pull Request: https://projects.blender.org/blender/blender/pulls/134747
The original report stumbled upon this issue with a more tricky
configuration when light linking is combined with light tress.
However, the actual contributing factor was a mesh with emission
shader which is not assigned to any triangles. This triggered a
bug in the BoundBox::transformed() which converted non-valid bounds
to bounds by performing per-corner growing.
Additionally fix incorrect handling of shared nodes which only
worked for leaf nodes. This was due to the fact how the measure
was accumulated: it is possible that add() is called with an empty
measure.
Pull Request: https://projects.blender.org/blender/blender/pulls/134699
The issue here is that the individual components of TypeDesc are 8-bit each,
but the code was applying a 4-bit mask. It just so happens that the only
case where this matters is TypeDesc::MATRIX44, since its value is 16.
In Cycles lights can be given a light falloff node to control their
light falloff.
This worked by multiplying the light's strength by different
combinations of the ray length, which would be FLT_MAX for
distant lights. This resulting in almost every configuration of the
light falloff node overflowing when used on distant lights, which is
undesirable.
This commit fixes this issue by ignoring most of the functions of the
light falloff node when used on a distant light.
And in the process fixes a small discrepancy between SVM and OSL when
using the light falloff node on distant lights.
Pull Request: https://projects.blender.org/blender/blender/pulls/134539
free_memory queries were disabled due to runtime driver issues on Linux
when using jemalloc.
compute-runtime introduced a fix for these issues with
8527779778
which is part of versions 31740 and higher, and matches the currently
required min-driver version, so we can restore this feature.
Pull Request: https://projects.blender.org/blender/blender/pulls/134542
This commit fixes a issue where ray depth for emissive objects
(E.g. Lights) was incorrect when using the ray depth output of the
light path node in Cycles OSL.
Pull Request: https://projects.blender.org/blender/blender/pulls/134496
* Do oneAPI copy optimization as part of host memory alloc and free, so
it is properly released before host memory is freed.
* Synchronize after loading texture info, like CUDA and HIP.
https://projects.blender.org/blender/blender/pulls/134412
This may be used for device to do host memory allocation in a way that
is more efficient for copy the host memory to the device.
Also rename and group device memory allocation functions for clarity.
Pull Request: https://projects.blender.org/blender/blender/pulls/134412
The issue was caused by c0ba800f64.
Simple solution for now: check the data size and free the memory to
allow the device memory to be re-allocated. Seems the safest for the
upcoming release.
Ideally we'd need to avoid having these manual tricks with the device
memory pointers.
Pull Request: https://projects.blender.org/blender/blender/pulls/134516
The current usage of software-based texture operations in
the oneAPI implementation puts additional register pressure on
the GPU compiler during register allocation. And it also creates
code that requires maintenance. This commit is intended to address
this situation by utilizing a recently productized SYCL bindless
texture API to enable HW-based texture operations using
Intel GPUs' hardware sampler.
This currently translates to 1-11% rendering speedups (scene-specific)
on my Arc A770 and Arc B580. At the moment, there are small
performance regressions with NanoVDB texture operations on Arc B580
and small performance regressions in shade surface MNEE and Raytrace
kernels on Arc A770, but they look recoverable and will be handled
in the future.
Pull Request: https://projects.blender.org/blender/blender/pulls/133457
There is now a non-experimental API for this_work_item functionality, so
let's use it for better code quality and also to avoid the deprecation
warning during compilation.
No functional or performance changes are expected.
Pull Request: https://projects.blender.org/blender/blender/pulls/133472
The issue also happens on macOS 15.3.
This is a Metal driver bug, a fix is coming in macOS 15.4. Until then
disable refitting the viewport. There is no perceptible benefit from
refitting, so while it might be less that ideal it allows to side step
the problem and still benefit from the HWRT.
Pull Request: https://projects.blender.org/blender/blender/pulls/134399
This is not a complete solution because there may be indirect changes
to the compiler other than the binary that require a rebuild, but this
should catch the simple cases at least.
Enables building of a Cubin for GPUs based on Blackwell architecture
if CUDA toolkit version 12.8 or higher is installed.
Only added sm_120 to the default set, since it is the one relevant for
consumer GPUs (RTX 5090 etc.) that are generally used with Blender.
Pull Request: https://projects.blender.org/blender/blender/pulls/134170
This commit adds the `light:random` attribute to OSL, allowing the
object info node to now match between SVM and OSL when using the
random output on a light.
Pull Request: https://projects.blender.org/blender/blender/pulls/134095
The derivatives of the normal were simply not computed.
The offsetted normals are computed by perturbating the barycentric
coordinates. At triangle boundaries, the normals are extrapolated,
so discontinuities might be visible.
Currently only supported on triangles.
Pull Request: https://projects.blender.org/blender/blender/pulls/133769
Use sub-pixel differentials for bump mapping helps with reducing
artifacts when objects are moving or when textures have high frequency
details.
Currently we scale it by 0.1 because it seems to work good in practice,
we can adjust the value in the future if it turns out to be impractical.
Ref: #122892
Pull Request: https://projects.blender.org/blender/blender/pulls/133991
- Rename dx/dy -> dfdx/dfdy to match the actual computed quantity
- Add template functions to compute dfdx/dfdy on triangles for sharing
among different data types
- Add documentation to some functions
- Some code shuffling that makes it easier to scale dfdx/dfdy in the
future
- Some other trivial changes
The issue here is that originally, the step count for the geometry's
motion and the object transform's motion were tied together, so a
single variable is used to store that step count.
However, when using the velocity attribute, it's possible for the step
counts to differ, which will lead to an incorrect interpolated object
transform in the kernel.
Pull Request: https://projects.blender.org/blender/blender/pulls/133788
That version has a bunch of API changes, so by dropping support for older
versions we can remove old compatibility code.
Also, that version is required for OptiX support, so building a fully-featured
Cycles wasn't possible with older OSL anyways.
Pull Request: https://projects.blender.org/blender/blender/pulls/133746
Regardless of what mem info reports. We can't move this to the host, so
might as well try because the free memory might not be a reliable predictor
of success.
Pull Request: https://projects.blender.org/blender/blender/pulls/132912
With host mapped memory these can be shared, and we can't get back the
original host pointer unless we make a copy which is inefficient.
Also add asserts to verify this doesn't happen.
Pull Request: https://projects.blender.org/blender/blender/pulls/132912
This was not thread safe. And it's better to do them one by one to avoid
moving more than is needed, when another thread already freed up enough.
Thanks to Jorn Visser for investigating and finding this problem.
Pull Request: https://projects.blender.org/blender/blender/pulls/132912
If one of the devices already used host happed memory but another not,
it would previously realloc both.
Thanks to Jorn Visser for investigating and finding this problem.
Pull Request: https://projects.blender.org/blender/blender/pulls/132912