HIP-RT device:
- Add missing flags from the common flags query to the final compiler options
- Switch logging utility from printf to LOG_INFO_IMPORTANT
- Remove redundant compiler options already covered by common flags
HIP device:
- Add compiler command to logging
- Update C++ standard to C++17 to resolve compiler warnings
Pull Request: https://projects.blender.org/blender/blender/pulls/145284
Experiments have shown that the OptiX denoiser performs best when
operating on images that have their origin at the top-left corner,
while Blender renders with the origin at the bottom-left corner.
Simply flipping the image vertically before and after denoising is a
relatively trivial operation, so this patch introduces this as an
additional preprocessing and postprocessing step for denoising when the
OptiX denoiser is used. Additionally, this patch also removes an unused
helper function, now that OptiX 8.0 is the minimum.
Pull Request: https://projects.blender.org/blender/blender/pulls/145358
There are several Driver versions which are constructing the wrong,
semantically, version which would force Blender to decline the Intel
device for oneAPI backend usage, based on this. Unfortunately,
the upstream fix is taking a long time to be finally delivered to
the distros and end-users, so it is better if Blender will detect
this wrong version string and parse it properly, allowing these
devices to be used - as the wrong driver version string is the only
issue here, besides this the driver functionality is fine.
Pull Request: https://projects.blender.org/blender/blender/pulls/145658
This pull request removes ROCm 5 code path and adds ROCm 7 runtime to
library search list.
ROCm 5 runtime is no longer shipped with AMD drivers, and ROCm 5 compiler
is no longer compatible with newer driver versions.
It also adds ROCm 7 runtime to the list of runtime libraries to look for.
Starting later this year, ROCm 7 runtime will be bundled with the driver
installer, and all future runtime fixes and improvements will target ROCm 7.
Once ROCm 7 runtime is rolled out, ROCm 6 compiler will continue to work
with it for about a year as a transitional measure. Beyond that, compatibility
is not guaranteed.
Pull Request: https://projects.blender.org/blender/blender/pulls/145279
This re-applies pull request #140671, but with a fix for #144713 where the
non-pointer part of IntegratorStateGPU was not initialized.
This PR is a more extensive follow on from #123551 (removal of AMD and Intel
GPU support).
All supported Apple GPUs have Metal 3 and tier 2 argument buffer support.
The invariant resource properties `gpuAddress` and `gpuResourceID` can be
written directly into GPU structs once at setup time rather than once per
dispatch. More background info can be found in this article:
https://developer.apple.com/documentation/metal/improving-cpu-performance-by-using-argument-buffers?language=objc
Code changes:
- All code relating to `MTLArgumentEncoder` is removed
- `KernelParamsMetal` updates are directly written into
`id<MTLBuffer> launch_params_buffer` which is used for the "static"
dispatch arguments
- Dynamic dispatch arguments are small enough to be encoded using the
`MTLComputeCommandEncoder.setBytes` function, eliminating the need for
cycling temporary arg buffers
Fix#144713
Co-authored-by: Brecht Van Lommel <brecht@noreply.localhost>
Pull Request: https://projects.blender.org/blender/blender/pulls/145175
Currently, it was discovered that in the case of several different
Intel dGPUs being present in the system, the experimental L0 copy
optimization does not work correctly in the Intel Driver, which is
causing crashes in the driver and Blender application. So, to avoid
this situation and restore functionality on these platforms,
a workaround was added to disable this extension from being used if
such a configuration is detected. In the future, when this problem is
fully fixed in all Intel Drivers, this workaround can be removed from
the Blender source code to restore some performance that was lost on
configurations of several dGPUs because of this workaround.
Pull Request: https://projects.blender.org/blender/blender/pulls/144262
Guide the probability to scatter in or transmit through the volume.
Only applied for primary rays.
Co-authored-by: Brecht Van Lommel <brecht@blender.org>
Add new "Linear 3D Curves" option in the Curves panel in the render
properties. This renders curves as linear segments rather than smooth
curves, for faster render time at the cost of accuracy.
On NVIDIA Blackwell GPUs, this can give a 6x speedup compared to smooth
curves, due to hardware acceleration. On NVIDIA Ada there is still
a 3x speedup, and CPU and other GPU backends will also render this
faster.
A difference with smooth curves is that these have end caps, as this
was simpler to implement and they are usually helpful anyway.
In the future this functionality will also be used to properly support
the CURVE_TYPE_POLY on the new curves object.
Pull Request: https://projects.blender.org/blender/blender/pulls/139735
This fixes issues when using Embree on mutliple GPUs.
A previous workaround used separate contexts, this one now
lets us keep a single context for all GPUs.
Pull Request: https://projects.blender.org/blender/blender/pulls/143089
Currently MetalRT renders always use extended limits, which is needed to correctly render scenes where the max primitive count can exceed 2^28 or the instance count can exceed 2^24. This patch adopts Metal best practices of only enabling this flag if it is needed.
This PR is similar to #133364, but there are some notable differences:
1) The old PR made an overly optimistic assumption that all the relevant visibility bits could be squeezed into 8 bits. This new PR adopts the same approach that Optix takes of using 8 bits as a primary HW filter, and checking the full 32 bit mask inside the SW intersection handler.
~~2) I moved the scene scanning check from Scene into MetalDevice. This avoids platform specific details leaking into platform agnostic areas.~~
~~3) In live viewport mode, we always use extended limits in case we tip over the threshold.~~
_EDIT:_
2) The limits are scanned in `Scene::update_kernel_features`, and given to the device by a new `set_bvh_limits` method which returns true if the BVH and kernels need to be reloaded.
Pull Request: https://projects.blender.org/blender/blender/pulls/142401
This improves the existing scene specialisation mechanism by replacing "kernel_data.kernel_features" with a function constant. It doesn't cause any additional compilation requests, but allows the backend compiler to eliminate more dead code. An additional compiler hint is provided for dead-stripping "volume_stack_enter_exit" which results in slightly faster rendering of non-volumetric scenes.
Pull Request: https://projects.blender.org/blender/blender/pulls/142235
These changes introduce modifications to the SYCL queue creation
in OneapiDevice::create_queue. In case several DPC++ devices are
detected by Blender and exposed through it, we are now creating
a new SYCL context for each device, which allows us to prevent
execution failures due to some known issues in the DPC++ runtime
regarding multi GPU support. As this would have some small
performance impact, few percents, it is only applied to
multi GPU configurations, while the behavior for a single
GPU configuration remains the same.
Pull Request: https://projects.blender.org/blender/blender/pulls/141834
On systems with multiple Intel GPUs with a mix of recent and old
unsupported drivers (such as 101.3302), the Level-Zero stack may have
troubles initializing, leading to a crash while enumerating devices.
Luckily this condition actually leads to an exception we can catch,
as implemented here in this commit.
Pull Request: https://projects.blender.org/blender/blender/pulls/141674
All GPU backends now support NanoVDB, using our own kernel side code
that is easily portable. This simplifies kernel and device code.
Volume bounds are now built from the NanoVDB grid instead of OpenVDB,
to avoid having to keep around the OpenVDB grid after loading.
While this reduces memory usage, it does have a performance impact,
particularly for the Cubic filter. That will be addressed by
another commit.
Pull Request: https://projects.blender.org/blender/blender/pulls/132908
This PR is a more extensive follow on from #123551 (removal of AMD and Intel GPU support).
All supported Apple GPUs have Metal 3 and tier 2 argument buffer support. The invariant resource properties `gpuAddress` and `gpuResourceID` can be written directly into GPU structs once at setup time rather than once per dispatch. More background info can be found in [this article](https://developer.apple.com/documentation/metal/improving-cpu-performance-by-using-argument-buffers?language=objc).
Code changes:
- All code relating to `MTLArgumentEncoder` is removed
- `KernelParamsMetal` updates are directly written into `id<MTLBuffer> launch_params_buffer` which is used for the "static" dispatch arguments
- Dynamic dispatch arguments are small enough to be encoded using the `MTLComputeCommandEncoder.setBytes` function, eliminating the need for cycling temporary arg buffers
Pull Request: https://projects.blender.org/blender/blender/pulls/140671
When running Blender with `--debug-cycles` and the right
verbosity level, Cycles can output "GPU Queue Stats" to the terminal
at the end of rendering detailing how much time was spent in each kernel.
The Metal GPU backend did not support this specific way of gathering the
information. This commit fixes that by implementing support to the Metal
GPU backend.
Note: This kind of information could already be gathered for the Metal
GPU backend using the `CYCLES_METAL_PROFILING` environment variable,
and this is still the recommended way of gathering that information for
Metal. This change is just to add some consistency between platforms.
Pull Request: https://projects.blender.org/blender/blender/pulls/137592