Check was misc-const-correctness, combined with readability-isolate-declaration
as suggested by the docs.
Temporarily clang-format "QualifierAlignment: Left" was used to get consistency
with the prevailing order of keywords.
Pull Request: https://projects.blender.org/blender/blender/pulls/132361
* Use .empty() and .data()
* Use nullptr instead of 0
* No else after return
* Simple class member initialization
* Add override for virtual methods
* Include C++ instead of C headers
* Remove some unused includes
* Use default constructors
* Always use braces
* Consistent names in definition and declaration
* Change typedef to using
Pull Request: https://projects.blender.org/blender/blender/pulls/132361
This commit introduces proper handling of ROCm 5 and ROCm 6 runtimes on
Linux, based on the version of the ROCm compiler used at build time.
Previously, HIPEW (the HIP equivalent of Cuda Wrangler) defaulted to
loading the ROCm 5 runtime. If ROCm 5 was unavailable, it would attempt
to load ROCm 6. However, ROCm 6 introduces changes in certain
structures and functions that are not backward compatible, leading to
potential issues when kernels compiled with the ROCm 6 compiler are
executed on the ROCm 5 runtime.
### Summary of Changes:
**Separation of Structures and Functions:**
Structures and functions are now separated into hipew5 and hipew6 to
accommodate the differences between ROCm versions.
**Build-Time Version Detection:**
The ROCm version is determined during build time, and the corresponding
hipew5 or hipew6 is included accordingly.
**Runtime Default to ROCm 6:**
By default, HIPEW now loads the ROCm 6 runtime and
includes hipew6 (Linux only).
**JIT Compilation Behavior:**
Since ROCm 6 is the default version, JIT compilation is supported only
when the ROCm 6 compiler is detected at runtime.
**HIP-RT Update:**
HIP-RT has been updated to load the ROCm 6 runtime by default.
These changes ensure compatibility and stability when switching
between ROCm versions, avoiding issues caused by runtime
and compiler mismatches.
Co-authored-by: Alaska <alaskayou01@gmail.com>
Co-authored-by: Sergey Sharybin <sergey@blender.org>
Pull Request: https://projects.blender.org/blender/blender/pulls/130153
On Linux, Cycles HIP has a JIT compilation feature.
This feature is used when Cycles can not find a precompiled kernel
for your GPU. Which is most common when using hardware that wasn't
out at the time that a version of Blender was released.
There were various issues with this JIT compilation system, this commit
aims to solve them. The changes include:
- Enable `WITH_NANOVDB` when Blender is built with NanoVDB.
- This fixes a issue where VDB objects would not render.
- Enable some extra debug options for developers when desired
(This is so we match the CUDA implementation of the same feature).
- Reduce the optimizaiton level from -O3 to the default.
- This is to avoid any extra issues that may occur as a result
of an increase optimization level that isn't tested with
precompiled kernels.
- Reduce the optimization level even further to -O1 for Vega.
- This was done on precompiled kernels to work around some issues,
so I decided to apply it to JIT kernels as well.
- Note: Although Vega is not officially supported, this may help
people that unofficially use Vega.
- Added some previously missing compiler arguments and fixed errors that
were introduced when enabling these compiler arguments.
- Fixed a issue where JIT compilation would fail if Blener was
installed in a path that had a space in it.
Pull Request: https://projects.blender.org/blender/blender/pulls/131853
Based on #123377 by @brecht, but Gitea doesn't like the rebase these
so here's a new PR.
The purpose here is to switch to fused OptiX programs for OSL execution
on CUDA. On the one hand, this makes the code easier since, but there's
also another advantage - how memory allocation is managed.
OSL shaders need memory to store intermediate values, but how much is
needed depends on the complexity of the shader. With the split program
approach, Cycles had to provide that memory, so we had to allocate a
certain amount (2 KiB, to be precise) statically and show an error if
the shader would need more. If the shader used less (which is the case
for the vast majority), the memory was just wasted.
By switching to fused kernels, OSL knows the required amount during JIT
codegen, so it can allocate only what's required, which avoids this
waste. One still needs to set a maximum, and in theory, OSL would also
support spilling over into a Cycles-provided alternative memory region.
However, we currently don't implement that - instead, we default to the
same 2048 limit as before and let advanced users override it via the
CYCLES_OSL_GROUPDATA_ALLOC environment variable if really needed.
Co-authored-by: Brecht Van Lommel <brecht@blender.org>
Pull Request: https://projects.blender.org/blender/blender/pulls/130149
The function table symbol declared in the headers was renamed starting
in OptiX 8.1, from `g_optixFunctionTable` to
`g_optixFunctionTable_<ABI version>`. This adds support for that by
using the new macro for the name when available (after OptiX 8.1) and
falling back to the old name when it is not (before OptiX 8.1).
Pull Request: https://projects.blender.org/blender/blender/pulls/130451
Don't try to use MetalRT by default unless the device explicitly reports that RT is supported. We shouldn't just rely on an assumption that it's supported for M3 and beyond, ad infinitum.
Pull Request: https://projects.blender.org/blender/blender/pulls/129688
Running Xcode memory graphs and the Instruments tools revealed
memory leaks caused, in the main, by over-retained objects.
This removes the unnecessary 'retains' and adds some asserts
to guard against over-retaining in the future.
There are a few memory leaks remaining involving PyUnicode_DecodeUTF8
but I am unable to identify the cause of these at this time.
Authored by Apple: James McCarthy
Pull Request: https://projects.blender.org/blender/blender/pulls/129117
Changing 3D curve properties while viewport rendering was active
resulted in an error, because Cycles would attempt to update the
acceleration structure containing the curves, but that acceleration
structure was built without the
`OPTIX_BUILD_FLAG_ALLOW_UPDATE` flag allowing updates. This
fixes that by adding the flag to all curve build inputs.
Ideally could just use the same flags as for other build inputs and
differentiate between viewport and final rendering (based on
`bvh_type`), but that's not currently an option since the same flags
have to be specified to query the curve intersection module in
`load_kernels()`, where that differentiation is not known. See also
commit 5c6053ccb1.
Pull Request: https://projects.blender.org/blender/blender/pulls/129634
This commit removes support for Vega GPUs from the AMD HIP backend of
Cycles. This is being done as:
- AMD no longer provides official support for Vega GPUs in their
ROCm software.
- Vega GPUs have rendering artifacts on all supported platforms,
and as a result of the reduction of support from AMD, are unlikely
to be fixed. Rendering artifacts include.
- The incorrect shading of volumes (Windows and Linux)
- Missing intersections on many meshes with HIPRT
- Crashing rendering subsurface scattering materials (Linux)
- And more.
Pull Request: https://projects.blender.org/blender/blender/pulls/129523
Previously, in case of a failure during BVH transfer, when running out
of memory for example, we could get an error such as "BVH failed to
migrate to the GPU due to Embree library error (no error)", because
embree error status was actually reset before being queried.
This commit fixes its propagation.
Pull Request: https://projects.blender.org/blender/blender/pulls/129022
Turns out it is possible to have code to pick up wrong class
when defining a friend:
```
intern\cycles\device/memory.h(255): warning C4099: 'GPUDevice': type name first seen using 'struct' now seen using 'class'
source\blender\gpu\GPU_platform.hh(69): note: see declaration of 'GPUDevice'
```
Now made it so the classes have forward declaration in the CCL
namespace, avoiding possible conflict with the classes with the
same name in the global namespace.
Pull Request: https://projects.blender.org/blender/blender/pulls/128485
Happens for renders from command line, when kernel specialization
thread is still working after the allocators on the Blender side
have been deinitialized.
Add an explicit deinitializaiton, which ensures all Cycles worker
and cache threads are finished before the allocators are deinitialized.
This should solve occasional crashes when running regression tests
for Metal or Metal-RT.
Pull Request: https://projects.blender.org/blender/blender/pulls/128239
This change switches Cycles to an opensource HIP-RT library which
implements hardware ray-tracing. This library is now used on
both Windows and Linux. While there should be no noticeable changes
on Windows, on Linux this adds support for hardware ray-tracing on
AMD GPUs.
The majority of the change is typical platform code to add new
library to the dependency builder, and a change in the way how
ahead-of-time (AoT) kernels are compiled. There are changes in
Cycles itself, but they are rather straightforward: some APIs
changed in the opensource version of the library.
There are a couple of extra files which are needed for this to
work: hiprt02003_6.1_amd.hipfb and oro_compiled_kernels.hipfb.
There are some assumptions in the HIP-RT library about how they
are available. Currently they follow the same rule as AoT
kernels for oneAPI:
- On Windows they are next to blender.exe
- On Linux they are in the lib/ folder
Performance comparison on Ubuntu 22.04.5:
```
GPU: AMD Radeon PRO W7800
Driver: amdgpu-install_6.1.60103-1_all.deb
main hip-rt
attic 0.1414s 0.0932s
barbershop_interior 0.1563s 0.1258s
bistro 0.2134s 0.1597s
bmw27 0.0119s 0.0099s
classroom 0.1006s 0.0803s
fishy_cat 0.0248s 0.0178s
junkshop 0.0916s 0.0713s
koro 0.0589s 0.0720s
monster 0.0435s 0.0385s
pabellon 0.0543s 0.0391s
sponza 0.0223s 0.0180s
spring 0.1026s 1.5145s
victor 0.1901s 0.1239s
wdas_cloud 0.1153s 0.1125s
```
Co-authored-by: Brecht Van Lommel <brecht@blender.org>
Co-authored-by: Ray Molenkamp <github@lazydodo.com>
Co-authored-by: Sergey Sharybin <sergey@blender.org>
Pull Request: https://projects.blender.org/blender/blender/pulls/121050
Deforming motion blurred point clouds do not render in Cycles
HIP-RT when BVH timesteps != 0 if Blender is launched with
debug memory.
The root cause is that the size of allocated memory for the
bounding boxes is reported to HIP-RT not the number of valid
bounding boxes.
Pull Request: https://projects.blender.org/blender/blender/pulls/127432
IGC 1.0.17384, ocloc 24.31.30508, which:
- add support for Battlemage and Lunar Lake GPUs
- recover from recent performance regression on Linux
- allow to drop older work-around
(9d5164d472) and need for a patched
version on Windows
- ocloc now needs "dg2,mtl" naming for fat binaries.
opencl-clang patches don't get applied anymore by igc build scripts
when llvm is not a git repository, hence I could also drop we can drop
current patch disabling patching.
I've only slightly pushed min-driver-version updates after carefull
testing, instead of jumping to the same version as ocloc as we use to.
Pull Request: https://projects.blender.org/blender/blender/pulls/127251
This new version of the graphics compiler solves a performance
regression on Arc, adds support for Battlemage and Lunar Lake GPUs, and
allows to drop older patch to build fat binaries with broad
compatibility.
This latter change requires using -device dg2,mtl naming instead of
passing architecture ids.
Pull Request: https://projects.blender.org/blender/blender/pulls/127371
The device code was disabled for primitives with deformation blur
and the intersection function always returned false, hence no
rendered primitive.
Other than that, there were a few bugs on both device and host codes
(e.g., the order of current and previous times and the primitive name.)
Pull Request: https://projects.blender.org/blender/blender/pulls/127163
Fixes a few issues with point clouds with HIPRT.
1. Crashing when building the BLAS due to an incorrect sized array.
2. A typo leading to all point cloud intersections being skipped.
3. A typo leading to some motion blurred point clouds rendering
as if they were stationary, or not rendering at all.
Pointclouds, with deformable motion blur, with BVH time steps set to >0
still do not render. Curves seem to have the same issue.
Ref #125086
Pull Request: https://projects.blender.org/blender/blender/pulls/125834
Some of the device memory objects had their host_pointer overwritten
with another CPU-side buffer after allocation. This leads to a leak of
host memory allocated by the device_memory.
There are few remaining places where the host_pointer is assigned and
those seems to be fine because the memory was not yet allocated with
a alloc() call.
While the approach in this change is not very ideal, it is small and
potentially could be ported to the LTS tracks. More ideal solution
would be to utilize device_vector::give_data().
Pull Request: https://projects.blender.org/blender/blender/pulls/126788
OptiX has accepted Catmull-Rom curve data natively since OptiX 7.4, but due to the previous conversion to B-Spline code, the format that data is fed to OptiX wasn't optimal.
Each curve segment was put in the vertex buffer as four independent control points, even though continuous segments actually share control points between each other. This patch compacts that so shared control points only occur once in the vertex buffer.
This compact form uses less memory and also allows OptiX to easily identify segments that belong together into a curve (those where the step between indices is one).
Pull Request: https://projects.blender.org/blender/blender/pulls/125899
The GPU packed state is a static check from the Cycles core perspective,
and it is disabled for non-Apple Silicon GPUs. However, the Metal kernel
always used packed integrator.
This change makes it so the Host and Device side checks for the Host CPU
are aligned, and that Device-side packed state check does not differ from
the Host side.
Pull Request: https://projects.blender.org/blender/blender/pulls/126082
Fixes a crash that can occur if motion blur was on, there is a
deforming mesh in the scene with deformable motion blur turned on,
with BVH time steps set >0.
Render results in my test scene appear to match CPU Embree.
Pull Request: https://projects.blender.org/blender/blender/pulls/125854