This commit introduces proper handling of ROCm 5 and ROCm 6 runtimes on
Linux, based on the version of the ROCm compiler used at build time.
Previously, HIPEW (the HIP equivalent of the CUDA wrangler) defaulted to
loading the ROCm 5 runtime. If ROCm 5 was unavailable, it would attempt
to load ROCm 6. However, ROCm 6 introduces changes in certain
structures and functions that are not backward compatible, leading to
potential issues when kernels compiled with the ROCm 6 compiler are
executed on the ROCm 5 runtime.
### Summary of Changes:
**Separation of Structures and Functions:**
Structures and functions are now separated into hipew5 and hipew6 to
accommodate the differences between ROCm versions.
**Build-Time Version Detection:**
The ROCm version is determined during build time, and the corresponding
hipew5 or hipew6 is included accordingly.
**Runtime Default to ROCm 6:**
By default, HIPEW now loads the ROCm 6 runtime and
includes hipew6 (Linux only).
**JIT Compilation Behavior:**
Since ROCm 6 is the default version, JIT compilation is supported only
when the ROCm 6 compiler is detected at runtime.
**HIP-RT Update:**
HIP-RT has been updated to load the ROCm 6 runtime by default.
These changes ensure compatibility and stability when switching
between ROCm versions, avoiding issues caused by runtime
and compiler mismatches.
Co-authored-by: Alaska <alaskayou01@gmail.com>
Co-authored-by: Sergey Sharybin <sergey@blender.org>
Pull Request: https://projects.blender.org/blender/blender/pulls/130153
As mentioned in e072853e63, fb6ac24514 and 9a6beb915d,
`file_draw_preview()` is a rather overloaded and confusing function. I'm
trying to make it more readable.
This splits off the drawing of the loading icon displayed while previews
are still pending/loading, removing the loading case handling from
`file_draw_preview()`. There was also some implicit logic here:
While loading previews we'd always pass a "special image" to the preview
drawing, so the `is_special_file_image` boolean would always be true.
This is untangled too now, so code paths are more explicit/clear.
`file_add_preview_drag_but()` can't access data returned by
`file_draw_preview()` anymore (it may not be called), so I made it
independent, which is an improvement too.
While working on this I noticed the loading icon isn't centered
correctly. For now I made sure the position remains the same, I'll fix
the positioning in a followup.
This changes the `Node.type` and `Node.bl_static_type` properties to be string
instead of enum properties. This allows us to remove another usage of
`NOD_static_types.h`.
Both of these properties were marked as deprecated for a long time already, but
without any practical way to inform users. The result of that is that
especially the `type` property is widely used to check if a node has a specific
type. It's used so much that it is impractical to remove it even though it was
deprecated.
Instead this patch rephrases these properties as "legacy" (instead of
"deprecated"). This means that they will stay around and won't change in
behavior for existing nodes. For future nodes, we can just return the idname
instead of yet another name specific to these properties so that we can stop
worrying about them.
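A sketch of what this means for scripts, using a hypothetical stand-in class
(not Blender's actual node types): existing nodes keep their legacy `type`
string, while future nodes can simply report the idname.

```python
class LegacyNode:
    """Toy stand-in mimicking the change; `type` is now a plain string
    that, for future node types, may simply mirror the idname."""

    def __init__(self, bl_idname, legacy_type=None):
        self.bl_idname = bl_idname
        # Existing nodes keep their legacy name; future nodes just
        # return the idname, so no new per-property names are invented.
        self.type = legacy_type if legacy_type is not None else bl_idname


math_node = LegacyNode("ShaderNodeMath", legacy_type="MATH")
future_node = LegacyNode("GeometryNodeSomethingNew")

# Old-style checks keep working unchanged for existing nodes:
assert math_node.type == "MATH"
# Future nodes report their idname directly:
assert future_node.type == "GeometryNodeSomethingNew"
```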
Pull Request: https://projects.blender.org/blender/blender/pulls/131972
The `NodeCategory` system has not been used by Blender anymore for a couple of
years already. Instead of using an additional abstraction layer, we now just use a
normal menu. One benefit of the `NodeCategory` system was that it could be used
to populate the search. However, the search is now directly populated from the
menu anyway.
Fixes #115746.
Pull Request: https://projects.blender.org/blender/blender/pulls/132021
The type check there should not be necessary anymore. It looks like it
might have been necessary when it was introduced in
eed45d2a23. Back then the object was still passed
into `BKE_mesh_wrapper_ensure_subdivision`.
Pull Request: https://projects.blender.org/blender/blender/pulls/131857
How the new navigation pivot is determined depends a bit on the kind of brush:
* Brushes that deform or remove curves use the 3d-brush position at the start of
the brush.
* Brushes that add new curves set the pivot to the bounding box center of the
new curves.
Finding a good pivot point is not super trivial for curves, but the existing 3d
brush functionality seems to work well. This also has the benefit that almost no
additional computation is needed when the user is using the spherical brush
mode. However, if the projected mode is used, and orbit-around-selection is on,
then we have to compute the spherical brush center now anyway.
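The pivot for brushes that add new curves can be sketched as follows (a
minimal illustration, not the actual sculpt code):

```python
def bounding_box_center(points):
    """Center of the axis-aligned bounding box of a list of 3D points,
    used here as the navigation pivot for newly added curves."""
    xs, ys, zs = zip(*points)
    return (
        (min(xs) + max(xs)) / 2.0,
        (min(ys) + max(ys)) / 2.0,
        (min(zs) + max(zs)) / 2.0,
    )


# Example: pivot computed from the points of freshly added curves.
new_points = [(0.0, 0.0, 0.0), (2.0, 4.0, 0.0), (1.0, 2.0, 6.0)]
assert bounding_box_center(new_points) == (1.0, 2.0, 3.0)
```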
Pull Request: https://projects.blender.org/blender/blender/pulls/131907
The issue here was that sometimes an output socket of a `Group Input` node has
a reference to a data-block. The values stored on these output sockets are
never used, and thus are not exposed in the UI, which made it impossible for
the user to find that there still is a data-block reference.
The root cause for this seems to have been fixed a few releases ago. I can
reproduce that the pointer was set in 3.3, but not in 3.6.
This patch only adds some versioning to remove the unnecessary data-block
references to fix old files that might have this issue (e.g. the file from the
report).
Pull Request: https://projects.blender.org/blender/blender/pulls/131900
The autokeying code for cameras used the keyingset code to insert keys.
With "Only Insert Available" turned on, this would use the "Available" keyingset.
However, in the case of looking through the camera and moving the viewport
when the camera is not active, the poll function of that keyingset would return false.
Instead of modifying the poll function, the fix is to use the more direct keying code
using `RNAPath`.
This can be backported to 4.2 but not to 3.6, due to the changes to the keying code done in 4.0.
Pull Request: https://projects.blender.org/blender/blender/pulls/131796
Previously, calling `clear()` on `Map`, `Set` or `VectorSet` would remove all
elements but did not free the already allocated capacity. This is fine in most
cases, but has very bad and non-obvious worst-case behavior as can be seen in
#131793. The issue is that having a huge hash table with only very few elements
is inefficient when having to iterate over it (e.g. when clearing).
There used to be a `clear_and_shrink()` method to avoid this worst-case
behavior. However, it's not obvious that this should be used to improve
performance.
This patch changes the behavior of `clear` to what `clear_and_shrink` did before
to avoid accidentally running into the worst-case behavior. The old behavior is
still available under the name `clear_and_keep_capacity`. This is more efficient
when it's known that the hash table will be refilled with approximately the same
number of elements or more.
The main annoying aspect from an API perspective is that for `Vector`, the
default behavior of `clear` is and should stay to not free the memory. `Vector`
does not have the same worst-case behavior when there is a lot of unused
capacity (besides taking up memory), because the extra memory is never looked
at. `std::vector::clear` also does not free the memory, so that's the expected
behavior. While this patch introduces an inconsistency between `Vector` and
`Map/Set/VectorSet` with regards to freeing memory, it makes them more
consistent in that `clear` is the better default when reusing the data-structure
repeatedly.
I went over existing uses of `clear` to see if any of them should be changed to
`clear_and_keep_capacity`. None of them seemed to really benefit from that or
showed that it was impossible to get into the worst-case scenario. Therefore,
this patch slightly changes the behavior of these calls (only performance wise,
semantics are exactly the same).
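A toy model of the new semantics (a hypothetical container, not Blender's
actual `Map`/`Set`/`VectorSet` implementation):

```python
class ToySet:
    """Toy container: `clear()` frees the capacity, while
    `clear_and_keep_capacity()` keeps the allocation for reuse."""

    def __init__(self):
        self._slots = []  # stand-in for the allocated hash table
        self._size = 0

    def add(self, value):
        if self._size == len(self._slots):
            # Grow: stand-in for reallocating a larger hash table.
            self._slots.extend([None] * max(4, len(self._slots)))
        self._slots[self._size] = value
        self._size += 1

    def clear(self):
        # New default: also free the capacity, so later iteration over
        # a huge, mostly-empty table can never be the bottleneck.
        self._slots = []
        self._size = 0

    def clear_and_keep_capacity(self):
        # Old behavior: keep the allocation for efficient refilling.
        self._size = 0

    @property
    def capacity(self):
        return len(self._slots)


s = ToySet()
for i in range(1000):
    s.add(i)
cap = s.capacity
s.clear_and_keep_capacity()
assert s.capacity == cap  # allocation kept for reuse
s.clear()
assert s.capacity == 0    # allocation freed
```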
Pull Request: https://projects.blender.org/blender/blender/pulls/131852
Blender can be started headless. In that case GHOST will use
GHOST_SystemHeadless. This system only supported OpenGL; this change
adds support for Vulkan. For users this allows rendering with Vulkan
without needing X11 or Wayland.
This should fix the cause of the Vulkan tests crashing on the buildbot.
They will still not work reliably due to a threading issue which is
under investigation.
Pull Request: https://projects.blender.org/blender/blender/pulls/131682
Both the draw manager and GPU backend used the same compilation
directive for enablement. This PR separates them into
`WITH_GPU_DRAW_TESTS` for draw manager related tests and
`WITH_GPU_BACKEND_TESTS` for GPU backend related tests.
Pull Request: https://projects.blender.org/blender/blender/pulls/132018
Blender stores all pipelines in a pool. Using a hash, it checks whether a
pipeline was already created so the previous one can be reused. Due to
performance issues when working with graphics pipelines, some equality
operations only used a hash check. For scissors and viewports this isn't
enough and could lead to issues.
This PR fixes this by still performing an exact comparison when the hashes
are equal. Note that performance drops a bit; this should be countered
with other performance improvements in the future.
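The fix can be sketched as follows (a toy model with hypothetical names, not
the actual Vulkan backend code): look up by hash first, then confirm with an
exact comparison so two different states that collide never share a pipeline.

```python
class PipelinePool:
    """Toy pipeline pool: hash lookup narrows the candidates, an exact
    equality check on the full state confirms the match."""

    def __init__(self):
        self._pool = {}  # hash -> list of (state, pipeline)

    def get_or_create(self, state, create):
        h = self._hash(state)
        for existing_state, pipeline in self._pool.setdefault(h, []):
            # Exact check on top of the hash check; e.g. scissors and
            # viewports may differ even when the hashes are equal.
            if existing_state == state:
                return pipeline
        pipeline = create(state)
        self._pool[h].append((state, pipeline))
        return pipeline

    @staticmethod
    def _hash(state):
        # Deliberately weak hash to demonstrate collisions.
        return len(state)


pool = PipelinePool()
created = []
make = lambda state: created.append(state) or state.upper()

a = pool.get_or_create("ab", make)  # creates a new pipeline
b = pool.get_or_create("cd", make)  # same weak hash, exact check differs
assert a == "AB" and b == "CD"
assert len(created) == 2            # no false sharing despite collision
assert pool.get_or_create("ab", make) == "AB"
assert len(created) == 2            # a true hit is reused
```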
Pull Request: https://projects.blender.org/blender/blender/pulls/132005
With the HIP-RT BVH on AMD GPUs, instances that have undergone
two sets of transformations will not render properly.
This manifests as:
- Incorrect mesh normals
- Improperly positioned, scaled, or rotated meshes
- Missing intersections
This commit adds a test for this issue to make it easier to test,
and so we can hopefully catch similar issues if we ever add more
BVH options in the future.
Original report: blender/blender#117567
Ref: blender/blender-test-data!29
Pull Request: https://projects.blender.org/blender/blender/pulls/131352
These should have a default size/pos, just like stencil itself.
This came up in #131836 (and probably led to asset essential brushes all
having this wrong -- which in turn will not draw stencil masks for them
in the viewport).
NOTE: without those defaults, resetting the brush would also have this
issue.
For further steps toward a full fix, please refer to #131836.
Pull Request: https://projects.blender.org/blender/blender/pulls/131848
Similar to the toolbar, when spacebar is mapped to `play`, use `shift +
spacebar` for the asset-shelf popup. When spacebar is mapped to `toolbar`,
invoke the asset shelf in paint modes. When `spacebar=search`, do not map
any key for the asset-shelf popup.
Co-authored-by: Julian Eisel
Pull Request: https://projects.blender.org/blender/blender/pulls/131351
When a scene contains distant lights and local lights, the first step
of the light tree traversal is to compute the importance of
distant lights vs local lights and pick one based on a random number.
In the specific case of when there is only one distant light,
the line of code that had been changed in this commit
effectively reduced to:
`min_importance = fast_cosf(x) < cosf(x) ? 0.0 : compute_min_importance`
And depending on the hardware, compiler, and the specific value being
tested, different configurations could take different code paths.
This commit fixes this issue by turning the comparison into
`fast_cosf(x) < fast_cosf(x)`.
---
Why does `cos_theta_plus_theta_u < cosf(bcone.theta_e - bcone.theta_o)`
reduce to `fast_cosf(x) < cosf(x)` in this specific case?
- `cos_theta_plus_theta_u` is computed as
`cos_theta * cos_theta_u - sin_theta * sin_theta_u`
- `cos_theta` is always 1.0 in the case of a single distant light.
- `cos_theta_u` is computed earlier as `fast_cosf(theta_e)` in
`distant_light_tree_parameters()`
- `sin_theta` is zero, and so that side of the equation doesn't matter.
This reduces `cos_theta_plus_theta_u` to `fast_cosf(theta_e)`.
`cosf(bcone.theta_e - bcone.theta_o)` reduces to `cosf(bcone.theta_e)`
because for the case of a single distant light `theta_o` is always 0.
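A small numeric illustration, using a truncated Taylor series as a stand-in
for `fast_cosf` (this is not Cycles' actual implementation, just a
demonstration that an approximation and libm's cosine disagree in the low
bits, so mixing them in a comparison is fragile):

```python
import math


def fast_cos(x):
    # Stand-in approximation (truncated Taylor series). NOT Cycles'
    # actual fast_cosf; it only illustrates that an approximate cosine
    # and the exact one differ slightly.
    return 1.0 - x * x / 2.0 + x ** 4 / 24.0 - x ** 6 / 720.0


x = 1.0
# The two implementations differ slightly, so a mixed comparison like
# `fast_cos(x) < cos(x)` can flip depending on compiler and hardware:
assert fast_cos(x) != math.cos(x)
# Comparing one implementation against itself is deterministic:
assert not (fast_cos(x) < fast_cos(x))
```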
Pull Request: https://projects.blender.org/blender/blender/pulls/131932
By now it is just a "compositor", so move the files one folder up.
Things that were under realtime_compositor/intern move into the
already existing intern folder.
Pull Request: https://projects.blender.org/blender/blender/pulls/132004
This patch optimizes the Step mode of the Dilate node to use the van
Herk/Gil-Werman algorithm, which runs in constant time per pixel with
respect to the size of the structuring element, compared to the linear-time
algorithm currently in use. This is an order of magnitude faster for
reasonably large structuring elements.
Only the CPU implementation is included in this patch; the GPU
implementation will follow in a separate patch.
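A minimal sketch of the van Herk/Gil-Werman scheme in one dimension (an
illustration, not the compositor code): per block of window size, a running
max is kept from the left (prefix) and from the right (suffix), so each
output pixel needs only one extra max, independent of the window size.

```python
import math


def dilate_1d(values, radius):
    """Sliding-window maximum (morphological dilation) with the van
    Herk/Gil-Werman scheme: a constant number of max operations per
    pixel, independent of the window size."""
    k = 2 * radius + 1
    n = len(values)
    # Pad with -inf so windows at the borders are well defined.
    padded = [-math.inf] * radius + list(values) + [-math.inf] * (radius + k)
    m = len(padded)
    prefix = [0.0] * m  # running max from the left within each block of k
    suffix = [0.0] * m  # running max from the right within each block
    for i in range(m):
        prefix[i] = padded[i] if i % k == 0 else max(prefix[i - 1], padded[i])
    for i in range(m - 1, -1, -1):
        if i % k == k - 1 or i == m - 1:
            suffix[i] = padded[i]
        else:
            suffix[i] = max(suffix[i + 1], padded[i])
    # The window centered on values[i] spans padded[i .. i + k - 1]; its
    # maximum is the max of one suffix value and one prefix value.
    return [max(suffix[i], prefix[i + k - 1]) for i in range(n)]


# Matches the naive linear-time dilation:
vals = [3, 1, 0, 0, 4, 2, 2, 5]
r = 2
naive = [max(vals[max(0, i - r):i + r + 1]) for i in range(len(vals))]
assert dilate_1d(vals, r) == naive
```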
Pull Request: https://projects.blender.org/blender/blender/pulls/131798
This patch adds compile-time optimizations where the operation inputs
are guaranteed to be non-single values. Pixel load methods now take an
optional template parameter CouldBeSingle, which is false by default. If
the input is not guaranteed to be non-single, it needs to be set to true.
Gives up to 3x improvement in some nodes.
OptiX OSL tests were previously disabled due to a GPU driver bug
resulting in many tests failing unexpectedly.
A new driver version with the fix is now out, so we can enable
OptiX OSL testing.
This commit also updates the OptiX OSL block list with better comments,
and more tests that are known to fail and need investigating.
Ref: #123012
Pull Request: https://projects.blender.org/blender/blender/pulls/129280
Until the newer Hair Curves system can fully replace particle hair, add
a small test to ensure this continues to work.
Since the hair is exported as cubic bspline curves, we can also use this
same file to test bspline import now too.
Pull Request: https://projects.blender.org/blender/blender/pulls/131997
On Linux, Cycles HIP has a JIT compilation feature.
This feature is used when Cycles can not find a precompiled kernel
for the GPU, which is most common when using hardware that wasn't
out at the time a version of Blender was released.
There were various issues with this JIT compilation system, this commit
aims to solve them. The changes include:
- Enable `WITH_NANOVDB` when Blender is built with NanoVDB.
- This fixes an issue where VDB objects would not render.
- Enable some extra debug options for developers when desired
(This is so we match the CUDA implementation of the same feature).
- Reduce the optimization level from -O3 to the default.
- This is to avoid any extra issues that may occur as a result
of an increased optimization level that isn't tested with
precompiled kernels.
- Reduce the optimization level even further to -O1 for Vega.
- This was done on precompiled kernels to work around some issues,
so I decided to apply it to JIT kernels as well.
- Note: Although Vega is not officially supported, this may help
people that unofficially use Vega.
- Added some previously missing compiler arguments and fixed errors that
were introduced when enabling these compiler arguments.
- Fixed an issue where JIT compilation would fail if Blender was
installed in a path that had a space in it.
Pull Request: https://projects.blender.org/blender/blender/pulls/131853
The area "close" operation is actually a merge of some area into the
one being closed. This means that screen->active_region will be
pointing at deallocated memory. This is normally not noticed because
active_region is set very quickly to the new area, but it is an error
nonetheless and is noticed by ASAN. This PR sets screen->active_region
to null when merges change the active area.
Pull Request: https://projects.blender.org/blender/blender/pulls/131994
This applies upstream PR 13328 to our copy of dpcpp, which enables
building dpcpp on a many-core box. I upgraded my build env and
ran into this issue.
No rebuilds required, build time fix only.
While adding tests I found that metaball export has been broken since
Blender 3.4. It would export each metaball geometry twice.
This looks to have been a side effect of a change to `object_dupli.cc`
which no longer sets the `no_draw` flag for metaballs[1]. With the flag
unset we would end up visiting this particular object twice.
Use a direct check for Metaballs now and add test coverage for the
scenario in general.
[1] eaa87101cd
Pull Request: https://projects.blender.org/blender/blender/pulls/131984
The new `--disable-depsgraph-on-file-load` commandline option, when used
together with the `--background` or `--command` ones, will prevent
building a depsgraph immediately after loading a blendfile.
The goal is to improve performance of batch-processing of blendfiles by
python scripts. It is intended to become the default behavior in Blender
5.0.
Scripts requiring evaluated data then need to explicitly ensure that
an evaluated depsgraph is available (e.g. by calling
`depsgraph = context.evaluated_depsgraph_get()`).
------
This disables the call to `wm_event_do_depsgraph` in `wm_file_read_post`.
Some quick performance tests:
* The whole `blendfile_versioning` tests gain about 2% speedup. These are
almost always small and simple blendfiles.
* Loading a Gold production file however goes from 26.5s to 3.5s (almost
90% faster) when this new option is specified.
Pull Request: https://projects.blender.org/blender/blender/pulls/131978
Previously, the number of material slots on the geometry (e.g. mesh) was the
ground truth. However, this had limitations in the case when the object had more
material slots than the evaluated geometry. All extra slots on the object were
ignored.
This patch changes the definition so that the number of materials used for
rendering is the maximum of the number of material slots on the geometry and on
the object. This also implies that one always needs a reference to an object
when determining that number, but that was fairly straightforward to achieve in
current code.
This patch also cleans up the material count handling a fair amount by using the
`BKE_object_material_*_eval` API more consistently instead of manually accessing
`totcol`. Cycles uses the same API indirectly through RNA.
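The new definition can be sketched as (a hypothetical helper, not the actual
`BKE_object_material_*_eval` API):

```python
def used_material_count(geometry_slot_count, object_slot_count):
    """Sketch of the new definition: the number of materials used for
    rendering is the maximum of the slot counts on the evaluated
    geometry and on the object."""
    return max(geometry_slot_count, object_slot_count)


# Extra slots on the object are no longer ignored:
assert used_material_count(geometry_slot_count=2, object_slot_count=5) == 5
# When the geometry has more slots, its count still wins:
assert used_material_count(geometry_slot_count=4, object_slot_count=1) == 4
```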
Pull Request: https://projects.blender.org/blender/blender/pulls/131869
As mentioned in fb6ac24514 and 9a6beb915d, `file_draw_preview()` is a
rather overloaded and confusing function. I'm trying to make it more
readable.
Split out file indicator icon drawing from the preview drawing function;
there's not much reason for it to be there as well. I'd rather keep
functions a bit simpler and more manageable.
Also added some comments and tried to make logic a bit more clear.
As mentioned in 9a6beb915d, `file_draw_preview()` is a rather
overloaded and confusing function. I'm trying to make it more readable.
Move image scale calculations to a separate function, reducing perceived
complexity of the function and the number of local variables. Also
rename or remove some variables to be more clear, add comments and move
variables closer to where they are used.
The GCC version on the buildbot does not support the attribute on
a class member, resulting in the following warning:
NOD_node_declaration.hh:577:42: warning: ‘maybe_unused’ attribute ignored [-Wattributes]
Use the `UNUSED_VARS` macro instead to solve the original warning
about the member being unused in release builds, without introducing
a warning when using older compilers.
Pull Request: https://projects.blender.org/blender/blender/pulls/131974
Render tests can still fail. This change disables them until they
are in better shape. This reduces confusion when running Cycles GPU
render tests.
Known issues:
- Render in batch can take forever due to a locking issue
- Headless rendering is still in development
- Particle hair rendering is broken.
Pull Request: https://projects.blender.org/blender/blender/pulls/131964
The type info table for VSE modifiers was initialized to point
to global variables on first use. But really there's no reason to do
that, we can just declare the actual table instead. This is both
shorter, and avoids dances with the preprocessor (INIT_TYPE macro).
Pull Request: https://projects.blender.org/blender/blender/pulls/131958
`file_draw_preview()` does multiple things and is quite hard to follow
already; it needs some improvements. One issue is naming, which I always
found made the function unnecessarily confusing. For example `is_icon`
had nothing to do with the `icon` parameter, you'd have to search around
the code a bit to understand what it was actually representing.
Attempt to make variable and function names more clear.
Also reduce variable scope and add a comment.
When baking line art strokes, the object matrix used for the back
transformation was inverted. It should be `world_to_object` instead of
`object_to_world`. Probably a typo during the GPv3 rewrite.
The compositor leaks memory when the node tree contains unavailable
links. That's because the compositor doesn't ignore those links when
computing the reference counts for outputs. To fix this, check if the
output is logically linked and return 0 in case it isn't.
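A sketch of the fixed reference counting, using a hypothetical link
representation (not the actual compositor data structures):

```python
def output_reference_count(links):
    """Only logically linked (available) links count toward an output's
    reference count; when none are available the count is 0, so the
    output's buffer gets released instead of leaking."""
    return sum(1 for link in links if link.get("is_available", True))


links = [
    {"to": "Viewer", "is_available": True},
    {"to": "Muted socket", "is_available": False},
]
assert output_reference_count(links) == 1
# An output whose only links are unavailable is treated as unlinked:
assert output_reference_count([{"to": "x", "is_available": False}]) == 0
```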
This adds initial support for ReBAR capable platforms.
It ensures that, even when allocating buffers that are not required to be
host visible, we still try to allocate them in host-visible memory. When
there is space in this memory heap the buffer will be automatically mapped
to host memory.
When a buffer is mapped, staging buffers can be skipped when the buffer
was newly created. In order to make better usage of ReBAR the
`VKBuffer::create` function will need to be revisited. It currently hides
too many options to allocate in the correct memory heap. That change isn't
part of this PR.
Using shader_balls.blend, rendering the first 50 frames in main takes 1516ms.
When using ReBAR it takes 1416ms.
```
Operating system: Linux-6.8.0-49-generic-x86_64-with-glibc2.39 64 Bits, X11 UI
Graphics card: AMD Radeon Pro W7700 (RADV NAVI32) Advanced Micro Devices radv Mesa 24.3.1 - kisak-mesa PPA Vulkan Backend
```
Pull Request: https://projects.blender.org/blender/blender/pulls/131856
The change from ababc2e01b did not actually behave in a way that the
caller can force-disable overlays (which seems like the intention from
the commit message and also the desired behavior for e.g. grease pencil
drawing/reprojection).
Pull Request: https://projects.blender.org/blender/blender/pulls/131861