Both the draw manager and gpu backend used the same compilation
directive for enablement. This PR separates them into
`WITH_GPU_DRAW_TESTS` for draw manager related tests and
`WITH_GPU_BACKEND_TESTS` for gpu backend related tests.
Pull Request: https://projects.blender.org/blender/blender/pulls/132018
Blender stores all pipelines in a pool. Using a hash, it checks whether
the pipeline was already created and the previous one can be reused. Due
to performance issues when working with graphics pipelines, some equality
operations only used a hash check. For scissors and viewports this isn't
enough and could lead to issues.
This PR fixes this by still performing an exact check if the hashes are
equal. Note that performance drops a bit; this should be countered
with other performance improvements in the future.
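A minimal sketch of the idea, not the actual Vulkan backend code: the hash only narrows the search, and an exact comparison decides whether an existing pipeline can be reused. `PipelineInfo`, `PipelineInfoHash` and `PipelinePool` are hypothetical names, and the hash deliberately ignores viewport and scissor to mimic a collision.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <functional>
#include <unordered_map>

// Hypothetical pipeline description; only the fields needed for the sketch.
struct PipelineInfo {
  int shader_id = 0;
  std::array<int, 4> viewport = {0, 0, 0, 0};  // x, y, width, height
  std::array<int, 4> scissor = {0, 0, 0, 0};   // x, y, width, height

  // Exact comparison, including viewport and scissor.
  bool operator==(const PipelineInfo &other) const
  {
    return shader_id == other.shader_id && viewport == other.viewport &&
           scissor == other.scissor;
  }
};

// Deliberately coarse hash: two infos that differ only in viewport or
// scissor collide, so the hash alone cannot tell them apart.
struct PipelineInfoHash {
  std::size_t operator()(const PipelineInfo &info) const
  {
    return std::hash<int>()(info.shader_id);
  }
};

class PipelinePool {
 public:
  // Returns the handle of a matching pipeline, creating one if needed.
  // std::unordered_map uses the hash to pick a bucket and operator== to
  // confirm the match, which is the "hash first, exact check second" scheme.
  int get_or_create(const PipelineInfo &info)
  {
    const auto it = pipelines_.find(info);
    if (it != pipelines_.end()) {
      return it->second;
    }
    const int handle = next_handle_++;
    pipelines_.emplace(info, handle);
    return handle;
  }

 private:
  std::unordered_map<PipelineInfo, int, PipelineInfoHash> pipelines_;
  int next_handle_ = 1;
};
```

With a hash-only check, the two infos in the usage below would have been treated as the same pipeline because their hashes are equal.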
Pull Request: https://projects.blender.org/blender/blender/pulls/132005
With the HIP-RT BVH on AMD GPUs, instances that have undergone
two sets of transformations will not render properly.
This manifests as:
- Incorrect mesh normals
- Improperly positioned, scaled, or rotated meshes
- Missing intersections
This commit adds a test for this issue to make it easier to test,
and so we can hopefully catch similar issues if we ever add more
BVH options in the future.
Original report: blender/blender#117567
Ref: blender/blender-test-data!29
Pull Request: https://projects.blender.org/blender/blender/pulls/131352
These should have a default size/pos, just like stencil itself.
This came up in #131836 (and probably led to asset essential brushes all
having this wrong -- which in turn will not draw stencil masks for them
in the viewport).
NOTE: without those defaults, resetting the brush would also have this
issue.
For the further steps needed to fully fix this, please refer to #131836.
Pull Request: https://projects.blender.org/blender/blender/pulls/131848
Similar to the toolbar: when spacebar is mapped to `play`, use
`shift + spacebar` for the asset-shelf popup. When spacebar is mapped to
`toolbar`, invoke the asset shelf in paint modes. When
`spacebar=search`, do not map any key for the asset-shelf popup.
Co-authored-by: Julian Eisel
Pull Request: https://projects.blender.org/blender/blender/pulls/131351
When a scene contains distant lights and local lights, the first step
of the light tree traversal is to compute the importance of
distant lights vs local lights and pick one based on a random number.
In the specific case of when there is only one distant light,
the line of code that had been changed in this commit
effectively reduced to:
`min_importance = fast_cosf(x) < cosf(x) ? 0.0 : compute_min_importance`
And depending on the hardware, compiler, and the specific value being
tested, different configurations could take different code paths.
This commit fixes this issue by turning the comparison into
`fast_cosf(x) < fast_cosf(x)`.
---
Why does `cos_theta_plus_theta_u < cosf(bcone.theta_e - bcone.theta_o)`
reduce to `fast_cosf(x) < cosf(x)` in this specific case?
- `cos_theta_plus_theta_u` is computed as
`cos_theta * cos_theta_u - sin_theta * sin_theta_u`
- `cos_theta` is always 1.0 in the case of a single distant light.
- `cos_theta_u` is computed earlier as `fast_cosf(theta_e)` in
`distant_light_tree_parameters()`
- `sin_theta` is zero, and so that side of the equation doesn't matter.
This reduces `cos_theta_plus_theta_u` to `fast_cosf(theta_e)`.
`cosf(bcone.theta_e - bcone.theta_o)` reduces to `cosf(bcone.theta_e)`
because for the case of a single distant light `theta_o` is always 0.
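The effect can be illustrated with a small sketch. `approx_cosf` is a hypothetical stand-in (a truncated Taylor series), not Cycles' actual `fast_cosf`; the point is only that comparing an approximation against the exact function depends on the sign of the approximation error, while comparing the same function against itself does not.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical approximation used only for illustration. It is NOT the
// fast_cosf() implementation used by Cycles.
inline float approx_cosf(float x)
{
  const float x2 = x * x;
  return 1.0f - x2 / 2.0f + x2 * x2 / 24.0f;
}

// Mixed comparison: the outcome depends on whether the approximation lands
// above or below the exact cosine for this particular input.
inline bool mixed_less(float theta)
{
  return approx_cosf(theta) < std::cos(theta);
}

// Consistent comparison: both sides use the same function, so a value is
// never strictly less than itself.
inline bool consistent_less(float theta)
{
  return approx_cosf(theta) < approx_cosf(theta);
}
```

In the mixed form, a different approximation, compiler, or device can flip the result for the same `theta`, which matches the behavior described above.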
Pull Request: https://projects.blender.org/blender/blender/pulls/131932
By now it is just a "compositor", so move the files one folder up.
Things that were under realtime_compositor/intern move into
already existing intern folder.
Pull Request: https://projects.blender.org/blender/blender/pulls/132004
This patch optimizes the Step mode of the Dilate node to use the van
Herk/Gil-Werman algorithm, which runs in constant time with respect to
the structuring element size, compared to the linear time algorithm
currently in use. This is an order of magnitude faster for reasonably
large structuring elements.
Only CPU is implemented in this patch, while GPU will be implemented in
a separate patch.
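For reference, a 1D version of the van Herk/Gil-Werman scheme can be sketched as follows. This is an illustrative sketch, not the code from this patch: `dilate_1d` and its padding strategy are assumptions. The input is split into blocks of the window size, running maxima are computed per block in both directions, and each output needs only two lookups regardless of the radius.

```cpp
#include <algorithm>
#include <cassert>
#include <limits>
#include <vector>

// 1D dilation (sliding maximum) over the window [i - radius, i + radius].
std::vector<float> dilate_1d(const std::vector<float> &input, int radius)
{
  assert(radius >= 0);
  const int size = int(input.size());
  const int window = 2 * radius + 1;
  const int padded_size = size + 2 * radius;

  // Pad both ends with the lowest float so out-of-bounds samples never win.
  std::vector<float> padded(padded_size, std::numeric_limits<float>::lowest());
  std::copy(input.begin(), input.end(), padded.begin() + radius);

  // Running maxima that restart at every block of `window` elements:
  // `prefix` scans left to right, `suffix` scans right to left.
  std::vector<float> prefix(padded_size), suffix(padded_size);
  for (int i = 0; i < padded_size; i++) {
    const bool block_start = (i % window == 0);
    prefix[i] = block_start ? padded[i] : std::max(prefix[i - 1], padded[i]);
  }
  for (int i = padded_size - 1; i >= 0; i--) {
    const bool block_end = (i % window == window - 1) || (i == padded_size - 1);
    suffix[i] = block_end ? padded[i] : std::max(suffix[i + 1], padded[i]);
  }

  // In padded coordinates the window of output i is [i, i + 2 * radius].
  // It spans at most two blocks, so two lookups cover it.
  std::vector<float> output(size);
  for (int i = 0; i < size; i++) {
    output[i] = std::max(suffix[i], prefix[i + 2 * radius]);
  }
  return output;
}
```

For a square structuring element, a 2D dilation can be built by running such a pass over rows and then over columns. Whether the patch is organized this way is not stated here.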
Pull Request: https://projects.blender.org/blender/blender/pulls/131798
This patch adds compile-time optimizations where the operation inputs
are guaranteed to be non-single values. Pixel load methods now take an
optional template parameter CouldBeSingle, which is false by default. If
the input could be a single value, it needs to be set to true.
Gives up to 3x improvement in some nodes.
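A minimal sketch of how such a template parameter can remove a branch at compile time. The `Result` class below is a hypothetical stand-in and does not reproduce the compositor's actual class or method signatures; only the `CouldBeSingle` name and its default come from the description above.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical result buffer: either one value for the whole image
// ("single value") or one value per pixel.
class Result {
 public:
  explicit Result(float single_value) : is_single_(true), data_{single_value} {}
  Result(int width, int height, float fill)
      : is_single_(false), width_(width), data_(std::size_t(width) * height, fill)
  {
  }

  // With CouldBeSingle == false the single-value branch is not compiled in,
  // so the per-pixel path has no runtime check in release builds.
  template<bool CouldBeSingle = false> float load_pixel(int x, int y) const
  {
    if constexpr (CouldBeSingle) {
      if (is_single_) {
        return data_[0];
      }
    }
    else {
      // The caller promised that this input is never a single value.
      assert(!is_single_);
    }
    return data_[std::size_t(y) * width_ + x];
  }

 private:
  bool is_single_;
  int width_ = 0;
  std::vector<float> data_;
};
```

Callers that cannot rule out single values opt in with `load_pixel<true>(x, y)`; all others use the default.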
OptiX OSL tests were previously disabled due to a GPU driver bug
resulting in many tests failing unexpectedly.
The new driver version is now out with the fix so we can now enable
OptiX OSL testing.
This commit also updates the OptiX OSL block list with better comments,
and more tests that are known to fail and need investigating.
Ref: #123012
Pull Request: https://projects.blender.org/blender/blender/pulls/129280
Until the newer Hair Curves system can fully replace particle hair, add
a small test to ensure this continues to work.
Since the hair is exported as cubic bspline curves, we can also use this
same file to test bspline import now too.
Pull Request: https://projects.blender.org/blender/blender/pulls/131997
On Linux, Cycles HIP has a JIT compilation feature.
This feature is used when Cycles cannot find a precompiled kernel
for your GPU, which is most common when using hardware that wasn't
out at the time that a version of Blender was released.
There were various issues with this JIT compilation system; this commit
aims to solve them. The changes include:
- Enable `WITH_NANOVDB` when Blender is built with NanoVDB.
- This fixes an issue where VDB objects would not render.
- Enable some extra debug options for developers when desired
(This is so we match the CUDA implementation of the same feature).
- Reduce the optimization level from -O3 to the default.
- This is to avoid any extra issues that may occur as a result
of an increased optimization level that isn't tested with
precompiled kernels.
- Reduce the optimization level even further to -O1 for Vega.
- This was done on precompiled kernels to work around some issues,
so I decided to apply it to JIT kernels as well.
- Note: Although Vega is not officially supported, this may help
people that unofficially use Vega.
- Added some previously missing compiler arguments and fixed errors that
were introduced when enabling these compiler arguments.
- Fixed an issue where JIT compilation would fail if Blender was
installed in a path that had a space in it.
Pull Request: https://projects.blender.org/blender/blender/pulls/131853
The area "close" operation is actually an area merge of some area into
the one being closed. This means that screen->active_region will be
pointing at deallocated RAM. This is normally not noticed because
active_region is set very quickly to the new area, but it is an error
nonetheless and is noticed by ASAN. This PR sets screen->active_region
to null when merges change the active area.
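The pattern can be sketched as below. The `Screen`, `Area` and `Region` structs and `screen_area_close()` are simplified, hypothetical stand-ins for the real data structures; the sketch only shows clearing a non-owning pointer before the memory it refers to is freed.

```cpp
#include <algorithm>
#include <cassert>
#include <memory>
#include <vector>

// Simplified, hypothetical stand-ins for the real screen data structures.
struct Region {
  int id;
};
struct Area {
  std::vector<std::unique_ptr<Region>> regions;
};
struct Screen {
  std::vector<std::unique_ptr<Area>> areas;
  Region *active_region = nullptr;  // Non-owning pointer into one of the areas.
};

// Remove `area` from the screen. If the active region lives in that area,
// reset the pointer first so it never refers to freed memory.
void screen_area_close(Screen &screen, Area *area)
{
  for (const std::unique_ptr<Region> &region : area->regions) {
    if (region.get() == screen.active_region) {
      screen.active_region = nullptr;
    }
  }
  screen.areas.erase(std::remove_if(screen.areas.begin(),
                                    screen.areas.end(),
                                    [area](const std::unique_ptr<Area> &candidate) {
                                      return candidate.get() == area;
                                    }),
                     screen.areas.end());
}
```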
Pull Request: https://projects.blender.org/blender/blender/pulls/131994
This applies upstream PR 13328 to our copy of dpcpp, which enables
building dpcpp on a many-core box. I upgraded my build env and
ran into this issue.
No rebuilds required, build time fix only.
While adding tests I found that metaball export has been broken since
Blender 3.4. It would export each metaball geometry twice.
This looks to have been a side effect of a change to `object_dupli.cc`
which no longer sets the `no_draw` flag for metaballs[1]. With the flag
unset we would end up visiting this particular object twice.
Use a direct check for Metaballs now and add test coverage for the
scenario in general.
[1] eaa87101cd
Pull Request: https://projects.blender.org/blender/blender/pulls/131984
The new `--disable-depsgraph-on-file-load` commandline option, when used
together with the `--background` or `--command` ones, will prevent
building a depsgraph immediately after loading a blendfile.
The goal is to improve performance of batch-processing of blendfiles by
python scripts. It is intended to become the default behavior in Blender
5.0.
Scripts requiring evaluated data then need to explicitly ensure that
an evaluated depsgraph is available (e.g. by calling
`depsgraph = context.evaluated_depsgraph_get()`).
------
This disables the call to `wm_event_do_depsgraph` in `wm_file_read_post`.
Some quick performance tests:
* The whole `blendfile_versioning` tests gain about 2% speedup. These are
almost always small and simple blendfiles.
* Loading a Gold production file however goes from 26.5s to 3.5s (almost
90% faster) when this new option is specified.
Pull Request: https://projects.blender.org/blender/blender/pulls/131978
Previously, the number of material slots on the geometry (e.g. mesh) was the
ground truth. However, this had limitations in the case when the object had more
material slots than the evaluated geometry. All extra slots on the object were
ignored.
This patch changes the definition so that the number of materials used for
rendering is the maximum of the number of material slots on the geometry and on
the object. This also implies that one always needs a reference to an object
when determining that number, but that was fairly straightforward to achieve in
current code.
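Expressed as code, the new definition amounts to the following. The two structs and `used_material_count()` are hypothetical names for illustration and are not the `BKE_object_material_*_eval` API.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical stand-ins holding only the slot counts.
struct GeometrySlots {
  int material_slots;
};
struct ObjectSlots {
  int material_slots;
};

// Number of materials used for rendering: the maximum of both slot counts,
// so extra slots on the object are no longer ignored.
inline int used_material_count(const ObjectSlots &object, const GeometrySlots &geometry)
{
  return std::max(object.material_slots, geometry.material_slots);
}
```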
This patch also cleans up the material count handling a fair amount by using the
`BKE_object_material_*_eval` API more consistently instead of manually accessing
`totcol`. Cycles uses the same API indirectly through RNA.
Pull Request: https://projects.blender.org/blender/blender/pulls/131869
As mentioned in fb6ac24514 and 9a6beb915d, `file_draw_preview()` is a
rather overloaded and confusing function. I'm trying to make it more
readable.
Split out file indicator icon drawing from the preview drawing function;
there's not much reason for it to be there as well. I'd rather keep
functions a bit simpler and more manageable.
Also added some comments and tried to make the logic a bit more clear.
As mentioned in 9a6beb915d, `file_draw_preview()` is a rather
overloaded and confusing function. I'm trying to make it more readable.
Move image scale calculations to a separate function, reducing perceived
complexity of the function and the number of local variables. Also
rename or remove some variables to be more clear, add comments and move
variables closer to where they are used.
The GCC version on the buildbot does not support attributes on
a class member, resulting in the following warning:
NOD_node_declaration.hh:577:42: warning: ‘maybe_unused’ attribute ignored [-Wattributes]
Use the `UNUSED_VARS` macro instead to solve the original warning
about the member being unused in release builds without introducing
a warning when using older compilers.
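A sketch of the pattern, assuming a member that is only read inside an assert. `EXAMPLE_UNUSED_VARS` is a hypothetical single-argument stand-in for Blender's `UNUSED_VARS` macro, and `Declaration` is an invented class.

```cpp
#include <cassert>

// Hypothetical stand-in for Blender's UNUSED_VARS macro.
#define EXAMPLE_UNUSED_VARS(x) ((void)(x))

class Declaration {
 public:
  explicit Declaration(int debug_id) : debug_id_(debug_id) {}

  int checked_value(int value) const
  {
    // In release builds (NDEBUG) the assert compiles to nothing, which would
    // leave the member without any use.
    assert(debug_id_ >= 0);
    // Explicit use that works on any compiler, instead of writing
    // `[[maybe_unused]] int debug_id_;` on the member declaration.
    EXAMPLE_UNUSED_VARS(debug_id_);
    return value;
  }

 private:
  int debug_id_;
};
```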
Pull Request: https://projects.blender.org/blender/blender/pulls/131974
Render tests can still fail. This change will disable them until they
are in a better shape. Reduces confusion when running cycles GPU render
tests.
Known issues:
- Render in batch can take forever due to a locking issue
- Headless rendering is still in development
- Particle hair rendering is broken.
Pull Request: https://projects.blender.org/blender/blender/pulls/131964
The type info table for VSE modifiers was initialized to point
to global variables on first use. But really there's no reason to do
that, we can just declare the actual table instead. This is both
shorter, and avoids dances with preprocessor (INIT_TYPE macro).
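The difference can be sketched as follows, with hypothetical names (`ModifierType`, `ModifierTypeInfo`, `modifier_type_info_get`) rather than the actual VSE types: the table is a plain constant array, so there is no "initialized" flag and no macro to fill it on first use.

```cpp
#include <cassert>
#include <string_view>

// Hypothetical modifier type enum and type info struct.
enum ModifierType { MODIFIER_NONE = 0, MODIFIER_CURVES, MODIFIER_MASK, MODIFIER_TYPE_COUNT };

struct ModifierTypeInfo {
  std::string_view name;
};

// The table is declared directly; entries are indexed by ModifierType.
static constexpr ModifierTypeInfo modifier_types[MODIFIER_TYPE_COUNT] = {
    {""},        // MODIFIER_NONE
    {"Curves"},  // MODIFIER_CURVES
    {"Mask"},    // MODIFIER_MASK
};

// Lookup needs no lazy initialization step.
const ModifierTypeInfo *modifier_type_info_get(int type)
{
  if (type <= MODIFIER_NONE || type >= MODIFIER_TYPE_COUNT) {
    return nullptr;
  }
  return &modifier_types[type];
}
```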
Pull Request: https://projects.blender.org/blender/blender/pulls/131958
`file_draw_preview()` does multiple things and is quite hard to follow
already; it needs some improvements. One issue is naming that I always
found made the function unnecessarily confusing. For example `is_icon`
had nothing to do with the `icon` parameter, you'd have to search around
the code a bit to understand what it was actually representing.
Attempt to make variable and function names more clear.
Also reduce variable scope and add a comment.
When baking line art strokes, the object matrix that is used for back
transformation was inverted. It should be `world_to_object` instead of
`object_to_world`. Probably a typo during the GPv3 rewrite.
The compositor leaks memory when the node tree contains unavailable
links. That's because the compositor doesn't ignore those links when
computing the reference counts for outputs. To fix this, check if the
output is logically linked and return 0 in case it isn't.
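A simplified sketch of the fix, using hypothetical `Link` and `OutputSocket` structs instead of the real node tree types: an output whose links are all unavailable is treated as unlinked and gets a reference count of 0, so nothing is allocated that would never be released.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical stand-ins: a link is unavailable when one of its sockets is.
struct Link {
  bool is_available;
};
struct OutputSocket {
  std::vector<Link> links;
};

// An output is logically linked only if at least one link is available.
bool is_logically_linked(const OutputSocket &output)
{
  return std::any_of(output.links.begin(), output.links.end(), [](const Link &link) {
    return link.is_available;
  });
}

// Reference count of an output: 0 when it is not logically linked,
// otherwise the number of available links that will read it.
int compute_reference_count(const OutputSocket &output)
{
  if (!is_logically_linked(output)) {
    return 0;
  }
  return int(std::count_if(output.links.begin(), output.links.end(), [](const Link &link) {
    return link.is_available;
  }));
}
```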
This adds initial support for ReBAR capable platforms.
It ensures that when allocating buffers that should not be host visible, it still
tries to allocate them in host visible memory. When there is space in this memory
heap the buffer will be automatically mapped to host memory.
When mapped, staging buffers can be skipped when the buffer was newly
created. In order to make better use of ReBAR the `VKBuffer::create`
function will need to be revisited. It currently hides too many options for
allocating in the correct memory heap. This change isn't part of this PR.
Using shader_balls.blend rendering the first 50 frames in main takes 1516ms.
When using ReBAR it takes 1416ms.
```
Operating system: Linux-6.8.0-49-generic-x86_64-with-glibc2.39 64 Bits, X11 UI
Graphics card: AMD Radeon Pro W7700 (RADV NAVI32) Advanced Micro Devices radv Mesa 24.3.1 - kisak-mesa PPA Vulkan Backend
```
Pull Request: https://projects.blender.org/blender/blender/pulls/131856
Change from ababc2e01b did not actually behave in a way that the
caller can force-disable overlays [which seems like the intention from
the commit message and also desired behavior for e.g. grease pencil
drawing/reprojection].
Pull Request: https://projects.blender.org/blender/blender/pulls/131861
When running tests `WITH_GTESTS` and `WITH_GPU_DRAW_TESTS` the
GPUShaderCreateInfo's specifically created for the tests could not be
found. This failed running tests on any backend.
This PR fixes this. The root cause was that the name of the compile
directive was incorrect. It should have been `WITH_GTESTS` but was
`WITH_GTEST`.
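The failure mode is easy to reproduce in isolation: a misspelled macro in `#ifdef` is simply undefined, so the guarded code is skipped without any diagnostic. In the sketch below the `#define` stands in for what the build system would pass, and the two functions are hypothetical.

```cpp
#include <cassert>

// Stand-in for the definition normally provided by the build system.
#define WITH_GTESTS

// Misspelled guard: WITH_GTEST is never defined, so the "registered" branch
// is silently compiled out.
inline bool test_infos_registered_with_typo()
{
#ifdef WITH_GTEST
  return true;
#else
  return false;
#endif
}

// Correct guard: matches the name that is actually defined.
inline bool test_infos_registered()
{
#ifdef WITH_GTESTS
  return true;
#else
  return false;
#endif
}
```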
Pull Request: https://projects.blender.org/blender/blender/pulls/131956
Originally intended to be a code cleanup that makes the code shorter
(part of VSE quality project #130975), but as a side effect many
modifiers are now faster since they no longer do many branches in
the innermost pixel loop.
The main part is apply_modifier_op, which, given the "modifier op"
functor object, instantiates the correct processing function based
on the type of image (byte vs float) and mask (none, byte, float), for
a total of 6 possible cases. A helper like apply_and_advance_mask
then applies the mask based on input and result in a consistent way
across the modifiers, instead of literal copy-pasted code.
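The dispatch can be sketched roughly as follows. This is a simplified, single-channel illustration and not the code from this change: only the name `apply_modifier_op` comes from the description above, while `Image`, `Mask`, `NoMask`, `load`, `store`, `mask_factor`, `apply_typed` and `apply_with_mask` are hypothetical. The runtime checks on image and mask type happen once, outside the pixel loop, and each of the 2 x 3 = 6 combinations gets its own instantiated loop.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical containers: exactly one buffer of the image is used, and an
// empty mask means "no mask".
struct Image {
  std::vector<std::uint8_t> bytes;
  std::vector<float> floats;
};
struct Mask {
  std::vector<std::uint8_t> bytes;
  std::vector<float> floats;
};
struct NoMask {};

inline float load(std::uint8_t value) { return value / 255.0f; }
inline float load(float value) { return value; }
inline void store(float value, std::uint8_t &dst)
{
  dst = std::uint8_t(std::clamp(value, 0.0f, 1.0f) * 255.0f + 0.5f);
}
inline void store(float value, float &dst) { dst = value; }

inline float mask_factor(NoMask, std::size_t) { return 1.0f; }
inline float mask_factor(const std::uint8_t *mask, std::size_t i) { return mask[i] / 255.0f; }
inline float mask_factor(const float *mask, std::size_t i) { return mask[i]; }

// Innermost loop: image and mask types are template parameters, so there are
// no type branches per pixel.
template<typename ImageT, typename MaskT, typename Op>
void apply_typed(ImageT *pixels, MaskT mask, std::size_t size, const Op &op)
{
  for (std::size_t i = 0; i < size; i++) {
    const float input = load(pixels[i]);
    const float result = op(input);
    // Blend between input and result based on the mask.
    store(input + (result - input) * mask_factor(mask, i), pixels[i]);
  }
}

// Pick the mask variant once per image.
template<typename ImageT, typename Op>
void apply_with_mask(ImageT *pixels, std::size_t size, const Mask &mask, const Op &op)
{
  if (!mask.bytes.empty()) {
    apply_typed(pixels, mask.bytes.data(), size, op);
  }
  else if (!mask.floats.empty()) {
    apply_typed(pixels, mask.floats.data(), size, op);
  }
  else {
    apply_typed(pixels, NoMask{}, size, op);
  }
}

// Pick the image variant once per image.
template<typename Op> void apply_modifier_op(Image &image, const Mask &mask, const Op &op)
{
  if (!image.bytes.empty()) {
    apply_with_mask(image.bytes.data(), image.bytes.size(), mask, op);
  }
  else {
    apply_with_mask(image.floats.data(), image.floats.size(), mask, op);
  }
}
```

A modifier then only supplies the per-pixel operation, for example `apply_modifier_op(image, mask, [](float v) { return v * 2.0f; });`.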
Brightness/Contrast, Color Balance, Tonemap modifiers were already
optimized to move branches out of inner loops previously; their
performance remains unchanged. Mask modifier performance remains
unchanged; it is very simple and memory bandwidth limited on my
machine.
Other modifiers, tested on 4K resolution, Win10 / Ryzen 5950X, time
in milliseconds taken to apply the modifier calculation, on a byte
image with no mask:
- Curves: 12.1 -> 7.7ms
- Hue Correct: 24.5 -> 15.8ms
- White Balance: 20.5 -> 13.8ms
Same as above, but on a float image with a byte mask:
- Curves: 13.5 -> 12.3ms
- Hue Correct: 19.7 -> 16.4ms
- White Balance: 19.3 -> 15.9ms
Pull Request: https://projects.blender.org/blender/blender/pulls/131736
Change BrightRings exr file to not contain nan/inf pixels. Testing for
nan/inf in input just gives too many headaches across different
platforms, and is arguably a very corner case.
Pull Request: https://projects.blender.org/blender/blender/pulls/131926
Unicode code points were named "ascii" or "cha";
use the term "charcode" as used elsewhere in BLF and FreeType.
Also use char32_t internally and add a utility function to apply the
small-caps flag.