When using a small brush size, if enough latency is encountered,
sculpting can become unresponsive. See #131334 for more in-depth
information on this issue.
In general, this is most apparent to users with the Smooth brush and
any brushes with auto-smoothing enabled. Much of the work for this
brush action is in retrieving and averaging the neighboring vertex
positions. This is wasted work for vertices outside of the brush radius,
as ultimately they will have no displacement applied to them.
To help mitigate this performance regression, this PR adds variants of
the functions that calculate neighboring vertices. These variants take
the precalculated `factor` and use it to skip further processing.
Additionally, some methods are restructured to take advantage of this.
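The idea can be sketched outside of Blender as follows. This is a minimal
Python illustration, not the code from this PR: the function name
`neighbor_average_with_factors`, the adjacency-list layout, and the tuple
positions are all hypothetical and chosen only for the example.

```python
def neighbor_average_with_factors(positions, neighbors, factors):
    """Average neighbor positions, skipping vertices the brush does not affect.

    positions: list of (x, y, z) tuples (hypothetical layout)
    neighbors: list of neighbor index lists, one per vertex
    factors:   precalculated brush influence per vertex
    """
    result = []
    for i, factor in enumerate(factors):
        if factor == 0.0 or not neighbors[i]:
            # No displacement will be applied, so skip the neighbor lookup
            # and averaging and keep the position unchanged.
            result.append(positions[i])
            continue
        count = len(neighbors[i])
        sums = [0.0, 0.0, 0.0]
        for n in neighbors[i]:
            for axis in range(3):
                sums[axis] += positions[n][axis]
        result.append(tuple(s / count for s in sums))
    return result


positions = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0), (4.0, 6.0, 0.0)]
neighbors = [[1, 2], [0, 2], [0, 1]]
factors = [1.0, 0.0, 0.0]  # only vertex 0 is inside the brush
averaged = neighbor_average_with_factors(positions, neighbors, factors)
```

Only vertex 0 pays for the neighbor traversal here; the other two are
returned untouched.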
This change represents a speedup of 4x, from `0.40ms` without this
patch to `0.10ms` with it, on a cube with 400k vertices and a brush
radius of 10px. For most ratios of brush size to BVH node / mesh size,
we see improvements, as we can avoid most of the processing for nodes
in which the brush only affects a small number of vertices. For cases
where the entire mesh is affected by a brush, this patch does introduce
a small but measurable slowdown, from `0.31ms` to `0.33ms`.
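For reference, the relative changes implied by these timings can be worked
out directly. The snippet below is plain arithmetic on the numbers quoted
above, expressed in microseconds; the helper name `relative_change` is just
for illustration.

```python
def relative_change(before_us, after_us):
    """Fractional change going from `before_us` to `after_us`."""
    return (after_us - before_us) / before_us


# Small brush case: 0.40ms -> 0.10ms
speedup = 400 / 100

# Whole-mesh case: 0.31ms -> 0.33ms, roughly a 6.5% slowdown
slowdown = relative_change(310, 330)
```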
Ref: #136006
Pull Request: https://projects.blender.org/blender/blender/pulls/136274
The initial limitation that prevented the use of -ffast-math, worked
around in 09df1f4caf, got fixed upstream in LLVM,
and the fix is part of the current DPC++ compiler:
63ecd2a725
We're now able to go back to using -ffast-math, which helps simplify
the set of compiler flags.
No performance or conformance change is expected from this change (most
of the gain is already achieved with the use of -cl-fast-relaxed-math
since 284b89a0a3), and this has been
verified on Arc B580 under Windows.
Blender's VkInstance cannot be shared with the OpenXR VkInstance. The
reason is a chicken-and-egg problem where OpenXR needs to be started
before Vulkan. OpenXR can add special Vulkan-specific requirements
(instance & device) that are only available when the user starts an OpenXR
session.
The goal implementation is to share memory between both instances using
[VK_KHR_external_memory](https://registry.khronos.org/vulkan/specs/latest/man/html/VK_KHR_external_memory.html) and related extensions. However, this seems
to be a bridge too far as an initial step. Reason: there are not many
samples, guides, or documentation to be found that handle the workflow
we require. We want to take a smaller step-by-step approach to gain the needed
knowledge.
For that reason, this PR does the simplest, most naive thing that can be done to
share memory between instances: download the render result to CPU RAM and share
the host pointer with the OpenXR instance, which copies it to the swap chain.
Synchronization is also done using wait idle commands.
<video src="attachments/32a0d69b-c3fa-4272-aea0-d207609afaaf" title="Screencast From 2025-03-18 11-16-17.webm" controls></video>
**Gaining knowledge**
- Experiment with `VK_KHR_external_memory_host` extension for uploading vertex buffers (not related to OpenXR).
- Import host pointer with `VK_KHR_external_memory_host`. This reduces the additional
memcpy on OpenXR side.
- Export host pointer from Blender side from a mappable buffer.
- Replace host pointers with fd/dmabuf/winhandle
- Remove mappable buffer.
Ref #133718
Pull Request: https://projects.blender.org/blender/blender/pulls/133824
Followup to 9b70851d91.
Return buffers by value rather than creating an empty/uninitialized
buffer first, then initializing it in an extraction function. This generally
makes the code easier to follow. And avoiding these half-created buffers
is an essential step to adding some sort of more global cache.
Pull Request: https://projects.blender.org/blender/blender/pulls/136570
python3.dll was installed for blender, but not next to the
python binary, leading to issues with subprocesses. Given
it's only a small dll the duplication isn't that big of a deal.
No functional changes.
This patch adds unit tests for the animation baking code in `anim_utils.py`.
It is by no means exhaustive but it is a start to figure out what this function
is actually doing.
With the usage of the legacy Python API I was worried things might not work as
expected, but all added tests pass.
Also, the tests document the current behavior without any attempt to declare
that behavior as good or correct.
Pull Request: https://projects.blender.org/blender/blender/pulls/135583
Add support for using the Stash (to NLA) and Push Down operators on
empty Actions. In the past years, the NLA has seen stability updates
that ensure strips are at least a single frame long, and with that even
pushing down an empty Action will create a visible (albeit tiny) NLA
strip. There doesn't seem to be a practical reason to disallow this any
more.
Pull Request: https://projects.blender.org/blender/blender/pulls/136604
It is possible to un-assign the action slot from an NLA strip. If then
you enter tweak mode on it and insert keys, a new slot is created on the
Action (so far so good). However, exiting tweak mode did not assign that
slot to the NLA strip, deactivating the animation. This is now solved.
The slot assignment is done when exiting tweak mode because that's
when the whole "sync from assigned Action back to the NLA strip"
happens. Also, things like syncing the strip length are done at
tweak-exit, so that seemed like the right place to me to do this too.
Pull Request: https://projects.blender.org/blender/blender/pulls/136601
Update all F-Curves so they have the correct flags (`FCURVE_INT_VALUES`,
`FCURVE_DISCRETE_VALUES`) for the RNA property type that they animate.
The bug that caused these flags to be incorrect (#136347) is already
fixed. This commit ensures that F-Curves that were created while the bug
was in a Blender release are updated to ensure they have the correct
flags.
This is quite important to fix, as otherwise enum properties will
actually be interpolated. Imagine the "fun" when a rig is going
through all the intermediate rotation modes when it was intended to
switch from "Quaternion" to "ZYX".
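To illustrate why the flags matter, here is a small standalone sketch. It is
not Blender's evaluation code: the `evaluate` function is hypothetical, the
`discrete` argument merely stands in for a flag like
`FCURVE_DISCRETE_VALUES`, linear interpolation stands in for whatever
interpolation the keys use, and the integer values are arbitrary
placeholders rather than real rotation mode identifiers.

```python
def evaluate(keys, frame, discrete):
    """Evaluate a two-key curve at `frame`.

    keys:     ((frame_a, value_a), (frame_b, value_b)) with frame_a < frame_b
    discrete: assumed stand-in for a "discrete values" flag on the curve
    """
    (frame_a, value_a), (frame_b, value_b) = keys
    if frame <= frame_a:
        return value_a
    if frame >= frame_b:
        return value_b
    if discrete:
        # Discrete properties hold the previous key's value.
        return value_a
    # Without the flag, intermediate values are produced.
    t = (frame - frame_a) / (frame_b - frame_a)
    return value_a + t * (value_b - value_a)


# Hypothetical enum animated from value 0 at frame 1 to value 5 at frame 11.
keys = ((1, 0), (11, 5))
without_flag = evaluate(keys, 6, discrete=False)
with_flag = evaluate(keys, 6, discrete=True)
```

Without the flag, frame 6 evaluates to a value halfway between the two enum
items, which is exactly the kind of intermediate state described above.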
Even before this commit, these flags were already recomputed on key
insertion (at least the ones through the UI). The versioning code simply
runs this update on all existing F-Curves.
Since this may have some performance impact (doing an RNA path resolve
on all F-Curves on all Actions), the versioning code is only run when
the blend file is from 4.4 or newer, as the bug was introduced in that
release.
Pull Request: https://projects.blender.org/blender/blender/pulls/136512
The issue is caused by the fact that when both compositors are used,
`fftwf_plan_dft_r2c_2d` can end up being called in parallel, which is
only thread-safe if `fftwf_make_planner_thread_safe` is called before.
This is done by `fftw::initialize_float`, but only if the FFTW threading
support library is available. Said library was not detected correctly on
Windows because of a typo, which this change addresses. This should also
make the fog glow faster on Windows because it'll now use multithreaded
FFT as intended.
This change also moves the call to `initialize_float` to the main
function because the FFTW functions it calls are not thread-safe and
because FFTW is also used by Audaspace, which cannot call it.
Pull Request: https://projects.blender.org/blender/blender/pulls/136557
The Glare node currently has a Maximum Highlights input, which has a
special value of 0.0, where the maximum is implicitly set to infinity,
that is, no suppression of highlights happens at that special value. Such
special values are hard to discover and make sliders non-continuous.
To fix this, we introduce a new panel toggle input called Suppress
Highlights, which the user can enable and then control the maximum value.
This also has the advantage that the Maximum value is clearer, since
it is now under a more clearly named panel.
This is now possible since the introduction of boolean sockets and node
panel toggle inputs.
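The difference between the two designs can be sketched like this. The
function names are hypothetical, and clamping with `min` is only an assumed
stand-in for however the node actually suppresses highlights.

```python
import math


def effective_maximum_old(maximum_highlights):
    # Old design: 0.0 is a special value meaning "no limit".
    return math.inf if maximum_highlights == 0.0 else maximum_highlights


def effective_maximum_new(suppress_highlights, maximum):
    # New design: an explicit boolean toggle decides whether a limit applies,
    # so every value of `maximum` means what it says.
    return maximum if suppress_highlights else math.inf


def suppress(value, limit):
    # Assumed suppression model for this sketch: a simple clamp.
    return min(value, limit)
```

With the toggle, `suppress(5.0, effective_maximum_new(True, 2.0))` gives
`2.0`, while turning the toggle off leaves the value untouched regardless of
the slider position.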
Pull Request: https://projects.blender.org/blender/blender/pulls/136309
Apply transform behavior for empties is now consistent.
Applying scale now always changes the empty size to keep the apparent size.
Previously, the empty size was only changed when applying scale alone.
Pull Request: https://projects.blender.org/blender/blender/pulls/136534
Remove arrow from "Operating system" pointing to "Other Directories".
The other directories are for context and aren't called into.
Also correct a typo.
The `motionSampleTime` argument to `create_object` has been unused since
the dawn of time, and it's not expected to be used in the future either.
Remove the clutter.
Pull Request: https://projects.blender.org/blender/blender/pulls/136587
Both values are unused.
* `current_r` is only ever set and never read from.
* `previous_r` is only ever read from and is never set. Because it is
always empty, it is never unioned with the current `rcti`.
Pull Request: https://projects.blender.org/blender/blender/pulls/136586
This situation is unlikely to happen in practice, as it would require
there to be either no elements in the mesh, or every average translation
to be a 0-length vector.
Pull Request: https://projects.blender.org/blender/blender/pulls/136572
Even though the doc-string notes that they're only used for
function parameters, it looks as if they might be used for
`wmEvent::modifier` and are exposed in a prominent location.
Remove the flags & replace them with a macro that bit-shifts the
existing modifier values which is more clearly intended to be used
with `KeyMapItem_Params`.
Ref !136539
This commit gives users of the Cycles performance benchmark tool the
option to run performance benchmarks with OSL enabled for CPUs
and OptiX devices.
This can be done by adding `-OSL` to the device name:
`CPU-OSL`
`OPTIX-OSL_0`
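As an illustration of the naming scheme, the sketch below splits such a
device name into its parts. This is not the benchmark tool's parser:
`parse_device_name` is a hypothetical helper, and treating the trailing
`_<number>` as a device index is an assumption based on the example above.
It also assumes any underscore in the name is followed by an integer.

```python
def parse_device_name(name):
    """Split a device name into (device_type, use_osl, index).

    Assumes an optional trailing "_<integer>" index and an optional
    "-OSL" suffix on the device type, as in the examples above.
    """
    index = None
    if "_" in name:
        name, index_text = name.rsplit("_", 1)
        index = int(index_text)
    use_osl = name.endswith("-OSL")
    if use_osl:
        name = name[: -len("-OSL")]
    return name, use_osl, index
```

Under these assumptions, `CPU-OSL` maps to `("CPU", True, None)` and
`OPTIX-OSL_0` maps to `("OPTIX", True, 0)`.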
Pull Request: https://projects.blender.org/blender/blender/pulls/136506
Instead of returning 0 in case the Intel extension for getting the count
of Execution Units isn't available, we now use
sycl::info::device::max_compute_units.
We keep using the Intel extension in priority since it logically goes
with sycl::ext::intel::info::device::gpu_hw_threads_per_eu used in
get_max_num_threads_per_multiprocessor(), for which there is no
sycl::info::device::max_threads_per_compute_unit replacement yet.
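The lookup order described above amounts to a simple fallback. The sketch
below models the device as a plain dictionary; the keys
`intel_gpu_eu_count` and `max_compute_units` and the function name are
hypothetical placeholders for the corresponding SYCL queries, not real API
calls.

```python
def get_num_multiprocessors(device_info):
    """Prefer the Intel extension value, else fall back to the generic one."""
    eu_count = device_info.get("intel_gpu_eu_count")
    if eu_count is not None:
        # Extension available: keep using it in priority.
        return eu_count
    # Previously this case returned 0; now the generic query is used.
    return device_info["max_compute_units"]
```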
The debug set of Embree prebuilt libraries currently lacks SYCL support
while the release ones have it.
This case was not gracefully handled for debug builds with Embree on GPU
enabled, leading to linking errors, trying to resolve rtcNewSYCLDevice
and rtcIsSYCLDeviceSupported.
We now test for this case to explicitly disable the use of Embree on GPU
for debug builds on Windows and print this status from CMake.
Set the flags on the image buffer when loading an EXR file, so they can be
used when saving.
This also removes IB_halffloat and replaces it by the file options flag.
Pull Request: https://projects.blender.org/blender/blender/pulls/135656
I'm moving this for two (related) reasons:
* It depends a lot on the specifics of `Camera` and `Object` data-blocks.
* It links `Object::object_to_world()` which is not an inline function and thus
easily leads to linker errors. It mostly seems like luck that this is not
breaking our build due to early dead code elimination when linking binaries
which use the blenlib static library such as `msgfmt`.
I found this while working on a compilation tool which would not be as lucky and
has a linker error because of the dependence on `Object::object_to_world`.
Pull Request: https://projects.blender.org/blender/blender/pulls/136547
If you scale down the color pickers to very small sizes, the calculation
of the handle size of the value slider causes it to proportionally
increase as everything else approaches zero. This PR calculates the size in
a better way and clamps it as well, for both the round and square
versions.
Pull Request: https://projects.blender.org/blender/blender/pulls/136566
This fixes the following warning with MSVC:
device_impl.cpp(287): warning C4805: '|=': unsafe mix of type 'bool' and type 'ccl::uint' in operation
A similar fix is applied to the Metal code as well.
There is no short-circuiting boolean operator ||=, so expand the expression.
Pull Request: https://projects.blender.org/blender/blender/pulls/136561
Texture nodes are already supported through the GPUMaterial and
multi-function compositor abstractions. So we just need to expose them
through the add menu.
This patch makes the compositor fall back to using the
order of the inputs to infer the domain priority if no domain priority
is specified. This is more robust since some nodes do not declare their
domain priorities and indirectly rely on the order of insertions in some
containers and thus might fail in the future.
We opt for this as opposed to requiring all nodes to declare their
priorities for code brevity.
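A minimal sketch of the fallback, under stated assumptions: inputs are
modeled as `(name, declared_priority)` pairs in declaration order, `None`
means no priority was declared, and a lower number means a higher priority.
The function names are hypothetical and do not correspond to the
compositor's actual API.

```python
def infer_domain_priorities(inputs):
    """Use the declared priority, else fall back to the input's index."""
    return [
        priority if priority is not None else index
        for index, (_, priority) in enumerate(inputs)
    ]


def highest_priority_input(inputs):
    """Name of the input with the highest (numerically lowest) priority."""
    priorities = infer_domain_priorities(inputs)
    best = min(range(len(inputs)), key=lambda i: priorities[i])
    return inputs[best][0]


undeclared = [("Image", None), ("Mask", None)]
declared = [("Factor", 1), ("Image", 0)]
```

With no declarations, the first input wins purely by position; with explicit
priorities, the declaration order no longer matters.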
This patch adds initial support for implicit inputs in pixel operations.
This is currently a non-functional change but will be used in the
future to support implicit inputs in texture nodes and similar.
This works by exposing extra inputs to the pixel operation for each of the
supported implicit input types if needed, and linking those inputs to
instances of the ImplicitInputOperation operation.
Only a single implicit input exists for now, and we do not differentiate
between implicit input types. In order to do that, we need to
refactor how input declarations for implicit inputs happen, since they
are currently tied to Geometry Nodes specifically.
HIP-RT functions do have access to kg, and it was used inconsistently:
some functions were passed the actual kg, others were passed nullptr.
This change makes it consistent and passes kg everywhere.
Pull Request: https://projects.blender.org/blender/blender/pulls/136503