This uses Spherical Harmonics to store the indirect lighting and
distant lighting visibility.
We can then reuse this information for each closure, doing the scanning
only once, which divides the cost by 2 or 3 in many cases.
The storage cost is higher than with the previous method, so the
resolution scaling is split to be independent of raytracing.
The spatial filtering has been split into its own pass for performance
reasons. Upsampling now uses only 4 bilinearly interpolated samples
(instead of 9), with bilateral weights to avoid bleeding.
This also adds a missing dot product (which softens the lighting
around corners) and fixes the blocky artifacts seen at lower
resolutions.
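As an illustration, here is a minimal sketch of the 4-tap bilateral upsampling idea (scalar values; all names are assumptions, not the actual shader code):

```cpp
#include <array>

/* Minimal sketch: 4-tap bilateral upsampling. The bilinear weights come
 * from the sub-texel position; the bilateral weights compare the low-res
 * depth/normal against the full-res pixel so that lighting is not
 * blended across geometric discontinuities. */
float bilateral_upsample(const std::array<float, 4> &taps,
                         const std::array<float, 4> &bilinear_weights,
                         const std::array<float, 4> &bilateral_weights)
{
  float sum = 0.0f;
  float weight_sum = 0.0f;
  for (int i = 0; i < 4; i++) {
    const float w = bilinear_weights[i] * bilateral_weights[i];
    sum += taps[i] * w;
    weight_sum += w;
  }
  /* Fall back to the first tap if every bilateral weight rejected. */
  return (weight_sum > 0.0f) ? sum / weight_sum : taps[0];
}
```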
Pull Request: https://projects.blender.org/blender/blender/pulls/118924
This adds an approximation of phase function convolution
for the distant lighting captured inside probe volumes.
This is based on the SIGGRAPH publication by Bartlomiej Wronski,
"Volumetric Fog: Unified compute shader based solution to
atmospheric scattering".
The implementation is quite straightforward. However, the result isn't
as good as one might expect, since there is no self-shadowing from the
volumes themselves, so the lighting is still quite flat.
To fix this, we would have to add support for volumetrics inside
probe volume baking. But that approach would still be static,
so a more general solution is still to be found for dynamic
volumes like smoke simulations.
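For illustration, a minimal sketch of convolving an L1 spherical harmonic with a circularly symmetric phase function; using the anisotropy factor `g` directly as the band-1 scale is an assumption here, not necessarily what the shader does:

```cpp
/* Sketch: by the SH convolution theorem, convolving with a zonal
 * (circularly symmetric) kernel scales each band independently.
 * Band 0 (ambient) is unchanged for a normalized phase function;
 * band 1 (directional) is attenuated by the phase anisotropy. */
struct SphericalHarmonicL1 {
  float l0;    /* Band 0 coefficient. */
  float l1[3]; /* Band 1 coefficients. */
};

SphericalHarmonicL1 phase_convolve(SphericalHarmonicL1 sh, float g)
{
  /* Assumption: g (Henyey-Greenstein-style anisotropy) as the band-1
   * zonal scale; g = 0 gives isotropic scattering (flat lighting). */
  sh.l1[0] *= g;
  sh.l1[1] *= g;
  sh.l1[2] *= g;
  return sh;
}
```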
Pull Request: https://projects.blender.org/blender/blender/pulls/119479
Buttons for mesh symmetry on the toolbar, and buttons with the same
functionality located in other areas of the window, weren't updated
when clicked.
Pull Request: https://projects.blender.org/blender/blender/pulls/118508
This improves performance by **reducing** the number of threads used for tasks
that require high memory bandwidth.
This works because the underlying hardware has a certain maximum memory
bandwidth. If that is already used up by a few threads, any additional threads
wanting to use a lot of memory will just cause more contention, which actually
slows things down. By reducing the number of threads that can perform certain
tasks, the remaining threads are also not locked up doing work that they can't
do efficiently. It's best if there is enough scheduled work so that these
threads can run more compute-intensive tasks instead.
To use this new functionality, one has to put the parallel code in question into
a `threading::memory_bandwidth_bound_task(...)` block. Additionally, one has
to provide a (very) rough approximation of how many bytes are accessed. If
the number is low, the number of threads shouldn't be reduced, because it's
likely that all touched memory fits in the L3 cache, which generally has a much
higher bandwidth than main memory.
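A sketch of how calling code might use this, assuming the byte estimate is passed as the first argument next to the task body (the `fill_positions` example is made up for illustration):

```cpp
#include "BLI_math_vector_types.hh"
#include "BLI_span.hh"
#include "BLI_task.hh"

using namespace blender;

static void fill_positions(MutableSpan<float3> positions)
{
  /* (Very) rough approximation of the memory touched: one write per element. */
  const int64_t approximate_bytes = positions.size() * int64_t(sizeof(float3));
  threading::memory_bandwidth_bound_task(approximate_bytes, [&]() {
    /* Inside the block, only a limited number of threads (currently 8)
     * run this bandwidth-bound loop at the same time. */
    threading::parallel_for(positions.index_range(), 4096, [&](const IndexRange range) {
      for (const int64_t i : range) {
        positions[i] = float3(0.0f);
      }
    });
  });
}
```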
The exact number of threads that are allowed to do bandwidth bound tasks at the
same time is generally highly context and hardware dependent. It's also not
really possible to measure reliably because it depends on so many static and
dynamic factors. The thread count is now hardcoded to 8. It seems that this many
threads are easily capable of maxing out the bandwidth capacity.
With this technique I can measure surprisingly good performance improvements:
* Generating a 3000x3000 grid: 133ms -> 103ms.
* Generating a mesh line with 100'000'000 vertices: 212ms -> 189ms.
* Realize mesh instances resulting in ~27'000'000 vertices: 460ms -> 305ms.
In all of these cases, only 8 instead of 24 threads are used. The remaining
threads are idle in these cases, but they could do other work if available.
Pull Request: https://projects.blender.org/blender/blender/pulls/118939
This uses parallel reduction when doing the octahedral map re-mapping.
The goal is not speedup but the accuracy of the computation (temporal
stability), and to pave the way for sunlight extraction.
This weights each individual sample using the texel solid angle for correct
energy.
After optimization, the cost is not too high (1024px² octahedral map):
- new: 263µs remap + 12µs sum
- old: 75µs remap + 180µs irradiance update
We could optimize it more, but that feels unnecessary given that the first
two filter passes take 7ms and are a more pressing optimization target.
The old irradiance update was fast because it was using mip 2, which
was already pre-filtered and used far fewer pixels (which already yields a
temporally stable output).
This new implementation considers all pixels in LOD0, which will
allow more precise sunlight extraction.
This also comes with a cleanup of the update tagging.
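A minimal sketch of the solid-angle-weighted sum (serial here for clarity; `texel_solid_angle` is a hypothetical helper, and the real implementation runs as a parallel reduction on the GPU):

```cpp
/* Sketch: sum radiance over all LOD0 texels, weighting each texel by
 * the solid angle it covers, so the octahedral mapping's distortion
 * does not skew the energy. */
float irradiance_sum(const float *radiance,
                     int size,
                     float (*texel_solid_angle)(int x, int y, int size))
{
  float sum = 0.0f;
  for (int y = 0; y < size; y++) {
    for (int x = 0; x < size; x++) {
      sum += radiance[y * size + x] * texel_solid_angle(x, y, size);
    }
  }
  return sum;
}
```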
Pull Request: https://projects.blender.org/blender/blender/pulls/119537
This PR limits adding the RNA property `use_limit_to_segment`
to `ShapeType::Line`-based sculpt gesture operators, as the
property is inapplicable to the other two types.
Pull Request: https://projects.blender.org/blender/blender/pulls/119670
Simplifies/optimizes the "font" shader. It runs faster now too, but primarily
this is so that it loads/initializes faster.
* Instead of doing blur via individual bilinear samples (where each sample is 4
texel fetches), do raw texel fetches of the kernel footprint and compute the
final result by shifting the kernel weights according to the bilinear fraction.
For a 5x5 blur, this reduces the number of texel fetches from 64 down to 36.
* Instead of checking "is the texel inside the glyph box? if so, then fetch it",
first fetch it, and then set the result to zero if it was outside. This
simplifies the branching code flow in the compiled GPU shader.
* Avoid costly integer modulo/division for "unwrapping" the font texture. The
texture width is always a power-of-two size, so division/modulo can be replaced
by masking and a shift (see the sketch after this list). Set up uniforms to
contain the needed data.
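A minimal sketch of the power-of-two unwrap trick (the parameter names are assumptions):

```cpp
/* For a power-of-two texture width: width_mask = width - 1 and
 * width_shift = log2(width), both precomputed and passed as uniforms. */
void unwrap_texel(int index, int width_mask, int width_shift, int *x, int *y)
{
  *x = index & width_mask;   /* Same as index % width. */
  *y = index >> width_shift; /* Same as index / width. */
}
```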
### Fixes
* The 3x3 blur was not doing a 3x3 blur, due to a copy-pasta typo (one of the
sample offsets was repeated, and thus another sample offset was
missing).
* Blur towards the left/top edges of the glyphs had artifacts, because float->int
casting in GLSL rounds towards zero, but the code actually wanted to round
towards the floor (see the example below).
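A quick illustration of the difference (the cast truncates toward zero, which is wrong for negative offsets):

```cpp
#include <cmath>

/* float->int casts round toward zero; floor() rounds down. They differ
 * for negative values, which is what broke blur near left/top edges. */
const int truncated = int(-0.25f);           /* == 0 */
const int floored = int(std::floor(-0.25f)); /* == -1, the intended result */
```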
An image of how the blur has changed is in the PR.
### First time initialization
* Windows 10, NVIDIA RTX 3080Ti, OpenGL: 274.4ms -> 51.3ms
* macOS, Apple M1 Max, Metal: 456ms -> 289ms (this includes PSO creation
time).
### Shader performance/complexity
I only measured performance on macOS (M1 Max), by making BLF text that is
scaled up to cover most of the screen via Python. Using the Xcode Metal
profiler, drawing that text with a 5x5 shadow blur: 1.5ms -> 0.3ms.
More performance analysis details are in the PR.
Pull Request: https://projects.blender.org/blender/blender/pulls/119653
Don't assume the armature of the active object is what is displayed in the properties editor, in both C++ and Python code.
The object pointer was left out of some notifiers, as including it would mean only that object was changed, but an armature datablock can be shared by multiple objects.
Co-authored-by: Brecht Van Lommel <brecht@blender.org>
Pull Request: https://projects.blender.org/blender/blender/pulls/119663
This fix should only be committed to the blender-v4.1-release branch.
In Blender 4.2 the pos/nor buffers are separated, which doesn't lead
to drawing artifacts.
This reverts commit 02379f3200
This change reverts 14500953ed. That commit improved performance
but introduced the regression. The wireframe shader checks the normal
buffer to detect if attributes are being rendered. The VBO contains both
positions and normals.
In Blender 4.2 this VBO was separated (#116902), which solved the rendering.
It is too late and risky to add this separation to 4.1 at the last minute,
so we decided to revert the performance improvement, as the performance
issue had already existed for several years.
The performance improvement will still be in Blender 4.2, where it doesn't
cause these artifacts.
Pull Request: https://projects.blender.org/blender/blender/pulls/119656
Caused by a96f1208cc
For `SPACE_ACTION` (`SACTCONT_ACTION`), `actedit_get_context` will
return true, but `ANIM_animdata_context_getdata` also checks the
`bAnimContext` data [which in this case is the action], and so will **not**
return true.
As a result, drawing the background and the search was skipped; this
is now done again.
Pull Request: https://projects.blender.org/blender/blender/pulls/119621
For example, copying and moving a trivial type ends up being the same.
However, currently we generate the code for both cases independently
instead of reusing the same underlying function.
This reduces the size of the Blender binary from `218.548.896` to
`218.355.552` bytes for me. So it's a reduction of about 200kb.
It's probably possible to reduce this even more, but that's for another day.
The main tricky thing here is telling the compiler that a `const` from a
function parameter can be cast away for trivial types (see code comment).
Maybe there is a better way to do this while making sure the compiler
doesn't generate unnecessary code.
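As a rough sketch of the idea (names and signature are assumptions, not the actual code):

```cpp
#include <cstring>
#include <type_traits>

/* For trivial types, copy- and move-construction are both a plain byte
 * copy, so one function can back both operations. The copy path has a
 * const source, but since this shared implementation only reads from
 * `src`, casting the const away at the call site is safe here. */
template<typename T> void copy_or_move_construct_trivial(void *dst, void *src)
{
  static_assert(std::is_trivially_copyable_v<T>);
  std::memcpy(dst, src, sizeof(T));
}
```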
Pull Request: https://projects.blender.org/blender/blender/pulls/119601
For historical reasons, the `multi_input_socket_index` was actually reversed
(the largest index comes first). This patch renames it to `multi_input_sort_id`
and adds a comment. The new name makes the reversed ordering of the id less
confusing.
Pull Request: https://projects.blender.org/blender/blender/pulls/119652
This was caused by 745fd2a2cb.
The issue was that there was an attempt to call
`uncollapse_by_default` on a `LayerViewItem`,
which can't be collapsed (because it can't have any children).
This then triggered the assert.
The fix removes the call to `uncollapse_by_default` for
`LayerViewItem`.
It seems that the compiler on Windows struggled to
correctly assign the namespace of the `LocalMemArena`
destructor because the struct was defined locally
in a lambda.
This moves the struct definition out of the lambda.
Fix a crash when inserting a key with tweak mode enabled, but where
`AnimData::actstrip` was NULL.
The root cause of this is that two pointers in the `AnimData` struct
(`act_track` and `actstrip`) are expected to be set when NLA tweak mode
is enabled, BUT these are not exposed to RNA and thus invisible to the
library overrides system. As such, they are NULL when loading from disk,
while the `ADT_NLA_EDIT_ON` flag still indicates they are to be used.
Rather than adding a NULL pointer check (and having to add that in many
more places), I used this two-pronged approach:
- Extend the 'NLA tweakmode' override apply code to set the `act_track`
and `actstrip` pointers when they are incorrectly NULL. This is done
by looking up the track and strip by name.
- Add versioning code to exit out of tweak mode whenever the
`ADT_NLA_EDIT_ON` flag is set, but those two pointers are still NULL.
The last step was necessary with the example file attached to the bug
report, as that was saved with a buggy blender version. New saves work
just fine.
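A minimal sketch of the versioning step (the helper name is hypothetical; `AnimData` and `ADT_NLA_EDIT_ON` come from Blender's DNA):

```cpp
#include "DNA_anim_types.h"

/* Hypothetical versioning helper: if tweak mode is flagged but the
 * pointers it relies on were lost (e.g. by library overrides), exit
 * tweak mode instead of dereferencing NULL later on. */
static void version_nla_tweakmode_incomplete(AnimData *adt)
{
  if (adt == nullptr || (adt->flag & ADT_NLA_EDIT_ON) == 0) {
    return;
  }
  if (adt->act_track == nullptr || adt->actstrip == nullptr) {
    adt->flag &= ~ADT_NLA_EDIT_ON;
  }
}
```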
Pull Request: https://projects.blender.org/blender/blender/pulls/119632
This makes the edit mode drawing for the new curves data more similar
to the old edit mode. Specifically, it now draws the evaluated curves instead
of just a poly curve. Furthermore, it now draws Bezier handles as well as
a separate control curve for NURBS curves.
Pull Request: https://projects.blender.org/blender/blender/pulls/119053
Mistake in a96f1208cc.
When markers are present, an offset should be added to the height.
Earlier it was subtracted, like so: `v2d->tot.ymin -= MarkerMargin`, and that
worked because ymin is assigned like so: `v2d->tot.ymin = -height`.
Now that the marker margin is added to the height, it needs to be additive.
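Sketched as a hypothetical helper (simplified; not the literal code):

```cpp
/* Sketch: the marker margin is now folded into the height before ymin
 * is derived from it, so it must be added, not subtracted. */
static float view_total_height(float content_height, bool has_markers, float marker_margin)
{
  float height = content_height;
  if (has_markers) {
    height += marker_margin; /* Previously: v2d->tot.ymin -= MarkerMargin. */
  }
  return height; /* Later assigned as: v2d->tot.ymin = -height. */
}
```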
Pull Request: https://projects.blender.org/blender/blender/pulls/119647
Bring back the `INSERTKEY_XYZ_TO_RGB` enum item for the
`keyframe_insert()` function (it was removed in 30b0c5b225). This way
any Python code that targets Blender 4.x can safely pass this flag,
without having to check specific Blender versions.
Note that the flag is implemented as a no-op, as the behaviour change
introduced in 30b0c5b225 (just looking at the user preference) is still
retained. The purpose of this commit is simply to avoid the `ValueError`
exception that would otherwise be raised.
This should also fix Rigify report blender/blender-addons#105241.
Pull Request: https://projects.blender.org/blender/blender/pulls/119625
This became a bottleneck in one of the test files during playback.
A Grease Pencil object was using an Array modifier, which tags the triangle
caches to be invalidated. The fills were then recomputed on every frame, which
was slow (when single-threaded).
With this patch, the playback went from ~43fps to 60+fps.
Pull Request: https://projects.blender.org/blender/blender/pulls/119531
Add more flexibility to tooltip images by adding the ability to specify
whether one of two checkerboards is added, a border, premultiplied
blending, or recoloring.
Pull Request: https://projects.blender.org/blender/blender/pulls/119437
If the bevel "Harden Normals" option is on, custom normals will be
generated. In that case, the automatic sharp edge tagging based on the
angle shouldn't run. This PR extends the earlier fix for #116395 to
handle this case, and also extends the check beyond just the last
modifier, which didn't work for this test file since it has a collision
modifier at the end. That makes sense anyway, since what we really care
about is whether the evaluated mesh has custom normals or not.
Pull Request: https://projects.blender.org/blender/blender/pulls/119638