When rendering in the viewport (or probably on instanced objects, but I didn't
test that), emissive objects whose scale is negative give the wrong value on the
"backfacing" input when multiple sampling is enabled.
The underlying problem was a corner case in how normal transformation is handled,
which is generally a bit messy.
From what I can tell, the pattern appears to be:
- If you first transform vertices to world space and then compute the normal from
them (as triangle light sampling, MNEE and light tree do), you need to flip
whenever the transform has negative scale regardless of whether the transform
has been applied
- If you compute the normal in object space and then transform it to world space
(as the regular shader_setup_from_ray path does), you only need to flip if the
transform was already applied and was negative
- If you get the normal from a local intersection result (as bevel and SSS do),
you only need to flip if the transform was already applied and was negative
- If you get the normal from vertex normals, you don't need to do anything since
the host-side code does the flip for you (arguably it'd be more consistent to
do this in the kernel as well, but meh, not worth the potential slowdown)
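
The rules above can be condensed into a small decision function. This is only
an illustrative sketch: `NormalSource`, `needs_normal_flip` and its parameters
are hypothetical names, not identifiers from the Cycles kernel.

```cpp
// Hypothetical illustration of the pattern described above, not kernel code.
enum class NormalSource {
  WorldSpaceVertices,  // normal computed from vertices already in world space
  ObjectSpaceNormal,   // normal computed in object space, then transformed
  LocalIntersection,   // normal taken from a local intersection result
  VertexNormals,       // vertex normals, already flipped by host-side code
};

// Returns true if the normal has to be negated.
// `negative_scale`: the object transform has negative scale.
// `transform_applied`: the transform was already applied to the geometry.
constexpr bool needs_normal_flip(NormalSource source,
                                 bool negative_scale,
                                 bool transform_applied)
{
  switch (source) {
    case NormalSource::WorldSpaceVertices:
      return negative_scale;
    case NormalSource::ObjectSpaceNormal:
    case NormalSource::LocalIntersection:
      return negative_scale && transform_applied;
    case NormalSource::VertexNormals:
      return false;
  }
  return false;
}
```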
So, this patch fixes the logic in the triangle emission code.
Also, it turns out that the MNEE code had the same problem and also misbehaved
in the viewport on negative-scale objects. This is fixed now as well.
Differential Revision: https://developer.blender.org/D16952
- Avoid calling node interface items "sockets"
- Use "active" instead of "current" to be more correct
- Avoid using the same word in description and name
- A couple grammar fixes
- Matrix normalize overloads need to have the vector normalize redefined.
- Double underscores (anywhere in a symbol name) are reserved.
- Some operations yield different results due to float imprecision. Increasing
epsilon threshold for the failing tests.
Incorrect offset was calculated when strip was implicitly retimed (movie
FPS does not match scene FPS). This is because strip playback rate was
not used for offset calculation at all.
Since the hold offset specifies the number of frames to skip, but at the frame
rate of the source, this could result in a gap when splitting the strip.
If that occurs, the gap is compensated for by moving the handle to the frame
where the strip is split.
Happens, for example, when the object has animation and is disabled for
render, and an animation render is performed.
The regression has been uncovered by f12f7800c2 which made it so
the dependency graph relies on runtime visibility tracking and
updates (without updating relations).
The optimization from a while ago in ff60dd8b18 got in the way
of the visibility updates because it removed relations between two
no-op nodes which belong to different IDs, which makes the visibility
tracking impossible.
This change makes it so only relations which belong to the same
component are removed. This matches the expectations of the visibility
tracking (which, actually, also needed to happen at the moment of the
initial optimization commit). Technically, this change could introduce
some performance regression, but with the current design of the
graph it is not really avoidable.
The idea to gain the best performance is to separate relations which
actually define the execution flow, and which are only needed to
define things like visibility dependencies.
When drawing using the option `Outline`, the resulting stroke
did not use the Vertex Color option and was always converted
using the material.
Now the vertex color option is used.
The crash only occurred when a texture was stored in a gray scale GPU
texture and was scaled down to fit inside the given limitation.
In this case the original number of pixels was packed into the
GPU buffer, not taking into account the scaled down image. This
resulted in a buffer overflow.
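
As a rough illustration of the mismatch, the sketch below sizes the upload
from the scaled-down dimensions instead of the original ones. All names
(`ImageSize`, `clamp_to_limit`, `gray_scale_value_count`) and the per-axis
clamping policy are assumptions made for this example, not the actual code.

```cpp
#include <algorithm>
#include <cstddef>

// Hypothetical types and names, for illustration only.
struct ImageSize {
  int width;
  int height;
};

// Assumed scaling policy: clamp each axis to the size limit.
inline ImageSize clamp_to_limit(ImageSize size, int max_size)
{
  return {std::min(size.width, max_size), std::min(size.height, max_size)};
}

// A gray scale texture stores one value per pixel, so the number of values
// to pack has to come from the size that is actually uploaded.
inline std::size_t gray_scale_value_count(ImageSize uploaded_size)
{
  return static_cast<std::size_t>(uploaded_size.width) *
         static_cast<std::size_t>(uploaded_size.height);
}
```

Packing `gray_scale_value_count(original)` values into a buffer allocated for
the clamped size would write past its end, which is the kind of overflow
described above.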
The bug is caused by rBb66b3f547c43e841a7d5da0ecb2c911628339f56.
From what I can see, that fix was intended to enable manual lens shift for
panorama cameras, but it appears that it also unintentionally applies
interocular shift.
This fix disables the multiview shift for panorama cameras, that way manual lens
shift still works but we get the 2.x behavior for stereoscopic renders back.
Differential Revision: https://developer.blender.org/D16950
The code that computes and inverts the shutter CDF had some issues that caused
the result to be asymmetric. This tweaks it to be more robust and produce
symmetric outputs for symmetric inputs.
At the first bounce, the diffuse/glossy/transmission weights are stored so that
contributions along the path can be split into the d/g/t indirect passes.
However, volume bounces always set the weight even at indirect bounces, so
even paths that had their first bounce on a purely glossy object would suddenly
start counting towards the diffuse indirect pass after a secondary volume bounce.
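
A minimal sketch of the intended behavior, assuming the weights should only
ever be written at the first bounce. `PassWeights`, `PathState` and
`record_bounce` are hypothetical names and do not mirror the Cycles
integrator state.

```cpp
// Hypothetical stand-ins for the per-path state, for illustration only.
struct PassWeights {
  float diffuse;
  float glossy;
  float transmission;
};

struct PathState {
  int bounce = 0;
  PassWeights first_bounce_weights{0.0f, 0.0f, 0.0f};
};

// Store the d/g/t weights only at the first bounce. Later bounces, including
// volume bounces, must leave them untouched so that the path keeps
// contributing to the passes chosen at its first bounce.
inline void record_bounce(PathState &state, const PassWeights &weights)
{
  if (state.bounce == 0) {
    state.first_bounce_weights = weights;
  }
  state.bounce++;
}
```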
Do not clear all the font's glyph caches with single-step zoom
operators if the area does not change font size when doing so.
See D16785 for more details.
Differential Revision: https://developer.blender.org/D16785
Reviewed by Campbell Barton
During geometry nodes evaluation some sockets can be determined
to be unused, for example based on the condition input in a switch node.
Once a socket is determined to be unused, that information has to be
propagated backwards through the tree to free any memory that may
have been reserved for those sockets already. This already happened before
this commit, but in a less ideal way.
Determining that sockets are unused early is good because it helps with
memory reuse and avoids copy-on-write copies caused by shared data.
Now, nodes that are scheduled because an output became unused have
priority over nodes scheduled for other reasons.
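
One simple way to picture this prioritization is a scheduler with two queues,
where the queue for "output became unused" work is always drained first.
`NodeScheduler` and its members are hypothetical and much simpler than the
actual geometry nodes evaluator.

```cpp
#include <deque>
#include <optional>

// Hypothetical two-level scheduler, for illustration only.
class NodeScheduler {
 public:
  // Schedule a node, remembering why it was scheduled.
  void schedule(int node_id, bool output_became_unused)
  {
    if (output_became_unused) {
      unused_propagation_.push_back(node_id);
    }
    else {
      regular_.push_back(node_id);
    }
  }

  // Nodes scheduled because an output became unused are returned first.
  std::optional<int> pop_next()
  {
    std::deque<int> &queue = !unused_propagation_.empty() ? unused_propagation_ : regular_;
    if (queue.empty()) {
      return std::nullopt;
    }
    const int node_id = queue.front();
    queue.pop_front();
    return node_id;
  }

 private:
  std::deque<int> unused_propagation_;
  std::deque<int> regular_;
};
```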
This allows auto-vectorization to happen when a multi-function is
evaluated in "materialized" mode, i.e. it is processed in chunks where
all input and output values are stored in contiguous arrays.
It also unifies the handling of input, mutable and output parameters a bit.
Now they can all use temporary buffers in the same way.
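
The general idea of chunked, "materialized" evaluation can be sketched as
follows: gather the selected elements into small contiguous temporary
buffers, run a tight loop over them, then scatter the results back. The
names and the chunk size are assumptions for this example, not the
multi-function implementation.

```cpp
#include <algorithm>
#include <array>
#include <cstddef>
#include <vector>

// Hypothetical chunk size, kept small for demonstration.
constexpr std::size_t chunk_size = 4;

// Tight loop over contiguous arrays. Loops of this shape are what compilers
// can usually unroll and vectorize.
inline void add_contiguous(const float *a, const float *b, float *out, std::size_t n)
{
  for (std::size_t i = 0; i < n; i++) {
    out[i] = a[i] + b[i];
  }
}

// Evaluate `a + b` only at `indices`, one chunk at a time.
inline void add_materialized(const std::vector<float> &a,
                             const std::vector<float> &b,
                             const std::vector<std::size_t> &indices,
                             std::vector<float> &out)
{
  std::array<float, chunk_size> a_buffer, b_buffer, out_buffer;
  for (std::size_t start = 0; start < indices.size(); start += chunk_size) {
    const std::size_t n = std::min(chunk_size, indices.size() - start);
    // Gather inputs into contiguous temporary buffers.
    for (std::size_t i = 0; i < n; i++) {
      a_buffer[i] = a[indices[start + i]];
      b_buffer[i] = b[indices[start + i]];
    }
    add_contiguous(a_buffer.data(), b_buffer.data(), out_buffer.data(), n);
    // Scatter the results back to their original positions.
    for (std::size_t i = 0; i < n; i++) {
      out[indices[start + i]] = out_buffer[i];
    }
  }
}
```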
Staging texture update copied over the entire texture, rather than just the region of the texture which had been updated. Also added early-exit for cases where the net texture update extent was zero, as this was causing validation failures.
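
A CPU-side sketch of the same idea, assuming a single-channel float texture
stored row by row: only the updated rectangle is copied, and an empty extent
returns early. `Region` and `copy_region` are hypothetical and unrelated to
the Metal API.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical update rectangle in texels.
struct Region {
  int x;
  int y;
  int width;
  int height;
};

// Copy only `region` from `src` to `dst`. Both hold a single-channel texture
// of the same size with `texture_width` texels per row.
// Returns the number of texels copied.
inline std::size_t copy_region(const std::vector<float> &src,
                               std::vector<float> &dst,
                               int texture_width,
                               Region region)
{
  // Early exit when the net update extent is zero.
  if (region.width <= 0 || region.height <= 0) {
    return 0;
  }
  for (int row = 0; row < region.height; row++) {
    const std::size_t offset = static_cast<std::size_t>(region.y + row) *
                                   static_cast<std::size_t>(texture_width) +
                               static_cast<std::size_t>(region.x);
    std::copy_n(src.data() + offset, region.width, dst.data() + offset);
  }
  return static_cast<std::size_t>(region.width) * static_cast<std::size_t>(region.height);
}
```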
Authored by Apple: Michael Parkin-White
Ref T103658
Ref T96261
Reviewed By: fclem
Maniphest Tasks: T103658, T96261
Differential Revision: https://developer.blender.org/D16924
The first binding of a framebuffer led to an incorrect SRGB conversion state being applied, as attachments, where presence of SRGB is determined, were processed after the SRGB check rather than before.
This DIFF also cleans up SRGB naming conventions and caching of fallback non-srgb texture view, for use when SRGB mode is disabled.
Authored by Apple: Michael Parkin-White
Ref T103399
Ref T96261
Reviewed By: fclem
Maniphest Tasks: T103399, T96261
Differential Revision: https://developer.blender.org/D16907
This simplifies the code enough so that MSVC is able to unroll and
vectorize some multi-functions like simple addition.
The performance improvements are almost as good as the GCC
improvements shown in D16942 (for add and multiply at least).
The required texture byte size for compacted data types was calculated incorrectly, resulting in an erroneous format conversion taking place instead of direct data upload.
Metal dummy buffer size also temporarily increased to address problematic cases where the bound buffer was too small for missing UBOs.
Authored by Apple: Michael Parkin-White
Ref T96261
Reviewed By: fclem
Maniphest Tasks: T96261
Differential Revision: https://developer.blender.org/D16904
This was caused by rB0d73d5c1a2, which releases the scene mutex during kernel
loading. However, the reset mutex was still held, which can cause a deadlock
if another thread tries to reset the session, since it will acquire the
released scene mutex and then wait for the reset mutex.
Turns out there's no point in keeping the reset mutex locked after the delayed
reset section, so now we just release it earlier, which resolves the deadlock.
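
The locking pattern can be sketched with standard mutexes. This is a
simplified, hypothetical model (`Session`, `run_step` and the callback are
assumptions), not the actual Cycles session code: the reset mutex is released
right after the delayed reset section, so that nothing is held while the
scene mutex is temporarily released for kernel loading.

```cpp
#include <mutex>

// Hypothetical, simplified model of the session locking.
struct Session {
  std::mutex scene_mutex;
  std::mutex reset_mutex;

  template<typename LoadKernels> void run_step(LoadKernels &&load_kernels)
  {
    std::unique_lock<std::mutex> scene_lock(scene_mutex);
    std::unique_lock<std::mutex> reset_lock(reset_mutex);

    // ... delayed reset section would go here ...

    // Release the reset mutex as soon as the delayed reset is handled.
    reset_lock.unlock();

    // The scene mutex is released during kernel loading. Another thread can
    // now take the scene mutex and then the reset mutex without blocking.
    scene_lock.unlock();
    load_kernels();
    scene_lock.lock();

    // ... continue with the scene mutex held ...
  }
};
```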
This mainly helps GCC catch up with Clang in terms of field evaluation
performance in some cases. In some cases this patch can speed up
field evaluation 2-3x (e.g. when there are many float math nodes).
See D16942 for a more detailed benchmark.
This moves all multi-function related code in the `functions` module
into a new `multi_function` namespace. This is similar to how there
is a `lazy_function` namespace.
The main benefit of this is that many type names that were prefixed
with `MF` (for "multi function") can be simplified.
There is also a common shorthand for the `multi_function` namespace: `mf`.
This is also similar to lazy-functions where the shortened namespace
is called `lf`.
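
As a sketch of the naming scheme, a namespace alias lets callers write
`mf::Signature` where a prefixed name like `MFSignature` was needed before.
The enclosing `blender::fn` namespaces and the empty placeholder types below
are assumptions for this example.

```cpp
#include <type_traits>

// Placeholder types; the exact namespace nesting is an assumption.
namespace blender::fn::multi_function {
class Signature {};
class ParamType {};
}  // namespace blender::fn::multi_function

namespace blender {
// Shorthand for the longer namespace name.
namespace mf = fn::multi_function;
}  // namespace blender

// Both spellings name the same type.
static_assert(std::is_same_v<blender::mf::Signature, blender::fn::multi_function::Signature>);
```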
* `depends_on_context` was not used for a long time already.
* `param_data_indices` is not used since rB42b88c008861b6.
* The remaining data is moved to a single `Vector` to avoid
having to do two allocations when the signature size becomes
larger than what fits into the inline buffer.
This avoids a move of the signature after building it. The value had
to be moved out of `MFSignatureBuilder` in the `build` method.
This also makes the naming a bit less confusing where sometimes
both the `MFSignature` and `MFSignatureBuilder` were referred
to as "signature".
* New `build_mf` namespace for the multi-function builders.
* The type name of the created multi-functions is now "private",
i.e. the caller has to use `auto`. This has the benefit that the
implementation can change more freely without affecting
the caller.
* `CustomMF` does not use `std::function` internally anymore.
This reduces some overhead during code generation and at
run-time.
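
The general technique behind the last few points can be sketched as a class
template that stores the callable by its concrete type instead of wrapping it
in `std::function`, with a builder whose return type callers capture with
`auto`. `ElementWiseFunction` and `build_element_wise` are hypothetical names
and not the `build_mf` API.

```cpp
#include <cstddef>
#include <utility>

// Hypothetical element-wise function that stores the callable directly, so
// the call in the loop can be inlined by the compiler.
template<typename ElementFn> class ElementWiseFunction {
 public:
  explicit ElementWiseFunction(ElementFn fn) : fn_(std::move(fn)) {}

  void call(const float *in, float *out, std::size_t n) const
  {
    for (std::size_t i = 0; i < n; i++) {
      out[i] = fn_(in[i]);
    }
  }

 private:
  ElementFn fn_;
};

// The concrete return type depends on the callable, so callers use `auto`.
template<typename ElementFn> auto build_element_wise(ElementFn fn)
{
  return ElementWiseFunction<ElementFn>(std::move(fn));
}
```

A caller would write `auto doubled = build_element_wise([](float x) { return x * 2.0f; });`
and never spell out the resulting type.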
* `CustomMF` now supports single-mutable parameters.
This refactors how devirtualization is done in general and how
multi-functions use it.
* The old `Devirtualizer` class has been removed in favor of a simpler
solution. It is also more general in the sense that it is not coupled
with `IndexMask` and `VArray`. Instead there is a function that has
inputs which control how different types are devirtualized. The
new implementation is currently less general with regard to the number
of parameters it supports. This can be changed in the future, but
does not seem necessary now and would make the code less obvious.
* Devirtualizers for different types are now defined in their respective
headers.
* The multi-function builder works with the `GVArray` stored in `MFParams`
directly now, instead of first converting it to a `VArray<T>`. This reduces
some constant overhead, which makes the multi-function slightly
faster. This is only noticeable when very few elements are processed though.
No functional changes or performance regressions are expected.
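
To illustrate what devirtualization means here in general terms, the sketch
below dispatches once on how the values are stored and then runs a loop that
is instantiated for a concrete accessor type. `VirtualFloats`,
`SingleAccessor`, `SpanAccessor` and `devirtualize` are hypothetical
stand-ins and much simpler than `VArray` or the actual devirtualization
function.

```cpp
#include <cstddef>

// Concrete accessor types the inner loop gets instantiated for.
struct SingleAccessor {
  float value;
  float operator[](std::size_t /*index*/) const { return value; }
};

struct SpanAccessor {
  const float *data;
  float operator[](std::size_t index) const { return data[index]; }
};

// Hypothetical stand-in for an array whose storage is only known at run-time.
struct VirtualFloats {
  bool is_single;
  float single_value;
  const float *span_data;
};

// Decide the storage kind once, outside the loop, and pass a concrete
// accessor to `fn`.
template<typename Fn> void devirtualize(const VirtualFloats &values, Fn &&fn)
{
  if (values.is_single) {
    fn(SingleAccessor{values.single_value});
  }
  else {
    fn(SpanAccessor{values.span_data});
  }
}

// Example user: the generic lambda is compiled separately per accessor type.
inline void add_one(const VirtualFloats &values, float *out, std::size_t n)
{
  devirtualize(values, [&](auto accessor) {
    for (std::size_t i = 0; i < n; i++) {
      out[i] = accessor[i] + 1.0f;
    }
  });
}
```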