Consistently use edge draw flag instead of original index to determine if an
edge should be drawn or not.
In GPU subdivision the edge original index was used for both edge optimal
display and selection mapping to coarse edges, but they are not the same.
Now match the CPU subdivision logic and use a separate edge draw flag VBO.
For cage display, match Blender 3.3 behavior more in showing/hiding of edges
in wireframe mode. That is edges without a mapping to an original edge are
always hidden when there is no distinct cage, and drawn otherwise. This is
not ideal for e.g. the bevel modifier where it will always show some edges on
corners despite all edges being hidden by the user. But we currently have
no good information to decide if these should be hidden or not, so err on
the side of showing too much as it did before.
Fie #103706: bevel modifier edges not drawn correctly
Fix#103700: optimal display can't be turned of with GPU subdivision
Fix wrong edge display with GPU subdivision preceded by other modifiers
Pull Request #105384
Intel GPUs exhibit a number of rendering artifacts.
The most substantial being incorrect resolve of reflections.
Splitting the reflections_resolve shader into two passes,
one for SSR and one for light probes ensures correct rendering
and optimal performance on this GPU.
Also resolves an artifact with ambient occlusion wherein
the pow(a, b) function causes excessive precision loss.
Using an alternative method for power calculation on these
platforms resolves the issues.
Authored by Apple: Michael Parkin-White
Ref T96261
Pull Request #105240
Metal backed requires HOST_READ texture usage flag
for irradiance grid. This was correctly in place for the
basic grid, but not for grid_prev.
Authored by Apple: Michael Parkin-White
Ref #96261
Pull Request #105312
Complex EEVEE nodegraphs, particularly those combining
multiple principledBSDF shader nodes have a tendancy
to require a large number of simultaneous live registers
due to function call depth. In some instances, this
causes substantial performance drop and corruption if
the stack gets too large.
To mitigate this, splitting calls to closure_eval such
that only a single individual closure is evaluated in each
call reduces the number of live registers required. This
is preferred over using compound closure evaluation
functions which require a large amount of in-flight data.
Note that this is generally not more optimal, if the stack
does not spill, as there is an increased instruction count.
The specific trade-off depends on the exact architecture
in question. Hence, this is limited to AMD GPUs.
Authored by Apple: Michael Parkin-White
Ref #96261
Pull Request #104985
This reverts commit 19222627c6.
Something went wrong here, seems like this commit merged the main branch
into the release branch, which should never be done.
This reverts commit 68181c2560.
I merged 3.6 into 3.5 by mistake. Basically I had a PR against main,
then changed it in the last minute to be against 3.5 via the
web-interface unaware that I shouldn't do it without updating the
patch.
Original Pull Request: #104889
Note that the node group has its sockets names
translated, while the built-in nodes don't.
So we need to use data_ for the built-in nodes names,
and the sockets of the created node groups.
Pull Request #104889
Avoid running out of attributes when multiple material slots use the same one.
Cleanup:
Removes the return value from drw_attributes_add_request since it shouldn't be modified afterward and it's never used.
Avoid making copies of DRW_AttributeRequest in drw_attributes_has_request.
Co-authored-by: Miguel Pozo <pragma37@gmail.com>
Pull Request #104709
overlay_uniform_color_clipped was inheriting from overlay_depth_only, which doesn't
make much sense.
I've changed it to inherit from overlay_uniform_color instead, which is consistent
with other \*\_clipped variants of shaders.
Pull Request #104761
Certain material node graphs can be very expensive to run. This feature aims to produce secondary GPUPass shaders within a GPUMaterial which provide optimal runtime performance. Such optimizations include baking constant data into the shader source directly, allowing the compiler to propogate constants and perform aggressive optimization upfront.
As optimizations can result in reduction of shader editor and animation interactivity, optimized pass generation and compilation is deferred until all outstanding compilations have completed. Optimization is also delayed util a material has remained unmodified for a set period of time, to reduce excessive compilation. The original variant of the material shader is kept to maintain interactivity.
Also adding a new concept to gpu::Shader allowing assignment of a parent shader from which a shader can pull PSO descriptors and any required metadata for asynchronous shader cache warming. This enables fully asynchronous shader optimization, without runtime hitching, while also reducing runtime hitching for standard materials, by using PSO descriptors from default materials, ahead of rendering.
Further shader graph optimizations are likely also possible with this architecture. Certain scenes, such as Wanderer benefit significantly. Viewport performance for this scene is 2-3x faster on Apple-silicon based GPUs.
Authored by Apple: Michael Parkin-White
Ref T96261
Pull Request #104536
This adds a new overlay for curves sculpt mode that displays the curves that the
user currently edits. Those may be different from the evaluated/original curves
when procedural deformations or child curves are used.
The overlay can clash with the evaluated curves when they are exactly on top of
each other. There is not much we can do about that currently. The user will have
to decide whether the overlay should be shown or not on a case-by-case basis.
Pull Request #104467
This renames `data` and `color` to `selection`. This is better because
it's actually what the corresponding buffers contain. Using this
more correct name makes sharing vertex buffers between different
gpu batches for different shaders easier.
In order to experiment with different storage types for `DRW_Attributes`
and for general cleanup (see #103343). Also move a curves header to C++.
Pull Request #104716
Documented all functions, adding use case and side effects.
Also replace the use of shortened argument name by more meaningful ones.
Renamed `GPU_batch_instbuf_add_ex` and `GPU_batch_vertbuf_add_ex` to remove
the `ex` suffix as they are the main version used (removed the few usage
of the other version).
Renamed `GPU_batch_draw_instanced` to `GPU_batch_draw_instance_range` and
make it consistent with `GPU_batch_draw_range`.
As described in #95966, replace the `ME_EDGEDRAW` flag with a bit
vector in mesh runtime data. Currently the the flag is only ever set
to false for the "optimal display" feature of the subdivision surface
modifier. When creating an "original" mesh in the main data-base,
the flag is always supposed to be true.
The bit vector is now created by the modifier only as necessary, and
is cleared for topology-changing operations. This fixes incorrect
interpolation of the flag as noted in #104376. Generally it isn't
possible to interpolate it through topology-changing operations.
After this, only the seam status needs to be removed from edges before
we can replace them with the generic `int2` type (or something similar)
and reduce memory usage by 1/3.
Related:
- 10131a6f62
- 145839aa42
In the future `BM_ELEM_DRAW` could be removed as well. Currently it is
used and aliased by other defines in some non-obvious ways though.
Pull Request #104417
This allow to bypass all cost associated with shadow mapping.
This can be useful in certain situation, such as opening a scene on a
lower end system or just to gain performance in some situation (lookdev).
The merge with master updated the code to use the new matrix API. This
introduce some regressions.
For sunlights make sure there is enough tilemaps in orthographic mode
to cover the depth range and fix the level offset in perspective.
Implements virtual shadow mapping for EEVEE-Next primary shadow solution.
This technique aims to deliver really high precision shadowing for many
lights while keeping a relatively low cost.
The technique works by splitting each shadows in tiles that are only
allocated & updated on demand by visible surfaces and volumes.
Local lights use cubemap projection with mipmap level of detail to adapt
the resolution to the receiver distance.
Sun lights use clipmap distribution or cascade distribution (depending on
which is better) for selecting the level of detail with the distance to
the camera.
Current maximum shadow precision for local light is about 1 pixel per 0.01
degrees.
For sun light, the maximum resolution is based on the camera far clip
distance which sets the most coarse clipmap.
## Limitation:
Alpha Blended surfaces might not get correct shadowing in some corner
casses. This is to be fixed in another commit.
While resolution is greatly increase, it is still finite. It is virtually
equivalent to one 8K shadow per shadow cube face and per clipmap level.
There is no filtering present for now.
## Parameters:
Shadow Pool Size: In bytes, amount of GPU memory to dedicate to the
shadow pool (is allocated per viewport).
Shadow Scaling: Scale the shadow resolution. Base resolution should
target subpixel accuracy (within the limitation of the technique).
Related to #93220
Related to #104472
Discard is not always treated as an explicit return and flow control can continue for required derivative calculations. This behaviour is different in Metal vs OpenGL. Adding return after discards ensures consistency in expectation as behaviour is well-defined.
Authored by Apple: Michael Parkin-White
Ref T96261
Reviewed By: fclem
Maniphest Tasks: T96261
Differential Revision: https://developer.blender.org/D17199
Straightforward port. I took the oportunity to remove some C vector
functions (ex: copy_v2_v2).
This makes some changes to DRWView to accomodate the alignement
requirements of the float4x4 type.