Add fast image writing and reading variants for RT passes.
These variants do not perform range checking on values
and should only be used in cases where the written texel is
guaranteed to be in range. This eliminates additional
branching and simplifies shader logic.
Authored by Apple: Michael Parkin-White
Pull Request: https://projects.blender.org/blender/blender/pulls/121117
Add fast image writing and reading variants for render passes.
These variants do not perform range checking on values
and should only be used in cases where the written texel is
guaranteed to be in range. This eliminates additional
branching and simplifies shader logic.
Authored by Apple: Michael Parkin-White
Pull Request: https://projects.blender.org/blender/blender/pulls/121116
While having negative values in the data itself seems fine (at least
there is nothing in the design forbidding it, and it was also allowed in
GPv2 data), drawing code should only accept positive values, and clamp
negative ones to zero:
* It matches GPv2 behavior.
* Drawing code uses negative values as some sort of 'flag' for
rounded tips of strokes.
Note: This is a follow-up of !120840.
Co-authored-by: Falk David <falk@blender.org>
Pull Request: https://projects.blender.org/blender/blender/pulls/122173
The mesh triangulation data is stored in CPU memory with the same format
as the triangles GPU index buffer. Because of that we can skip creating a
temporary copied owned by the GPU API. One way to do that is to just
upload the data directly and avoid keeping a reference to it. However, we
can only upload GPU data from the main thread with OpenGL, so instead
reference the data and keep track of whether to free it.
When drawing a mesh with a single material and 1.8 million faces, this
change gives a 12-15% improvement in framerate, from about 32 to 37 FPS.
Part of #116901.
Pull Request: https://projects.blender.org/blender/blender/pulls/122175
The pixels size was not computed in the right space.
This is the easy fix as refactoring the directional
shadows is too risky/time consuming right now.
Fix#121641
This was caused by wrong assumption that `shadow_radius`
is also a radius for area light. To avoid the confusion,
use a new property for area light shadow scaling.
In `SyncModule::sync_curves/sync_point_cloud` the material variable
can be used after it has been freed. The cause is that cryptomatte
can request a new material, which might trigger a relocation of the
references material.
This PR fixes this by doing the shadow sync before the cryptomatte
sync.
Pull Request: https://projects.blender.org/blender/blender/pulls/121909
Part of #116901.
The only non-obvious part is changing from using the `MR_DATA_LOOP_NOR`
flag as a signal to calculate normals and store them in `MeshRenderData` to
a more explicit check that the normals buffer is requested. In the future hopefully
these dependencies will be refactored to be part of the task graph instead.
Move the responsibility of deciding whether to calculate face corner
normals closer to the buffers and iteration, in preparation for replacing
the `data_flag` and `iter_flag` abstractions from the extractors system.
Part of #116901. Main benefits to this change are reduced function call
overhead and a simpler hot loop when face corner normals are extracted.
Code duplication is also reduced through use of templates. The downside
is that the access patterns aren't clearly better for the BMesh case where
a lot of data is stored array-of-structs style.
Make better use of the lazy calculation by avoiding storing a reference
to the calculated normals in the mesh render data. This reduces memory
usage from 282 MB to 225 MB in a test file with about 2 million faces.
Extract mesh logic to a separate function, and reorder subdiv loose
geometry extraction function. This is mainly to make a future diff
clearer. Also rename a variable to use more typical naming.
This was happening for any surface behind the near plane.
The sample bias is pushing receivers along normals so it
was visible on lights near the clip plane.
Fix#121815
Refactor subtask of #121929
Simplified void create_index_grids(const PBVH_GPU_Args &args, bool do_coarse)
with 2 extra functions that account for 2 code path generating faces indices.
This increase code readability and reasoning about its behavior instead of having a single uber function.
Pull Request: https://projects.blender.org/blender/blender/pulls/122020
A continuation of #116901. This one doesn't have a performance impact
in my testing. It also adds a bit more code compared to main so it isn't
really obviously "better" like the previous refactors. However it does
get us closer to removing the "extractors" callback iteration loop
(`edit_data` is the only other enabled by default), and I'd argue that
the final code is easier to iterate on in the future since it's more
self-contained.
I made an effort to avoid storing restart indices in the index buffers.
Though this requires a bit more calculation on the CPU (particularly
because the hidden gaps in the IBO need to be compressed out), it
reduces overall CPU->GPU traffic and removes the need to strip the
restart indices on Metal.
Pull Request: https://projects.blender.org/blender/blender/pulls/122084
This was broken when introducing the new lod bias
system. This is not compatible with shadow cascade.
Add TODO to implement the bias on CPU when choosing
the Level of detail of the cascade.
The main change is avoid storage of redundant data in the subdivision
draw cache, mainly by replacing reverse lookup from subdivided edge to
coarse edge. This way loops are structured as iteration over coarse
edges instead of iteration over subdivided edges with optional behavior
for vertices with matching base mesh faces. With that inversion the
information in the draw cache is trivial (or duplicated from an array
in `MeshRenderData`), so it's all removed, except for the subdivided
loose edge positions. That array is also shrunk though, by not
duplicating positions in between each subdivided edge. Its calculation
is more efficient for the same reason too.
Overall, besides code simplification, the effect should be lower
overhead with loose edges with GPU subdivision. Admittedly this isn't
a very important use case, but it's part of a general refactor trying
to use better data oriented design in this area (#116901).
Pull Request: https://projects.blender.org/blender/blender/pulls/122071
We call the `shadow.end_sync()` another time when running
the baking pipeline so that the sun shadow maps can
use the correct camera setup and scene bounds.
This adds PDF computation for eval functions as
well as using PDF for spatial denoising.
This seems to not do have any impact but at least
it seems more sounds than bsdf / pdf. Also
computing pdf is cheaper than whole bsdf.
Pull Request: https://projects.blender.org/blender/blender/pulls/121858
The Viewport Compositor only operates on part of the render when doing
viewport rendering when in camera view. That's because the code wrongly
assumed camera offset even when doing viewport renders, which do not
exist in that case.