Add a `--profile-gpu` launch argument.
When set, it generates a profile in the Trace Event Format with CPU and
GPU metrics based on GPU debug scopes.
https://profilerpedia.markhansen.co.nz/formats/trace-event-format/
The profiles are best viewed at https://ui.perfetto.dev/
Notes:
- The profiler captures everything form app start to exit.
- Being JSON based the profiles can become relatively large, but they
compress very well.
- Only OpenGL profiling is supported for now, but the report formatting
code can be shared across backends.
Pull Request: https://projects.blender.org/blender/blender/pulls/133557
Previously we generally expected CustomData layers to have implicit
sharing info, but we didn't require it. This PR clarifies that we do
require layers with non-null data to have implicit sharing info. This
generally makes code simpler because we don't have to have a separate
code path for non-shared layers. For example, it makes the "totelem"
arguments for layer freeing functions unnecessary, since shared data
knows how to free itself. Those arguments are removed in this PR.
Pull Request: https://projects.blender.org/blender/blender/pulls/134578
All 2D vectors related to image transform code were changed to float2.
Previously, it was decided, that 4x4 matrix should be used for 2D
affine transform, but this is changed to 3x3 now.
Texture painting code did rely on `IMB_transform` with 4x4 matrix.
To avoid large changes, I have added function
`BLI_rctf_transform_calc_m3_pivot_min`.
Main motivation is cleaner code - ease of use of c++ API, and avoiding
returning values by arguments.
Pull Request: https://projects.blender.org/blender/blender/pulls/133692
Avoid calling `GPU_indexbuf_add_line_verts` and the triangle
version of that function. It's faster to avoid function calls and
just write to the data arrays directly. I did some very rough tests
and observed about a 10% improvement in runtime for the
entire index buffer creation process.
This is due to hardcoded color and subdivision value. Also scale and
offset properties stored in overlay stuct was not considered. Now
multiply the transform matrix with `grid_mat` to make use of these
properties.
Pull Request: https://projects.blender.org/blender/blender/pulls/134382
This patch removes the compositor texture pool implementation which
relies on the DRW texture pool, and replaces it with the new texture
pool implementation from the GPU module.
Since the GPU module texture pool does not rely on the global DST, we
can use it for both the viewport compositor engine and the GPU
compositor, so the virtual texture pool implementation is removed and
the GPU texture pool is used directly.
The viewport compositor engine does not need to reset the pool because
that is done by the draw manager. But the GPU compositor needs to reset
the pool every evaluation. The pool is deleted directly after rendering
using the render pipeline or through RE_FreeUnusedGPUResources for the
interactive compositor.
Pull Request: https://projects.blender.org/blender/blender/pulls/134437
In certain setups where passes are used in the viewport compositor,
blender will crash. This happens because passes may not be available
when the compositor first run but then become available in later runs.
Possibly because EEVEE is still compiling shaders. This is problematic
for the compositor because it caches the result of node tree compilation
for the specific data available, like passes, and the compositor does
not get informed when data becomes available like in the case of EEVEE
to invalidate the cached node tree compilation result.
Caching of node tree compilation was always a source of bugs but we
managed to workaround them in the past, so before we work on a fix for
this crash, we first evaluate the removal of caching to see if we can
live without it. Especially since a fix will be rather involved for the
release branch at this stage.
The time it takes to compile the node tree is:
- Small Tree (~10 nodes): 0.3ms.
- Medium Tree (~50 nodes): 0.6ms.
- Huge Tree (~300 nodes): 3ms.
The difference is not noticeable to the eye, probably since as the tree
becomes bigger, the evaluation time becomes more dominant, and small
trees are fast to compile.
It should be noted that we intended to remove caching in the future to
support things like lazy evaluation of node inputs, but we though a few
optimization needs to be done on the GPUMaterial compiler side to make
compilation faster, since it is the main bottleneck during compilation.
So considering this, I think it is acceptable to disable caching of node
tree compilations for the time being. I intend to optimize it such that
it always becomes less than 1ms, but we will have to delay that to 4.5.
Pull Request: https://projects.blender.org/blender/blender/pulls/134394
The root cause is still unknown. But this patch disables
rendering of objects that will produce no volumetric effect.
This does fix the issue reported.