A major bottleneck of the current implementation is the call to create_bindings() for basically every drawcall.
This is due to the VAO being tagged dirty when assigning a new shader to the Batch, defeating the purpose of the Batch (reuse it for drawing).
Since managing hundreds of batches in DrawManager and DrawCache seems not fun enough to me, I preferred rewriting the Batch itself.
--- Batch changes ---
For this to happen I needed to change the Instancing to be part of the Batch rather than being another batch supplied at drawtime.
The Gwn_VertBuffers are copied from the batch to be instanced and a new Gwn_VertBuffer is supplied for instancing attribs.
This means a VAO can be generated and cached for this instancing case.
A Batch can be rendered with instancing, without instancing attribs and without the need for a new VAO using the GWN_batch_draw_range_ex with the force_instance parameter set to true.
--- Draw manager changes ---
The downside of this approach is that we must track the validity of the instanced batch (the original one). The only way I could think of is to set a callback for when the batch is freed.
This means a bit of refactoring in the DrawManager, with the separation of batching and instancing Batches.
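The free-callback idea can be sketched roughly as below. This is a self-contained illustration, not the actual Gwn API: `Batch`, `InstancingBatch`, `batch_callback_free_set` and `batch_discard` are hypothetical names made up for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a batch; not the real Gwn_Batch. */
typedef struct Batch {
  void (*free_callback)(struct Batch *batch, void *user_data);
  void *user_data;
} Batch;

static Batch *batch_create(void)
{
  Batch *batch = calloc(1, sizeof(Batch));
  assert(batch != NULL);
  return batch;
}

/* Register a function to be called just before the batch is freed. */
static void batch_callback_free_set(Batch *batch,
                                    void (*callback)(Batch *, void *),
                                    void *user_data)
{
  batch->free_callback = callback;
  batch->user_data = user_data;
}

/* Notify the listener (if any), then release the batch. */
static void batch_discard(Batch *batch)
{
  if (batch->free_callback != NULL) {
    batch->free_callback(batch, batch->user_data);
  }
  free(batch);
}

/* An instancing batch that remembers which batch it instances. */
typedef struct InstancingBatch {
  Batch *instanced; /* NULL once the original batch has been freed. */
} InstancingBatch;

/* Callback: drop the reference so the freed batch is never drawn. */
static void instancing_on_source_free(Batch *batch, void *user_data)
{
  InstancingBatch *inst = user_data;
  if (inst->instanced == batch) {
    inst->instanced = NULL;
  }
}
```

With this shape, the draw manager side only has to check `inst->instanced != NULL` before drawing, instead of scanning for dangling pointers.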
--- VAO cache ---
Each VAO is generated for a given ShaderInterface. This means we can keep it alive as long as the shader interface lives.
If a ShaderInterface is discarded, it needs to destroy every VAO associated with it. Otherwise, a new ShaderInterface with the same address could be generated and reuse the same VAO with incorrect bindings.
The VAO cache itself uses a mix of a static array of VAOs and a dynamic array, used if there is not enough space in the static one.
Using this hybrid approach is a bit more performant than the dynamic array alone.
The array will not resize down but empty entries will be filled up again. It's unlikely we get a buffer overflow from this. Resizing could be done on next allocation if needed.
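A minimal sketch of such a hybrid cache, assuming the shader interface address is the lookup key. The names (`VaoCache`, `vao_cache_insert`, ...), the static size of 3 and the growth policy are assumptions for illustration, not the real implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

#define VAO_STATIC_LEN 3 /* Assumed size; the commit message gives no number. */

typedef struct VaoEntry {
  const void *shader_iface; /* Key: shader interface address, NULL if the slot is empty. */
  unsigned int vao_id;      /* Stand-in for the GL object name. */
} VaoEntry;

typedef struct VaoCache {
  VaoEntry fixed[VAO_STATIC_LEN]; /* Static part, no allocation needed. */
  VaoEntry *dynamic;              /* Overflow part, grows but never shrinks. */
  size_t dynamic_len;
} VaoCache;

/* Return the first slot whose key matches (pass NULL to find an empty slot). */
static VaoEntry *vao_cache_find(VaoCache *cache, const void *shader_iface)
{
  for (size_t i = 0; i < VAO_STATIC_LEN; i++) {
    if (cache->fixed[i].shader_iface == shader_iface) {
      return &cache->fixed[i];
    }
  }
  for (size_t i = 0; i < cache->dynamic_len; i++) {
    if (cache->dynamic[i].shader_iface == shader_iface) {
      return &cache->dynamic[i];
    }
  }
  return NULL;
}

/* Return the cached VAO for this interface, or 0 if there is none. */
static unsigned int vao_cache_lookup(VaoCache *cache, const void *shader_iface)
{
  VaoEntry *entry = vao_cache_find(cache, shader_iface);
  return entry ? entry->vao_id : 0;
}

/* Store a VAO, refilling an empty slot before growing the dynamic array. */
static bool vao_cache_insert(VaoCache *cache, const void *shader_iface, unsigned int vao_id)
{
  assert(shader_iface != NULL);
  VaoEntry *entry = vao_cache_find(cache, NULL);
  if (entry == NULL) {
    size_t old_len = cache->dynamic_len;
    size_t new_len = old_len ? old_len * 2 : 4;
    VaoEntry *grown = realloc(cache->dynamic, new_len * sizeof(VaoEntry));
    if (grown == NULL) {
      return false;
    }
    for (size_t i = old_len; i < new_len; i++) {
      grown[i].shader_iface = NULL;
      grown[i].vao_id = 0;
    }
    cache->dynamic = grown;
    cache->dynamic_len = new_len;
    entry = &grown[old_len];
  }
  entry->shader_iface = shader_iface;
  entry->vao_id = vao_id;
  return true;
}

/* Called when a shader interface is discarded. Empties the slot and
 * returns the VAO id the caller should delete, or 0 if none was cached. */
static unsigned int vao_cache_remove(VaoCache *cache, const void *shader_iface)
{
  VaoEntry *entry = shader_iface ? vao_cache_find(cache, shader_iface) : NULL;
  if (entry == NULL) {
    return 0;
  }
  unsigned int vao_id = entry->vao_id;
  entry->shader_iface = NULL;
  entry->vao_id = 0;
  return vao_id;
}
```

Removing the entry when the interface dies is what prevents a later interface allocated at the same address from hitting a stale VAO.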
--- Results ---
Using Cached VAOs means that we are not querying each vertex attrib for each vbo for each drawcall, every redraw!
In a CPU limited test scene (10000 cubes in Clay engine) I get a reduction of CPU drawing time from ~20ms to 13ms.
The only area that is not caching VAOs is the instancing from particles (see comment DRW_shgroup_instance_batch).
This allows allocation of VAOs from different OpenGL contexts and threads as long as the drawing happens in the same context.
Allocation is thread safe as long as we abide by the "one opengl context per thread" rule.
We can still free from any thread; the actual freeing will occur at the next VAO allocation or the next context binding.
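The deferred freeing described above could look roughly like this. It is a sketch under assumptions: `Context`, `vao_free`, `vao_alloc` and `context_bind` are hypothetical names, the orphan list has a fixed capacity, and GL calls are replaced by counters so the example runs without a GL context.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

#define ORPHAN_MAX 64 /* Assumed fixed capacity to keep the sketch short. */

/* Hypothetical per-context state; not the real context struct. */
typedef struct Context {
  pthread_mutex_t orphans_mutex;
  unsigned int orphans[ORPHAN_MAX]; /* VAO ids waiting to be deleted. */
  size_t orphans_len;
  unsigned int next_id;  /* Stand-in for glGenVertexArrays. */
  size_t deleted_count;  /* Stand-in for glDeleteVertexArrays. */
} Context;

static void context_init(Context *ctx)
{
  pthread_mutex_init(&ctx->orphans_mutex, NULL);
  ctx->orphans_len = 0;
  ctx->next_id = 1;
  ctx->deleted_count = 0;
}

/* Delete everything queued by other threads.
 * Must run on the thread that has ctx bound. */
static void context_orphans_clear(Context *ctx)
{
  pthread_mutex_lock(&ctx->orphans_mutex);
  ctx->deleted_count += ctx->orphans_len;
  ctx->orphans_len = 0;
  pthread_mutex_unlock(&ctx->orphans_mutex);
}

/* Binding a context is one of the two points where orphans are flushed. */
static void context_bind(Context *ctx)
{
  context_orphans_clear(ctx);
}

/* Allocation is the other flush point. */
static unsigned int vao_alloc(Context *ctx)
{
  context_orphans_clear(ctx);
  return ctx->next_id++;
}

/* Free immediately if ctx is current on the calling thread,
 * otherwise queue the id for the owning thread to delete later. */
static void vao_free(Context *ctx, unsigned int vao_id, bool ctx_is_current)
{
  assert(vao_id != 0);
  if (ctx_is_current) {
    ctx->deleted_count++;
    return;
  }
  pthread_mutex_lock(&ctx->orphans_mutex);
  assert(ctx->orphans_len < ORPHAN_MAX);
  ctx->orphans[ctx->orphans_len++] = vao_id;
  pthread_mutex_unlock(&ctx->orphans_mutex);
}
```

Only the orphan list is shared between threads, so it is the only state behind the mutex. Everything else relies on the "one OpenGL context per thread" rule.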
We need to move the render result logic outside the render engine code.
It makes no sense for Eevee/Clay/... to have to re-implement the render result
creation logic. Besides, the original implementation really got it wrong by
ignoring the different render layers needed for the final render.
Finally, there is no need to re-create the logic for views. So this was also
fixed.
Note 1: This will still break if the depsgraph of the needed view layers is not
updated / created. We need to address this separately. For now if users want
to test this, just show each view layer in the viewport at least once.
Note 2: We are still getting the depsgraph from the scene, creating it if needed:
`BKE_scene_get_depsgraph(scene, view_layer, true);`. According to Sergey we need
to move the render depsgraph to the Render struct instead. I will do it
separately as well.
This is a regression in rB4f1c0a1, which only allowed cutting hair at the
second segment, while there is nothing wrong with cutting hair at the
first segment.
Don't use dm->get*Array for a DM you don't own. This call can allocate a temporary
CD layer, which is not thread safe at all.
Also removed hard-coded logic around the CDDM check. The new functions do the same
logic, but are more DM-type-independent.
We shouldn't mix image pool acquisition with and without a user provided;
the fact that image.c internally uses the last frame from the Image datablock
confuses the logic.
Optionally don't remap indices for objects.
Checking all objects' parents would reference a freed pointer
while freeing all objects.
In the case of dynamic topology there is no use in keeping track
of hook/vertex-parent indices.
Also disable this when creating meshes for undo storage
since adding an undo step shouldn't be modifying other objects.
Once the 'losing lib' issue is fixed (in the previous commit), we have a new issue:
this could lead to several copies of the same linked data-block in the
.blend file. Which is not good. At all.
So had to add a GHash-based check in the library reading code to ensure we
only load the same ID from the same lib once.
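The "load each ID only once per lib" check can be illustrated with a small set keyed on the (library, ID name) pair. This sketch uses a tiny fixed-size hash set as a stand-in for GHash. `SeenSet` and `seen_add` are hypothetical names, and the key layout is an assumption, not what the reading code actually stores.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define SEEN_CAPACITY 64 /* Fixed power of two; a real GHash grows on demand. */

typedef struct SeenKey {
  const char *lib_path; /* NULL marks an empty slot. */
  const char *id_name;
} SeenKey;

typedef struct SeenSet {
  SeenKey slots[SEEN_CAPACITY];
  size_t used;
} SeenSet;

/* djb2-style hash over both strings, with a separator step in between. */
static size_t seen_hash(const char *lib_path, const char *id_name)
{
  size_t h = 5381;
  for (const char *c = lib_path; *c; c++) {
    h = h * 33 + (unsigned char)*c;
  }
  h = h * 33;
  for (const char *c = id_name; *c; c++) {
    h = h * 33 + (unsigned char)*c;
  }
  return h;
}

/* Returns true the first time a (library, ID name) pair is seen,
 * false if the pair was already recorded (i.e. skip loading it again). */
static bool seen_add(SeenSet *set, const char *lib_path, const char *id_name)
{
  size_t i = seen_hash(lib_path, id_name) & (SEEN_CAPACITY - 1);
  /* Linear probing; one slot is always kept empty so the loop terminates. */
  while (set->slots[i].lib_path != NULL) {
    if (strcmp(set->slots[i].lib_path, lib_path) == 0 &&
        strcmp(set->slots[i].id_name, id_name) == 0) {
      return false;
    }
    i = (i + 1) & (SEEN_CAPACITY - 1);
  }
  assert(set->used < SEEN_CAPACITY - 1);
  set->slots[i].lib_path = lib_path;
  set->slots[i].id_name = id_name;
  set->used++;
  return true;
}
```

The same ID name under a different library counts as a different entry, which matches the "same ID from the same lib" wording above.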
The issue was that when the same lib was found several times in the loaded
.blend, we'd only keep the first occurrence. But since Blender expects
the next data-blocks to belong to the last found library, we could actually
be adding data-blocks assigned to copies of the duplicated lib to
another, totally unrelated lib.
Those data-blocks were then obviously not found when actually loading
libs content, and lost.
Note that this only fixes one part of the issue: the current code can
now generate several copies of the same linked data-block. Will fix in
another commit.
While the script should be using INVOKE_PREVIEW for operators in clip view,
window manager was lacking some switch statements.
Thanks Brecht for the review!