142 Commits

Author SHA1 Message Date
Clément Foucault
8b74741b9e GWN: Perf: Bypass glUseProgram(0)
I left a flag to quickly debug if something is wrong.
But now that everything uses shader, it seems to be alright since a shader
is always set active before drawing.
2018-03-30 23:27:45 +02:00
Clément Foucault
8568d38f1b GWN: Add GWN_vertbuf_vertex_count_set.
This allows us to specify the number of vertices to upload to the GPU.
This keeps the same allocation in system memory but sends the least
amount of data to the GPU/driver.
2018-03-30 20:09:26 +02:00
Clément Foucault
c48b6fae9a GWN: Add immVertex4f. 2018-03-29 21:32:26 +02:00
Clément Foucault
3c48a21833 GWN: Batch: Add GWN_batch_uniform_4fv_array 2018-03-29 14:22:50 +02:00
Germano
75c6119dc9 Fix: GWN Indexbuf creation was replacing the index buff bound to the last VAO
This led to problems such as the drawing of the navigate manipulator.
More details in the code comments.
2018-03-21 11:55:38 -03:00
Clément Foucault
b5bf3011bf GWN: Vertex Buffer: Remove the use of glMapBufferRange
We revert to the malloc/realloc and manually manage the upload.
There seems to be a performance penalty from using glMapBuffer on some
hardware; the preferred way is glBufferData(NULL) followed by glBufferSubData.
2018-03-19 16:13:00 +01:00
Clément Foucault
772e558e92 GWN: Perf: Use unsync glMapBufferRange to prevent sync time. 2018-03-19 14:14:32 +01:00
Clément Foucault
f2ae7796c3 GWN: Context: Use <unordered_set> instead of <forward_list>
We cannot have duplicates so unordered_set is better suited for this case.

Removing batches is now constant time on average instead of linear.
2018-03-19 14:14:32 +01:00
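The complexity argument above can be illustrated with a minimal sketch. It is not the Gawain code: `Batch`, `Context`, and their member names are hypothetical stand-ins, and only the `std::unordered_set` calls are real library API.

```cpp
#include <cstddef>
#include <unordered_set>

// Hypothetical stand-in for a batch; only its address matters here.
struct Batch { int id; };

// Hypothetical context that tracks the batches registered with it.
class Context {
public:
    // Returns false if the batch was already registered (a set has no duplicates).
    bool add_batch(Batch* batch) { return batches_.insert(batch).second; }

    // Average O(1): the pointer is hashed instead of walking a list.
    bool remove_batch(Batch* batch) { return batches_.erase(batch) == 1; }

    std::size_t batch_count() const { return batches_.size(); }

private:
    std::unordered_set<Batch*> batches_;
};
```

With a `std::forward_list`, the same removal would have to scan the list for the matching pointer, which is linear in the number of batches.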
Clément Foucault
c2f36c3558 GWN: Element Buffer: Refactor / Optimisation.
- Upload the data to the GPU directly when creating the element buffer in
  GWN_indexbuf_build_in_place().

- Convert data in place when squeezing the indices, removing the need
  for another allocation.

- GWN_indexbuf_build_in_place() can be used on element buffers that are
  already in use, reuploading their data without changing the VBO id
  (keeping VAOs up to date).
2018-03-17 18:23:04 +01:00
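As an illustration of converting indices in place, here is a minimal sketch that narrows 32-bit indices to 16-bit ones inside the same storage. The function name and the byte-vector representation are assumptions made for this example and do not come from the commit.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Narrows `count` 32-bit indices stored at the start of `bytes` to 16-bit
// indices in the same storage and returns the new size in bytes.
// Assumed precondition: every index fits in 16 bits.
std::size_t squeeze_indices_in_place(std::vector<unsigned char>& bytes, std::size_t count)
{
    for (std::size_t i = 0; i < count; ++i) {
        std::uint32_t wide;
        std::memcpy(&wide, bytes.data() + i * sizeof(wide), sizeof(wide));
        const std::uint16_t narrow = static_cast<std::uint16_t>(wide);
        // The write ends at byte 2*i + 2, which never passes the start of the
        // next unread element at byte 4*(i + 1), so nothing unread is clobbered.
        std::memcpy(bytes.data() + i * sizeof(narrow), &narrow, sizeof(narrow));
    }
    bytes.resize(count * sizeof(std::uint16_t));  // shrink only; no second buffer
    return bytes.size();
}
```

Walking forward is what makes this safe: each narrow write lands in bytes that have already been read.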
Clément Foucault
87d88581aa GWN: Vertex Buffer refactor.
We now alloc a vbo id on creation and let OpenGL manage its memory directly.
We use glMapBuffer to get this memory location.

This enables us to reuse and modify any vertex buffer directly without
destroying it with its associated Batches.

This commit does not really improve performance but will let us implement
more optimizations in the future.

We can also resize the buffer even if this can be slow if we need to keep
the existing data.

The addition of the usage hint makes dynamic buffers not a special case
anymore, simplifying things a bit.
2018-03-17 17:02:07 +01:00
Clément Foucault
7e954d974a GWN: Uncomment a (now) useful assert 2018-03-16 08:50:31 +01:00
Germano
cca1e1b707 UV/Image Editor: Optimize UV Drawing
Use batches to store the entire buffer of loops before drawing.
These batches can be stored in the mesh draw cache later.
2018-03-15 13:36:16 -03:00
Clément Foucault
a769cae241 GWN: Fix compilation error without VRAM_USAGE flag. 2018-03-15 13:58:25 +01:00
Clément Foucault
6fa4001824 GWN: Batch: Perf: Comment out glBindVertexArray(0)
Even if they are there for safety, they are not free to use!

On my system (Mesa + AMD Vega GPU) calling:
glBindVertexArray(1);
glDrawArrays(GL_TRIANGLES, 0, 3);
glBindVertexArray(0);
in a loop, shows the same overhead as a full vao switching (which is more
or less 10 times slower than just calling glDrawArrays)

Moreover, now that we use OpenGL 3.3, binding a VAO is REQUIRED to issue a
drawcall, so it is guaranteed to be overwritten before the next drawcall.
A problem can only occur if someone draws directly with OpenGL commands.
2018-03-14 22:44:27 +01:00
Clément Foucault
75de653e4d GWN: Batch: Only revert to default Vao when needed.
Drawing ranges via glDrawArrays is already supported and should not need
a manual offset in the VAO like glDrawArraysInstanced or glDrawElements.
2018-03-14 22:44:27 +01:00
Clément Foucault
4ecc8b6786 GWN: Add primitive restart in element/index buffers.
This allows drawing multiple primitives of the types
GWN_PRIM_LINE_STRIP
GWN_PRIM_LINE_LOOP
GWN_PRIM_TRI_STRIP
GWN_PRIM_TRI_FAN
GWN_PRIM_LINE_STRIP_ADJ
with only one drawcall. This should speed up some areas that are really
sensitive to drawcall counts: UV drawing, hair drawing...
2018-03-14 22:44:27 +01:00
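To show how a restart index lets several strips share one drawcall, here is a minimal CPU-side sketch of building such an index list. `kRestartIndex` and `build_strip_indices` are hypothetical names, and the restart value (the largest 32-bit index) is an assumption for this example.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Reserved index that tells the GPU to start a new primitive
// (assumed value for this sketch: the largest 32-bit index).
constexpr std::uint32_t kRestartIndex = 0xFFFFFFFFu;

// Builds one index list for several strips whose vertices are stored back to
// back in the vertex buffer. `strip_lengths[i]` is the vertex count of strip i.
std::vector<std::uint32_t> build_strip_indices(const std::vector<std::uint32_t>& strip_lengths)
{
    std::vector<std::uint32_t> indices;
    std::uint32_t next_vertex = 0;
    for (std::size_t s = 0; s < strip_lengths.size(); ++s) {
        if (s != 0) {
            indices.push_back(kRestartIndex);  // separates consecutive strips
        }
        for (std::uint32_t v = 0; v < strip_lengths[s]; ++v) {
            indices.push_back(next_vertex++);
        }
    }
    return indices;
}
```

In plain OpenGL, the matching state would be set with `glEnable(GL_PRIMITIVE_RESTART)` and `glPrimitiveRestartIndex(kRestartIndex)`, after which a single `glDrawElements` call draws all the strips.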
Clément Foucault
dfd8a52cd2 DRW: Change clip planes API.
The draw manager now just sets the number of active clip planes. It's now up to the engine to specify the plane equations as uniform/ubo/constant.
2018-03-10 02:18:25 +01:00
Clément Foucault
d63829117c DRW: Refactor simple instancing.
Instead of creating a new instancing shading group without attrib, we now have instancing calls. The benefit is that they can be culled.
They can be used in conjunction with the standard and generate calls, but the shader must support it (which is generally not the case).
We store a pointer to the actual count so that the number can be tweaked between redraws.

This will make multi-layer rendering more efficient.
2018-03-02 18:35:59 +01:00
Clément Foucault
be284c82d4 GWN: Query builtin uniform at shader creation.
This avoids having non null entries in shaderface->builtin_uniforms and a redundant check.
2018-02-27 14:50:16 +01:00
Joshua Leung
3d7235fc87 MSVC 2013 Compile Fix/Workaround for "static thread_local" vars
Apparently MSVC 2013 has trouble with stuff that's been declared
"static thread_local" (and/or maybe even the "thread_local" keyword).

https://stackoverflow.com/questions/29399494/what-is-the-current-state-of-support-for-thread-local-across-platforms
2018-02-27 11:23:22 +01:00
Clément Foucault
e94276d403 GWN: Fix glitches when closing a window. 2018-02-26 20:09:54 +01:00
Clément Foucault
f4cc9ba4c3 V3D: Vertex selection: Fix opengl error. 2018-02-26 20:07:39 +01:00
Clément Foucault
241c90c92d DRW/GWN: Bypass glUseProgram.
Turns out to be the call that was destroying performance.

I get 18ms->6ms improvement of drawing time with 10 000 unique objects.

And we can still improve upon this!
2018-02-25 17:59:46 +01:00
Clément Foucault
e7c4a9d1ef GWN: Fix immediate mode when closing a window. 2018-02-22 19:49:59 +01:00
Clément Foucault
5aff002f7b GWN: Context: Fix allocation/codestyle and crash on startup. 2018-02-22 14:31:40 +01:00
Clément Foucault
cc05b661f7 GWN: Fix use after free crash.
This is not an ideal solution, but Blender's freeing system is already quite tangled.
So tracking and clearing VAO caches when destroying contexts does prevent bad behaviour.
2018-02-22 12:39:57 +01:00
Germano
04964ff1f4 GWN: Fix compilation on windows 2018-02-21 18:58:29 -03:00
Clément Foucault
7be1928ea1 Gawain: VertexFormat: Cleanup
Reorganize struct elements by size, rename a constant.
2018-02-21 15:28:26 +01:00
Clément Foucault
c5eba46d7f Gawain: Refactor: VAOs caching AND use new VAOs manager.
A major bottleneck of the current implementation is the call to create_bindings() for basically every drawcall.
This is due to the VAO being tagged dirty when assigning a new shader to the Batch, defeating the purpose of the Batch (reuse it for drawing).

Since managing hundreds of batches in DrawManager and DrawCache seems not fun enough to me, I preferred rewriting the batches themselves.

--- Batch changes ---
For this to happen I needed to change the Instancing to be part of the Batch rather than being another batch supplied at drawtime.
The Gwn_VertBuffers are copied from the batch to be instantiated and a new Gwn_VertBuffer is supplied for instancing attribs.
This means a VAO can be generated and cached for this instancing case.

A Batch can be rendered with instancing, without instancing attribs and without the need for a new VAO using the GWN_batch_draw_range_ex with the force_instance parameter set to true.

--- Draw manager changes ---
The downside of this approach is that we must track the validity of the instanced batch (the original one). For this, the only way (I could think of) is to set a callback for when the batch is being freed.
This means a bit of refactoring in the DrawManager with the separation of batching and instancing Batches.

--- VAO cache ---
Each VAO is generated for a given ShaderInterface. This means we can keep it alive as long as the shader interface lives.
If a ShaderInterface is discarded, it needs to destroy every VAO associated with it. Otherwise, a new ShaderInterface with the same address could be generated and reuse the same VAO with incorrect bindings.
The VAO cache itself uses a mix of a static array of VAOs and a dynamic array if there is not enough space in the static one.
Using this hybrid approach is a bit more performant than the dynamic array alone.
The array will not resize down but empty entries will be filled up again. It's unlikely we get a buffer overflow from this. Resizing could be done on next allocation if needed.
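The hybrid cache described above could look roughly like the following sketch. It is not the Gawain implementation: `VaoCache`, `kStaticCount`, and the member names are hypothetical, and the static size of 3 is chosen only to keep the example small.

```cpp
#include <array>
#include <cstddef>
#include <vector>

struct ShaderInterface {};  // hypothetical stand-in; only the address is used

class VaoCache {
    struct Entry { const ShaderInterface* iface; unsigned vao; };

public:
    static constexpr std::size_t kStaticCount = 3;  // hypothetical size

    // Returns the cached VAO id for `iface`, or 0 if none is cached.
    unsigned lookup(const ShaderInterface* iface) const {
        for (const Entry& e : static_) if (e.iface == iface) return e.vao;
        for (const Entry& e : dynamic_) if (e.iface == iface) return e.vao;
        return 0;
    }

    // Fills the first empty slot; grows the dynamic part only when both are full.
    void insert(const ShaderInterface* iface, unsigned vao) {
        for (Entry& e : static_) if (!e.iface) { e = {iface, vao}; return; }
        for (Entry& e : dynamic_) if (!e.iface) { e = {iface, vao}; return; }
        dynamic_.push_back({iface, vao});
    }

    // Called when `iface` is discarded, so that a later interface allocated at
    // the same address cannot pick up a stale VAO. Slots are emptied, not erased.
    void remove(const ShaderInterface* iface) {
        for (Entry& e : static_) if (e.iface == iface) e = {};
        for (Entry& e : dynamic_) if (e.iface == iface) e = {};
    }

    std::size_t overflow_size() const { return dynamic_.size(); }

private:
    std::array<Entry, kStaticCount> static_{};  // zero-initialised: all slots empty
    std::vector<Entry> dynamic_;
};
```

As in the description, the dynamic part never shrinks in this sketch; emptied slots are simply reused by later insertions.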

--- Results ---
Using Cached VAOs means that we are not querying each vertex attrib for each vbo for each drawcall, every redraw!
In a CPU limited test scene (10000 cubes in Clay engine) I get a reduction of CPU drawing time from ~20ms to 13ms.

The only area that is not caching VAOs is the instancing from particles (see comment DRW_shgroup_instance_batch).
2018-02-21 15:28:26 +01:00
Clément Foucault
1b3f9ecd0d Gawain: Add new context/vao manager.
This allows allocation of VAOs from different OpenGL contexts and threads as long as the drawing happens in the same context.

Allocation is thread safe as long as we abide by the "one opengl context per thread" rule.

We can still free from any thread and actual freeing will occur at new vao allocation or next context binding.
2018-02-21 15:28:26 +01:00
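The "free from any thread, delete later" behaviour can be sketched as a mutex-protected orphan list. This is only an illustration under assumptions: `OrphanList`, `defer_free`, and `flush` are hypothetical names, and the delete callback marks where a real implementation would call `glDeleteVertexArrays` on the thread that owns the context.

```cpp
#include <cstddef>
#include <mutex>
#include <vector>

// Hypothetical per-context list of VAO ids waiting to be deleted.
class OrphanList {
public:
    // May be called from any thread: only records the id.
    void defer_free(unsigned vao_id) {
        std::lock_guard<std::mutex> lock(mutex_);
        orphans_.push_back(vao_id);
    }

    // Meant to be called on the thread that owns the context, for example at
    // the next VAO allocation or context bind. Returns the number of ids released.
    template <typename DeleteFn>
    std::size_t flush(DeleteFn delete_fn) {
        std::vector<unsigned> pending;
        {
            std::lock_guard<std::mutex> lock(mutex_);
            pending.swap(orphans_);  // take ownership, keep the lock short
        }
        for (unsigned id : pending) {
            delete_fn(id);
        }
        return pending.size();
    }

private:
    std::mutex mutex_;
    std::vector<unsigned> orphans_;
};
```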
Clément Foucault
a24be95b0f GWN: Fix ubo debug printf 2018-02-15 19:16:08 +01:00
Clément Foucault
e401e2d89c GWN: Fix attrib arrays giving incorrect name depending on the platform.
It seems that some OpenGL implementations return "[0]" after array names but others don't.

Remove the "[0]" so everything is consistent.
2018-02-15 19:16:08 +01:00
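The normalisation described here can be sketched as a small string helper. The function name is hypothetical, and the commit does not show how the actual code performs the removal.

```cpp
#include <string>

// Returns `name` without a trailing "[0]", if present, so that array
// attributes get the same name whatever the driver reports.
std::string strip_array_suffix(const std::string& name)
{
    static const std::string suffix = "[0]";
    if (name.size() >= suffix.size() &&
        name.compare(name.size() - suffix.size(), suffix.size(), suffix) == 0) {
        return name.substr(0, name.size() - suffix.size());
    }
    return name;
}
```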
Clément Foucault
ab7e7a005b GWN: Add new dynamic type of batches and remove
These batches keep their memory chunk allocated after transfer so that it can be reused and updated very often.

NOTE: This commit breaks instancing in DRW. (It's fixed in the next commit.)
2018-02-14 18:59:42 +01:00
Clément Foucault
1e9ef2a25e GWN: Add GWN_batch_draw_procedural
This allows drawing large numbers of primitives without any memory footprint.
2018-02-14 18:59:42 +01:00
Clément Foucault
0f3bc636c8 GWN: Allow drawing instances without batch_instancing 2018-02-14 18:59:42 +01:00
Clément Foucault
01244df007 DRW: Refactor: Make use of the new Gawain long attrib support. 2018-02-14 18:59:42 +01:00
Clément Foucault
df86e9cab5 GWN: Extend support for multiple of 4 components in batches. 2018-02-14 18:59:42 +01:00
Clément Foucault
27a7174546 GWN: Fix style and line of code that does nothing! 2018-02-14 18:59:41 +01:00
Clément Foucault
a5afe13e1c GWN: Add support for 4x4 Matrices and instancing attributes.
Only support float matrices specifically for code simplicity.
2018-02-14 18:59:41 +01:00
Clément Foucault
35ac496dbd Gawain: Fix codestyle. 2018-01-09 15:37:00 +01:00
Clément Foucault
2237ee3ed7 Gawain: VBO: Add possibility to use external datablock.
Adds the possibility to specify the data buffer directly and to specify its ownership.
By not passing ownership to Gawain, the memory block can be reused.
2018-01-09 14:54:11 +01:00
Clément Foucault
b300fa4923 Gawain: Modify batch draw function to work with ranges.
This enables drawing only a selected range of the same VBO (useful for selection with instancing/batching).
2018-01-09 14:54:11 +01:00
Clément Foucault
ef1918d312 Gawain: Fix instancing messing next draw.
Everything was fine as long as a batch was always used with instancing. But problems arise if the next drawcall for this batch does not use instancing, as the attrib divisor stays set to 1 in the VAO.

As instancing is less used than normal drawing I prefer to reset the divisor after drawing as it is reset before drawing instances.
2017-10-11 02:15:42 +02:00
Clément Foucault
f7db1a4366 Gawain: Make common uniforms become builtins
This improves eevee's cache performance by 13% in my test.
2017-10-08 15:49:25 +02:00
Clément Foucault
a4a5637d7a Gawain: Reduce shader interface bucket size
Tried 101 but it gives collisions.
I think 257 is enough now that we don't have thousands of uniforms.
This gives some noticeable performance improvement.
Could be refined further.
2017-10-06 16:25:50 +02:00
Clément Foucault
f94f141f24 Gawain: Add UBOs to shader interface. 2017-10-06 16:25:50 +02:00
Clément Foucault
d7d32ad452 Gawain: Simplify / optimize the shader interface.
This changes quite a few things:
- Drops the allocation of inputs as a chunk.
- Merge the linked list system into the Gwn_ShaderInput.
- Put name buffer into another memory block, easily resizable.
- Use offset instead of char* to direct to input name.
- Add only requested uniforms dynamically to the Shader Interface.

This drops some minor optimisations and uses a bit more memory for small shaders (which are fixed count).
But this saves a lot of memory when using UBOs because the names and the Gwn_ShaderInput were alloc'ed for every UBO variable.
This also reduces the Shader Interface initial generation time.
The lookup time is left unchanged.
2017-10-06 01:50:51 +02:00
Clément Foucault
9ab3db11c7 Revert "Gawain: Optimize out extra level on top of ShaderInput"
This reverts commit 5514d2df1c.
2017-10-06 01:50:16 +02:00
Sergey Sharybin
5514d2df1c Gawain: Optimize out extra level on top of ShaderInput
This is an internal structure, and we don't put it in a list for anything other
than hash collision resolution. There is no need for a dedicated entry here, which
saves us an extra allocation and pointer dereference.
2017-10-05 18:38:23 +05:00
Sergey Sharybin
0b5bdc4265 Gawain: Make builtin uniform lookup to be O(1) 2017-10-05 16:19:14 +05:00