Overall 10% more performance on general UI drawing time.
This commit can introduce ordering problem on some elements.
In this case you need to flush the widget cache to ensure the element that
is going to be drawn is drawn on top of any widget base.
To flush the cache use UI_widgetbase_draw_cache_flush.
This is already done for BLF and Icons.
Replace the 12 iterations of UI_draw_roundbox_4fv with only one batch.
This mean less overdraw and less drawcalls.
I had to hack the opacity falloff curve manually to get approximatly the
same result as previous technique. I'm sure with a bit more brain power
somebody could find the perfect function.
Now use a list of preset batches with a function to add new ones to this
list.
This removes the need of new functions all over the place to reset/exit.
For this we use a new shader that gets it's data from a uniform array.
Vertex shader position the vertices using these data.
Using glUniform is way faster than using imm for that matter.
Like BLF rendering, UI icons are always (as far as I know) non occluded and
displayed above everything else. They also does not overlap with texts so
they can be batched at the same time.
Introduce a UI batch cache. For the moment it's only used by widgetbase so
leaving it interface_widgets.c. If it grows, it can have its own file.
Like all preset batches (batches used by UI context), vaos must be refreshed
each time a new window context is binded.
This still does 3 GWN_batch_draw in the worst cases but at least it does
not use the IMM api.
I will continue and batch the 3 calls together since we are really CPU
bound, so shader complexity does not really matters.
I cannot spot any difference on all the widgets I could test. I did not use
any unit tests so I cannot tell if there is really any defects.
This is not a complete rewrite but it adresses the top bottleneck found
after a profilling session.
Drawcall per window redraw on default layout:
- 4100+ without patch
- 1270 with patch
Theses drawcalls meant a lot of driver overhead since they each correspond
to one glMapBuffer which is slow.
This includes a few modification:
- The biggest one is call glActiveTexture before doing any call to
glBindTexture for rendering purpose (uniform value depends on it).
This is also better to know what's going on when rendering UI. So if
there is missing UI elements because of this commit look for this first.
This allows us to have "less calls" to glActiveTexture (I did not
measure the final count) and less checks inside GPU_texture.
- Remove use of GL_TEXTURE0 as a uniform value in a few places.
- Be more strict and use BLI_assert for bad usage of GPU_texture functions.
- Disable filtering for integer and stencil textures (not supported by
OGL specs).
- Replace bools inside GPUTexture by a bitflag supporting more options to
identify texture types.
- put render iterator in own scope
(would shadow it's own variable if used multiple times).
- enforce semicolon at end of iterator macros.
- no need to typedef one-off macro structs.
Now that the new 3D viewport draws to a multisample offscreen buffer, there is
no good reason anymore to create an entire multisample window and pay the
performance/memory cost for other regions that don't need it.
GL_MULTISAMPLE now only gets enabled for offscreen buffers, so we don't need
to check for it throughout the UI code anymore.
Differential Revision: https://developer.blender.org/D3062