For this we use a new shader that gets it's data from a uniform array.
Vertex shader position the vertices using these data.
Using glUniform is way faster than using imm for that matter.
Like BLF rendering, UI icons are always (as far as I know) non occluded and
displayed above everything else. They also does not overlap with texts so
they can be batched at the same time.
Introduce a UI batch cache. For the moment it's only used by widgetbase so
leaving it interface_widgets.c. If it grows, it can have its own file.
Like all preset batches (batches used by UI context), vaos must be refreshed
each time a new window context is binded.
This still does 3 GWN_batch_draw in the worst cases but at least it does
not use the IMM api.
I will continue and batch the 3 calls together since we are really CPU
bound, so shader complexity does not really matters.
I cannot spot any difference on all the widgets I could test. I did not use
any unit tests so I cannot tell if there is really any defects.
This is not a complete rewrite but it adresses the top bottleneck found
after a profilling session.
Drawcall per window redraw on default layout:
- 4100+ without patch
- 1270 with patch
Theses drawcalls meant a lot of driver overhead since they each correspond
to one glMapBuffer which is slow.
This includes a few modification:
- The biggest one is call glActiveTexture before doing any call to
glBindTexture for rendering purpose (uniform value depends on it).
This is also better to know what's going on when rendering UI. So if
there is missing UI elements because of this commit look for this first.
This allows us to have "less calls" to glActiveTexture (I did not
measure the final count) and less checks inside GPU_texture.
- Remove use of GL_TEXTURE0 as a uniform value in a few places.
- Be more strict and use BLI_assert for bad usage of GPU_texture functions.
- Disable filtering for integer and stencil textures (not supported by
OGL specs).
- Replace bools inside GPUTexture by a bitflag supporting more options to
identify texture types.
- put render iterator in own scope
(would shadow it's own variable if used multiple times).
- enforce semicolon at end of iterator macros.
- no need to typedef one-off macro structs.
Now that the new 3D viewport draws to a multisample offscreen buffer, there is
no good reason anymore to create an entire multisample window and pay the
performance/memory cost for other regions that don't need it.
GL_MULTISAMPLE now only gets enabled for offscreen buffers, so we don't need
to check for it throughout the UI code anymore.
Differential Revision: https://developer.blender.org/D3062
- Read-only access can often use EvaluationContext.object_mode
- Write access to go to WorkSpace.object_mode.
- Some TODO's remain (marked as "TODO/OBMODE")
- Add-ons will need updating
(context.active_object.mode -> context.workspace.object_mode)
- There will be small/medium issues that still need resolving
this does work on a basic level though.
See D3037
This is rather a workaround to avoid main thread freeing all glyph caches
at the same time as sequencer uses fonts to draw text sequences.
Ideally we need to either make cache more local, or user-counted or to make
somewhat more global locks. All this ends up in a bigger refactor which is
better for 2.8. For the meantime let's make Blender more stable with a tiny
workaround.
Downside is that keeping zooming things up and down in interface during render
will increase memory usage by unused glyph caches. It's not too bad though,
all unused caches will be freed first time at area zoom after render.
Thanks Bastien for review!