Use BM_iter_parallel for face tessellation, this gives around
6.5x speedup for BM_mesh_calc_tessellation on high poly meshes in my
tests, although exact speedup depends on available cores.
Support thread local storage for BLI_task_parallel_mempool,
as well as support for the reduce and free callbacks.
mempool_iter_threadsafe_* functions have been moved into a private
header thats only shared between task_iterator.c and BLI_mempool.c
so the TLS can be made part of the iterator array without having to
rely on passing in struct offsets.
Add test task.MempoolIterTLS that ensures reduce and free
are working as expected.
Reviewed By: mont29
Ref D11548
Also modified existing utility functions to take
care of the new surface normal interpolation and so on.
Reviewed By: Antonio Vazquez (antoniov)
Differential Revision: https://developer.blender.org/D11543
While this should not be allowed in general, there are some cases
(library overrides at least) where supporting this is mandatory.
Further more, comment stating that this could crash is from 2011, could
not reproduce any issue with current code. Commit comment was referring
to undo/redo, but in use cases here those should not affect things.
Note that in general, such relatively high-level checks should be
handled by high-level, close to user code (like in ED area e.g.), not in
low-level BKE code anyway.
This patch adds a specific extraction method when the mesh has only
one material. This method is multi-threaded.
There is a trade-off in this patch as the ibo isn't compressed (it adds
restart indexes for hidden faces). So it depends if threading is faster
than the additional GPU buffer upload.
# Subdivided cube
I used a cube subdivided 7 times, modifiers applied. that gives around 400000 faces.
The test is selecting some vertices and move them. During this test the next buffers are updated on each frame:
* vbo.pos_nor
* vbo.lnor
* vbo.edit_data
* ibo.tris
* ibo.points
System info:
|platform| Linux-5.11.0-7614-generic-x86_64-with-glibc2.33|
| renderer| AMD SIENNA_CICHLID (DRM 3.40.0, 5.11.0-7614-generic, LLVM 11.0.1)|
|vendor| AMD|
|version| 4.6 (Core Profile) Mesa 21.0.1|
|cpu| Intel(R) Core(TM) i7-6700 CPU @ 3.40GHz|
|compiler| gcc version 10.3.0|
Timing have been measured using DEBUG_TIME in `draw_cache_extract_mesh`.
master: `rdata 8ms iter 45ms (frame 153ms)`
this patch `rdata 6ms iter 36ms (frame 132ms)`
Reviewed By: mano-wii
Maniphest Tasks: T88352
Differential Revision: https://developer.blender.org/D11290
This commit adds a flag to disable displaying some socket labels which
just redundant eye sores. We still want to have a label, because that
information can potentially be accessed elsewhere in the UI.
The flag is used in a few geometry nodes.
Differential Revision: https://developer.blender.org/D11540
This is an optimization, but the difference is still not that
significant as some extractions are still done in single thread.
**Benchmarking**
||before:|after:
|---|---|---|
|large_mesh_editing:|Average: 14.246502 FPS|Average: 15.438118 FPS
||rdata 9ms iter 31ms (frame 69ms)|rdata 9ms iter 27ms (frame 65ms)
|large_mesh_editing_ledge: |Average: 14.913622 FPS|Average: 15.856538 FPS
||rdata 9ms iter 30ms (frame 67ms)|rdata 9ms iter 26ms (frame 63ms)
|looptris_test:|Average: 3.970774 FPS|Average: 4.095200 FPS
||rdata 11ms iter 90ms (frame 235ms)|rdata 12ms iter 87ms (frame 229ms)
Reviewed By: jbakker
Differential Revision: https://developer.blender.org/D11467
A single threaded task with thread data over 8192 bytes would leak.
While this didn't happen in practice, it could cause issues in the
future.
The free call for `task_parallel_iterator_do` wasn't running
if callbacks weren't set, also not an issue in practice but avoids
potential problems in the future too.
Changes introduced in commit rBe9f2f17e8518
can create different render results when there is
a Math or Mix operation after TextureOperation
on tiled execution model.
This is due to WriteBufferOperation forcing a single pixel
resolution when these operations use a preferred
resolution of 0 to check if their inputs have resolution.
Fixing this behaviour creates different renders too.
This patch keeps previous tiled implementation and
adds the new implementation only for full frame execution.
Reviewed By: Jeroen Bakker (jbakker)
Differential Revision: https://developer.blender.org/D11546
In order to reduce stack size this patch converts full frame
recursive methods into iterative.
- No functional changes.
- No performance changes.
- Memory peak may slightly vary depending on the tree because
now breadth-first traversal is used instead of depth-first.
Tests in D11113 have same results except for test1 memory peak:
360MBs instead of 329.50MBs.
Reviewed By: Jeroen Bakker (jbakker)
Differential Revision: https://developer.blender.org/D11515
During transforming an image, a matrix multiplication per pixel was done.
The matrix in itself is always linear so it could be replaced by two additions.
During testing in debug builds playing back a movie went from 20fps to
300 fps.
Reviewed By: zeddb
Differential Revision: https://developer.blender.org/D11533
Talked with Bastien and we ended up looking into this. Issue is that the
dupliation through drag & drop should also be considered a
"sub-process", like Shift+D duplicating does. Added a comment explaining
why this is needed.
Current index builder is designed to be used in a single thread.
This makes all index buffer extractions single threaded.
This patch adds a thread safe solution enabling multithreaded
building of index buffers.
To reduce locking the solution would provide a task/thread local
index buffer builder (called sub builder).
When a thread is finished this thread local index buffer builder
can be joined with the initial index buffer builder.
`GPU_indexbuf_subbuilder_init`: Initialized a sub builder. The
index list is shared between the parent and sub buffer, but the
counters are localized. Ensuring that updating counters would
not need any locking.
`GPU_indexbuf_subbuilder_finish`: merge the information of the
sub builder back to the parent builder. Needs to be invoked outside
the worker thread, or when sure that all worker threads have been
finished. Internal the function is not thread safe.
For testing purposes the extract_points extractor has been migrated to
the new API. Herefore changes to the mesh extractor were needed.
* When creating tasks, the task number of current task is stored in
ExtractTaskData including the total number of tasks.
* Adding two functions in `MeshExtract`.
** `task_init` will initialize the task specific userdata.
** `task_finish` should merge back the task specific userdata back.
* adding task_id parameter to the iteration functions so they can
access the correct task data without any need for locking.
There is no noticeable change in end user performance.
Reviewed By: mano-wii
Differential Revision: https://developer.blender.org/D11499
This node creates a boolean face attribute that is "true" for
every face that has the given material.
Differential Revision: https://developer.blender.org/D11324
Selection of an FCurve with box/circle select now selects the entire
curve and all its keys:
- Box selecting a curve selects all the keyframes of the curve.
- Ctrl + box selecting of the curve deselects all the keyframes of the
curve.
- Shift + box selecting of the curve extends the keyframe selection,
adding all the keyframes of the curves that were just selected to the
selection.
- In all cases, if the selection area contains a key, nothing is
performed on the curves themselves (the action only impacts the
selected keys).
Reviewed By: sybren, #animation_rigging
Differential Revision: https://developer.blender.org/D11181
Some of the primitive nodes can return null in an error condition.
This is confusing mixed with adding a maderial slot in calling
functions. This is the second crash caused by that confusion. It's
simpler to add the slot right when allocating the mesh, and it will
lend itself better to copy & paste coding in the future.
Differential Revision: https://developer.blender.org/D11530
Under some circumstances using task isolation can cause deadlocks.
Previously, our task pool implementation would run all tasks in an
isolated region. Now using task isolation is optional and can be
turned on/off for individual task pools.
Task pools that spawn new tasks recursively should never enable
task isolation. There is a new check that finds these cases at runtime.
Right now this check is disabled, so that this commit is a pure refactor.
It will be enabled in an upcoming commit.
This fixes T88598.
Differential Revision: https://developer.blender.org/D11415
Simplify vertex normal calculation by moving the main normal
accumulation function to operate on vertices instead of faces.
Using faces had the down side that it needed to zero, accumulate and
normalize the vertex normals in 3 separate passes, accumulating also
needed a spin-lock for thread since the face would write it's normal
to all of it's vertices which could be shared with other faces.
Now a single loop over vertices is performed without locking.
This gives 5-6% speedup calculating all normals.
This also simplifies partial updates, fixing a problem where
all connected faces were being read from when calculating normals.
While this could have been resolved separately,
it's simpler to operate on vertices directly.
Move BMesh conversion and all loading code into worker.
Reviewed By: Sebastian Parborg (zeddb)
Differential Revision: https://developer.blender.org/D11288
Added a new api function to stich multires grids
on specific faces in a mesh,
subdiv_ccg_average_faces_boundaries_and_corners,
and changed multires normal calc to use it.
VTune profiling showed that this was a major
performance hit once you get above 10,000 or so
base mesh faces and/or have a high number of
subdivision levels.
Here's a video comparing the difference. Note the
bpy.app_debug switch is not in the final commit.
{F10145323}
And the .blend file:
{F10145346}
Reviewed By: Sergey Sharybin (sergey)
Differential Revision:
https://developer.blender.org/D11334