From recent experience, turns out we often do want to use something else than basic
range of parallelized forloop as control parameter over threads usage, so now BLI func
only takes a boolean, and caller defines best check for its own case.
This time, with have over 300% speedup!
But no, this is not due to switch to BLI_task (which 'only' gives usal 15% speedup),
but to enhancement of the algorithm, flatten loop over covariance matrix items now allows
to compute (usually) all items in parallel, instead of having at most 3 or 4 working threads
(with unbalanced load even)...
Gives the usual 10%-30% speedup on affected functions themselves (BLI_bvhtree_overlap() and
BLI_bvhtree_balance()), and about 2% speedup to overall cloth sim e.g. (measured from
main Cloth modifier func).
When called with very small range, `BLI_task_parallel_range_ex()` would generate a zero `chunk_size`,
leading to some infinite looping in `parallel_range_func` due to `parallel_range_next_iter_get` returning
true without actually increasing the counter!
So now, we ensure `chunk_size` and `num_tasks` are always at least 1 (and avoid generating too much tasks too).
Keep index using the outer scope for GHASH iter macros,
while its often nice, in some cases to declare in the for loop,
it means you cant use as a counter after the loop exits, and in some cases signed/unsigned may matter.
API changes should really be split off in their own commits too.
GPU_buffer no longer has a fallback to client vertex arrays, so remove
comments about it.
Changed a few internal structs/function interfaces to use bool where
appropriate.
Use for-loop scope and flexible declaration placement. PBVH does the
same thing but needs ~150 fewer lines to do it!
The change to BLI_ghashIterator_init is admittedly hackish but makes
GHASH_ITER_INDEX nicer to use.
This mimics OpenMP's 'firstprivate' feature. It is sometimes handy to have some persistent local data during a whole chunk.
Reviewers: sergey
Reviewed By: sergey
Subscribers: campbellbarton
Differential Revision: https://developer.blender.org/D1635
This performs a range search on the kdtree, running a callback instead of allocating an array.
Allows the caller to perform extra checks in the case of overlap,
avoids redundant array allocations, since caller can handle matches.