Together with the extended loop callback and userdata_chunk, this allows to perform
cumulative tasks (like aggregation) in a lockfree way using local userdata_chunk to store temp data,
and once all workers have finished, to merge those userdata_chunks in the finalize callback
(from calling thread, so no need to lock here either).
Note that this changes how userdata_chunk is handled (now fully from 'main' thread,
which means a given worker thread will always get the same userdata_chunk, without
being re-initialized anymore to init value at start of each iter chunk).
New code is actually much, much better than first version, using 'fetch_and_add' atomic op
here allows us to get rid of the loop etc.
The broken CAS issue remains on windows, to be investigated...
Some variants of gcc compilation were reporting 'control reaching end of non-void function' error
in this switch/case maze. Either use break everywhere or not at all (which is simpler, since we
only always return anyway...).
There are some serious issues under windows, causing deadlocks somehow (not reproducible under linux so far).
Until further investigation over why this happens, better to revert to previous
spin-locked behavior.
This reverts commits a83bc4f597 and 98123ae916.
Reading the shared state->iter value after storing it in the 'reference' var could in theory
lead to a race condition setting state->iter value above state->stop, which would be 'deadly'.
This **may** be the cause of T48422, though I was not able to reproduce that issue so far.
When writing temp blenbuffer file, libraries of linked datablocks where not tagged correctly, which
means they were not put in the temp Main used to write the buffer, resulting in implicit localization
of linked data.
Previous to 2.77, this used to be default behavior, was changed in rB591f4549c958b.
However, in most append cases, you do want a full localization of your data, so this new behavior
is kept by default, but there is now an option in append operator to only localize the 'first level'
of data (i.e. datablocks from linked library itself, and not those from other 'sub-libraries').
On big and complex rigs like blendrig or koro, it can give up to ~10% more FPS in best cases.
Hard to tackle all cases in tests though, so please report any unexpected slowdown
in armature animation playback!
Line lengths, monolined 'if' statements, int -> bool, etc.
Also, replaced some internal cooked stuff by BLI helpers (most notably, the
'is inside UV triangle' code in `dynamicPaint_createUVSurface()`), and some
other minor optimizations.
Previously if image only had single channel only z buffer value was displaying.
This isn't handy for cases when you've got single channel buffer which is not
a z buffer.
Also fixed possible read past the array.
Take advantage of the efficiency provided by the snap_context.
Also fixes errors:
- volume snap fails based on view angle (T48394).
- multiple instances of dupli-objects break volume calculation.