This replaces the following code (pseudo-code):

  spin_lock();
  update_child_dag_nodes();
  schedule_new_nodes();
  spin_unlock();

with:

  update_child_dag_nodes_with_atomic_ops();
  schedule_new_nodes();

The reason for this is that scheduling new nodes implies a
mutex lock, and holding a spin lock around it is a bad idea.
An alternative would have been to hold the spin lock around
the child nodes update only, but that would imply either
a per-node spin lock or collecting the nodes that are
ready for update into an array.
Didn't like either alternative. Using atomic operations makes
the code much easier to follow and keeps the data flow on the CPU nice.
The same atomic ops might be used in other performance-critical
areas later.
The atomic ops implementation is taken from the jemalloc project.
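
For illustration only, here is a minimal sketch of how a lock-free child
update could look. It is not the code from this change: it uses C11
<stdatomic.h> instead of the jemalloc-derived atomic ops, and the names
Node, pending_parents, schedule_node() and node_finished() are all
hypothetical. It assumes each child keeps a counter of unfinished parents.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

#define MAX_CHILDREN 4

/* Hypothetical DAG node; not the actual struct touched by this change. */
typedef struct Node {
    atomic_int pending_parents;          /* parents that have not finished yet */
    struct Node *children[MAX_CHILDREN];
    size_t num_children;
    int scheduled;                       /* set by schedule_node(), demo only */
} Node;

/* Stand-in for the real scheduling call, which may take a mutex. */
static void schedule_node(Node *node)
{
    node->scheduled = 1;
}

/* Called once when `node` has finished. No spin lock is held here. */
static void node_finished(Node *node)
{
    for (size_t i = 0; i < node->num_children; i++) {
        Node *child = node->children[i];
        /* atomic_fetch_sub() returns the previous value, so exactly one
         * finishing parent observes 1 and schedules the child. */
        int prev = atomic_fetch_sub(&child->pending_parents, 1);
        assert(prev > 0);
        if (prev == 1) {
            schedule_node(child);
        }
    }
}
```

Because the decrement and the read of the previous value are a single
atomic step, the scheduling call (and any mutex it takes) happens outside
of any spin lock, and no per-node lock or temporary array is needed.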