The threaded code is twice quicker now (from an average of 20ms/frame to 10ms/frame
while baking 10000 particles here e.g.)!
Think this is mostly due to usage of 'dynamic' scheduler in OMP code though,
from my experience so far this tends to have dramatic effects over performances,
static scheduler is usually much much more efficient.