Ensure that all threads on a multi-core system are used.
The issue was that BLI_task module was trying to be smart and
used heuristic to find optimal number of iterations per thread.
This heuristic assumes that tasks are light-weight, which is
not a case for subdivision surface.
On a higher subdivision level with a file from T70826 the
evaluation time goes down from 0.25 to 0.17 seconds per modifier
evaluation.
When D6189 is finalized we can being some extra performance
improvement.