This change brings back old original logic which was checking
whether worker threads do fit into an active CPU group. But
it does it a bit smarter now and is also checking affinity
within that group. This way Cycles will use all threads on a
Threadripper2 CPU if it's set to automatic number of threads,
but on another hand will not change affinity if user requested
16 threads and changed Blender affinity.
Tweaked scheduling so it survives this situation by scattering
"extra" threads uniformly over all the NUMA nodes.
There are still tweaks possible to make some specific hardware
configurations work better.
The issue was introduced by a Threadripper2 commit back in
ce927e15e0. This boils down to threads inheriting affinity
from the parent thread. It is a question how this slipped
through the review (we definitely run benchmark round).
Quick fix could have been to always set CPU group affinity
in Cycles, and it would work for Windows. On other platforms
we did not have CPU groups API finished.
Ended up making Cycles aware of NUMA topology, so now we
bound threads to a specific NUMA node. This required adding
an external dependency to Cycles, but made some code there
shorter.
There are some changes in API of OpenImageIO, but those are quite
simple to keep working with older and newer library versions.
Reviewers: brecht
Reviewed By: brecht
Differential Revision: https://developer.blender.org/D4064
This commit adds a sample-based profiler that runs during CPU rendering and collects statistics on time spent in different parts of the kernel (ray intersection, shader evaluation etc.) as well as time spent per material and object.
The results are currently not exposed in the user interface or per Python yet, to see the stats on the console pass the "--cycles-print-stats" argument to Cycles (e.g. "./blender -- --cycles-print-stats").
Unfortunately, there is no clear way to extend this functionality to CUDA or OpenCL, so it is CPU-only for now.
Reviewers: brecht, sergey, swerner
Reviewed By: brecht, swerner
Differential Revision: https://developer.blender.org/D3892
For some reason when building with gcc-7.2 (which is default
in previous Ubuntu LTS) the guarded allocator is not being
properly instantiated.
Doesn't happen with newer version of gcc-7 which is 7.3, and
also doesn't happen with gcc-6 and gcc-8.
Would be nice to know what is wrong, but for the time being
committing workaround which keeps Blender users happy.
Note that this is turned off by default and must be enabled at build time with the CMake WITH_CYCLES_EMBREE flag.
Embree must be built as a static library with ray masking turned on, the `make deps` scripts have been updated accordingly.
There, Embree is off by default too and must be enabled with the WITH_EMBREE flag.
Using Embree allows for much faster rendering of deformation motion blur while reducing the memory footprint.
TODO: GPU implementation, deduplication of data, leveraging more of Embrees features (e.g. tessellation cache).
Differential Revision: https://developer.blender.org/D3682
This allows for extra output passes that encode automatic object and material masks
for the entire scene. It is an implementation of the Cryptomatte standard as
introduced by Psyop. A good future extension would be to add a manifest to the
export and to do plenty of testing to ensure that it is fully compatible with other
renderers and compositing programs that use Cryptomatte.
Internally, it adds the ability for Cycles to have several passes of the same type
that are distinguished by their name.
Differential Revision: https://developer.blender.org/D3538
While building the AVX kernel, util_avxf.h/avxb.h were using some AVX2 intrinsics,
these were never called, so it wasn't a run-time issue, but the intrinsics headers
on centos excluded the AVX2 prototypes when building the AVX kernel causing build errors.
This commit cleans up the improper usage of the AVX2 intrinsics and provides AVX
fallback implementations for future use.
Differential Revision: https://developer.blender.org/D3696