The workaround of forcing BVH building into single thread
execution on the Blender side is not needed anymore,
because the problem was properly fixed in the upstream
since Embree upgrade in Blender 4.5
This reverts commit c0f0e2ca6f.
Pull Request: https://projects.blender.org/blender/blender/pulls/146859
There is a bug in Embree that makes BVH updates crash. Disabling multithreaded
BVH updates after the initial BVH build appears to work around it, at the cost
of some performance.
This will not affect performance of the initial BVH build, transforming objects
or editing a single mesh. It will only affect performance when multiple smaller
meshes are edited together, as those can no longer have their BVH updated in
parallel or benefit from parallellization over many primitives.
Pull Request: https://projects.blender.org/blender/blender/pulls/134747
This may be used for device to do host memory allocation in a way that
is more efficient for copy the host memory to the device.
Also rename and group device memory allocation functions for clarity.
Pull Request: https://projects.blender.org/blender/blender/pulls/134412
Regardless of what mem info reports. We can't move this to the host, so
might as well try because the free memory might not be a reliable predictor
of success.
Pull Request: https://projects.blender.org/blender/blender/pulls/132912
With host mapped memory these can be shared, and we can't get back the
original host pointer unless we make a copy which is inefficient.
Also add asserts to verify this doesn't happen.
Pull Request: https://projects.blender.org/blender/blender/pulls/132912
This was not thread safe. And it's better to do them one by one to avoid
moving more than is needed, when another thread already freed up enough.
Thanks to Jorn Visser for investigating and finding this problem.
Pull Request: https://projects.blender.org/blender/blender/pulls/132912
Should be a bit more efficient, and it fixes host memory fallback bugs,
where host memory was incorrectly freed during re-copy. For the case
where memory should get reallocated on the host, a new mem_move_to_host
was added.
Thanks to Jorn Visser for investigating and finding this problem.
Pull Request: https://projects.blender.org/blender/blender/pulls/132912
Check was misc-const-correctness, combined with readability-isolate-declaration
as suggested by the docs.
Temporarily clang-format "QualifierAlignment: Left" was used to get consistency
with the prevailing order of keywords.
Pull Request: https://projects.blender.org/blender/blender/pulls/132361
* Use .empty() and .data()
* Use nullptr instead of 0
* No else after return
* Simple class member initialization
* Add override for virtual methods
* Include C++ instead of C headers
* Remove some unused includes
* Use default constructors
* Always use braces
* Consistent names in definition and declaration
* Change typedef to using
Pull Request: https://projects.blender.org/blender/blender/pulls/132361
This is because with the addition of new features to Cycles, these GPUs
experienced significant performance regressions and bugs, all stemming
from bugs in the Metal GPU driver/compiler. The only reasonable way to
work around these issues was to disable parts of Cycles code on
these GPUs to avoid the driver/compiler bugs.
This resulted in increased development time maintaining these platforms
while being unable to deliver feature parity with other
GPU backends.
It has been decided that this development time is better spent
maintaining platforms that are still actively maintained by
hardware/software vendors, and so AMD and Intel GPU support will be
removed from the Metal backend for Cycles.
Pull Request: https://projects.blender.org/blender/blender/pulls/123551
Since #118841 there are more cases where Cycles would check for the
graphics interop support. This could lead to a crash when graphics
interop functions are called without having active graphics context.
This change makes it so there is no graphics interop calls when doing
headless render. In order to achieve this the device creation is now
aware of the headless mode.
Pull Request: https://projects.blender.org/blender/blender/pulls/122844
Better to disable than crashing, as we are not expecting a quick fix. The cause
is likely similar to issues with the light tree, which was already disabled.
Ref #104013
HIP RT enables AMD hardware ray tracing on RDNA2 and above, and falls back to a
to shader implementation for older graphics cards. It offers an average 25%
sample rendering rate improvement in Cycles benchmarks, on a W6800 card.
The ray tracing feature functions are accessed through HIP RT SDK, available on
GPUOpen. HIP RT traversal functionality is pre-compiled in bitcode format and
shipped with the SDK.
This is not yet enabled as there are issues to be resolved, but landing the
code now makes testing and further changes easier.
Known limitations:
* Not working yet with current public AMD drivers.
* Visual artifact in motion blur.
* One of the buffers allocated for traversal has a static size. Allocating it
dynamically would reduce memory usage.
* This is for Windows only currently, no Linux support.
Co-authored-by: Brecht Van Lommel <brecht@blender.org>
Ref #105538
Updated Embree 4 library with GPU support is required for it to be
compiled - compatiblity with Embree 3 and Embree 4 without GPU support
is maintained.
Enabling hardware raytracing is an opt-in user setting for now.
Pull Request: https://projects.blender.org/blender/blender/pulls/106266
For example
```
OIIOOutputDriver::~OIIOOutputDriver()
{
}
```
becomes
```
OIIOOutputDriver::~OIIOOutputDriver() {}
```
Saves quite some vertical space, which is especially handy for
constructors.
Pull Request: https://projects.blender.org/blender/blender/pulls/105594
Host memory fallback in CUDA and HIP devices is almost identical.
We remove duplicated code and create a shared generic version that
other devices (oneAPI) will be able to use.
Reviewed By: brecht
Differential Revision: https://developer.blender.org/D17173
Uses a light tree to more effectively sample scenes with many lights. This can
significantly reduce noise, at the cost of a somewhat longer render time per
sample.
Light tree sampling is enabled by default. It can be disabled in the Sampling >
Lights panel. Scenes using light clamping or ray visibility tricks may render
different as these are biased techniques that depend on the sampling strategy.
The implementation is currently disabled on AMD HIP. This is planned to be fixed
before the release.
Implementation by Jeffrey Liu, Weizhen Huang, Alaska and Brecht Van Lommel.
Ref T77889
This adds path guiding features into Cycles by integrating Intel's Open Path
Guiding Library. It can be enabled in the Sampling > Path Guiding panel in the
render properties.
This feature helps reduce noise in scenes where finding a path to light is
difficult for regular path tracing.
The current implementation supports guiding directional sampling decisions on
surfaces, when the material contains a least one diffuse component, and in
volumes with isotropic and anisotropic Henyey-Greenstein phase functions.
On surfaces, the guided sampling decision is proportional to the product of
the incident radiance and the normal-oriented cosine lobe and in volumes it
is proportional to the product of the incident radiance and the phase function.
The incident radiance field of a scene is learned and updated during rendering
after each per-frame rendering iteration/progression.
At the moment, path guiding is only supported by the CPU backend. Support for
GPU backends will be added in future versions of OpenPGL.
Ref T92571
Differential Revision: https://developer.blender.org/D15286
This patch adds a new Cycles device with similar functionality to the
existing GPU devices. Kernel compilation and runtime interaction happen
via oneAPI DPC++ compiler and SYCL API.
This implementation is primarly focusing on Intel® Arc™ GPUs and other
future Intel GPUs. The first supported drivers are 101.1660 on Windows
and 22.10.22597 on Linux.
The necessary tools for compilation are:
- A SYCL compiler such as oneAPI DPC++ compiler or
https://github.com/intel/llvm
- Intel® oneAPI Level Zero which is used for low level device queries:
https://github.com/oneapi-src/level-zero
- To optionally generate prebuilt graphics binaries: Intel® Graphics
Compiler All are included in Linux precompiled libraries on svn:
https://svn.blender.org/svnroot/bf-blender/trunk/lib The same goes for
Windows precompiled binaries but for the graphics compiler, available
as "Intel® Graphics Offline Compiler for OpenCL™ Code" from
https://www.intel.com/content/www/us/en/developer/articles/tool/oneapi-standalone-components.html,
for which path can be set as OCLOC_INSTALL_DIR.
Being based on the open SYCL standard, this implementation could also be
extended to run on other compatible non-Intel hardware in the future.
Reviewed By: sergey, brecht
Differential Revision: https://developer.blender.org/D15254
Co-authored-by: Nikita Sirgienko <nikita.sirgienko@intel.com>
Co-authored-by: Stefan Werner <stefan.werner@intel.com>
* Replace license text in headers with SPDX identifiers.
* Remove specific license info from outdated readme.txt, instead leave details
to the source files.
* Add list of SPDX license identifiers used, and corresponding license texts.
* Update copyright dates while we're at it.
Ref D14069, T95597
For curve-heavy scenes, memory consumption regressed when we switched from MetalRT to bvh2. Allow users to opt in to MetalRT to workaround this.
Reviewed By: brecht
Differential Revision: https://developer.blender.org/D14071
There are two things achieved by this change:
- No possible downcast of size_t to int when calculating motion steps.
- Disambiguate call to `min()` which was for some reason considered
ambiguous on 32bit platforms `min(int, unsigned int)`.
- Do the same for the `max()` call to keep them symmetrical.
On an implementation side the `min()` is defined for a fixed width
integer type to disambiguate uint from size_t on 32bit platforms,
and yet be able to use it for 32bit operands on 64bit platforms without
upcast.
This ended up in a bit bigger change as the conditional compile-in of
functions is easiest if the functions is templated. Making the functions
templated required to remove the other source of ambiguity which is
`algorithm.h` which was pulling min/max from std.
Now it is the `math.h` which is the source of truth for min/max.
It was only one place which was relying on `algorithm.h` for these
functions, hence the choice of `math.h` as the safest and least
intrusive.
Fixes 32bit platforms (such as i386) in Debian package build system.
Differential Revision: https://developer.blender.org/D14062
Query TBB for the maximum allowed concurrency, which is free from a bug
in own concurrency detection code. One thing to keep in mind is that now
Cycles is limited by the number of threads in the TBB areana from which
Session is created. This isn't a problem for Blender since we do not limit
arena on Blender side. Could be something to watch out for in other Cycles
integrations.
Differential Revision: https://developer.blender.org/D13658
This patch adds the Metal host-side code:
- Add all core host-side Metal backend files (device_impl, queue, etc)
- Add MetalRT BVH setup files
- Integrate with Cycles device enumeration code
- Revive `path_source_replace_includes` in util/path (required for MSL compilation)
This patch also includes a couple of small kernel-side fixes:
- Add an implementation of `lgammaf` for Metal [Nemes, Gergő (2010), "New asymptotic expansion for the Gamma function", Archiv der Mathematik](https://users.renyi.hu/~gergonemes/)
- include "work_stealing.h" inside the Metal context class because it accesses state now
Ref T92212
Reviewed By: brecht
Maniphest Tasks: T92212
Differential Revision: https://developer.blender.org/D13423