All GPU backends now support NanoVDB, using our own kernel side code
that is easily portable. This simplifies kernel and device code.
Volume bounds are now built from the NanoVDB grid instead of OpenVDB,
to avoid having to keep around the OpenVDB grid after loading.
While this reduces memory usage, it does have a performance impact,
particularly for the Cubic filter. That will be addressed by
another commit.
Pull Request: https://projects.blender.org/blender/blender/pulls/132908
* Add GraphicsInteropDevice to check if interop is possible with device
* Rename GraphcisInterop to GraphicsInteropBuffer
* Include display device type and memory size in GraphicsInteropBuffer
* Unnest graphics interop class to make forward declarations possible
Pull Request: https://projects.blender.org/blender/blender/pulls/137363
This may be used for device to do host memory allocation in a way that
is more efficient for copy the host memory to the device.
Also rename and group device memory allocation functions for clarity.
Pull Request: https://projects.blender.org/blender/blender/pulls/134412
If one of the devices already used host happed memory but another not,
it would previously realloc both.
Thanks to Jorn Visser for investigating and finding this problem.
Pull Request: https://projects.blender.org/blender/blender/pulls/132912
Should be a bit more efficient, and it fixes host memory fallback bugs,
where host memory was incorrectly freed during re-copy. For the case
where memory should get reallocated on the host, a new mem_move_to_host
was added.
Thanks to Jorn Visser for investigating and finding this problem.
Pull Request: https://projects.blender.org/blender/blender/pulls/132912
* Use .empty() and .data()
* Use nullptr instead of 0
* No else after return
* Simple class member initialization
* Add override for virtual methods
* Include C++ instead of C headers
* Remove some unused includes
* Use default constructors
* Always use braces
* Consistent names in definition and declaration
* Change typedef to using
Pull Request: https://projects.blender.org/blender/blender/pulls/132361
Precompiled Cycles kernels make up a considerable fraction of the total size of
Blender builds nowadays. As we add more features and support for more
architectures, this will only continue to increase.
However, since these kernels tend to be quite compressible, we can save a lot
of storage by storing them in compressed form and decompressing the required
kernel(s) during loading.
By using Zstandard compression with a high level, we can get decent compression
ratios (~5x for the current kernels) while keeping decompression time low
(about 30ms in the worse case in my tests). And since we already require zstd
for Blender, this doesn't introduce a new dependency.
While the main improvement is to the size of the extracted Blender installation
(which is reduced by ~400-500MB currently), this also shrinks the download on
Windows, since .zip's deflate compression is less effective. It doesn't help on
Linux since we're already using .tar.xz there, but the smaller installed size
is still a good thing.
See #123522 for initial discussion.
Pull Request: https://projects.blender.org/blender/blender/pulls/123557
Since #118841 there are more cases where Cycles would check for the
graphics interop support. This could lead to a crash when graphics
interop functions are called without having active graphics context.
This change makes it so there is no graphics interop calls when doing
headless render. In order to achieve this the device creation is now
aware of the headless mode.
Pull Request: https://projects.blender.org/blender/blender/pulls/122844
With the switch to using the primary CUDA context it became possible
for peer access between CUDA devices to already have been enabled for
that context, either by a previous Cycles session or third-party library,
thus causing the call to `cuCtxEnablePeerAccess` to return
`CUDA_ERROR_PEER_ACCESS_ALREADY_ENABLED`. This is not a failure
state however, so just needs to be handled like a success return value.
Pull Request: https://projects.blender.org/blender/blender/pulls/120255
Previously the CUDA context was always destroyed and the module along
with it. Now that this no longer happens, the missing module free became
a memory leak.
Also fix the same issue for HIP, though this is destroying the context
so it's not a problem yet.
Fix part of #119035
Co-authored-by: Brecht Van Lommel <brecht@blender.org>
Host memory fallback in CUDA and HIP devices is almost identical.
We remove duplicated code and create a shared generic version that
other devices (oneAPI) will be able to use.
Reviewed By: brecht
Differential Revision: https://developer.blender.org/D17173
This adds a new mirror image extension type for shaders and
geometry nodes (next to the existing repeat, extend and clip
options).
See D16432 for a more detailed explanation of `wrap_mirror`.
This also adds a new sampler flag `GPU_SAMPLER_MIRROR_REPEAT`.
It acts as a modifier to `GPU_SAMPLER_REPEAT`, so any `REPEAT`
flag must be set for the `MIRROR` flag to have an effect.
Differential Revision: https://developer.blender.org/D16432
If no OPTIX_ROOT is set, nvcc fails to compile because there is a stray "-I"
in the arguments. Detect if the include path is empty and act accordingly.
Differential Revision: https://developer.blender.org/D16308
This patch causes the render buffers to be copied to the denoiser
device only once before denoising and output/display is then fed
from that single buffer on the denoiser device. That way usually all
but one copy (from all the render devices to the denoiser device)
can be eliminated, provided that the denoiser device is also the
display device (in which case interop is used to update the display).
As such this patch also adds some logic that tries to ensure the
chosen denoiser device is the same as the display device.
Differential Revision: https://developer.blender.org/D15657
* Rename "texture" to "data array". This has not used textures for a long time,
there are just global memory arrays now. (On old CUDA GPUs there was a cache
for textures but not global memory, so we used to put all data in textures.)
* For CUDA and HIP, put globals in KernelParams struct like other devices.
* Drop __ prefix for data array names, no possibility for naming conflict now that
these are in a struct.
Move MNEE to own kernel, separate from shader ray-tracing. This does introduce
the limitation that a shader can't use both MNEE and AO/bevel, but that seems
like the better trade-off for now.
We can experiment with bigger kernel organization changes later.
Differential Revision: https://developer.blender.org/D15070
This patch makes it possible to change the precision with which to
store volume data in the NanoVDB data structure (as float, half, or
using variable bit quantization) via the previously unused precision
field in the volume data block.
It makes it possible to further reduce memory usage during
rendering, at a slight cost to the visual detail of a volume.
Differential Revision: https://developer.blender.org/D10023
* Replace license text in headers with SPDX identifiers.
* Remove specific license info from outdated readme.txt, instead leave details
to the source files.
* Add list of SPDX license identifiers used, and corresponding license texts.
* Update copyright dates while we're at it.
Ref D14069, T95597
Somehow only a part of rBf4f8b6dde32b0438e0b97a6d8ebeb89802987127 ended up in
Cycles X, causing the issue that commit fixed, "OPTIX_ERROR_INVALID_VALUE" when the
system is out of memory, to show up again.
This adds the missing changes to fix that problem.
Maniphest Tasks: T93620
Differential Revision: https://developer.blender.org/D13488
This patch adds new arg-type parameters to `DeviceQueue::enqueue` and its overrides. This is in preparation for the Metal backend which needs this information for correct argument encoding.
Ref T92212
Reviewed By: brecht
Maniphest Tasks: T92212
Differential Revision: https://developer.blender.org/D13357
Happens when device runs out of memory and Cycles is moving some
textures to the host memory.
The delayed memory free for OptiX BVH was moving data from one
device_memory to another, leaving the original device memory in
an invalid state. This was ruining the allocation map in the CUDA
device which is using pointer to the device_memory.
This change makes it so the memory pointer is stolen from BVH
into the delayed memory free list.
Additionally, forbid copying and moving instances of device_memory
and added sanity checks in the device implementation.
Differential Revision: https://developer.blender.org/D13316
This patch adds a CMake option "WITH_CYCLES_DEBUG" which builds cycles with
a feature that allows debugging/selecting the direct-light sampling strategy.
The same option may later be used to add other debugging features that could
affect performance in release builds.
The three options are:
* Forward path tracing (e.g., via BSDF or phase function)
* Next-event estimation
* Multiple importance sampling combination of the previous two methods
Such a feature is useful for debugging light different sampling, evaluation,
and pdf methods (e.g., for light sources and BSDFs).
Differential Revision: https://developer.blender.org/D13152