On the one hand, this improves initialization time since we don't need to
load/compile the full OSL module with all the shading logic if we're only
using a custom camera with SVM shading.
On the other hand, it also fixes a bug I noticed while preparing test scenes:
The AO and Bevel nodes don't work when using custom cameras with SVM on OptiX.
The issue there is that those two are handled by the SHADE_SURFACE_RAYTRACE
kernel, but since that one has intersection logic, we use the OptiX-specific
kernel even if OSL shading is disabled.
However, with the previous unified OSL module, this would mean loading
SHADE_SURFACE_RAYTRACE from kernel_osl.cu, which has `#undef __SVM__` and
therefore doesn't handle them correctly.
With this change, we'll use the kernels from kernel_shader_raytrace.cu in that
case, which do support SVM nodes just fine.
Disk usage of the new kernel_optix_osl_camera.ptx.zst file is 30KB, so this
also doesn't blow up the kernel disk size (and kernel_optix_osl.ptx.zst is
probably smaller by that amount now).
Since it seems that we can mix modules just fine, I'm suspecting that we could
split the modules properly (intersection, SVM shading with raytracing,
OSL shading, OSL camera), instead of the current approach where modules
essentially correspond to feature set tiers and each includes the previous
one's kernels as well - but that's a separate refactor.
Pull Request: https://projects.blender.org/blender/blender/pulls/138021
This allows users to implement arbitrary camera models using OSL by writing
shaders that take an image position as input and compute ray origin and
direction.
The obvious applications for this are e.g. panorama modes, lens distortion
models and realistic lens simulation, but the possibilities are endless.
Currently, this is only supported on devices with OSL support, so CPU and
OptiX. However, it is independent from the shading model used, so custom
cameras can be used without getting the performance hit of OSL shading.
A few samples are provided as Text Editor templates.
One notable current limitation (in addition to the limited device support)
is that inverse mapping is not supported, so Window texture coordinates and
the Vector pass will not work with custom cameras.
Pull Request: https://projects.blender.org/blender/blender/pulls/129495
* Use .empty() and .data()
* Use nullptr instead of 0
* No else after return
* Simple class member initialization
* Add override for virtual methods
* Include C++ instead of C headers
* Remove some unused includes
* Use default constructors
* Always use braces
* Consistent names in definition and declaration
* Change typedef to using
Pull Request: https://projects.blender.org/blender/blender/pulls/132361
Based on #123377 by @brecht, but Gitea doesn't like the rebase these
so here's a new PR.
The purpose here is to switch to fused OptiX programs for OSL execution
on CUDA. On the one hand, this makes the code easier since, but there's
also another advantage - how memory allocation is managed.
OSL shaders need memory to store intermediate values, but how much is
needed depends on the complexity of the shader. With the split program
approach, Cycles had to provide that memory, so we had to allocate a
certain amount (2 KiB, to be precise) statically and show an error if
the shader would need more. If the shader used less (which is the case
for the vast majority), the memory was just wasted.
By switching to fused kernels, OSL knows the required amount during JIT
codegen, so it can allocate only what's required, which avoids this
waste. One still needs to set a maximum, and in theory, OSL would also
support spilling over into a Cycles-provided alternative memory region.
However, we currently don't implement that - instead, we default to the
same 2048 limit as before and let advanced users override it via the
CYCLES_OSL_GROUPDATA_ALLOC environment variable if really needed.
Co-authored-by: Brecht Van Lommel <brecht@blender.org>
Pull Request: https://projects.blender.org/blender/blender/pulls/130149
Changing 3D curve properties while viewport rendering was active
resulted in an error, because Cycles would attempt to update the
acceleration structure containing the curves, but that acceleration
structure was built without the
`OPTIX_BUILD_FLAG_ALLOW_UPDATE` flag allowing updates. This
fixes that by adding the flag to all curve build inputs.
Ideally could just use the same flags as for other build inputs and
differentiate between viewport and final rendering (based on
`bvh_type`), but that's not currently an option since the same flags
have to be specified to query the curve intersection module in
`load_kernels()`, where that differentiation is not known. See also
commit 5c6053ccb1.
Pull Request: https://projects.blender.org/blender/blender/pulls/129634
OptiX has accepted Catmull-Rom curve data natively since OptiX 7.4, but due to the previous conversion to B-Spline code, the format that data is fed to OptiX wasn't optimal.
Each curve segment was put in the vertex buffer as four independent control points, even though continuous segments actually share control points between each other. This patch compacts that so shared control points only occur once in the vertex buffer.
This compact form uses less memory and also allows OptiX to easily identify segments that belong together into a curve (those where the step between indices is one).
Pull Request: https://projects.blender.org/blender/blender/pulls/125899
Precompiled Cycles kernels make up a considerable fraction of the total size of
Blender builds nowadays. As we add more features and support for more
architectures, this will only continue to increase.
However, since these kernels tend to be quite compressible, we can save a lot
of storage by storing them in compressed form and decompressing the required
kernel(s) during loading.
By using Zstandard compression with a high level, we can get decent compression
ratios (~5x for the current kernels) while keeping decompression time low
(about 30ms in the worse case in my tests). And since we already require zstd
for Blender, this doesn't introduce a new dependency.
While the main improvement is to the size of the extracted Blender installation
(which is reduced by ~400-500MB currently), this also shrinks the download on
Windows, since .zip's deflate compression is less effective. It doesn't help on
Linux since we're already using .tar.xz there, but the smaller installed size
is still a good thing.
See #123522 for initial discussion.
Pull Request: https://projects.blender.org/blender/blender/pulls/123557
The callables generated by OSL reference other external functions
(defined in the OSL services module), in which case OptiX cannot
calculate the right stack size just based on the callable alone, it needs to
know all functions linked together in the pipeline to get to an accurate
result. `optixProgramGroupGetStackSize` has an optional pipeline
argument for this purpose, so make use of that to ensure the correct
stack size is calculated.
Ref #122779
Pull Request: https://projects.blender.org/blender/blender/pulls/123368
Since #118841 there are more cases where Cycles would check for the
graphics interop support. This could lead to a crash when graphics
interop functions are called without having active graphics context.
This change makes it so there is no graphics interop calls when doing
headless render. In order to achieve this the device creation is now
aware of the headless mode.
Pull Request: https://projects.blender.org/blender/blender/pulls/122844
Empty hair geometry in Cycles may still report having one curve, even when
there are no actual segments in that curve. This caused an attempt to build
an acceleration structure with zero primitives, which due to other setup
OptiX rejected with an error. Fix that by checking the number of segments
rather than the number of curves in the hair geometry, since the former will
always be zero for empty geometry.
Pull Request: https://projects.blender.org/blender/blender/pulls/115044
* User simpler API names that accept both PTX and OptiX-IR
* New argument for optixProgramGroupGetStackSize, leave to default
* Remove OptixPipelineLinkOptions::debugLevel that does nothing
Pull Request: https://projects.blender.org/blender/blender/pulls/107450
The traversable handle of a BLAS may be zero when the relevant geometry
is empty (no triangles/curves/points/...), as no BLAS is built in such cases.
It is not correct to attach a zero handle to a TLAS, so filter out such instances.
Materials without connections to the output node would crash with OSL
in OptiX, since the Cycles `OSLCompiler` generates an empty shader
group reference for them, which resulted in the OptiX device
implementation setting an empty SBT entry for the corresponding direct
callables, which then crashed when calling those direct callables was
attempted in `osl_eval_nodes`. This fixes that by setting the SBT entries
for empty shader groups to a dummy direct callable that does nothing.
Switching viewport denoising causes kernels to be reloaded with a new
feature mask, which would destroy the existing OptiX pipelines. But OSL
kernels were not reloaded as well, leaving the shading pipeline
uninitialized and therefore causing an error when it is later attempted to
execute it. This fixes that by ensuring OSL kernels are always reloaded
when the normal kernels are too.
Cycles already treats denoising fairly separate in its code, with a
dedicated `Denoiser` base class used to describe denoising
behavior. That class has been fully implemented for OIDN
(`denoiser_oidn.cpp`), but for OptiX was mostly empty
(`denoiser_optix.cpp`) and denoising was instead implemented in
the OptiX device. That meant denoising code was split over various
files and directories, making it a bit awkward to work with. This
patch moves the OptiX denoising implementation into the existing
`OptiXDenoiser` class, so that everything is in one place. There are
no functional changes, code has been mostly moved as-is. To
retain support for potential other denoiser implementations based
on a GPU device in the future, the `DeviceDenoiser` base class was
kept and slightly extended (and its file renamed to
`denoiser_gpu.cpp` to follow similar naming rules as
`path_trace_work_*.cpp`).
Differential Revision: https://developer.blender.org/D16502
This patch generalizes the OSL support in Cycles to include GPU
device types and adds an implementation for that in the OptiX
device. There are some caveats still, including simplified texturing
due to lack of OIIO on the GPU and a few missing OSL intrinsics.
Note that this is incomplete and missing an update to the OSL
library before being enabled! The implementation is already
committed now to simplify further development.
Maniphest Tasks: T101222
Differential Revision: https://developer.blender.org/D15902
If no OPTIX_ROOT is set, nvcc fails to compile because there is a stray "-I"
in the arguments. Detect if the include path is empty and act accordingly.
Differential Revision: https://developer.blender.org/D16308
This allows individual users or Linux distributions to specify a directory
Cycles will automatically look for the OptiX include folder, to compile kernels
at runtime.
It is still possible to override this with the OPTIX_ROOT_DIR environment
variable at runtime.
Based on patch by Sebastian Parborg.
Ref D15792
This patch causes the render buffers to be copied to the denoiser
device only once before denoising and output/display is then fed
from that single buffer on the denoiser device. That way usually all
but one copy (from all the render devices to the denoiser device)
can be eliminated, provided that the denoiser device is also the
display device (in which case interop is used to update the display).
As such this patch also adds some logic that tries to ensure the
chosen denoiser device is the same as the display device.
Differential Revision: https://developer.blender.org/D15657
This was tested in some places to check if code was being compiled for the
CPU, however this is only defined in the kernel. Checking __KERNEL_GPU__
always works.
* Rename "texture" to "data array". This has not used textures for a long time,
there are just global memory arrays now. (On old CUDA GPUs there was a cache
for textures but not global memory, so we used to put all data in textures.)
* For CUDA and HIP, put globals in KernelParams struct like other devices.
* Drop __ prefix for data array names, no possibility for naming conflict now that
these are in a struct.
Acceleration structures in the viewport default to building with the fast
build flag, but the intersection program used for curves was queried with
the fast trace flag. The resulting mismatch caused an exception in the
intersection kernel. Since it's difficult to predict whether dynamic or static
acceleration structures are going to be built at the time of kernel loading,
this fixes the mismatch by always using the fast trace flag for curves.
Move MNEE to own kernel, separate from shader ray-tracing. This does introduce
the limitation that a shader can't use both MNEE and AO/bevel, but that seems
like the better trade-off for now.
We can experiment with bigger kernel organization changes later.
Differential Revision: https://developer.blender.org/D15070