This allows individual users or Linux distributions to specify a directory
Cycles will automatically look for the OptiX include folder, to compile kernels
at runtime.
It is still possible to override this with the OPTIX_ROOT_DIR environment
variable at runtime.
Based on patch by Sebastian Parborg.
Ref D15792
This cleans up the OpenGL build flags and linking.
It additionally also removes some dead code.
One of these dead code paths is WITH_X11_ALPHA which actually never was
active even with the build flag on. The call to use this was never
called because the default initializer for GHOST was set to have it off
per default. Nothing called this function with a boolean value to enable it.
These cleanups are needed to support true headless OpenGL rendering.
Without these cleanups libepoxy will fail to load the correct OpenGL
Libraries as we have already linked them to the blender binary.
Reviewed By: Brecht, Campbell, Jeroen
Differential Revision: http://developer.blender.org/D15554
With libepoxy we can choose between EGL and GLX at runtime, as well as
dynamically open EGL and GLX libraries without linking to them.
This will make it possible to build with Wayland, EGL, GLVND support while
still running on systems that only have X11, GLX and libGL. It also paves
the way for headless rendering through EGL.
libepoxy is a new library dependency, and is included in the precompiled
libraries. GLEW is no longer a dependency, and WITH_SYSTEM_GLEW was removed.
Includes contributions by Brecht Van Lommel, Ray Molenkamp, Campbell Barton
and Sergey Sharybin.
Ref T76428
Differential Revision: https://developer.blender.org/D15291
This patch causes the render buffers to be copied to the denoiser
device only once before denoising and output/display is then fed
from that single buffer on the denoiser device. That way usually all
but one copy (from all the render devices to the denoiser device)
can be eliminated, provided that the denoiser device is also the
display device (in which case interop is used to update the display).
As such this patch also adds some logic that tries to ensure the
chosen denoiser device is the same as the display device.
Differential Revision: https://developer.blender.org/D15657
The number of Execution Units and resident "threads" (simd width * threads
per EUs) are now exposed and used to select the number of states using
a simplified heuristic.
This was tested in some places to check if code was being compiled for the
CPU, however this is only defined in the kernel. Checking __KERNEL_GPU__
always works.
This was added for Metal, but also gives good results with CUDA and OptiX.
Also enable it for future Apple GPUs instead of only M1 and M2, since this has
been shown to help across multiple GPUs so the better bet seems to enable
rather than disable it.
Also moves some of the logic outside of the Metal device code, and always
enables the code in the kernel since other devices don't do dynamic compile.
Time per sample with OptiX + RTX A6000:
new old
barbershop_interior 0.0730s 0.0727s
bmw27 0.0047s 0.0053s
classroom 0.0428s 0.0464s
fishy_cat 0.0102s 0.0108s
junkshop 0.0366s 0.0395s
koro 0.0567s 0.0578s
monster 0.0206s 0.0223s
pabellon 0.0158s 0.0174s
sponza 0.0088s 0.0100s
spring 0.1267s 0.1280s
victor 0.0524s 0.0531s
wdas_cloud 0.0817s 0.0816s
Ref D15331, T87836
The Metal backend now compiles and caches a second set of kernels which are
optimized for scene contents, enabled for Apple Silicon.
The implementation supports doing this both for intersection and shading
kernels. However this is currently only enabled for intersection kernels that
are quick to compile, and already give a good speedup. Enabling this for
shading kernels would be faster still, however this also causes a long wait
times and would need a good user interface to control this.
M1 Max samples per minute (macOS 13.0):
PSO_GENERIC PSO_SPECIALIZED_INTERSECT PSO_SPECIALIZED_SHADE
barbershop_interior 83.4 89.5 93.7
bmw27 1486.1 1671.0 1825.8
classroom 175.2 196.8 206.3
fishy_cat 674.2 704.3 719.3
junkshop 205.4 212.0 257.7
koro 310.1 336.1 342.8
monster 376.7 418.6 424.1
pabellon 273.5 325.4 339.8
sponza 830.6 929.6 1142.4
victor 86.7 96.4 96.3
wdas_cloud 111.8 112.7 183.1
Code contributed by Jason Fielder, Morteza Mostajabodaveh and Michael Jones
Differential Revision: https://developer.blender.org/D14645
This patch partitions the active indices into chunks prior to sorting by material in order to tradeoff some material coherence for better locality. On Apple Silicon GPUs (particularly higher end M1-family GPUs), we observe overall render time speedups of up to 15%. The partitioning is implemented by repeating the range of `shader_sort_key` for each partition, and encoding a "locator" key which distributes the indices into sorted chunks.
Reviewed By: brecht
Differential Revision: https://developer.blender.org/D15331
This patch adds a new Cycles device with similar functionality to the
existing GPU devices. Kernel compilation and runtime interaction happen
via oneAPI DPC++ compiler and SYCL API.
This implementation is primarly focusing on Intel® Arc™ GPUs and other
future Intel GPUs. The first supported drivers are 101.1660 on Windows
and 22.10.22597 on Linux.
The necessary tools for compilation are:
- A SYCL compiler such as oneAPI DPC++ compiler or
https://github.com/intel/llvm
- Intel® oneAPI Level Zero which is used for low level device queries:
https://github.com/oneapi-src/level-zero
- To optionally generate prebuilt graphics binaries: Intel® Graphics
Compiler All are included in Linux precompiled libraries on svn:
https://svn.blender.org/svnroot/bf-blender/trunk/lib The same goes for
Windows precompiled binaries but for the graphics compiler, available
as "Intel® Graphics Offline Compiler for OpenCL™ Code" from
https://www.intel.com/content/www/us/en/developer/articles/tool/oneapi-standalone-components.html,
for which path can be set as OCLOC_INSTALL_DIR.
Being based on the open SYCL standard, this implementation could also be
extended to run on other compatible non-Intel hardware in the future.
Reviewed By: sergey, brecht
Differential Revision: https://developer.blender.org/D15254
Co-authored-by: Nikita Sirgienko <nikita.sirgienko@intel.com>
Co-authored-by: Stefan Werner <stefan.werner@intel.com>
Enables Vega and Vega II GPUs as well as Vega APU, using changes in HIP code
to support 64-bit waves and a new HIP SDK version.
Tested with Radeon WX9100, Radeon VII GPUs and Ryzen 7 PRO 5850U with Radeon
Graphics APU.
Ref T96740, T91571
Differential Revision: https://developer.blender.org/D15242
If there is an error we should stop rendering, instead of finishing with a
wrong render result or reporting a wrong benchmark time.
Ref T96519
Differential Revision: https://developer.blender.org/D15287
This patch suffixes Apple GPU device names with `(GPU - # cores)` so that variant GPUs with the same chipset can be distinguished. Currently benchmark scores for these M1 family GPUs are being incorrectly merged:
- M1: 7 or 8 cores
- M1 Pro: 14 or 16 cores
- M1 Max: 24 or 32 cores
- M1 Ultra: 48 or 64 cores
Reviewed By: brecht, sergey
Differential Revision: https://developer.blender.org/D15257
* Rename "texture" to "data array". This has not used textures for a long time,
there are just global memory arrays now. (On old CUDA GPUs there was a cache
for textures but not global memory, so we used to put all data in textures.)
* For CUDA and HIP, put globals in KernelParams struct like other devices.
* Drop __ prefix for data array names, no possibility for naming conflict now that
these are in a struct.
This patch adds a new mode of gpu capture (env var `CYCLES_DEBUG_METAL_CAPTURE_SAMPLES`) to capture a block of dispatches between "reset" calls. It also fixes member data naming inconsistencies and adds some missing OS version checks.
Screenshot showing .gputrace capture in Xcode 14.0 beta (using `CYCLES_DEBUG_METAL_CAPTURE_SAMPLES="1"` and `CYCLES_DEBUG_METAL_CAPTURE_LIMIT="10"`):
{F13155703}
Reviewed By: sergey, brecht
Differential Revision: https://developer.blender.org/D15179
Acceleration structures in the viewport default to building with the fast
build flag, but the intersection program used for curves was queried with
the fast trace flag. The resulting mismatch caused an exception in the
intersection kernel. Since it's difficult to predict whether dynamic or static
acceleration structures are going to be built at the time of kernel loading,
this fixes the mismatch by always using the fast trace flag for curves.
Move MNEE to own kernel, separate from shader ray-tracing. This does introduce
the limitation that a shader can't use both MNEE and AO/bevel, but that seems
like the better trade-off for now.
We can experiment with bigger kernel organization changes later.
Differential Revision: https://developer.blender.org/D15070
This patch makes it possible to change the precision with which to
store volume data in the NanoVDB data structure (as float, half, or
using variable bit quantization) via the previously unused precision
field in the volume data block.
It makes it possible to further reduce memory usage during
rendering, at a slight cost to the visual detail of a volume.
Differential Revision: https://developer.blender.org/D10023
OptiX 7.4 adds support for splitting the costly creation of an OptiX
module into smaller tasks that can be executed in parallel on a
thread pool.
This is only really relevant for the "shader_raytrace" kernel variant
as the main one is small and compiles fast either way. It sheds of
a few seconds there (total gain is not massive currently, since it is
difficult for the compiler to split up the huge shading entry point
that is the primary one taking up time, but it is still measurable).
Differential Revision: https://developer.blender.org/D14845
This is a stripped down version of D14645 without the scene specialisation optimisations.
The two major changes in this patch are:
- Enables more aggressive inlining on Apple Silicon resulting in a 1.1x speedup and 10% reduction in spill, at the cost of longer pipeline build times
- Revival of shader binary archives through a new ShaderCache which is shared between MetalDevice instances using the same physical MTLDevice. This mitigates the extra compile times via explicit caching (rather than, as before, relying on the implicit system shader cache which can be purged without notice)
Reviewed By: brecht
Differential Revision: https://developer.blender.org/D14763
Noticed while looking into oneAPI patch.
Seems to be unused, without clear indication why/when it might be
needed. Removing the function simplifies adding the new backend.
Differential Revision: https://developer.blender.org/D14652
* Add missing GLEW and hgiGL libraries for Hydra
* Fix wrong case sensitive include
* Fix link errors by adding external libs to static Hydra lib
* Work around weird Hydra link error with MAX_SAMPLES
* Use Embree by default for Hydra
* Sync external libs code with standalone
* Update version number to match Blender
* Remove unneeded CLEW/GLEW from test executable
None of this should affect Cycles in Blender.
Ref T96731
Basic testing on windows only so far. Will need some testing on Linux as well
when the Linux enablement patch is ready.
Does not enable Vega APUs yet (which would be gfx902 or gfx90c).
Differential Revision: https://developer.blender.org/D14432
This patch adds a Hydra render delegate to Cycles, allowing Cycles to be used for rendering
in applications that provide a Hydra viewport. The implementation was written from scratch
against Cycles X, for integration into the Blender repository to make it possible to continue
developing it in step with the rest of Cycles. For this purpose it follows the style of the rest of
the Cycles code and can be built with a CMake option
(`WITH_CYCLES_HYDRA_RENDER_DELEGATE=1`) similar to the existing standalone version
of Cycles.
Since Hydra render delegates need to be built against the exact USD version and other
dependencies as the target application is using, this is intended to be built separate from
Blender (`WITH_BLENDER=0` CMake option) and with support for library versions different
from what Blender is using. As such the CMake build scripts for Windows had to be modified
slightly, so that the Cycles Hydra render delegate can e.g. be built with MSVC 2017 again
even though Blender requires MSVC 2019 now, and it's possible to specify custom paths to
the USD SDK etc. The codebase supports building against the latest USD release 22.03 and all
the way back to USD 20.08 (with some limitations).
Reviewed By: brecht, LazyDodo
Differential Revision: https://developer.blender.org/D14398
Caused by an integer overflow in the tiling utilities of OptiX SDK.
Seems for now it's easier to copy and modify code to our sources so
that we don't need to bump SDK version requirement (which might lead
to an increased driver requirement as well).
There are still some fixes needed from a newer driver to have such
denoising to work properly: Windows requires 511.79, Linux 510.54.
Thanks Patrick for investigation!
Differential Revision: https://developer.blender.org/D14300
This patch hides the MetalRT checkbox for AMD GPUs, pending fixes for MetalRT argument encoding on AMD.
Reviewed By: brecht
Differential Revision: https://developer.blender.org/D14175
GLUT does not support offscreen contexts, which is required for the new
display driver. So we use SDL instead. Note that this requires using a
system SDL package, the Blender precompiled SDL does not include the video
subsystem.
There is currently no text display support, instead info is printed to
the terminal. This would require adding an embedded font and GLSL shaders,
or using GUI library.
Another improvement to be made is supporting OpenColorIO display transforms,
right now we assume Rec.709 scene linear and display.
All OpenGL, GLEW and SDL code was move out of core cycles and into
app/opengl. This serves as a template for apps that want to integrate
Cycles interactive rendering, with a simple OpenGLDisplayDriver example.
In general this would be adapted to the graphics API and color management
used by the app.
Ref T91846