griefith/test

Author	SHA1	Message	Date
Nikita Sirgienko	f07b09da27	Cycles: Improve oneAPI backend support for non-Intel platforms	2022-11-25 17:46:59 +01:00
Nikita Sirgienko	412642865d	Cleanup: Resolve a warning for the ambiguity on the parenthesis in oneAPI code No functional changes.	2022-11-24 18:05:02 +01:00
Patrick Mours	a859837cde	Cleanup: Move OptiX denoiser code from device into denoiser class Cycles already treats denoising fairly separate in its code, with a dedicated `Denoiser` base class used to describe denoising behavior. That class has been fully implemented for OIDN (`denoiser_oidn.cpp`), but for OptiX was mostly empty (`denoiser_optix.cpp`) and denoising was instead implemented in the OptiX device. That meant denoising code was split over various files and directories, making it a bit awkward to work with. This patch moves the OptiX denoising implementation into the existing `OptiXDenoiser` class, so that everything is in one place. There are no functional changes, code has been mostly moved as-is. To retain support for potential other denoiser implementations based on a GPU device in the future, the `DeviceDenoiser` base class was kept and slightly extended (and its file renamed to `denoiser_gpu.cpp` to follow similar naming rules as `path_trace_work_*.cpp`). Differential Revision: https://developer.blender.org/D16502	2022-11-15 15:50:01 +01:00
Michael Jones	b0e2e45496	Cycles: Enable MetalRT pointclouds & other fixes Code authored by Marco Giordano. This fixes pointcloud rendering on MetalRT and some other subtle MetalRT bugs: - Incorrect kernel hashing - Missing specialisation constants - Incorrect visibility filtering - Missing null pointer check Reviewed By: brecht Differential Revision: https://developer.blender.org/D16499	2022-11-14 16:39:18 +00:00
Michael Jones	2c596319a4	Cycles: Cache only up to 5 kernels of each type on Metal This patch adapts D14754 for the Metal backend. Kernels of the same type are already organised into subdirectories which simplifies type matching. Reviewed By: brecht Differential Revision: https://developer.blender.org/D16469	2022-11-11 18:10:29 +00:00
Patrick Mours	e6b38deb9d	Cycles: Add basic support for using OSL with OptiX This patch generalizes the OSL support in Cycles to include GPU device types and adds an implementation for that in the OptiX device. There are some caveats still, including simplified texturing due to lack of OIIO on the GPU and a few missing OSL intrinsics. Note that this is incomplete and missing an update to the OSL library before being enabled! The implementation is already committed now to simplify further development. Maniphest Tasks: T101222 Differential Revision: https://developer.blender.org/D15902	2022-11-09 15:30:21 +01:00
Chris Blackbourn	4b57bc4e5d	Cleanup: format	2022-11-09 08:30:18 +13:00
Brecht Van Lommel	b539d425f0	Merge branch 'blender-v3.4-release'	2022-11-08 19:47:55 +01:00
Gon Solo	c306ccb67f	Fix Cycles error with runtime compilation when there is no path to OptiX SDK If no OPTIX_ROOT is set, nvcc fails to compile because there is a stray "-I" in the arguments. Detect if the include path is empty and act accordingly. Differential Revision: https://developer.blender.org/D16308	2022-11-08 19:40:57 +01:00
Michael Jones	74140d41b1	Cycles: Apple GPU threadgroup tuning This patch tunes maximum threads-per-threadgroup and threads-per-block for faster renders on Apple GPUs. Appropriate tuning is selected based on the GPU architecture (M1 or M2). We see a benchmark uplift of around 5-10% on M1 family chips. Similar uplift is expected on M2 with upcoming OS changes. (Ref T101931) Reviewed By: brecht Maniphest Tasks: T101931 Differential Revision: https://developer.blender.org/D16299	2022-11-07 10:00:46 +00:00
Campbell Barton	6377d00a61	Cleanup: cmake comment line length	2022-11-03 12:11:08 +11:00
Xavier Hallade	454dd3f7f0	Cycles: fix up logic in oneAPI devices filtering CYCLES_ONEAPI_ALL_DEVICES environment variable wasn't working as intended after `305b92e05f`.	2022-10-27 23:09:14 +02:00
Michael Jones	8dd7b5b26b	Cycles: Metal integrator state size tuning This patch tunes the integrator state sizing for Metal (`num_concurrent_states` and `num_concurrent_busy_states`). On all GPU architectures, we adjust the busy:total state ratio to 1:4, which gives better rendering performance than the previous 1:16 ratio (independent of total state count). This gives a small performance uplift (e.g. 2-3% on M1 Ultra). Additionally for M2 architectures, we double the overall state size if there is available headroom. Inclusive of the first change, we can expect uplift of close to 10% in future, as this results in larger dispatch sizes and minimises work submission overheads. In order to make an accurate determination of available headroom, we defer the calculation of `num_concurrent_states` and `num_concurrent_busy_states` until the time of integrator state allocation (i.e. after all of the scene data has been allocated). We also refactor `alloc_integrator_soa` to calculate an exact single-state-size in a first pass, right before allocating the integrator SoA buffers in a second pass. Reviewed By: brecht Differential Revision: https://developer.blender.org/D16313	2022-10-24 17:14:33 +01:00
Sergey Sharybin	2c108d5503	Avoid re-compilation of oneAPI AoT kernels when configuration changes Buildbot infrastructure relies on the fact that it can enable and disable `WITH_CYCLES_<COMPUTE>_BINARIES` without affecting speed of incremental builds. This allows buildbot to skip GPU kernels when doing CI regression tests which do not need GPU kernels, and it also allows moving GPU kernel compilation to a separate step where all the resources are available to the GPU kernel builders. For oneAPI compute, enabling and disabling AoT kernels has much greater implications because the kernels are part of the device implementation from the build target perspective. This change makes it so different target names are used for JIT and AoT configurations, which allows CMake to more fully benefit from "caching" the compiled result. The end goal of this change is to make sequential builds of the same code base on the buildbot very fast. The Blender binary still needs to be re-linked when the oneAPI AoT option is toggled, but that's already the case in the buildbot due to WITH_BUILDINFO. Differential Revision: https://developer.blender.org/D16312	2022-10-21 17:17:51 +02:00
Xavier Hallade	0cfac5b043	Cycles: oneAPI: migrate from deprecated APIs, require libSYCL 6.0+ sycl::info::device::ext_intel_* descriptors are deprecated, replaced with sycl::ext::intel::info::device:: that are available from 6.0+, for which we now check version in CMake.	2022-10-21 15:36:49 +02:00
Xavier Hallade	305b92e05f	Cycles: oneAPI: remove use of SYCL host device Host device is deprecated in SYCL 2020 spec, cpu device or standard C++ should be used instead.	2022-10-21 15:36:48 +02:00
Sergey Sharybin	4bfb99e4d8	Cycles: Bump versions of DPC++, IGC, and dependencies Patch by Xavier Hallade. Committing next to the actual libraries update in the svn.	2022-10-21 12:19:54 +02:00
Campbell Barton	f24dfdcd3c	Cleanup: format	2022-10-20 13:57:43 +11:00
Brecht Van Lommel	193b456d2d	Fix macOS build error after recent changes to enable Intel GPUs This will only work once we upgrade to the macOS 13 SDK. Ref D16253	2022-10-19 20:34:53 +02:00
Morteza Mostajab	e6902d19a0	Cycles: Allow Intel GPUs under Metal Known Issues: - Command buffer failures when using binary archives (binary archives is disabled for Intel GPUs as a workaround) - Wrong texture sampler being applied (to be addressed in the future) Ref T92212 Reviewed By: brecht Maniphest Tasks: T92212 Differential Revision: https://developer.blender.org/D16253	2022-10-19 17:09:38 +01:00
Xavier Hallade	2943997d2a	Cycles: oneAPI: include sycl/sycl.hpp instead of CL/sycl.hpp Since SYCL 2020 API, sycl/sycl.hpp is the way.	2022-10-19 16:42:10 +02:00
Xavier Hallade	d816bae7bf	Cycles: oneAPI: fix check_usm for debug builds	2022-10-19 16:42:10 +02:00
Werner, Stefan	c32a455605	Cleanup: Fixed some warnings Some unused parameters were left after changing the oneAPI device code to be a directly linked shared library.	2022-10-13 09:45:53 +02:00
Michael Jones	ba67a383fa	Cycles: Enable MNEE on Metal (macOS >= 13) This patch enables MNEE on macOS >= 13. There was an inefficiency in the calculation of spill requirements, fixed as of macOS 13. This patch also adds a temporary inlining workaround for a Metal compiler bug which causes `mnee_compute_constraint_derivatives` to behave incorrectly. Reviewed By: brecht Differential Revision: https://developer.blender.org/D16235	2022-10-12 17:06:50 +01:00
Nikita Sirgienko	82a5790d2a	Cycles: oneAPI: Trigger compilation of used kernels only JIT compilation of oneAPI kernels now happens during load stage and proper message gets shown in the GUI during compilation. Also, this implementation skips kernels that aren't needed for the used scene, reducing overall (re)compilation time.	2022-10-10 16:38:11 +02:00
Xavier Hallade	3714d3c3ce	Cycles: link oneAPI backend with debug version of sycl when in Debug It fixes SYCL runtime issues in Debug builds that were due to mixing Release and Debug MSVC runtimes. This commit also removes specific handling of dpcpp compiler executable to simplify the CMake implementation. Using it like clang++ works and clang++ executable is also available from Intel oneAPI DPC++ compiler in case it doesn't.	2022-10-07 16:14:50 +02:00
Xavier Hallade	7eeeaec6da	Cycles: use direct linking for oneAPI backend This is a minimal set of changes, allowing a lot of cleanup that can happen afterward as it allows sycl method and objects to be used outside of kernel.cpp. Reviewed By: brecht, sergey Differential Revision: https://developer.blender.org/D15397	2022-10-07 09:50:05 +02:00
Campbell Barton	6d1d1bf2b1	Cleanup: spelling in comments Also add missing task ID.	2022-09-28 09:41:31 +10:00
Campbell Barton	72a7f107d8	Cleanup: format	2022-09-28 09:41:28 +10:00
Nikita Sirgienko	2ead05d738	Cycles: Add optional per-kernel performance statistics When verbose level 4 is enabled, Blender prints kernel performance data for Cycles on GPU backends (except Metal, which doesn't use debug_enqueue_* methods) for groups of kernels. These changes introduce a new CYCLES_DEBUG_PER_KERNEL_PERFORMANCE environment variable to allow getting timings for each kernel separately and not grouped with others. This is done by adding explicit synchronization after each kernel execution. Differential Revision: https://developer.blender.org/D15971	2022-09-27 22:15:00 +02:00
Michael Jones	fc604a0be3	Cycles: Disable binary archives on macOS < 13.0 A bug with binary archives was fixed in macOS 13.0 which stops some spurious kernel recompilations. In older macOS versions, falling back on the system shader cache will prevent recompilations in most instances (this is the same behaviour as in Blender 3.1.x and 3.2.x). Reviewed By: brecht Differential Revision: https://developer.blender.org/D16082	2022-09-27 16:58:21 +01:00
Sebastian Herhoz	75a6d3abf7	Cycles: add Path Guiding on CPU through Intel OpenPGL This adds path guiding features into Cycles by integrating Intel's Open Path Guiding Library. It can be enabled in the Sampling > Path Guiding panel in the render properties. This feature helps reduce noise in scenes where finding a path to light is difficult for regular path tracing. The current implementation supports guiding directional sampling decisions on surfaces, when the material contains at least one diffuse component, and in volumes with isotropic and anisotropic Henyey-Greenstein phase functions. On surfaces, the guided sampling decision is proportional to the product of the incident radiance and the normal-oriented cosine lobe and in volumes it is proportional to the product of the incident radiance and the phase function. The incident radiance field of a scene is learned and updated during rendering after each per-frame rendering iteration/progression. At the moment, path guiding is only supported by the CPU backend. Support for GPU backends will be added in future versions of OpenPGL. Ref T92571 Differential Revision: https://developer.blender.org/D15286	2022-09-27 15:56:32 +02:00
Sergey Sharybin	3c2c296130	Fix compilation error on Windows after recent change	2022-09-13 11:52:11 +02:00
Patrick Mours	a45c36efae	Cycles: Make OSL implementation independent from SVM Cleans up the file structure to be more similar to that of the SVM and also makes it possible to build kernels with OSL support, but without having to include SVM support. This patch was split from D15902. Differential Revision: https://developer.blender.org/D15949	2022-09-13 10:59:28 +02:00
Sergey Sharybin	602cca671e	Cycles: Include reason the oneAPI library could not be loaded Additionally, just stick to a pure error stating. Such messages are aimed for developers and it is rather implied that oneAPI rendering will be disabled.	2022-09-13 10:52:18 +02:00
Josh Whelchel	74477149dd	Fix T100845: wrong Cycles OptiX runtime compilation include path Causing OptiX kernel build errors on Arch Linux. Differential Revision: https://developer.blender.org/D15891	2022-09-06 16:11:12 +02:00
Nikita Sirgienko	e1fbb4ce89	Merge branch 'blender-v3.3-release'	2022-09-06 15:39:12 +02:00
Nikita Sirgienko	8b11ed392c	Cycles: Fix crashes in oneAPI backend for scenes not fitting in dGPU memory Differential Revision: https://developer.blender.org/D15889	2022-09-06 15:38:15 +02:00
Campbell Barton	6c6a53fad3	Cleanup: spelling in comments, formatting, move comments into headers	2022-09-06 16:25:20 +10:00
Brecht Van Lommel	74caf77361	Cycles: add option to specify OptiX runtime root directory This allows individual users or Linux distributions to specify a directory Cycles will automatically look for the OptiX include folder, to compile kernels at runtime. It is still possible to override this with the OPTIX_ROOT_DIR environment variable at runtime. Based on patch by Sebastian Parborg. Ref D15792	2022-08-29 19:50:20 +02:00
Sebastian Parborg	8ffc11dbcb	Cleanup OpenGL linking and related code after libepoxy merge This cleans up the OpenGL build flags and linking. It additionally also removes some dead code. One of these dead code paths is WITH_X11_ALPHA which actually never was active even with the build flag on. The call to use this was never called because the default initializer for GHOST was set to have it off per default. Nothing called this function with a boolean value to enable it. These cleanups are needed to support true headless OpenGL rendering. Without these cleanups libepoxy will fail to load the correct OpenGL Libraries as we have already linked them to the blender binary. Reviewed By: Brecht, Campbell, Jeroen Differential Revision: http://developer.blender.org/D15554	2022-08-15 16:47:20 +02:00
Christian Rauch	a296b8f694	GPU: replace GLEW with libepoxy With libepoxy we can choose between EGL and GLX at runtime, as well as dynamically open EGL and GLX libraries without linking to them. This will make it possible to build with Wayland, EGL, GLVND support while still running on systems that only have X11, GLX and libGL. It also paves the way for headless rendering through EGL. libepoxy is a new library dependency, and is included in the precompiled libraries. GLEW is no longer a dependency, and WITH_SYSTEM_GLEW was removed. Includes contributions by Brecht Van Lommel, Ray Molenkamp, Campbell Barton and Sergey Sharybin. Ref T76428 Differential Revision: https://developer.blender.org/D15291	2022-08-15 16:10:29 +02:00
Patrick Mours	79787bf8e1	Cycles: Improve denoiser update performance when rendering with multiple GPUs This patch causes the render buffers to be copied to the denoiser device only once before denoising and output/display is then fed from that single buffer on the denoiser device. That way usually all but one copy (from all the render devices to the denoiser device) can be eliminated, provided that the denoiser device is also the display device (in which case interop is used to update the display). As such this patch also adds some logic that tries to ensure the chosen denoiser device is the same as the display device. Differential Revision: https://developer.blender.org/D15657	2022-08-12 16:00:54 +02:00
Xavier Hallade	d706d0460c	Cycles oneAPI: simplify num_concurrent_states selection The number of Execution Units and resident "threads" (simd width * threads per EUs) are now exposed and used to select the number of states using a simplified heuristic.	2022-07-27 09:45:33 +02:00
Brecht Van Lommel	f26aa186b2	Cleanup: remove __KERNEL_CPU__ This was tested in some places to check if code was being compiled for the CPU, however this is only defined in the kernel. Checking __KERNEL_GPU__ always works.	2022-07-25 17:43:35 +02:00
Brecht Van Lommel	011d3c75a7	Cleanup: compiler warning	2022-07-15 15:20:53 +02:00
Brecht Van Lommel	523bbf7065	Cycles: generalize shader sorting / locality heuristic to all GPU devices This was added for Metal, but also gives good results with CUDA and OptiX. Also enable it for future Apple GPUs instead of only M1 and M2, since this has been shown to help across multiple GPUs so the better bet seems to enable rather than disable it. Also moves some of the logic outside of the Metal device code, and always enables the code in the kernel since other devices don't do dynamic compile. Time per sample with OptiX + RTX A6000 (new / old): barbershop_interior 0.0730s / 0.0727s; bmw27 0.0047s / 0.0053s; classroom 0.0428s / 0.0464s; fishy_cat 0.0102s / 0.0108s; junkshop 0.0366s / 0.0395s; koro 0.0567s / 0.0578s; monster 0.0206s / 0.0223s; pabellon 0.0158s / 0.0174s; sponza 0.0088s / 0.0100s; spring 0.1267s / 0.1280s; victor 0.0524s / 0.0531s; wdas_cloud 0.0817s / 0.0816s. Ref D15331, T87836	2022-07-15 13:42:47 +02:00
Michael Jones	da4ef05e4d	Cycles: Apple Silicon optimization to specialize intersection kernels The Metal backend now compiles and caches a second set of kernels which are optimized for scene contents, enabled for Apple Silicon. The implementation supports doing this both for intersection and shading kernels. However this is currently only enabled for intersection kernels that are quick to compile, and already give a good speedup. Enabling this for shading kernels would be faster still, however this also causes long wait times and would need a good user interface to control this. M1 Max samples per minute (macOS 13.0), listed as PSO_GENERIC / PSO_SPECIALIZED_INTERSECT / PSO_SPECIALIZED_SHADE: barbershop_interior 83.4 / 89.5 / 93.7; bmw27 1486.1 / 1671.0 / 1825.8; classroom 175.2 / 196.8 / 206.3; fishy_cat 674.2 / 704.3 / 719.3; junkshop 205.4 / 212.0 / 257.7; koro 310.1 / 336.1 / 342.8; monster 376.7 / 418.6 / 424.1; pabellon 273.5 / 325.4 / 339.8; sponza 830.6 / 929.6 / 1142.4; victor 86.7 / 96.4 / 96.3; wdas_cloud 111.8 / 112.7 / 183.1. Code contributed by Jason Fielder, Morteza Mostajabodaveh and Michael Jones Differential Revision: https://developer.blender.org/D14645	2022-07-15 13:40:04 +02:00
Brecht Van Lommel	79da7f2a8f	Cycles: refactor to move part of KernelData definition to template header To be used for specialization on Metal in a following commit, turning these members into compile time constants. Ref D14645	2022-07-15 13:40:04 +02:00
Michael Jones	4b1d315017	Cycles: Improve cache usage on Apple GPUs by chunking active indices This patch partitions the active indices into chunks prior to sorting by material in order to trade off some material coherence for better locality. On Apple Silicon GPUs (particularly higher end M1-family GPUs), we observe overall render time speedups of up to 15%. The partitioning is implemented by repeating the range of `shader_sort_key` for each partition, and encoding a "locator" key which distributes the indices into sorted chunks. Reviewed By: brecht Differential Revision: https://developer.blender.org/D15331	2022-07-14 14:26:18 +01:00
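
The chunked sorting described in 4b1d315017 can be illustrated with a small sketch. This is not the Cycles kernel code: it is written in Python for brevity, and the function names, `num_shaders`, and `partition_size` are hypothetical. It assumes each partition gets its own copy of the `shader_sort_key` range, so that sorting by the combined key groups states by shader only within their own chunk.

```python
def chunked_sort_key(state_index, shader_sort_key, num_shaders, partition_size):
    """Combine a partition number with the shader key (hypothetical helper).

    Each partition owns a separate copy of the key range [0, num_shaders),
    so keys from different partitions never interleave after sorting.
    """
    partition = state_index // partition_size
    return partition * num_shaders + shader_sort_key


def sort_active_indices(active_indices, shader_keys, num_shaders, partition_size):
    """Sort state indices by shader within each chunk of partition_size states."""
    return sorted(
        active_indices,
        key=lambda i: chunked_sort_key(i, shader_keys[i], num_shaders, partition_size),
    )


# Eight states, three shaders, chunks of four states each.
shader_keys = [2, 0, 1, 0, 1, 2, 0, 1]
order = sort_active_indices(range(8), shader_keys, num_shaders=3, partition_size=4)
print(order)  # states 0-3 stay ahead of states 4-7; each chunk is sorted by shader
```

With a single partition this reduces to a plain sort by shader. Smaller partitions give up some material coherence in exchange for keeping neighbouring states close together in memory.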

1082 Commits
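
The fix in c306ccb67f (a stray "-I" when no OptiX SDK path is set) follows a common pattern: skip empty include paths when assembling compiler arguments. The sketch below is only an illustration of that pattern in Python. The helper name and the example paths are hypothetical and do not come from the Cycles source.

```python
def build_include_args(include_dirs):
    """Return ["-I", path, ...] for every non-empty include directory.

    An empty path would otherwise leave a bare "-I" that swallows the
    next argument on the compiler command line.
    """
    args = []
    for path in include_dirs:
        if not path:
            # No SDK path configured: emit nothing instead of a stray "-I".
            continue
        args += ["-I", path]
    return args


# Hypothetical paths: the first entry stands in for an unset SDK root.
print(build_include_args(["", "/opt/cycles/kernel"]))
```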