test2

Author	SHA1	Message	Date
Nikita Sirgienko	82a5790d2a	Cycles: oneAPI: Trigger compilation of used kernels only JIT compilation of oneAPI kernels now happens during load stage and proper message gets shown in the GUI during compilation. Also, this implementation skips kernels that aren't needed for the used scene, reducing overall (re)compilation time.	2022-10-10 16:38:11 +02:00
Xavier Hallade	3714d3c3ce	Cycles: link oneAPI backend with debug version of sycl when in Debug It fixes SYCL runtime issues in Debug builds that were due to mixing Release and Debug MSVC runtimes. This commit also removes specific handling of dpcpp compiler executable to simplify the CMake implementation. Using it like clang++ works and clang++ executable is also available from Intel oneAPI DPC++ compiler in case it doesn't.	2022-10-07 16:14:50 +02:00
Xavier Hallade	7eeeaec6da	Cycles: use direct linking for oneAPI backend This is a minimal set of changes, allowing a lot of cleanup that can happen afterward as it allows sycl method and objects to be used outside of kernel.cpp. Reviewed By: brecht, sergey Differential Revision: https://developer.blender.org/D15397	2022-10-07 09:50:05 +02:00
Campbell Barton	6d1d1bf2b1	Cleanup: spelling in comments Also add missing task ID.	2022-09-28 09:41:31 +10:00
Campbell Barton	72a7f107d8	Cleanup: format	2022-09-28 09:41:28 +10:00
Nikita Sirgienko	2ead05d738	Cycles: Add optional per-kernel performance statistics When verbose level 4 is enabled, Blender prints kernel performance data for Cycles on GPU backends (except Metal that doesn't use debug_enqueue_* methods) for groups of kernels. These changes introduce a new CYCLES_DEBUG_PER_KERNEL_PERFORMANCE environment variable to allow getting timings for each kernels separately and not grouped with others. This is done by adding explicit synchronization after each kernel execution. Differential Revision: https://developer.blender.org/D15971	2022-09-27 22:15:00 +02:00
Michael Jones	fc604a0be3	Cycles: Disable binary archives on macOS < 13.0 An bug with binary archives was fixed in macOS 13.0 which stops some spurious kernel recompilations. In older macOS versions, falling back on the system shader cache will prevent recompilations in most instances (this is the same behaviour as in Blender 3.1.x and 3.2.x). Reviewed By: brecht Differential Revision: https://developer.blender.org/D16082	2022-09-27 16:58:21 +01:00
Sebastian Herhoz	75a6d3abf7	Cycles: add Path Guiding on CPU through Intel OpenPGL This adds path guiding features into Cycles by integrating Intel's Open Path Guiding Library. It can be enabled in the Sampling > Path Guiding panel in the render properties. This feature helps reduce noise in scenes where finding a path to light is difficult for regular path tracing. The current implementation supports guiding directional sampling decisions on surfaces, when the material contains a least one diffuse component, and in volumes with isotropic and anisotropic Henyey-Greenstein phase functions. On surfaces, the guided sampling decision is proportional to the product of the incident radiance and the normal-oriented cosine lobe and in volumes it is proportional to the product of the incident radiance and the phase function. The incident radiance field of a scene is learned and updated during rendering after each per-frame rendering iteration/progression. At the moment, path guiding is only supported by the CPU backend. Support for GPU backends will be added in future versions of OpenPGL. Ref T92571 Differential Revision: https://developer.blender.org/D15286	2022-09-27 15:56:32 +02:00
Sergey Sharybin	3c2c296130	Fix compilation error on Windows after recent change	2022-09-13 11:52:11 +02:00
Patrick Mours	a45c36efae	Cycles: Make OSL implementation independent from SVM Cleans up the file structure to be more similar to that of the SVM and also makes it possible to build kernels with OSL support, but without having to include SVM support. This patch was split from D15902. Differential Revision: https://developer.blender.org/D15949	2022-09-13 10:59:28 +02:00
Sergey Sharybin	602cca671e	Cycles: Include reason the oneAPI library could not be loaded Additionally, just stick to a pure error stating. Such messages are aimed for developers and it is rather implied that oneAPI rendering will be disabled.	2022-09-13 10:52:18 +02:00
Josh Whelchel	74477149dd	Fix T100845: wrong Cycles OptiX runtime compilation include path Causing OptiX kernel build errors on Arch Linux. Differential Revision: https://developer.blender.org/D15891	2022-09-06 16:11:12 +02:00
Nikita Sirgienko	e1fbb4ce89	Merge branch 'blender-v3.3-release'	2022-09-06 15:39:12 +02:00
Nikita Sirgienko	8b11ed392c	Cycles: Fix crashes in oneAPI backend for scenes not fitting in dGPU memory Differential Revision: https://developer.blender.org/D15889	2022-09-06 15:38:15 +02:00
Campbell Barton	6c6a53fad3	Cleanup: spelling in comments, formatting, move comments into headers	2022-09-06 16:25:20 +10:00
Brecht Van Lommel	74caf77361	Cycles: add option to specify OptiX runtime root directory This allows individual users or Linux distributions to specify a directory Cycles will automatically look for the OptiX include folder, to compile kernels at runtime. It is still possible to override this with the OPTIX_ROOT_DIR environment variable at runtime. Based on patch by Sebastian Parborg. Ref D15792	2022-08-29 19:50:20 +02:00
Sebastian Parborg	8ffc11dbcb	Cleanup OpenGL linking and related code after libepoxy merge This cleans up the OpenGL build flags and linking. It additionally also removes some dead code. One of these dead code paths is WITH_X11_ALPHA which actually never was active even with the build flag on. The call to use this was never called because the default initializer for GHOST was set to have it off per default. Nothing called this function with a boolean value to enable it. These cleanups are needed to support true headless OpenGL rendering. Without these cleanups libepoxy will fail to load the correct OpenGL Libraries as we have already linked them to the blender binary. Reviewed By: Brecht, Campbell, Jeroen Differential Revision: http://developer.blender.org/D15554	2022-08-15 16:47:20 +02:00
Christian Rauch	a296b8f694	GPU: replace GLEW with libepoxy With libepoxy we can choose between EGL and GLX at runtime, as well as dynamically open EGL and GLX libraries without linking to them. This will make it possible to build with Wayland, EGL, GLVND support while still running on systems that only have X11, GLX and libGL. It also paves the way for headless rendering through EGL. libepoxy is a new library dependency, and is included in the precompiled libraries. GLEW is no longer a dependency, and WITH_SYSTEM_GLEW was removed. Includes contributions by Brecht Van Lommel, Ray Molenkamp, Campbell Barton and Sergey Sharybin. Ref T76428 Differential Revision: https://developer.blender.org/D15291	2022-08-15 16:10:29 +02:00
Patrick Mours	79787bf8e1	Cycles: Improve denoiser update performance when rendering with multiple GPUs This patch causes the render buffers to be copied to the denoiser device only once before denoising and output/display is then fed from that single buffer on the denoiser device. That way usually all but one copy (from all the render devices to the denoiser device) can be eliminated, provided that the denoiser device is also the display device (in which case interop is used to update the display). As such this patch also adds some logic that tries to ensure the chosen denoiser device is the same as the display device. Differential Revision: https://developer.blender.org/D15657	2022-08-12 16:00:54 +02:00
Xavier Hallade	d706d0460c	Cycles oneAPI: simplify num_concurrent_states selection The number of Execution Units and resident "threads" (simd width * threads per EUs) are now exposed and used to select the number of states using a simplified heuristic.	2022-07-27 09:45:33 +02:00
Brecht Van Lommel	f26aa186b2	Cleanup: remove __KERNEL_CPU__ This was tested in some places to check if code was being compiled for the CPU, however this is only defined in the kernel. Checking __KERNEL_GPU__ always works.	2022-07-25 17:43:35 +02:00
Brecht Van Lommel	011d3c75a7	Cleanup: compiler warning	2022-07-15 15:20:53 +02:00
Brecht Van Lommel	523bbf7065	Cycles: generalize shader sorting / locality heuristic to all GPU devices This was added for Metal, but also gives good results with CUDA and OptiX. Also enable it for future Apple GPUs instead of only M1 and M2, since this has been shown to help across multiple GPUs so the better bet seems to enable rather than disable it. Also moves some of the logic outside of the Metal device code, and always enables the code in the kernel since other devices don't do dynamic compile. Time per sample with OptiX + RTX A6000: new old barbershop_interior 0.0730s 0.0727s bmw27 0.0047s 0.0053s classroom 0.0428s 0.0464s fishy_cat 0.0102s 0.0108s junkshop 0.0366s 0.0395s koro 0.0567s 0.0578s monster 0.0206s 0.0223s pabellon 0.0158s 0.0174s sponza 0.0088s 0.0100s spring 0.1267s 0.1280s victor 0.0524s 0.0531s wdas_cloud 0.0817s 0.0816s Ref D15331, T87836	2022-07-15 13:42:47 +02:00
Michael Jones	da4ef05e4d	Cycles: Apple Silicon optimization to specialize intersection kernels The Metal backend now compiles and caches a second set of kernels which are optimized for scene contents, enabled for Apple Silicon. The implementation supports doing this both for intersection and shading kernels. However this is currently only enabled for intersection kernels that are quick to compile, and already give a good speedup. Enabling this for shading kernels would be faster still, however this also causes a long wait times and would need a good user interface to control this. M1 Max samples per minute (macOS 13.0): PSO_GENERIC PSO_SPECIALIZED_INTERSECT PSO_SPECIALIZED_SHADE barbershop_interior 83.4 89.5 93.7 bmw27 1486.1 1671.0 1825.8 classroom 175.2 196.8 206.3 fishy_cat 674.2 704.3 719.3 junkshop 205.4 212.0 257.7 koro 310.1 336.1 342.8 monster 376.7 418.6 424.1 pabellon 273.5 325.4 339.8 sponza 830.6 929.6 1142.4 victor 86.7 96.4 96.3 wdas_cloud 111.8 112.7 183.1 Code contributed by Jason Fielder, Morteza Mostajabodaveh and Michael Jones Differential Revision: https://developer.blender.org/D14645	2022-07-15 13:40:04 +02:00
Brecht Van Lommel	79da7f2a8f	Cycles: refactor to move part of KernelData definition to template header To be used for specialization on Metal in a following commit, turning these members into compile time constants. Ref D14645	2022-07-15 13:40:04 +02:00
Michael Jones	4b1d315017	Cycles: Improve cache usage on Apple GPUs by chunking active indices This patch partitions the active indices into chunks prior to sorting by material in order to tradeoff some material coherence for better locality. On Apple Silicon GPUs (particularly higher end M1-family GPUs), we observe overall render time speedups of up to 15%. The partitioning is implemented by repeating the range of `shader_sort_key` for each partition, and encoding a "locator" key which distributes the indices into sorted chunks. Reviewed By: brecht Differential Revision: https://developer.blender.org/D15331	2022-07-14 14:26:18 +01:00
Xavier Hallade	41c10ac84a	Cycles: fix support for multiple Intel GPUs Identical Intel GPUs ended up with the same id. Added PCI BDF to the id to make it unique.	2022-07-01 11:20:00 +02:00
Campbell Barton	b6c28002ac	Cleanup: spelling in comments	2022-06-30 12:14:22 +10:00
Xavier Hallade	a02992f131	Cycles: Add support for rendering on Intel GPUs using oneAPI This patch adds a new Cycles device with similar functionality to the existing GPU devices. Kernel compilation and runtime interaction happen via oneAPI DPC++ compiler and SYCL API. This implementation is primarly focusing on Intel® Arc™ GPUs and other future Intel GPUs. The first supported drivers are 101.1660 on Windows and 22.10.22597 on Linux. The necessary tools for compilation are: - A SYCL compiler such as oneAPI DPC++ compiler or https://github.com/intel/llvm - Intel® oneAPI Level Zero which is used for low level device queries: https://github.com/oneapi-src/level-zero - To optionally generate prebuilt graphics binaries: Intel® Graphics Compiler All are included in Linux precompiled libraries on svn: https://svn.blender.org/svnroot/bf-blender/trunk/lib The same goes for Windows precompiled binaries but for the graphics compiler, available as "Intel® Graphics Offline Compiler for OpenCL™ Code" from https://www.intel.com/content/www/us/en/developer/articles/tool/oneapi-standalone-components.html, for which path can be set as OCLOC_INSTALL_DIR. Being based on the open SYCL standard, this implementation could also be extended to run on other compatible non-Intel hardware in the future. Reviewed By: sergey, brecht Differential Revision: https://developer.blender.org/D15254 Co-authored-by: Nikita Sirgienko <nikita.sirgienko@intel.com> Co-authored-by: Stefan Werner <stefan.werner@intel.com>	2022-06-29 12:58:04 +02:00
Sayak Biswas	abfa09752f	Cycles: enable Vega GPU/APU support Enables Vega and Vega II GPUs as well as Vega APU, using changes in HIP code to support 64-bit waves and a new HIP SDK version. Tested with Radeon WX9100, Radeon VII GPUs and Ryzen 7 PRO 5850U with Radeon Graphics APU. Ref T96740, T91571 Differential Revision: https://developer.blender.org/D15242	2022-06-28 18:35:43 +02:00
Brecht Van Lommel	9b6e86ace1	Cycles: stop Metal rendering on command buffer error If there is an error we should stop rendering, instead of finishing with a wrong render result or reporting a wrong benchmark time. Ref T96519 Differential Revision: https://developer.blender.org/D15287	2022-06-24 16:51:56 +02:00
Brecht Van Lommel	a5ff46e0fc	Cleanup: make format	2022-06-23 19:28:39 +02:00
Michael Jones	d8e9647ae2	Cycles: Add diagnostic tracing of MTLLibrary compilation time Reviewed By: sergey Differential Revision: https://developer.blender.org/D15268	2022-06-23 10:06:20 +01:00
Michael Jones	532b33973b	Cycles: Tidy of KernelData patchup code Reviewed By: sergey Differential Revision: https://developer.blender.org/D15267	2022-06-22 22:38:00 +01:00
Michael Jones	328a911379	Cycles: Distinguish Apple GPUs by core count This patch suffixes Apple GPU device names with `(GPU - # cores)` so that variant GPUs with the same chipset can be distinguished. Currently benchmark scores for these M1 family GPUs are being incorrectly merged: - M1: 7 or 8 cores - M1 Pro: 14 or 16 cores - M1 Max: 24 or 32 cores - M1 Ultra: 48 or 64 cores Reviewed By: brecht, sergey Differential Revision: https://developer.blender.org/D15257	2022-06-22 22:32:56 +01:00
Brecht Van Lommel	ff1883307f	Cleanup: renaming and consistency for kernel data * Rename "texture" to "data array". This has not used textures for a long time, there are just global memory arrays now. (On old CUDA GPUs there was a cache for textures but not global memory, so we used to put all data in textures.) * For CUDA and HIP, put globals in KernelParams struct like other devices. * Drop __ prefix for data array names, no possibility for naming conflict now that these are in a struct.	2022-06-20 12:30:48 +02:00
Brecht Van Lommel	2c1bffa286	Cleanup: add verbose logging category names instead of numbers And use them more consistently than before.	2022-06-17 14:08:14 +02:00
Michael Jones	19e0b60f3e	Cycles: MetalDeviceQueue - capture of multiple dispatches, and some tidying This patch adds a new mode of gpu capture (env var `CYCLES_DEBUG_METAL_CAPTURE_SAMPLES`) to capture a block of dispatches between "reset" calls. It also fixes member data naming inconsistencies and adds some missing OS version checks. Screenshot showing .gputrace capture in Xcode 14.0 beta (using `CYCLES_DEBUG_METAL_CAPTURE_SAMPLES="1"` and `CYCLES_DEBUG_METAL_CAPTURE_LIMIT="10"`): {F13155703} Reviewed By: sergey, brecht Differential Revision: https://developer.blender.org/D15179	2022-06-13 13:42:07 +01:00
Sergey Sharybin	0fddff027e	Cleanup: Unused but set variable in Cycles Metal profiler	2022-06-09 10:20:26 +02:00
Michael Jones	4412e14708	Cycles: Useful Metal backend debug & profiling functionality This patch adds some useful debugging & profiling env vars to the Metal backend: - `CYCLES_METAL_PROFILING`: output a per-kernel timing report at the end of the render - `CYCLES_METAL_DEBUG`: enable per-dispatch tracing (very verbose) - `CYCLES_DEBUG_METAL_CAPTURE_KERNEL`: enable programatic .gputrace capture for a specified kernel index Here's an example of the timing report with `CYCLES_METAL_PROFILING` enabled: ``` --------------------------------------------------------------------------------------------------- Kernel name Total threads Dispatches Avg. T/D Time Time% --------------------------------------------------------------------------------------------------- integrator_init_from_camera 657,407,232 161 4,083,274 0.24s 0.51% integrator_intersect_closest 1,629,288,440 681 2,392,494 15.18s 32.12% integrator_intersect_shadow 751,652,291 470 1,599,260 5.80s 12.28% integrator_shade_background 304,612,074 263 1,158,220 1.16s 2.45% integrator_shade_surface 1,159,764,041 676 1,715,627 20.57s 43.52% integrator_shade_shadow 598,885,847 418 1,432,741 1.27s 2.69% integrator_queued_paths_array 2,969,650,130 805 3,689,006 0.35s 0.74% integrator_queued_shadow_paths_array 593,936,619 379 1,567,115 0.14s 0.29% integrator_terminated_paths_array 22,205,417 155 143,260 0.05s 0.10% integrator_sorted_paths_array 2,517,140,043 676 3,723,579 1.65s 3.50% integrator_compact_paths_array 648,912,748 155 4,186,533 0.03s 0.07% integrator_compact_states 20,872,687 155 134,662 0.14s 0.29% integrator_terminated_shadow_paths_array 374,100,675 438 854,111 0.16s 0.33% integrator_compact_shadow_paths_array 503,768,657 438 1,150,156 0.05s 0.10% integrator_compact_shadow_states 37,664,941 202 186,460 0.23s 0.50% integrator_reset 25,165,824 6 4,194,304 0.06s 0.12% film_convert_combined_half_rgba 3,110,400 6 518,400 0.00s 0.01% prefix_sum 676 676 1 0.19s 0.40% --------------------------------------------------------------------------------------------------- 6,760 47.27s 100.00% --------------------------------------------------------------------------------------------------- ``` Reviewed By: brecht Differential Revision: https://developer.blender.org/D15044	2022-06-07 11:08:39 +01:00
Patrick Mours	5c6053ccb1	Fix misaligned address error when rendering 3D curves in the viewport with Cycles and OptiX 7.4 Acceleration structures in the viewport default to building with the fast build flag, but the intersection program used for curves was queried with the fast trace flag. The resulting mismatch caused an exception in the intersection kernel. Since it's difficult to predict whether dynamic or static acceleration structures are going to be built at the time of kernel loading, this fixes the mismatch by always using the fast trace flag for curves.	2022-06-03 12:24:13 +02:00
Campbell Barton	61a7e5be18	Cleanup: '*' prefix C-comment blocks	2022-06-01 15:38:48 +10:00
Brecht Van Lommel	610619c203	Merge branch 'blender-v3.2-release'	2022-05-31 17:35:16 +02:00
Brecht Van Lommel	f2cd7e08fe	Fix Cycles MNEE not working for Metal Move MNEE to own kernel, separate from shader ray-tracing. This does introduce the limitation that a shader can't use both MNEE and AO/bevel, but that seems like the better trade-off for now. We can experiment with bigger kernel organization changes later. Differential Revision: https://developer.blender.org/D15070	2022-05-31 17:24:43 +02:00
Patrick Mours	a8c81ffa83	Cycles: Add half precision float support for volumes with NanoVDB This patch makes it possible to change the precision with which to store volume data in the NanoVDB data structure (as float, half, or using variable bit quantization) via the previously unused precision field in the volume data block. It makes it possible to further reduce memory usage during rendering, at a slight cost to the visual detail of a volume. Differential Revision: https://developer.blender.org/D10023	2022-05-23 19:08:01 +02:00
Campbell Barton	427a2c920a	Cleanup: spelling in comments, capitalize tags Also add missing task-ID reference & remove colon after \note as it doesn't render properly in doxygen.	2022-05-13 09:29:25 +10:00
Michael Jones	007184bcf2	Enable inlining on Apple Silicon. Use new process-wide ShaderCache in order to safely re-enable binary archives This patch is the same as D14763, but with a fix for unit test failures caused by ShaderCache fetch logic not working in the non-MetalRT case: ``` diff --git a/intern/cycles/device/metal/kernel.mm b/intern/cycles/device/metal/kernel.mm index ad268ae7057..6aa1a56056e 100644 --- a/intern/cycles/device/metal/kernel.mm +++ b/intern/cycles/device/metal/kernel.mm @@ -203,9 +203,12 @@ bool kernel_has_intersection(DeviceKernel device_kernel) /* metalrt options / request.pipeline->use_metalrt = device->use_metalrt; - request.pipeline->metalrt_hair = device->kernel_features & KERNEL_FEATURE_HAIR; - request.pipeline->metalrt_hair_thick = device->kernel_features & KERNEL_FEATURE_HAIR_THICK; - request.pipeline->metalrt_pointcloud = device->kernel_features & KERNEL_FEATURE_POINTCLOUD; + request.pipeline->metalrt_hair = device->use_metalrt && + (device->kernel_features & KERNEL_FEATURE_HAIR); + request.pipeline->metalrt_hair_thick = device->use_metalrt && + (device->kernel_features & KERNEL_FEATURE_HAIR_THICK); + request.pipeline->metalrt_pointcloud = device->use_metalrt && + (device->kernel_features & KERNEL_FEATURE_POINTCLOUD); { thread_scoped_lock lock(cache_mutex); @@ -225,9 +228,9 @@ bool kernel_has_intersection(DeviceKernel device_kernel) / metalrt options / bool use_metalrt = device->use_metalrt; - bool metalrt_hair = device->kernel_features & KERNEL_FEATURE_HAIR; - bool metalrt_hair_thick = device->kernel_features & KERNEL_FEATURE_HAIR_THICK; - bool metalrt_pointcloud = device->kernel_features & KERNEL_FEATURE_POINTCLOUD; + bool metalrt_hair = use_metalrt && (device->kernel_features & KERNEL_FEATURE_HAIR); + bool metalrt_hair_thick = use_metalrt && (device->kernel_features & KERNEL_FEATURE_HAIR_THICK); + bool metalrt_pointcloud = use_metalrt && (device->kernel_features & KERNEL_FEATURE_POINTCLOUD); MetalKernelPipeline best_pipeline = nullptr; for (auto &pipeline : collection) { ``` Reviewed By: brecht Differential Revision: https://developer.blender.org/D14923	2022-05-11 16:20:59 +01:00
Campbell Barton	12a1fa9cf4	Cleanup: format	2022-05-06 18:27:44 +10:00
Patrick Mours	6fa5d520b8	Cycles: Add support for parallel compilation of OptiX module OptiX 7.4 adds support for splitting the costly creation of an OptiX module into smaller tasks that can be executed in parallel on a thread pool. This is only really relevant for the "shader_raytrace" kernel variant as the main one is small and compiles fast either way. It sheds of a few seconds there (total gain is not massive currently, since it is difficult for the compiler to split up the huge shading entry point that is the primary one taking up time, but it is still measurable). Differential Revision: https://developer.blender.org/D14845	2022-05-05 14:35:41 +02:00
Brecht Van Lommel	52a5f68562	Revert "Cycles: Enable inlining on Apple Silicon for 1.1x speedup" This reverts commit `b82de02e7c`. It is causing crashes in various regression tests. Ref D14763	2022-04-28 00:46:43 +02:00

1 2 3 4 5 ...

1058 Commits