test2

Author	SHA1	Message	Date
Brecht Van Lommel	c2c019dda8	Fix Cycles MetalRT compile error	2022-08-13 19:55:38 +02:00
Nikita Sirgienko	1382514bf2	Fix: Error in oneAPI image code for texture access with clip extension	2022-08-08 10:47:11 +02:00
Brecht Van Lommel	fa514564b0	Fix T99201: Cycles render difference with 3D hair curves between OptiX and Embree It should consistently use the Cycles primitive ID for self intersection detection, not the one from the OptiX or Embree acceleration structure. Differential Revision: https://developer.blender.org/D15632	2022-08-05 15:03:47 +02:00
Nikita Sirgienko	76169472d3	Cycles: Resolve recent performance regression in oneAPI implementation for Intel® Arc™ GPUs Recently, performance with oneAPI has regressed due to some recent changes in Blender itself. This commit resolves the regression and also improves compilation time for the oneAPI backend's first execution (or Blender compilation time in the case of AoT). The regression appeared after `5152c7c152` and is not related to the changes themselves, but to the increase in kernel complexity introduced with them. This commit marks some Blender functions as noinline for the oneAPI backend, which helps the GPU compiler deal with this complexity without any negative side effects on performance.	2022-08-01 12:45:34 +02:00
Brecht Van Lommel	38af5b0501	Cycles: switch Cycles triangle barycentric convention to match Embree/OptiX Simplifies intersection code a little and slightly improves precision regarding self intersection. The parametric texture coordinate in shader nodes is still the same as before for compatibility.	2022-07-27 21:03:33 +02:00
Brecht Van Lommel	cd47d1b2ed	Fix broken BVH2 on CPU after recent changes Runtime switching between Embree and BVH2 got lost.	2022-07-27 20:58:02 +02:00
Xavier Hallade	d706d0460c	Cycles oneAPI: simplify num_concurrent_states selection The number of Execution Units and resident "threads" (simd width * threads per EUs) are now exposed and used to select the number of states using a simplified heuristic.	2022-07-27 09:45:33 +02:00
Campbell Barton	f1f89ca751	Cleanup: spelling in comments	2022-07-26 13:21:21 +10:00
Brecht Van Lommel	4cf6524731	Fix Cycles Metal build errors after recent changes float8 is a reserved type in Metal, but is not implemented. So rename to float8_t for now. Also move back intersection handlers to kernel.metal, they can't be in the class that encapsulates the other Metal kernel functions.	2022-07-26 00:17:37 +02:00
Brecht Van Lommel	f26aa186b2	Cleanup: remove __KERNEL_CPU__ This was tested in some places to check if code was being compiled for the CPU, however this is only defined in the kernel. Checking __KERNEL_GPU__ always works.	2022-07-25 17:43:35 +02:00
Brecht Van Lommel	7a74d91e32	Cleanup: move device BVH code to kernel/device/*/bvh.h Having the OptiX/MetalRT/Embree implementations all in one file with many #ifdefs became too confusing. Instead split it up per device, and also move it together with device specific hit/filter/intersect functions and associated data types.	2022-07-25 16:34:22 +02:00
Brecht Van Lommel	484ad31653	Cycles: simplify handling of ray distance in GPU rendering All our intersection functions now work with unnormalized ray direction, which means we no longer need to transform ray distance between world and object space, they can all remain in world space. There doesn't seem to be any real performance difference one way or the other, but it does simplify the code.	2022-07-25 13:27:40 +02:00
Brecht Van Lommel	5152c7c152	Cycles: refactor rays to have start and end distance, fix precision issues For transparency, volume and light intersection rays, adjust these distances rather than the ray start position. This way we increment the start distance by the smallest possible float increment to avoid self intersections, and can be sure it works because the distance compared against will be exactly the same as before, due to the ray start position and direction remaining the same. Fix T98764, T96537, hair ray tracing precision issues. Differential Revision: https://developer.blender.org/D15455	2022-07-15 18:46:24 +02:00
Brecht Van Lommel	bb376da6df	Fix Cycles MetalRT error after recent specialization changes	2022-07-15 18:28:13 +02:00
Michael Jones	da4ef05e4d	Cycles: Apple Silicon optimization to specialize intersection kernels The Metal backend now compiles and caches a second set of kernels which are optimized for scene contents, enabled for Apple Silicon. The implementation supports doing this both for intersection and shading kernels. However this is currently only enabled for intersection kernels that are quick to compile, and already give a good speedup. Enabling this for shading kernels would be faster still, however this also causes long wait times and would need a good user interface to control this. M1 Max samples per minute (macOS 13.0):
                     PSO_GENERIC  PSO_SPECIALIZED_INTERSECT  PSO_SPECIALIZED_SHADE
barbershop_interior         83.4                       89.5                   93.7
bmw27                     1486.1                     1671.0                 1825.8
classroom                  175.2                      196.8                  206.3
fishy_cat                  674.2                      704.3                  719.3
junkshop                   205.4                      212.0                  257.7
koro                       310.1                      336.1                  342.8
monster                    376.7                      418.6                  424.1
pabellon                   273.5                      325.4                  339.8
sponza                     830.6                      929.6                 1142.4
victor                      86.7                       96.4                   96.3
wdas_cloud                 111.8                      112.7                  183.1
Code contributed by Jason Fielder, Morteza Mostajabodaveh and Michael Jones Differential Revision: https://developer.blender.org/D14645	2022-07-15 13:40:04 +02:00
Xavier Hallade	47dd42485e	Cycles: fix and enable JIT oneAPI CentOS7 builds for drivers 23570+ The current specific CentOS7 workaround we have for AoT, which is to disable __FAST_MATH__ by using -fhonor-nans, now also fixes the compilation issue for JIT as well since at least driver 23570.	2022-07-12 15:55:32 +02:00
Xavier Hallade	0f50ae131f	Cycles: enable oneAPI in Linux release builds with a very high min-driver version requirement, placeholder until JIT CentOS runtime compilation issue gets fixed in a defined version. min-driver version check can be worked around by setting CYCLES_ONEAPI_ALL_DEVICES environment variable.	2022-07-08 15:39:13 +02:00
Xavier Hallade	190ad73590	Cycles oneAPI: Remove direct dependency on Level-Zero We used it only to access device id for explicitly allowing Arc GPUs. It made the backend require ze_loader.dll which could be problematic if we end up using direct linking. I've replaced filtering based on PCI device id by using other HW properties instead (EUs, threads per EU), that are now available through Level-Zero.	2022-07-06 18:55:38 +02:00
Xavier Hallade	debb233787	Cleanup: fix comments in oneAPI kernel.cpp	2022-07-06 18:55:38 +02:00
Nikita Sirgienko	0df574b55e	Cycles: Improve occupancy for Intel GPUs Initially the oneAPI implementation waited after each memory operation, even if there was no need for this. Now the implementation waits only if it is really necessary. This improved performance noticeably for some scenes and slightly for the rest.	2022-07-06 17:26:23 +02:00
Xavier Hallade	41c10ac84a	Cycles: fix support for multiple Intel GPUs Identical Intel GPUs ended up with the same id. Added PCI BDF to the id to make it unique.	2022-07-01 11:20:00 +02:00
Xavier Hallade	0554537c3c	Cleanup: add missing license headers in Cycles oneAPI implementation	2022-07-01 10:13:07 +02:00
Campbell Barton	b6c28002ac	Cleanup: spelling in comments	2022-06-30 12:14:22 +10:00
Xavier Hallade	a02992f131	Cycles: Add support for rendering on Intel GPUs using oneAPI This patch adds a new Cycles device with similar functionality to the existing GPU devices. Kernel compilation and runtime interaction happen via oneAPI DPC++ compiler and SYCL API. This implementation is primarily focused on Intel® Arc™ GPUs and other future Intel GPUs. The first supported drivers are 101.1660 on Windows and 22.10.22597 on Linux. The necessary tools for compilation are: - A SYCL compiler such as oneAPI DPC++ compiler or https://github.com/intel/llvm - Intel® oneAPI Level Zero which is used for low level device queries: https://github.com/oneapi-src/level-zero - To optionally generate prebuilt graphics binaries: Intel® Graphics Compiler All are included in Linux precompiled libraries on svn: https://svn.blender.org/svnroot/bf-blender/trunk/lib The same goes for Windows precompiled binaries except for the graphics compiler, available as "Intel® Graphics Offline Compiler for OpenCL™ Code" from https://www.intel.com/content/www/us/en/developer/articles/tool/oneapi-standalone-components.html, for which the path can be set as OCLOC_INSTALL_DIR. Being based on the open SYCL standard, this implementation could also be extended to run on other compatible non-Intel hardware in the future. Reviewed By: sergey, brecht Differential Revision: https://developer.blender.org/D15254 Co-authored-by: Nikita Sirgienko <nikita.sirgienko@intel.com> Co-authored-by: Stefan Werner <stefan.werner@intel.com>	2022-06-29 12:58:04 +02:00
Sayak Biswas	abfa09752f	Cycles: enable Vega GPU/APU support Enables Vega and Vega II GPUs as well as Vega APU, using changes in HIP code to support 64-bit waves and a new HIP SDK version. Tested with Radeon WX9100, Radeon VII GPUs and Ryzen 7 PRO 5850U with Radeon Graphics APU. Ref T96740, T91571 Differential Revision: https://developer.blender.org/D15242	2022-06-28 18:35:43 +02:00
Brecht Van Lommel	ff1883307f	Cleanup: renaming and consistency for kernel data * Rename "texture" to "data array". This has not used textures for a long time, there are just global memory arrays now. (On old CUDA GPUs there was a cache for textures but not global memory, so we used to put all data in textures.) * For CUDA and HIP, put globals in KernelParams struct like other devices. * Drop __ prefix for data array names, no possibility for naming conflict now that these are in a struct.	2022-06-20 12:30:48 +02:00
Michael Jones	4412e14708	Cycles: Useful Metal backend debug & profiling functionality This patch adds some useful debugging & profiling env vars to the Metal backend: - `CYCLES_METAL_PROFILING`: output a per-kernel timing report at the end of the render - `CYCLES_METAL_DEBUG`: enable per-dispatch tracing (very verbose) - `CYCLES_DEBUG_METAL_CAPTURE_KERNEL`: enable programmatic .gputrace capture for a specified kernel index Here's an example of the timing report with `CYCLES_METAL_PROFILING` enabled:
```
---------------------------------------------------------------------------------------------------
Kernel name                                 Total threads  Dispatches    Avg. T/D     Time    Time%
---------------------------------------------------------------------------------------------------
integrator_init_from_camera                   657,407,232         161   4,083,274    0.24s    0.51%
integrator_intersect_closest                1,629,288,440         681   2,392,494   15.18s   32.12%
integrator_intersect_shadow                   751,652,291         470   1,599,260    5.80s   12.28%
integrator_shade_background                   304,612,074         263   1,158,220    1.16s    2.45%
integrator_shade_surface                    1,159,764,041         676   1,715,627   20.57s   43.52%
integrator_shade_shadow                       598,885,847         418   1,432,741    1.27s    2.69%
integrator_queued_paths_array               2,969,650,130         805   3,689,006    0.35s    0.74%
integrator_queued_shadow_paths_array          593,936,619         379   1,567,115    0.14s    0.29%
integrator_terminated_paths_array              22,205,417         155     143,260    0.05s    0.10%
integrator_sorted_paths_array               2,517,140,043         676   3,723,579    1.65s    3.50%
integrator_compact_paths_array                648,912,748         155   4,186,533    0.03s    0.07%
integrator_compact_states                      20,872,687         155     134,662    0.14s    0.29%
integrator_terminated_shadow_paths_array      374,100,675         438     854,111    0.16s    0.33%
integrator_compact_shadow_paths_array         503,768,657         438   1,150,156    0.05s    0.10%
integrator_compact_shadow_states               37,664,941         202     186,460    0.23s    0.50%
integrator_reset                               25,165,824           6   4,194,304    0.06s    0.12%
film_convert_combined_half_rgba                 3,110,400           6     518,400    0.00s    0.01%
prefix_sum                                            676         676           1    0.19s    0.40%
---------------------------------------------------------------------------------------------------
                                                                6,760               47.27s  100.00%
---------------------------------------------------------------------------------------------------
```
Reviewed By: brecht Differential Revision: https://developer.blender.org/D15044	2022-06-07 11:08:39 +01:00
Brecht Van Lommel	610619c203	Merge branch 'blender-v3.2-release'	2022-05-31 17:35:16 +02:00
Brecht Van Lommel	f2cd7e08fe	Fix Cycles MNEE not working for Metal Move MNEE to own kernel, separate from shader ray-tracing. This does introduce the limitation that a shader can't use both MNEE and AO/bevel, but that seems like the better trade-off for now. We can experiment with bigger kernel organization changes later. Differential Revision: https://developer.blender.org/D15070	2022-05-31 17:24:43 +02:00
Jacques Lucke	25d216724b	Cleanup: make format	2022-05-24 15:53:16 +02:00
Patrick Mours	a8c81ffa83	Cycles: Add half precision float support for volumes with NanoVDB This patch makes it possible to change the precision with which to store volume data in the NanoVDB data structure (as float, half, or using variable bit quantization) via the previously unused precision field in the volume data block. It makes it possible to further reduce memory usage during rendering, at a slight cost to the visual detail of a volume. Differential Revision: https://developer.blender.org/D10023	2022-05-23 19:08:01 +02:00
Sergey Sharybin	698e394e7e	Merge branch 'blender-v3.2-release'	2022-05-23 16:02:52 +02:00
Sergey Sharybin	9bb4bf5748	Fix missing 64bit casts when calculating Cycles render buffer offset Found those missing casts while looking into a crash report made in the Blender Chat. Was unable to reproduce the crash, but the casts should totally be there to avoid integer overflow.	2022-05-23 15:59:52 +02:00
Michael Jones	007184bcf2	Enable inlining on Apple Silicon. Use new process-wide ShaderCache in order to safely re-enable binary archives This patch is the same as D14763, but with a fix for unit test failures caused by ShaderCache fetch logic not working in the non-MetalRT case:
```
diff --git a/intern/cycles/device/metal/kernel.mm b/intern/cycles/device/metal/kernel.mm
index ad268ae7057..6aa1a56056e 100644
--- a/intern/cycles/device/metal/kernel.mm
+++ b/intern/cycles/device/metal/kernel.mm
@@ -203,9 +203,12 @@ bool kernel_has_intersection(DeviceKernel device_kernel)

   /* metalrt options */
   request.pipeline->use_metalrt = device->use_metalrt;
-  request.pipeline->metalrt_hair = device->kernel_features & KERNEL_FEATURE_HAIR;
-  request.pipeline->metalrt_hair_thick = device->kernel_features & KERNEL_FEATURE_HAIR_THICK;
-  request.pipeline->metalrt_pointcloud = device->kernel_features & KERNEL_FEATURE_POINTCLOUD;
+  request.pipeline->metalrt_hair = device->use_metalrt &&
+                                   (device->kernel_features & KERNEL_FEATURE_HAIR);
+  request.pipeline->metalrt_hair_thick = device->use_metalrt &&
+                                         (device->kernel_features & KERNEL_FEATURE_HAIR_THICK);
+  request.pipeline->metalrt_pointcloud = device->use_metalrt &&
+                                         (device->kernel_features & KERNEL_FEATURE_POINTCLOUD);

   {
     thread_scoped_lock lock(cache_mutex);
@@ -225,9 +228,9 @@ bool kernel_has_intersection(DeviceKernel device_kernel)

   /* metalrt options */
   bool use_metalrt = device->use_metalrt;
-  bool metalrt_hair = device->kernel_features & KERNEL_FEATURE_HAIR;
-  bool metalrt_hair_thick = device->kernel_features & KERNEL_FEATURE_HAIR_THICK;
-  bool metalrt_pointcloud = device->kernel_features & KERNEL_FEATURE_POINTCLOUD;
+  bool metalrt_hair = use_metalrt && (device->kernel_features & KERNEL_FEATURE_HAIR);
+  bool metalrt_hair_thick = use_metalrt && (device->kernel_features & KERNEL_FEATURE_HAIR_THICK);
+  bool metalrt_pointcloud = use_metalrt && (device->kernel_features & KERNEL_FEATURE_POINTCLOUD);

   MetalKernelPipeline *best_pipeline = nullptr;
   for (auto &pipeline : collection) {
```
Reviewed By: brecht Differential Revision: https://developer.blender.org/D14923	2022-05-11 16:20:59 +01:00
Brecht Van Lommel	52a5f68562	Revert "Cycles: Enable inlining on Apple Silicon for 1.1x speedup" This reverts commit `b82de02e7c`. It is causing crashes in various regression tests. Ref D14763	2022-04-28 00:46:43 +02:00
Michael Jones	b82de02e7c	Cycles: Enable inlining on Apple Silicon for 1.1x speedup This is a stripped down version of D14645 without the scene specialisation optimisations. The two major changes in this patch are: - Enables more aggressive inlining on Apple Silicon resulting in a 1.1x speedup and 10% reduction in spill, at the cost of longer pipeline build times - Revival of shader binary archives through a new ShaderCache which is shared between MetalDevice instances using the same physical MTLDevice. This mitigates the extra compile times via explicit caching (rather than, as before, relying on the implicit system shader cache which can be purged without notice) Reviewed By: brecht Differential Revision: https://developer.blender.org/D14763	2022-04-26 22:17:16 +01:00
Brecht Van Lommel	8d2da45f98	Revert "Fix Cycles HIP assuming warp size 32" This reverts commit `390b9f1305`. It seems to break things on Linux for unknown reasons, so leave it out for now. A solution to this will be required for Vega cards though.	2022-04-20 18:09:23 +02:00
Stefan Werner	65dcb5ebd3	Cycles: Semantically separate 2D and 3D texture objects Currently there are no functional changes. Preparing for an upcoming oneAPI integration where such separation in types is needed.	2022-04-01 19:44:31 +02:00
Stefan Werner	9c6dff70c8	Cycles: Introduce postfix for kernel body definition Increases flexibility of code-generation for kernel entry points. Currently no functional changes, preparing for integration with oneAPI.	2022-04-01 19:44:02 +02:00
Brecht Van Lommel	51380b9346	Fix Cycles Metal build error and GCC warning after recent changes Function overloading of make_float4() doesn't work since it's a macro, just don't do this minor cleanup then.	2022-03-23 23:25:31 +01:00
Kévin Dietrich	d84b4becd3	Fix compile error on GCC Explicit template specialization has to happen outside of class definition (some compilers are more lenient). Since it is not possible to specialize the method without also specializing the enclosing class for all of its possible types, the method is moved outside of the class, and specialized there.	2022-03-23 22:01:32 +01:00
Ethan-Hall	4e56e738a8	Cycles: optimize CPU texture sampler interpolation Use templates to optimize the CPU texture sampler to interpolate using float for single component datatypes instead of using float4 for all types. Differential Revision: https://developer.blender.org/D14424	2022-03-23 20:06:12 +01:00
Ethan-Hall	4abb8a14a2	Cycles: make 3D texture sampling at boundaries more similar to GPU CPU code for cubic interpolation with clip texture extension only performed texture interpolation inside the range of [0,1]. As a result, even though the volume's color is sampled using cubic interpolation, the boundary was not being interpolated. The GPU appears to interpolate samples that span the clip boundary, softening the edge, which the CPU now does as well. This commit also includes refactoring of 2D and 3D texture sampling in preparation for adding new extension modes. Differential Revision: https://developer.blender.org/D14295	2022-03-21 16:38:13 +01:00
Brecht Van Lommel	390b9f1305	Fix Cycles HIP assuming warp size 32 In HIP these masks are 64 bit, while in CUDA only 32 bit.	2022-03-16 18:05:48 +01:00
Brecht Van Lommel	076079454f	Cleanup: remove some unused Cycles GPU code To make porting to other architectures easier, clarifying that this does not need to be supported. The unused parallel_reduce implementation assumed warp size 32, but is easy to update if we ever need it in the future.	2022-03-16 18:05:08 +01:00
Ethan-Hall	3902bebf18	Cycles: make smart interpolation fallback to cubic for GPU Matching CPU and Eevee behavior. Differential Revision: https://developer.blender.org/D14296	2022-03-11 18:27:58 +01:00
Brecht Van Lommel	a9a05d5597	Merge branch 'blender-v3.1-release'	2022-02-15 01:05:47 +01:00
Brecht Van Lommel	facd9d8268	Cleanup: clang-format	2022-02-15 01:05:25 +01:00
Brecht Van Lommel	35c261dfcf	Merge branch 'blender-v3.1-release'	2022-02-11 23:58:41 +01:00
Michael Jones	27d3140b13	Cycles: Fix Metal kernel compilation for AMD GPUs Workaround for a compilation issue preventing kernels compiling for AMD GPUs: Avoid problematic use of templates on Metal by making `gpu_parallel_active_index_array` a wrapper macro, and moving `blocksize` to be a macro parameter. Reviewed By: brecht Differential Revision: https://developer.blender.org/D14081	2022-02-11 22:52:48 +00:00

106 Commits