griefith/test

Author	SHA1	Message	Date
Brecht Van Lommel	5152c7c152	Cycles: refactor rays to have start and end distance, fix precision issues For transparency, volume and light intersection rays, adjust these distances rather than the ray start position. This way we increment the start distance by the smallest possible float increment to avoid self intersections, and be sure it works as the distance compared to be will be exactly the same as before, due to the ray start position and direction remaining the same. Fix T98764, T96537, hair ray tracing precision issues. Differential Revision: https://developer.blender.org/D15455	2022-07-15 18:46:24 +02:00
Brecht Van Lommel	bb376da6df	Fix Cycles MetalRT error after recent specialization changes	2022-07-15 18:28:13 +02:00
Michael Jones	da4ef05e4d	Cycles: Apple Silicon optimization to specialize intersection kernels The Metal backend now compiles and caches a second set of kernels which are optimized for scene contents, enabled for Apple Silicon. The implementation supports doing this both for intersection and shading kernels. However this is currently only enabled for intersection kernels that are quick to compile, and already give a good speedup. Enabling this for shading kernels would be faster still, however this also causes a long wait times and would need a good user interface to control this. M1 Max samples per minute (macOS 13.0): PSO_GENERIC PSO_SPECIALIZED_INTERSECT PSO_SPECIALIZED_SHADE barbershop_interior 83.4 89.5 93.7 bmw27 1486.1 1671.0 1825.8 classroom 175.2 196.8 206.3 fishy_cat 674.2 704.3 719.3 junkshop 205.4 212.0 257.7 koro 310.1 336.1 342.8 monster 376.7 418.6 424.1 pabellon 273.5 325.4 339.8 sponza 830.6 929.6 1142.4 victor 86.7 96.4 96.3 wdas_cloud 111.8 112.7 183.1 Code contributed by Jason Fielder, Morteza Mostajabodaveh and Michael Jones Differential Revision: https://developer.blender.org/D14645	2022-07-15 13:40:04 +02:00
Xavier Hallade	47dd42485e	Cycles: fix and enable JIT oneAPI CentOS7 builds for drivers 23570+ The current specific CentOS7 workaround we have for AoT, which is to disable __FAST_MATH__ by using -fhonor-nans, now also fixes the compilation issue for JIT as well since at least driver 23570.	2022-07-12 15:55:32 +02:00
Xavier Hallade	0f50ae131f	Cycles: enable oneAPI in Linux release builds with a very high min-driver version requirement, placeholder until JIT CentOS runtime compilation issue gets fixed in a defined version. min-driver version check can be worked around by setting CYCLES_ONEAPI_ALL_DEVICES environment variable.	2022-07-08 15:39:13 +02:00
Xavier Hallade	190ad73590	Cycles oneAPI: Remove direct dependency on Level-Zero We used it only to access device id for explicitly allowing Arc GPUs. It made the backend require ze_loader.dll which could be problematic if we end up using direct linking. I've replaced filtering based on PCI device id by using other HW properties instead (EUs, threads per EU), that are now available through Level-Zero.	2022-07-06 18:55:38 +02:00
Xavier Hallade	debb233787	Cleanup: fix comments in oneAPI kernel.cpp	2022-07-06 18:55:38 +02:00
Nikita Sirgienko	0df574b55e	Cycles: Improve an occupancy for Intel GPUs Initially oneAPI implementation have waited after each memory operation, even if there was no need for this. Now, the implementation will wait only if it is really necessary - it have improved performance noticeble for some scenes and a bit for the rest of them.	2022-07-06 17:26:23 +02:00
Xavier Hallade	41c10ac84a	Cycles: fix support for multiple Intel GPUs Identical Intel GPUs ended up with the same id. Added PCI BDF to the id to make it unique.	2022-07-01 11:20:00 +02:00
Xavier Hallade	0554537c3c	Cleanup: add missing license headers in Cycles oneAPI implementation	2022-07-01 10:13:07 +02:00
Campbell Barton	b6c28002ac	Cleanup: spelling in comments	2022-06-30 12:14:22 +10:00
Xavier Hallade	a02992f131	Cycles: Add support for rendering on Intel GPUs using oneAPI This patch adds a new Cycles device with similar functionality to the existing GPU devices. Kernel compilation and runtime interaction happen via oneAPI DPC++ compiler and SYCL API. This implementation is primarly focusing on Intel® Arc™ GPUs and other future Intel GPUs. The first supported drivers are 101.1660 on Windows and 22.10.22597 on Linux. The necessary tools for compilation are: - A SYCL compiler such as oneAPI DPC++ compiler or https://github.com/intel/llvm - Intel® oneAPI Level Zero which is used for low level device queries: https://github.com/oneapi-src/level-zero - To optionally generate prebuilt graphics binaries: Intel® Graphics Compiler All are included in Linux precompiled libraries on svn: https://svn.blender.org/svnroot/bf-blender/trunk/lib The same goes for Windows precompiled binaries but for the graphics compiler, available as "Intel® Graphics Offline Compiler for OpenCL™ Code" from https://www.intel.com/content/www/us/en/developer/articles/tool/oneapi-standalone-components.html, for which path can be set as OCLOC_INSTALL_DIR. Being based on the open SYCL standard, this implementation could also be extended to run on other compatible non-Intel hardware in the future. Reviewed By: sergey, brecht Differential Revision: https://developer.blender.org/D15254 Co-authored-by: Nikita Sirgienko <nikita.sirgienko@intel.com> Co-authored-by: Stefan Werner <stefan.werner@intel.com>	2022-06-29 12:58:04 +02:00
Sayak Biswas	abfa09752f	Cycles: enable Vega GPU/APU support Enables Vega and Vega II GPUs as well as Vega APU, using changes in HIP code to support 64-bit waves and a new HIP SDK version. Tested with Radeon WX9100, Radeon VII GPUs and Ryzen 7 PRO 5850U with Radeon Graphics APU. Ref T96740, T91571 Differential Revision: https://developer.blender.org/D15242	2022-06-28 18:35:43 +02:00
Brecht Van Lommel	ff1883307f	Cleanup: renaming and consistency for kernel data * Rename "texture" to "data array". This has not used textures for a long time, there are just global memory arrays now. (On old CUDA GPUs there was a cache for textures but not global memory, so we used to put all data in textures.) * For CUDA and HIP, put globals in KernelParams struct like other devices. * Drop __ prefix for data array names, no possibility for naming conflict now that these are in a struct.	2022-06-20 12:30:48 +02:00
Michael Jones	4412e14708	Cycles: Useful Metal backend debug & profiling functionality This patch adds some useful debugging & profiling env vars to the Metal backend: - `CYCLES_METAL_PROFILING`: output a per-kernel timing report at the end of the render - `CYCLES_METAL_DEBUG`: enable per-dispatch tracing (very verbose) - `CYCLES_DEBUG_METAL_CAPTURE_KERNEL`: enable programatic .gputrace capture for a specified kernel index Here's an example of the timing report with `CYCLES_METAL_PROFILING` enabled: ``` --------------------------------------------------------------------------------------------------- Kernel name Total threads Dispatches Avg. T/D Time Time% --------------------------------------------------------------------------------------------------- integrator_init_from_camera 657,407,232 161 4,083,274 0.24s 0.51% integrator_intersect_closest 1,629,288,440 681 2,392,494 15.18s 32.12% integrator_intersect_shadow 751,652,291 470 1,599,260 5.80s 12.28% integrator_shade_background 304,612,074 263 1,158,220 1.16s 2.45% integrator_shade_surface 1,159,764,041 676 1,715,627 20.57s 43.52% integrator_shade_shadow 598,885,847 418 1,432,741 1.27s 2.69% integrator_queued_paths_array 2,969,650,130 805 3,689,006 0.35s 0.74% integrator_queued_shadow_paths_array 593,936,619 379 1,567,115 0.14s 0.29% integrator_terminated_paths_array 22,205,417 155 143,260 0.05s 0.10% integrator_sorted_paths_array 2,517,140,043 676 3,723,579 1.65s 3.50% integrator_compact_paths_array 648,912,748 155 4,186,533 0.03s 0.07% integrator_compact_states 20,872,687 155 134,662 0.14s 0.29% integrator_terminated_shadow_paths_array 374,100,675 438 854,111 0.16s 0.33% integrator_compact_shadow_paths_array 503,768,657 438 1,150,156 0.05s 0.10% integrator_compact_shadow_states 37,664,941 202 186,460 0.23s 0.50% integrator_reset 25,165,824 6 4,194,304 0.06s 0.12% film_convert_combined_half_rgba 3,110,400 6 518,400 0.00s 0.01% prefix_sum 676 676 1 0.19s 0.40% --------------------------------------------------------------------------------------------------- 6,760 47.27s 100.00% --------------------------------------------------------------------------------------------------- ``` Reviewed By: brecht Differential Revision: https://developer.blender.org/D15044	2022-06-07 11:08:39 +01:00
Brecht Van Lommel	610619c203	Merge branch 'blender-v3.2-release'	2022-05-31 17:35:16 +02:00
Brecht Van Lommel	f2cd7e08fe	Fix Cycles MNEE not working for Metal Move MNEE to own kernel, separate from shader ray-tracing. This does introduce the limitation that a shader can't use both MNEE and AO/bevel, but that seems like the better trade-off for now. We can experiment with bigger kernel organization changes later. Differential Revision: https://developer.blender.org/D15070	2022-05-31 17:24:43 +02:00
Jacques Lucke	25d216724b	Cleanup: make format	2022-05-24 15:53:16 +02:00
Patrick Mours	a8c81ffa83	Cycles: Add half precision float support for volumes with NanoVDB This patch makes it possible to change the precision with which to store volume data in the NanoVDB data structure (as float, half, or using variable bit quantization) via the previously unused precision field in the volume data block. It makes it possible to further reduce memory usage during rendering, at a slight cost to the visual detail of a volume. Differential Revision: https://developer.blender.org/D10023	2022-05-23 19:08:01 +02:00
Sergey Sharybin	698e394e7e	Merge branch 'blender-v3.2-release'	2022-05-23 16:02:52 +02:00
Sergey Sharybin	9bb4bf5748	Fix missing 64bit casts when calculating Cycles render buffer offset Found those missing casts while looking into a crash report made in the Blender Chat. Was unable to reproduce the crash, but the casts should totally be there to avoid integer overflow.	2022-05-23 15:59:52 +02:00
Michael Jones	007184bcf2	Enable inlining on Apple Silicon. Use new process-wide ShaderCache in order to safely re-enable binary archives This patch is the same as D14763, but with a fix for unit test failures caused by ShaderCache fetch logic not working in the non-MetalRT case: ``` diff --git a/intern/cycles/device/metal/kernel.mm b/intern/cycles/device/metal/kernel.mm index ad268ae7057..6aa1a56056e 100644 --- a/intern/cycles/device/metal/kernel.mm +++ b/intern/cycles/device/metal/kernel.mm @@ -203,9 +203,12 @@ bool kernel_has_intersection(DeviceKernel device_kernel) /* metalrt options / request.pipeline->use_metalrt = device->use_metalrt; - request.pipeline->metalrt_hair = device->kernel_features & KERNEL_FEATURE_HAIR; - request.pipeline->metalrt_hair_thick = device->kernel_features & KERNEL_FEATURE_HAIR_THICK; - request.pipeline->metalrt_pointcloud = device->kernel_features & KERNEL_FEATURE_POINTCLOUD; + request.pipeline->metalrt_hair = device->use_metalrt && + (device->kernel_features & KERNEL_FEATURE_HAIR); + request.pipeline->metalrt_hair_thick = device->use_metalrt && + (device->kernel_features & KERNEL_FEATURE_HAIR_THICK); + request.pipeline->metalrt_pointcloud = device->use_metalrt && + (device->kernel_features & KERNEL_FEATURE_POINTCLOUD); { thread_scoped_lock lock(cache_mutex); @@ -225,9 +228,9 @@ bool kernel_has_intersection(DeviceKernel device_kernel) / metalrt options / bool use_metalrt = device->use_metalrt; - bool metalrt_hair = device->kernel_features & KERNEL_FEATURE_HAIR; - bool metalrt_hair_thick = device->kernel_features & KERNEL_FEATURE_HAIR_THICK; - bool metalrt_pointcloud = device->kernel_features & KERNEL_FEATURE_POINTCLOUD; + bool metalrt_hair = use_metalrt && (device->kernel_features & KERNEL_FEATURE_HAIR); + bool metalrt_hair_thick = use_metalrt && (device->kernel_features & KERNEL_FEATURE_HAIR_THICK); + bool metalrt_pointcloud = use_metalrt && (device->kernel_features & KERNEL_FEATURE_POINTCLOUD); MetalKernelPipeline best_pipeline = nullptr; for (auto &pipeline : collection) { ``` Reviewed By: brecht Differential Revision: https://developer.blender.org/D14923	2022-05-11 16:20:59 +01:00
Brecht Van Lommel	52a5f68562	Revert "Cycles: Enable inlining on Apple Silicon for 1.1x speedup" This reverts commit `b82de02e7c`. It is causing crashes in various regression tests. Ref D14763	2022-04-28 00:46:43 +02:00
Michael Jones	b82de02e7c	Cycles: Enable inlining on Apple Silicon for 1.1x speedup This is a stripped down version of D14645 without the scene specialisation optimisations. The two major changes in this patch are: - Enables more aggressive inlining on Apple Silicon resulting in a 1.1x speedup and 10% reduction in spill, at the cost of longer pipeline build times - Revival of shader binary archives through a new ShaderCache which is shared between MetalDevice instances using the same physical MTLDevice. This mitigates the extra compile times via explicit caching (rather than, as before, relying on the implicit system shader cache which can be purged without notice) Reviewed By: brecht Differential Revision: https://developer.blender.org/D14763	2022-04-26 22:17:16 +01:00
Brecht Van Lommel	8d2da45f98	Revert "Fix Cycles HIP assuming warp size 32" This reverts commit `390b9f1305`. It seems to break things on Linux for unknown reasons, so leave it out for now. A solution to this will be required for Vega cards though.	2022-04-20 18:09:23 +02:00
Stefan Werner	65dcb5ebd3	Cycles: Semantically separate 2D and 3D texture objects Currently there are no functional changes. Preparing for an upcoming oneAPI integration where such separation in types is needed.	2022-04-01 19:44:31 +02:00
Stefan Werner	9c6dff70c8	Cycles: Introduce postfix for kernel body definition Increases flexibility of code-generation for kernel entry points. Currently no functional changes, preparing for integration with oneAPI.	2022-04-01 19:44:02 +02:00
Brecht Van Lommel	51380b9346	Fix Cycles Metal build error and GCC warning after recent changes Function overloading of make_float4() doesn't work since it's a macro, just don't do this minor cleanup then.	2022-03-23 23:25:31 +01:00
Kévin Dietrich	d84b4becd3	Fix compile error on GCC Explicit template specialization has to happen outside of class definition (some compilers are more lenient). Since it is not possible to specialize the method without also specializing the enclosing class for all of its possible types, the method is moved outside of the class, and specialized there.	2022-03-23 22:01:32 +01:00
Ethan-Hall	4e56e738a8	Cycles: optimize CPU texture sampler interpolation Use templates to optimize the CPU texture sampler to interpolate using float for single component datatypes instead of using float4 for all types. Differential Revision: https://developer.blender.org/D14424	2022-03-23 20:06:12 +01:00
Ethan-Hall	4abb8a14a2	Cycles: make 3D texture sampling at boundaries more similar to GPU CPU code for cubic interpolation with clip texture extension only performed texture interpolation inside the range of [0,1]. As a result, even though the volume's color is sampled using cubic interpolation, the boundary is not being interpolated. The GPU appears was interpolating samples that span the clip boundary softening the edge, which the CPU now does also. This commit also includes refactoring of 2D and 3D texture sampling in preparation of adding new extension modes. Differential Revision: https://developer.blender.org/D14295	2022-03-21 16:38:13 +01:00
Brecht Van Lommel	390b9f1305	Fix Cycles HIP assuming warp size 32 In HIP these masks are 64 bit, while in CUDA only 32 bit.	2022-03-16 18:05:48 +01:00
Brecht Van Lommel	076079454f	Cleanup: remove some unused Cycles GPU code To make porting to other architectures easier, clarifying that this does not need to be supported. The unused parallel_reduce implementation assumed warp size 32, but is easy to update if we ever need it in the future.	2022-03-16 18:05:08 +01:00
Ethan-Hall	3902bebf18	Cycles: make smart interpolation fallback to cubic for GPU Matching CPU and Eevee behavior. Differential Revision: https://developer.blender.org/D14296	2022-03-11 18:27:58 +01:00
Brecht Van Lommel	a9a05d5597	Merge branch 'blender-v3.1-release'	2022-02-15 01:05:47 +01:00
Brecht Van Lommel	facd9d8268	Cleanup: clang-format	2022-02-15 01:05:25 +01:00
Brecht Van Lommel	35c261dfcf	Merge branch 'blender-v3.1-release'	2022-02-11 23:58:41 +01:00
Michael Jones	27d3140b13	Cycles: Fix Metal kernel compilation for AMD GPUs Workaround for a compilation issue preventing kernels compiling for AMD GPUs: Avoid problematic use of templates on Metal by making `gpu_parallel_active_index_array` a wrapper macro, and moving `blocksize` to be a macro parameter. Reviewed By: brecht Differential Revision: https://developer.blender.org/D14081	2022-02-11 22:52:48 +00:00
Brecht Van Lommel	9cfc7967dd	Cycles: use SPDX license headers * Replace license text in headers with SPDX identifiers. * Remove specific license info from outdated readme.txt, instead leave details to the source files. * Add list of SPDX license identifiers used, and corresponding license texts. * Update copyright dates while we're at it. Ref D14069, T95597	2022-02-11 17:47:34 +01:00
William Leeson	ae44070341	Cycles: explicitly skip self-intersection Remember the last intersected primitive and skip any intersections with the same primitive. Ref D12954	2022-01-26 17:51:05 +01:00
Brecht Van Lommel	d68ce0e475	Cycles: add pointcloud implementation for Metal RT This is not currently working, with an internal compiler error. However we are currently using BVH2 instead of Metal RT. So this has no effect for users, it's being committed to avoid the code getting outdated. Ref T92573, T92212 Differential Revision: https://developer.blender.org/D13632	2022-01-21 14:42:27 +01:00
Brecht Van Lommel	0cf2fafd81	Fix T94050, T94570, T94527: Cycles Bevel and AO nodes not working with Metal Workaround what may be a compiler bug, solution found by Michael Jones.	2022-01-13 10:40:41 +01:00
Michael Jones	efe3d60a2c	Cycles: Fix Metal build This patch fixes a couple of new Metal kernel compilation errors: 1) a kernel parameter count overflow, and 2) missing address space qualifiers. Reviewed By: brecht Differential Revision: https://developer.blender.org/D13763	2022-01-07 16:19:31 +00:00
Patrick Mours	8393ccd076	Cycles: Add OptiX temporal denoising support Enables the `bpy.ops.cycles.denoise_animation()` operator again and modifies it to support temporal denoising with OptiX. This requires renders that were done with both the "Vector" and "Denoising Data" passes. Differential Revision: https://developer.blender.org/D11442	2022-01-05 15:58:36 +01:00
Brecht Van Lommel	e2e7f7ea52	Fix Cycles OptiX crash with 3D curves after point cloud changes Includes refactoring to reduce the number of bits taken by primitive types, so they more easily fit in the OptiX limit.	2021-12-20 14:14:43 +01:00
Brecht Van Lommel	35b1e9fc3a	Cycles: pointcloud rendering This add support for rendering of the point cloud object in Blender, as a native geometry type in Cycles that is more memory and time efficient than instancing sphere meshes. This can be useful for rendering sand, water splashes, particles, motion graphics, etc. Points are currently always rendered as spheres, with backface culling. More shapes are likely to be added later, but this is the most important one and can be customized with shaders. For CPU rendering the Embree primitive is used, for GPU there is our own intersection code. Motion blur is suppored. Volumes inside points are not currently supported. Implemented with help from: * Kévin Dietrich: Alembic procedural integration * Patrick Mourse: OptiX integration * Josh Whelchel: update for cycles-x changes Ref T92573 Differential Revision: https://developer.blender.org/D9887	2021-12-16 20:54:04 +01:00
Michael Jones	9558fa5196	Cycles: Metal host-side code This patch adds the Metal host-side code: - Add all core host-side Metal backend files (device_impl, queue, etc) - Add MetalRT BVH setup files - Integrate with Cycles device enumeration code - Revive `path_source_replace_includes` in util/path (required for MSL compilation) This patch also includes a couple of small kernel-side fixes: - Add an implementation of `lgammaf` for Metal [Nemes, Gergő (2010), "New asymptotic expansion for the Gamma function", Archiv der Mathematik](https://users.renyi.hu/~gergonemes/) - include "work_stealing.h" inside the Metal context class because it accesses state now Ref T92212 Reviewed By: brecht Maniphest Tasks: T92212 Differential Revision: https://developer.blender.org/D13423	2021-12-07 15:52:21 +00:00
Campbell Barton	ac447ba1a3	Cleanup: clang-format, trailing space	2021-11-30 10:15:17 +11:00
Brecht Van Lommel	4fac3be146	Fix Cycles OptiX doing a bit too much work for almost opaque curve shadows Found in D13353, likely has no significant impact in performance.	2021-11-29 18:41:37 +01:00
Michael Jones	f613c4c095	Cycles: MetalRT support (kernel side) This patch adds MetalRT support to Cycles kernel code. It is mostly additive in nature or confined to Metal-specific code, however there are a few areas where this interacts with other code: - MetalRT closely follows the Optix implementation, and in some cases (notably handling of transforms) it makes sense to extend Optix special-casing to MetalRT. For these generalisations we now have `__KERNEL_GPU_RAYTRACING__` instead of `__KERNEL_OPTIX__`. - MetalRT doesn't support primitive offsetting (as with `primitiveIndexOffset` in Optix), so we define and populate a new kernel texture, `__object_prim_offset`, containing per-object primitive / curve-segment offsets. This is referenced and applied in MetalRT intersection handlers. - Two new BVH layout enum values have been added: `BVH_LAYOUT_METAL` and `BVH_LAYOUT_MULTI_METAL_EMBREE` for XPU mode). Some host-side enum case handling has been updated where it is trivial to do so. Ref T92212 Reviewed By: brecht Maniphest Tasks: T92212 Differential Revision: https://developer.blender.org/D13353	2021-11-29 15:20:26 +00:00

1 2

94 Commits