griefith/test

Author	SHA1	Message	Date
Weizhen Huang	1667d69d3b	Cleanup: Cycles: use `constexpr` in kernel instead of lambda and macro guard. Should be possible after `ce0ae95ed3` Pull Request: https://projects.blender.org/blender/blender/pulls/143723	2025-08-01 14:06:13 +02:00
Hugh Delaney	930a942dd0	Refactor: Cycles: Move block sizes into common header This change puts all the block size macros in the same common header, so they can be included in host side code without needing to also include the kernels that are defined in the device headers that contained these values. This change also removes a magic number used to enqueue a kernel, which happened to agree with the GPU_PARALLEL_SORT_BLOCK_SIZE macro. Pull Request: https://projects.blender.org/blender/blender/pulls/143646	2025-08-01 13:26:02 +02:00
Patrick Mours	6487395fa5	Cycles: Add linear curve shape Add new "Linear 3D Curves" option in the Curves panel in the render properties. This renders curves as linear segments rather than smooth curves, for faster render time at the cost of accuracy. On NVIDIA Blackwell GPUs, this can give a 6x speedup compared to smooth curves, due to hardware acceleration. On NVIDIA Ada there is still a 3x speedup, and CPU and other GPU backends will also render this faster. A difference with smooth curves is that these have end caps, as this was simpler to implement and they are usually helpful anyway. In the future this functionality will also be used to properly support the CURVE_TYPE_POLY on the new curves object. Pull Request: https://projects.blender.org/blender/blender/pulls/139735	2025-07-29 17:05:01 +02:00
Michael Jones	f3485cc925	Cycles: MetalRT: Only use extended limits if needed (revisited) Currently MetalRT renders always use extended limits, which is needed to correctly render scenes where the max primitive count can exceed 2^28 or the instance count can exceed 2^24. This patch adopts Metal best practices of only enabling this flag if it is needed. This PR is similar to #133364, but there are some notable differences: 1) The old PR made an overly optimistic assumption that all the relevant visibility bits could be squeezed into 8 bits. This new PR adopts the same approach that Optix takes of using 8 bits as a primary HW filter, and checking the full 32 bit mask inside the SW intersection handler. ~~2) I moved the scene scanning check from Scene into MetalDevice. This avoids platform specific details leaking into platform agnostic areas.~~ ~~3) In live viewport mode, we always use extended limits in case we tip over the threshold.~~ _EDIT:_ 2) The limits are scanned in `Scene::update_kernel_features`, and given to the device by a new `set_bvh_limits` method which returns true if the BVH and kernels need to be reloaded. Pull Request: https://projects.blender.org/blender/blender/pulls/142401	2025-07-24 13:27:20 +02:00
Thomas Dinges	ce0ae95ed3	Cycles: Bump minimum supported CUDA architecture to sm_50 Pull Request: https://projects.blender.org/blender/blender/pulls/142212	2025-07-21 19:49:21 +02:00
Nikita Sirgienko	9875836519	Cycles: oneAPI: Compile only needed device binaries in multi-GPU case Before this change, the "oneapi_load_kernels" function loaded kernels and compiled them, if needed, for all devices in the associated GPU context. This makes sense when rendering on a single GPU or on multiple identical GPUs, but when Blender users render with several different GPUs, the previous implementation compiled all kernels for all devices once per device, unnecessarily doing the same work multiple times. Because of this, I am changing the implementation so that each used device now compiles kernels only for itself, ensuring that no unnecessary work is done. No render performance changes are expected.	2025-07-19 14:15:36 +02:00
Michael Jones	8077384e3a	Cycles: Improve Metal kernel specialisation This improves the existing scene specialisation mechanism by replacing "kernel_data.kernel_features" with a function constant. It doesn't cause any additional compilation requests, but allows the backend compiler to eliminate more dead code. An additional compiler hint is provided for dead-stripping "volume_stack_enter_exit" which results in slightly faster rendering of non-volumetric scenes. Pull Request: https://projects.blender.org/blender/blender/pulls/142235	2025-07-18 11:18:43 +02:00
Brecht Van Lommel	4c25b49875	Refactor: Cycles: Deduplicate 3D texture sampling between devices Pull Request: https://projects.blender.org/blender/blender/pulls/132908	2025-07-09 21:04:38 +02:00
Brecht Van Lommel	b6c4233b28	Refactor: Cycles: Remove now unused 3D image texture support Pull Request: https://projects.blender.org/blender/blender/pulls/132908	2025-07-09 21:04:38 +02:00
Brecht Van Lommel	7978799e6f	Cycles: Always render volume as NanoVDB All GPU backends now support NanoVDB, using our own kernel side code that is easily portable. This simplifies kernel and device code. Volume bounds are now built from the NanoVDB grid instead of OpenVDB, to avoid having to keep around the OpenVDB grid after loading. While this reduces memory usage, it does have a performance impact, particularly for the Cubic filter. That will be addressed by another commit. Pull Request: https://projects.blender.org/blender/blender/pulls/132908	2025-07-09 21:04:38 +02:00
Michael Jones	b4be954856	Cycles: Simplify Metal backend with direct bindless resource encoding This PR is a more extensive follow on from #123551 (removal of AMD and Intel GPU support). All supported Apple GPUs have Metal 3 and tier 2 argument buffer support. The invariant resource properties `gpuAddress` and `gpuResourceID` can be written directly into GPU structs once at setup time rather than once per dispatch. More background info can be found in [this article](https://developer.apple.com/documentation/metal/improving-cpu-performance-by-using-argument-buffers?language=objc). Code changes: - All code relating to `MTLArgumentEncoder` is removed - `KernelParamsMetal` updates are directly written into `id<MTLBuffer> launch_params_buffer` which is used for the "static" dispatch arguments - Dynamic dispatch arguments are small enough to be encoded using the `MTLComputeCommandEncoder.setBytes` function, eliminating the need for cycling temporary arg buffers Pull Request: https://projects.blender.org/blender/blender/pulls/140671	2025-07-08 23:20:16 +02:00
Weizhen Huang	2f7797dd4d	Merge branch 'blender-v4.5-release'	2025-06-20 14:20:00 +02:00
weizhen	bf9836da65	Fix: Cycles not building with OptiX 9.0 As suggested by @pmoursnv Was throwing errors like `identifier "half" is undefined`. Pull Request: https://projects.blender.org/blender/blender/pulls/140676	2025-06-20 14:19:43 +02:00
Brecht Van Lommel	7f380e0644	Revert "Fix: Cycles: Do not count volume bounds bounce as transparent" This reverts commit `23c762e388` in the blender-v4.5-release branch to work around HIP compiler issues. It will remain in the main branch. Ref blender/blender#139836	2025-06-11 15:47:07 +02:00
Brecht Van Lommel	04e325029f	Revert "Cycles: Guiding cleaning up and refactoring the guiding code" This reverts commit `5abf42012d` in the blender-v4.5-release branch to work around HIP compiler issues. It will remain in the main branch. Ref blender/blender#139836	2025-06-11 15:47:06 +02:00
Brecht Van Lommel	501b4641f6	Revert "Cleanup: Unused arguments in Cycles kernel" This reverts commit `0e7a696819` in the blender-v4.5-release branch to work around HIP compiler issues. It will remain in the main branch. Ref blender/blender#139836	2025-06-11 15:47:06 +02:00
Campbell Barton	07121d44ae	Cleanup: use braces (follow own style guide)	2025-06-11 09:05:26 +00:00
Lukas Stockner	39d7576844	Cycles: Switch OptiX OSL to use LLVM bitcode for shadeops This is required to make ray differentials work correctly for OSL custom cameras. But it also lets us simplify the implementation, and makes the OSL functionality more complete, such as implementing all noise types. Pull Request: https://projects.blender.org/blender/blender/pulls/138161	2025-06-03 20:12:07 +02:00
Nikita Sirgienko	69091c5028	Cycles: Show device optimizations status in preferences for oneAPI With these changes, we can now mark devices which are expected to work as performant as possible, and devices which were not optimized for some reason. For example, because the device was released after the Blender release, making it impossible for developers to optimize for devices in already released unchangeable code. This is primarily relevant for the LTS versions, which are supported for two years and require proper communication about optimization status for the new devices released during this time. This is implemented for oneAPI devices. Other device types currently are marked as optimized for compatibility with old behavior, but may implement the same in the future. Pull Request: https://projects.blender.org/blender/blender/pulls/139751	2025-06-03 20:07:52 +02:00
Brecht Van Lommel	0e7a696819	Cleanup: Unused arguments in Cycles kernel And add back the compiler flag that hid them. Pull Request: https://projects.blender.org/blender/blender/pulls/139497	2025-05-27 21:30:45 +02:00
Sebastian Herholz	5abf42012d	Cycles: Guiding cleaning up and refactoring the guiding code In detail: - Direct accesses of state attributes are replaced with the INTEGRATOR_STATE and INTEGRATOR_STATE_WRITE macros. - Unified the checks for the __PATH_GUIDING define to use # if defined (__PATH_GUIDING__). - Even if __PATH_GUIDING__ is defined, we now check if the feature is enabled using if ((kernel_data.kernel_features & KERNEL_FEATURE_PATH_GUIDING)) {. This is important for later GPU ports. - The kernel usage of the guiding field, surface, and volume sampling distributions is wrapped behind macros for each specific device (atm only CPU). This will make it easier for a GPU port later.	2025-05-22 13:46:30 +02:00
Nikita Sirgienko	54766b6a54	Cycles: Introducing the code for adoption of Embree 4.4 Embree 4.4 introduces an improvement in the Embree GPU implementation by dropping shared memory usage in favor of directly controllable memory transfers. This should allow addressing several problems spotted in Blender regarding multithreading and memory corruption when BVH building and rendering happen at the same time. However, to implement such improvements, the API has changed for several functions, and this commit adapts Blender code to these changes, making Blender buildable and functional with all existing Embree 4.X versions, before and after 4.4. No functional changes in Blender behavior are expected if using Embree versions below 4.4. Pull Request: https://projects.blender.org/blender/blender/pulls/139061	2025-05-19 11:25:50 +02:00
Brecht Van Lommel	2c99edbffa	Cycles: Bump Embree minimum version to 4.0.0 The build is already failing with Embree 3, as noticed in #137556. And Embree 4 was released 2 years ago. Pull Request: https://projects.blender.org/blender/blender/pulls/138221	2025-04-30 19:50:14 +02:00
Lukas Stockner	0dc4754da4	Cycles: Move OptiX OSL Camera kernel into its own PTX module On the one hand, this improves initialization time since we don't need to load/compile the full OSL module with all the shading logic if we're only using a custom camera with SVM shading. On the other hand, it also fixes a bug I noticed while preparing test scenes: The AO and Bevel nodes don't work when using custom cameras with SVM on OptiX. The issue there is that those two are handled by the SHADE_SURFACE_RAYTRACE kernel, but since that one has intersection logic, we use the OptiX-specific kernel even if OSL shading is disabled. However, with the previous unified OSL module, this would mean loading SHADE_SURFACE_RAYTRACE from kernel_osl.cu, which has `#undef __SVM__` and therefore doesn't handle them correctly. With this change, we'll use the kernels from kernel_shader_raytrace.cu in that case, which do support SVM nodes just fine. Disk usage of the new kernel_optix_osl_camera.ptx.zst file is 30KB, so this also doesn't blow up the kernel disk size (and kernel_optix_osl.ptx.zst is probably smaller by that amount now). Since it seems that we can mix modules just fine, I'm suspecting that we could split the modules properly (intersection, SVM shading with raytracing, OSL shading, OSL camera), instead of the current approach where modules essentially correspond to feature set tiers and each includes the previous one's kernels as well - but that's a separate refactor. Pull Request: https://projects.blender.org/blender/blender/pulls/138021	2025-04-28 12:49:35 +02:00
Campbell Barton	682e5e3597	Cleanup: spelling in comments (make check_spelling_*)	2025-04-26 00:48:04 +00:00
Lukas Stockner	bf412ed9dd	Cycles: Support for custom OSL cameras This allows users to implement arbitrary camera models using OSL by writing shaders that take an image position as input and compute ray origin and direction. The obvious applications for this are e.g. panorama modes, lens distortion models and realistic lens simulation, but the possibilities are endless. Currently, this is only supported on devices with OSL support, so CPU and OptiX. However, it is independent from the shading model used, so custom cameras can be used without getting the performance hit of OSL shading. A few samples are provided as Text Editor templates. One notable current limitation (in addition to the limited device support) is that inverse mapping is not supported, so Window texture coordinates and the Vector pass will not work with custom cameras. Pull Request: https://projects.blender.org/blender/blender/pulls/129495	2025-04-25 19:27:30 +02:00
Weizhen Huang	23c762e388	Fix: Cycles: Do not count volume bounds bounce as transparent In forward path tracing, when we pass volume bounding meshes, we accumulate `volume_bounds_bounce`. We should match this behaviour in NEE instead of accumulating `transparent_bounce`. Pull Request: https://projects.blender.org/blender/blender/pulls/137556	2025-04-24 13:10:33 +02:00
Sergey Sharybin	36559fd89f	Fix #136811 : HIP-RT performance regression in 4.5 Reduce the register pressure and branching in the switch() by using subclass and cast from void* to the base class. This ensures intersection functions are not inlined multiple times, bringing performance back. Alternative could be to avoid functions (they are quite large) but that only partially resolves the performance regression. Pull Request: https://projects.blender.org/blender/blender/pulls/136823	2025-04-01 17:59:44 +02:00
Campbell Barton	42ad772a1f	Cleanup: spelling & repeated terms (make check_spelling_*) Also use comment blocks for English text.	2025-03-27 01:13:34 +00:00
Sergey Sharybin	2ab231d802	Refactor: Pass proper KernelGlobals HIP-RT functions do have access to kg, and it was used inconsistently: some functions were passed actual kg, other were passed nullptr. This change makes it consistent and passes kg everywhere. Pull Request: https://projects.blender.org/blender/blender/pulls/136503	2025-03-26 11:07:06 +01:00
Sergey Sharybin	709371b278	Refactor: Avoid creation of local copy of RaySelfPrimitives	2025-03-26 11:07:04 +01:00
Sergey Sharybin	888c7e1df9	Cleanup: Avoid redundant data fetch	2025-03-26 11:07:04 +01:00
Sergey Sharybin	3d882acee2	Cleanup: Else after return	2025-03-26 11:07:04 +01:00
Sergey Sharybin	b2dd523d0d	Cleanup: Avoid default hit initialization The entire object is assigned later on, no need to initialize it.	2025-03-26 11:07:04 +01:00
Sergey Sharybin	323e27d825	Cleanup: Remove redundant assignment The payload stores pointers, no need to restore pointer of the function argument to the same value.	2025-03-26 11:07:04 +01:00
Sergey Sharybin	e92a8042c3	Refactor: Payload for shadow intersection and filter in HIP-RT The code before this change relied on ShadowPayload having the same "header" as RayPayload for some of the primitive types (curve, motion triangle, point): intersection functions were shared between "regular" and shadow rays (shadow in this case is shadow_all), but an extra filter function was used for shadow rays. This is fragile if someone changes one of these structures. What is worse is that the compiler might actually decide to shuffle things in some structs, or remove unused fields. This change also resolves confusion about ShadowPayload::prim_type seemingly only being assigned PRIMITIVE_NONE. With time it is not impossible that the compiler will also see this, and constant-fold some checks, or even remove the field. If that happens then the render result will be wrong. Maybe it is already happening, as there are some GPU, driver, and optimization flag specific bugs in the area. It is unclear whether it was causing any actual problem: W7800 seems to render all hair correctly on Linux.	2025-03-26 11:07:04 +01:00
Sergey Sharybin	cdb3f34944	Cleanup: Use full name for the primitive_type Makes it extra clear locally type of what the variable contains: primitive, ray, or something else.	2025-03-26 11:07:04 +01:00
Sergey Sharybin	72542f3bb4	Cleanup: Follow Blender style and use more const Also make some style decisions more consistent: for example, the way how stop/continue search return value is commented. Prefer lower vertical space for those.	2025-03-26 11:07:04 +01:00
Sergey Sharybin	bf9c95f164	Cleanup: Move payload type cast to caller in HIP-RT Mainly for readability purposes: - Having variables called local_payload is ambiguous: does it refer to the LocalPayload type or to a variable being local to a function? - Some of the functions are used for different ray types, so having the type cast in intersectFunc and filterFunc makes it easier to scan. For the latter: now it is more obvious that Curve_Intersect_Shadow expects RayPayload, but Curve_Filter_Shadow expects ShadowPayload. It might not be a problem currently as ShadowPayload has the same "header" as RayPayload, but it might change in the future. Also, the compiler might optimize fields out from one but not from the other.	2025-03-26 11:07:04 +01:00
Sergey Sharybin	3daaf21bab	Cleanup: Remove unused function argument in HIP-RT	2025-03-26 11:07:04 +01:00
Sergey Sharybin	5ce4e91a80	Fix #136319 : Incorrect transparent bounce count with spatial splits The transparent bounce test was too optimistic in regards to the intersection being considered. The check needs to happen after it has been validated that it is not duplicate. It was already the case for Metal and HIP-RT, but not for Embree and BVH2. Tests updated by: Alaska <Alaskayou01@gmail.com> Pull Request: https://projects.blender.org/blender/blender/pulls/136325	2025-03-22 04:51:42 +01:00
Sergey Sharybin	50180283e9	Fix #117527 : Spatial split leads to artifacts on transparent shadows This happens because, when spatial split is used, the same intersection can be recorded twice (via different BVH nodes). This change introduces a check for the intersection being already recorded, similar to the check in the local BVH. The check is done during BVH intersection, which allows intersections to be properly ignored even for the maximum bounce number check. A faster approach would be to do such filtering after sorting, but then we cannot keep the bounce check in the BVH code consistent with and without spatial splits. Intuitively it seems that it should be possible to merge the new loop with the one that checks which intersection to keep. But it is not so trivial in practice: it doesn't run for all intersections, and it is also formulated in a way that updates isect_index for the next record. Pull Request: https://projects.blender.org/blender/blender/pulls/136251	2025-03-21 13:56:50 +01:00
Sergey Sharybin	bf65b64708	Refactor: De-duplicate local intersection reservoir sampling logic The code which checks whether a local intersection is to be recorded, and under which index, was duplicated for triangles, motion triangles, and the HIP-RT triangle filter function. This change moves the common logic to a utility function which is reused from all the places mentioned above. Pull Request: https://projects.blender.org/blender/blender/pulls/136244	2025-03-20 17:19:31 +01:00
Sergey Sharybin	7165146fb2	Cleanup: More spelling fixes in comments	2025-03-20 10:37:09 +01:00
Sergey Sharybin	ae4f6026dc	Cleanup: Spelling in comments	2025-03-20 10:36:12 +01:00
Bastien Montagne	dd98cede18	Merge branch 'blender-v4.4-release'	2025-03-14 18:20:26 +01:00
Sahar A. Kashi	9ad3b74867	Fix: SSS and Motion Blur or Curves not working on HIP-RT This change fixes the remaining failing tests with SSS when using HIP-RT. This includes crash when SSS is used on curves, and objects with motion blur and SSS rendering black. The root cause for both cases was the fact that traversal was always assuming regular BVH (built for triangles), while curves and motion triangles are using custom primitives, which requires specialized BVH traversal. This change includes: - Early output from `scene_intersect_local()` for non-triangle and non-motion-triangle primitives. This fixes `sss_hair.blend` test, and also avoids unnecessary BVH traversal when the local intersection is requested from curve object. The same early-output could be added to other BVH traversal implementation. - Use `hiprtGeomCustomTraversalAnyHitCustomStack` for motion triangles primitives. This fixes motion blur on objects with SSS render black. Fixes #135856 Co-authored-by: Sahar A. Kashi <sahar.alipourkashi@amd.com> Co-authored-by: Sergey Sharybin <sergey@blender.org> Pull Request: https://projects.blender.org/blender/blender/pulls/135943	2025-03-14 18:17:54 +01:00
Sergey Sharybin	977a334f6f	Merge branch 'blender-v4.4-release'	2025-03-12 19:24:01 +01:00
Sergey Sharybin	a3eb0faa3f	Fix: Incorrect ray time used for HIP-RT local intersections It was always hard-coded to be 0. It does not seem to result in any extra tests passing, but they are probably not sophisticated enough. Noticed while looking into details for the #135856. Pull Request: https://projects.blender.org/blender/blender/pulls/135878	2025-03-12 19:23:38 +01:00
Xavier Hallade	90a10dcd50	Cycles: Adjust inlining attributes for oneAPI device Now ccl_device sets inlining and ccl_device_inline forces inlining. This matches more closely with what is currently done for cuda and metal backends. I've measured from 1% to 6% overall performance improvement in rendering benchmark scenes on Arc B580, as well as a small decrease in compile time.	2025-03-03 18:20:02 +01:00
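
Note on f3485cc925 (MetalRT extended limits): the message describes using 8 bits as a primary hardware filter and checking the full 32-bit mask inside the software intersection handler. The following is a minimal C++ sketch of that two-stage idea, not the Cycles or MetalRT code. All function names are hypothetical, and the byte-folding rule is an assumption made for illustration; the commit message does not say how the 8 bits are derived.

```cpp
#include <cstdint>

// Hypothetical folding rule (an assumption, not taken from Cycles): OR the
// four bytes of the 32-bit visibility mask together. Any bit shared by two
// 32-bit masks is also shared by their folded 8-bit masks, so the coarse
// test may give false positives but never false negatives.
constexpr std::uint8_t fold_to_8_bits(std::uint32_t mask)
{
  return static_cast<std::uint8_t>(
      (mask | (mask >> 8) | (mask >> 16) | (mask >> 24)) & 0xFFu);
}

// Stage 1: coarse test, standing in for the 8-bit hardware filter.
constexpr bool hardware_filter_passes(std::uint32_t ray_visibility,
                                      std::uint32_t object_visibility)
{
  return (fold_to_8_bits(ray_visibility) & fold_to_8_bits(object_visibility)) != 0;
}

// Stage 2: exact test on the full 32-bit masks, standing in for the check
// done in the software intersection handler.
constexpr bool software_filter_passes(std::uint32_t ray_visibility,
                                      std::uint32_t object_visibility)
{
  return (ray_visibility & object_visibility) != 0;
}

// A hit is accepted only if it survives both stages.
constexpr bool is_hit_accepted(std::uint32_t ray_visibility,
                               std::uint32_t object_visibility)
{
  return hardware_filter_passes(ray_visibility, object_visibility) &&
         software_filter_passes(ray_visibility, object_visibility);
}
```

With this folding, a ray mask of 0x100 and an object mask of 0x1 both fold to 0x01, so the coarse stage lets the candidate through and the exact 32-bit stage rejects it. This is why the second check is still needed after the 8-bit filter.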


329 Commits