Reduce the register pressure and branching in the switch() by using a
subclass and casting from void* to the base class.
This ensures intersection functions are not inlined multiple times,
bringing performance back.
An alternative would be to avoid functions (they are quite large), but
that only partially resolves the performance regression.
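A minimal sketch of the idea, not the actual kernel code. All type, field and
function names below are hypothetical. Two payload structs derive from a common
base, so one shared function can work on the base after a single cast from
void*, instead of a switch() that casts to each concrete type and inlines the
large function once per case.

```cpp
#include <cassert>

// Hypothetical payload layout; names are illustrative only.
struct BasePayload {
  float t_max;
  int num_hits;
};

struct RayPayloadSketch : BasePayload {
  int prim_type;
};

struct ShadowPayloadSketch : BasePayload {
  float throughput;
};

// One shared implementation that only needs the base fields.
// With a per-type switch() this body would be instantiated once per case.
inline bool record_hit(BasePayload &payload, const float t)
{
  if (t >= payload.t_max) {
    return false;
  }
  payload.t_max = t;
  payload.num_hits++;
  return true;
}

// Convert to the base pointer first so the round trip through void* is exact.
inline void *make_opaque(BasePayload *payload)
{
  return payload;
}

// The caller passes an opaque pointer. Casting void* to the base class is
// valid here because the pointer was created from a BasePayload*.
inline bool handle_hit(void *opaque_payload, const float t)
{
  assert(opaque_payload != nullptr);
  BasePayload *payload = static_cast<BasePayload *>(opaque_payload);
  return record_hit(*payload, t);
}
```

In this sketch the same handle_hit() serves both payload types, and no branch
on the payload type is needed.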
Pull Request: https://projects.blender.org/blender/blender/pulls/136823
Instead of relying on the Intel extensions that may not be implemented,
we can use max_work_group_size until there is a better alternative.
Thanks to Codeplay for this proposal.
Co-authored-by: Georgi Mirazchiyski <georgi.mirazchiyski@codeplay.com>
An area light with a size of zero should not contribute to the scene, so
set the light as disabled.
This not only fixes the reported bug where such a light is visible to
the camera, but also a regression in 4.2 where the light contributes to
the scene when the light tree is off.
Pull Request: https://projects.blender.org/blender/blender/pulls/136763
This API is not properly implemented in other SYCL backends at the
moment and we don't want it to fail at runtime, so we conservatively
enable it only for Level-Zero.
The initial issues that led to the choice of forcing the use of
linker.exe seem gone and there is currently no strong reason to use
linker.exe explicitly, so let's simplify and use the default setting.
This makes it available in Cycles standalone, and the implementation
can be shared with Blender. This also makes it possible to compute
tangents after tessellation for adaptive subdivision.
There is a difference in UV map tangents when there are no UVs. They
are now generated from object space coordinates instead of auto
texture space coordinates. This is more efficient, and a corner case
that we don't have to keep compatible.
Co-authored-by: Brecht Van Lommel <brecht@blender.org>
Pull Request: https://projects.blender.org/blender/cycles/pulls/25
Add support for OSL parameter metadata named `defaultgeomprop`, whose
values are interpreted the same way as the property on MaterialX node
inputs. When set to `Tworld` the tangent is then automatically linked
to the shader and generated for the mesh.
Pull Request: https://projects.blender.org/blender/cycles/pulls/25
The initial limitation that prevented using -ffast-math, worked around
in 09df1f4caf, has been fixed upstream in LLVM,
and the fix is part of the current DPC++ compiler:
63ecd2a725
We're now able to go back to using -ffast-math, which helps simplify
the set of compiler flags.
No performance or conformance change is expected from this change (most
of the gain is already achieved with the use of -cl-fast-relaxed-math
since 284b89a0a3), and this has been
verified on Arc B580 under Windows.
Instead of returning 0 in case the Intel extension for getting the count
of Execution Units isn't available, we now use
sycl::info::device::max_compute_units.
We still prefer the Intel extension when it is available, since it logically goes
with sycl::ext::intel::info::device::gpu_hw_threads_per_eu used in
get_max_num_threads_per_multiprocessor(), for which there is no
sycl::info::device::max_threads_per_compute_unit replacement yet.
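The fallback order can be sketched as follows. To stay compilable without a
SYCL toolchain, the sketch takes the query results as plain values:
`intel_eu_count` stands for the result of the Intel extension query and
`max_compute_units` for sycl::info::device::max_compute_units. The function and
parameter names are hypothetical.

```cpp
#include <cassert>

// has_intel_eu_count: whether the Intel extension query is available on the
//                     device (an input of this sketch, not a real API call).
// intel_eu_count:     value reported by the Intel extension, if available.
// max_compute_units:  value of sycl::info::device::max_compute_units.
inline int pick_num_multiprocessors(const bool has_intel_eu_count,
                                    const int intel_eu_count,
                                    const int max_compute_units)
{
  // The sketch expects a positive count from the standard query.
  assert(max_compute_units > 0);
  if (has_intel_eu_count) {
    /* Preferred: pairs with gpu_hw_threads_per_eu. */
    return intel_eu_count;
  }
  /* Fallback instead of returning 0. */
  return max_compute_units;
}
```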
The debug set of Embree prebuilt libraries currently lacks SYCL support
while the release ones have it.
This case was not gracefully handled for debug builds with Embree on GPU
enabled, leading to linker errors when trying to resolve rtcNewSYCLDevice
and rtcIsSYCLDeviceSupported.
We now test for this case to explicitly disable the use of Embree on GPU
for debug builds on Windows and print this status from CMake.
This fixes the following warning with MSVC:
device_impl.cpp(287): warning C4805: '|=': unsafe mix of type 'bool' and type 'ccl::uint' in operation
A similar fix is applied to the Metal code as well.
There is no short-circuiting boolean operator ||=, so expand the expression.
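A minimal sketch of the pattern, with a hypothetical flag name and function
(this is not the code at device_impl.cpp(287)):

```cpp
#include <cassert>

// Hypothetical flag mask, for illustration only.
constexpr unsigned int FLAG_HAS_VOLUME = 1u << 3;

inline bool any_has_volume(const unsigned int *flags, const int num)
{
  assert(flags != nullptr || num == 0);
  bool has_volume = false;
  for (int i = 0; i < num; i++) {
    // `has_volume |= flags[i] & FLAG_HAS_VOLUME;` mixes bool and unsigned int,
    // which is the kind of expression MSVC reports as C4805.
    // There is no `||=`, so the expression is written out in full.
    has_volume = has_volume || ((flags[i] & FLAG_HAS_VOLUME) != 0);
  }
  return has_volume;
}
```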
Pull Request: https://projects.blender.org/blender/blender/pulls/136561
HIP-RT functions do have access to kg, and it was used inconsistently:
some functions were passed the actual kg, others were passed nullptr.
This change makes it consistent and passes kg everywhere.
Pull Request: https://projects.blender.org/blender/blender/pulls/136503
The code before this change relied on ShadowPayload having
the same "header" as RayPayload for some of the primitive types
(curve, motion triangle, point): intersection functions were shared
between "regular" and shadow rays (shadow in this case is shadow_all),
but an extra filter function was used for shadow rays.
This is fragile if someone changes one of these structures. What is
worse is that the compiler might actually decide to shuffle things in
some structs, or remove unused fields.
This change also solves confusion about ShadowPayload::prim_type
seemingly only being assigned to PRIMITIVE_NONE. With time it is
not impossible that the compiler will also see this, and constant-fold
some checks, or even remove the field. If that happens then the
render result will be wrong. Maybe it is already happening, as there
are some GPU, driver, and optimization flag specific bugs in the
area.
It is unclear whether it was causing any actual problem: W7800
seems to render all hair correctly on Linux.
Also make some style decisions more consistent: for example,
the way the stop/continue search return value is commented.
Prefer using less vertical space for those.
Mainly for readability purposes:
- Having variables called local_payload is ambiguous: does it refer to
the LocalPayload type or to a variable being local to a function?
- Some of the functions are used for different ray types, so having the
type cast in intersectFunc and filterFunc makes it easier to scan.
For the latter: now it is more obvious that Curve_Intersect_Shadow
expects RayPayload, but Curve_Filter_Shadow expects ShadowPayload.
It might not be a problem currently as ShadowPayload has the same
"header" as RayPayload, but it might change in the future. Also, the compiler
might optimize fields out from one but not from the other.
There is a known precision bug in the current HIP compiler version (RDNA2
family/Windows). It has already been fixed, and the fix will be available in
a future HIP SDK release. Enabling more precise math prevents the artifacts.
This may cause a 5-10% performance drop in some scenes.
Fix #136138: Microfacet BSDF
Fix #136449: Hair BSDF
Pull Request: https://projects.blender.org/blender/blender/pulls/136341
The new correction avoids washed out areas near the shadow terminator,
preserving more detail from normal and bump maps.
It implements the method from the paper "A Microfacet-Based Shadowing
Function to Solve the Bump Terminator Problem" by Alejandro Conty Estevez,
Pascal Lecocq, and Clifford Stein.
Pull Request: https://projects.blender.org/blender/blender/pulls/135380
This makes it possible to restore the previous Blender 4.3 behavior of bump
mapping, where the large filter width was sometimes (ab)used to get a
bevel-like effect on stepwise textures.
For bump from the displacement socket, filter width remains fixed at 0.1.
Ref #133991, #135841
Pull Request: https://projects.blender.org/blender/blender/pulls/136465
At the user level, spatial splits on hair BVH lead to very long build times,
without giving much advantage in render times.
There are also some issues and possibly bugs in the builder which lead to all
sorts of numerical issues (like divisions by zero). There are also performance
issues that come from the fact that the alignment space is applied every time
a primitive's aligned bounds are requested. It also seems that the splitting
might not be considering aligned space consistently when calculating SAH and
performing splits.
Ideally these issues would get fixed, but the importance of
BVH2 is fading as HW-RT becomes more and more popular.
This change contains the fix needed for the split algorithm to avoid the numerical
issue reported by UBSAN when rendering the `BVH2 particle simple.blend` from
#126508.
Ref #126508
Ref #136245
Pull Request: https://projects.blender.org/blender/blender/pulls/136430
* Perform attribute interpolation as part of dicing.
* Remove temporary subd uv and face index attributes.
On a MacBook M3 with 12 P-cores and 4 E-cores, these changes overall give
a 10x-14x speedup on various scenes. Note that splitting is still single
threaded and can be expensive, and UV subdivision can be optimized more.
Pull Request: https://projects.blender.org/blender/blender/pulls/136411
* Move dicing out of DiagSplit, caller now uses EdgeDice
* Merge, rename and reorder various EdgeDice functions
* Compute triangle indices for subpatches in advance
Pull Request: https://projects.blender.org/blender/blender/pulls/136411
The transparent bounce test was too optimistic with regard to the intersection
being considered. The check needs to happen after it has been validated that
the intersection is not a duplicate.
It was already the case for Metal and HIP-RT, but not for Embree and BVH2.
Tests updated by: Alaska <Alaskayou01@gmail.com>
Pull Request: https://projects.blender.org/blender/blender/pulls/136325
This happens because when spatial splits are used,
the same intersection can be recorded twice (via different BVH nodes).
This change introduces a check for the intersection being already recorded,
similar to the check in the local BVH. The check is done during BVH
intersection, which makes it possible to properly ignore intersections even for the
maximum bounce number check. A faster approach would be to do such
filtering after sorting, but then we cannot keep the bounce check in the
BVH code consistent with and without spatial splits.
Intuitively it seems that it should be possible to merge the new loop
with the one that checks for which intersection to keep. But it is not
so trivial in practice: it doesn't run for all intersections, and also
it is formulated in a way that updates isect_index for the next record.
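A minimal sketch of a duplicate check done at record time. The struct layout,
the names, and the choice to identify a duplicate by matching primitive and
object indices are assumptions of this sketch, not the actual kernel code.

```cpp
#include <cassert>

// Hypothetical, simplified intersection record.
struct IsectSketch {
  float t;
  int prim;
  int object;
};

constexpr int MAX_RECORDED_HITS = 8;

struct IsectListSketch {
  IsectSketch hits[MAX_RECORDED_HITS];
  int num_hits = 0;
};

// Linear scan over the hits recorded so far.
inline bool is_already_recorded(const IsectListSketch &list, const IsectSketch &isect)
{
  assert(list.num_hits <= MAX_RECORDED_HITS);
  for (int i = 0; i < list.num_hits; i++) {
    if (list.hits[i].prim == isect.prim && list.hits[i].object == isect.object) {
      return true;
    }
  }
  return false;
}

// Returns true only when the hit was newly recorded, so that the caller can
// apply per-hit checks (such as a bounce limit) to unique hits only.
inline bool try_record_unique(IsectListSketch &list, const IsectSketch &isect)
{
  if (is_already_recorded(list, isect)) {
    return false;
  }
  if (list.num_hits == MAX_RECORDED_HITS) {
    // Overflow handling is out of scope for this sketch; the hit is dropped.
    return false;
  }
  list.hits[list.num_hits++] = isect;
  return true;
}
```

Because the check runs before a hit is counted, a hit reached through two BVH
nodes is counted once in this sketch.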
Pull Request: https://projects.blender.org/blender/blender/pulls/136251
This issue only affects profiling mode (`CYCLES_METAL_PROFILING=1`). There's a modest limit to the number of concurrent counter sampling buffers per device, so instead of creating one per device queue, we create one per device that can be reused by successive device queues.
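The ownership change can be sketched in plain C++ as below. The class names and
the lazily created shared buffer are hypothetical stand-ins, not the Metal
objects used by the device code.

```cpp
#include <cassert>
#include <memory>

// Stand-in for a counter sampling buffer; the capacity value is arbitrary.
struct CounterSampleBufferSketch {
  int capacity = 4096;
};

class DeviceSketch {
 public:
  // Create the buffer on first use and hand the same one to every queue.
  std::shared_ptr<CounterSampleBufferSketch> get_counter_sample_buffer()
  {
    if (!buffer_) {
      buffer_ = std::make_shared<CounterSampleBufferSketch>();
    }
    return buffer_;
  }

 private:
  std::shared_ptr<CounterSampleBufferSketch> buffer_;
};

class QueueSketch {
 public:
  // The queue borrows the per-device buffer instead of creating its own.
  explicit QueueSketch(DeviceSketch &device) : buffer_(device.get_counter_sample_buffer())
  {
    assert(buffer_ != nullptr);
  }

  const CounterSampleBufferSketch *buffer() const
  {
    return buffer_.get();
  }

 private:
  std::shared_ptr<CounterSampleBufferSketch> buffer_;
};
```

In this sketch the number of buffers grows with the number of devices, not with
the number of queues.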
Authored by Emma Liu.
Pull Request: https://projects.blender.org/blender/blender/pulls/136248
The code that checked whether a local intersection is to be
recorded, and under which index, was duplicated for triangles,
motion triangles, and the HIP-RT triangle filter function.
This change moves the common logic to a utility function which
is reused from all the places mentioned above.
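A sketch of what such a shared helper could look like. The names, the two
recording modes and the random replacement rule are assumptions made for
illustration; the actual function may differ.

```cpp
#include <cassert>

constexpr int LOCAL_MAX_HITS = 4;

// Hypothetical, simplified record list: only the hit distances are stored.
struct LocalIsectSketch {
  float t[LOCAL_MAX_HITS];
  int num_hits = 0;
};

// Returns the index at which the caller should store a hit at distance `t`,
// or -1 when the hit is to be skipped. `random` is supplied by the caller.
inline int local_record_index(LocalIsectSketch &isect,
                              const float t,
                              const int max_hits,
                              const bool record_all,
                              const unsigned int random)
{
  assert(max_hits > 0 && max_hits <= LOCAL_MAX_HITS);
  if (!record_all) {
    // Closest-hit mode: keep a single record at index 0.
    if (isect.num_hits != 0 && t > isect.t[0]) {
      return -1;
    }
    isect.num_hits = 1;
    return 0;
  }
  // Skip hits that were already recorded at the same distance.
  const int num_recorded = isect.num_hits < max_hits ? isect.num_hits : max_hits;
  for (int i = 0; i < num_recorded; i++) {
    if (isect.t[i] == t) {
      return -1;
    }
  }
  isect.num_hits++;
  if (isect.num_hits <= max_hits) {
    return isect.num_hits - 1;
  }
  // More hits than slots: pick a random slot, or skip the hit if the pick
  // falls outside of the stored range.
  const int index = int(random % (unsigned int)isect.num_hits);
  return index < max_hits ? index : -1;
}
```

Each call site (triangles, motion triangles, a filter function) would then
call the helper and store its hit only when the returned index is not -1.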
Pull Request: https://projects.blender.org/blender/blender/pulls/136244
Use the mesh wrapper mechanism from GPU subdivision to get the base mesh.
This can significantly reduce memory usage and render setup time if the
level was not manually set to zero.
Pull Request: https://projects.blender.org/blender/blender/pulls/135895