This will be needed to determine if there are volumes in the scene, before
allocation passes to aid volume sampling.
The kernels are now also loaded in the middle of scene update, at a place
where kernel features are known but before the kernels are needed for
displacement and background light evaluation..
Updating the camera to final resolution for progressive refinement still
happens later.
Pull Request: https://projects.blender.org/blender/blender/pulls/137228
The crash was caused by an overflow in the opgl_path_segment_storage
array. It happened because shadow catcher paths would write to the
segments but not clear them. This made it so the next render loop
iteration for the main path starts with non-empty segments in the
guiding data.
Disable training when megakernel is called for the shadow catcher
state.
To ensure this issue is not forgotten when the guiding is ported to
GPU add asserts in the `guiding.h`. While it is a no-op for default
GPU kernels sometimes we do compile debug kernels. But also it acts
as a plain-text reminder to the future-us in working on the code.
There is now also an assert before the main path megakernel to help
catching such cases in the future.
Pull Request: https://projects.blender.org/blender/blender/pulls/137291
The goal is to reduce the affect of the fmod() used in the noise code,
which was initially reported in the comment:
https://projects.blender.org/blender/blender/pulls/119884#issuecomment-1258902
Basic idea is to benefit from SIMD vectorization on CPU.
Tested on Linux i9-11900K and macOS on M2 Ultra, in both cases performance
after this change is very close to what it could be with the fmod() commented
out (the call itself, `p = p + precision_correction`).
On macOS the penalty of fmod() was about 10%, on Linux it was closer to 30%
when built with GCC-13. With Linux builds from the buildbot it is more like 18%.
The optimization is only done for 3d and 4d noise. It might be possible to
gain some performance improvement for 1d and 2d cases, but the approach would
need to be different: we'd need to optimize scalar version fmodf(). Maybe
tricks with integer cast will be faster (since we are a bit optimistic in the
kernel and do not guarantee exact behavior in extreme cases such as NaN inputs).
Pull Request: https://projects.blender.org/blender/blender/pulls/137109
Commit be63ebd961
This is causing issues with CUDA kernel compilation in some setups, even
though the builbot is ok. Since this isn't yet working for oneAPI anyway,
revert all the changes to Cycles kernel compilation for now.
Use VERBATIM to ensure spaces inside command line arguments don't get
escaped automatically.
On Linux and Windows the oneAPI kernel compilation still has problems.
There is an apparent bug with single quote escaping in add_custom_command
which means it's not easy to use VERBATIM.
At the moment MNEE locks up Cycles, or has rendering artifacts on
RDNA4 GPUs on WIndows.
This commit disables MNEE on that configuration until a fix
is avaliable.
Pull Request: https://projects.blender.org/blender/blender/pulls/136980
Currently MetalRT interpolates transformation matrix on per-element basis
which leads to issues like #135659.
This change adds implementation of for decomposed (Scale/Rotate/Translate)
motion interpolation, matching behavior of BVH2 and other HW-RT.
This requires macOS 15 and Xcode 16 in order to use this interpolation.
On older platforms and compilers old interpolation is used.
Currently there is no changes on the user (by default) and it is only
available via CYCLES_METALRT_PCMI environment variable. This is because
there are some issues with complex motion paths that need to be looked
into. Having code available makes it easier to do further debugging.
Ref #135659
Authored by Emma Liu
Pull Request: https://projects.blender.org/blender/blender/pulls/136253
The current logic disables WITH_CYCLES_ONEAPI_BINARIES when ocloc is not
found, which is fine, but prevented building for other non-Intel SYCL
targets without (unnecessary) ocloc.
The fix here is to remove spir64_gen target when
WITH_CYCLES_ONEAPI_BINARIES is disabled, instead of forcing only spir64.
Reduce the register pressure and branching in the switch() by using
subclass and cast from void* to the base class.
This ensures intersection functions are not inlined multiple times,
bringing performance back.
Alternative could be to avoid functions (they are quite large) but
that only partially resolves the performance regression.
Pull Request: https://projects.blender.org/blender/blender/pulls/136823
Instead of relying on the Intel extensions that may not be implemented,
we can use max_work_group_size until there is a better alternative.
Thanks to Codeplay for this proposal.
Co-authored-by: Georgi Mirazchiyski <georgi.mirazchiyski@codeplay.com>
Area light with a size of zero should not contribute to the scene, so
set the light as disabled.
This does not only fix the reported bug where such light is visible to
the camera, but also a regression in 4.2 where the light contributes to
the scene when light tree is off.
Pull Request: https://projects.blender.org/blender/blender/pulls/136763
This API is not properly implemented in other SYCL backends at the
moment and we don't want it to fail at runtime, so we conservatively
enable it only for Level-Zero.
The initial issues that led to the choice of forcing the use of
linker.exe seem gone and there is currently no strong reason to use
linker.exe explicitly, so let's simplify and use the default setting.
This makes it available in Cycles standalone, and the implementation
can be shared with Blender. This also makes it possible to compute
tangents after tessellation for adaptive subdivision.
There is a difference in UV map tangents when there are no UVs. They
are now generated from object space coordinates instead of auto
texture space coordinates. This is more efficient, and a corner case
that we don't have to keep compatible.
Co-authored-by: Brecht Van Lommel <brecht@blender.org>
Pull Request: https://projects.blender.org/blender/cycles/pulls/25
Add support for OSL parameter metadata named `defaultgeomprop`, whose
values are interpreted the same way as the property on MaterialX node
inputs. When set to `Tworld` the tangent is then automatically linked
to the shader and generated for the mesh.
Pull Request: https://projects.blender.org/blender/cycles/pulls/25
The initial limitation preventing from using -ffast-math, worked around
in 09df1f4caf, got fixed upstream in LLVM
and the fix is part of current DPC++ compiler:
63ecd2a725
We're now able to go back to using -ffast-math, which helps simplifying
the set of compiler flags.
No performance nor conformance change is expected from this change (most
of the gain is achieved already with the use of -cl-fast-relaxed-math
since 284b89a0a3) and this has been
verified on Arc B580 under Windows.
Instead of returning 0 in case the Intel extension for getting the count
of Execution Units isn't available, we now use
sycl::info::device::max_compute_units.
We keep using the Intel extension in priority since it logically goes
with sycl::ext::intel::info::device::gpu_hw_threads_per_eu used in
get_max_num_threads_per_multiprocessor(), for which there is no
sycl::info::device::max_threads_per_compute_unit replacement yet.
The debug set of Embree prebuilt libraries currently lacks SYCL support
while the release ones have it.
This case was not gracefully handled for debug builds with Embree on GPU
enabled, leading to linking errors, trying to resolve rtcNewSYCLDevice
and rtcIsSYCLDeviceSupported.
We now test for this case to explicitly disable the use of Embree on GPU
for debug builds on Windows and print this status from CMake.
This fixes the following warning with MSVC:
device_impl.cpp(287): warning C4805: '|=': unsafe mix of type 'bool' and type 'ccl::uint' in operation
The similar fix is applied to Metal code as well.
There is no short-circuiting boolean operator ||=, so expand the expression.
Pull Request: https://projects.blender.org/blender/blender/pulls/136561
HIP-RT functions do have access to kg, and it was used inconsistently:
some functions were passed actual kg, other were passed nullptr.
This change makes it consistent and passes kg everywhere.
Pull Request: https://projects.blender.org/blender/blender/pulls/136503
The code before this change was relying on the ShadowPayload have
the same "header" as RayPayload for some of the primitive types
(curve, motion triangle, point): intersection functions were shared
between "regular" and shadow rays (shadow in this case is shadow_all),
but extra filter function was used for shadow rays.
This is fragile if someone changes one of these structures. What is
worse is that compiler might actually decide to shuffle things in
some structs, or remove unused fields.
This change also solves confusion about ShadowPayload::prim_type
seemingly only being assigned to PRIMITIVE_NONE. With time it is
not impossible that compiler will also see this, and constant-fold
some checks, or even remove the field. If that happens then the
render result will be wrong. Maybe it is already happening as there
are some GPU and driver and optimization flag specific bugs in the
area.
It is unclear whether it was causing any actual problem: W7800
seems to render all hair correctly on Linux.
Also make some style decisions more consistent: for example,
the way how stop/continue search return value is commented.
Prefer lower vertical space for those.
Mainly readability purposes:
- Having variables called local_payload is ambiguous: does it refer to
LocalPayload type or to a variable be local in a function?
- Some of the functions are used for different ray types, so having the
type case in intersectFunc and filterFunc makes it easier to scan.
For the latter: now it is more obvious that Curve_Intersect_Shadow
expects RayPayload, but Curve_Filter_Shadow expects ShadowPayload.
It might not be a problem currently as ShadowPayload has the same
"header" RayPayload, but it might change in the future. Also, compiler
might optimize fields out from one but not from the other.