This commit fixes a issue where Cycles adaptive kernel compilation
would always undefine adaptive kernel features, resulting in various
issues like incorrect renders.
Pull Request: https://projects.blender.org/blender/blender/pulls/137804
When Emission Sampling in None, we loop through
kernel_data.integrator.num_lights to find the light intersection.
If the light shader changes, the number of lights might change, so we
need to tag the light manager for update to recount the lights.
Pull Request: https://projects.blender.org/blender/blender/pulls/137605
Emission estimate from auto conversion is downscaled to avoid building
a light tree that is expensive to evaluate; however, this estimate is
also used as the final emission value in the kernel if the emission is
constant.
Therefore, only downscale the estimate for the EMISSION_SAMPLING_AUTO
case.
Pull Request: https://projects.blender.org/blender/blender/pulls/137606
This commit adds some extra prints to terminal related to oneAPI driver
information in the situation that the driver version is considered
incompatible with the current version of Cycles.
Pull Request: https://projects.blender.org/blender/blender/pulls/137272
This will be needed to determine if there are volumes in the scene, before
allocation passes to aid volume sampling.
The kernels are now also loaded in the middle of scene update, at a place
where kernel features are known but before the kernels are needed for
displacement and background light evaluation..
Updating the camera to final resolution for progressive refinement still
happens later.
Pull Request: https://projects.blender.org/blender/blender/pulls/137228
The crash was caused by an overflow in the opgl_path_segment_storage
array. It happened because shadow catcher paths would write to the
segments but not clear them. This made it so the next render loop
iteration for the main path starts with non-empty segments in the
guiding data.
Disable training when megakernel is called for the shadow catcher
state.
To ensure this issue is not forgotten when the guiding is ported to
GPU add asserts in the `guiding.h`. While it is a no-op for default
GPU kernels sometimes we do compile debug kernels. But also it acts
as a plain-text reminder to the future-us in working on the code.
There is now also an assert before the main path megakernel to help
catching such cases in the future.
Pull Request: https://projects.blender.org/blender/blender/pulls/137291
The goal is to reduce the affect of the fmod() used in the noise code,
which was initially reported in the comment:
https://projects.blender.org/blender/blender/pulls/119884#issuecomment-1258902
Basic idea is to benefit from SIMD vectorization on CPU.
Tested on Linux i9-11900K and macOS on M2 Ultra, in both cases performance
after this change is very close to what it could be with the fmod() commented
out (the call itself, `p = p + precision_correction`).
On macOS the penalty of fmod() was about 10%, on Linux it was closer to 30%
when built with GCC-13. With Linux builds from the buildbot it is more like 18%.
The optimization is only done for 3d and 4d noise. It might be possible to
gain some performance improvement for 1d and 2d cases, but the approach would
need to be different: we'd need to optimize scalar version fmodf(). Maybe
tricks with integer cast will be faster (since we are a bit optimistic in the
kernel and do not guarantee exact behavior in extreme cases such as NaN inputs).
Pull Request: https://projects.blender.org/blender/blender/pulls/137109
Commit be63ebd961
This is causing issues with CUDA kernel compilation in some setups, even
though the builbot is ok. Since this isn't yet working for oneAPI anyway,
revert all the changes to Cycles kernel compilation for now.
Use VERBATIM to ensure spaces inside command line arguments don't get
escaped automatically.
On Linux and Windows the oneAPI kernel compilation still has problems.
There is an apparent bug with single quote escaping in add_custom_command
which means it's not easy to use VERBATIM.
At the moment MNEE locks up Cycles, or has rendering artifacts on
RDNA4 GPUs on WIndows.
This commit disables MNEE on that configuration until a fix
is avaliable.
Pull Request: https://projects.blender.org/blender/blender/pulls/136980
Currently MetalRT interpolates transformation matrix on per-element basis
which leads to issues like #135659.
This change adds implementation of for decomposed (Scale/Rotate/Translate)
motion interpolation, matching behavior of BVH2 and other HW-RT.
This requires macOS 15 and Xcode 16 in order to use this interpolation.
On older platforms and compilers old interpolation is used.
Currently there is no changes on the user (by default) and it is only
available via CYCLES_METALRT_PCMI environment variable. This is because
there are some issues with complex motion paths that need to be looked
into. Having code available makes it easier to do further debugging.
Ref #135659
Authored by Emma Liu
Pull Request: https://projects.blender.org/blender/blender/pulls/136253
The current logic disables WITH_CYCLES_ONEAPI_BINARIES when ocloc is not
found, which is fine, but prevented building for other non-Intel SYCL
targets without (unnecessary) ocloc.
The fix here is to remove spir64_gen target when
WITH_CYCLES_ONEAPI_BINARIES is disabled, instead of forcing only spir64.
Reduce the register pressure and branching in the switch() by using
subclass and cast from void* to the base class.
This ensures intersection functions are not inlined multiple times,
bringing performance back.
Alternative could be to avoid functions (they are quite large) but
that only partially resolves the performance regression.
Pull Request: https://projects.blender.org/blender/blender/pulls/136823
Instead of relying on the Intel extensions that may not be implemented,
we can use max_work_group_size until there is a better alternative.
Thanks to Codeplay for this proposal.
Co-authored-by: Georgi Mirazchiyski <georgi.mirazchiyski@codeplay.com>
Area light with a size of zero should not contribute to the scene, so
set the light as disabled.
This does not only fix the reported bug where such light is visible to
the camera, but also a regression in 4.2 where the light contributes to
the scene when light tree is off.
Pull Request: https://projects.blender.org/blender/blender/pulls/136763
This API is not properly implemented in other SYCL backends at the
moment and we don't want it to fail at runtime, so we conservatively
enable it only for Level-Zero.
The initial issues that led to the choice of forcing the use of
linker.exe seem gone and there is currently no strong reason to use
linker.exe explicitly, so let's simplify and use the default setting.
This makes it available in Cycles standalone, and the implementation
can be shared with Blender. This also makes it possible to compute
tangents after tessellation for adaptive subdivision.
There is a difference in UV map tangents when there are no UVs. They
are now generated from object space coordinates instead of auto
texture space coordinates. This is more efficient, and a corner case
that we don't have to keep compatible.
Co-authored-by: Brecht Van Lommel <brecht@blender.org>
Pull Request: https://projects.blender.org/blender/cycles/pulls/25
Add support for OSL parameter metadata named `defaultgeomprop`, whose
values are interpreted the same way as the property on MaterialX node
inputs. When set to `Tworld` the tangent is then automatically linked
to the shader and generated for the mesh.
Pull Request: https://projects.blender.org/blender/cycles/pulls/25
The initial limitation preventing from using -ffast-math, worked around
in 09df1f4caf, got fixed upstream in LLVM
and the fix is part of current DPC++ compiler:
63ecd2a725
We're now able to go back to using -ffast-math, which helps simplifying
the set of compiler flags.
No performance nor conformance change is expected from this change (most
of the gain is achieved already with the use of -cl-fast-relaxed-math
since 284b89a0a3) and this has been
verified on Arc B580 under Windows.
Instead of returning 0 in case the Intel extension for getting the count
of Execution Units isn't available, we now use
sycl::info::device::max_compute_units.
We keep using the Intel extension in priority since it logically goes
with sycl::ext::intel::info::device::gpu_hw_threads_per_eu used in
get_max_num_threads_per_multiprocessor(), for which there is no
sycl::info::device::max_threads_per_compute_unit replacement yet.
The debug set of Embree prebuilt libraries currently lacks SYCL support
while the release ones have it.
This case was not gracefully handled for debug builds with Embree on GPU
enabled, leading to linking errors, trying to resolve rtcNewSYCLDevice
and rtcIsSYCLDeviceSupported.
We now test for this case to explicitly disable the use of Embree on GPU
for debug builds on Windows and print this status from CMake.
This fixes the following warning with MSVC:
device_impl.cpp(287): warning C4805: '|=': unsafe mix of type 'bool' and type 'ccl::uint' in operation
The similar fix is applied to Metal code as well.
There is no short-circuiting boolean operator ||=, so expand the expression.
Pull Request: https://projects.blender.org/blender/blender/pulls/136561
HIP-RT functions do have access to kg, and it was used inconsistently:
some functions were passed actual kg, other were passed nullptr.
This change makes it consistent and passes kg everywhere.
Pull Request: https://projects.blender.org/blender/blender/pulls/136503