Perform delayed freeing of the geometry BVHs similar to OptiX.
Previously BLAS memory was allocated in the device class as part of
device_update, but released in the BVHHIPRT destructor which gets called
when deleting geometry outside of device_update.
To avoid the GPU accessing unmapped memory, do a delayed free of this
memory in the device class as part of either device_update or device
destruction. This ensures it is in sync with other device memory changes.
Fix#148276Fix#139013Fix#138043Fix#140763
Pull Request: https://projects.blender.org/blender/blender/pulls/147247
The Cycles cpu/device.cpp `device_cpu_capabilities()` function used to
fill out a string of supported CPU capabilities separated with spaces,
with some trailing-space cleaning logic at the end of the function.
However, if no check succeeded, and especially after commit 2bf6d0fd71
which left only one check and removed the need for removing trailing
spaces, the check would run against an empty string, resulting in an
unsigned 0 - 1 operation which would then cause an out of bound access
catched by ASan.
Fixed by removing the now superflous trailing space cleaning logic and
simplifying to a direct return.
Pull Request: https://projects.blender.org/blender/blender/pulls/148227
Only use this feature when building for 1 or 2 CUDA architectures.
Otherwise CMake will build the binaries in parallel, and NVCC will then
also launch multiple threads for each binary.
We could add more manual control for this, but the main use case for
this is local builds and an automatic heuristic seems more likely to
help than an option that developers or users might not discover.
For minimal memory usage WITH_CYCLES_CUDA_BUILD_SERIAL still exists
to use only 1 thread for CUDA compilation.
Pull Request: https://projects.blender.org/blender/blender/pulls/147303
This is an alternate solution to !146889 to improve labels in the
camera UI, while being much less invasive. It doesn't take custom
labels into account, but it simply uses the parameter names with title
case.
Pull Request: https://projects.blender.org/blender/blender/pulls/148141
Follow on from PR #141891. The `MTLAccelerationStructureUsagePreferFastIntersection` flag didn't exist until Xcode 26.0, so we ensure that it is defined for forward-compatibility. The runtime `if (@available(macos 26.0, *))` checks still remain.
Pull Request: https://projects.blender.org/blender/blender/pulls/147561
Related to https://projects.blender.org/blender/blender/pulls/147994 in
which clang-cl builds failed passing the Windows SDK libraries paths to
the compiler.
The previous CMake implementation tried to reverse engineer these paths
at CMake configuration time but failed with `clang-cl`.
The environment variables set by vcvars that could have been useful
aren't always available when cmake is called, so now we keep the `LIB`
environment variable intact at compile time and pass the other
additional compiler libraries paths - that are better defined at CMake
configuration time - separately through `-L` compiler arguments.
Pull Request: https://projects.blender.org/blender/blender/pulls/148035
When a Cycles render is uneven (E.g. a small portion of the image takes
significantly longer to render than the rest of the image), then GPUs
typically suffer from poor performance due to low occupancy unless a
large number of samples is scheduled to compensate for that small
complex region.
Due to how Cycles render scheduler is setup, Cycles tries to
increase GPU occupancy by increasing the number of scheduled samples,
but also balance render preview update time by scaling back
the number of scheduled samples based on the previous workloads time
per sample to try and fit within the target update time.
However using the previous workloads time per sample to scale back the
scheduled number of samples gives suboptimal results because the
previous workloads time per sample is usually not representative of the
new time per sample that occurs as GPU occupancy increases, because
increasing GPU occupancy typically results in reduced time per samples.
This commit improves on this issue by assuming that increasing GPU
occupancy linearly improves performance, and adjusts the function
that scales back the sample count to fit within a specific update
window to use this assumption. This leads to Cycles increasing the
amount of work scheduled onto the GPU quicker at the beginning of
uneven/low occupancy scenes, generally leading to improved performance,
especially when rendering at low sample counts or with strict
time limits.
Ref #147954
Pull Request: https://projects.blender.org/blender/blender/pulls/147950
Commit 6c6d1a9b63 allowed several units to OSL camera shaders'
parameters. Only meters 'm' were supported as distance units, because
others cannot be converted automatically into the shader' unit.
This commit allows using 'mm' in addition to 'm', and makes the
Blender property use a 'DISTANCE_CAMERA' subtype. This assumes that
someone using 'mm' specifically wants to use the value for camera
parameters, which means it can be used as is, without any conversion
in the shader.
The camera templates were updated to use a focal length in mm.
Pull Request: https://projects.blender.org/blender/blender/pulls/147347
In OSL custom cameras, the current aperture size depends on the focal
length of the Blender camera, even though it is not usable by the
shader. This gives incoherent values to the depth of field.
To ignore this factor, this commit makes the custom camera behave the
same as the orthographic camera, by not being multiplied by the focal
length.
To compensate for that, the multiplication must happen inside the
shader. This adjustment was done in the advanced camera shader
template.
Pull Request: https://projects.blender.org/blender/blender/pulls/147346
In some hardware configurations, it is possible that DPC++ or
Intel Drivers wrongfully report all devices twice. It is already
being worked on internally, and the fixes will be available in
the future - but for now, we need a workaround for this problem
in Blender as well, to ensure that our end-users are not impacted.
Pull Request: https://projects.blender.org/blender/blender/pulls/147731
This new version of the graphics compiler improves performance
for the majority of supported Intel devices and adds support
for upcoming Intel hardware. Such an upgrade also requires
an increase in the minimal supported driver version on Windows,
which is why these changes are combined together with
the ocloc upgrade.
Previously set minimal version 101.6557 was increased to 101.8132.
Pull Request: https://projects.blender.org/blender/blender/pulls/147460
These changes are already in place de facto, as the check for
the minimal driver version in the Blender code was already
updated. This commit simply adds the UI update, which I
initially forgot to perform.
Pull Request: https://projects.blender.org/blender/blender/pulls/147459
This PR adds Vulkan/oneAPI graphics interop to Cycles. Just like for
CUDA and HIP interop, persistent memory mapping is used, as there could
potentially be some overhead of continuously mapping/unmapping buffers.
Pull Request: https://projects.blender.org/blender/blender/pulls/144442
For metal version after 3.2 it's possible to log debugging messages, it
works similar to `printf()`, except for a few differences:
- `%s` is not supported,
- `double` doesn't exist, so no casting to double for `%f`,
- no `\n` needed at the end of the format string.
To see the print in the console, environment variables `MTL_LOG_LEVEL`
should be set to `MTLLogLevelDebug`, and `MTL_LOG_TO_STDERR` should be
set to `1`. See
https://developer.apple.com/documentation/metal/logging-shader-debug-messages
Right now `printf()`, `print_float()`, `print_float2()`,
`print_float3()` and `print_float4()` are supported.
Thanks to @fclem for finding this out.
Pull Request: https://projects.blender.org/blender/blender/pulls/146585
We used to set shader->osl_surface_ref during shader compilation and then
pushed it into the shared vectors.
This worked as long as everything was serial - but after the multithreading
change, we a) compile everything and then b) build the shared vectors since
just pushing into them from multiple threads would not work.
However, if there are multiple devices, then each shader will be compiled
multiple times - so in the end, shader->osl_surface_ref etc. will be set
to the last device's value. Then, we end up pushing that value into every
device's vectors, which breaks for the earler devices.
The fix is simple - just preallocate the vectors and pass the correct index
into the compilation function. This way, each thread can safely store its
result and we can get rid of shader->osl_surface_ref entirely.
Note that while multiple shaders are compiled in parallel, the loop over
devices for a given shader is serial, so there's no concern of conflicts
over other shader internals.
Pull Request: https://projects.blender.org/blender/blender/pulls/146617
Applies thin film iridescence to metals in Metallic BSDF and Principled BSDF.
To get the complex IOR values for each spectral band from F82 Tint colors,
the code uses the parametrization from "Artist Friendly Metallic Fresnel",
where the g parameter is set to F82. This IOR is used to find the phase shift,
but reflectance is still calculated with the F82 Tint formula after adjusting
F0 for the film's IOR.
Co-authored-by: Lukas Stockner <lukas@lukasstockner.de>
Co-authored-by: Weizhen Huang <weizhen@blender.org>
Co-authored-by: RobertMoerland <rmoerlandrj@gmail.com>
Pull Request: https://projects.blender.org/blender/blender/pulls/141131
This expands Cycles' support for handling OSL property metadata for
Custom camera parameters and translating it to Blender's UI.
Specifically, it adds support for:
- Translation inputs (`string vecsemantics = "POINT"`)
- Normal inputs (`string vecsemantics = "NORMAL"`)
- File inputs (`string widget = "filename"`)
- Angle inputs (`string unit = "radians"`)
- Distance inputs (`string unit = "m"`)
- Time inputs (`string unit = "s"` or `string unit = "sec"`)
- Enum inputs (`string widget = "mapper", string options = "left:0|right:1"`)
It also sets the default value correctly, and corrects a warning string to
also mention cameras in addition to nodes as possible users of OSL shaders.
Co-authored-by: Lukas Stockner <lukas@lukasstockner.de>
Pull Request: https://projects.blender.org/blender/blender/pulls/146736
The workaround of forcing BVH building into single thread
execution on the Blender side is not needed anymore,
because the problem was properly fixed in the upstream
since Embree upgrade in Blender 4.5
This reverts commit c0f0e2ca6f.
Pull Request: https://projects.blender.org/blender/blender/pulls/146859
In very old OpenEXR version there was a limit on the channel names, which meant
the pass names needed to be short like "DiffDir". Change them to be longer like
"Diffuse Direct".
* This breaks forward compatibility. Old Blender version will lose links when
reading compositing node setups with such passes, but #146571 will fix it
for 4.5 LTS.
* Add-ons, scripts and compositing setups in other applications that rely on these
names will also break.
* The find_by_type function for render passes has also been removed, as this was
already deprecated and replaced by find_by_name.
* We assume spaces in the name are ok, since we have passes with them already
and have not seen reports about compatibility issues.
Pull Request: https://projects.blender.org/blender/blender/pulls/142731
Signed integer overflow is undefined, but works reliably enough for us
anyway, so just silence it like the Blender implementation already does.
This also caused some tests like cycles_displacement_cpu to run much slower
(3s -> 42s) due to the overhead of detecting and ignoring repeated warnings.
Pull Request: https://projects.blender.org/blender/blender/pulls/146783
* Add adaptive subdivision properties natively on the subdivision surface
modifier, so that other engines may reuse them in the future. This also
resolve issues where they would not get copied properly.
* Remove "Feature Set" option in the render properties, this was the last
experimental one.
* Add space choice between "Pixel" and "Object". The latter is new and can
be used for object space dicing that works with instances. Instead of
a pixel size an object space edge length is specified.
* Add object space subdivision test.
Ref #53901
Pull Request: https://projects.blender.org/blender/blender/pulls/146723
MTLAccelerationStructureMotionCurveGeometryDescriptor.controlPointCount should specify the per-step control point count. Although the previous initialisation wasn't manifesting as incorrect behaviour it was technically wrong.
Pull Request: https://projects.blender.org/blender/blender/pulls/146568
The one-sample Monte Carlo estimator of the radiative transfer equation
is
<L> = T(t) / p(t) * (L_e + σ_s * L_s + σ_n * L),
Which means we can also use another p(t) than majorant * exp(-majorant * t)
for sampling the distance. Thus, we use the baked σ_max for distance
sampling, but adjust the majorant when we encounter a density that is
larger than σ_max.
Note that this is not really unbiased because such scaling is not always
applied, but seems to work well in practice when the majorant is
reasonable.
Pull Request: https://projects.blender.org/blender/blender/pulls/146589