This change puts all the block size macros in the same common header, so
they can be included in host side code without needing to also include
the kernels that are defined in the device headers that contained these
values.
This change also removes a magic number used to enqueue a kernel, which
happened to agree with the GPU_PARALLEL_SORT_BLOCK_SIZE macro.
Pull Request: https://projects.blender.org/blender/blender/pulls/143646
`Eavg` can still be 1 for very small roughness, causing division by zero
when computing `Ems`.
A roughness of 1e-5 gives an `Evg` of 0.999998, seems reasonable.
Pull Request: https://projects.blender.org/blender/blender/pulls/143637
Add new "Linear 3D Curves" option in the Curves panel in the render
properties. This renders curves as linear segments rather than smooth
curves, for faster render time at the cost of accuracy.
On NVIDIA Blackwell GPUs, this can give a 6x speedup compared to smooth
curves, due to hardware acceleration. On NVIDIA Ada there is still
a 3x speedup, and CPU and other GPU backends will also render this
faster.
A difference with smooth curves is that these have end caps, as this
was simpler to implement and they are usually helpful anyway.
In the future this functionality will also be used to properly support
the CURVE_TYPE_POLY on the new curves object.
Pull Request: https://projects.blender.org/blender/blender/pulls/139735
This fixes issues when using Embree on mutliple GPUs.
A previous workaround used separate contexts, this one now
lets us keep a single context for all GPUs.
Pull Request: https://projects.blender.org/blender/blender/pulls/143089
Update use_shading, use_camera and the shading system pointers in the same
location, so that when the render is interrupted they are in a consistent state.
The added null pointer checks are not strictly needed, but just in case it
goes out of sync for another reason.
Pull Request: https://projects.blender.org/blender/blender/pulls/143467
`sd->dPdu`, `sd->dPdv`, `sd->du` and `sd->dv` are computed from
`sd->dP` by constructing a local frame, so both results are the same, subject to some numerical differences.
This avoids constructing the local frame again, so might be faster.
Part of https://projects.blender.org/blender/blender/pulls/141278
Blend files compatibility:
If a World exists and "Use Nodes" is disabled, we add new nodes to the
existing node tree (or create one if it doesn't) that emulates the
behavior of a world without a node tree. This ensures backward and
forward compatibility.
Python API compatibility:
- `world.use_nodes` was removed from Python API => **Breaking change**
- `world.color` is still being used by Workbench, so it stays there,
although it has no effect anymore when using Cycles or EEVEE.
Python API changes:
Creating a World using `bpy.data.worlds.new()` now creates a World with
an empty (embedded) node tree. This was necessary to enable Python
scripts to add nodes without having to create a node tree (which is
currently not possible, because World node trees are embedded).
Pull Request: https://projects.blender.org/blender/blender/pulls/142342
It allows to implement tricks based on a knowledge whether the path
ever cam through a portal or not, and even something more advanced
based on the number of portals.
The main current objective is for strokes shading: stroke shader
uses Ray Portal BSDF to place ray to the center of the stroke and
point it in the direction of the surface it is generated for. This
gives stroke a single color which matches shading of the original
object. For this usecase to work the ray bounced from the original
surface should ignore the strokes, which is now possible by using
Portal Depth input and mixing with the Transparent BSDF. It also
helps to make shading look better when there are multiple stroke
layers.
A solution of using portal depth is chosen over a single flag due
to various factors:
- Last time we've looked into it it was a bit tricky to implement
as a flag due to us running out of bits.
- It feels to be more flexible solution, even though it is a bit
hard to come up with 100% compelling setup for it.
- It needs to be slightly different from the current "Is Foo"
flags, and be more "Is Portal Descendant" or something.
An extra uint16 is added to the state to count the portal depth,
but it is only allocated for scenes that use Ray Portal BSDF.
Portal BSDF still increments Transparent bounce, as it is required
to have some "limiting" factor so that ray does not get infinitely
move to different place of the scene.
Ref #125213
Pull Request: https://projects.blender.org/blender/blender/pulls/143107
When "generated" is required via Attribute Node, it is not available for
World and Point Cloud.
Make OSL match the SVM behavior to use the object coordinates
(see `svm/attribute.h`),
Pull Request: https://projects.blender.org/blender/blender/pulls/143198
Previously with adaptive subdivision this happened to work with the N
attribute, but that was not meant to be undisplaced. This adds a new
undisplaced_N attribute specifically for this purpose.
For backwards compatibility in Blender 4.5, this also keeps N undisplaced.
But that will be changed in 5.0.
Pull Request: https://projects.blender.org/blender/blender/pulls/142090
Currently MetalRT renders always use extended limits, which is needed to correctly render scenes where the max primitive count can exceed 2^28 or the instance count can exceed 2^24. This patch adopts Metal best practices of only enabling this flag if it is needed.
This PR is similar to #133364, but there are some notable differences:
1) The old PR made an overly optimistic assumption that all the relevant visibility bits could be squeezed into 8 bits. This new PR adopts the same approach that Optix takes of using 8 bits as a primary HW filter, and checking the full 32 bit mask inside the SW intersection handler.
~~2) I moved the scene scanning check from Scene into MetalDevice. This avoids platform specific details leaking into platform agnostic areas.~~
~~3) In live viewport mode, we always use extended limits in case we tip over the threshold.~~
_EDIT:_
2) The limits are scanned in `Scene::update_kernel_features`, and given to the device by a new `set_bvh_limits` method which returns true if the BVH and kernels need to be reloaded.
Pull Request: https://projects.blender.org/blender/blender/pulls/142401
When `delta` is significantly larger than the difference between `tmin`
and `tmax`, the precision of `theta_a` and `theta_b` reduces, resulting
in banding artefacts.
For equiangular sampling, we use uniform sampling to fix this case.
For light tree, we use the equality
atan(a) + atan(b) = atan2(a + b, 1 - a*b).
Pull Request: https://projects.blender.org/blender/blender/pulls/142845
Use Principled BSDF instead of Diffuse BSDF. This is a breaking change but
likely does not affect many scenes significantly. It adds some specularity
when an object does not have a a material assigned.
Fix#142538
Pull Request: https://projects.blender.org/blender/blender/pulls/142703
The code of the "oneapi_load_kernels" function before this modification
was loading kernels and compiling them, if needed, for all devices in
the associated GPU context. This makes sense for one GPU execution
scenario, as well as for execution scenario of multi identical GPU,
but in cases where Blender users have several different GPUs in
render, the previous implementation would compile all kernels
for all devices for each device, unnecessarily doing the same
work multiple times. Because of this, I am changing the
implementation so that now compilation happens only for the used
device per used device, ensuring that no unnecessary work is done.
No render performance changes are expected.
This improves the existing scene specialisation mechanism by replacing "kernel_data.kernel_features" with a function constant. It doesn't cause any additional compilation requests, but allows the backend compiler to eliminate more dead code. An additional compiler hint is provided for dead-stripping "volume_stack_enter_exit" which results in slightly faster rendering of non-volumetric scenes.
Pull Request: https://projects.blender.org/blender/blender/pulls/142235
When having a checkbox and a value both in one row together with an
animation decorator it is questionable whether the decorator should act
on animating the checkbox or the corresponding value.
We had similar cases before (e.g. 7c04ef210e)
In this case as well, one would think it is more desirable to animate
the actual Temperature **value** (instead of the checkbox), so this is
what this PR does.
Pull Request: https://projects.blender.org/blender/blender/pulls/142192
Cycles automatically tries to decide if the camera ray should be a
surface or a volume + surface camera ray by checking to see if the
scene contains a volumetric material, and if it does, is it near
where the camera rays are expected to spawn. This step is done
during scene intialization.
With the OSL camera, it is impratical to predict where the
camera rays will spawn during scene intialization, which makes it
impratical to predict if the OSL camera ray will spawn near a
volumetric object. So this commit marks all OSL cameras as
"inside a volume", leading to the spawning of volume + surface camera
rays for OSL cameras while the scene contains a volumetric material.
This leads to increased render times ranging between 1% - 5% in scenes
that use a OSL camera, has a volumetric object in it, and the
volumetric object is far away from the camera. Every other scene should
see no performance impact.
Testing was done on a AMD Ryzen 9 5950X and a NVIDIA GeForce RTX 4090.
Pull Request: https://projects.blender.org/blender/blender/pulls/142036
This is not an actual solution, it falls back to a perspective camera instead
of crashing. Note full_rastertocamera exists specifically for computing raster
size for adaptive subdivision, and changing it should not affect anything else.
Pull Request: https://projects.blender.org/blender/blender/pulls/141905
These changes introduce modifications to the SYCL queue creation
in OneapiDevice::create_queue. In case several DPC++ devices are
detected by Blender and exposed through it, we are now creating
a new SYCL context for each device, which allows us to prevent
execution failures due to some known issues in the DPC++ runtime
regarding multi GPU support. As this would have some small
performance impact, few percents, it is only applied to
multi GPU configurations, while the behavior for a single
GPU configuration remains the same.
Pull Request: https://projects.blender.org/blender/blender/pulls/141834