This implements a basic render time pass,
using hardware-based counters to minimize the impact on render time.
x86-64 uses the TSC (read via RDTSC) for timing, while ARM64 uses the cntvct_el0
register. In theory the TSC is not always reliable (e.g. old CPUs had it tied
to their current clock rate), but for reasonably recent CPU models it should
be fine. If neither is available, it falls back to `std::chrono::steady_clock`,
which should still be very fast.
The output is in milliseconds of CPU-time per pixel.
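For illustration, a minimal sketch of this kind of per-architecture counter
selection (not the actual pass code; the function name is made up):

```cpp
#include <chrono>
#include <cstdint>

#if defined(__x86_64__) || defined(_M_X64)
#  if defined(_MSC_VER)
#    include <intrin.h>
#  else
#    include <x86intrin.h>
#  endif
#endif

/* Illustrative sketch: pick the cheapest available counter per architecture,
 * falling back to std::chrono. Not the actual implementation. */
static inline uint64_t profiling_ticks()
{
#if defined(__x86_64__) || defined(_M_X64)
  return __rdtsc(); /* TSC, constant-rate on reasonably recent CPUs. */
#elif defined(__aarch64__)
  uint64_t ticks;
  __asm__ volatile("mrs %0, cntvct_el0" : "=r"(ticks));
  return ticks;
#else
  /* Fallback: still fast, but goes through the OS clock machinery. */
  return uint64_t(
      std::chrono::steady_clock::now().time_since_epoch().count());
#endif
}
```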
Pull Request: https://projects.blender.org/blender/blender/pulls/125933
On its own, the main functionality of the Radial Tiling node
is the ability to divide a 2D Cartesian coordinate system into
as many radial segments as specified by the "Segments" input.
Each segment has its own affinely transformed coordinate system,
provided through the "Segment Coordinates" output, which can be
used to tile textures in a radially symmetric manner.
Additionally, every segment gets a unique index through the "Segment ID"
output. The width of each segment, measured where the unnormalized
Y-coordinate of the "Segment Coordinates" output is 0, is provided through
the "Segment Width" output, and the rotation of the affine transformation
of each segment's coordinate system is provided through the
"Segment Rotation" output.
The roundness of the coordinate lines of the "Segment Coordinates"
output can be controlled through the "Roundness" inputs.
This can be used to make the coordinate systems of the segments
a mix of Cartesian and polar coordinates.
Lastly, the lines of points of the "Segment Coordinates" output with a
constant Y-coordinate have the shape of a polygon with rounded corners,
which can be used to procedurally create rounded polygons.
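As a rough illustration of the segment lookup idea (a simplified sketch with
made-up names; it ignores roundness and is not the node's actual math):

```cpp
#include <cmath>
#include <cstdio>

/* Simplified sketch: split the plane into N equal angular segments around the
 * origin and give each segment a local frame rotated towards its bisector. */
struct Segment {
  int id;         /* "Segment ID" */
  float rotation; /* "Segment Rotation" */
  float x, y;     /* "Segment Coordinates" */
};

static Segment radial_segment(float x, float y, int segments)
{
  const float two_pi = 6.28318530718f;
  const float seg_angle = two_pi / float(segments);

  float angle = std::atan2(y, x);
  if (angle < 0.0f) {
    angle += two_pi;
  }

  Segment s;
  s.id = int(angle / seg_angle) % segments;
  /* Rotate the point so the segment's bisector becomes the local +X axis. */
  s.rotation = (float(s.id) + 0.5f) * seg_angle;
  const float c = std::cos(-s.rotation);
  const float sn = std::sin(-s.rotation);
  s.x = c * x - sn * y;
  s.y = sn * x + c * y;
  return s;
}

int main()
{
  const Segment s = radial_segment(0.3f, 0.7f, 6);
  std::printf("id=%d rotation=%f local=(%f, %f)\n", s.id, s.rotation, s.x, s.y);
}
```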
Pull Request: https://projects.blender.org/blender/blender/pulls/127711
These are causing quite a big difference in existing files, which is not
easy to address in versioning. Since the goal of removing this was to
simplify things for us and that's not the case, just revert this change.
This reverts commit ab21755aaf.
Ref #139923
Pull Request: https://projects.blender.org/blender/blender/pulls/146336
For some reason, the `underwater_caustics` test was failing on Metal
after #140480 even though that test doesn't use the Sky Texture.
After messing with the file for a while, going back to the previous version
and adding the changes back one at a time, I've now arrived at a version
that behaves the same way as the #140480 version without breaking the test.
No idea what the underlying issue is, but we've had problems with the MNEE
kernels before, so maybe it's just a compiler thing.
Pull Request: https://projects.blender.org/blender/blender/pulls/146335
This mode is based on the same atmospheric model as the previous one, but now
also accounts for multiple scattering and reflections from the ground.
This increases the accuracy, especially at low elevations.
Also renames some options and adjusts a default for consistency:
- The previous "Nishita" model is now "Single Scattering"
- "Dust" is now "Aerosols"
- Default altitude is now 100m.
Co-authored-by: Lukas Stockner <lukas@lukasstockner.de>
Pull Request: https://projects.blender.org/blender/blender/pulls/140480
"Use Nodes" was removed in the compositor to simplify the compositing
workflow. This introduced a slight inconsistency with the Shader Node
Editor.
This PR removes "Use Nodes" for object materials.
For Line Style, no changes are planned (not sure how to preserve
compatibility yet).
This simplifies the state of objects; either they have a material or
they don't.
Backward compatibility:
- If Use Nodes is turned Off, new nodes are added to the node tree to
simulate the same material.
- DNA: Only `use_nodes` is marked deprecated
- Python API:
- `material.use_nodes` is marked deprecated and will be removed in
6.0. Reading it always returns `True` and setting it has no effect.
- `material.diffuse_color`, `material.specular`, etc. are not used by
EEVEE anymore but are kept because they are used by Workbench.
Forward compatibility:
Always enable 'Use Nodes' when writing blend files.
Known Issues:
Some UI tests are failing on macOS
Pull Request: https://projects.blender.org/blender/blender/pulls/141278
The Metal-RT implementation for curve intersection has an additional self
intersection check happening in curve_ribbon_accept(). It is done
for all curve types that have the PRIMITIVE_CURVE_RIBBON bit set on them,
including Thick Linear curves. However, the logic in the function is
hardcoded to handle flat ribbon curves with the Catmull-Rom basis.
This change makes it so curve_ribbon_accept() is only called for the
ribbon curve type, not whenever the type has the ribbon bit set.
Additionally, other places where the curve type was checked as a bitmask
were fixed.
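For illustration, the difference between the bitmask test and the exact type
comparison (the enum values here are made up, not the real Cycles flags):

```cpp
#include <cstdio>

/* Illustrative only: a thick linear curve may carry the ribbon bit even
 * though it is not a flat ribbon curve, so a bitmask test alone is too
 * permissive. */
enum : unsigned {
  PRIMITIVE_CURVE_RIBBON = 1u << 0,
  PRIMITIVE_CURVE_LINEAR = 1u << 1,
  PRIMITIVE_CURVE_THICK_LINEAR = PRIMITIVE_CURVE_LINEAR | PRIMITIVE_CURVE_RIBBON,
};

static bool ribbon_bit_set(unsigned type)
{
  return (type & PRIMITIVE_CURVE_RIBBON) != 0; /* old check, too permissive */
}

static bool is_ribbon_type(unsigned type)
{
  return type == PRIMITIVE_CURVE_RIBBON; /* new check, exact type */
}

int main()
{
  const unsigned type = PRIMITIVE_CURVE_THICK_LINEAR;
  std::printf("bit set: %d, exact type: %d\n",
              ribbon_bit_set(type), is_ribbon_type(type));
}
```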
Ref #146072
Pull Request: https://projects.blender.org/blender/blender/pulls/146140
Always enforce tiling of some size, up to the 8k tile size.
Rendering very big images without tiles has a lot of challenges.
While solving those challenges is not impossible, it does not seem to
be a practical time investment.
The internals of how Cycles works, including Cycles Standalone,
are not affected by this change.
A possible downside is that path guiding might not work exactly how one
would expect it to, due to the lack of information sharing across multiple
tiles. This is something that never worked nicely, and camera animation
and border render have the same issues, so it is not considered a stopper
for this change.
Fixes #145900
Co-authored-by: Brecht Van Lommel <brecht@blender.org>
Pull Request: https://projects.blender.org/blender/blender/pulls/146031
This was added for a fairly specialized use case and is no longer being used
as far as we know. A future replacement would be to add a USD/Hydra procedural,
for which most of the groundwork already exists.
Pull Request: https://projects.blender.org/blender/blender/pulls/146021
HIP-RT device:
- Add missing flags from the common flags query to the final compiler options
- Switch logging utility from printf to LOG_INFO_IMPORTANT
- Remove redundant compiler options already covered by common flags
HIP device:
- Add compiler command to logging
- Update C++ standard to C++17 to resolve compiler warnings
Pull Request: https://projects.blender.org/blender/blender/pulls/145284
This adds a function that can turn an existing `bNodeTree` into an inlined one.
The new node tree has all node groups, repeat zones, closures and bundles
inlined. So it's just a flat tree that ideally can be consumed easily by render
engines. As part of the process, it also does constant folding.
The goal is to support more advanced features from geometry nodes (repeat zones,
etc.) in shader nodes, where the evaluator is more limited because it has to be
able to run on the GPU. Creating an inlined `bNodeTree` is likely the most
direct way to get there, but it may also be limiting in the future. Since this is
a fairly local change, it's likely still worth it to support these features in all
render engines without having to make their evaluators significantly more complex.
Some limitations apply here that do not apply in Geometry Nodes. For example,
the iterations count in a repeat zone has to be a constant after constant
folding.
There is also a `Test Inlining Shader Nodes` operator that creates the inlined
tree and creates a group node for it. This is just for testing purposes.
#145811 will make this functionality available to the Python API as well so that
external renderers can use it too.
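As a rough sketch of what inlining means here (all types and names below are
hypothetical stand-ins, not the real bNodeTree API; link remapping, zones,
closures, bundles and constant folding are left out):

```cpp
#include <memory>
#include <string>
#include <vector>

/* Hypothetical node type: a node is either primitive or a group that
 * contains its own sub-tree of nodes. */
struct Node {
  std::string type;                                 /* e.g. "MixShader" */
  std::vector<std::shared_ptr<Node>> group_content; /* used when type == "Group" */
};

/* Recursively replace group nodes with their contents so the result is a
 * flat list of primitive nodes that a render engine can consume directly. */
static void inline_nodes(const std::vector<std::shared_ptr<Node>> &src,
                         std::vector<std::shared_ptr<Node>> &flat)
{
  for (const std::shared_ptr<Node> &node : src) {
    if (node->type == "Group") {
      inline_nodes(node->group_content, flat);
    }
    else {
      flat.push_back(node);
    }
  }
}

int main()
{
  auto group = std::make_shared<Node>(
      Node{"Group", {std::make_shared<Node>(Node{"Math", {}})}});
  std::vector<std::shared_ptr<Node>> tree = {
      std::make_shared<Node>(Node{"ImageTexture", {}}), group};

  std::vector<std::shared_ptr<Node>> flat;
  inline_nodes(tree, flat); /* flat now contains only primitive nodes */
}
```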
This recently changed after a fix in 28f93d5443,
but we get better performance by ensuring `int3` is packed instead.
Packing int3 currently gives a 7% speedup when rendering wdas_cloud on
Intel Arc B580.
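For illustration, the layout difference between an aligned and a packed
3-component int struct (a generic sketch, not the actual Cycles types):

```cpp
#include <cstdio>

/* An over-aligned int3 gets padded to 16 bytes, while a packed one stays at
 * 12 bytes, which reduces memory traffic for large index arrays. */
struct alignas(16) int3_aligned { int x, y, z; };
struct int3_packed { int x, y, z; };

int main()
{
  std::printf("aligned: %zu bytes, packed: %zu bytes\n",
              sizeof(int3_aligned), sizeof(int3_packed)); /* 16 vs 12 */
}
```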
Pull Request: https://projects.blender.org/blender/blender/pulls/145593
When shadow linking is enabled, `intersect_dedicated_light` is scheduled even
if the `PATH_RAY_SUBSURFACE` flag is set. This checks the flag and schedules
`intersect_subsurface` instead.
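A minimal sketch of the scheduling decision (the queueing helper and enum
values are stand-ins; only the flag and kernel names come from this change):

```cpp
#include <cstdint>
#include <cstdio>

/* Stand-in flag and kernel identifiers, not the actual Cycles definitions. */
enum : uint32_t { PATH_RAY_SUBSURFACE = 1u << 5 };
enum Kernel { INTERSECT_DEDICATED_LIGHT, INTERSECT_SUBSURFACE };

static void enqueue_kernel(Kernel kernel)
{
  std::printf("enqueue kernel %d\n", int(kernel));
}

static void schedule_next(uint32_t path_flag)
{
  /* A path that is inside a subsurface scattering walk must continue with the
   * subsurface kernel rather than the dedicated light intersection kernel. */
  if (path_flag & PATH_RAY_SUBSURFACE) {
    enqueue_kernel(INTERSECT_SUBSURFACE);
  }
  else {
    enqueue_kernel(INTERSECT_DEDICATED_LIGHT);
  }
}

int main()
{
  schedule_next(PATH_RAY_SUBSURFACE);
}
```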
Pull Request: https://projects.blender.org/blender/blender/pulls/145621
These don't really work as scene linear with an sRGB transfer function for e.g.
ACEScg; there are not enough bits. If you want a wide gamut, you need to use
float colors.
Pull Request: https://projects.blender.org/blender/blender/pulls/145763
Experiments have shown that the OptiX denoiser performs best when
operating on images that have their origin at the top-left corner,
while Blender renders with the origin at the bottom-left corner.
Simply flipping the image vertically before and after denoising is a
relatively trivial operation, so this patch introduces this as an
additional preprocessing and postprocessing step for denoising when the
OptiX denoiser is used. This patch also removes an unused
helper function, now that OptiX 8.0 is the minimum version.
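The flip itself is just a row swap, along these lines (a generic sketch, not
the actual denoiser integration code):

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

/* Flip an image buffer vertically in place by swapping rows; done once before
 * and once after denoising. `stride` is the number of floats per row
 * (width * channels). */
static void flip_vertically(float *pixels, int height, int stride)
{
  for (int y = 0; y < height / 2; y++) {
    float *top = pixels + std::size_t(y) * stride;
    float *bottom = pixels + std::size_t(height - 1 - y) * stride;
    std::swap_ranges(top, top + stride, bottom);
  }
}

int main()
{
  std::vector<float> image = {1, 1, 2, 2, 3, 3}; /* 3 rows, 2 floats per row */
  flip_vertically(image.data(), 3, 2);           /* rows become 3, 2, 1 */
}
```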
Pull Request: https://projects.blender.org/blender/blender/pulls/145358
Several driver versions construct a semantically wrong version string,
which would force Blender to decline the Intel device for oneAPI backend
usage. Unfortunately, the upstream fix is taking a long time to reach the
distros and end users, so it is better for Blender to detect this wrong
version string and parse it properly, allowing these devices to be used.
The wrong driver version string is the only issue here; besides this,
the driver functionality is fine.
Pull Request: https://projects.blender.org/blender/blender/pulls/145658
The Embree scene contains a TBB task group that has a parent pointer to the
task group it was created in. In Cycles this task group was only temporarily
created on the stack, resulting in a dangling parent pointer.
The simple solution is to make the Cycles side task group persistent too.
Many thanks to Aras for figuring this one out; it was a very tricky one.
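A hedged sketch of the lifetime pattern (illustrative only, not the actual
Cycles/Embree integration):

```cpp
#include <tbb/task_group.h>

/* The point is the lifetime: work created under a TBB context can keep a
 * parent pointer to the group it was spawned in, so that group must outlive
 * the Embree scene instead of living on the stack of a single function. */
struct SceneBuilder {
  /* Persistent task group: lives as long as this object. */
  tbb::task_group task_group;

  void commit()
  {
    task_group.run([] { /* rtcCommitScene(...) would run here */ });
    task_group.wait();
  }
};

int main()
{
  SceneBuilder builder;
  builder.commit();
}
```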
Pull Request: https://projects.blender.org/blender/blender/pulls/145515
Meshes that require adaptive subdivision are currently tessellated one at
a time. Change this part of the device update to be done in parallel.
To remove the possibility of the status message going backwards, a mutex
was required to keep that portion of the loop atomic.
Results for the loop in question: on one particular scene with over 300
meshes requiring tessellation, the update time drops from ~16 seconds to
~3 seconds. The attached synthetic test drops from ~9 seconds down to ~1
second.
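A simplified sketch of the pattern (names are illustrative, not the actual
Cycles device update code):

```cpp
#include <mutex>
#include <string>
#include <vector>

#include <tbb/parallel_for_each.h>

struct Mesh { /* ... */ };

static void tessellate(Mesh & /*mesh*/)
{
  /* Stand-in for the expensive adaptive subdivision work. */
}

static void tessellate_all(std::vector<Mesh> &meshes)
{
  std::mutex status_mutex;
  int done = 0;

  tbb::parallel_for_each(meshes.begin(), meshes.end(), [&](Mesh &mesh) {
    tessellate(mesh);

    /* Keep counting and reporting atomic so the status never goes backwards. */
    std::lock_guard<std::mutex> lock(status_mutex);
    done++;
    const std::string status = "Tessellating " + std::to_string(done) + "/" +
                               std::to_string(meshes.size());
    (void)status; /* would be handed to the progress/status system */
  });
}

int main()
{
  std::vector<Mesh> meshes(300);
  tessellate_all(meshes);
}
```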
Pull Request: https://projects.blender.org/blender/blender/pulls/145220
This pull request removes the ROCm 5 code path and adds the ROCm 7 runtime
to the library search list.
The ROCm 5 runtime is no longer shipped with AMD drivers, and the ROCm 5
compiler is no longer compatible with newer driver versions.
Starting later this year, the ROCm 7 runtime will be bundled with the driver
installer, and all future runtime fixes and improvements will target ROCm 7.
Once the ROCm 7 runtime is rolled out, the ROCm 6 compiler will continue to
work with it for about a year as a transitional measure. Beyond that,
compatibility is not guaranteed.
Pull Request: https://projects.blender.org/blender/blender/pulls/145279
The function `recursive_build()` can change the `shared_ptr` by replacing
the node with an internal node. If `root_` is modified, the children may
end up accessing `root_->bbox.min` after that memory has already been freed.
Storing `bbox_min` separately seems to fix the issue. No error is seen
after running the tests 2000 times.
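A simplified sketch of the hazard and the fix (stand-in types, not the actual
builder code):

```cpp
#include <memory>

struct float3 { float x, y, z; };
struct Node { float3 bbox_min; };

/* Stand-in: the real function may allocate an internal node and make `node`
 * point at it, releasing the previous object. */
static void recursive_build(std::shared_ptr<Node> &node)
{
  node = std::make_shared<Node>();
}

static void build(std::shared_ptr<Node> &root)
{
  /* Copy the value up front instead of holding a reference into the node that
   * recursive_build() might free. */
  const float3 bbox_min = root->bbox_min;

  recursive_build(root);

  /* ...use the copied bbox_min here, never the pre-recursion root->bbox_min... */
  (void)bbox_min;
}

int main()
{
  std::shared_ptr<Node> root = std::make_shared<Node>();
  build(root);
}
```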
Pull Request: https://projects.blender.org/blender/blender/pulls/145239
The compiler in the CUDA 13 toolkit dropped support for the Maxwell, Pascal and Volta architectures (sm_5X, sm_6X and sm_70), which affects both CUDA and OptiX kernel compilation for Cycles. This patch makes it so building CUDA kernel binaries for those architectures is skipped when CUDA 13 is used, but they will still be built if a CUDA 11 toolkit is available (e.g. on the buildbot), like how things are handled for other architectures. The OptiX PTX kernel is compiled with the minimum architecture available (compute_75 with CUDA 13, compute_50 with previous CUDA versions).
In addition, loading the PTX kernel after initializing OptiX version 9.0 would fail with an OPTIX_ERROR_INVALID_FUNCTION_USE, due to the use of "optixTrace" within direct callables (as part of the AO and bevel SVM nodes). Starting with OptiX 9.0 this is no longer allowed; instead one has to use "optixTraverse" in those cases. This patch thus changes the affected intersection routines to use "optixTraverse". As a side effect it also simplifies the `scene_intersect_shadow` function, which no longer invokes the closest hit program and can just quickly return the hit status. The minimum OptiX version Cycles requires is already 8.0, which supports "optixTraverse", so the change can always be applied.
Finally, this patch also adds the `--split-compile=0` argument to nvcc when available, which tells the compiler to internally split the module into pieces that can be processed in parallel on multiple threads (the `=0` means to use as many threads as there are CPU cores), which can greatly improve compile times while not making compromises on performance.
Pull Request: https://projects.blender.org/blender/blender/pulls/145130
This re-applies pull request #140671, but with a fix for #144713 where the
non-pointer part of IntegratorStateGPU was not initialized.
This PR is a more extensive follow-up to #123551 (removal of AMD and Intel
GPU support).
All supported Apple GPUs have Metal 3 and tier 2 argument buffer support.
The invariant resource properties `gpuAddress` and `gpuResourceID` can be
written directly into GPU structs once at setup time rather than once per
dispatch. More background info can be found in this article:
https://developer.apple.com/documentation/metal/improving-cpu-performance-by-using-argument-buffers?language=objc
Code changes:
- All code relating to `MTLArgumentEncoder` is removed
- `KernelParamsMetal` updates are directly written into
`id<MTLBuffer> launch_params_buffer` which is used for the "static"
dispatch arguments
- Dynamic dispatch arguments are small enough to be encoded using the
`MTLComputeCommandEncoder.setBytes` function, eliminating the need for
cycling temporary arg buffers
Fix #144713
Co-authored-by: Brecht Van Lommel <brecht@noreply.localhost>
Pull Request: https://projects.blender.org/blender/blender/pulls/145175
Blender grid rendering interprets voxel transforms in such a way that the voxel
values are located at the center of a voxel. This is inconsistent with OpenVDB
where the values are located at the lower corners for the purpose of sampling
and related algorithms.
While it is possible to offset grids when communicating with the OpenVDB
library, this is also error-prone and does not add any major advantage.
Every time a grid is passed to OpenVDB we currently have to take care to
transform by half a voxel to ensure correct sampling weights are used that match
the density displayed by the viewport rendering.
This patch changes volume grid generation, conversion, and rendering code so
that grid transforms match the corner-located values in OpenVDB.
- The volume primitive cube node aligns the grid transform with the location of
the first value, which is now also the same as the min/max bounds input of the
node.
- Mesh<->Grid conversion no longer requires offsetting the grid transform and
mesh vertices respectively by 0.5 voxels.
- Texture space for viewport rendering is offset by half a voxel, so that it
covers the same area as before and voxel centers remain at the same texture
space locations.
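For illustration, the two conventions for the index-to-world mapping of one
axis of a grid (a schematic sketch, not code from the patch):

```cpp
#include <cstdio>

int main()
{
  const float voxel_size = 0.25f; /* s */
  const float origin = 1.0f;      /* o */
  const int index = 3;            /* i */

  /* Center-located values (previous Blender convention): the value of voxel i
   * sits in the middle of the voxel, so passing the grid to OpenVDB needed an
   * extra half-voxel translation in the transform. */
  const float center_located = origin + (float(index) + 0.5f) * voxel_size;

  /* Corner-located values (OpenVDB convention, now matched by Blender): the
   * value of voxel i sits at the voxel's lower corner. */
  const float corner_located = origin + float(index) * voxel_size;

  std::printf("center: %f, corner: %f, difference: %f (= s/2)\n",
              center_located, corner_located, center_located - corner_located);
}
```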
Co-authored-by: Brecht Van Lommel <brecht@blender.org>
Pull Request: https://projects.blender.org/blender/blender/pulls/138449
Supports baking to object and tangent space.
Compatible with the Cycles Vector Displacement node, which uses the
(tangent, normal, bitangent) convention.
The viewport situation is a bit confusing: it seems that EEVEE
does not handle vector displacement properly and rips all faces
apart. Cycles renders the displaced object correctly.
Not entirely happy with the UI, as displacement space does not
really belong to the Output, but neither does Low Resolution Mesh.
Perhaps the best approach would be a separate pass to revisit the
settings, and also make it clearer what the Low Resolution Mesh
actually does.
Pull Request: https://projects.blender.org/blender/blender/pulls/145014
Almost all settings were duplicated between BakeData and RenderData.
The only missing field was the bake type, which is stored as a custom
property in Cycles.
This change:
- Removes unused bake_samples and bake_biasdist.
- Migrates settings like bake_margin to BakeData.
- Switches multires baker to use bake_margin.
- Introduces the bake type in BakeData, the same way it was
defined in RenderData::bake_mode.
Pull Request: https://projects.blender.org/blender/blender/pulls/144984
The main idea is to switch the Bake from Multires feature from the legacy
DerivedMesh to Subdiv. On the development side of things this change removes
a lot of code, also making it easier to rework CustomData and related
topics, without being held back by DerivedMesh.
On the user level, the switch to Subdiv means:
- Much closer handling of the multi-resolution data: the derived
mesh code was close, but not exactly the same when it comes to the
final look of the mesh.
Other than less obvious cases (like the old DerivedMesh approach doing
recursive subdivision instead of pushing subdivided vertices onto the
limit surface) there are more obvious ones like differences in edge
creases, and vertex creases not being supported by DerivedMesh.
- UV interpolation is done correctly now when baking to non-base level
(baking to multi-resolution level >= 1).
Previously in this case the old derived mesh interpolation was used
to interpolate face-varying data, which gives different results from
the OpenSubdiv interpolation.
- Ngon faces are properly supported now.
A possible remaining issue is the fact that getting normals from CCG
always uses smooth interpolation. Based on the code, this has always been
the case, so while it is something to look into, it can be considered
a separate topic to dig into.