Metal-RT implementation for curve intersect has an additional self
intersection check happening in curve_ribbon_accept(). It is done
for all curve types that has PRIMITIVE_CURVE_RIBBON bit set on them,
including Thick Linear curves. However, the logic in the function is
hardcoded to handle flat ribbon curves with the Catmull Rom basis.
This change makes it so curve_ribbon_accept() is only called for the
ribbon curve type, not when type has ribbon bit set.
Additionally, other places where curve type was checked as a bitmask
were fixed.
Ref #146072
Pull Request: https://projects.blender.org/blender/blender/pulls/146140
Only use when Windows Automatic Color Management (ACM) is enabled.
That way we know Windows will automatically convert from extended sRGB
linear to the appropriate color space for the display.
When we previously tried this there were some issues, but I think it was due
to ACM being off. This has been confirmed to work on NVIDIA and AMD, and
Intel is not expected to support this (yet) for the same reasons there is no HDR.
Pull Request: https://projects.blender.org/blender/blender/pulls/146041
`GHOST_SwapWindowBuffers` doesn't fit well when using swapchains. In
that case an approach where swap chain images are acquired and released
would map better. This PR introduces `GHOST_SwapWindowBufferAcquire`
and `GHOST_SwapWindowBufferRelease` to be more in line with vulkan swap
chains.
Previous implementation would first record all GPU commands based on
the last used swap chain. In case a swapchain needed to be recreated
(window resize, move to other monitor) the recorded commands would
not match the swap chain and could lead to artifacts.
OpenGL only implements the release functions as they don't
have a mechanism to acquire a swap chain image. (Need to validate with
the Metal API how this is working and adapt is needed).
Currently when starting blender on a HDR capable display the first frame
would be based on an sRGB surface and presented on an extended RGB
(or other) surface. As these don't match the first frame could be incorrect and
also lead to UBs as another surface is expected.
Pull Request: https://projects.blender.org/blender/blender/pulls/145728
Always enforce tiling of some size, up to the 8k tile size.
Rendering very big images without tiles have a lot of challenges.
While solving those challenges is not impossible, it does not seem to
be a practical time investment.
The internals of the way how Cycles work, including Cycles Standalone
is not affected by this change.
A possible downside is that path guiding might not work exactly how one
would expect it to due to lack of information sharing across multiple
tiles. This is something that never worked nicely, and camera animation
and border render has the same issues, so it is not considered a stopper
for this change.
Fixes#145900
Co-authored-by: Brecht Van Lommel <brecht@blender.org>
Pull Request: https://projects.blender.org/blender/blender/pulls/146031
This was added for a fairly specialezed use case and is no longer being used
as far as we know. A future replacement would be to add a USD/Hydra procedural,
for which most of the groundwork already exists.
Pull Request: https://projects.blender.org/blender/blender/pulls/146021
HIP-RT device:
- Add missing flags from the common flags query to the final compiler options
- Switch logging utility from printf to LOG_INFO_IMPORTANT
- Remove redundant compiler options already covered by common flags
HIP device:
- Add compiler command to logging
- Update C++ standard to C++17 to resolve compiler warnings
Pull Request: https://projects.blender.org/blender/blender/pulls/145284
This adds a function that can turn an existing `bNodeTree` into an inlined one.
The new node tree has all node groups, repeat zones, closures and bundles
inlined. So it's just a flat tree that ideally can be consumed easily by render
engines. As part of the process, it also does constant folding.
The goal is to support more advanced features from geometry nodes (repeat zones,
etc.) in shader nodes which the evaluator is more limited because it has to be
able to run on the GPU. Creating an inlined `bNodeTree` is likely the most
direct way to get but may also be limiting in the future. Since this is a fairly
local change, it's likely still worth it to support these features in all render
engines without having to make their evaluators significantly more complex.
Some limitations apply here that do not apply in Geometry Nodes. For example,
the iterations count in a repeat zone has to be a constant after constant
folding.
There is also a `Test Inlining Shader Nodes` operator that creates the inlined
tree and creates a group node for it. This is just for testing purposes.
#145811 will make this functionality available to the Python API as well so that
external renderers can use it too.
This recently changed after a fix in 28f93d5443
but we get better performance by ensuring int3 is packed instead.
Packing int3 currently gives a 7% speedup when rendering wdas_cloud on
Intel Arc B580.
Pull Request: https://projects.blender.org/blender/blender/pulls/145593
When shadow linking is enabled, `intersect_dedicated_light` is scheduled even
if the `PATH_RAY_SUBSURFACE` flag is set. This checks the flag and schedules
`intersect_subsurface` instead.
Pull Request: https://projects.blender.org/blender/blender/pulls/145621
These don't really work as scene linear with sRGB transfer function for e.g.
ACEScg, there are not enough bits. If you want wide gamut you need to use
float colors.
Pull Request: https://projects.blender.org/blender/blender/pulls/145763
Qualcomm GPU doesn't support image load/store on framebuffers. We use
load store to convert to the correct color space. This PR will only
set the VK_IMAGE_USAGE_STORAGE_BIT when HDR is available. As Qualcomm
doesn't support HDR using Vulkan this is safe.
Pull Request: https://projects.blender.org/blender/blender/pulls/145775
Experiments have shown that the OptiX denoiser performs best when
operating on images that have their origin at the top-left corner,
while Blender renders with the origin at the bottom-left corner.
Simply flipping the image vertically before and after denoising is a
relatively trivial operation, so this patch introduces this as an
additional preprocessing and postprocessing step for denoising when the
OptiX denoiser is used. Additionally, this patch also removes an unused
helper function, now that OptiX 8.0 is the minimum.
Pull Request: https://projects.blender.org/blender/blender/pulls/145358
Improving readability of GHOST_ContextVK
- Introduces GHOST_Extensions for a common interface to device/instance
extensions
- Introduces GHOST_InstanceVK for instance based API (used to be part of
GHOST_DeviceVK)
During the refactor found out that the generic queue family would always
be 0, this is most of the time the case, but could lead to issues when
setting up more complex setups.
Pull Request: https://projects.blender.org/blender/blender/pulls/145721
There are several Driver versions which are constructing the wrong,
semantically, version which would force Blender to decline the Intel
device for oneAPI backend usage, based on this. Unfortunately,
the upstream fix is taking a long time to be finally delivered to
the distros and end-users, so it is better if Blender will detect
this wrong version string and parse it properly, allowing these
devices to be used - as the wrong driver version string is the only
issue here, besides this the driver functionality is fine.
Pull Request: https://projects.blender.org/blender/blender/pulls/145658
Up until now, when encountering an OpenXR error / exception in debug
mode, only the raw OpenXR error enum int value would be displayed, which
wasn't really descriptive nor useful. To remedy this, this commit adds
an `xrResultToString` call to additionally convert this value into its
corresponding enum string.
See PR for an example error print.
Pull Request: https://projects.blender.org/blender/blender/pulls/142582
The Embree scene contains a TBB task group that has a parent pointer to the
task group it was created in. In Cycles this task group was only temporarily
created on the stack, resulting in a dangling parent pointer.
The simple solution is to make the Cycles side task group persistent too.
Many thanks to Aras for figuring this one out, this was a very tricky one.
Pull Request: https://projects.blender.org/blender/blender/pulls/145515
Multiple previous changes made resource pools obsolete. Resource
pools were used to keep track of resources when the frame is rendered.
Multiple frames can be rendered at the same time and resources could
overlap.
This has been replaced (not this commit) to be part of the render graph
and when an submission has completed the resources are recycled.
Continuation of: https://projects.blender.org/blender/blender/pulls/145408
Pull Request: https://projects.blender.org/blender/blender/pulls/145511
Multiple previous changes made resource pools obsolete. Resource
pools were used to keep track of resources when the frame is rendered.
Multiple frames can be rendered at the same time and resources could
overlap.
This has been replaced (not this commit) to be part of the render graph
and when an submission has completed the resources are recycled.
Pull Request: https://projects.blender.org/blender/blender/pulls/145408
Meshes that require adaptive subdivision are currently tesselated one at
a time. Change this part of device update to be done in parallel.
To remove the possibility of the status message going backwards, a mutex
was required to keep that portion of the loop atomic.
Results for the loop in question: On one particular scene with over 300
meshes requiring tesselation, the update time drops from ~16 seconds to
~3 seconds. The attached synthetic test drops from ~9 seconds down to ~1
second.
Pull Request: https://projects.blender.org/blender/blender/pulls/145220
This PR disables HDR when driver doesn't support
`VK_EXT_swapchain_colorspace`. It could be that Windows detect support
(DirectX) and allow users to enable HDR. But that it isn't supported by
the Vulkan part of the driver.
In case of Windows there might be an issue with using SRGB non
linear on F16 swapchains where somewhere an additional transform is
being performed. After adding support for other color spaces we should
remove the support of this swapchain configuration.
Pull Request: https://projects.blender.org/blender/blender/pulls/145325
This pull request removes ROCm 5 code path and adds ROCm 7 runtime to
library search list.
ROCm 5 runtime is no longer shipped with AMD drivers, and ROCm 5 compiler
is no longer compatible with newer driver versions.
It also adds ROCm 7 runtime to the list of runtime libraries to look for.
Starting later this year, ROCm 7 runtime will be bundled with the driver
installer, and all future runtime fixes and improvements will target ROCm 7.
Once ROCm 7 runtime is rolled out, ROCm 6 compiler will continue to work
with it for about a year as a transitional measure. Beyond that, compatibility
is not guaranteed.
Pull Request: https://projects.blender.org/blender/blender/pulls/145279
The crash happens in specific cases when doing euclidean resection
in cases when first 3 3D estimates are correlated. While it is a bit
cryptic on user level, what it could mean is:
- Either there are overlapping markers
- Or the optical system is highly out of calibration.
Either of these cases is not the primary goal to support with camera
solvers in Libmv for the time being, so while this change avoids the
crash there is possibility to make algorithm more robustly handle the
tracking data and potentially give better result.
Such correlation via some intermediate steps leads to the algorithm
attempting to perform SVD decomposition of a matrix with NaN values
in it.
The exact reason of the crash is not super clear as it happens deep
in the JacobiSVD decomposition code in Eigen and caused by invalid
index somewhere. It might be a bug in Eigen which doesn't do range
checking, hoping that the math does not fail in cases like NaN and
inf.
Pull Request: https://projects.blender.org/blender/blender/pulls/145301
* Improve accuracy of warning when HDR display is not supported, taking into
account HDR mode on/off on Windows.
* When HDR mode is disabled on Windows, don't create a HDR swapchain. This
saves memory, and avoids a color difference on NVIDIA. That's because NVIDIA
is the only GPU we've tested that allows a HDR swapchain when HDR mode is
off, and we don't currently know the expected transforms for that case.
* Recreate swapchain when HDR mode on/off switch is detected.
* Update HDR info when window gains focus.
Note this means there is no wide gamut when Windows HDR is off, but it was
already not working. For that we may need to add support for something like
10bit VK_COLOR_SPACE_DISPLAY_P3_NONLINEAR_EXT, or whatever is commonly
available outside of HDR mode.
Pull Request: https://projects.blender.org/blender/blender/pulls/144959
The function `recursive_build()` can change the `shared_ptr` by making
it an internal node. If `root_` is modified, it could happen that when
the children are accessing `root_->bbox.min` that memory is already
freed.
Storing `bbox_min` separately seems to fix the issue. No error is seen
after running the tests repeatedly for 2000 times.
Pull Request: https://projects.blender.org/blender/blender/pulls/145239
The compiler in the CUDA 13 toolkit dropped support for Maxwell, Pascal and Volta architectures (sm_5X, sm_6X and sm_70), which affects both CUDA and OptiX kernel compilation for Cycles. This patch makes it so building CUDA kernel binaries for those architectures are skipped when CUDA 13 is used, but it will still build them if there is a CUDA 11 toolkit available (e.g. on buildbot), like how things are handled for other architectures. The OptiX PTX kernel is compiled with the minimum architecture available (compute_75 with CUDA 13, compute_50 with previous CUDA versions).
In addition, loading the PTX kernel after initializing OptiX version 9.0 would fail with a OPTIX_ERROR_INVALID_FUNCTION_USE, due to the use of "optixTrace" within direct callables (as part of the AO and bevel SVM nodes). Starting with OptiX 9.0 this is no longer allowed, rather one has to use "optixTraverse" in those cases. This patch thus changes the affected intersection routines to use "optixTraverse". As a side effect it also simplifies the `scene_intersect_shadow` function, which no longer invokes the closest hit program, and can just quickly return hit status. The minimum OptiX version Cycles requires is already 8.0, which supports "optixTraverse", so it can just be applied always.
Finally, this patch also adds the `--split-compile=0` argument to nvcc when available, which tells the compiler to internally split the module into pieces that can be processed in parallel on multiple threads (the `=0` notes to use as many threads as there are CPU cores), which can greatly improving compile times, while not making compromises on performance.
Pull Request: https://projects.blender.org/blender/blender/pulls/145130
This re-applies pull request #140671, but with a fix for #144713 where the
non-pointer part of IntegratorStateGPU was not initialized.
This PR is a more extensive follow on from #123551 (removal of AMD and Intel
GPU support).
All supported Apple GPUs have Metal 3 and tier 2 argument buffer support.
The invariant resource properties `gpuAddress` and `gpuResourceID` can be
written directly into GPU structs once at setup time rather than once per
dispatch. More background info can be found in this article:
https://developer.apple.com/documentation/metal/improving-cpu-performance-by-using-argument-buffers?language=objc
Code changes:
- All code relating to `MTLArgumentEncoder` is removed
- `KernelParamsMetal` updates are directly written into
`id<MTLBuffer> launch_params_buffer` which is used for the "static"
dispatch arguments
- Dynamic dispatch arguments are small enough to be encoded using the
`MTLComputeCommandEncoder.setBytes` function, eliminating the need for
cycling temporary arg buffers
Fix#144713
Co-authored-by: Brecht Van Lommel <brecht@noreply.localhost>
Pull Request: https://projects.blender.org/blender/blender/pulls/145175
Implementation of the design task #142969.
This adds the following:
- Exact GPU interpolation of curves of all types.
- Radius attribute support.
- Cyclic curve support.
- Resolution attribute support.
- New Cylinder hair shape type.

What changed:
- EEVEE doesn't compute random normals for strand hairs anymore. These are considered legacy now.
- EEVEE now have an internal shadow bias to avoid self shadowing on hair.
- Workbench Curves Strip display option is no longer flat and has better shading.
- Legacy Hair particle system evaluates radius at control points before applying additional subdivision. This now matches Cycles.
- Color Attribute Node without a name do not fetch the active color attribute anymore. This now matches Cycles.
Notes:
- This is not 100% matching the CPU implementation for interpolation (see the epsilons in the tests).
- Legacy Hair Particle points is now stored in local space after interpolation.
The new cylinder shape allows for more correct hair shading in workbench and better intersection in EEVEE.
| | Strand | Strip | Cylinder |
| ---- | --- | --- | --- |
| Main |  |  | N/A |
| PR |  |  |  |
| | Strand | Strip | Cylinder |
| ---- | --- | --- | --- |
| Main |  | | N/A |
| PR | ||  |
Cyclic Curve, Mixed curve type, and proper radius support:

Test file for attribute lookup: [test_attribute_lookup.blend](/attachments/1d54dd06-379b-4480-a1c5-96adc1953f77)
Follow Up Tasks:
- Correct full tube segments orientation based on tangent and normal attributes
- Correct V resolution property per object
- More attribute type support (currently only color)
TODO:
- [x] Attribute Loading Changes
- [x] Generic Attributes
- [x] Length Attribute
- [x] Intercept Attribute
- [x] Original Coordinate Attribute
- [x] Cyclic Curves
- [x] Legacy Hair Particle conversion
- [x] Attribute Loading
- [x] Additional Subdivision
- [x] Move some function to generic headers (VertBuf, OffsetIndices)
- [x] Fix default UV/Color attribute assignment
Pull Request: https://projects.blender.org/blender/blender/pulls/143180