Meshes that require adaptive subdivision are currently tesselated one at
a time. Change this part of device update to be done in parallel.
To remove the possibility of the status message going backwards, a mutex
was required to keep that portion of the loop atomic.
Results for the loop in question: On one particular scene with over 300
meshes requiring tesselation, the update time drops from ~16 seconds to
~3 seconds. The attached synthetic test drops from ~9 seconds down to ~1
second.
Pull Request: https://projects.blender.org/blender/blender/pulls/145220
This pull request removes ROCm 5 code path and adds ROCm 7 runtime to
library search list.
ROCm 5 runtime is no longer shipped with AMD drivers, and ROCm 5 compiler
is no longer compatible with newer driver versions.
It also adds ROCm 7 runtime to the list of runtime libraries to look for.
Starting later this year, ROCm 7 runtime will be bundled with the driver
installer, and all future runtime fixes and improvements will target ROCm 7.
Once ROCm 7 runtime is rolled out, ROCm 6 compiler will continue to work
with it for about a year as a transitional measure. Beyond that, compatibility
is not guaranteed.
Pull Request: https://projects.blender.org/blender/blender/pulls/145279
The function `recursive_build()` can change the `shared_ptr` by making
it an internal node. If `root_` is modified, it could happen that when
the children are accessing `root_->bbox.min` that memory is already
freed.
Storing `bbox_min` separately seems to fix the issue. No error is seen
after running the tests repeatedly for 2000 times.
Pull Request: https://projects.blender.org/blender/blender/pulls/145239
The compiler in the CUDA 13 toolkit dropped support for Maxwell, Pascal and Volta architectures (sm_5X, sm_6X and sm_70), which affects both CUDA and OptiX kernel compilation for Cycles. This patch makes it so building CUDA kernel binaries for those architectures are skipped when CUDA 13 is used, but it will still build them if there is a CUDA 11 toolkit available (e.g. on buildbot), like how things are handled for other architectures. The OptiX PTX kernel is compiled with the minimum architecture available (compute_75 with CUDA 13, compute_50 with previous CUDA versions).
In addition, loading the PTX kernel after initializing OptiX version 9.0 would fail with a OPTIX_ERROR_INVALID_FUNCTION_USE, due to the use of "optixTrace" within direct callables (as part of the AO and bevel SVM nodes). Starting with OptiX 9.0 this is no longer allowed, rather one has to use "optixTraverse" in those cases. This patch thus changes the affected intersection routines to use "optixTraverse". As a side effect it also simplifies the `scene_intersect_shadow` function, which no longer invokes the closest hit program, and can just quickly return hit status. The minimum OptiX version Cycles requires is already 8.0, which supports "optixTraverse", so it can just be applied always.
Finally, this patch also adds the `--split-compile=0` argument to nvcc when available, which tells the compiler to internally split the module into pieces that can be processed in parallel on multiple threads (the `=0` notes to use as many threads as there are CPU cores), which can greatly improving compile times, while not making compromises on performance.
Pull Request: https://projects.blender.org/blender/blender/pulls/145130
This re-applies pull request #140671, but with a fix for #144713 where the
non-pointer part of IntegratorStateGPU was not initialized.
This PR is a more extensive follow on from #123551 (removal of AMD and Intel
GPU support).
All supported Apple GPUs have Metal 3 and tier 2 argument buffer support.
The invariant resource properties `gpuAddress` and `gpuResourceID` can be
written directly into GPU structs once at setup time rather than once per
dispatch. More background info can be found in this article:
https://developer.apple.com/documentation/metal/improving-cpu-performance-by-using-argument-buffers?language=objc
Code changes:
- All code relating to `MTLArgumentEncoder` is removed
- `KernelParamsMetal` updates are directly written into
`id<MTLBuffer> launch_params_buffer` which is used for the "static"
dispatch arguments
- Dynamic dispatch arguments are small enough to be encoded using the
`MTLComputeCommandEncoder.setBytes` function, eliminating the need for
cycling temporary arg buffers
Fix#144713
Co-authored-by: Brecht Van Lommel <brecht@noreply.localhost>
Pull Request: https://projects.blender.org/blender/blender/pulls/145175
Blender grid rendering interprets voxel transforms in such a way that the voxel
values are located at the center of a voxel. This is inconsistent with OpenVDB
where the values are located at the lower corners for the purpose or sampling
and related algorithms.
While it is possible to offset grids when communicating with the OpenVDB
library, this is also error-prone and does not add any major advantage.
Every time a grid is passed to OpenVDB we currently have to take care to
transform by half a voxel to ensure correct sampling weights are used that match
the density displayed by the viewport rendering.
This patch changes volume grid generation, conversion, and rendering code so
that grid transforms match the corner-located values in OpenVDB.
- The volume primitive cube node aligns the grid transform with the location of
the first value, which is now also the same as min/max bounds input of the
node.
- Mesh<->Grid conversion does no longer require offsetting grid transform and
mesh vertices respectively by 0.5 voxels.
- Texture space for viewport rendering is offset by half a voxel, so that it
covers the same area as before and voxel centers remain at the same texture
space locations.
Co-authored-by: Brecht Van Lommel <brecht@blender.org>
Pull Request: https://projects.blender.org/blender/blender/pulls/138449
Supports baking to object and tangent space.
Compatible with Cycles Vector Displacement node which has the
(tangent, normal, bitangent) convention.
The viewport situation is a bit confusing: seems that Eevee
does not handle vector displacement properly and rips all faces
apart. Cycles renders the displaced object correctly.
Not entirely happy with the UI, as displacement space does not
really belong to the Output, but so doesn't Low Resolution Mesh.
Perhaps the best would be to have a separate pass to revisit the
settings, and also make it more clear what the Low Resolution Mesh
actually does.
Pull Request: https://projects.blender.org/blender/blender/pulls/145014
Almost all settings were duplicated between BakeData and RenderData.
The only missing field was the bake type, which is stored as a custom
property in Cycles.
This change:
- Removes unused bake_samples and bake_biasdist.
- Migrates settings like bake_margin to BakeData.
- Switches multires baker to use bake_margin.
- Introduces bake type in the BakeData, the same way how it was
defined in RenderData::bake_mode.
Pull Request: https://projects.blender.org/blender/blender/pulls/144984
The main idea is to switch Bake from Multires from legacy DerivedMesh
to Subdiv. On the development side of things this change removes a lot
of code, also making it easier easier to rework CustomData and related
topics, without being pulled down by the DerivedMesh.
On the user level switch to Subdiv means:
- Much more closer handling of the multi-resolution data: the derived
mesh code was close, but not exactly the same when it comes to the
final look of mesh.
Other than less obvious cases (like old DerivedMesh approach doing
recursive subdivision instead of pushing subdivided vertices on the
limit surface) there are more obvious ones like difference in edge
creases, and non-supported vertex creases by the DerivedMesh.
- UV interpolation is done correctly now when baking to non-base level
(baking to multi-resolution level >= 1).
Previously in this case the old derived mesh interpolation was used
to interpolate face-varying data, which gives different results from
the OpenSubdiv interpolation.
- Ngon faces are properly supported now.
A possible remaining issue is the fact that getting normal from CCG
always uses smooth interpolation. Based on the code it always has been
the case, so while it is something to look into it might be considered
a separate topic to dig into.
`world_use_portal` is not needed anymore, now that we always add world
as object (b20b4218d5).
We now check if background light is enabled only in
`test_enabled_lights()`, depending on the sample settings.
Pull Request: https://projects.blender.org/blender/blender/pulls/144710
It works with the beta we are using to build Blender 4.5, but the official
release is a bit different. This fix was tested to work with OSL 1.14.7.
Thanks to Paul Zander for finding the OSL commit that lead to this.
Pull Request: https://projects.blender.org/blender/blender/pulls/144715
It has ~1.2x speed-up on CPU and ~1.5x speed-up on GPU (tested on Metal
M2 Ultra).
Individual samples are noisier, but equal time renders are mostly
better.
Note that volume emission renders differently than before.
Pull Request: https://projects.blender.org/blender/blender/pulls/144451
Currently, it was discovered that in the case of several different
Intel dGPUs being present in the system, the experimental L0 copy
optimization does not work correctly in the Intel Driver, which is
causing crashes in the driver and Blender application. So, to avoid
this situation and restore functionality on these platforms,
a workaround was added to disable this extension from being used if
such a configuration is detected. In the future, when this problem is
fully fixed in all Intel Drivers, this workaround can be removed from
the Blender source code to restore some performance that was lost on
configurations of several dGPUs because of this workaround.
Pull Request: https://projects.blender.org/blender/blender/pulls/144262
Guide the probability to scatter in or transmit through the volume.
Only applied for primary rays.
Co-authored-by: Brecht Van Lommel <brecht@blender.org>
The distance sampling is mostly based on weighted delta tracking from
[Monte Carlo Methods for Volumetric Light Transport Simulation]
(http://iliyan.com/publications/VolumeSTAR/VolumeSTAR_EG2018.pdf).
The recursive Monte Carlo estimation of the Radiative Transfer Equation is
\[\langle L \rangle=\frac{\bar T(x\rightarrow y)}{\bar p(x\rightarrow
y)}(L_e+\sigma_s L_s + \sigma_n L).\]
where \(\bar T(x\rightarrow y) = e^{-\bar\sigma\Vert x-y\Vert}\) is the
majorant transmittance between points \(x\) and \(y\), \(p(x\rightarrow
y) = \bar\sigma e^{-\bar\sigma\Vert x-y\Vert}\) is the probability of
sampling point \(y\) from point \(x\) following exponential
distribution.
At each recursive step, we randomly pick one of the two events
proportional to their weights:
* If \(\xi < \frac{\sigma_s}{\sigma_s+\vert\sigma_n\vert}\), we sample
scatter event and evaluate \(L_s\).
* Otherwise, no real collision happens and we continue the recursive
process.
The emission \(L_e\) is evaluated at each step.
This also removes some unused volume settings from the UI:
* "Max Steps" is removed, because the step size is automatically specified
by the volume octree. There is a hard-coded threshold `VOLUME_MAX_STEPS`
to prevent numerical issues.
* "Homogeneous" is automatically detected during density evaluation
An option "Unbiased" is added to the UI. When enabled, densities above
the majorant are clamped.
Due to numerical issues this was creating many wrong self-overlapping.
It was necessary for skipping empty regions, but not any more with the
volume Octree approach
Since we sample the same light for distance sampling and equiangular
sampling, the sample is invalid anyway, so just avoid sampling direct
light for distance sampling too.
This fits better with the way normal and displacement maps are typically
combined. Previously there was a mixing of displaced normal and undisplaced
tangent, which was broken behavior.
Additionally, to undisplaced_N and undisplaced_tangent attributes must now
always be used to get undisplaced coordinates. The regular N and tangent
attributes now always include displacement.
Ref #142022
Pull Request: https://projects.blender.org/blender/blender/pulls/143109
Modify shader update so we simplify the graphs first to determine the
kernel features, then load the kernels, and only then update data on the
device. This avoids errors due to mismatched kernels and shaders.
Pull Request: https://projects.blender.org/blender/blender/pulls/144238