The Embree scene contains a TBB task group that has a parent pointer to the
task group it was created in. In Cycles this task group was only temporarily
created on the stack, resulting in a dangling parent pointer.
The simple solution is to make the Cycles side task group persistent too.
Many thanks to Aras for figuring this out; it was a very tricky one.
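The lifetime issue can be sketched as follows. This is only an illustration under stated assumptions: `TaskGroupStub`, `NestedSceneStub` and `BVHBuilderStub` are hypothetical stand-ins and do not use the TBB or Embree APIs. The point is that the group referenced by the parent pointer is a member of the long-lived owner instead of a stack temporary.

```cpp
#include <cassert>

// Hypothetical stand-in for a task group that remembers the group it was
// created in. This is not the TBB API.
struct TaskGroupStub {
  const TaskGroupStub *parent = nullptr;
};

// Hypothetical stand-in for a scene that owns a nested task group.
struct NestedSceneStub {
  explicit NestedSceneStub(const TaskGroupStub *parent_group)
  {
    assert(parent_group != nullptr);
    group.parent = parent_group;
  }

  TaskGroupStub group;
};

class BVHBuilderStub {
 public:
  // The scene is created inside the persistent group, so the parent pointer
  // stays valid for as long as the builder exists.
  NestedSceneStub create_scene() const
  {
    return NestedSceneStub(&persistent_group_);
  }

  const TaskGroupStub *group() const
  {
    return &persistent_group_;
  }

 private:
  // Member instead of a local variable in the build function.
  TaskGroupStub persistent_group_;
};
```

With a local `TaskGroupStub` inside `create_scene()` instead, the returned scene would keep a pointer to an object that no longer exists.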
Pull Request: https://projects.blender.org/blender/blender/pulls/145515
Meshes that require adaptive subdivision are currently tessellated one at
a time. Change this part of device update to be done in parallel.
To remove the possibility of the status message going backwards, a mutex
was required to keep that portion of the loop atomic.
Results for the loop in question: On one particular scene with over 300
meshes requiring tessellation, the update time drops from ~16 seconds to
~3 seconds. The attached synthetic test drops from ~9 seconds down to ~1
second.
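A minimal sketch of the pattern, assuming plain `std::thread` workers rather than the task scheduling used in Cycles. `MeshStub`, `tessellate_all` and the message text are hypothetical names for illustration. The expensive work runs without a lock, while the counter increment and the status report happen under one mutex so the reported count can only increase.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <mutex>
#include <string>
#include <thread>
#include <vector>

// Hypothetical stand-in for a mesh that needs adaptive subdivision.
struct MeshStub {
  bool tessellated = false;
};

// Tessellate all meshes on `num_threads` workers and return the status
// messages in the order they were reported.
std::vector<std::string> tessellate_all(std::vector<MeshStub> &meshes,
                                        const unsigned num_threads)
{
  assert(num_threads > 0);

  std::vector<std::string> status_log;
  std::mutex status_mutex;
  std::atomic<std::size_t> next_index{0};
  std::size_t num_done = 0;  // Protected by status_mutex.

  auto worker = [&]() {
    for (;;) {
      const std::size_t i = next_index.fetch_add(1);
      if (i >= meshes.size()) {
        return;
      }

      // The expensive part runs without holding any lock.
      meshes[i].tessellated = true;

      // Increment and report as one atomic step, so that a slower thread
      // can never report a lower count after a higher one.
      const std::lock_guard<std::mutex> lock(status_mutex);
      num_done++;
      status_log.push_back("Tessellating " + std::to_string(num_done) + "/" +
                           std::to_string(meshes.size()));
    }
  };

  std::vector<std::thread> threads;
  for (unsigned t = 0; t < num_threads; t++) {
    threads.emplace_back(worker);
  }
  for (std::thread &thread : threads) {
    thread.join();
  }
  return status_log;
}
```

If the increment and the report were done separately, two threads could interleave between them and publish the counts out of order.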
Pull Request: https://projects.blender.org/blender/blender/pulls/145220
This fits better with the way normal and displacement maps are typically
combined. Previously there was a mixing of displaced normal and undisplaced
tangent, which was broken behavior.
Additionally, the undisplaced_N and undisplaced_tangent attributes must now
always be used to get undisplaced coordinates. The regular N and tangent
attributes now always include displacement.
Ref #142022
Pull Request: https://projects.blender.org/blender/blender/pulls/143109
Add new "Linear 3D Curves" option in the Curves panel in the render
properties. This renders curves as linear segments rather than smooth
curves, for faster render time at the cost of accuracy.
On NVIDIA Blackwell GPUs, this can give a 6x speedup compared to smooth
curves, due to hardware acceleration. On NVIDIA Ada there is still
a 3x speedup, and CPU and other GPU backends will also render this
faster.
Unlike smooth curves, linear curves have end caps, as this was simpler
to implement and they are usually helpful anyway.
In the future this functionality will also be used to properly support
the CURVE_TYPE_POLY on the new curves object.
Pull Request: https://projects.blender.org/blender/blender/pulls/139735
Previously with adaptive subdivision this happened to work with the N
attribute, but that was not meant to be undisplaced. This adds a new
undisplaced_N attribute specifically for this purpose.
For backwards compatibility in Blender 4.5, this also keeps N undisplaced.
But that will be changed in 5.0.
Pull Request: https://projects.blender.org/blender/blender/pulls/142090
All GPU backends now support NanoVDB, using our own kernel side code
that is easily portable. This simplifies kernel and device code.
Volume bounds are now built from the NanoVDB grid instead of OpenVDB,
to avoid having to keep around the OpenVDB grid after loading.
While this reduces memory usage, it does have a performance impact,
particularly for the Cubic filter. That will be addressed by
another commit.
Pull Request: https://projects.blender.org/blender/blender/pulls/132908
Keep around the dummy BVH for lights, even if it serves no purpose for now.
Previously I assumed it was not needed, but there is some device specific
code that assumes it exists, and not much point trying to refactor that now
when in the future we actually want to create a BVH for lights.
Pull Request: https://projects.blender.org/blender/blender/pulls/139798
This makes it available in Cycles standalone, and the implementation
can be shared with Blender. This also makes it possible to compute
tangents after tessellation for adaptive subdivision.
There is a difference in UV map tangents when there are no UVs. They
are now generated from object space coordinates instead of auto
texture space coordinates. This is more efficient, and a corner case
that we don't have to keep compatible.
Co-authored-by: Brecht Van Lommel <brecht@blender.org>
Pull Request: https://projects.blender.org/blender/cycles/pulls/25
* Move dicing out of DiagSplit, caller now uses EdgeDice
* Merge, rename and reorder various EdgeDice functions
* Compute triangle indices for subpatches in advance
Pull Request: https://projects.blender.org/blender/blender/pulls/136411
* Add SubdAttributeInterpolation class for linear attribute interpolation.
* Dicing computes ptex UV and face ID for interpolation.
* Simplify mesh storage of subd primitive counts
* Remove kernel code for subd attribute interpolation
* Remove patch table packing and upload
The old optimization adds a fair amount of complexity to the kernel, affecting
performance even when not using the feature. It's also not that useful as it
does not work for UVs that need special interpolation. With this simpler code
it should be easier to make it feature complete.
Pull Request: https://projects.blender.org/blender/blender/pulls/135681
There is a bug in Embree that makes BVH updates crash. Disabling multithreaded
BVH updates after the initial BVH build appears to work around it, at the cost
of some performance.
This will not affect performance of the initial BVH build, transforming objects
or editing a single mesh. It will only affect performance when multiple smaller
meshes are edited together, as those can no longer have their BVH updated in
parallel or benefit from parallelization over many primitives.
Pull Request: https://projects.blender.org/blender/blender/pulls/134747
The issue here is that motion_steps handling is a bit complex, and the
parallel synchronization of geometry does not play well with it.
The obvious result of this was a crash related to the main thread
checking attributes while the geometry sync was changing them, but
there was also another race condition that could result in ending up
with the wrong motion_steps.
Specific changes:
- Change place where `motion_steps` is set to avoid concurrent access
- Change the default `motion_steps` to zero, since they won't be
explicitly set if there's no motion now
- Don't skip `motion_steps` copy in `sync_X` since it's no longer set
in `sync_object` and we need to transfer the value in case it was set
to 3 by the velocity code since that's no longer the default
Pull Request: https://projects.blender.org/blender/blender/pulls/133669
Check was misc-const-correctness, combined with readability-isolate-declaration
as suggested by the docs.
clang-format "QualifierAlignment: Left" was used temporarily to get consistency
with the prevailing order of keywords.
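As an illustration of what the two checks produce together, here is a small sketch. The function `average` is hypothetical and not taken from the Cycles sources. Variables that are never modified become `const`, each declaration gets its own statement, and the qualifier is placed on the left.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Before the cleanup this might have read:
//   size_t num = values.size(); float sum = 0.0f, inv;
float average(const std::vector<float> &values)
{
  assert(!values.empty());

  // Never modified after initialization, so it is marked const.
  const std::size_t num = values.size();

  // Modified in the loop below, so it stays non-const.
  float sum = 0.0f;
  for (const float value : values) {
    sum += value;
  }

  // One declaration per statement, qualifier on the left ("const float",
  // not "float const").
  const float inv = 1.0f / float(num);
  return sum * inv;
}
```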
Pull Request: https://projects.blender.org/blender/blender/pulls/132361
* Use .empty() and .data()
* Use nullptr instead of 0
* No else after return
* Simple class member initialization
* Add override for virtual methods
* Include C++ instead of C headers
* Remove some unused includes
* Use default constructors
* Always use braces
* Consistent names in definition and declaration
* Change typedef to using
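The following sketch shows what code looks like after these cleanups. The classes `OutputDriverStub` and `FileDriverStub` are hypothetical examples, not code from this change, and the comments name the older form each line replaces.

```cpp
#include <cassert>  // C++ header instead of <assert.h>.
#include <cstddef>  // C++ header instead of <stddef.h>.
#include <string>
#include <vector>

// `using` instead of `typedef std::vector<std::string> NameList;`.
using NameList = std::vector<std::string>;

class OutputDriverStub {
 public:
  OutputDriverStub() = default;  // Instead of an empty constructor body.
  virtual ~OutputDriverStub() = default;

  void add(const std::string &name)
  {
    names_.push_back(name);
  }

  virtual const std::string *first_name() const
  {
    if (names_.empty()) {  // Instead of `names_.size() == 0`, with braces.
      return nullptr;      // Instead of `return 0;`.
    }
    return names_.data();  // Instead of `&names_[0]`, and no else after return.
  }

  const std::string &name(const std::size_t index) const
  {
    assert(index < names_.size());
    return names_[index];
  }

 protected:
  NameList names_;
  int num_writes_ = 0;  // In-class initialization instead of a constructor list.
};

class FileDriverStub : public OutputDriverStub {
 public:
  // `override` makes the compiler verify that a virtual method is overridden.
  const std::string *first_name() const override
  {
    return OutputDriverStub::first_name();
  }
};
```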
Pull Request: https://projects.blender.org/blender/blender/pulls/132361
Simple local optimization: skip the rather expensive normal setup
(face and vertex) for Catmull-Clark subdivisions, which do not make use of
these normals and regenerate them internally.
Pull Request: https://projects.blender.org/blender/blender/pulls/132469
The slowdown was caused by the volume step calculation returning an
infinite value. This was caused by the calculation happening before
the object bounds are calculated via the code path which does some
early update for the displacement and hair transparency. The actual
value was never re-calculated after bounds are valid.
The solution is to only clear need-update after the final call of
the device_update_flags().
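A minimal sketch of the ordering problem and the fix. `VolumeObjectStub`, `device_update_flags_stub` and the step formula are hypothetical and much simpler than the real update code. The flag is only cleared on the final call, so a value computed from invalid bounds during the early update gets recomputed later.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Hypothetical stand-in for an object with a volume.
struct VolumeObjectStub {
  bool bounds_valid = false;
  float bounds_size = 0.0f;
  bool need_update = true;
  float step_size = std::numeric_limits<float>::infinity();
};

// Assumption for illustration: the step size is derived from the bounds and
// is infinite as long as the bounds are not known.
void device_update_flags_stub(VolumeObjectStub &object, const bool final_call)
{
  assert(object.bounds_size >= 0.0f);
  if (!object.need_update) {
    return;
  }

  object.step_size = object.bounds_valid ?
                         object.bounds_size * 0.1f :
                         std::numeric_limits<float>::infinity();

  // Clearing the flag in the early call would freeze the infinite value.
  if (final_call) {
    object.need_update = false;
  }
}

bool has_valid_step(const VolumeObjectStub &object)
{
  return std::isfinite(object.step_size);
}
```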
Pull Request: https://projects.blender.org/blender/blender/pulls/121042
Only the Embree CPU BVH was built in the multi-device case. However, one
Embree GPU BVH is needed per GPU, so we now reuse the same logic as in
the other backends.
Pull Request: https://projects.blender.org/blender/blender/pulls/107992
HIP RT enables AMD hardware ray tracing on RDNA2 and above, and falls back to a
shader implementation for older graphics cards. It offers an average 25%
sample rendering rate improvement in Cycles benchmarks, on a W6800 card.
The ray tracing feature functions are accessed through HIP RT SDK, available on
GPUOpen. HIP RT traversal functionality is pre-compiled in bitcode format and
shipped with the SDK.
This is not yet enabled as there are issues to be resolved, but landing the
code now makes testing and further changes easier.
Known limitations:
* Not working yet with current public AMD drivers.
* Visual artifact in motion blur.
* One of the buffers allocated for traversal has a static size. Allocating it
dynamically would reduce memory usage.
* This is for Windows only currently, no Linux support.
Co-authored-by: Brecht Van Lommel <brecht@blender.org>
Ref #105538
scene.cpp and geometry.cpp are large files that can be broken up into smaller, easier to handle files. This change has been broken out from #105403 to make understanding the changes easier.
geometry.cpp is broken up into:
1. geometry.cpp
2. geometry_attributes.cpp
3. geometry_bvh.cpp
4. geometry_mesh.cpp
scene.h & scene.cpp are broken up into:
1. scene.h
2. scene.cpp
3. devicescene.h
4. devicescene.cpp
Pull Request: https://projects.blender.org/blender/blender/pulls/107079
For example
```
OIIOOutputDriver::~OIIOOutputDriver()
{
}
```
becomes
```
OIIOOutputDriver::~OIIOOutputDriver() {}
```
Saves quite some vertical space, which is especially handy for
constructors.
Pull Request: https://projects.blender.org/blender/blender/pulls/105594
This improves mesh upload speeds and reduces the size of the scene data, which allows larger scenes to be rendered.
Meshes in Cycles are currently stored flattened, where each triangle is stored as a set of 3 vertices. Flattening writes out the vertices in a list according to the index buffer. This uses a lot of memory and does not provide a noticeable benefit on current hardware. This change unflattens the mesh by using the mesh's vertex and index buffers directly, skipping the flattening step. This allows for larger scenes and reduces the size of the meshes. It also decreases the time it takes to upload the data to a GPU, which is especially important when multiple GPUs are used in a single machine.
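As a rough illustration of the memory difference, the sketch below compares the two layouts. It assumes a 12-byte `Float3` and 32-bit indices, which may differ from the types actually used on the device, and the function names are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>

// Assumption for illustration: tightly packed 3 float vertex position.
struct Float3 {
  float x, y, z;
};
static_assert(sizeof(Float3) == 12, "Float3 is expected to be tightly packed");

// Flattened: every triangle stores its own copy of three vertices.
constexpr std::size_t flattened_bytes(const std::size_t num_triangles)
{
  return num_triangles * 3 * sizeof(Float3);
}

// Indexed: vertices are shared, each triangle stores three 32-bit indices.
constexpr std::size_t indexed_bytes(const std::size_t num_vertices,
                                    const std::size_t num_triangles)
{
  return num_vertices * sizeof(Float3) + num_triangles * 3 * sizeof(std::uint32_t);
}
```

For a grid of 100 x 100 quads (10201 vertices, 20000 triangles) this gives 720000 bytes flattened versus 362412 bytes indexed, roughly half.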
Pull Request #105173
The image manager, which is used to handle OSL textures on the GPU,
by default loads images after displacement is evaluated. This is a
problem when the displacement shader uses any textures, hence
why the geometry manager already makes the image manager
load any images used in the displacement shader graph early
(`GeometryManager::device_update_displacement_images`).
This only handled Cycles image nodes however, not OSL nodes, so
if any `texture` calls were made in OSL those would be missed and
therefore crash when accessed on the GPU. Unfortunately it is not
simple to determine which textures referenced by OSL are needed
for displacement, so the solution for now is to simply load all of
them early if true displacement is used.
This patch also fixes the result of the displacement shader not
being used properly in OptiX.
Maniphest Tasks: T104240
Differential Revision: https://developer.blender.org/D17162
The `MultiDevice` implementation of `get_cpu_osl_memory` returns a
nullptr when there is no CPU device in the mix. Accessing that pointer
crashed in `update_osl_globals`. That function only updates maps that are
not currently used on the GPU anyway, so it can simply be skipped when the
CPU is not used for rendering.
Maniphest Tasks: T104216
Materials now have an enum to set the emission sampling method, to be
either None, Auto, Front, Back or Front & Back. This replaces the
previous "Multiple Importance Sample" option.
Auto is the new default, and uses a heuristic to estimate the emitted
light intensity to determine if the mesh should be considered as a light
for sampling. Shaders sometimes have a bit of emission but treating them
as a light source is not worth the memory/performance overhead.
The Front/Back settings are not important yet, but will help when a
light tree is added. In that case setting emission to Front only on
closed meshes can help ignore emission from inside the mesh interior that
does not contribute anything.
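A minimal sketch of how such a setting could drive the decision. The enum values mirror the options listed above, but `sample_as_light`, the intensity estimate and the threshold parameter are hypothetical. The actual heuristic is not described here.

```cpp
#include <cassert>

// The five options of the material setting.
enum class EmissionSampling { None, Auto, Front, Back, FrontBack };

// Decide if an emissive mesh should be treated as a light for sampling.
// `estimated_intensity` and `threshold` are assumptions for illustration.
bool sample_as_light(const EmissionSampling method,
                     const float estimated_intensity,
                     const float threshold)
{
  assert(threshold >= 0.0f);
  switch (method) {
    case EmissionSampling::None:
      return false;
    case EmissionSampling::Auto:
      // Weak emission is not worth the memory and performance overhead.
      return estimated_intensity > threshold;
    case EmissionSampling::Front:
    case EmissionSampling::Back:
    case EmissionSampling::FrontBack:
      return true;
  }
  return false;
}
```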
Includes contributions by Brecht Van Lommel and Alaska.
Ref T77889
The SVM attribute map is always generated and uses a simple
linear search to lookup by an opaque ID, so can reuse that for OSL
as well and simply use the attribute name hash as ID instead of
generating a unique value separately. This works for both object
and geometry attributes since the SVM attribute map already
stores both. Simplifies code somewhat and reduces memory
usage slightly.
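The idea can be sketched as below. `AttributeMapEntry`, its `offset` field and the helper functions are hypothetical, and `std::hash` stands in for whatever name hash is used in practice. The map is searched linearly by an opaque ID, and that ID is simply the hash of the attribute name.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <string>
#include <vector>

// Hypothetical map entry: an opaque ID plus where the attribute data lives.
struct AttributeMapEntry {
  std::uint64_t id;
  int offset;
};

// Use the attribute name hash as ID instead of generating a unique value.
std::uint64_t attribute_id(const std::string &name)
{
  return std::hash<std::string>{}(name);
}

// Simple linear search by ID, returns -1 if the attribute is not found.
int find_attribute(const std::vector<AttributeMapEntry> &map, const std::uint64_t id)
{
  for (const AttributeMapEntry &entry : map) {
    if (entry.id == id) {
      return entry.offset;
    }
  }
  return -1;
}

void add_attribute(std::vector<AttributeMapEntry> &map,
                   const std::string &name,
                   const int offset)
{
  const std::uint64_t id = attribute_id(name);
  // Guard against duplicate names and hash collisions in this sketch.
  assert(find_attribute(map, id) == -1);
  map.push_back({id, offset});
}
```

Both lookup paths can then share one map, since a lookup by name only needs to hash the name first.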
This patch was split from D15902.
Differential Revision: https://developer.blender.org/D15918