All the relevant code is C++ now, so we no longer need to complicate things
with the trip through C. We will still need some wrappers, since
OpenSubdiv remains an optional dependency. The goal is to make it
simpler to remove the unnecessary/costly abstraction levels between
Blender mesh data and the OpenSubdiv code.
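As an illustration (not the actual code), the remaining wrappers can stay this thin once the C bridge is gone; `WITH_OPENSUBDIV` is the existing build option, while the function itself is hypothetical:

```cpp
/* Hypothetical wrapper: lets callers build whether or not the optional
 * OpenSubdiv dependency was enabled at configure time. */
bool blender_subdiv_is_available()
{
#ifdef WITH_OPENSUBDIV
  return true;
#else
  return false;
#endif
}
```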
This de-duplicates some passes in the raytracing
pipeline and makes it more ready for the adoption
of arbitrary closure evaluation. That last part
means removing some per-closure-type options.
This puts in common the tile classification step,
which is now done only once for all 3 closure
types. It also speeds up the tile compaction
phase, which is now about twice as fast.
The horizon-scan setup was also de-duplicated
and now runs only if needed, which can save up to
0.5ms in complex scenes.
However, this moves the max-roughness and
resolution scaling to a common parameter.
This is needed to support arbitrary closure
evaluation, where multiple closures with conflicting
parameters could be evaluated in one tracing pass.
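A minimal sketch of what such a common parameter could look like, assuming conflicting per-closure settings are merged conservatively; the struct and field names are illustrative, not the actual EEVEE code:

```cpp
#include <algorithm>

/* Illustrative only: settings shared by all closures traced in one pass. */
struct ClosureTraceParams {
  float max_roughness;
  float resolution_scale;
};

ClosureTraceParams merge(const ClosureTraceParams &a, const ClosureTraceParams &b)
{
  /* Keep the most demanding value of each setting so every merged closure
   * is traced with sufficient quality. */
  return {std::max(a.max_roughness, b.max_roughness),
          std::max(a.resolution_scale, b.resolution_scale)};
}
```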
Pull Request: https://projects.blender.org/blender/blender/pulls/116009
"mesh" reads much better than "me" since "me" is a different word.
There's no reason to avoid using two more characters here. Replacing
all of these at once is better than encountering it repeatedly and
doing the same change bit by bit.
Due to changes in the build environment, shader_builder wasn't able to
compile on macOS. This patch reverts several recent changes to the CMake files:
* dbb2844ed9
* 94817f64b9
* 1b6cd937ff
The idea is that in the near future shader_builder will run on the buildbot as
part of any regular build, to ensure that changes to the CMake files don't
break shader_builder in a way we only detect after a few days.
Pull Request: https://projects.blender.org/blender/blender/pulls/115929
This PR makes it so that strokes with locked or hidden materials do not show their edit points and edit lines.
Note: previously in grease pencil, strokes with hidden materials would still display the edit lines. This behavior is now fixed in GPv3.
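The visibility rule boils down to something like the following sketch (the flag names are stand-ins, not the actual GPv3 data layout):

```cpp
/* Hypothetical stand-in for the relevant material settings. */
struct MaterialFlags {
  bool hide;
  bool lock;
};

/* Strokes whose material is hidden or locked show no edit points/lines. */
bool show_edit_overlay(const MaterialFlags &material)
{
  return !material.hide && !material.lock;
}
```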
Pull Request: https://projects.blender.org/blender/blender/pulls/115740
This allows for parallel processing of refraction.
It also fixes a limitation of using the AO node in
refraction materials.
This is needed for the grouping of raytracing
passes.
This increases VRAM consumption a bit (8 MB for a full-HD
frame) but has no impact on performance.
This includes a needed fix to `draw::Texture::swap`.
### Later work
- Limit the memory overhead to the cases where it is needed.
Pull Request: https://projects.blender.org/blender/blender/pulls/115912
This adds a tile classification pass to the gbuffer.
The result is then compacted into streams of tiles,
one for each complexity level of lighting evaluation.
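Conceptually, the compaction works like the following CPU sketch (the real pass runs on the GPU, and the level names are illustrative):

```cpp
#include <array>
#include <vector>

enum class Complexity { Simple = 0, Full = 1, Subsurface = 2 };

/* Group tile indices into one contiguous stream per complexity level, so
 * each lighting shader variant only touches the tiles it needs. */
std::array<std::vector<int>, 3> compact_tiles(const std::vector<Complexity> &tile_complexity)
{
  std::array<std::vector<int>, 3> streams;
  for (int tile = 0; tile < int(tile_complexity.size()); tile++) {
    streams[int(tile_complexity[tile])].push_back(tile);
  }
  return streams;
}
```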
The benefit over the simpler approach of using a
per-object stencil value is that we get per-tile
granularity of the lighting complexity.
To avoid quad overshading, we use a prepass that
tags a different stencil value for each complexity
level. This makes it possible to still use a fullscreen
quad for the light evaluation pass while removing the
diagonal overshading cost.
This doesn't use compute shaders at all, in order to
leverage render pass merging and in-tile memory loads.
Using `atomicOr` to combine the `eClosureBits`
turned out to be too slow. Using multiple non-atomic
writes to many data values is faster and not much
more memory hungry.
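The idea, shown here as a CPU analogue rather than the actual shader code: each writer gets its own slot, and a single pass folds the slots together with plain ORs instead of contended atomic ones.

```cpp
#include <cstdint>
#include <vector>

/* Fold per-writer closure bits into one mask; every OR is non-atomic
 * because each slot had exactly one writer. */
uint32_t fold_closure_bits(const std::vector<uint32_t> &per_writer_bits)
{
  uint32_t result = 0;
  for (const uint32_t bits : per_writer_bits) {
    result |= bits;
  }
  return result;
}
```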
### Performance
The whole tile scheduling process takes ~70µs for
a half-covered 3800x790 framebuffer and doesn't
get much slower than this.
Using simpler lighting shaders reduces the cost
of the lighting pass by half in the most common cases.
SSS materials remain the most costly.
Pull Request: https://projects.blender.org/blender/blender/pulls/115820
This PR implements the Lookdev (HDRI) Spheres overlay for EEVEE-Next. There are
also improvements for lookdev:
* Scene lighting (direct and indirect) is applied to the spheres.
* Shadows are applied to the spheres.
This is done by virtually placing the balls at the near clip plane of the camera/viewport.

Pull Request: https://projects.blender.org/blender/blender/pulls/115465
NDEBUG is part of the C standard and disables asserts. Only this will
now be used to decide if asserts are enabled.
DEBUG was a Blender-specific define that has now been removed.
_DEBUG is a Visual Studio define for builds in the Debug configuration.
Blender defines this for all platforms. It is still used in a few
places in the draw code, and in the external libraries Bullet and Mantaflow.
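For reference, this is the standard behavior being relied on: `assert()` compiles to nothing when `NDEBUG` is defined.

```cpp
#include <cassert>

int divide(const int a, const int b)
{
  assert(b != 0); /* Checked in debug builds; a no-op with -DNDEBUG. */
  return a / b;
}
```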
Pull Request: https://projects.blender.org/blender/blender/pulls/115774
Also use const arguments, move a null check from the callback to the
PBVH function, and reorganize the PBVH code so it lives in a consistent
place in the file and the logic is simpler.
Move the contents of `ANIM_bone_collections.h` into its C++
`ANIM_bone_collections.hh` sibling. Blender is C++ enough by now that
we can do without the C header.
No functional changes.
Convert shrinkwrap data arrays to use C++ arrays and BitVector,
use references in "EditMeshData" code, and store both structs
with `std::unique_ptr` instead of a raw allocation.
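A rough sketch of the ownership change, with illustrative type and member names (Blender's own `Array` and `BitVector` containers stand where `std::vector` is used here):

```cpp
#include <memory>
#include <vector>

/* Illustrative only: sized C++ containers replace raw pointer + length
 * pairs, so no manual freeing is needed. */
struct ShrinkwrapDataLike {
  std::vector<int> vert_map;
  std::vector<bool> edge_is_boundary; /* Stand-in for BitVector. */
};

struct RuntimeLike {
  /* A unique_ptr replaces the raw allocation; destruction is automatic. */
  std::unique_ptr<ShrinkwrapDataLike> shrinkwrap_data;
};
```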
If I remember correctly, this was needed during development of
89e3ba4e25, but not anymore. Now the face normals are used
if faces are sharp. And if there is a mixture of sharp and smooth faces,
then normals_domain will be Corner anyway.
In a basic test with a 16 million face grid, this saved 30ms on every draw
extraction. It also saves about 12 bytes per corner that were previously
cached on the mesh.
There's a chance I'm missing something and this will require changes
elsewhere, but it's working well in my testing.
This is just a bit more ergonomic and works a bit better with our C++
math vector types. Also, during PBVH build, don't store the center of
each triangle/grid. That's redundant and can always be recalculated
from the bounds.
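The recalculation is trivial, as in this stand-alone sketch (using a plain struct in place of Blender's `float3`):

```cpp
struct Float3 {
  float x, y, z;
};

/* The center is fully determined by the bounds, so storing it per
 * triangle/grid during PBVH build is redundant. */
Float3 bounds_center(const Float3 &min, const Float3 &max)
{
  return {(min.x + max.x) * 0.5f, (min.y + max.y) * 0.5f, (min.z + max.z) * 0.5f};
}
```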
Rely on the depsgraph to detect scene updates,
using the `view_update` callback.
Remove `SceneHandle` and `scene_sync`.
Remove `reset_recalc_flag`.
Remove most `sampling.reset()` calls and their related logic,
and move the remaining ones to `Instance`.
Re-sync lights on `light_threshold` changes.
Pull Request: https://projects.blender.org/blender/blender/pulls/115758
This copied the material index and its smoothness to every grid,
resulting in 4 bytes of memory per base mesh face corner. That's
wasteful, since it's trivial to look up the original data from the base
mesh attributes as necessary. This way we can also avoid the
dereference.
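The lookup amounts to something like this sketch (map and attribute names are illustrative):

```cpp
#include <vector>

/* Read the material index on demand from the base mesh instead of storing
 * a copy on every grid. An empty attribute array means "all zero". */
int grid_material_index(const int grid_index,
                        const std::vector<int> &grid_to_face_map,
                        const std::vector<int> &material_indices)
{
  const int face = grid_to_face_map[grid_index];
  return material_indices.empty() ? 0 : material_indices[face];
}
```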
Automatic memory management and clearer ownership! Requires
removing `MEM_CXX_CLASS_ALLOC_FUNCS` from `MeshRuntime`,
but that's used very inconsistently anyway, and `MeshRuntime` isn't
that large.
Instead of allocating a separate bitmap per grid for the hide status, store
all the bits in a recently added C++ data structure that keeps them in one
contiguous memory chunk. When nothing is hidden, nothing is allocated
(that saves 32 MB for a 16 million vertex multires sculpt). Intuitively it
could have better performance because of the cache benefits of
contiguous memory, but this is hard to measure. It also has a nicer
API than `BLI_bitmap`.
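Conceptually the storage looks like this plain-C++ sketch (Blender's actual container is its own bit-vector type, and the layout details here are illustrative):

```cpp
#include <cstdint>
#include <vector>

struct GridHideBits {
  int bits_per_grid = 0;
  std::vector<uint64_t> words; /* Empty while nothing is hidden. */

  bool is_hidden(const int grid, const int elem) const
  {
    if (words.empty()) {
      return false; /* Nothing allocated means nothing is hidden. */
    }
    /* All grids share one contiguous buffer instead of one bitmap each. */
    const int64_t bit = int64_t(grid) * bits_per_grid + elem;
    return (words[bit >> 6] >> (bit & 63)) & 1;
  }
};
```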
I discussed this with Sergey in person recently. Most of the changes are
just straightforward refactors. The part that isn't is a change to the "show/hide"
operator to structure it similarly to the mesh handling in 4e66769ec0.
Pull Request: https://projects.blender.org/blender/blender/pulls/115687
This layout is more flexible and polymorphic.
While the worst case is worse (4 + 3 layers),
the common case is more optimized (2 + 2 layers).
The average amount of written closure data is also
lower, since we can compact the data for special
cases, which are quite frequent.
Some adjustments had to be made in the denoise and
tile classification shaders.
Pull Request: https://projects.blender.org/blender/blender/pulls/115541
Avoid reusing the custom data type enum with additional values. Instead
use std::variant and type names to properly distinguish between custom
and generic attribute requests. Use a Vector to hold the requests.
Also attempt to simplify the string-key building process for requests
and groups of requests in batches. Previously the key was rebuilt 3
times for every PBVH node; now it is built only once. It's hard to
measure, but that process did show up in profiles, so performance is
probably slightly improved when many nodes are handled at once.
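A sketch of what the request representation can look like with `std::variant` (the type names here are illustrative):

```cpp
#include <string>
#include <variant>
#include <vector>

/* Built-in data handled specially by the draw code. */
enum class CustomRequest { Position, Normal, Mask, FaceSet };

/* A generic attribute, identified by name. */
struct GenericRequest {
  std::string name;
};

/* No enum values are overloaded: the variant makes the two kinds of
 * request distinct types. */
using AttributeRequest = std::variant<CustomRequest, GenericRequest>;
using AttributeRequests = std::vector<AttributeRequest>;
```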