In some multi-functions (such as a simple add function), the virtual method
call overhead to access array elements adds significant overhead. For these
simple functions it makes sense to generate optimized versions for different
types of virtual arrays. This is done by giving the compiler all the information
it needs to devirtualize virtual arrays.
In my benchmark this speeds up processing a lot of data with small function 2-3x.
This devirtualization should not be done for larger functions, because it increases
compile time and binary size, while providing a negilible performance benefit.
When a function is executed for many elements (e.g. per point) it is often the case
that some parameters are different for every element and other parameters are
the same (there are some more less common cases). To simplify writing such
functions one can use a "virtual array". This is a data structure that has a value
for every index, but might not be stored as an actual array internally. Instead, it
might be just a single value or is computed on the fly. There are various tradeoffs
involved when using this data structure which are mentioned in `BLI_virtual_array.hh`.
It is called "virtual", because it uses inheritance and virtual methods.
Furthermore, there is a new virtual vector array data structure, which is an array
of vectors. Both these types have corresponding generic variants, which can be used
when the data type is not known at compile time. This is typically the case when
building a somewhat generic execution system. The function system used these virtual
data structures before, but now they are more versatile.
I've done this refactor in preparation for the attribute processor and other features of
geometry nodes. I moved the typed virtual arrays to blenlib, so that they can be used
independent of the function system.
One open question for me is whether all the generic data structures (and `CPPType`)
should be moved to blenlib as well. They are well isolated and don't really contain
any business logic. That can be done later if necessary.
Some generic algorithms from the standard library like `std::any_of`
did not work with all container and iterator types. To improve the
situation, this patch adds various type members to containers
and iterators.
Custom iterators for Set, Map and IndexRange now have an iterator
category, which soe algorithms require. IndexRange could become
a random access iterator, but adding all the missing methods can
be done when it is necessary.
This is simply a convenience when using this type. More similar
constructors can be added in the future when they are useful.
Differential Revision: https://developer.blender.org/D10714
This adds the LineArt grease pencil modifier.
It takes objects or collections as input and generates various grease
pencil lines from these objects with the help of the active scene
camera. For example it can generate contour lines, intersection lines
and crease lines to name a few.
This is really useful as artists can then use 3D meshes to automatically
generate grease pencil lines for characters, enviroments or other
visualization purposes.
These lines can then be baked and edited as regular grease pencil lines.
Reviewed By: Sebastian Parborg, Antonio Vazquez, Matias Mendiola
Differential Revision: http://developer.blender.org/D8758
No functional changes.
This function replaces some of the logic in
`DRW_select_buffer_find_nearest_to_point` that traverses a buffer in a
spiral way to search for a closer pixel (not the closest).
Differential Revision: https://developer.blender.org/D10548
This commit refactors the point distribute node to skip realizing
the instances created by the point instance node or the collection
and object info nodes. Realizing instances is not necessary here
because it copies all the mesh data and and interpolates all
attributes from the instances when this operation does not
need to modify the input geometry at all.
In the tree leaves test file this patch improves the performance of
the node by about 14%. That's not very much, the gain is likely larger
for more complicated input instances with more attributes (especially
attributes on different domains, where interpolation would be necessary
to join all of the instances). Another possible performance improvement
would be to parallelize the code in this node where possible.
The point distribution code unfortunately gets quite a bit more
complicated because it has to handle the complexity of having many
inputs instead of just one.
Note that this commit changes the randomness of the distribution
in some cases, as if the seed input had changed.
Differential Revision: https://developer.blender.org/D10596
This reverts commit 7a34bd7c28.
Broke windows build. Can apparently fix with /Zc:preprocessor flag
for windows but need a Windows dev to make that fix.
The commit rB6f63417b500d that made exact boolean work on meshes
with holes (like Suzanne) unfortunately dramatically slowed things
down on other non-manifold meshes that don't have holes and didn't
need the per-triangle insideness test.
This adds a hole_tolerant parameter, false by default, that the user
can enable to get good results on non-manifold meshes with holes.
Using false for this parameter speeds up the time from 90 seconds
to 10 seconds on an example with 1.2M triangles.
The Exact boolean used in the cell fracture addon incorrectly
kept some outside faces: due to some raycasts going into open
eye socket then out of the head, leading to one ray direction
(out of 8) saying the face was inside the head. The current
code allowed 1 of 8 rays only as "inside" to accommodate the
case of a plane used in boolean to bisect. But this cell fracture
case needs more confidence of being inside. So changed the
test for intersection to require at least 3 of 8 rays to be inside.
Maybe the number of rays to indicate insideness should be exposed
as an option, to allow user tuning according to the degree of
"non-volumeness" of the arguments, but will try at least for now
to magically guess the right value of the rays-inside threshold.
Note: all of this only for the case where the arguments are not
all PWN (approx: manifold). The all-PWN case doesn't use raycast.
Instead of returning a raw pointer, `LinearAllocator.construct(...)` now returns
a `destruct_ptr`, which is similar to `unique_ptr`, but does not deallocate
the memory and only calls the destructor instead.
The main change is that large allocations are done separately now.
Also, buffers that small allocations are packed into, have a maximum
size now. Using larger buffers does not really provider performance
benefits, but increases wasted memory.
Triangulating ngons could fail with the method that was being
used: projecting along the dominant normal axis and then using CDT.
It could fail if the ngon has self crossings or might be so after
the described projection.
Switched to using projection along the normal itself, and also to
using polyfill which produces some kind of triangulation no matter
what in such circumstances. This will also likely be faster if
there are a lot of ngons in the meshes, since the exact arithmetic
CDT was being used before, and now float arithmetic is used.
This was caused by the window_translate_m4 not offsetting the winmat in the
right direction for perspective view. Thus leading to incorrect weights.
The workbench sample weight computation was also inverted.
This fix will change the sampling pattern for EEVEE too (it will just
mirror it in perspective view).
Using `FunctionRef` is better than using `std::function`, templates and c function
pointers in some cases. The trade offs are explained in more detail in code documentation.
The following are some of the main benefits of using `FunctionRef`:
* It is convenient to use with all kinds of callables.
* It is cheaper to construct, copy and (possibly) call compared to `std::function`.
* Functions taking a `FunctionRef` as parameter don't need to be declared
in header files (as is necessary when using templates usually).
Differential Revision: https://developer.blender.org/D10476
The Exact modifier code had been written to avoid using BMesh but
in the initial release the modifier still converted all Meshes to
BMeshes, and then after running the boolean code on the BMeshes,
converted the result back to a Mesh.
This change skips that. Most of the work here is in getting the
Custom Data layers right. The approach taken is to merge default
layers from all operand meshes into the final result, and then
use the original verts, edges, polys, and loops to copy or interpolate
the appropriate custom data layers from all operands into the result.
Previously, methods like `Span.drop_front` would crash when more
elements would be dropped than are available. While this is most
efficient, it is not very practical in some use cases. Also other languages
silently clamp the index, so one can easily write wrong code accidentally.
Now, `Span.drop_front` and similar methods will only crash when n
is negative. Too large values will be clamped down to their maximum
possible value. While this is slightly less efficient, I did not have a case
where this actually mattered yet. If it does matter in the future, we can
add a separate `*_unchecked` method.
This should not change the behavior of existing code.
* WITH_CPU_SSE was renamed to WITH_CPU_SIMD, and now covers both SSE and Neon.
* For macOS sse2neon.h is included as part of the precompiled libraries.
* For Linux it is enabled if the sse2neon.h header file is detected. However
this library does not have official releases and is not shipped with any Linux
distribution, so manual installation and configuration is required to get this
working.
Ref D8237, T78710