The only user was the Python API. Convert that to use the C++ API.
That simplifies things a bit even, since the encoding of "arrays of arrays"
is a fair amount simpler with the C++ data structures. The motivation
is to simplify the changes from #111061.
IMB_transform is used by Sequencer (and other places) to do image
translation/rotation/scale on the CPU. This PR speeds up parts of it,
particularly when bilinear filtering is used. No behavior changes are
expected.
- Don't use virtual function calls inside the inner loop. The code was using
class hierarchies with virtual calls just to make the equivalent of "outside
of image? ignore" and "wrap UV coordinates or not?" decisions. Make those
use non-virtual, function-based code.
- Simplify pixel sampling functions to only do the work as needed by
anything within Blender codebase. For example, bilinear sampling of uchar
images always uses 4 RGBA channels and never does "UV wrap" logic.
- Bilinear interpolation uchar: completely branchless SIMD code now.
- Bilinear interpolation float: 2x floor() calls instead of 4x floor() +
2x ceil(), and final sample blending is done with SIMD.
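As an illustration of the "2x floor()" point above, here is a minimal scalar sketch of bilinear sampling on a single-channel float image. It is not the code from this PR: the `FloatImage` struct and `sample_bilinear` name are hypothetical, the clamp-to-edge border handling is an assumption, and the SIMD blending step is left out.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical single-channel float image, row-major storage.
struct FloatImage {
  int width;
  int height;
  std::vector<float> pixels;
};

// Bilinear sample at (u, v) in pixel coordinates, where integer coordinates
// land exactly on a pixel. Only two floor() calls are needed: the second
// sample column/row is simply "first + 1".
float sample_bilinear(const FloatImage &img, float u, float v)
{
  assert(img.width > 0 && img.height > 0);
  const float uf = std::floor(u);
  const float vf = std::floor(v);
  const float fu = u - uf; // Horizontal blend factor in [0, 1).
  const float fv = v - vf; // Vertical blend factor in [0, 1).

  // Assumption: clamp to the image edge instead of ignoring outside samples.
  const int x0 = std::min(std::max(int(uf), 0), img.width - 1);
  const int y0 = std::min(std::max(int(vf), 0), img.height - 1);
  const int x1 = std::min(std::max(int(uf) + 1, 0), img.width - 1);
  const int y1 = std::min(std::max(int(vf) + 1, 0), img.height - 1);

  const float p00 = img.pixels[y0 * img.width + x0];
  const float p10 = img.pixels[y0 * img.width + x1];
  const float p01 = img.pixels[y1 * img.width + x0];
  const float p11 = img.pixels[y1 * img.width + x1];

  // Blend horizontally on both rows, then vertically between the rows.
  const float top = (1.0f - fu) * p00 + fu * p10;
  const float bottom = (1.0f - fu) * p01 + fu * p11;
  return (1.0f - fv) * top + fv * bottom;
}
```

For a 2x2 image holding 0, 1, 2, 3, sampling at (0.5, 0.5) blends all four pixels equally.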
Sequencer at 4K UHD resolution, with two image strips that need a transform,
playback framerate:
- Windows Ryzen 5950X: 18.7fps -> 26.2fps (IMB_transform time per frame goes
26.3ms -> 11.2ms)
- Mac M1 Max: 27.3fps -> 31.4fps
At that point IMB_transform is no longer the slowest part of playback
(sequencer effect application etc. takes more time).
Note: the amount of _actual code_ got a bit smaller. But I've added 100 lines
of unit tests in BLI_math_interp_test.cc, the bilinear interpolation
functions were only tested very indirectly by CPU compositor template
image tests.
Pull Request: https://projects.blender.org/blender/blender/pulls/115653
The logic can be much simpler when curves are selected rather than points,
because then we just copy all of the points in each curve. Like some other
operators, implement both cases.
This PR makes it so that locked materials as well as hidden materials will not have their edit points and edit lines visible.
Note: Previously in grease pencil, strokes with hidden materials would still display the edit lines. This behavior is now fixed in GPv3.
Pull Request: https://projects.blender.org/blender/blender/pulls/115740
The preprocessor checks around `renameat2` usage seem to confuse Clang
15 on FreeBSD at least, so split them in two.
Caused by 050d48edfc, report and patch by Shane Ambler, thanks!
NDEBUG is part of the C standard and disables asserts. Only this will
now be used to decide if asserts are enabled.
DEBUG was a Blender specific define, that has now been removed.
_DEBUG is a Visual Studio define for builds in Debug configuration.
Blender defines this for all platforms. This is still used in a few
places in the draw code, and in external libraries Bullet and Mantaflow.
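A small sketch of what "only NDEBUG decides" means in practice. The helper names `asserts_enabled` and `checked_index` are made up for illustration and do not exist in Blender.

```cpp
#include <cassert>

// True when the standard assert() macro is active, i.e. NDEBUG is not defined.
constexpr bool asserts_enabled()
{
#ifdef NDEBUG
  return false;
#else
  return true;
#endif
}

// The range check compiles to nothing when NDEBUG is defined.
int checked_index(int index, int size)
{
  assert(index >= 0 && index < size);
  return index;
}
```

Code that previously tested `DEBUG` to guard assert-only logic would test `#ifndef NDEBUG` instead.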
Pull Request: https://projects.blender.org/blender/blender/pulls/115774
Suppress false positive Valgrind warnings which flooded the output.
- BLI_mempool alloc/free & iteration.
- Set alignment padding bytes at the end of MEM_* allocations
as "defined" since this causes many false positive warnings
in blend file writing and MEMFILE comparisons.
- Set MEM_* allocations as undefined when `--debug-memory`
is passed in to account for debug initialization.
- Initialize pad bytes in TextLine allocations.
This commit aims at making the behaviors of `BLI_rename` and
`BLI_rename_overwrite` more consistent and coherent across all
supported platforms.
* `BLI_rename` now only succeeds in case the target `to` path does not
exist (similar to Windows `rename` behavior).
* `BLI_rename_overwrite` allows replacing an existing target `to` file
or (empty) directory (similar to Unix `rename` behavior).
NOTE: In case the target is open by some process on the system, trying
to overwrite it will still fail on Windows, while it should succeed on
Unix-like systems.
The main change for Windows is the usage of `MoveFileExW`
instead of `_wrename`, which allows for 'native support' of file
overwrite (using the `MOVEFILE_REPLACE_EXISTING` flag). Directories
still need to be explicitly removed though.
The main change for *nix systems is the use of `renamex_np` (OSX) or
`renameat2` (most Linux systems) to allow forbidding renaming to an
already existing target in an 'atomic' way.
NOTE: While this commit aims at avoiding the TOC/TOU problem as
much as possible by using available system's primitives for most
common cases, there are some situations where race conditions
(filesystem changes between checks on FS state, and actual rename
operation) remain possible.
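The intended semantics of the two functions can be sketched with `std::filesystem`. This is not the Blender implementation: the function names are hypothetical, and the check-then-rename sequence below is exactly the kind of non-atomic code that has the race described above. The real code relies on `MoveFileExW`, `renamex_np` or `renameat2` to avoid it where possible.

```cpp
#include <cassert>
#include <filesystem>
#include <system_error>

namespace fs = std::filesystem;

// Succeeds only if `to` does not exist yet. NOT atomic: another process can
// create `to` between the existence check and the rename.
bool rename_no_overwrite(const fs::path &from, const fs::path &to)
{
  assert(!from.empty() && !to.empty());
  std::error_code ignored;
  if (fs::exists(fs::symlink_status(to, ignored))) {
    return false;
  }
  std::error_code ec;
  fs::rename(from, to, ec);
  return !ec;
}

// Replaces an existing file or empty directory at `to`.
bool rename_overwrite(const fs::path &from, const fs::path &to)
{
  assert(!from.empty() && !to.empty());
  std::error_code ignored;
  if (fs::is_directory(fs::symlink_status(to, ignored))) {
    // fs::remove() fails on non-empty directories, which is the wanted behavior.
    std::error_code remove_ec;
    if (!fs::remove(to, remove_ec) || remove_ec) {
      return false;
    }
  }
  std::error_code ec;
  fs::rename(from, to, ec);
  return !ec;
}
```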
Pull Request: https://projects.blender.org/blender/blender/pulls/115096
Use blender::Set which is similar but offers better type safety
and likely better performance as well. The only remaining user
was the mesh edit mode knife tool, and replacing that usage
with `Set` and `Map` was straightforward.
This utility was already duplicated in two places and planned to be used
more. While we should usually avoid creating arrays the size of the
indexed array (rather than the size of the mask), sometimes it does seem
to be the best option, and we're helped by the fact that most memory
stays uninitialized for a small mask (allocating but not writing to memory
pages at all generally isn't too expensive).
Pull Request: https://projects.blender.org/blender/blender/pulls/115491
Work around a potential C-API `FILE` incompatibility by reading the
file data into memory, then compiling & running it, matching the existing
logic for text buffers. This replaces the in-lined stub-script
that re-opened the file from Python.
While the down-side of the stub-script was minor, it required some
non-obvious logic and had the disadvantage of requiring 2x scripts to
execute whenever a file was executed on WIN32.
Expose BLI_file_read_data_as_mem_from_handle as a public function
since it's useful to be able to read an existing FILE into memory.
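Reading an already-open `FILE` into memory can be sketched as below. This is a generic illustration, not the body of `BLI_file_read_data_as_mem_from_handle`; the function name and the `std::vector` return type are assumptions.

```cpp
#include <cassert>
#include <cstdio>
#include <vector>

// Read everything from the current position of `fp` to end-of-file.
// The caller keeps ownership of the handle and closes it.
std::vector<unsigned char> read_handle_to_memory(std::FILE *fp)
{
  assert(fp != nullptr);
  std::vector<unsigned char> data;
  unsigned char chunk[4096];
  std::size_t bytes_read;
  while ((bytes_read = std::fread(chunk, 1, sizeof(chunk), fp)) > 0) {
    data.insert(data.end(), chunk, chunk + bytes_read);
  }
  return data;
}
```

The resulting buffer can then be handed to whatever compiles the script, so the `FILE` pointer itself never crosses the C runtime boundary.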
This adds a new function, `compare_meshes`,
as a replacement for `BKE_mesh_cmp`.
The main benefits of the new version are the following:
- The code is written in C++, and makes use of the new attributes API.
- It adds an additional check, to see if the meshes only differ by
their indices. This is useful to verify correctness of new algorithmic
changes in mesh code, which might produce mesh elements in a different
order than the original algorithm. The tests will still fail, but the
error will show that the indices changed.
Some downsides:
- The code is more complex, due to having to be index-independent.
- The code is probably slower due to having to do comparisons "index-
independently". I have not tested this, as correctness was my priority
for this patch. A future update could look to improve the speed,
if that is desired.
- This is technically a breaking API change, since it changes the
returned values of `rna_Mesh_unit_test_compare`. I don't think that
there are many people (if any) using this, besides our own unit tests.
All tests that pass with `BKE_mesh_cmp` still pass with the new version.
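To illustrate the "differ only by indices" check, here is a heavily simplified sketch that looks at vertex positions only. It is not the algorithm from this patch (which also has to handle edges, faces and attributes, see the write-up linked below); the `Match` enum, the `compare_positions` name and the exact float comparison are assumptions made for the example.

```cpp
#include <algorithm>
#include <array>
#include <vector>

using Position = std::array<float, 3>;

enum class Match {
  Equal,         // Same positions in the same order.
  IndicesDiffer, // Same positions, but stored in a different order.
  Different,     // Not the same set of positions.
};

// Arguments are taken by value on purpose, since they get sorted.
Match compare_positions(std::vector<Position> a, std::vector<Position> b)
{
  if (a.size() != b.size()) {
    return Match::Different;
  }
  if (a == b) {
    return Match::Equal;
  }
  // Sorting removes the dependency on element order. std::array compares
  // lexicographically, so the default ordering is enough.
  std::sort(a.begin(), a.end());
  std::sort(b.begin(), b.end());
  return (a == b) ? Match::IndicesDiffer : Match::Different;
}
```

A test that gets `IndicesDiffer` still fails, but the message tells the developer that only the element order changed.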
**NOTE:**
Currently, mesh edge indices are allowed to be different in the
comparison, because `BKE_mesh_cmp` also allowed this. There are some
tests which would fail otherwise. These tests should be updated, and
then the corresponding code as well.
I wrote up a more detailed explanation of the algorithm here:
https://hackmd.io/@bo-JY945TOmvepQ1tAWy6w/SyuaFtay6
Pull Request: https://projects.blender.org/blender/blender/pulls/112794
Especially on Windows, direct output to `cout` via `<<` is very expensive.
Instead, use fmtlib to do all formatting into a no-alloc `fmt::memory_buffer`,
and output that with one call to `cout`.
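The idea of "format first, then write once" can be sketched without fmtlib. The example below swaps in `std::snprintf` with a stack buffer in place of `fmt::memory_buffer`, so it is not what the commit does; the function names and the message layout are made up.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <iostream>

// Format one timer line into a caller-provided buffer, without heap
// allocation. Returns the number of characters written (excluding the
// terminating zero); overly long lines are truncated.
std::size_t format_timer_line(char *buffer, std::size_t size, const char *name, double ms)
{
  assert(buffer != nullptr && size > 0);
  const int length = std::snprintf(buffer, size, "Timer '%s' took %.2f ms\n", name, ms);
  if (length < 0) {
    return 0;
  }
  return std::min(std::size_t(length), size - 1);
}

// A single write() call instead of a chain of `<<` operators.
void print_timer_line(const char *name, double ms)
{
  char buffer[128];
  const std::size_t length = format_timer_line(buffer, sizeof(buffer), name, ms);
  std::cout.write(buffer, std::streamsize(length));
}
```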
timeit utilities are not used much by default, but during development or
profiling one often uncomments macros like `DEBUG_TIME` that then enable
`SCOPED_TIMER` or `SCOPED_TIMER_AVERAGED`.
Having one `SCOPED_TIMER_AVERAGED` inside sequencer `draw_channels`, with
empty timeline and all default channels; the overhead in % of `draw_channels`
duration of said scoped timer before and after this change:
- Windows: 29% -> 5%
- Mac: 5.0% -> 4.4%
Pull Request: https://projects.blender.org/blender/blender/pulls/115233
This patch merges the Musgrave and Noise Texture nodes into a single
combined Noise Texture node. The reasoning is that both nodes
intrinsically do the same thing, which is the layering of Perlin noise
derivatives to produce fractal noise. So the patch de-duplicates code
and unifies the use of fractal noise for the end use.
Since the Noise node had a Distortion input and a Color output, while
the Musgrave node did not, those are now available to the Musgrave types
as new functionalities.
The Dimension input of the Musgrave node is analogous to the Roughness
input of the Noise node, so both inputs were unified to follow the same
behavior of the Roughness input, which is arguably more intuitive to
control. Similarly, the Detail input was slightly different across both
nodes, since the Noise node evaluated one extra layer of noise. This was
also unified to follow the behavior of the Noise node.
The patch coincidentally fixes an unreported bug causing repeated
output for certain noise types and another floating precision bug
#112180.
The versioning code implemented with this patch ensures backward
compatibility for both the Musgrave and Noise Texture nodes. When
opening older Blender files in Blender 4.1, the output of both nodes is
guaranteed to always be exactly identical to that of Blender files
created before the nodes were merged in all cases.
Forward compatibility with Blender 4.0 is implemented by #114236.
Forward compatibility with Blender 3.6 LTS is implemented by #115015.
Pull Request: #111187
When using `SCOPED_TIMER` or `SCOPED_TIMER_AVERAGED`
the display would switch from ns to ms
once the value is over 0.1 ms with a precision of 1.
So when the timer value is hovering in the range of 0.1 - 0.2 ms it is
not giving any useful information.
Fix this by adding another digit to the precision of ms.
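The effect of the extra digit can be shown with a small sketch. The `format_duration` function is hypothetical and only models the ns/ms switch described above, not the actual timeit code.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Durations below 0.1 ms are shown in ns, everything else in ms with the
// given number of decimal digits.
std::string format_duration(long long nanoseconds, int ms_precision)
{
  assert(nanoseconds >= 0 && ms_precision >= 0);
  char buffer[64];
  if (nanoseconds < 100000) {
    std::snprintf(buffer, sizeof(buffer), "%lld ns", nanoseconds);
  }
  else {
    const double ms = double(nanoseconds) / 1e6;
    std::snprintf(buffer, sizeof(buffer), "%.*f ms", ms_precision, ms);
  }
  return buffer;
}
```

With one digit, both 120000 ns and 140000 ns are displayed as `0.1 ms`; with two digits they show up as `0.12 ms` and `0.14 ms`.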
Pull Request: https://projects.blender.org/blender/blender/pulls/114724
The problem is in the way of identifying "fast" intersections through bounds.
In the existing code, before testing the intersections (to identify
holes) the polys are sorted according to the bounds
(in the order x1 < x2 || y1 < y2).
Then a for loop is used on the order returned by sort.
Each time the bound of a polygon intersects with another, it is joined
and the bound is added.
The problem with this solution is that some bounds may not intersect
with the first, but could intersect one that is joined to the first,
which, as it is cleared, makes the intersection undetected.
The solution is to remove this code with `qsort` and create a
"target_map" that identifies a source polygon and a dest polygon.
Pull Request: https://projects.blender.org/blender/blender/pulls/114600
Cleanup talked about in the previous semi-related PR, #114501
- saacos, saasin, sasqrt have been 100% identical to saacosf,
saasinf, sasqrtf since 2012.
- For all the above, there exist more intuitively named safe_acosf,
safe_asinf, safe_sqrtf that do the same thing, so switch all code to those.
Pull Request: https://projects.blender.org/blender/blender/pulls/114593
Function Module Inclusive Time Exclusive Time
--------------------------------------------------------------------------
mesh_render_data_update_normals blender 297.51 0.00
315 -> 297
acos() usage in all places related to normal calculations shows up in the
profiler. Given that "angle between faces" is only an additional heuristic
weight in there (its effect is very subtle), an approximate but
faster version of acos() might be just fine. Especially since some other
parts of Blender (e.g. mikktspace) use an approximate acos in conceptually
the same place.
- Adds safe_acos_approx() to BLI_math_base.hh. The implementation is the same
as the one that already exists in Cycles; max error 0.00258 degrees. Between
2x and 4x faster in my tests.
- Changes all normals related calculations to use the function above instead
of saacos.
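For reference, a sketch of such an approximation. The polynomial coefficients below are reproduced from memory of the fast acos found in Cycles / OpenImageIO and should be treated as an assumption, not as a copy of the code added to BLI_math_base.hh.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Approximate acos() that is also "safe": inputs outside [-1, 1] are clamped
// instead of producing NaN. Evaluates sqrt(1 - |x|) times a cubic polynomial.
float safe_acos_approx(float x)
{
  const float pi = 3.14159265358979f;
  const float m = std::min(std::fabs(x), 1.0f); // Clamped absolute value.
  const float a = std::sqrt(1.0f - m) *
                  (1.5707963267f +
                   m * (-0.213300989f + m * (0.077980478f + m * -0.02164095f)));
  // acos(-x) == pi - acos(x).
  return x < 0.0f ? pi - a : a;
}
```

For example, `safe_acos_approx(0.5f)` is within about 2e-5 radians of `std::acos(0.5f)`, and `safe_acos_approx(2.0f)` returns 0 instead of NaN.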
Computing normals on a Stanford Lucy (14m verts) mesh:
- Mac (arm64, M1 Max): 247ms -> 229ms
- Win (x64, Ryzen 5950X): 276ms -> 250ms
All places that are about "normal calculation" were changed, including e.g.
Corrective Smooth modifier. Applying that one to the same 14m vertices mesh,
Mac M1 Max: 9.96s -> 9.76s
Tiny changes in several test output expectations w.r.t. normals are
observed; these were reviewed and the updated expectations checked in to svn.
Pull Request: https://projects.blender.org/blender/blender/pulls/114501