- enum class StripEarlyOut instead of raw integer defines
- SeqRenderState default initializer instead of seq_render_state_init
- Vector<Sequence*> instead of manually sized arrays of pointers
- some const to several function arguments
Pull Request: https://projects.blender.org/blender/blender/pulls/117829
Part of overall "improve image filtering situation" (#116980), this PR addresses
two issues:
- Bilinear (default) image filtering makes half a source pixel wide transparent
border around the image. This is very noticeable when scaling images/movies up
in VSE. However, when there is no scaling up but you have slightly rotated
image, this creates a "somewhat nice" anti-aliasing around the edge.
- The other filtering kinds (e.g. cubic) do not have this behavior. So they do
not create unexpected transparency when scaling up (yay), however for slightly
rotated images the edge is "jagged" (oh no).
More detail and images in PR.
Pull Request: https://projects.blender.org/blender/blender/pulls/117717
There exist a bunch of "give me a (filtered) image pixel at this location"
functions, some with duplicated functionality, some with almost the same but
not quite, some that look similar but behave slightly differently, etc.
Some of them were in BLI, some were in ImBuf.
This commit tries to improve the situation by:
* Adding low level interpolation functions to `BLI_math_interp.hh`
- With documentation on their behavior,
- And with more unit tests.
* At `ImBuf` level, there are only convenience inline wrappers to the above BLI
functions (split off into a separate header `IMB_interp.hh`). However, since
these wrappers are inline, some things get a tiny bit faster as a side
effect. E.g. VSE image strip, scaling to 4K resolution (Windows/Ryzen5950X):
- Nearest filter: 2.33 -> 1.94ms
- Bilinear filter: 5.83 -> 5.69ms
- Subsampled3x3 filter: 28.6 -> 22.4ms
Details on the functions:
- All of them have `_byte` and `_fl` suffixes.
- They exist in 4-channel byte (uchar4) and float (float4), as well as
explicitly passed amount of channels for other float images.
- New functions in BLI `blender::math` namespace:
- `interpolate_nearest`
- `interpolate_bilinear`
- `interpolate_bilinear_wrap`. Note that unlike previous "wrap" function,
this one no longer requires the caller to do their own wrapping.
- `interpolate_cubic_bspline`. Previous similar function was called just
"bicubic" which could mean many different things.
- Same functions exist in `IMB_interp.hh`, they are just convenience that takes
ImBuf and uses data pointer, width, height from that.
Other bits:
- Renamed `mod_f_positive` to `floored_fmod` (better matches `safe_floored_modf`
and `floored_modulo` that exist elsewhere), made it branchless and added more
unit tests.
- `interpolate_bilinear_wrap_fl` no longer clamps result to 0..1 range. Instead,
moved the clamp to be outside of the call in `paint_image_proj.cc` and
`paint_utils.cc`. Though the need for clamping in there is also questionable.
Pull Request: https://projects.blender.org/blender/blender/pulls/117387
Before we always did a float conversion of the colors even if we
wouldn't use the result.
Because the float conversion does alpha premultiplication, we can
use some shortcuts to skip a lot of computations
In a edit with a lot of fully transparent pixels, I observed a reduction
in total render time from 1m 31s to 1m 17s.
Pull Request: https://projects.blender.org/blender/blender/pulls/117134
It's pretty simple, but threading it, and making it write out whole
pixel at a time (instead of one byte at a time) still makes it faster.
4K resolution, five Color strips blended over each other, playback on
Windows/VS2022, Ryzen 5950X:
- Playback 9.2FPS -> 11.5FPS
- do_solid_color for one effect, median time 7.7ms -> 3.8ms
Additionally, the solid color on byte output was not doing float->byte
color rounding & clamping properly, and on float output it was writing
255.0 into alpha instead of 1.0. So fix that too.
Pull Request: https://projects.blender.org/blender/blender/pulls/117058
Now that the code is in C++, quite some duplication between "byte" and
"float" effect code paths can be reduced (easier than it was in C times).
So I did that, removing about 400 lines of code.
In that process I accidentally made Gaussian Blur faster, since while
reducing the amount of code I noticed it was doing some things
sub-optimally (calculated kernel tables for each job, etc.). Applying
100x100 gaussian blur on 4K UHD resolution image strip on Ryzen 5950X
went 630ms -> 450ms.
Pull Request: https://projects.blender.org/blender/blender/pulls/116089
Wipe effect was completely single threaded. So multi-thread that.
Additionally, simplify math done in Clock wipe code; asin() + hypot()
+ some checks is just atan2() really.
Applying Wipe to 4K UHD sequencer output, on Windows Ryzen 5950X:
- Single/Double: 99.1 -> 9.3 ms (10.6x faster)
- Iris: 153.3 -> 12.3 ms (12.4x faster)
- Clock: 301.4 -> 14.5 ms (20.8x faster)
The same on Mac M1 Max:
- Single: 74.5 -> 13.4 ms (5.6x faster)
- Iris: 84.2 -> 14.3 ms (5.9x faster)
- Clock: 185.3 -> 18.8 ms (9.8x faster)
Pull Request: https://projects.blender.org/blender/blender/pulls/115837
Glow effect was doing the correct thing algorithmically (separable gaussian
blur), but it was 1) completely single-threaded, and 2) did operations in
several passes over the source images, instead of doing them in one go.
- Adds multi-threading to Glow effect.
- Combines some operations, e.g. instead of IMB_buffer_float_from_byte
followed by IMB_buffer_float_premultiply, do
IMB_colormanagement_transform_from_byte_threaded which achieves the same,
but more efficiently.
- Simplifies the code: removing separate loops around image boundaries is
both less code and slightly faster; use float4 vector type for more
compact code; use Array classes instead of manual memory allocation, etc.
- Removes IMB_buffer_float_unpremultiply and IMB_buffer_float_premultiply
since they are no longer used by anything whatsoever.
Applying Glow to 4K UHD sequencer output, on Windows Ryzen 5950X:
- Blur distance 4: 935ms -> 109ms (8.5x faster)
- Blur distance 20: 3526ms -> 336ms (10.5x faster)
Same on Mac M1 Max:
- Blur distance 4: 732ms -> 126ms (5.8x faster)
- Blur distance 20: 3047ms -> 528ms (5.7x faster)
Pull Request: https://projects.blender.org/blender/blender/pulls/115818
Gamma Cross code seems to be coming from year 2005 or earlier, with complex
table based machinery to approximate "raise to power" calculations. Which,
for Gamma Cross, have always been hardcoded to 2.0 "since forever". So
simplify all that, i.e. replace all the table lookup/interpolation things
with just `x*x` and `sqrt(x)`.
Applying Gamma Cross on 4K UHD resolution, Windows Ryzen 5950X machine:
36.2ms -> 8.1ms
Pull Request: https://projects.blender.org/blender/blender/pulls/115801
BLF code is not threadsafe, yet font loading gets called over and over
by text strips when the font file is missing, including e.g. from
depsgraph evaluation code when duplicating the strip for evaluation.
WARNING: This is a quick fix for deblocking the Blender studio, proper
fix (and report) still needs to be worked on.
Conversion from timeline frame to frame index was done by casting to
integer, which followed logic of ffmpeg seeking. However this is not
best approach in some cases - for example when FPS of scene and movie
differs by a very small amount. In this case the first frame could be
duplicated and all other frames will appear as offset by one frame.
In particular this may happen scene is set to 29.97 fps and movie is
encoded at 30000/1001 fps. A frame will still have to be duplicated, but
it should be frame where decimal of frame index crosses 0.5 to keep
audio and video in sync as best as possible.
Pull Request: https://projects.blender.org/blender/blender/pulls/113870
With such small proxy sizes (combined with a small blur radius), the
kernels `halfWidth` can get zero, which leads to a memory allocation of
same zero size and writing to that memory leads to overflow/crashes/can
only go downhill from there.
Now early out in such cases [which leads to
slightly different output -- well if the "buggy" output survives and
does not crash that is].
(alternatively we could just prevent the overflow and still let do
`RVBlurBitmap2_float` do stuff that it really shouldnt imo, see first version of the PR)
Pull Request: https://projects.blender.org/blender/blender/pulls/111660
Listing the "Blender Foundation" as copyright holder implied the Blender
Foundation holds copyright to files which may include work from many
developers.
While keeping copyright on headers makes sense for isolated libraries,
Blender's own code may be refactored or moved between files in a way
that makes the per file copyright holders less meaningful.
Copyright references to the "Blender Foundation" have been replaced with
"Blender Authors", with the exception of `./extern/` since these this
contains libraries which are more isolated, any changed to license
headers there can be handled on a case-by-case basis.
Some directories in `./intern/` have also been excluded:
- `./intern/cycles/` it's own `AUTHORS` file is planned.
- `./intern/opensubdiv/`.
An "AUTHORS" file has been added, using the chromium projects authors
file as a template.
Design task: #110784
Ref !110783.
Using ClangBuildAnalyzer on the whole Blender build, it was pointing
out that BLI_math.h is the heaviest "header hub" (i.e. non tiny file
that is included a lot).
However, there's very little (actually zero) source files in Blender
that need "all the math" (base, colors, vectors, matrices,
quaternions, intersection, interpolation, statistics, solvers and
time). A common use case is source files needing just vectors, or
just vectors & matrices, or just colors etc. Actually, 181 files
were including the whole math thing without needing it at all.
This change removes BLI_math.h completely, and instead in all the
places that need it, includes BLI_math_vector.h or BLI_math_color.h
and so on.
Change from that:
- BLI_math_color.h was included 1399 times -> now 408 (took 114.0sec
to parse -> now 36.3sec)
- BLI_simd.h 1403 -> 418 (109.7sec -> 34.9sec).
Full rebuild of Blender (Apple M1, Xcode, RelWithDebInfo) is not
affected much (342sec -> 334sec). Most of benefit would be when
someone's changing BLI_simd.h or BLI_math_color.h or similar files,
that now there's 3x fewer files result in a recompile.
Pull Request #110944