Part of overall "improve image filtering situation" (#116980), this PR addresses
two issues:
- Bilinear (default) image filtering makes half a source pixel wide transparent
border around the image. This is very noticeable when scaling images/movies up
in VSE. However, when there is no scaling up but you have slightly rotated
image, this creates a "somewhat nice" anti-aliasing around the edge.
- The other filtering kinds (e.g. cubic) do not have this behavior. So they do
not create unexpected transparency when scaling up (yay), however for slightly
rotated images the edge is "jagged" (oh no).
More detail and images in PR.
Pull Request: https://projects.blender.org/blender/blender/pulls/117717
Part of overall "improve filtering situation" (#116980): replace Subsampled3x3
(added for blender 3.5 in f210842a72 et al.) strip scaling filter with a
general Box filter.
Subsampled3x3 is really a Box filter ("average pixel values over NxM region"),
hardcoded to 3x3 size. As such, it works pretty well when downscaling images by
3x on each axis. But starts to break down and introduce aliasing at other
scaling factors. Also when scaling up or scaling down by less than 3x, using
total of 9 samples is a bit of overkill and hurts performance.
So instead, calculate the amount of NxM samples needed by looking at scaling
factors on X/Y axes. Note: use at least 2 samples on each axis, so that when
rotation is present, the result edges will get some anti-aliasing, just like it
was happening in previous filter implementation.
Images in PR.
Pull Request: https://projects.blender.org/blender/blender/pulls/117584
Part of overall "improve filtering situation" (#116980) task:
Add "Cubic Mitchell" filtering option to VSE strips. This is a cubic (4x4)
filter that generally looks better than bilinear, while not blurring the image
as much as the Cubic BSpline filter that exists elsewhere within Blender. It is
also default in many other apps.
Rename the (very recently added) VSE Bicubic filter option to Cubic BSpline.
Images in the PR.
Pull Request: https://projects.blender.org/blender/blender/pulls/117517
There exist a bunch of "give me a (filtered) image pixel at this location"
functions, some with duplicated functionality, some with almost the same but
not quite, some that look similar but behave slightly differently, etc.
Some of them were in BLI, some were in ImBuf.
This commit tries to improve the situation by:
* Adding low level interpolation functions to `BLI_math_interp.hh`
- With documentation on their behavior,
- And with more unit tests.
* At `ImBuf` level, there are only convenience inline wrappers to the above BLI
functions (split off into a separate header `IMB_interp.hh`). However, since
these wrappers are inline, some things get a tiny bit faster as a side
effect. E.g. VSE image strip, scaling to 4K resolution (Windows/Ryzen5950X):
- Nearest filter: 2.33 -> 1.94ms
- Bilinear filter: 5.83 -> 5.69ms
- Subsampled3x3 filter: 28.6 -> 22.4ms
Details on the functions:
- All of them have `_byte` and `_fl` suffixes.
- They exist in 4-channel byte (uchar4) and float (float4), as well as
explicitly passed amount of channels for other float images.
- New functions in BLI `blender::math` namespace:
- `interpolate_nearest`
- `interpolate_bilinear`
- `interpolate_bilinear_wrap`. Note that unlike previous "wrap" function,
this one no longer requires the caller to do their own wrapping.
- `interpolate_cubic_bspline`. Previous similar function was called just
"bicubic" which could mean many different things.
- Same functions exist in `IMB_interp.hh`, they are just convenience that takes
ImBuf and uses data pointer, width, height from that.
Other bits:
- Renamed `mod_f_positive` to `floored_fmod` (better matches `safe_floored_modf`
and `floored_modulo` that exist elsewhere), made it branchless and added more
unit tests.
- `interpolate_bilinear_wrap_fl` no longer clamps result to 0..1 range. Instead,
moved the clamp to be outside of the call in `paint_image_proj.cc` and
`paint_utils.cc`. Though the need for clamping in there is also questionable.
Pull Request: https://projects.blender.org/blender/blender/pulls/117387
Rename: anim -> ImBufAnim
Rename: anim_index -> ImBufAnimIndex
There were cases where removing redundant "struct" qualifier caused
a warning since the name of the struct member was also anim.
Use uppercase type name to conform with other types names.
Ref !117394
The term `PIL` stands for "platform independent library." It exists since the `Initial Revision`
commit from 2002. Nowadays, we generally just use the `BLI` (blenlib) prefix for such code
and the `PIL` prefix feels more confusing then useful. Therefore, this patch renames the
`PIL` to `BLI`.
Pull Request: https://projects.blender.org/blender/blender/pulls/117325
Blender asserts when an Image node is directly connected to a Denoise
node in the GPU compositor. This is because the node reads the image to
host memory, but the cached image didn't specifiy host read usage. This
patch adds the host read usage flag to the IMB module GPU image creation
functions.
Make Subsampling 3x3 filter twice faster (on 4K UHD resolution,
Windows/VS2022/Ryzen5950X: 52.7ms -> 28.3ms), by reformulating how it works:
Conceptually Subsampling filter is a box filter: it sums up N source image
pixels, computes their average and outputs the result. Critical thing is,
that should be done in premultiplied space so that colors from fully or
mostly transparent regions do not "override" opaque colors.
Previously, when operating on byte images, the code achieved this by always
working on byte values, doing "progressively smaller" lerp into byte color
result, taking care of premultiplication and again storing the "straight"
alpha for each sample being processed. This meant that for each sample, there
are 3 divisions involved! This also led to some precision loss, since for all
9 samples all the intermediate results would only be stored at byte precision.
Reformulate that by simply accumulating the premultiplied color as a float.
This gets rid of all divisions, except the last step when said float needs to
be written back into a byte color.
The unit test results have a tiny difference, since now it is arguably better
(as per above, previously it was having some precision loss).
Pull Request: https://projects.blender.org/blender/blender/pulls/117125
There were some abstractions inside `IMB_transform` implementation
that were kinda getting in the way. I want to speed up Subsampled3x3
filtering by reformulating the math being done, but that needs to
sample byte images while storing intermediate result as a float -- but
the whole `Pixel`, `PixelPointer` et al. machinery was not quite prepared
to deal with that. Feels like those helper classes do not achieve much,
just make the code longer.
So remove them (`Pixel`, `PixelPointer`, `ChannelConverter` etc.) and
use simple functions. Now the code is more straightforward (IMHO) and
270 lines shorter (40% of the whole file!), and allows me to do the
follow-up optimization easier.
No functional changes.
Pull Request: https://projects.blender.org/blender/blender/pulls/117182
Double precision pixel coordinate interpolation was added in 3a65d2f591 to
fix an issue of "wobbly" resulting image at very high scale factors. But
the root cause of that was the fact that the scanline loop was repeatedly
adding the floating point step for each pixel, instead of doing multiplication
by pixel index (repeated floating point additions can "drift" due to
imprecision, whereas multiplications are much more accurate).
Change all that math back to use single precision floats. I checked the
original issue the commit was fixing, it is still fine. Also added a gtest
to cover this situation: `imbuf_transform, nearest_very_large_scale`
This makes `IMB_transform` a tiny bit faster, on Windows/VS2022/Ryzen5950X
scaling an image at 4K resolution:
- Bilinear: 5.92 -> 5.83 ms
- Subsampled3x3: 53.6 -> 52.7 ms
Pull Request: https://projects.blender.org/blender/blender/pulls/117160
Part of overall "improve filtering situation" (#116980) task:
* Add Bicubic filtering option to strip Transform "Filter" setting.
Previously this option only existed in Transform Effect "Interpolation"
setting.
- With this addition, it feels like the transform effect could
possibly be marked as legacy/deprecated, since the regular Transform
that is on all strips can do everything that Transform Effect did?
* Speed up bicubic filtering (used now in VSE, but also in CPU Compositor,
image paint, etc.) by slightly simplifying the code and using some SIMD.
Upscaling 96x54 image to 3840x2160 resolution, using Bicubic filtering:
- Windows (VS2022, Ryzen 5950X): 35.5ms -> 15.1ms
- Mac (clang 15, M1 Max): 29.6ms -> 24.4ms
* Add gtest coverage for bicubic functionality.
Pull Request: https://projects.blender.org/blender/blender/pulls/117100
Code inside `IMB_transform` (which is pretty much only used inside VSE
to do translation/rotation/scale of image or movie strips) was not
correctly doing mapping between pixel and texel spaces. This is similar
to e.g. GPU rasterization rules and has to do with whether some
coordinate refers to pixel/texel "corner" or "center" etc. It's a long
topic, but short summary would be:
- Coordinates refer to pixel/texel corner,
- Do sampling at pixel centers,
- Bilinear filter should use floor(x-0.5) and floor(x-0.5)+1 texels.
Also, there was a sign error introduced in Subsampling 3x3 filter, in
commit b3fd169259.
After making the PR, I found out that this seems to fix#90785, #112923
and possibly some others.
Long explanation with lots of images is in the PR.
Pull Request: https://projects.blender.org/blender/blender/pulls/116628
Now that OIIO has proper `valid_file` APIs for the formats we care
about, and which take MemReaders, we can remove the code added to TIFF,
PSD, and PNG as part of 5cc8fea7e9.
Additionally, this change eliminates the recent console spew on startup
where the TIFF loader is asked to load non-TIFF files (it is based on
the ordering of the filetype array)[1]. We now make a `valid_file` check
during open to address this.
[1] `: Not a TIFF or MDI file, bad magic number 12150 (0x2f76).`
Pull Request: https://projects.blender.org/blender/blender/pulls/116826
- Improve the look of them, so they feel less like from year
1998 (more details and images in the PR).
- Some of the scopes got slightly faster in the process, others
stayed the same performance (details below).
- Remove VSE Scopes related data from SpaceSeq DNA, move it into
runtime instead.
Along with the 4.1 libraries upgrade, we are bumping the clang-format
version from 8-12 to 17. This affects quite a few files.
If not already the case, you may consider pointing your IDE to the
clang-format binary bundled with the Blender precompiled libraries.
The new 2.5.x version of OIIO came with the OpenEXR Core library enabled
by default. Unfortunately it's not quite ready to be used so disable it
for now.
The current issue will prevent DWA-A/B from loading correctly. There may
also be a potential ZIPS compression related quirk that upstream is
still working through as well.
Pull Request: https://projects.blender.org/blender/blender/pulls/116591
After doing regular movie frame decoding, there's a "postprocess" step for
each incoming frame, that does deinterlacing if needed, then YUV->RGB
conversion, then vertical image flip and additional interlace filtering if
needed. While this postprocess step is not the "heavy" part of movie
playback, it still takes 2-3ms per each 1080p resolution input frame that
is being played.
This PR does two things:
- Similar to #116008, uses multi-threaded `sws_scale` to do YUV->RGB
conversion.
- Reintroduces "do vertical flip while converting to RGB", where possible.
That was removed in 2ed73fc97e due to issues on arm64 platform, and
theory that negative strides passed to sws_scale is not an officially
supported usage.
My take on the last point: negative strides to sws_scale is a fine and
supported usage, just ffmpeg had a bug specifically on arm64 where they
were accidentally not respected. They fixed that for ffmpeg 6.0, and
backported it to all versions back to 3.4.13 -- you would not backport
something to 10 releases unless that was an actual bug fix!
I have tested the glitch_480p.mp4 that was originally attached to the
bug report #94237 back then, and it works fine both on x64 (Windows)
and arm64 (Mac).
Timings, ffmpeg_postprocess cost for a single 1920x1080 resolution movie
strip inside VSE:
- Windows/VS2022 Ryzen 5950X: 3.04ms -> 1.18ms
- Mac/clang15 M1 Max: 1.10ms -> 0.71ms
Pull Request: https://projects.blender.org/blender/blender/pulls/116309
IMB_transform is used by Sequencer (and other places) to do image
translation/rotation/scale on the CPU. This PR speeds up parts of it,
particularly when bilinear filtering is used. No behavior changes are
expected.
- Don't use virtual function calls inside inner loop. The code was using
class hierarchies with virtual calls just to do equivalent of "outside
of image? ignore" and "wrap UV coordinates or not?" decisions. Make those
use non-virtual function based code.
- Simplify pixel sampling functions to only do the work as needed by
anything within Blender codebase. For example, bilinear sampling of uchar
images always uses 4 RGBA channels and never does "UV wrap" logic.
- Bilinear interpolation uchar: completely branchless SIMD code now.
- Bilinear interpolation float: 2x floor() calls instead of 4x floor() +
2x ceil(), and final sample blending is done with SIMD.
Sequencer at 4K UHD resolution, with two image strips that need a transform,
playback framerate:
- Windows Ryzen 5950X: 18.7fps -> 26.2fps (IMB_transform time per frame goes
26.3ms -> 11.2ms)
- Mac M1 Max: 27.3fps -> 31.4fps
At that point the IMB_transform is not the slowest part of where playback
takes time (but rather sequencer effect application etc.).
Note: the amount of _actual code_ got a bit smaller. But I've added 100 lines
of unit tests in BLI_math_interp_test.cc, the bilinear interpolation
functions were only tested very indirectly by CPU compositor template
image tests.
Pull Request: https://projects.blender.org/blender/blender/pulls/115653
Currently, the OpenEXR reader can't read single-layer XYZ images as a
combined image. Single-view images will only read the X channel, while
multi-view images will interpret the Z as a depth image and the Y and Z
channels will be treated as float passes.
This patch allows the reading of XYZ channels as combined images. For
single-view images, we just extend the RGB-like channel names we match
to contain XYZ. For multi-view images we only treat the Z channel as a
Depth one if no X and Y channels exists, since the Z in this case is
part of the XYZ image.
Pull Request: https://projects.blender.org/blender/blender/pulls/115290
Due to changes in the build environment shader_builder wasn't able to
compile on macOs. This patch reverts several recent changes to CMake files.
* dbb2844ed9
* 94817f64b9
* 1b6cd937ff
The idea is that in the near future shader_builder will run on the buildbot as
part of any regular build to ensure that changes to the CMake doesn't break
shader_builder and we only detect it after a few days.
Pull Request: https://projects.blender.org/blender/blender/pulls/115929
Glow effect was doing the correct thing algorithmically (separable gaussian
blur), but it was 1) completely single-threaded, and 2) did operations in
several passes over the source images, instead of doing them in one go.
- Adds multi-threading to Glow effect.
- Combines some operations, e.g. instead of IMB_buffer_float_from_byte
followed by IMB_buffer_float_premultiply, do
IMB_colormanagement_transform_from_byte_threaded which achieves the same,
but more efficiently.
- Simplifies the code: removing separate loops around image boundaries is
both less code and slightly faster; use float4 vector type for more
compact code; use Array classes instead of manual memory allocation, etc.
- Removes IMB_buffer_float_unpremultiply and IMB_buffer_float_premultiply
since they are no longer used by anything whatsoever.
Applying Glow to 4K UHD sequencer output, on Windows Ryzen 5950X:
- Blur distance 4: 935ms -> 109ms (8.5x faster)
- Blur distance 20: 3526ms -> 336ms (10.5x faster)
Same on Mac M1 Max:
- Blur distance 4: 732ms -> 126ms (5.8x faster)
- Blur distance 20: 3047ms -> 528ms (5.7x faster)
Pull Request: https://projects.blender.org/blender/blender/pulls/115818
NDEBUG is part of the C standard and disables asserts. Only this will
now be used to decide if asserts are enabled.
DEBUG was a Blender specific define, that has now been removed.
_DEBUG is a Visual Studio define for builds in Debug configuration.
Blender defines this for all platforms. This is still used in a few
places in the draw code, and in external libraries Bullet and Mantaflow.
Pull Request: https://projects.blender.org/blender/blender/pulls/115774