All 2D vectors related to image transform code were changed to float2.
Previously, it was decided, that 4x4 matrix should be used for 2D
affine transform, but this is changed to 3x3 now.
Texture painting code did rely on `IMB_transform` with 4x4 matrix.
To avoid large changes, I have added function
`BLI_rctf_transform_calc_m3_pivot_min`.
Main motivation is cleaner code - ease of use of c++ API, and avoiding
returning values by arguments.
Pull Request: https://projects.blender.org/blender/blender/pulls/133692
When using clangd or running clang-tidy on headers there are
currently many errors. These are noisy in IDEs, make auto fixes
impossible, and break features like code completion, refactoring
and navigation.
This makes source/blender headers work by themselves, which is
generally the goal anyway. But #includes and forward declarations
were often incomplete.
* Add #includes and forward declarations
* Add IWYU pragma: export in a few places
* Remove some unused #includes (but there are many more)
* Tweak ShaderCreateInfo macros to work better with clangd
Some types of headers still have errors, these could be fixed or
worked around with more investigation. Mostly preprocessor
template headers like NOD_static_types.h.
Note that that disabling WITH_UNITY_BUILD is required for clangd to
work properly, otherwise compile_commands.json does not contain
the information for the relevant source files.
For more details see the developer docs:
https://developer.blender.org/docs/handbook/tooling/clangd/
Pull Request: https://projects.blender.org/blender/blender/pulls/132608
Implements #130836:
interpolate_*_wrapmode_fl take InterpWrapMode wrap_u and wrap_v arguments.
U and V coordinate axes can have different wrap modes: clamp/extend,
border/zero, wrap/repeat.
Note that this removes inconsistency where cubic interpolation was
returning zero for samples completely outside the image, but all other
functions were not, and the behavior was not matching the function
documentation either.
Use the new functions in the new compositor CPU backend.
Possible performance impact for other places (e.g. VSE): measured on
4K resolution, transformed (scaled and rotated) 4K EXR image:
- Nearest filter: no change,
- Bilinear filter: no change,
- Cubic BSpline filter: slight performance decrease, IMB_transform
19.5 -> 20.7 ms (Ryzen 5950X, VS2022). Feels acceptable.
Pull Request: https://projects.blender.org/blender/blender/pulls/130893
This patch adds clamped boundaries variants of the nearest interpolation
functions in the BLI module. The naming convention used by the bilinear
functions were followed.
Needed by #119414.
Pull Request: https://projects.blender.org/blender/blender/pulls/119732
Part of overall "improve image filtering situation" (#116980), this PR addresses
two issues:
- Bilinear (default) image filtering makes half a source pixel wide transparent
border around the image. This is very noticeable when scaling images/movies up
in VSE. However, when there is no scaling up but you have slightly rotated
image, this creates a "somewhat nice" anti-aliasing around the edge.
- The other filtering kinds (e.g. cubic) do not have this behavior. So they do
not create unexpected transparency when scaling up (yay), however for slightly
rotated images the edge is "jagged" (oh no).
More detail and images in PR.
Pull Request: https://projects.blender.org/blender/blender/pulls/117717
Part of overall "improve filtering situation" (#116980): replace Subsampled3x3
(added for blender 3.5 in f210842a72 et al.) strip scaling filter with a
general Box filter.
Subsampled3x3 is really a Box filter ("average pixel values over NxM region"),
hardcoded to 3x3 size. As such, it works pretty well when downscaling images by
3x on each axis. But starts to break down and introduce aliasing at other
scaling factors. Also when scaling up or scaling down by less than 3x, using
total of 9 samples is a bit of overkill and hurts performance.
So instead, calculate the amount of NxM samples needed by looking at scaling
factors on X/Y axes. Note: use at least 2 samples on each axis, so that when
rotation is present, the result edges will get some anti-aliasing, just like it
was happening in previous filter implementation.
Images in PR.
Pull Request: https://projects.blender.org/blender/blender/pulls/117584
Part of overall "improve filtering situation" (#116980) task:
Add "Cubic Mitchell" filtering option to VSE strips. This is a cubic (4x4)
filter that generally looks better than bilinear, while not blurring the image
as much as the Cubic BSpline filter that exists elsewhere within Blender. It is
also default in many other apps.
Rename the (very recently added) VSE Bicubic filter option to Cubic BSpline.
Images in the PR.
Pull Request: https://projects.blender.org/blender/blender/pulls/117517
There exist a bunch of "give me a (filtered) image pixel at this location"
functions, some with duplicated functionality, some with almost the same but
not quite, some that look similar but behave slightly differently, etc.
Some of them were in BLI, some were in ImBuf.
This commit tries to improve the situation by:
* Adding low level interpolation functions to `BLI_math_interp.hh`
- With documentation on their behavior,
- And with more unit tests.
* At `ImBuf` level, there are only convenience inline wrappers to the above BLI
functions (split off into a separate header `IMB_interp.hh`). However, since
these wrappers are inline, some things get a tiny bit faster as a side
effect. E.g. VSE image strip, scaling to 4K resolution (Windows/Ryzen5950X):
- Nearest filter: 2.33 -> 1.94ms
- Bilinear filter: 5.83 -> 5.69ms
- Subsampled3x3 filter: 28.6 -> 22.4ms
Details on the functions:
- All of them have `_byte` and `_fl` suffixes.
- They exist in 4-channel byte (uchar4) and float (float4), as well as
explicitly passed amount of channels for other float images.
- New functions in BLI `blender::math` namespace:
- `interpolate_nearest`
- `interpolate_bilinear`
- `interpolate_bilinear_wrap`. Note that unlike previous "wrap" function,
this one no longer requires the caller to do their own wrapping.
- `interpolate_cubic_bspline`. Previous similar function was called just
"bicubic" which could mean many different things.
- Same functions exist in `IMB_interp.hh`, they are just convenience that takes
ImBuf and uses data pointer, width, height from that.
Other bits:
- Renamed `mod_f_positive` to `floored_fmod` (better matches `safe_floored_modf`
and `floored_modulo` that exist elsewhere), made it branchless and added more
unit tests.
- `interpolate_bilinear_wrap_fl` no longer clamps result to 0..1 range. Instead,
moved the clamp to be outside of the call in `paint_image_proj.cc` and
`paint_utils.cc`. Though the need for clamping in there is also questionable.
Pull Request: https://projects.blender.org/blender/blender/pulls/117387
Make Subsampling 3x3 filter twice faster (on 4K UHD resolution,
Windows/VS2022/Ryzen5950X: 52.7ms -> 28.3ms), by reformulating how it works:
Conceptually Subsampling filter is a box filter: it sums up N source image
pixels, computes their average and outputs the result. Critical thing is,
that should be done in premultiplied space so that colors from fully or
mostly transparent regions do not "override" opaque colors.
Previously, when operating on byte images, the code achieved this by always
working on byte values, doing "progressively smaller" lerp into byte color
result, taking care of premultiplication and again storing the "straight"
alpha for each sample being processed. This meant that for each sample, there
are 3 divisions involved! This also led to some precision loss, since for all
9 samples all the intermediate results would only be stored at byte precision.
Reformulate that by simply accumulating the premultiplied color as a float.
This gets rid of all divisions, except the last step when said float needs to
be written back into a byte color.
The unit test results have a tiny difference, since now it is arguably better
(as per above, previously it was having some precision loss).
Pull Request: https://projects.blender.org/blender/blender/pulls/117125
There were some abstractions inside `IMB_transform` implementation
that were kinda getting in the way. I want to speed up Subsampled3x3
filtering by reformulating the math being done, but that needs to
sample byte images while storing intermediate result as a float -- but
the whole `Pixel`, `PixelPointer` et al. machinery was not quite prepared
to deal with that. Feels like those helper classes do not achieve much,
just make the code longer.
So remove them (`Pixel`, `PixelPointer`, `ChannelConverter` etc.) and
use simple functions. Now the code is more straightforward (IMHO) and
270 lines shorter (40% of the whole file!), and allows me to do the
follow-up optimization easier.
No functional changes.
Pull Request: https://projects.blender.org/blender/blender/pulls/117182
Double precision pixel coordinate interpolation was added in 3a65d2f591 to
fix an issue of "wobbly" resulting image at very high scale factors. But
the root cause of that was the fact that the scanline loop was repeatedly
adding the floating point step for each pixel, instead of doing multiplication
by pixel index (repeated floating point additions can "drift" due to
imprecision, whereas multiplications are much more accurate).
Change all that math back to use single precision floats. I checked the
original issue the commit was fixing, it is still fine. Also added a gtest
to cover this situation: `imbuf_transform, nearest_very_large_scale`
This makes `IMB_transform` a tiny bit faster, on Windows/VS2022/Ryzen5950X
scaling an image at 4K resolution:
- Bilinear: 5.92 -> 5.83 ms
- Subsampled3x3: 53.6 -> 52.7 ms
Pull Request: https://projects.blender.org/blender/blender/pulls/117160
Part of overall "improve filtering situation" (#116980) task:
* Add Bicubic filtering option to strip Transform "Filter" setting.
Previously this option only existed in Transform Effect "Interpolation"
setting.
- With this addition, it feels like the transform effect could
possibly be marked as legacy/deprecated, since the regular Transform
that is on all strips can do everything that Transform Effect did?
* Speed up bicubic filtering (used now in VSE, but also in CPU Compositor,
image paint, etc.) by slightly simplifying the code and using some SIMD.
Upscaling 96x54 image to 3840x2160 resolution, using Bicubic filtering:
- Windows (VS2022, Ryzen 5950X): 35.5ms -> 15.1ms
- Mac (clang 15, M1 Max): 29.6ms -> 24.4ms
* Add gtest coverage for bicubic functionality.
Pull Request: https://projects.blender.org/blender/blender/pulls/117100
Code inside `IMB_transform` (which is pretty much only used inside VSE
to do translation/rotation/scale of image or movie strips) was not
correctly doing mapping between pixel and texel spaces. This is similar
to e.g. GPU rasterization rules and has to do with whether some
coordinate refers to pixel/texel "corner" or "center" etc. It's a long
topic, but short summary would be:
- Coordinates refer to pixel/texel corner,
- Do sampling at pixel centers,
- Bilinear filter should use floor(x-0.5) and floor(x-0.5)+1 texels.
Also, there was a sign error introduced in Subsampling 3x3 filter, in
commit b3fd169259.
After making the PR, I found out that this seems to fix#90785, #112923
and possibly some others.
Long explanation with lots of images is in the PR.
Pull Request: https://projects.blender.org/blender/blender/pulls/116628
IMB_transform is used by Sequencer (and other places) to do image
translation/rotation/scale on the CPU. This PR speeds up parts of it,
particularly when bilinear filtering is used. No behavior changes are
expected.
- Don't use virtual function calls inside inner loop. The code was using
class hierarchies with virtual calls just to do equivalent of "outside
of image? ignore" and "wrap UV coordinates or not?" decisions. Make those
use non-virtual function based code.
- Simplify pixel sampling functions to only do the work as needed by
anything within Blender codebase. For example, bilinear sampling of uchar
images always uses 4 RGBA channels and never does "UV wrap" logic.
- Bilinear interpolation uchar: completely branchless SIMD code now.
- Bilinear interpolation float: 2x floor() calls instead of 4x floor() +
2x ceil(), and final sample blending is done with SIMD.
Sequencer at 4K UHD resolution, with two image strips that need a transform,
playback framerate:
- Windows Ryzen 5950X: 18.7fps -> 26.2fps (IMB_transform time per frame goes
26.3ms -> 11.2ms)
- Mac M1 Max: 27.3fps -> 31.4fps
At that point the IMB_transform is not the slowest part of where playback
takes time (but rather sequencer effect application etc.).
Note: the amount of _actual code_ got a bit smaller. But I've added 100 lines
of unit tests in BLI_math_interp_test.cc, the bilinear interpolation
functions were only tested very indirectly by CPU compositor template
image tests.
Pull Request: https://projects.blender.org/blender/blender/pulls/115653
Using ClangBuildAnalyzer on the whole Blender build, it was pointing
out that BLI_math.h is the heaviest "header hub" (i.e. non tiny file
that is included a lot).
However, there's very little (actually zero) source files in Blender
that need "all the math" (base, colors, vectors, matrices,
quaternions, intersection, interpolation, statistics, solvers and
time). A common use case is source files needing just vectors, or
just vectors & matrices, or just colors etc. Actually, 181 files
were including the whole math thing without needing it at all.
This change removes BLI_math.h completely, and instead in all the
places that need it, includes BLI_math_vector.h or BLI_math_color.h
and so on.
Change from that:
- BLI_math_color.h was included 1399 times -> now 408 (took 114.0sec
to parse -> now 36.3sec)
- BLI_simd.h 1403 -> 418 (109.7sec -> 34.9sec).
Full rebuild of Blender (Apple M1, Xcode, RelWithDebInfo) is not
affected much (342sec -> 334sec). Most of benefit would be when
someone's changing BLI_simd.h or BLI_math_color.h or similar files,
that now there's 3x fewer files result in a recompile.
Pull Request #110944
A lot of files were missing copyright field in the header and
the Blender Foundation contributed to them in a sense of bug
fixing and general maintenance.
This change makes it explicit that those files are at least
partially copyrighted by the Blender Foundation.
Note that this does not make it so the Blender Foundation is
the only holder of the copyright in those files, and developers
who do not have a signed contract with the foundation still
hold the copyright as well.
Another aspect of this change is using SPDX format for the
header. We already used it for the license specification,
and now we state it for the copyright as well, following the
FAQ:
https://reuse.software/faq/
The goal is to make it more explicit and centralized operation to
assign and steal buffer data, with proper ownership tracking.
The buffers and ownership flags are wrapped into their dedicated
structures now.
There should be no functional changes currently, it is a preparation
for allowing implicit sharing of the ImBuf buffers. Additionally, in
the future it is possible to more buffer-specific information (such
as color space) next to the buffer data itself. It is also possible
to clean up the allocation flags (IB_rect, ...) to give them more
clear naming and not have stored in the ImBuf->flags as they are only
needed for allocation.
The most dangerous part of this change is the change of byte buffer
data from `int*` to `uint8_t*`. In a lot of cases the byte buffer was
cast to `uchar*`, so those casts are now gone. But some code is
operating on `int*` so now there are casts in there. In practice this
should be fine, since we only support 64bit platforms, so allocations
are aligned. The real things to watch out for here is the fact that
allocation and offsetting from the byte buffer now need an explicit 4
channel multiplier.
Once everything is C++ it will be possible to simplify public
functions even further.
Pull Request: https://projects.blender.org/blender/blender/pulls/107609
Straightforward port. I took the oportunity to remove some C vector
functions (ex: copy_v2_v2).
This makes some changes to DRWView to accomodate the alignement
requirements of the float4x4 type.
Straightforward port. I took the oportunity to remove some C vector
functions (ex: `copy_v2_v2`).
This makes some changes to DRWView to accomodate the alignement
requirements of the float4x4 type.
Only the area where the source buffer will be drawn on the destination buffer
will be evaluated. This will improve performance in sequencer and image editors as
less calcuations needs to happen.
Micro improvement to store delta uv per subsample. This reduces
branching that might happen on the CPU, but also makes it possible
to add other sub-sampling filters as well.
No changes for the end-user.
Image Transform use linear or nearest sampling during editing and exporting.
This gets sampling is fine for images that aren't scaled. When sequencing
however you mostly want use some sort of scaling, that leads to poorer
quality.
This change will use sub-sampling to improve the quality. This is only
enabled when rendering. During editing the subsampling is disabled to
keep the user interface reacting as expected.
Another improvement is that image transform is stopped at the moment
it hadn't sampled at least 4 samples for a scan line. In that case we
expect that there are no further samples that would change to result.
In a future patch this could be replaced by a ray/bounding bo intersection
as that would remove some unneeded loops in both the single sampled and
sub sampled approach.
Issue was caused by imprecise math due to using float numbers.
Use double instead.
No negative performance impact was observed.
Reviewed By: jbakker
Differential Revision: https://developer.blender.org/D16517
This is the conventional way of dealing with unused arguments in C++,
since it works on all compilers.
Regex find and replace: `UNUSED\((\w+)\)` -> `/*$1*/`
Use a shorter/simpler license convention, stops the header taking so
much space.
Follow the SPDX license specification: https://spdx.org/licenses
- C/C++/objc/objc++
- Python
- Shell Scripts
- CMake, GNUmakefile
While most of the source tree has been included
- `./extern/` was left out.
- `./intern/cycles` & `./intern/atomic` are also excluded because they
use different header conventions.
doc/license/SPDX-license-identifiers.txt has been added to list SPDX all
used identifiers.
See P2788 for the script that automated these edits.
Reviewed By: brecht, mont29, sergey
Ref D14069