Replace (only two remaining) usages of C-style IMB_processor_apply_threaded
with just threading::parallel_for which is much easier to use in C++ without
intermediate structs.
IMB_display_buffer_acquire got faster as a result -- parallel for has lower
overhead compared to the task pool approach that the previous
implementation was using. While at it, noticed that
IMB_display_buffer_acquire was clearing just-allocated memory, immediately
before overwriting it. So that is now gone too.
IMB_display_buffer_acquire time during playback of 4K resolution float
content in VSE (Ryzen 5950X, Windows): 10.7ms -> 7.7ms
Pull Request: https://projects.blender.org/blender/blender/pulls/135269
IMB_alpha_under_color_[byte/float] functions are used when preparing
the rendered image for image/movie output with RGB channels (i.e. no
transparency). They were single threaded before, multi-thread them.
Time taken by them on 4K resolution image (mix of various transparency
values in source), on Ryzen 5950X/Windows:
- IMB_alpha_under_color_byte: 10.1ms -> 1.9ms
- IMB_alpha_under_color_float: 14.6ms -> 8.8ms (smaller speedup since
it becomes memory bandwidth limited)
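For reference, a minimal single-threaded sketch of the per-pixel work for the float case. It assumes premultiplied-alpha RGBA input, which this message does not state, and the name `alpha_under_color_float_sketch` is hypothetical rather than the actual Blender function.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical sketch, not the Blender implementation.
// Composites RGBA float pixels over an opaque background color and
// leaves every pixel fully opaque.
// Assumption: 4 floats per pixel, colors premultiplied by alpha.
inline void alpha_under_color_float_sketch(std::vector<float> &pixels, const float backcol[3])
{
  assert(pixels.size() % 4 == 0);
  for (std::size_t i = 0; i < pixels.size(); i += 4) {
    const float alpha = pixels[i + 3];
    if (alpha == 1.0f) {
      continue; // Already opaque, nothing to blend.
    }
    const float inv_alpha = 1.0f - alpha;
    pixels[i + 0] += inv_alpha * backcol[0];
    pixels[i + 1] += inv_alpha * backcol[1];
    pixels[i + 2] += inv_alpha * backcol[2];
    pixels[i + 3] = 1.0f;
  }
}
```

Every pixel is independent of its neighbors, so the loop can be split into row ranges and handed to worker threads without any synchronization.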
Pull Request: https://projects.blender.org/blender/blender/pulls/135258
OpenEXR DWA compression in Blender is derived from a more user-friendly
quality slider which has an intuitive range 0 .. 100.
Initially the mapping was done so that the visually lossless JPEG
quality of 97 was mapped to the default DWA compression 45. It was
pointed out that the default quality should instead map to the default
compression, following the intent of DWA, which has rendering and
compositing as its main target.
This change adjusts the mapping so that quality of 90 is mapped to DWA
compression 45.
This change relies on the library update to fully utilize the DWA
compression #135037.
This change leads to the difference in the way proxies of EXR images
are generated:
```
                    DWA compression    Size (bytes)
Before the change               750     175,208,243
After the change                225      77,838,827
```
It is worth noting that the DWA compression seemed to be ignored in
the 4.4 branch before this change (this is what the original report is
about, a bit indirectly).
This is measured on the Fabrik Eingang footage converted to EXR. The
absolute value is probably not that important, it just shows the
reduction in size. This also leads to a lower quality of the proxy
image, but it is not worse than an actual JPEG proxy: the quality is
set to rather low 50 for the strip proxies.
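The numbers in this message are consistent with a linear mapping that reaches 0 at quality 100: before, 97 -> 45 and 50 -> 750 (slope 15); after, 90 -> 45 and 50 -> 225 (slope 4.5). The sketch below writes that inference down. The linear form and both helper names are assumptions derived from these figures, not a quote of the actual code.

```cpp
#include <cassert>

// Hypothetical helpers. The linear form is inferred from the figures
// quoted in the text, it is not taken from the Blender sources.
inline float dwa_level_before_sketch(int quality)
{
  assert(quality >= 0 && quality <= 100);
  return float(100 - quality) * 15.0f; // 97 -> 45, 50 -> 750
}

inline float dwa_level_after_sketch(int quality)
{
  assert(quality >= 0 && quality <= 100);
  return float(100 - quality) * 4.5f; // 90 -> 45, 50 -> 225
}
```

With the strip proxy quality of 50, the two helpers give 750 and 225, matching the table above.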
Ref #134802
Pull Request: https://projects.blender.org/blender/blender/pulls/135103
Float->byte rendered image dithering uses triangle noise algorithm. Keep
the algorithm the same, just make some improvements and fix some issues:
1) The hash function for noise was using "trig" hash from "On generating
random numbers" (Rey 1998), but that is not a great quality hash, plus it
can produce very different results between CPUs/GPUs. Replace it with
"iqint3" (recommended by "Hash Functions for GPU Rendering", JCGT 2020),
which is same performance on GPU, faster on CPU, and much better quality.
This is the same hash as Cycles already uses elsewhere. Also it is purely
integer based, so exactly the same results on all platforms.
2) For the above point, change `dither_random_value` to take integer
pixel coordinates and adjust calling code accordingly. Some previous
callers were (accidentally?) passing integer coordinates already. Other
places actually get a tiny bit simpler, since they now no longer need an
extra multiplication.
3) The CPU dithering path was wrongly introducing bias, i.e. making the
image lighter. The CPU path also needs dither noise to be in [-1..+1]
range (not [-0.5..+1.5]!) just like GPU path does, since the later
float->byte conversion already does rounding.
4) The CPU dithering path was using thread-slice-local Y coordinate,
meaning the dithering pattern was repeating vertically. The more CPU cores
you use, the worse the repetition.
5) Change the way that uniform noise is converted to triangle noise.
Previous implementation was based on one shadertoy from 2015, change it
to another shadertoy from 2020. The new one fixes issues with the old way,
and it just works on the CPU too, so now both CPU and GPU code paths are
exactly the same.
6) Cleanup: remove DitherContext, just a single float is enough
Performance and image comparisons in the PR.
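A rough sketch of points 1, 3 and 5 together: an integer hash of the pixel coordinate, turned into a uniform value, then reshaped into triangle-distributed noise in [-1..+1]. The hash body follows the "iqint3" listing from the JCGT paper as best understood here, and the reshaping is one standard way to get a triangle distribution. Both function names are hypothetical, and none of this is copied from the actual change.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Integer-only 2D hash in the style of "iqint3". Assumption: constants
// and shifts are written from the published listing, verify before use.
inline uint32_t hash_iqint3_sketch(uint32_t x, uint32_t y)
{
  const uint32_t qx = 1103515245u * ((x >> 1) ^ y);
  const uint32_t qy = 1103515245u * ((y >> 1) ^ x);
  return 1103515245u * (qx ^ (qy >> 3));
}

// Triangle-distributed noise in [-1, +1] for an integer pixel coordinate.
inline float dither_noise_sketch(int x, int y)
{
  const uint32_t h = hash_iqint3_sketch(uint32_t(x), uint32_t(y));
  // The top 24 bits are exactly representable in a float: uniform in [0, 1).
  const float u = float(h >> 8) * (1.0f / 16777216.0f);
  const float v = u * 2.0f - 1.0f; // Uniform in [-1, 1).
  const float sign = (v < 0.0f) ? -1.0f : 1.0f;
  // Reshape uniform into a triangle distribution peaked at zero.
  return sign * (1.0f - std::sqrt(1.0f - std::fabs(v)));
}
```

For point 4, a thread that processes rows [y_begin, y_end) would call `dither_noise_sketch(x, y)` with the image-global `y`, not `y - y_begin`, so the pattern does not repeat per slice.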
Pull Request: https://projects.blender.org/blender/blender/pulls/135224
Looks like "divers" comes from ancient times, Dutch word meaning "misc".
But by now, everything in that file is about conversion between different
pixel data types.
Pull Request: https://projects.blender.org/blender/blender/pulls/135165
There's no point in having non-threaded image color space conversion functions.
So merge the threaded and non-threaded functions and clarify names while at it:
- IMB_colormanagement_transform & IMB_colormanagement_transform_threaded
-> IMB_colormanagement_transform_float
- IMB_colormanagement_transform_byte & IMB_colormanagement_transform_byte_threaded
-> IMB_colormanagement_transform_byte
- IMB_colormanagement_transform_from_byte & IMB_colormanagement_transform_from_byte_threaded
-> IMB_colormanagement_transform_byte_to_float
These places were doing single-threaded colorspace conversion previously, and
thus now are potentially faster:
- IMB_rect_from_float (used in many places)
- EXR image "save as render" saving (image_exr_from_scene_linear_to_output)
- Object baking (write_internal_bake_pixels, write_external_bake_pixels)
- General image saving, clipboard copy, movie preparation
(IMB_colormanagement_imbuf_for_write)
- Linear conversion when reading HDR images/movies
(colormanage_imbuf_make_linear)
- EXR multi-layer conversion (render_result_new_from_exr)
For one case I benchmarked, which is to render out a 2D stabilized 10 bit input
movie clip out of VSE, the total render time went from 49sec down to 44sec
(Ryzen 5950X). One of the single-threaded parts there was the colorspace
conversion in the movieclip code.
Pull Request: https://projects.blender.org/blender/blender/pulls/135155
Speedup IMB_rotate_orthogonal (used for example in auto-rotating
videos that were shot sideways on a phone) by: 1) not copying previous
pixel values into new result, only for them to be immediately
overwritten by rotated pixels, and 2) using multi-threading.
Performing rotation of 1920x1080 resolution HDR (float) video frame
goes from 20ms down to 5ms (Ryzen 5950X, Windows)
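A small sketch of the idea for a 90 degree clockwise rotation of a single-channel float image. The function name is hypothetical and the real function handles more cases (other angles, byte buffers, multiple channels). The loop runs over destination rows, so each destination pixel is written exactly once and no source pixels are copied over beforehand.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical sketch, not the Blender implementation.
// Rotates a w x h single-channel image 90 degrees clockwise.
// The result is h pixels wide and w pixels tall.
inline std::vector<float> rotate_90_cw_sketch(const std::vector<float> &src, int w, int h)
{
  assert(w >= 0 && h >= 0);
  assert(src.size() == std::size_t(w) * std::size_t(h));
  std::vector<float> dst(src.size());
  // Destination rows are independent of each other, so a range of `dy`
  // values could be given to each worker thread.
  for (int dy = 0; dy < w; dy++) {
    for (int dx = 0; dx < h; dx++) {
      const int sx = dy;
      const int sy = h - 1 - dx;
      dst[std::size_t(dy) * h + dx] = src[std::size_t(sy) * w + sx];
    }
  }
  return dst;
}
```

For a 3x2 input with rows `1 2 3` and `4 5 6`, the result is a 2x3 image with rows `4 1`, `5 2`, `6 3`.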
Pull Request: https://projects.blender.org/blender/blender/pulls/135158
The File Output node crashes when saving a 16-bit vector image in an
RGBA image. That's because the OIIO writer assumes 4-channel buffer
while the buffer provided by the node is only 3-channel. To fix this,
the OIIO writer is extended to support all possible combinations of
source and target channels.
Pull Request: https://projects.blender.org/blender/blender/pulls/134789
All 2D vectors related to image transform code were changed to float2.
Previously it was decided that a 4x4 matrix should be used for 2D
affine transforms, but this is now changed to 3x3.
Texture painting code relied on `IMB_transform` with a 4x4 matrix.
To avoid large changes, I have added the function
`BLI_rctf_transform_calc_m3_pivot_min`.
The main motivation is cleaner code: ease of use of the C++ API, and
avoiding returning values through arguments.
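To illustrate why 3x3 is enough for 2D affine transforms, here is a self-contained sketch that applies one to a point and returns the result by value. The types and the row-major `m[row][col]` layout are assumptions made for this example only. They are stand-ins, not Blender's `float2`/`float3x3`, whose storage convention may differ.

```cpp
#include <array>
#include <cassert>

// Hypothetical stand-in types for this example.
struct Float2Sketch {
  float x, y;
};
using Mat3Sketch = std::array<std::array<float, 3>, 3>; // m[row][col]

// Applies a 2D affine transform. The last row is expected to be (0, 0, 1),
// so only the first two rows take part in the computation.
inline Float2Sketch transform_point_sketch(const Mat3Sketch &m, const Float2Sketch p)
{
  return {m[0][0] * p.x + m[0][1] * p.y + m[0][2],
          m[1][0] * p.x + m[1][1] * p.y + m[1][2]};
}
```

Rotation, scale and shear live in the upper-left 2x2 block and translation in the last column, so the extra row and column of a 4x4 matrix carry no information for this use.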
Pull Request: https://projects.blender.org/blender/blender/pulls/133692
Since one user-defined conversion operator is allowed during implicit conversion,
and there is a constructor which can accept the result of that conversion,
there was a backdoor for vector types to up-cast their dimensions via a cast
to the pointer type of a vector component. Since this was implicit and
unintentional, it led to buffer overflows.
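The mechanism can be reproduced with two tiny types. Everything below is hypothetical and only mimics the shape of the problem. Making the pointer conversion `explicit` is one possible way to close the backdoor and is not necessarily what this change does.

```cpp
#include <type_traits>

// Hypothetical minimal types, not Blender's vector classes.
struct UnsafeVec2 {
  float x, y;
  // Implicit conversion to a pointer to the components.
  operator const float *() const { return &x; }
};

struct UnsafeVec3 {
  float x, y, z;
  // Reads three floats from whatever the pointer refers to.
  UnsafeVec3(const float *p) : x(p[0]), y(p[1]), z(p[2]) {}
};

// `UnsafeVec3 v(vec2)` compiles: vec2 converts to `const float *`, the
// constructor accepts it, and then reads past the end of the 2D vector.
static_assert(std::is_constructible<UnsafeVec3, UnsafeVec2>::value,
              "the unintended up-cast is accepted");

struct SafeVec2 {
  float x, y;
  // The conversion now has to be spelled out at the call site.
  explicit operator const float *() const { return &x; }
};

struct SafeVec3 {
  float x, y, z;
  explicit SafeVec3(const float *p) : x(p[0]), y(p[1]), z(p[2]) {}
};

static_assert(!std::is_constructible<SafeVec3, SafeVec2>::value,
              "the up-cast no longer compiles");
static_assert(std::is_constructible<SafeVec3, const float *>::value,
              "construction from a raw pointer still works");
```

The static assertions check only what compiles. The unsafe construction is deliberately never executed, since doing so would read out of bounds.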
Pull Request: https://projects.blender.org/blender/blender/pulls/132927
* Ensure valid bit depth is set along with file type
* Guard against invalid inputs in stereo imbuf creation
* Remove some unused code
Thanks Yiming Wu for finding the cause.
Pull Request: https://projects.blender.org/blender/blender/pulls/133499
This works around ffmpeg bug https://trac.ffmpeg.org/ticket/10755
where for specific files that are:
- Ogg container format, with supported audio stream (e.g. Vorbis),
- But the video stream is not in an Ogg-compatible format such as Theora, but rather
it is an embedded "album art" (AV_DISPOSITION_ATTACHED_PIC) in
MJPEG, PNG or some other non-Ogg format.
Calling any sort of ffmpeg "seek" function on that video stream just
aborts from innards of ffmpeg.
So to work around this:
- Detect such files (ogg container, non-theora video, attached picture
disposition) and for those:
- Never seek within them, and only ever decode one frame. Return that
frame for any & all "give me a frame" requests.
- Additionally, calculating "how many frames this video has" for such
files also returns nonsense ("millions of frames") since their frame
rate is set to like 90000 or similar. So pretend they have a "sane"
frame rate. Do all this frame rate calculation just once when opening
the video, and use that result in all other places.
- Never build proxies for such video files, since e.g. "timecode"
for them does not make sense.
All of this could be removed once/if ffmpeg fixes their issue.
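The detection step could look roughly like the predicate below. It uses a plain stand-in struct instead of the real ffmpeg types so that it stays self-contained. The struct, its field names and the function name are hypothetical, and the constant mirrors the value of ffmpeg's `AV_DISPOSITION_ATTACHED_PIC` flag.

```cpp
#include <string>

// Hypothetical stand-in for the few pieces of stream information used
// here. Real code would read these from the ffmpeg format context.
struct VideoStreamInfoSketch {
  std::string container_name; // For example "ogg".
  std::string codec_name;     // For example "theora", "mjpeg", "png".
  int disposition;            // Bit flags, as in AVStream::disposition.
};

// Same value as ffmpeg's AV_DISPOSITION_ATTACHED_PIC.
constexpr int kDispositionAttachedPicSketch = 0x0400;

// True for files where the "video" is just embedded album art inside an
// Ogg container. Such streams must never be seeked.
inline bool is_ogg_with_attached_picture_sketch(const VideoStreamInfoSketch &s)
{
  return s.container_name == "ogg" && s.codec_name != "theora" &&
         (s.disposition & kDispositionAttachedPicSketch) != 0;
}
```

When the predicate is true, the reader would decode a single frame once and return it for every frame request, as described above.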
Pull Request: https://projects.blender.org/blender/blender/pulls/132920
Looks like this regressed in c1f5d8d023 (blender 3.1), basically
since then if there was no video, then no audio was ever written
either.
From what I can tell, the original change tried to fix the problem
that "file size autosplit" logic was after video, but before audio
data writing. So it moved audio writing to be before the split (good),
but also (not sure whether by accident) moved audio writing to
only happen if video is written.
Pull Request: https://projects.blender.org/blender/blender/pulls/132874
A workaround for an ffmpeg issue, one that everyone (e.g. OBS) seems to be using.
By default ffmpeg uses built-in VP8/VP9 decoders, however those
do not detect alpha channel (https://trac.ffmpeg.org/ticket/8344 -
the bug filed in 2019, currently still open in ffmpeg 7.1. There's
an older report from 2016 too, https://trac.ffmpeg.org/ticket/5792).
The trick for VP8/VP9 is to explicitly force use of libvpx decoder.
Only do this where alpha_mode=1 metadata is set. Note that in order
to work, the previously initialized format context must be closed
and a fresh one with explicitly requested codec must be created.
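The decision of which decoder to request can be sketched as a small pure function. `libvpx` and `libvpx-vp9` are the names under which ffmpeg registers its libvpx-based VP8 and VP9 decoders. The function name and the string-based interface are hypothetical, chosen to keep the example free of ffmpeg headers.

```cpp
#include <string>

// Hypothetical helper: returns the decoder name to request explicitly,
// or an empty string to keep ffmpeg's default decoder.
// `alpha_mode` is the value of the stream's "alpha_mode" metadata entry,
// empty when the entry is absent.
inline std::string forced_decoder_name_sketch(const std::string &codec_name,
                                              const std::string &alpha_mode)
{
  if (alpha_mode != "1") {
    return ""; // No alpha signalled, the built-in decoder is fine.
  }
  if (codec_name == "vp8") {
    return "libvpx";
  }
  if (codec_name == "vp9") {
    return "libvpx-vp9";
  }
  return ""; // Other codecs are not affected by this issue.
}
```

A non-empty result would then be looked up by name, and the format context reopened with that decoder, as noted above.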
Pull Request: https://projects.blender.org/blender/blender/pulls/132795
All other YUV based codecs switch from default 4:2:0 YUV layout
(which is lossy) to a full resolution 4:4:4 YUV. However AV1 was not
doing that, probably by oversight.
Pull Request: https://projects.blender.org/blender/blender/pulls/132738
The rest of blender does handle multi-layer EXR images, using the
"combined" or RGBA/RGB layers when the visual result is needed. Make
VSE do the same.
While fixing this, I found several issues in other not well tested code
and had to fix them:
- IMB_buffer_float_from_float_threaded was wrongly using source channels
as destination channels, producing garbage result.
- IMB_scale_into_new was not assigning channels to destination image.
Pull Request: https://projects.blender.org/blender/blender/pulls/132790
When using clangd or running clang-tidy on headers there are
currently many errors. These are noisy in IDEs, make auto fixes
impossible, and break features like code completion, refactoring
and navigation.
This makes source/blender headers work by themselves, which is
generally the goal anyway. But #includes and forward declarations
were often incomplete.
* Add #includes and forward declarations
* Add IWYU pragma: export in a few places
* Remove some unused #includes (but there are many more)
* Tweak ShaderCreateInfo macros to work better with clangd
Some types of headers still have errors, these could be fixed or
worked around with more investigation. Mostly preprocessor
template headers like NOD_static_types.h.
Note that disabling WITH_UNITY_BUILD is required for clangd to
work properly, otherwise compile_commands.json does not contain
the information for the relevant source files.
For more details see the developer docs:
https://developer.blender.org/docs/handbook/tooling/clangd/
Pull Request: https://projects.blender.org/blender/blender/pulls/132608
This renames the struct `Sequence` to `Strip`.
While the motivation for this partially comes from
the "Sequence Design" #131329, it seems like this
is a good refactor whether the design gets implemented
or not.
The `Sequence` represents what users see as strips in the
VSE. Many places in the code already refer to a `Sequence`
as "strip". It's the C-style "base class" of all strip types.
This also renames the python RNA type `bpy.types.Sequence`
to `bpy.types.Strip` which means that this technically breaks
the python API.
Pull Request: https://projects.blender.org/blender/blender/pulls/132179
This caused build errors on the docs builder, I can't seem to reproduce
locally, so revert for now and have another look at some point in the
future.
Sadly as these changes usually go, this took 5c515e26bb and
2f0fc7fc9f with it as well.
Pull Request: https://projects.blender.org/blender/blender/pulls/132559
Not entirely straightforward, some manual edits were done since when
this library was created, some of the work was already done.
- Remove any bf_imbuf_movie paths from INC
- Add a dependency through LIB when missing
- Add public dependency to bf_imbuf in bf_imbuf_movie since it uses the
imbuf headers in its public headers.
- Fix namespace not to have underscores
context: https://devtalk.blender.org/t/cmake-cleanup/30260
Pull Request: https://projects.blender.org/blender/blender/pulls/132407