Fix #142912: VSE thumbnail cache processing slows down UI

VSE cache processing was handling thumbnail generation requests in
parallel, however often that was resulting in CPU core over-subscription
that negatively affects user interface smoothness. Process requests
serially; some parts of image/video loading are internally multi-threaded
already. This makes thumbnails load somewhat slower, but while that
is happening the rest of blender feels much smoother, and the CPU
fans don't spin up as much.

Additionally, make the video thumbnail loading code use smallest
available proxy file, if present. This was overlooked in the original
new thumb cache implementation (!126405).

SpriteFright timeline edit v135 (mostly JPG images, some movies),
loading thumbnails for the whole timeline on Ryzen 5950X (Windows):
- 4.2.1: 35 seconds, CPU usage ~35%, UI smooth
- Main: 13 seconds, CPU usage ~90%, UI choppy
- This PR: 19 seconds, CPU usage ~35%, UI smooth

Manually made timeline with 10 video files:
- 4.2.1: 12 seconds, CPU usage ~50%, UI smooth
- Main: 6 seconds, CPU usage ~100%, UI choppy
- This PR: 9 seconds (no proxies) / 4 seconds (with proxies), CPU usage
  ~50%, UI smooth

Candidate for 4.5.x backport

Pull Request: https://projects.blender.org/blender/blender/pulls/144120
This commit is contained in:
Aras Pranckevicius
2025-08-07 12:33:18 +02:00
committed by Aras Pranckevicius
parent d84d779661
commit 2f68e7e4b4

View File

@@ -319,20 +319,19 @@ void ThumbGenerationJob::run_fn(void *customdata, wmJobWorkerStatus *worker_stat
return a.frame_index < b.frame_index;
});
/* Process the requests in parallel. Split requests into approximately 4 groups:
* we don't want to go too wide since that would potentially mean that a single input
* movie gets assigned to more than one thread, and the thumbnail loading itself
* is somewhat-threaded already. */
int64_t grain_size = math::max<int64_t>(8, requests.size() / 4);
threading::parallel_for(requests.index_range(), grain_size, [&](IndexRange range) {
/* Note: we could process thumbnail cache requests somewhat in parallel,
* but let's not do that so that UI responsiveness is not affected much.
* Some of video/image loading code parts are multi-threaded internally already,
* and that does provide some parallelism. */
{
/* Often the same movie file is chopped into multiple strips next to each other.
* Since the requests are sorted by file path and frame index, we can reuse MovieReader
* objects between them for performance. */
MovieReader *cur_anim = nullptr;
std::string cur_anim_path;
int cur_stream = 0;
for (const int i : range) {
const ThumbnailCache::Request &request = requests[i];
IMB_Proxy_Size cur_proxy_size = IMB_PROXY_NONE;
for (const ThumbnailCache::Request &request : requests) {
if (worker_status->stop) {
break;
}
@@ -365,11 +364,23 @@ void ThumbGenerationJob::run_fn(void *customdata, wmJobWorkerStatus *worker_stat
cur_stream = request.stream_index;
cur_anim = MOV_open_file(
cur_anim_path.c_str(), IB_byte_data, cur_stream, true, nullptr);
cur_proxy_size = IMB_PROXY_NONE;
if (cur_anim != nullptr) {
/* Find the lowest proxy resolution available.
* `x & -x` leaves only the lowest bit set. */
int proxies_mask = MOV_get_existing_proxies(cur_anim);
cur_proxy_size = IMB_Proxy_Size(proxies_mask & -proxies_mask);
}
}
/* Decode the movie frame. */
if (cur_anim != nullptr) {
thumb = MOV_decode_frame(cur_anim, request.frame_index, IMB_TC_NONE, IMB_PROXY_NONE);
thumb = MOV_decode_frame(cur_anim, request.frame_index, IMB_TC_NONE, cur_proxy_size);
if (thumb == nullptr && cur_proxy_size != IMB_PROXY_NONE) {
/* Broken proxy file, switch to non-proxy. */
cur_proxy_size = IMB_PROXY_NONE;
thumb = MOV_decode_frame(cur_anim, request.frame_index, IMB_TC_NONE, cur_proxy_size);
}
if (thumb != nullptr) {
seq_imbuf_assign_spaces(job->scene_, thumb);
}
@@ -405,7 +416,7 @@ void ThumbGenerationJob::run_fn(void *customdata, wmJobWorkerStatus *worker_stat
MOV_close(cur_anim);
cur_anim = nullptr;
}
});
}
}
#ifdef DEBUG_PRINT_THUMB_JOB_TIMES