By adding NSURLBookmarkResolutionWithoutMounting to options to keep MacOS from attempting to mount shortcuts.
This addresses that blender hangs and crashes if there's a shortcut to a network drive that cannot be mounted at that time.
Pull Request: https://projects.blender.org/blender/blender/pulls/121673
All official Blender platforms use the SIMD code path, with pow() approximations
for 2.4 and 1/2.4 powers. The non-SIMD code path would only be used on other
platforms like PowerPC etc. Make that fallback scalar code path use the same
math approximation for consistency. This is part of #121312.
This also makes srgb_to_linearrgb_v3_v3 and linearrgb_to_srgb_v3_v3 functions
non-inlined. They are 50-100 CPU instructions, and thus hardly good candidates
for forced inlining into each and every call site.
Also _bli_math_blend_sse now uses actual SSE4 blend instruction instead of doing
it in a roundabout way.
Pull Request: https://projects.blender.org/blender/blender/pulls/121368
This is because sse2neon.h might be used to emulate SSE intrinsics
on ARM64 architecture, and it uses some preprocessor which is not
available for C language when using MSVC.
The old-style math file math_matrix.c uses this header, so needed
to become C++. Simple rename did not work since there is a new math
utility math_matrix.cc exists. Following some existing convention
the math_matrix.c is renamed to math_matrix_c.cc. Eventually all the
code should switch to use C++ style math, and the C style removed,
so it seems reasonable to not mix old and new style of API in the
same file.
There should be no functional changes.
Pull Request: https://projects.blender.org/blender/blender/pulls/121335
This integrates the functionality for `parallel_for_weighted` from 9a3ceb79de
into `parallel_for`. This reduces the number of entry points to the threading
API and also makes it easier to build higher level threading primitives. For
example, `IndexMask.foreach_*` may use `parallel_for` if a `GrainSize` is
provided, but can't use `parallel_for_weighted` easily without duplicating a
fair amount of code.
The default behavior of `parallel_for` does not change. However, now one can
optionally pass in `TaskSizeHints` as the last parameter. This can be used to
specify the size of individual tasks relative to each other and relative to the
grain size. This helps scheduling more equally sized tasks which generally
improves performance because threads are used more effectively.
One generally does not construct `TaskSizeHints` manually, but calls either
`threading::individual_task_sizes` or `threading::accumulated_task_sizes`. Both
allow specifying individual task sizes, but the latter should be used when the
combined size of consecutive tasks can be computed in O(1) time. This allows
splitting up the work more efficiently. It can often be used in conjunction with
`OffsetIndices`.
Pull Request: https://projects.blender.org/blender/blender/pulls/121127
Support for building blender with clang on windows on x64 was added
years ago but given there are no active users support has crumbled a
bit.
This PR brings the build system back into working order but upstream
patches in openVDB are still required for a successful build see PR
#120317 for details.
Blender when build with clang the classroom scenes rendered on the cpu
with cycles is seeing a 5% reduction in render time on both an
AMD 7700x and an Intel 14900k.
Factors out utility function "does the path contain any hidden file/folder
components?" function out of innards of UI code (edit_file,
is_hidden_dot_filename) into a BLI function BLI_path_has_hidden_component
and then:
- Adds unit test coverage to it, which uncovered some inconsistencies
- Fix the behavior inconsistencies:
- A path component that is just a dot (.), was not considered hidden.
Unless it was the first folder component (now this is fixed).
- A path component that ended in a tilde (~) was considered hidden.
Unless it was the first folder component (now this is fixed).
- Speedup the function by not doing several recursive scans over the
string; instead all the logic is done in a single string scan.
Synthetic HasHiddenComponents_Performance test: 6.0s -> 1.1s for 50M calls
on Mac M1 Max.
More real world test (setup as in #120494): out of whole build_catalog_tree
time, the time taken by BLI_path_has_hidden_component drops from 37% down
to 28%
Pull Request: https://projects.blender.org/blender/blender/pulls/120541
Commit 8b9743eb40 already made Blender be compiled with SSE4.2 flags
on x64 architecture, which kicked in the SSE4 code paths in
BLI_math_interp functions.
Which made them faster, e.g. in VSE on Windows/Ryzen5950X, scaling
up an image to 4K resolution:
- Bilinear 5.8ms -> 5.3ms
- Cubic Mitchell 16.3ms -> 15.7ms
This change removes the now-unneeded SSE pre-SSE4 code paths for
_mm_floor_ps, _mm_min_epi32 and _mm_max_epi32 emulation.
Additionally, including BLI_simd.h on SSE4 platform now includes
the necessary SSE4 intrinsics header.
Pull Request: https://projects.blender.org/blender/blender/pulls/120583
The problem was that `XXH3_128bits` was called on `len` bytes
and not `HashSizeInBytes + len` as before 51f8bf53b2.
This lead to more compute context duplicates that one would expect.
I changed the code a little bit to make this mistake less likely in case
the hash function is ever changed to something else.
The `IndexMask` class already had a static function `from_union`.
This adds two new functions `from_difference` and `from_intersection`
as well as tests for each of them.
It also uses `from_intersection` in two grease pencil utility functions.
Pull Request: https://projects.blender.org/blender/blender/pulls/120419
Previously, md5 was used which is significantly slower. In almost all cases
this does not have a significant performance impact in practice. However,
it's possible to build geometry nodes setups that become a few percent
faster ( by combining lots of cheap node groups). Using xxhash instead of
md5 should never be slower.
Pull Request: https://projects.blender.org/blender/blender/pulls/120225
As in the Linux case, it seems like the atomic rename doesn't work on all file systems on Mac either.
We did test on Windows and it seems like there is a built in fallback, so we don't need to do this there.
Pull Request: https://projects.blender.org/blender/blender/pulls/120037
Previously the hulls edges were simply iterated over causing the
rotating calipers to step over points 4x as many times as is needed.
Avoid this by adding angle stepping logic that maps all angles to a
single quadrant, reducing the checks needed to advance the calipers
to each new angle. This gives ~1.4x speedup to AABB fitting logic.
Also add a test for octagon shapes to ensure axis aligned edges work
as expected.
Begin testing the edge edge between indices [0, 1] indices,
instead of [last, 0]. This only ever makes a difference as a tie breaker,
where [0, 1] is now prioritized.
This minor change simplifies further optimizations.
The Perlin noise algorithms suffer from precision issues when a coordinate
is greater than about 250000.
To fix this the Perlin noise texture is repeated every 100000 on each axis.
This causes discontinuities every 100000, however at such scales this
usually shouldn't be noticeable.
Pull Request: https://projects.blender.org/blender/blender/pulls/119884