Asserts triggered e.g. by opening Gold production files (like
`pro/shots/220_storm/220_0020/220_0020-anim.blend`). Their root cause
are zero tangent vectors.
The asserts initially came from unormalized normals, but the root issue
is actually using zero vector as axis in calls to
`math::rotate_direction_around_axis`.
While rotating a zero direction vector is possible (though useless),
rotating around a zero axis vector makes no sense?
So this commit adds an assert that the given axis is non-zero in
`rotate_direction_around_axis`. And 'fixes' the found cases triggering
such assert by skipping rotation when the axis (tangent) is null.
Another related issue fixed by this commit is the iterative process in
calls to `calculate_next_normal`, which can accumulate small floating
point errors over time, leading to generating not normalized-enough
normals at some point.
Pull Request: https://projects.blender.org/blender/blender/pulls/122441
Unlike to `lookup_or_default` accessor methods of `Map` or attribute provider class,
`Span::get` is not so explicit and self described to be used with default value.
Other one issue was is that result is by value. But this is not the main reason to
delete this method. And although this can be fixed by reference, this is still not
such good to just have method to check index and return something.
Pull Request: https://projects.blender.org/blender/blender/pulls/122425
One of the properties of Perlin noise is that it always evaluates to 0.0
when not normalized (or 0.5 when normalized) when the input consists of
only whole integers in all vector components.
Blender's Perlin noise implementation uses single precision floats with
a machine epsilon of 1.19e-07 meaning that for numbers that are greater
than 1/(1.19e-07) = 8.40e6 there mantissa doesn't have any bits left to
store a rational part of the number, effectively meaning that any number
greater than 8.40e6 is a whole integer as far as Blender is concerned.
Therefore when evaluating Perlin noise for any coordinates greater than
that it always results in 0.0 (or 0.5 when normalized).
This fix works as follows: If the original input number is larger than
1.0e6 it is offset by 0.5 after it underwent modulo, which always outputs
numbers in a [0.0, 1.0e5) range leaving the mantissa room for a rational
part. This way the quantization error still persists however the outputs
are random again instead of a constant 0.0.
Pull Request: https://projects.blender.org/blender/blender/pulls/122112
This PR adds one more stat to `ScopedTimerAveraged` for quick timing checks:
the total number of samples.
Sample output:
```
Timer 'vert_hide_update': (Average: 45.93 ms, Min: 45.93 ms, Last: 45.93 ms, Samples: 1)
```
Pull Request: https://projects.blender.org/blender/blender/pulls/121638
This patches optimizes the Fog Glow Glare node to be about 25x faster
for 4K images. This is mainly achieved by utilizing the FFTW library and
multi-threading support code. Further improvements are still possible by
caching kernels, but the CPU compositor does not support caching yet.
The old Hartley transform was removed, so the node no longer works when
FFTW is disabled as a build time option, much like the OIDN node. A new
BLI library was introduced for FFTW, it includes some helper routines
relevant for FFTW as well as an initialization routine that sets up
multithreading using TBB as well as thread safety.
Build system support for threaded FFTW was also added, which defines the
relevant variables to detect threading support as well as add the
relevant libraries.
We do not currently have the threaded FFTW libs in our precompiled libs,
so the threading code is disabled until the libs lands in the coming
weeks. So currently, the code is only about 9x faster.
The only functional change is that the kernel is now odd sized, which
should produce more accurate results, but the final result is almost
identical and mostly undetectable.
The plan is to port this to the GPU as well similar to how we implement
OIDN until we have a GPU FFT implementation. GPU compositor can also do
caching, so it should be faster, being able to compute a 4K image in
under half a second.
Pull Request: https://projects.blender.org/blender/blender/pulls/121653
By adding NSURLBookmarkResolutionWithoutMounting to options to keep MacOS from attempting to mount shortcuts.
This addresses that blender hangs and crashes if there's a shortcut to a network drive that cannot be mounted at that time.
Pull Request: https://projects.blender.org/blender/blender/pulls/121673
Cleanup to avoid unnecessary copies of VArray. This
requires ref-qualifier overloads of dereference operator
of attribute reader and some move operators and constructor
overloads in the code.
Pull Request: https://projects.blender.org/blender/blender/pulls/118437
All official Blender platforms use the SIMD code path, with pow() approximations
for 2.4 and 1/2.4 powers. The non-SIMD code path would only be used on other
platforms like PowerPC etc. Make that fallback scalar code path use the same
math approximation for consistency. This is part of #121312.
This also makes srgb_to_linearrgb_v3_v3 and linearrgb_to_srgb_v3_v3 functions
non-inlined. They are 50-100 CPU instructions, and thus hardly good candidates
for forced inlining into each and every call site.
Also _bli_math_blend_sse now uses actual SSE4 blend instruction instead of doing
it in a roundabout way.
Pull Request: https://projects.blender.org/blender/blender/pulls/121368
This should get all of the tests to pass on Windows ARM64 platforms.
Sadly it needs disabling for hydra/USD stuff as currently it doesn't play nicely with the new preprocessor. @LazyDodo suggested a USD version update may fix this, which is something I can investigate in due course - right now, let's get daily builds up and working :)
Pull Request: https://projects.blender.org/blender/blender/pulls/121361
This is because sse2neon.h might be used to emulate SSE intrinsics
on ARM64 architecture, and it uses some preprocessor which is not
available for C language when using MSVC.
The old-style math file math_matrix.c uses this header, so needed
to become C++. Simple rename did not work since there is a new math
utility math_matrix.cc exists. Following some existing convention
the math_matrix.c is renamed to math_matrix_c.cc. Eventually all the
code should switch to use C++ style math, and the C style removed,
so it seems reasonable to not mix old and new style of API in the
same file.
There should be no functional changes.
Pull Request: https://projects.blender.org/blender/blender/pulls/121335
This integrates the functionality for `parallel_for_weighted` from 9a3ceb79de
into `parallel_for`. This reduces the number of entry points to the threading
API and also makes it easier to build higher level threading primitives. For
example, `IndexMask.foreach_*` may use `parallel_for` if a `GrainSize` is
provided, but can't use `parallel_for_weighted` easily without duplicating a
fair amount of code.
The default behavior of `parallel_for` does not change. However, now one can
optionally pass in `TaskSizeHints` as the last parameter. This can be used to
specify the size of individual tasks relative to each other and relative to the
grain size. This helps scheduling more equally sized tasks which generally
improves performance because threads are used more effectively.
One generally does not construct `TaskSizeHints` manually, but calls either
`threading::individual_task_sizes` or `threading::accumulated_task_sizes`. Both
allow specifying individual task sizes, but the latter should be used when the
combined size of consecutive tasks can be computed in O(1) time. This allows
splitting up the work more efficiently. It can often be used in conjunction with
`OffsetIndices`.
Pull Request: https://projects.blender.org/blender/blender/pulls/121127
This PR inverts the return values of `indexed_data_equal` to make the
function return as one would expect (i.e. if everything is equal then it
returns true, if any are not equal it returns false)
Pull Request: https://projects.blender.org/blender/blender/pulls/121056
Support for building blender with clang on windows on x64 was added
years ago but given there are no active users support has crumbled a
bit.
This PR brings the build system back into working order but upstream
patches in openVDB are still required for a successful build see PR
#120317 for details.
Blender when build with clang the classroom scenes rendered on the cpu
with cycles is seeing a 5% reduction in render time on both an
AMD 7700x and an Intel 14900k.
Factors out utility function "does the path contain any hidden file/folder
components?" function out of innards of UI code (edit_file,
is_hidden_dot_filename) into a BLI function BLI_path_has_hidden_component
and then:
- Adds unit test coverage to it, which uncovered some inconsistencies
- Fix the behavior inconsistencies:
- A path component that is just a dot (.), was not considered hidden.
Unless it was the first folder component (now this is fixed).
- A path component that ended in a tilde (~) was considered hidden.
Unless it was the first folder component (now this is fixed).
- Speedup the function by not doing several recursive scans over the
string; instead all the logic is done in a single string scan.
Synthetic HasHiddenComponents_Performance test: 6.0s -> 1.1s for 50M calls
on Mac M1 Max.
More real world test (setup as in #120494): out of whole build_catalog_tree
time, the time taken by BLI_path_has_hidden_component drops from 37% down
to 28%
Pull Request: https://projects.blender.org/blender/blender/pulls/120541