This change switches Cycles to an opensource HIP-RT library which
implements hardware ray-tracing. This library is now used on
both Windows and Linux. While there should be no noticeable changes
on Windows, on Linux this adds support for hardware ray-tracing on
AMD GPUs.
The majority of the change is typical platform code to add new
library to the dependency builder, and a change in the way how
ahead-of-time (AoT) kernels are compiled. There are changes in
Cycles itself, but they are rather straightforward: some APIs
changed in the opensource version of the library.
There are a couple of extra files which are needed for this to
work: hiprt02003_6.1_amd.hipfb and oro_compiled_kernels.hipfb.
There are some assumptions in the HIP-RT library about how they
are available. Currently they follow the same rule as AoT
kernels for oneAPI:
- On Windows they are next to blender.exe
- On Linux they are in the lib/ folder
Performance comparison on Ubuntu 22.04.5:
```
GPU: AMD Radeon PRO W7800
Driver: amdgpu-install_6.1.60103-1_all.deb
main hip-rt
attic 0.1414s 0.0932s
barbershop_interior 0.1563s 0.1258s
bistro 0.2134s 0.1597s
bmw27 0.0119s 0.0099s
classroom 0.1006s 0.0803s
fishy_cat 0.0248s 0.0178s
junkshop 0.0916s 0.0713s
koro 0.0589s 0.0720s
monster 0.0435s 0.0385s
pabellon 0.0543s 0.0391s
sponza 0.0223s 0.0180s
spring 0.1026s 1.5145s
victor 0.1901s 0.1239s
wdas_cloud 0.1153s 0.1125s
```
Co-authored-by: Brecht Van Lommel <brecht@blender.org>
Co-authored-by: Ray Molenkamp <github@lazydodo.com>
Co-authored-by: Sergey Sharybin <sergey@blender.org>
Pull Request: https://projects.blender.org/blender/blender/pulls/121050
Disable dynamic SDL loading as well as disable SDL for release builds.
This was only used for audio output which can already use OpenAL
if there are back-ends not natively supported by Blender.
- Remove extern/sdlew/
- Remove the WITH_SDL_DYNLOAD build option.
- Remove `bpy.app.sdl.available`.
Ref !127554
This gets Windows ARM64 to compile with clang-cl, which gives up to 40% performance improvements in certain scenes rendered with cycles, compared to MSVC.
This is all tested using LLVM 18.1.8 and a VS2022 `vcvarsall` window.
Subsequent PRs with various lib version updates, etc to go in at a later point.
Pull Request: https://projects.blender.org/blender/blender/pulls/124182
This continues the cmake modernization effort and introduces support for
allowing our optional dependencies to integrate properly. TBB is added
here as it's proven troublesome to maintain correctly.
Currently the only Blender project which uses the TBB headers directly
is `blenlib`. However, all downstream projects which require blenlib as
their dependency, and wish to properly make use of its threading
facilities, needed to define various TBB items in their CMake files. Not
only is this unnecessary and arcane, but several projects didn't do this
and ended up not using threading as well as producing ODR violations
along the way[1].
This PR makes TBB a modern dependency and exposes it PUBLIC'ly from
`blenlib`. All downstream projects which depend on blenlib will now
receive everything they require from TBB automatically. This includes
the `WITH_TBB` define, the headers, and the library itself.
[1] blender/blender@05241f47f5
Pull Request: https://projects.blender.org/blender/blender/pulls/124916
Turns out closing AudioUnit handle could interfere with other software.
It is something Apple is investigating, to see if it is API not used
correctly, or whether there is something to be fixed in the Core Audio.
Until then disable the code which closes audio handles. It rolls back
to the situation when the computer might not be able to sleep properly,
but it is how the previous release was, and overall it is less annoying
than causing an interference.
Once the issue is looked into by Apple we will re-iterate over having
both issues (interference and power management) resolved.
Pull Request: https://projects.blender.org/blender/blender/pulls/124400
The issue was caused by an integer overflow in the `FFMPEGReader::seek`, which
lead to different results depending on an exact compiler version and platform.
The overflow happened in the equation which calculates `m_position`, in its
`packet.pts - st_time` part, which is effectively `int64_t - uint64_t` which is
expected to result in `uint64_t` due to the way how upcasting works in C++.
With the values `packet.pts=0` and `st_time=353600` from the test file the
calculation resulted in a negative value, which then wrapped around due to the
unsigned nature of the type, resulting in very big value on arm64. The value
was wrong, but it did not manifest itself as an error, since the seek code
believed it found the place.
However, when built on x86 platform, the result value was very large negative
value. It is possible that the type upcasting went different, due to the
multiplication with a double value later in the formula. And this large negative
value lead to the seeking code to read the entire file, trying to make it so
the `m_position` is not less than the requested position.
Fortunately, the `AVStream::start_time` is `int64_t`, which leads to a simple
fix which avoids signed/unsigned cast, making the seeking code to properly
calculate `m_position` on both arm64 and x64 platforms, regardless of the
compiler.
Pull Request: https://projects.blender.org/blender/blender/pulls/122149
Use COM smart pointer (`ComPtr`) to simplify memory management in WASAPI driver.
This reduces chances of calling `Release()` when a COM object has not been allocated.
Pull Request: https://projects.blender.org/blender/blender/pulls/121828
`xxHash` is a fast non-cryptographic hashing library. It significantly outperforms
md5 which we use in some places currently while also having great collision
resistance if not attacked explicitly.
The library is added to `extern` because that was the easiest way to do it and has
the least impact on others. I expect this library to become a required dependency
instead of an optional one. It's licence is `BSD 2-Clause` which seems to be the
first of its kind in Blender (there is `BSD 3-Clause` a couple of times).
For now, I used the library only for data deduplication when baking geometry nodes
where the same geometry is generated for each frame. The bake time in my test
goes down from >6s to <1s (note that this includes more than just the hashing time).
Pull Request: https://projects.blender.org/blender/blender/pulls/120139
After 91eb50ec2f, we would get random crashes/asserts from the pulseaudio library like:
`Assertion 'e->mainloop->n_enabled_defer_events > 0' failed at ../pulseaudio-17.0/src/pulse/mainloop.c:261, function mainloop_defer_enable(). Aborting.`
It seems like we would run into a race condition if we didn't guard the
pulseaudio flush command with the pulseaudio mutex.
This is probably because the pulseaudio thread would try to read the
buffer for a tiny bit even after pausing the playback.
Sadly the only way to reproduce this is to playback any scene (seem to happen more often if A/V sync is on) and spam play/pause.
Note that I could not reproduce this on every computer I tested this on.
But by expanding the main pulseaudio mutex lock, I can't seem to reproduce this anymore.
So I think that is the correct solution.
Pull Request: https://projects.blender.org/blender/blender/pulls/120072
This patch ports the GLSL SMAA library to the CPU compositor in order to
unify the anti-aliasing behavior between the CPU and GPU compositor.
Additionally, the SMAA texture generator was removed since it is now
unused.
Previously, we used an external C++ library for SMAA anti-aliasing,
which is itself a port of the GLSL SMAA library. However, the code
structure and results of the library were different, which made it quite
difficult to match results between CPU and GPU, hence the decision to
port the library ourselves.
The port was performed through a complete copy of the library to C++,
retaining the same function and variable names, even if they are
different from Blender's naming conversions. The necessary code changes
were done to make it work in C++, including manually doing swizzling
which changes the code structure a bit.
Even after porting the library, there were still major differences
between CPU and GPU, due to different arithmetic precision. To fix this
some of the bilinear samplers used in branches and selections were
carefully changed to use point samplers to avoid discontinuities around
branches, also resulting in a nice performance improvement. Some slight
differences still exist due to different bilinear interpolation, but
they shall be looked into later once we have a baseline implementation.
The new implementation is slower than the existing implementation, most
likely due to the liberal use of bilinear interpolation, since it is
quite cheap on GPUs and the code even does more work to use bilinear
interpolation to avoid multiple texture fetches, except this causes a
slow down on CPUs. Some of those were alleviated as mentioned in the
previous section, but we can probably look into optimizing it further.
Pull Request: https://projects.blender.org/blender/blender/pulls/119414
This file is generated by the attribution builder for every release
and it's up to the release team to check the log for extern and others to make sure this is up to date.
* Only works on machines with a Qualcomm Snapdragon 8cx Gen3 or above.
Older generation devices are not and will not be supported due to
some driver issues
* Requires VS2022 for building.
* Uses new MSVC preprocessor for sse2neon compatibility.
* SIMD is not enabled, waiting on conversion of blenlib to C++.
Ref #119126
Pull Request: https://projects.blender.org/blender/blender/pulls/117036
New ("fullframe") CPU compositor backend is being used now, and all the code
related to "tiled" CPU compositor is just never used anymore. The new backend
is faster, uses less memory, better matches GPU compositor, etc.
TL;DR: 20 thousand lines of code gone.
This commit:
- Removes various bits and pieces related to "tiled" compositor (execution
groups, one-pixel-at-a-time node processing, read/write buffer operations
related to node execution groups).
- "GPU" (OpenCL) execution device, that was only used by several nodes of
the tiled compositor.
- With that, remove CLEW external library too, since nothing within Blender
uses OpenCL directly anymore.
Pull Request: https://projects.blender.org/blender/blender/pulls/118819
Chrome would only drop into windows supporting v5 of the XDND spec,
resolve by bumping the version and no other changes.
From looking into the changes between v3 & v5 they mainly relate to
dragging files between windows & dragging onto the root window.
Since Blender does neither, bump the version to allow dragging
links from google-chrome.
Update from version 4.65 (2009) to latest 23.01 (2023).
LZMA is only used for pointcache "Heavy" compression option. With
updated SDK, testing in a 100k particle system point cache
(Windows/Ryzen5950X), average times to read or write a cached frame:
- Read a bit faster, 24.3 -> 23.2ms
- Write/bake about 2x faster, 237 -> 103ms
Pull Request: https://projects.blender.org/blender/blender/pulls/118433
Previously in Audaspace there was choice between linear resampler (okay
for preview, but not great for final mix), or "extremely high quality
preset" for rendering the final mix; with nothing in between. I have just
landed "medium" and "low" resampler quality levels in upstream Audaspace
(see https://github.com/audaspace/audaspace/pull/18 with details and
quality spectograms, also comparison with Audacity resampler).
This PR updates Audaspace to latest upstream, and switches to use the newly
added "medium" quality resampler. There's no audible difference (nor visible
one in spectrograms), as far as I can tell.
Timings, rendering out frames 1000-3000 of Sprite Fright Edit blender
studio data set:
- Windows (Ryzen 5950X, VS2022): 92 -> 73 sec
- Mac (M1 Max, clang 15): 70 -> 62 sec
i.e. using a faster audio resampler makes the _whole render process_ be
10-20% faster (however, this from VSE where it combines already pre-rendered
image strips).
Pull Request: https://projects.blender.org/blender/blender/pulls/116059
ROCm 6 brings some changes to the HIP API. This pull request is meant to be
backward and forward compatible.
That is Blender could be compiled with either ROCM 6 or 5 and run on either.
The main change is the hipMemoryType enum, which we check based on the
runtime version to use the correct enum values.
Without this, HIP will not work on Windows with upcoming 23.40 driver.
Pull Request: https://projects.blender.org/blender/blender/pulls/116713
* Different fix for Mantaflow linker warnings that works with new OpenVDB.
* Use new linker for arm64 as it no longer produces warnings with latest
Xcode. Still use the old one for x86_64 as some warnings remain.
* Fix wrong x86_64 build target in deps builder.
For the upcoming 4.1 libraries.
Ref #113157
Pull Request: https://projects.blender.org/blender/blender/pulls/116708
NDEBUG is part of the C standard and disables asserts. Only this will
now be used to decide if asserts are enabled.
DEBUG was a Blender specific define, that has now been removed.
_DEBUG is a Visual Studio define for builds in Debug configuration.
Blender defines this for all platforms. This is still used in a few
places in the draw code, and in external libraries Bullet and Mantaflow.
Pull Request: https://projects.blender.org/blender/blender/pulls/115774
The last good commit was 8474716abb.
After this commits from main were pushed to blender-v4.0-release. These are
being reverted.
Commits a4880576dc from to b26f176d1a that happend afterwards were meant for
4.0, and their contents is preserved.