Briefly about this change:
- OpenColorIO C-API is removed.
- The information about color spaces in ImBuf module is removed.
It was stored in global ListBase in colormanagement.cc.
- Both the OpenColorIO and fallback implementations support GPU drawing.
- Fallback implementation supports white point, RGB curves, etc.
- Removed check for support of GPU drawing in IMB.
Historically it was implemented in a separate library with a C-API,
because way back C++ code needed to stay in intern. This causes all
sorts of overhead, and even calls that are, strictly speaking,
bad-level calls.
This change moves OpenColorIO integration into a module within imbuf,
next to movie, and next to IMB_colormanagement which is the main user
of it. This avoids keeping copies of color spaces, displays, views etc
in the ImBuf: they were used to help quickly query information to
be shown in the interface. With this change that information can be
stored in the same data structures as those used by the OpenColorIO
integration. While this might not fully avoid duplication, there is now
less of it, and user code no longer needs to maintain the copies.
In a lot of cases this change also avoids allocations done per access
to OpenColorIO. For example, it is no longer necessary to allocate
an image descriptor on the heap.
The bigger user-visible change is that the fallback implementation now
supports GLSL drawing, with the whole list of supported features, such
as curve mapping and white point. This should help simplify code
which relies on color space conversion on the GPU: there is no need to
figure out a fallback solution in such cases. The only case when drawing
will not work is when there is an actual bug or driver issue and the
shader has failed to compile.
The change avoids having an opaque type for color space, and instead
uses forward declaration. It is a bit verbose on declaration, but helps
avoid unsafe type-casts. There are ways to solve this in the future,
like having a header for forward declarations, or flattening the
namespace a bit.
There should be no user-level changes under normal operation.
When building without OpenColorIO, or when the configuration has a typo
or is missing, a fuller set of color management tools is applied (such
as the white point correction).
Pull Request: https://projects.blender.org/blender/blender/pulls/138433
Add a new shader node to control volume coefficients (scattering,
absorption and emission) directly, making it easier to model existing
volumes with measured data.
Pull Request: https://projects.blender.org/blender/blender/pulls/136287
Part of #136993.
Share as much of the ShaderCompiler implementations as possible.
Remove the ShaderCompiler/ShaderCompilerGeneric split and make most of
its functions non virtual.
Move the `get_compiler` function from `Context` to `GPUBackend` and
creation/deletion to `GPUBackend::init/delete_resources`.
Add a `batch_cancel` function to `ShaderCompiler` (needed for the
GPUPass refactor).
As a nice extra, the multithreaded OpenGL compilation has become faster
too.
The barbershop materials + EEVEE static shaders have gone from 27s to
22s.
I have not observed any performance difference on Vulkan or Metal.
Pull Request: https://projects.blender.org/blender/blender/pulls/136676
Multiple threads would be setting the globals
`g_shader_builtin_srgb_transform` and
`g_shader_builtin_srgb_is_dirty`.
These are used for color management inside the builtin
shaders. But the render thread could modify these
values even if its shaders have no use for them.
The fix is to move these globals to the `gpu::Context`
class. This way we remove the race condition.
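A minimal sketch of the idea, assuming a reduced stand-in for `gpu::Context`: the struct layout, the `std::array` type of the transform, and the `set_srgb_transform` helper are illustrative guesses, not the actual Blender declarations.

```cpp
#include <array>

/* Hypothetical, reduced stand-in for `gpu::Context`. */
struct Context {
  /* Previously the global `g_shader_builtin_srgb_transform` (type assumed). */
  std::array<float, 4> shader_builtin_srgb_transform = {0.0f, 0.0f, 0.0f, 0.0f};
  /* Previously the global `g_shader_builtin_srgb_is_dirty`. */
  bool shader_builtin_srgb_is_dirty = false;
};

/* Writes only touch the context passed in, so a render thread with its own
 * context cannot clobber the state used by another thread's context. */
void set_srgb_transform(Context &context, const std::array<float, 4> &transform)
{
  if (context.shader_builtin_srgb_transform != transform) {
    context.shader_builtin_srgb_transform = transform;
    context.shader_builtin_srgb_is_dirty = true;
  }
}
```

With one context per thread, neither field is shared any more, which is why no extra locking is needed in this sketch.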
Allows basic support for using `namespace X {}` and `X::symbol`
syntax.
Benefit:
- More sharing possible with host C++ code.
- Isolation of symbols when including shader files as C++.
Requirements:
- Nesting must be done using `namespace A::B{}` rather than
`namespace A{ namespace B {}}`, which is unsupported.
- No support for `using namespace`.
- Support of `using X` and `using X = Y` inside of function scope.
- Support of `using X` and `using X = Y` inside of namespace scope.
However, this is only to bring symbols from the same namespace
declared in another block (potentially inside another file).
- Only support namespace elision for symbols defined and used
inside of the same namespace scope.
Note that this is currently limited to blender GLSL files and
not for the shared headers. This is because we need to port a lot
of code to use namespaces before allowing this.
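A small sketch of code that stays within the rules above. The namespace and function names are made up for illustration; the snippet is plain C++, which is also what makes it shareable with host code.

```cpp
/* Nested namespaces use the compact `A::B` form. */
namespace shading::utils {

float square(float x)
{
  return x * x;
}

/* Elision is fine here: `square` is defined and used in the same namespace scope. */
float quad(float x)
{
  return square(square(x));
}

}  // namespace shading::utils

float pow8(float x)
{
  /* Outside of the namespace, symbols are spelled with their full path. */
  return shading::utils::square(shading::utils::quad(x));
}

float pow4(float x)
{
  /* `using X` inside of function scope. */
  using shading::utils::quad;
  return quad(x);
}
```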
### Follow Up:
Nesting like `namespace A{ namespace B {}}` shouldn't be hard to
support and could be added if needed.
Rel #137446
Pull Request: https://projects.blender.org/blender/blender/pulls/137445
This patch adds a new `BLI_mutex.hh` header which adds `blender::Mutex` as an alias
for either `tbb::mutex` or `std::mutex` depending on whether TBB is enabled.
Description copied from the patch:
```
/**
* blender::Mutex should be used as the default mutex in Blender. It implements a subset of the API
* of std::mutex but has overall better guaranteed properties. It can be used with RAII helpers
* like std::lock_guard. However, it is not compatible with e.g. std::condition_variable. So one
* still has to use std::mutex for that case.
*
* The mutex provided by TBB has these properties:
* - It's as fast as a spin-lock in the non-contended case, i.e. when no other thread is trying to
* lock the mutex at the same time.
* - In the contended case, it spins a couple of times but then blocks to avoid draining system
* resources by spinning for a long time.
* - It's only 1 byte large, compared to e.g. 40 bytes when using the std::mutex of GCC. This makes
* it more feasible to have many smaller mutexes which can improve scalability of algorithms
* compared to using fewer larger mutexes. Also it just reduces "memory slop" across Blender.
* - It is *not* a fair mutex, i.e. it's not guaranteed that a thread will ever be able to lock the
* mutex when there are always more than one threads that try to lock it. In the majority of
* cases, using a fair mutex just causes extra overhead without any benefit. std::mutex is not
* guaranteed to be fair either.
*/
```
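A usage sketch with an RAII helper, as the description suggests. To keep it self-contained, the `sketch::Mutex` alias below stands in for `blender::Mutex` in the non-TBB configuration; the `Counter` class is hypothetical.

```cpp
#include <mutex>

namespace sketch {

/* Stand-in for `blender::Mutex` when TBB is disabled (assumption for this example). */
using Mutex = std::mutex;

class Counter {
  Mutex mutex_;
  int value_ = 0;

 public:
  void increment()
  {
    /* The lock is released automatically when `lock` goes out of scope. */
    std::lock_guard<Mutex> lock(mutex_);
    value_++;
  }

  int value()
  {
    std::lock_guard<Mutex> lock(mutex_);
    return value_;
  }
};

}  // namespace sketch
```

Code that needs `std::condition_variable` would keep using `std::mutex` directly, as noted in the description.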
The performance benchmark suggests that the impact is negligible in almost
all cases. The only benchmarks that show interesting behavior are the ones
testing foreach zones in Geometry Nodes. These tests are explicitly testing
overhead, which I still have to reduce over time. So it's not unexpected that
changing the mutex has an impact there. What's interesting is that on macOS the
performance improves a lot while on Linux it gets worse. Since that overhead
should eventually be removed almost entirely, I don't really consider that
blocking.
Links:
* Documentation of different mutex flavors in TBB:
https://www.intel.com/content/www/us/en/docs/onetbb/developer-guide-api-reference/2021-12/mutex-flavors.html
* Older implementation of a similar mutex by me:
https://archive.blender.org/developer/differential/0016/0016711/index.html
* Interesting read regarding how a mutex can be this small:
https://webkit.org/blog/6161/locking-in-webkit/
Pull Request: https://projects.blender.org/blender/blender/pulls/138370
This avoids manual code duplication and readability issues.
This is implemented as a simple copy-paste of the function
for each argument count, with each overload calling the overload
that takes one more argument.
A `#line` directive is added to each line to make sure errors
still make sense and refer to the original line.
Example:
```cpp
int func(int a, int b = 0, const int2 c = int2(1, 0))
{
/* ... */
}
```
Gets expanded to:
```cpp
int func(int a, int b, const int2 c)
{
/* ... */
}
int func(int a, int b)
{
return func(a, b, int2(1, 0));
}
int func(int a)
{
return func(a, 0);
}
```
Rel #137446
Pull Request: https://projects.blender.org/blender/blender/pulls/138254
These functions are trivial and shouldn't add the cost of a call.
They appeared in profiles, which they shouldn't since they mostly
just return access to member variables. Inlining them reduces
the backend's overhead when sculpting.
Also reserve a Vector before repeated appending.
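As an illustration of the reserve-before-append pattern, here is a sketch using `std::vector` as a stand-in for Blender's `Vector`; the function itself is made up.

```cpp
#include <vector>

/* Collect the even indices in [0, count). The final size is known up front,
 * so a single reservation avoids repeated reallocation while appending. */
std::vector<int> collect_even_indices(int count)
{
  std::vector<int> result;
  if (count > 0) {
    result.reserve(static_cast<std::vector<int>::size_type>((count + 1) / 2));
  }
  for (int i = 0; i < count; i += 2) {
    result.push_back(i);
  }
  return result;
}
```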
Pull Request: https://projects.blender.org/blender/blender/pulls/138349
Implementation of #137341
This adds support for using references to any variable in a local scope
inside the shader codebase.
Example:
```cpp
int a = 0;
int &b = a;
b++; /* a == 1 */
```
Using `auto` is supported for reference definition as the type is not
preserved by the copy paste procedure. Type checking is done by the
C++ shader compilation or after the copy paste procedure during shader
compilation. `auto` is still unsupported for other variable declarations.
References to opaque types (`image`, `sampler`) are supported since
they are never really assigned to a temp variable.
This implements all safety features required by the implementation,
which copy-pastes the definition string. That is:
- No `--`, `++` operators.
- No function calls.
- Array subscript index needs to be int constants or constant variable.
The copy pasting does not replace member access:
`auto &a = b; a.a = c;` becomes `b.a = c;`
The copy pasting does not replace function calls:
`auto &a = b; a = a();` becomes `b = a();`
While limited, this already allows for nicer syntax (aliasing) for
accessing SSBOs and avoids the potential overhead of copy semantics:
```cpp
ViewMatrices matrices = drw_view_buf[0];
matrices.viewmat = float4x4(1);
drw_view_buf[0] = matrices;
```
Can now be written as:
```cpp
ViewMatrices &matrices = drw_view_buf[0];
matrices.viewmat = float4x4(1);
```
Which expands to:
```cpp
drw_view_buf[0].viewmat = float4x4(1);
```
Note that the reference semantic is not carried through function calls
because arguments are transformed to `inout` in GLSL. `inout` has
copy semantics, although some implementations implement it as a
reference.
Another important note is that this copy-pasting doesn't check if a
symbol is a variable. It can match a typename. But given that our
typenames use a different capitalization style, this is unlikely to be
an issue. If that issue arises, we can add a check for it.
Rel #137446
Pull Request: https://projects.blender.org/blender/blender/pulls/138412
Allows basic usage of templated functions.
There is no support for templated structs.
Benefit:
- More readable than macros in shader sources.
- Compatible with C++ tools.
- More sharing possible with host C++ code.
Requirements/Limitations:
- No default arguments to template parameters.
- Must use explicit instantiation for all variants needed.
- Explicit instantiation needs to **not** use argument deduction.
- Calls to templates need to have all template arguments explicit
or all implicit.
- Template overload is not supported (redefining the same template
with different template argument or function argument types).
Currently implemented as macros inside the build-time pre-processor,
but that could change to copy-paste to allow better error reporting.
However, the macros keep the shader code reduced in the final binary
and allow different files to declare different instantiations.
The implementation is done by declaring overloads for each explicit
instantiation.
If a template has arguments not present in the function
arguments, then all argument **values** are appended to the
function name. The explicit template callsite is then modified to use
`TEMPLATE_GLUE` which will call the correct function. This is
why template argument deduction is not supported in this case.
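A sketch of what shader code following these rules could look like. The function names are invented for illustration, and the snippet is plain C++, so it says nothing about how the macros expand internally.

```cpp
/* Template parameter also appears in the function arguments. */
template<typename T> T square(T x)
{
  return x * x;
}

/* Explicit instantiations, spelled without relying on argument deduction. */
template float square<float>(float);
template int square<int>(int);

/* Template parameter that does not appear in the function arguments.
 * Per the description above, calls to this one must be fully explicit. */
template<int Scale> int scale_by(int x)
{
  return x * Scale;
}

template int scale_by<2>(int);
template int scale_by<4>(int);

int example(int x)
{
  /* All template arguments explicit at both call sites. */
  return square<int>(x) + scale_by<4>(x);
}
```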
Rel #137446
Pull Request: https://projects.blender.org/blender/blender/pulls/137441
Guarding expensive regex computation by much
cheaper checks to reduce compilation time.
Compiling `time ninja -j 1 bf_draw_shaders`
On MacOS M1 Max (debug build with glsl_preprocess optimization turned on):
Before 13.01 sec
After 9.08 sec
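The pattern can be sketched as follows. The symbol names and the regex are hypothetical and not taken from the preprocessor; the point is only that a cheap substring scan skips the regex entirely for sources that cannot match.

```cpp
#include <regex>
#include <string>

/* Rename whole-word occurrences of `old_name` to `new_name`. */
std::string rename_symbol(const std::string &source)
{
  /* Cheap guard: most inputs never contain the token, so skip the regex. */
  if (source.find("old_name") == std::string::npos) {
    return source;
  }
  /* Constructed once, on the first call that passes the guard. */
  static const std::regex pattern(R"(\bold_name\b)");
  return std::regex_replace(source, pattern, "new_name");
}
```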
Pull Request: https://projects.blender.org/blender/blender/pulls/138336
This adds basic unrolling support for 2 syntaxes:
- `[[gpu::unroll]]` which does full loop unrolling
- `[[gpu::unroll(x)]]` which unrolls `x` iterations
Nesting is supported.
This change is motivated by the added cost in compilation
and execution time that some loops have even if they have
compile time defined iteration counts.
The syntax is inspired by `GL_EXT_control_flow_attributes`.
However, we might want to have our own prefix to show it is
a blender specific feature and that it differs from the standard.
I propose `[[gpu::unroll]]`.
In the future, we could extend this to support more directives that
can be expanded to backend specific extension / syntax. This would
avoid readability issues and error-prone copy-pasting of large amounts
of preprocessor directives.
Currently, given that GL's GLSL flavor doesn't support
any of these attributes, the preprocessor does some copy-pasting
that does the unrolling at the source level. Note that the added
`#line` directives allow for correct error logging.
For the `[[gpu::unroll]]` syntax, the `for` declaration
needs to follow a specific syntax to deduce the number
of loop iterations.
This variant removes the continue condition between iterations,
so all iterations are evaluated. This could be modified
using a special keyword.
For the `[[gpu::unroll(n)]]` syntax, the user code needs
to make sure that `n` is large enough to cover all iterations
as the loop is completely removed.
We could add shader `assert` to make sure that there is
never a remaining iteration.
This behavior is different from what you usually see in other
implementations, as we do not keep a loop at all. Usually, compilers
still keep the loop if it is not unrolled fully. But given we don't
have IR, this is the best we can do.
`break` and `continue` statements are forbidden at the unrolled loop
scope level. Nested loops and switches can contain these keywords.
This is accounted for by checks in the pre-processor.
Only `for` loops are supported for now. There is no real
incentive to add support for `while` given how rare it is
in the shader codebase.
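To make the source-level unrolling concrete, here is a sketch in plain C++. The annotated loop is shown in a comment since the attribute is handled by the shader preprocessor; the exact shape of the generated code (one scoped copy of the body per iteration) is an assumption, and the `#line` directives are omitted.

```cpp
/* Shader-source form:
 *
 *   [[gpu::unroll]] for (int i = 0; i < 4; i++) {
 *     sum += weights[i];
 *   }
 */
float sum_loop(const float weights[4])
{
  float sum = 0.0f;
  for (int i = 0; i < 4; i++) {
    sum += weights[i];
  }
  return sum;
}

/* Assumed expansion: the body is pasted once per iteration, each copy in its
 * own scope with its own `i`, and no loop or continue condition remains. */
float sum_unrolled(const float weights[4])
{
  float sum = 0.0f;
  { int i = 0; sum += weights[i]; }
  { int i = 1; sum += weights[i]; }
  { int i = 2; sum += weights[i]; }
  { int i = 3; sum += weights[i]; }
  return sum;
}
```

Because no loop remains, a `break` or `continue` directly in the body would have nothing to refer to, which matches the restriction described above.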
Rel #137446
Pull Request: https://projects.blender.org/blender/blender/pulls/137444
This avoids having to guard functions that are
only available in the fragment shader stage.
Calling the function inside another stage is still
invalid and will yield a compile error on Metal.
The Vulkan and OpenGL GLSL patches need to be modified
per stage to allow the fragment specific functions
to be defined.
This is not yet widely used, but a good example is
the change in `film_display_depth_amend`.
Rel #137261
Pull Request: https://projects.blender.org/blender/blender/pulls/138280
Enable optimizations on Debug builds for executables used for the build
process (datatoc and glsl_preprocess) and enable unity builds for
shader preprocessing targets.
Debug: From 28.9s to 5.7s
Release: From 4.9s to 3.5s
Pull Request: https://projects.blender.org/blender/blender/pulls/138274
Implemented the VKSamplers based on the documentation in GPU_texture.hh.
They used to be reverse engineered from the OpenGL backend. However,
Vulkan has better control over mipmap filtering, whereas OpenGL needed
to fold some of that flexibility into the min/mag filtering.
Changes:
- Mipmap filtering was always on, not only when `GPU_SAMPLER_FILTERING_MIPMAP`
is used.
- Mipmap filter mode is always set to linear. This is not according to the
documentation, but matches OpenGL and the render tests.
Fixes #136439, #138111
Partial fix #137436 - seam isn't there, inf can still be part of the
texture.
Pull Request: https://projects.blender.org/blender/blender/pulls/138313
Update descriptor pool sizes to be better for smaller scenes. Creating,
resetting and destroying pools are expensive operations. In smaller
scenes this can lead to lower performance.
The improvements may differ per platform or not be visible.
Pull Request: https://projects.blender.org/blender/blender/pulls/138300
The use-case of this blend mode is to be able to make parts of a
viewport overlay transparent.
The future user of this blend mode is sequencer preview drawing,
where the frame will be drawn to an HDR render frame-buffer, and
overlays drawn on top. In a way it is similar to the image engine, but
without the need for a custom shader.
Ref #138094
Pull Request: https://projects.blender.org/blender/blender/pulls/138307
Currently unused, but allows areas outside of DRW to render to the
color render and depth texture.
The primary user of this new API will be Sequencer preview to draw
HDR images.
Ref #138094
Pull Request: https://projects.blender.org/blender/blender/pulls/138306
In a profile of sculpting with the Vulkan GPU backend enabled,
this function made up 0.7% of samples. Since it's just a single
comparison, inlining it should be helpful for the compiler.
Pull Request: https://projects.blender.org/blender/blender/pulls/138210
New function called immRectf_with_texco(), which resides next to the other
immRectf utilities.
Currently used in a single place in the sequencer, but it will be used in
a few other places in the future.
Pull Request: https://projects.blender.org/blender/blender/pulls/138222
This is completely unused, not implemented for the Vulkan backend, and
seems to add quite a bit of complexity to the Metal and OpenGL backends.
It was added for EEVEE legacy motion blur, and the last use was removed
along with EEVEE legacy. We're probably better off not maintaining it since
we should avoid duplicating vertex buffer data anyway.
Pull Request: https://projects.blender.org/blender/blender/pulls/138226
Make GPU_viewport_colorspace_set() const-correct w.r.t. view_settings.
Instead of doing in-place modifications of the view_settings argument
and restoring them later, introduce a new function for copying view
settings which keeps curve mapping unchanged in the destination.
Should be no functional changes.
Pull Request: https://projects.blender.org/blender/blender/pulls/138189
See #129009 for context.
The preprocessor parses metadata and writes a header file containing
an inline function that inits the `GPUSource` with the metadata.
These header files are then included inside `gpu_shader_dependency.cc`.
This still keeps the usage of the `metadata` enums and classes to avoid
pulling the whole blender module inside the preprocessor executable.
This speeds-up startup time in Debug build:
`gpu_shader_dependency_init`
- Before : 37ms
- After : 4ms
I didn't measure release, but it is unlikely to be noticeable (in the
order of 4ms > 1ms).
Pull Request: https://projects.blender.org/blender/blender/pulls/138070
This avoids recreating the GPU context for each individual
test. This reduces the overhead drastically.
Excluding static_shaders and texture_pool tests I get for GPUVulkanTest:
`Before: 129 tests from 1 test suite ran. (26304 ms total) `
`After: 129 tests from 1 test suite ran. (6965 ms total) `
Including static_shaders and texture_pool tests I get for GPUMetalTest:
`Before: 124 tests from 1 test suite ran. (54654 ms total)`
`After: 124 tests from 1 test suite ran. (1870 ms total)`
Given the tests are run twice for the workarounds versions, the
speedup can be multiplied by 2.
Overall test time is still largely dominated by shader compilation time.
However, there is still a 3x improvement using this patch:
Including static_shaders and texture_pool tests I get for GPUVulkanTest,
GPUVulkanWorkaroundTest, GPUOpenGLTest, GPUOpenGLWorkaroundTest:
`Before: 516 tests from 4 test suites ran. (318878 ms total)`
`After: 516 tests from 4 test suites ran. (106593 ms total)`
Pull Request: https://projects.blender.org/blender/blender/pulls/138097
Metal and Vulkan don't support line smoothing. There is a workaround
implemented. This workaround is only enabled when linesmooth value is
larger than 1. However, when using smooth lines it should also be used.
This is fixed by adding a `GPU_line_smooth_get` function for getting the
current line smooth state.
Pull Request: https://projects.blender.org/blender/blender/pulls/138123
Each time an image is rendered a new context is created. When the
context is destroyed it can still contain a render graph whose ownership
isn't transferred back to the device, resulting in an increase of several
MB per render. The render graph is cleanly destroyed when quitting as
there is a master list of created render graphs.
Fixed by moving the ownership of the render graph back to the device
when a context is unregistered.
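A rough sketch of the ownership hand-back, with hypothetical stand-in types rather than the Vulkan backend classes. That the device hands returned graphs to the next context for reuse is an assumption made to keep the example small.

```cpp
#include <memory>
#include <utility>
#include <vector>

/* All types here are hypothetical stand-ins. */
struct RenderGraph {};

struct Device {
  std::vector<std::unique_ptr<RenderGraph>> unused_graphs;
  int graphs_created = 0;

  /* Hand out a previously returned graph if there is one, else create one. */
  std::unique_ptr<RenderGraph> acquire_graph()
  {
    if (!unused_graphs.empty()) {
      std::unique_ptr<RenderGraph> graph = std::move(unused_graphs.back());
      unused_graphs.pop_back();
      return graph;
    }
    graphs_created++;
    return std::make_unique<RenderGraph>();
  }

  void release_graph(std::unique_ptr<RenderGraph> graph)
  {
    unused_graphs.push_back(std::move(graph));
  }
};

struct Context {
  Device &device;
  std::unique_ptr<RenderGraph> graph;

  explicit Context(Device &device) : device(device), graph(device.acquire_graph()) {}

  /* Ownership of the graph moves back to the device when unregistering. */
  void unregister()
  {
    if (graph) {
      device.release_graph(std::move(graph));
    }
  }
};
```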
Pull Request: https://projects.blender.org/blender/blender/pulls/138095
When allocating a large vertex buffer on NVIDIA it tried to allocate it
in memory that is both on the GPU and host visible. This section is
limited in size (256MB) and should not have been chosen for a large
vertex buffer.
Detected during investigation of #137909.
It removes the validation error, but it is unclear yet if this solves the
crash as I wasn't able to reproduce the crash.
Pull Request: https://projects.blender.org/blender/blender/pulls/138079