Cleanup and simplification of GPUMaterial and GPUPass compilation.
See #133674 for details/goals.
- Remove the `draw_manage_shader` thread.
Deferred compilation is now handled by the gpu::ShaderCompiler
through the batch compilation API.
Batch management is handled by the `GPUPassCache`.
- Simplify `GPUMaterial` status tracking so it just queries the
`GPUPass` status.
- Split the `GPUPass` and the `GPUCodegen` code.
- Replaced the (broken) `GPU_material_recalc_flag_get` with the new
`GPU_pass_compilation_timestamp`.
- Add the `GPU_pass_cache_wait_for_all` and
`GPU_shader_batch_wait_for_all`, and remove the busy waits from
EEVEE.
- Remove many unused functions, properties, includes...
Pull Request: https://projects.blender.org/blender/blender/pulls/135637
When enabled, this normalize the strength by the light area, to keep
the total output the same regardless of shape or size. This is the
existing behavior.
This is supported in Cycles, EEVEE, Hydra, USD, COLLADA.
For add-ons, an API function to compute the area is added for conversion,
in case there is no native support for normalization.
area = light.area(matrix_world=ob.matrix_world)
Co-authored-by: Brecht Van Lommel <brecht@blender.org>
Pull Request: https://projects.blender.org/blender/blender/pulls/136958
Similar to other renderers, this adds a temperature property to set the
light color using blackbody emission. This can be more convenient than
using nodes, and can improve interop with other software.
This is supported in Cycles, EEVEE, Hydra, USD, COLLADA and FBX.
Pull Request: https://projects.blender.org/blender/blender/pulls/134303
Similar to other renderers, this adds an exposure property to multiply
the light power by 2^exposure. This can be more convenient to control
a wide range of values.
This is supported in Cycles, EEVEE, Hydra, USD, COLLADA and FBX.
Pull Request: https://projects.blender.org/blender/blender/pulls/134528
This is because the warm_shader_specialization was
not called for the actually used specialization since
the samples_len would be updated right before accumulation.
Fixed by calling update_sample_table before the warm shader
call. Also avoid default filter request 4 sample specialization.
This avoid all stall altogether.
Removes the stutters comming from the film shader.
The number of samples used by the film shader is set
throught a specialization constant. Since the number of
effective sample can depend on jitter position, this
number was fluctuating and trigger compilation on
new specialization until all specialization were
encountered.
Rounding the samples up to the most common sample
count fixes the issue.
Pull Request: https://projects.blender.org/blender/blender/pulls/139154
This caused UB in the tests now that tests are all ran
inside the same context.
A shader could be free but its pointer would be dangling
inside the `Context`. A new shader could have the same
address and generate UB after binding.
This is not the best way to solve this issue but at least
we prevent the use of the UB.
Pull Request: https://projects.blender.org/blender/blender/pulls/139109
This allows multiple threads to request different specializations without
locking usage of all specialized shaders program when a new specialization
is being compiled.
The specialization constants are bundled in a structure that is being
passed to the `Shader::bind()` method. The structure is owned by the
calling thread and only used by the `Shader::bind()`.
Only querying for the specialized shader (Map lookup) is locking the shader
usage.
The variant compilation is now also locking and ensured that
multiple thread trying to compile the same variant will never result
in race condition.
Note that this removes the `is_dirty` optimization. This can be added
back if this becomes a bottleneck in the future. Otherwise, the
performance impact is not noticeable.
Pull Request: https://projects.blender.org/blender/blender/pulls/136991
This feature greatly increase depth buffer precision.
This is very noticeable in large view distance scenes.
This is enabled by default on GPUs that supports it (most of the hardware we
support already supports this). This makes rendering different on the GPUs
that do not support that feature (`glClipControl`).
While this give much better depth precision as before, we also have a lot of
imprecision caused by our vertex transformations. This can be improved in
another task.
Pull Request: https://projects.blender.org/blender/blender/pulls/138898
This adds support for the extension and always
set the clip state value to 0..1 to align with vulkan
and metal. Moreover this is needed for the Reverse Z
implementation.
Note that this is a OpenGL 4.5 feature and is not
required to start Blender. So there must still be
a fallback path for now.
Rel #138898
Pull Request: https://projects.blender.org/blender/blender/pulls/138941
Many of the upfront specialized variants
were not needed/ They are only used if
some scene render setting changes, which
we can detect upfront.
This is noticeable on OpenGL which doesn't support
specialization constant and has to do full shader
recompilation for each variants.
Pull Request: https://projects.blender.org/blender/blender/pulls/138589
Fix the recently implemented ShaderCompiler::batch_cancel.
Expose it with GPU_shader_batch_cancel and
GPU_shader_specialization_batch_cancel.
Use them in the EEVEE ShaderModule destructor, to prevent blocking on
destruction when there are in-flight compilations.
Pull Request: https://projects.blender.org/blender/blender/pulls/138774
This adds a new `DNA_sdna_type_ids.hh` header:
```cpp
namespace blender::dna {
/**
* Each DNA struct has an integer identifier which is unique within a specific
* Blender build, but not necessarily across different builds. The identifier
* can be used to index into `SDNA.structs`.
*/
template<typename T> int sdna_struct_id_get();
/**
* The maximum identifier that will be returned by #sdna_struct_id_get in this
* Blender build.
*/
int sdna_struct_id_get_max();
} // namespace blender::dna
```
The `sdna_struct_id_get` function is used as replacement of
`SDNA_TYPE_FROM_STRUCT` in all places except the DNA defaults system. The
defaults system is C code and therefore can't use the template. There is ongoing
work to replace the defaults system as well though: #134531.
Using this templated function has some benefits over the old approach:
* No need to rely on macros.
* Can use type inferencing in functions like `BLO_write_struct` which avoids
redundancy on the call site. E.g. `BLO_write_struct(writer, ActionStrip,
strip);` can become `BLO_write_struct(writer, strip);` which could even become
`writer.write_struct(strip);`. None of that is implemented as part of this
patch though.
* No need to include the generated `dna_type_offsets.h` file which contains a
huge enum.
Implementation wise, this is done using explicit template instantiations in a
new file generated by `makesdna.cc`: `dna_struct_ids.cc`. The generated file
looks like so:
```cpp
namespace blender::dna {
template<typename T> int sdna_struct_id_get();
int sdna_struct_id_get_max();
int sdna_struct_id_get_max() { return 951; }
}
struct IDPropertyUIData;
template<> int blender:🧬:sdna_struct_id_get<IDPropertyUIData>() { return 1; }
struct IDPropertyUIDataEnumItem;
template<> int blender:🧬:sdna_struct_id_get<IDPropertyUIDataEnumItem>() { return 2; }
```
I tried using static variables instead of separate functions, but I didn't
manage to link it properly. Not quite sure yet if that's an actual limitation or
if I was just missing something.
Pull Request: https://projects.blender.org/blender/blender/pulls/138706
Part of #136993.
Share as much of the ShaderCompiler implementations as possible.
Remove the ShaderCompiler/ShaderCompilerGeneric split and make most of
its functions non virtual.
Move the `get_compiler` function from `Context` to `GPUBackend` and
creation/deletion to `GPUBackend::init/delete_resources`.
Add a `batch_cancel` function to `ShaderCompiler` (needed for the
GPUPass refactor).
As a nice extra, the multithreaded OpenGL compilation has become faster
too.
The barbershop materials + EEVEE static shaders have gone from 27s to
22s.
I have not observed any performance difference on Vulkan or Metal.
Pull Request: https://projects.blender.org/blender/blender/pulls/136676
This is caused by the pointcloud attribute header not being added
to the volume shader.
The fix is to make volume shaders not require volume attribute
for geometry types that do not support it.
Pull Request: https://projects.blender.org/blender/blender/pulls/138212
Allows basic support for using `namespace X {}` and `X::symbol`
syntax.
Benefit:
- More sharing possible with host C++ code.
- Isolation of symbols when including shader files as C++.
Requirements:
- Nesting must be done using `namespace A::B{}` rather than
`namespace A{ namespace B {}}`, which is unsupported.
- No support for `using namespace`.
- Support of `using X` and `using X = Y` inside of function scope.
- Support of `using X` and `using X = Y` inside of namespace scope.
However, this is only to bring symbols from the same namespace
declared in another block (potentially inside another file).
- Only support namespace elision for symbols defined and used
inside of the same namespace scope.
Note that this is currently limited to blender GLSL files and
not for the shared headers. This is because we need to port a lot
of code to use namespaces before allowing this.
### Follow Up:
Nesting like `namespace A{ namespace B {}}` shouldn't be hard to
support and could be added if needed.
Rel #137446
Pull Request: https://projects.blender.org/blender/blender/pulls/137445
This patch adds a new `BLI_mutex.hh` header which adds `blender::Mutex` as alias
for either `tbb::mutex` or `std::mutex` depending on whether TBB is enabled.
Description copied from the patch:
```
/**
* blender::Mutex should be used as the default mutex in Blender. It implements a subset of the API
* of std::mutex but has overall better guaranteed properties. It can be used with RAII helpers
* like std::lock_guard. However, it is not compatible with e.g. std::condition_variable. So one
* still has to use std::mutex for that case.
*
* The mutex provided by TBB has these properties:
* - It's as fast as a spin-lock in the non-contended case, i.e. when no other thread is trying to
* lock the mutex at the same time.
* - In the contended case, it spins a couple of times but then blocks to avoid draining system
* resources by spinning for a long time.
* - It's only 1 byte large, compared to e.g. 40 bytes when using the std::mutex of GCC. This makes
* it more feasible to have many smaller mutexes which can improve scalability of algorithms
* compared to using fewer larger mutexes. Also it just reduces "memory slop" across Blender.
* - It is *not* a fair mutex, i.e. it's not guaranteed that a thread will ever be able to lock the
* mutex when there are always more than one threads that try to lock it. In the majority of
* cases, using a fair mutex just causes extra overhead without any benefit. std::mutex is not
* guaranteed to be fair either.
*/
```
The performance benchmark suggests that the impact is negilible in almost
all cases. The only benchmarks that show interesting behavior are the once
testing foreach zones in Geometry Nodes. These tests are explicitly testing
overhead, which I still have to reduce over time. So it's not unexpected that
changing the mutex has an impact there. What's interesting is that on macos the
performance improves a lot while on linux it gets worse. Since that overhead
should eventually be removed almost entirely, I don't really consider that
blocking.
Links:
* Documentation of different mutex flavors in TBB:
https://www.intel.com/content/www/us/en/docs/onetbb/developer-guide-api-reference/2021-12/mutex-flavors.html
* Older implementation of a similar mutex by me:
https://archive.blender.org/developer/differential/0016/0016711/index.html
* Interesting read regarding how a mutex can be this small:
https://webkit.org/blog/6161/locking-in-webkit/
Pull Request: https://projects.blender.org/blender/blender/pulls/138370
Implementation of #137341
This adds support for using references to any variable in a local scope
inside the shader codebase.
Example:
```cpp
int a = 0;
int &b = a;
b++; /* a == 1 */
```
Using `auto` is supported for reference definition as the type is not
preserved by the copy paste procedure. Type checking is done by the
C++ shader compilation or after the copy paste procedure during shader
compilation. `auto` is still unsupported for other variable declarations.
Reference to opaque types (`image`, `sampler`) are supported since
they are never really assigned to a temp variable.
This implements all safety feature related to the implementation being
copy pasting the definition string. That is:
- No `--`, `++` operators.
- No function calls.
- Array subscript index needs to be int constants or constant variable.
The copy pasting does not replace member access:
`auto &a = b; a.a = c;` becomes `b.a = c;`
The copy pasting does not replace function calls:
`auto &a = b; a = a();` becomes `b = a();`
While limited, this already allows for nicer syntax (aliasing) for
accessing SSBOs and the potential overhead of a copy semantic:
```cpp
ViewMatrices matrices = drw_view_buf[0];
matrices.viewmat = float4x4(1);
drw_view_buf[0] = matrices;
```
Can now be written as;
```cpp
ViewMatrices &matrices = drw_view_buf[0];
matrices.viewmat = float4x4(1);
```
Which expands to;
```cpp
drw_view_buf[0].viewmat = float4x4(1);
```
Note that the reference semantic is not carried through function call
because arguments are transformed to `inout` in GLSL. `inout` has
copy semantic but it is often implemented as reference by some
implementations.
Another important note is that this copy-pasting doesn't check if a
symbol is a variable. It can match a typename. But given that our
typenames have different capitalizations style this is unlikely to be
an issue. If that issue arise, we can add a check for it.
Rel #137446
Pull Request: https://projects.blender.org/blender/blender/pulls/138412
Allows basic usage of templated functions.
There is no support for templated struct.
Benefit:
- More readable than macros in shader sources.
- Compatible with C++ tools.
- More sharing possible with host C++ code.
Requirements/Limitations:
- No default arguments to template parameters.
- Must use explicit instantiation for all variant needed.
- Explicit instantiation needs to **not** use argument deduction.
- Calls to template needs to have all template argument explicit
or all implicit.
- Template overload is not supported (redefining the same template
with different template argument or function argument types).
Currently implemented as Macros inside the build-time pre-pocessor,
but that could change to copy-paste to allow better error reporting.
However, the Macros keep the shader code reduced in the final binary
and allow different file to declare different instantiation.
The implementation is done by declaring overloads for each explicit
instantiation.
If a template has arguments not present in function
arguments, then all arguments **values** are appended to the
function name. The explicit template callsite is then modified to use
`TEMPLATE_GLUE` which will call the correct function. This is
why template argument deduction is not supported in this case.
Rel #137446
Pull Request: https://projects.blender.org/blender/blender/pulls/137441
This adds a version of `BKE_id_new_nomain` that takes the ID type parameter as
template argument. This allows the function the return the newly created ID with
the correct type, removing the need to use `static_cast` on the call-site.
To make this work, I added a static `id_type` member to every ID struct. This
can also be used to create a similar API for other id management functions in
future patches.
```cpp
// Old
Mesh *mesh = static_cast<Mesh *>(BKE_id_new_nomain(ID_ME, "Mesh"));
// New
Mesh *mesh = BKE_id_new_nomain<Mesh>("Mesh");
```
Pull Request: https://projects.blender.org/blender/blender/pulls/138383
This adds basic unrolling support for 2 syntax:
- `[[gpu::unroll]]` which does full loop unrolling
- `[[gpu::unroll(x)]]` which unrolls `x` iteration
Nesting is supported.
This change is motivated by the added cost in compilation
and execution time that some loops have even if they have
compile time defined iteration counts.
The syntax is inspired by `GL_EXT_control_flow_attributes`.
However, we might want to have our own prefix to show it is
a blender specific feature and that it differs from the standard.
I propose `[[gpu::unroll]]`.
In the future, we could extend this to support more directives that
can be expanded to backend specific extension / syntax. This would
avoid readability issue an error prone copy paste of large amount
of preprocessor directives.
Currently, given that GL's GLSL flavor doesn't support
any of these attributes, the preprocessor does some copy-pasting
that does the unrolling at the source level. Note that the added
`#line` allow for correct error logging.
For the `[[gpu::unroll]]` syntax, the `for` declaration
needs to follow a specific syntax to deduce the number
of loop iteration.
This variant removes the continue condition between iteration,
so all iterations are evaluated. This could be modified
using a special keyword.
For the `[[gpu::unroll(n)]]` syntax, the usercode needs
to make sure that `n` is large enough to cover all iterations
as the loop is completely removed.
We could add shader `assert` to make sure that there is
never a remaining iteration.
This behavior is usually different from what you see in other
implementation as we do not keep a loop at all. Usually, compilers
still keep the loop if it is not unrolled fully. But given we don't
have IR, this is the best we can do.
`break` and `continue` statement are forbidden at the unrolled loop
scope level. Nested loop and switch can contain these keywords.
This is accounted for by checks in the pre-processor.
Only `for` loops are supported for now. There are no real
incentive to add support for `while` given how rare it is
in the shader codebase.
Rel #137446
Pull Request: https://projects.blender.org/blender/blender/pulls/137444
This avoid having to guards functions that are
only available in fragment shader stage.
Calling the function inside another stage is still
invalid and will yield a compile error on Metal.
The vulkan and opengl glsl patch need to be modified
per stage to allow the fragment specific function
to be defined.
This is not yet widely used, but a good example is
the change in `film_display_depth_amend`.
Rel #137261
Pull Request: https://projects.blender.org/blender/blender/pulls/138280
Fixes incorrect render of lookdev on macOS.
There might have been some other issues caused by the mistake, hard to say,
seems to be hitting some platform-specific behavior and such.
* Remove `DEG_get_evaluated_object` in favor of `DEG_get_evaluated`.
* Remove `DEG_is_original_object` in favor of `DEG_is_original`.
* Remove `DEG_is_evaluated_object` in favor of `DEG_is_evaluated`.
Pull Request: https://projects.blender.org/blender/blender/pulls/138317
These were handled mostly completely outside of IDManagement code, yet
(ab)using the same ID management system in some cases, adding hacks to
address some issues, etc.
Also address a similar issue in the eevee lookdev `LookdevWorld` code.
Since initializing pre-allocated (or static) buffers as valid IDs is not
currently supported, and these use-cases do not seem common enough to be
worth supporting it currently, instead switch to storing allocated IDs
into static pointers.
This allows to use proper 'out-of-main' ID creation code API.
NOTE: There are still some remaining issues, especially in the GP
material BKE api. These are noted in comments for now, as it would be
out of scope to address them in this commit.
Pull Request: https://projects.blender.org/blender/blender/pulls/138263
This is the root cause of the non-deterministic behavior of
light baking. The shadow system was not given enough update
iteration to fully render all tiles of the Virtual shadow maps.
This would resulte in missing tiles inside the bakes.
The value of the tiles seems to have been implementation
dependant, which, on some implementation created light leaks.
This fixes some recent regressions on the MacOS builbot tests.
This is candidate for backporting to 4.2.
Fix#138147
Pull Request: https://projects.blender.org/blender/blender/pulls/138203
This allows users to implement arbitrary camera models using OSL by writing
shaders that take an image position as input and compute ray origin and
direction.
The obvious applications for this are e.g. panorama modes, lens distortion
models and realistic lens simulation, but the possibilities are endless.
Currently, this is only supported on devices with OSL support, so CPU and
OptiX. However, it is independent from the shading model used, so custom
cameras can be used without getting the performance hit of OSL shading.
A few samples are provided as Text Editor templates.
One notable current limitation (in addition to the limited device support)
is that inverse mapping is not supported, so Window texture coordinates and
the Vector pass will not work with custom cameras.
Pull Request: https://projects.blender.org/blender/blender/pulls/129495
This implement the design detailed in #135935.
A new per object property called `Shadow Terminator Normal Offset` is
introduced to shift the shadowed position along the shading normal.
The amount of shift is defined in object space on the object datablock.
This amount is modulated by the facing ratio to the light. Faces
already facing the light will get no offset. This avoids most light
leaking artifacts.
In case of multiple shading normal, the normal used for the shift
is arbitrary. Note that this is the same behavior for other biases.
The magnitude of the bias is controlled by `Shadow Terminator Normal Offset`.
The amount of faces affected by the bias is controlled using
`Shadow Terminator Geometry Offset` just like cycles.
Tweaking the `Shadow Terminator Geometry Offset` allows to avoid too much
shadow distortion on surfaces with bump mapping.
Cycles properties are copied from the Cycles object datablock to the
blender datablock. This break the python API for Cycles.
The defaults are set to no bias because:
- There is no good default. The best value depends on the geometry.
- The best value might depend on real-time displacement.
- Any bias will introduce light leaking on surfaces that do not need it.
- There is an additional cost of enabling it, which is proportional
to the amount of pixels on screen using it.
Pull Request: https://projects.blender.org/blender/blender/pulls/136935
The goal here is to avoid having to cast to and from `ID` when getting the
evaluated or original ID using the depsgraph API, which is often verbose and not
type safe. To solve this, there are now `DEG_get_original` and
`DEG_get_evaluated` methods which are templated on the type and use a new
`is_ID_v` static type check to make sure it's only used with valid types.
This allows removing quite some verbosity on all the call sites. I also removed
`DEG_get_original_object`, because that does not have to be a special case
anymore.
Pull Request: https://projects.blender.org/blender/blender/pulls/137629