Allow precompiling specialization constants variations in parallel.
Only supported in OpenGL as the rest of the batch compilation API,
on the other backends the function is a no-op.
This also moves the `SpecializationConstant` from
`gpu_shader_create_info` (private API) into`GPU_common_types`
(public API).
Pull Request: https://projects.blender.org/blender/blender/pulls/122796
This is the first commit of the several required to support
subprocess-based parallel compilation on OpenGL.
This provides the base API and implementation, and exposes the max
subprocesses setting on the UI, but it's not used by any code yet.
More information and the rest of the code can be found in #121925.
This one includes:
- A new `GPU_shader_batch` API that allows requesting the compilation
of multiple shaders at once, allowing GPU backed to compile them in
parallel and asynchronously without blocking the Blender UI.
- A virtual `ShaderCompiler` class that backends can use to add their
own implementation.
- A `ShaderCompilerGeneric` class that implements synchronous/blocking
compilation of batches for backends that don't have their own
implementation yet.
- A `GLShaderCompiler` that supports parallel compilation using
subprocesses.
- A new `BLI_subprocess` API, including IPC (required for the
`GLShaderCompiler` implementation).
- The implementation of the subprocess program in
`GPU_compilation_subprocess`.
- A new `Max Shader Compilation Subprocesses` option in
`Preferences > System > Memory & Limits` to enable parallel shader
compilation and the max number of subprocesses to allocate (each
subprocess has a relatively high memory footprint).
Implementation Overview:
There's a single `GLShaderCompiler` shared by all OpenGL contexts.
This class stores a pool of up to `GCaps.max_parallel_compilations`
subprocesses that can be used for compilation.
Each subprocess has a shared memory pool used for sending the shader
source code from the main Blender process and for receiving the already
compiled shader binary from the subprocess. This is synchronized using
a series of shared semaphores.
The subprocesses maintain a shader cache on disk inside a
`BLENDER_SHADER_CACHE` folder at the OS temporary folder.
Shaders that fail to compile are tried to be compiled again locally for
proper error reports.
Hanged subprocesses are currently detected using a timeout of 30s.
Pull Request: https://projects.blender.org/blender/blender/pulls/122232
BLF font rendering isn't compatible with render graph as it
rewrites buffers that are not yet drawn. To work around this issue
the vertex buffers should always be created on device and not
directly altered by CPU code.
Pull Request: https://projects.blender.org/blender/blender/pulls/122648
The mesh triangulation data is stored in CPU memory with the same format
as the triangles GPU index buffer. Because of that we can skip creating a
temporary copied owned by the GPU API. One way to do that is to just
upload the data directly and avoid keeping a reference to it. However, we
can only upload GPU data from the main thread with OpenGL, so instead
reference the data and keep track of whether to free it.
When drawing a mesh with a single material and 1.8 million faces, this
change gives a 12-15% improvement in framerate, from about 32 to 37 FPS.
Part of #116901.
Pull Request: https://projects.blender.org/blender/blender/pulls/122175
When synchonizing the image for presentation the incorrect access
mask was used. This PR changes the access mask from data transfer
write to memory write. The data transfer write is not allowed to
change the image layout.
Pull Request: https://projects.blender.org/blender/blender/pulls/122138
* Debugging groups were not being applied as that part of the code
wasn't ported to the original patch
* Debugging groups didn't account for nodes that weren't owned by
any debug group.
Pull Request: https://projects.blender.org/blender/blender/pulls/122136
This PR implements debug groups in the render graph. Each node contains
a reference to the debug group they belong to. During scheduling the
nodes can be reordered and the correct debug group needs to be
activated.
This is done by keeping track of the current debug group. When a
different debug group is needed, the needed ends/begins are added
to the command buffer.
This mechanism also cleans up debug groups that are not used at all
as they don't have any nodes associated to it.
Pull Request: https://projects.blender.org/blender/blender/pulls/122054
Previously the VKShaderInterface was constructed twice. This was
due to a limitation of the Shader api. Specialization constants
introduced an Shader::init function which allows to pre-initialize
the shader interface before a shader is finalized.
Pull Request: https://projects.blender.org/blender/blender/pulls/122049
Move ownership of image views to VKTexture. VKFramebuffer can request
access to the image views. This allows reconfiguring framebuffers when
the previous configuration is still in use by the render graph.
Pull Request: https://projects.blender.org/blender/blender/pulls/121727
This fixes several issues with push constants.
Push constants test were failing due to setting incorrect parameters.
The pipeline stage was used as if it was a shader stage.
When using push constants fallback the uniform was loaded to the
descriptor set after the descriptor set was checked.
Suppress updating push constants of non render graph, when render
graph is active.
Pull Request: https://projects.blender.org/blender/blender/pulls/121772
Compute tests were failing due to recent changes.
* Incorrect image layout was used for writable image bindings.
* Incorrect pipeline stage was used for indirect command buffers.
Pull Request: https://projects.blender.org/blender/blender/pulls/121744
Overlay engine extra layer can record draw list commands with an
empty index buffer. This would not affect any pixels and should be
ignored.
Issue detected when vulkan validation layers are turned on and loading
default scene.
Pull Request: https://projects.blender.org/blender/blender/pulls/121736
There is an implementation flaw in the render graph where local pointers
cannot be updated, but the data it refers to can be reallocated to
another location.
The cause of this is that the nodes use an union, which can only contain
simple constructed structs (eg memcpy). this union is stored in a vector
and can relocate the union. Any local pointers can (and will) become
invalid.
This PR is a quick fix by updating the pointers just before sending
them to the command buffer. In future a better fox needs to be done
as part of #121649.
Pull Request: https://projects.blender.org/blender/blender/pulls/121723
The render graph tests initialized a command buffer wrapper that
requires a working device. The wrapper was not used, but when
destructing it would try to deallocate the command buffer, which
cannot be done as that requires a working device as well.
This PR removes the unneeded command buffer wrappers in the test
cases.
Result isn't consistent on Windows platform due to copying structs with arrays.
This is a quick fix and will be looked at next week. The code isn't used at this moment.
Pull Request: https://projects.blender.org/blender/blender/pulls/121670
Adds support for clear attachments to the render graph.
Clearing attachments require that the command buffer is
being set for rendering (vkCmdBeginRendering).
**What works**
- Clear attachments are working with dynamic rendering and the render graph
- `GPUVulkanTest.framebuffer_clear_color_single_attachment`
- `GPUVulkanTest.framebuffer_clear_color_multiple_attachments`
- `GPUVulkanTest.framebuffer_clear_multiple_color_multiple_attachments`
- `GPUVulkanTest.framebuffer_scissor_test`
- `GPUVulkanTest.framebuffer_clear_depth`
**What still needs to be addressed**
When a command buffer is rendering it cannot do any pipeline
barriers. For clearing attachments this isn't needed, but when
we address drawing we might need to register drawing resource
dependencies to the dependency list of the begin rendering node.
This will be solved when drawing will be implemented. [#121648]
The begin rendering node is large as it has to store data for
framebuffers with 8 color attachments as well. This might
become an overhead and we could solve this by splicing the
data of larger nodes into 2 lists. This will be addressed
after we are able to draw a screen so we can collect data
on this topic. [#121649]
Pull Request: https://projects.blender.org/blender/blender/pulls/121073
Currently the image layout was fixed when creating pipeline barriers.
This PR determined the image layout an image is in based on its access
mask.
This extends the usage of the render graph and its pipeline barrier
generation for render attachments. In future it can be extended to
support other layouts.
Pull Request: https://projects.blender.org/blender/blender/pulls/121646
In the near future the legacy framebuffer/renderpass/pipeline drawing
will be replaced by dynamic rendering. Dynamic rendering provide a
flexible API to reuse pipelines between framebuffers if they share
the same image formats.
Dynamic rendering is provided by `VK_KHR_dynamic_rendering` extension
and is supported by all platforms we support (Intel since HD4000, NVIDIA
since 700, AMD since GCN2 and llvmpipe).
Functions provided by extensions are loaded in a struct inside
`VKDevice`.
Pull Request: https://projects.blender.org/blender/blender/pulls/121642
Add dispatch indirect node. Also refactored the dispatch (direct) node
so more logic could be reused. The context only stores a `VKResourceAccessInfo`
struct which is reused by both the dispatch and dispatch indirect node.
Pull Request: https://projects.blender.org/blender/blender/pulls/120993
This PR adds support for compute shaders to render graph. Only direct dispatch
is supported. indirect dispatch will be added in a future PR.
This change enables the next test cases to be supported when using render graphs
- `GPUVulkanTest.push_constants*`
- `GPUVulkanTest.shader_compute_*`
- `GPUVulkanTest.buffer_texture`
- `GPUVulkanTest.specialization_constants_compute`
- `GPUVulkanTest.compute_direct`
```
[==========] 95 tests from 2 test suites ran. (24059 ms total)
[ PASSED ] 95 tests.
```
Specialization constants are supported when using the render graph. This should conclude
the conversion the prototype of the render graph.
Pull Request: https://projects.blender.org/blender/blender/pulls/120963
VKPipeline class is deprecated and will be phased out in the near future.
This PR moves the push constants to VKShader as it was wrongly placed in the
pipeline.
Pull Request: https://projects.blender.org/blender/blender/pulls/120980
In Vulkan, a Blender shader is organized in multiple
objects. A VkPipeline is the highest level concept and represents
somewhat we call a shader. A pipeline is an device/platform optimized
version of the shader that is uploaded and executed in the GPU device.
A key difference with shaders is that its usage is also compiled
in. When using the same shader with a different blending, a new pipeline
needs to be created.
In the current implementation of the Vulkan backend the pipeline is
re-created when any pipeline parameter changes. This triggers many
pipeline compilations. Especially when common shaders are used in
different parts of the drawing code.
A requirement of our render graph implementation is that changes
of the pipeline can be detected based on the VkPipeline handle.
We only want to rebind the pipeline handle when the handle actually
changes. This improves performance (especially on NVIDIA) devices
where pipeline binds are known to be costly.
The solution of this PR is to add a pipeline pool. This holds all
pipelines and can find an already created pipeline based on pipeline
infos. Only compute pipelines support has been added.
# Future enhancements
- Recent drivers replace `VkShaderModule` with pipeline libraries.
It improves sharing pipeline stages and reduce pipeline creation times.
- GPUMaterials should be removed from the pipeline pool when they are
destroyed. Details on this will be more clear when EEVEE support is
added.
Pull Request: https://projects.blender.org/blender/blender/pulls/120899
Resource access info contains lists in a future setup the resource access info
will be kept in the VKContext and reused. This requires a reset function to
cleanup the instance for reuse.
Pull Request: https://projects.blender.org/blender/blender/pulls/120962
When building the resource access used when adding dispatch/draw commands
to the render graph, the access mask is required. This PR stores the
access mask in the shader interface. When binding the resources referenced
by the state manager, the resource access info struct is populated with
the access flags.
In the near future the resource access info will be passed when adding
a dispatch/draw node to the render graph to generate the links.
Pull Request: https://projects.blender.org/blender/blender/pulls/120908
Compute shaders are required since 4.0. There was one occasion where
an older AMD driver failed and support was turned off. This driver
is now marked unsupported.
This PR includes:
- removing the check in viewport compositing
- remove properties from system info
- always construct draw manager.
- remove unused pass logic in draw hair/curves
- add deprecation warning when accessed from python
Pull Request: https://projects.blender.org/blender/blender/pulls/120909
There was a memory leak in the render graph where nodes where freed,
but not the data it could keep. Detected during adding support for
compute shaders and running the draw tests.
Pull Request: https://projects.blender.org/blender/blender/pulls/120906
This PR implements render graph for VKTexture. During the
implementation some tweaks to the render graph was done
to support depth and stencil textures.
The render graph will record the image aspect being used
for each node. This will then be used to generate barriers
for the correct aspect.
Also fixes an issue that uploading of array textures didn't
allocate a large enough staging buffer.
Pull Request: https://projects.blender.org/blender/blender/pulls/120821
A developer can switch `vk_common.hh#use_render_graph` to enable render graph.
When enabled the buffers and images are tracked by the device resource state
tracker. The storage buffer commands are recorded to the context render graph.
The next unit tests will pass:
- GPUVulkanTest.storage_buffer_create_update_read
- GPUVulkanTest.storage_buffer_clear_zero
- GPUVulkanTest.storage_buffer_clear
- GPUVulkanTest.storage_buffer_copy_from_vertex_buffer
The pattern to migrate to render graph is:
- always construct CreateInfo for class.
- based on `use_render_graph` call `context.command_buffers.something`
or `context.render_graph.add_node`.
- Hide calls to `context.flush` when `use_render_graph` is true.
Pull Request: https://projects.blender.org/blender/blender/pulls/120812
f2ae04db10 introduces missing binding
tracking for SSBO and UBOs. Vulkan relies on validation layers to
report on missing bindings, but the binding information should still
be cleared in the context state manager.
Pull Request: https://projects.blender.org/blender/blender/pulls/120814
**Design Task**: blender/blender#118330
This PR adds the core of the render graph. The render graph isn't used.
Current implementation of the Vulkan Backend is slow by design. We
focused on stability, before performance. With the new introduced render
graph the focus will shift to performance and keep the stability at where
it is.
Some highlights:
- Every context will get its own render graph. (`VKRenderGraph`).
- Resources (and resource state tracking) is device specific (`VKResourceStateTracker`).
- No node reordering / sub graph execution has been implemented. Currently
All nodes in the graph is executed in the order they were added. (`VKScheduler`).
- The links inside the graph describe the resources the nodes read from (input links)
or writes to (output links)
- When resources are written to a resource stamp is incremented allowing keeping
track of which nodes needs which stamp of a resource.
- At each link the access information (how does the node accesses the resource)
and image layout (for image resources) are stored. This allows the render graph
to find out how a resource was used in the past and will be used in the future.
That is important to construct pipeline barriers that don't stall the whole GPU.
# Defined nodes
This implementation has nodes for:
- Blit image
- Clear color image
- Copy buffers to buffers
- Copy buffers to images
- Copy images to images
- Copy images to buffers
- Dispatch compute shader
- Fill buffers
- Synchronization
Each node has a node info, create info and data struct. The create info
contains all data to construct the node, including the links of the graph.
The data struct only contains the data stored inside the node. The node info
contains the node specific implementation.
> NOTE: Other nodes will be added after this PR lands to main.
# Resources
Before a render graph can be used, the resources should be registered
to `VKResourceStateTracker`. In the final implementation this will be owned by
the `VKDevice`. Registration of resources can be done by calling
`VKResources.add_buffer` or `VKResources.add_image`.
# Render graph
Nodes can be added to the render graph. When adding a node its read/
write dependencies are extracted and converted into links (`VKNodeInfo.
build_links`).
When the caller wants to have a resource up to date the functions
`VKRenderGraph.submit_for_read` or `VKRenderGraph.submit_for_present`
can be called.
These functions will select and order the nodes that are needed
and convert them to `vkCmd*` commands. These commands include pipeline
barrier and image layout transitions.
The `vkCmd` are recorded into a command buffer which is sent to the
device queue.
## Walking the graph
Walking the render graph isn't implemented yet. The idea is to have a
`Map<ResourceWithStamp, Vector<NodeHandle>> consumers` and
`Map<ResourceWithStamp, NodeHandle> producers`. These attributes can
be stored in the render graph and created when building the links, or
can be created inside the VKScheduler as a variable. The exact detail
which one would be better is unclear as there aren't any users yet. At
the moment the scheduler would need them we need to figure out the best
way to store and retrieve the consumers/producers.
# Unit tests
The render graph can be tested by enabling `WITH_GTEST` and use
`vk_render_graph` as a filter.
```
bin/tests/blender_test --gtest_filter="vk_render_graph*"
```
Pull Request: https://projects.blender.org/blender/blender/pulls/120427
Trying to narrow down why some tests are failing on windows, but not on
linux/mac. These tests use a string compare. This PR adds two modifications
to the current vk_to_string.
- use `std::endl` although not required, it is important to portability
- print vulkan handles in a uniform way.
Pull Request: https://projects.blender.org/blender/blender/pulls/120780