test2/source/blender/gpu/vulkan/vk_framebuffer.cc

/* SPDX-FileCopyrightText: 2022 Blender Authors
*
* SPDX-License-Identifier: GPL-2.0-or-later */
/** \file
* \ingroup gpu
*/
#include "vk_framebuffer.hh"
#include "vk_backend.hh"
#include "vk_context.hh"
#include "vk_state_manager.hh"
#include "vk_texture.hh"
namespace blender::gpu {
/**
* The default load/store action, used when no explicit load/store has been set.
*/
constexpr GPULoadStore default_load_store()
{
return {GPU_LOADACTION_LOAD, GPU_STOREACTION_STORE, {0.0f, 0.0f, 0.0f, 0.0f}};
}
/* -------------------------------------------------------------------- */
/** \name Creation & Deletion
* \{ */
VKFrameBuffer::VKFrameBuffer(const char *name)
: FrameBuffer(name),
load_stores(GPU_FB_MAX_ATTACHMENT, default_load_store()),
attachment_states_(GPU_FB_MAX_ATTACHMENT, GPU_ATTACHMENT_WRITE)
{
size_set(1, 1);
srgb_ = false;
enabled_srgb_ = false;
}
VKFrameBuffer::~VKFrameBuffer()
{
VKContext *context = VKContext::get();
if (context && context->active_framebuffer_get() == this) {
context->deactivate_framebuffer();
}
render_pass_free();
}
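/**
* Discard the Vulkan render pass and frame-buffer handles owned by this frame-buffer. The handles
* can still be referenced by commands in flight, so they are handed to the discard pool for
* deferred destruction instead of being destroyed immediately.
*/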
void VKFrameBuffer::render_pass_free()
{
VKDiscardPool &discard_pool = VKDiscardPool::discard_pool_get();
if (vk_framebuffer != VK_NULL_HANDLE) {
discard_pool.discard_framebuffer(vk_framebuffer);
vk_framebuffer = VK_NULL_HANDLE;
}
if (vk_render_pass != VK_NULL_HANDLE) {
discard_pool.discard_render_pass(vk_render_pass);
vk_render_pass = VK_NULL_HANDLE;
}
}
/** \} */
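/**
* Bind this frame-buffer to the active context, resetting the viewport, scissor, load/store and
* attachment states.
*/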
void VKFrameBuffer::bind(bool enabled_srgb)
{
VKContext &context = *VKContext::get();
/* Updating attachments can issue pipeline barriers, which must be done outside a render pass.
* When done inside a render pass there would need to be a self-dependency between sub-passes on
* the active render pass. As the active render pass isn't aware of the new render pass (and
* should not be), it is better to deactivate it before updating the attachments. For more
* information check `VkSubpassDependency`. */
if (context.has_active_framebuffer()) {
context.deactivate_framebuffer();
}
context.activate_framebuffer(*this);
update_size();
viewport_reset();
scissor_reset();
enabled_srgb_ = enabled_srgb;
Shader::set_framebuffer_srgb_target(enabled_srgb && srgb_);
load_stores.fill(default_load_store());
attachment_states_.fill(GPU_ATTACHMENT_WRITE);
}
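/**
* Append the viewports of this frame-buffer to \a r_viewports: one per supported viewport when
* multi-viewport rendering is used, otherwise a single viewport.
*/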
void VKFrameBuffer::vk_viewports_append(Vector<VkViewport> &r_viewports) const
{
BLI_assert(r_viewports.is_empty());
for (int64_t index : IndexRange(this->multi_viewport_ ? GPU_MAX_VIEWPORTS : 1)) {
VkViewport viewport;
viewport.x = viewport_[index][0];
viewport.y = viewport_[index][1];
viewport.width = viewport_[index][2];
viewport.height = viewport_[index][3];
viewport.minDepth = 0.0f;
viewport.maxDepth = 1.0f;
r_viewports.append(viewport);
}
}
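/**
* Update \a render_area to the area affected by rendering commands: the scissor rectangle clamped
* to the frame-buffer bounds when scissor testing is enabled, otherwise the full frame-buffer.
*/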
void VKFrameBuffer::render_area_update(VkRect2D &render_area) const
{
if (scissor_test_get()) {
int scissor_rect[4];
scissor_get(scissor_rect);
render_area.offset.x = clamp_i(scissor_rect[0], 0, width_);
render_area.offset.y = clamp_i(scissor_rect[1], 0, height_);
render_area.extent.width = clamp_i(scissor_rect[2], 1, width_ - scissor_rect[0]);
render_area.extent.height = clamp_i(scissor_rect[3], 1, height_ - scissor_rect[1]);
}
else {
render_area.offset.x = 0;
render_area.offset.y = 0;
render_area.extent.width = width_;
render_area.extent.height = height_;
}
}
void VKFrameBuffer::vk_render_areas_append(Vector<VkRect2D> &r_render_areas) const
{
BLI_assert(r_render_areas.is_empty());
VkRect2D render_area;
render_area_update(render_area);
r_render_areas.append_n_times(render_area, this->multi_viewport_ ? GPU_MAX_VIEWPORTS : 1);
}
bool VKFrameBuffer::check(char err_out[256])
{
bool success = true;
if (has_gaps_between_color_attachments()) {
success = false;
BLI_snprintf(err_out,
256,
"Framebuffer '%s' has gaps between color attachments. This is not supported by "
"legacy devices using VkRenderPass natively.\n",
name_);
}
return success;
}
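/**
* Return true when a used color attachment slot comes after an unused slot. Such gaps cannot be
* expressed when using VkRenderPass natively.
*/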
bool VKFrameBuffer::has_gaps_between_color_attachments() const
{
bool empty_slot = false;
for (int attachment_index : IndexRange(GPU_FB_COLOR_ATTACHMENT0, GPU_FB_MAX_COLOR_ATTACHMENT)) {
const GPUAttachment &attachment = attachments_[attachment_index];
if (attachment.tex == nullptr) {
empty_slot = true;
}
else if (empty_slot) {
return true;
}
}
return false;
}
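/**
* Add a depth/stencil clear attachment for the planes selected in \a buffers to
* \a clear_attachments.
*/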
void VKFrameBuffer::build_clear_attachments_depth_stencil(
const eGPUFrameBufferBits buffers,
float clear_depth,
uint32_t clear_stencil,
render_graph::VKClearAttachmentsNode::CreateInfo &clear_attachments) const
{
VkImageAspectFlags aspect_mask = (buffers & GPU_DEPTH_BIT ? VK_IMAGE_ASPECT_DEPTH_BIT : 0) |
(buffers & GPU_STENCIL_BIT ? VK_IMAGE_ASPECT_STENCIL_BIT : 0);
VkClearAttachment &clear_attachment =
clear_attachments.attachments[clear_attachments.attachment_count++];
clear_attachment.aspectMask = aspect_mask;
clear_attachment.clearValue.depthStencil.depth = clear_depth;
clear_attachment.clearValue.depthStencil.stencil = clear_stencil;
clear_attachment.colorAttachment = 0;
}
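/**
* Add a color clear attachment for each used color slot. When \a multi_clear_colors is true every
* slot uses its own entry of \a clear_colors, otherwise the first entry is used for all slots.
*/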
void VKFrameBuffer::build_clear_attachments_color(
const float (*clear_colors)[4],
const bool multi_clear_colors,
render_graph::VKClearAttachmentsNode::CreateInfo &clear_attachments) const
{
int color_index = 0;
for (int color_slot = 0; color_slot < GPU_FB_MAX_COLOR_ATTACHMENT; color_slot++) {
const GPUAttachment &attachment = attachments_[GPU_FB_COLOR_ATTACHMENT0 + color_slot];
if (attachment.tex == nullptr) {
continue;
}
VkClearAttachment &clear_attachment =
clear_attachments.attachments[clear_attachments.attachment_count++];
clear_attachment.aspectMask = VK_IMAGE_ASPECT_COLOR_BIT;
clear_attachment.colorAttachment = color_slot;
eGPUDataFormat data_format = to_data_format(GPU_texture_format(attachment.tex));
clear_attachment.clearValue.color = to_vk_clear_color_value(data_format,
&clear_colors[color_index]);
color_index += multi_clear_colors ? 1 : 0;
}
}
/* -------------------------------------------------------------------- */
/** \name Clear
* \{ */
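/**
* Add a clear-attachments node to the render graph of the active context, ensuring a rendering
* scope is active first.
*/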
void VKFrameBuffer::clear(render_graph::VKClearAttachmentsNode::CreateInfo &clear_attachments)
{
VKContext &context = *VKContext::get();
rendering_ensure(context);
context.render_graph().add_node(clear_attachments);
}
void VKFrameBuffer::clear(const eGPUFrameBufferBits buffers,
const float clear_color[4],
float clear_depth,
uint clear_stencil)
{
render_graph::VKClearAttachmentsNode::CreateInfo clear_attachments = {};
render_area_update(clear_attachments.vk_clear_rect.rect);
clear_attachments.vk_clear_rect.baseArrayLayer = 0;
clear_attachments.vk_clear_rect.layerCount = 1;
if (buffers & (GPU_DEPTH_BIT | GPU_STENCIL_BIT)) {
VKContext &context = *VKContext::get();
eGPUWriteMask needed_mask = GPU_WRITE_NONE;
if (buffers & GPU_DEPTH_BIT) {
needed_mask |= GPU_WRITE_DEPTH;
}
if (buffers & GPU_STENCIL_BIT) {
needed_mask |= GPU_WRITE_STENCIL;
}
/* Clearing depth via #vkCmdClearAttachments requires a render pass with depth or stencil writes
* enabled. When not enabled, the clear should be done directly on the texture. */
/* WORKAROUND: Clearing the depth attachment doesn't work on official AMD drivers when dynamic
* rendering is used. See #129265. */
if ((context.state_manager_get().state.write_mask & needed_mask) == needed_mask &&
!GPU_type_matches(GPU_DEVICE_ATI, GPU_OS_ANY, GPU_DRIVER_OFFICIAL))
{
build_clear_attachments_depth_stencil(
buffers, clear_depth, clear_stencil, clear_attachments);
}
else {
VKTexture *depth_texture = unwrap(unwrap(depth_tex()));
if (depth_texture != nullptr) {
depth_texture->clear_depth_stencil(buffers, clear_depth, clear_stencil);
}
}
}
if (buffers & GPU_COLOR_BIT) {
float clear_color_single[4];
copy_v4_v4(clear_color_single, clear_color);
build_clear_attachments_color(&clear_color_single, false, clear_attachments);
}
if (clear_attachments.attachment_count) {
clear(clear_attachments);
}
}
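/** Clear all used color attachments, each with its own clear color. */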
void VKFrameBuffer::clear_multi(const float (*clear_color)[4])
{
render_graph::VKClearAttachmentsNode::CreateInfo clear_attachments = {};
render_area_update(clear_attachments.vk_clear_rect.rect);
clear_attachments.vk_clear_rect.baseArrayLayer = 0;
clear_attachments.vk_clear_rect.layerCount = 1;
build_clear_attachments_color(clear_color, true, clear_attachments);
if (clear_attachments.attachment_count) {
clear(clear_attachments);
}
}
void VKFrameBuffer::clear_attachment(GPUAttachmentType /*type*/,
eGPUDataFormat /*data_format*/,
const void * /*clear_value*/)
{
/* Clearing a single attachment was added to implement `clear_multi` in OpenGL. As Vulkan
* supports `clear_multi` directly, this method doesn't need to be implemented.
*/
BLI_assert_unreachable();
}
/** \} */
/* -------------------------------------------------------------------- */
/** \name Load/Store operations
* \{ */
void VKFrameBuffer::attachment_set_loadstore_op(GPUAttachmentType type, GPULoadStore ls)
{
load_stores[type] = ls;
}
static VkAttachmentLoadOp to_vk_attachment_load_op(eGPULoadOp load_op)
{
switch (load_op) {
case GPU_LOADACTION_DONT_CARE:
return VK_ATTACHMENT_LOAD_OP_DONT_CARE;
case GPU_LOADACTION_CLEAR:
return VK_ATTACHMENT_LOAD_OP_CLEAR;
case GPU_LOADACTION_LOAD:
return VK_ATTACHMENT_LOAD_OP_LOAD;
}
BLI_assert_unreachable();
return VK_ATTACHMENT_LOAD_OP_LOAD;
}
static VkAttachmentStoreOp to_vk_attachment_store_op(eGPUStoreOp store_op)
{
switch (store_op) {
case GPU_STOREACTION_DONT_CARE:
return VK_ATTACHMENT_STORE_OP_DONT_CARE;
case GPU_STOREACTION_STORE:
return VK_ATTACHMENT_STORE_OP_STORE;
}
BLI_assert_unreachable();
return VK_ATTACHMENT_STORE_OP_STORE;
}
static void set_load_store(VkRenderingAttachmentInfo &r_rendering_attachment,
const GPULoadStore &ls)
{
copy_v4_v4(r_rendering_attachment.clearValue.color.float32, ls.clear_value);
r_rendering_attachment.loadOp = to_vk_attachment_load_op(ls.load_action);
r_rendering_attachment.storeOp = to_vk_attachment_store_op(ls.store_action);
}
/** \} */
/* -------------------------------------------------------------------- */
/** \name Sub-pass transition
* \{ */
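/**
* Apply the requested attachment states for the next sub-pass. When
* `dynamic_rendering_local_read` is supported the read attachments are bound as images and
* rendering continues; otherwise the current rendering scope is ended and the read attachments
* are bound as textures.
*/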
void VKFrameBuffer::subpass_transition_impl(const GPUAttachmentState depth_attachment_state,
Span<GPUAttachmentState> color_attachment_states)
{
const VKDevice &device = VKBackend::get().device;
const bool supports_local_read = device.extensions_get().dynamic_rendering_local_read;
attachment_states_[GPU_FB_DEPTH_ATTACHMENT] = depth_attachment_state;
attachment_states_.as_mutable_span()
.slice(GPU_FB_COLOR_ATTACHMENT0, color_attachment_states.size())
.copy_from(color_attachment_states);
if (supports_local_read) {
VKContext &context = *VKContext::get();
for (int index : IndexRange(color_attachment_states.size())) {
if (color_attachment_states[index] == GPU_ATTACHMENT_READ) {
VKTexture *texture = unwrap(unwrap(color_tex(index)));
if (texture) {
context.state_manager_get().image_bind(texture, index);
}
}
}
if (is_rendering_) {
is_rendering_ = false;
load_stores.fill(default_load_store());
}
}
else {
VKContext &context = *VKContext::get();
if (is_rendering_) {
rendering_end(context);
/* TODO: this might need a better implementation:
* READ -> DONTCARE
* WRITE -> LOAD, STORE based on previous value.
* IGNORE -> DONTCARE -> IGNORE */
load_stores.fill(default_load_store());
}
for (int index : IndexRange(color_attachment_states.size())) {
if (color_attachment_states[index] == GPU_ATTACHMENT_READ) {
VKTexture *texture = unwrap(unwrap(color_tex(index)));
if (texture) {
context.state_manager_get().texture_bind(
texture, GPUSamplerState::default_sampler(), index);
}
}
}
}
}
/** \} */
/* -------------------------------------------------------------------- */
/** \name Read back
* \{ */
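/**
* Read back a region of a color or depth attachment into \a r_data. Only a single plane can be
* read per call.
*/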
void VKFrameBuffer::read(eGPUFrameBufferBits plane,
eGPUDataFormat format,
const int area[4],
int /*channel_len*/,
int slot,
void *r_data)
{
GPUAttachment *attachment = nullptr;
switch (plane) {
case GPU_COLOR_BIT:
attachment = &attachments_[GPU_FB_COLOR_ATTACHMENT0 + slot];
break;
case GPU_DEPTH_BIT:
attachment = attachments_[GPU_FB_DEPTH_ATTACHMENT].tex ?
&attachments_[GPU_FB_DEPTH_ATTACHMENT] :
&attachments_[GPU_FB_DEPTH_STENCIL_ATTACHMENT];
break;
default:
BLI_assert_unreachable();
return;
}
VKTexture *texture = unwrap(unwrap(attachment->tex));
BLI_assert_msg(texture,
"Trying to read back texture from framebuffer, but no texture is available in "
"requested slot.");
if (texture == nullptr) {
return;
}
const int region[6] = {area[0], area[1], 0, area[0] + area[2], area[1] + area[3], 1};
IndexRange layers(max_ii(attachment->layer, 0), 1);
texture->read_sub(0, format, region, layers, r_data);
}
/** \} */
/* -------------------------------------------------------------------- */
/** \name Blit operations
* \{ */
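/**
* Blit a single image aspect from \a src_texture into \a dst_texture. A texture copy is used
* when both textures share the same format and size and no destination offset is given.
*/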
static void blit_aspect(VKContext &context,
VKTexture &dst_texture,
VKTexture &src_texture,
int dst_offset_x,
int dst_offset_y,
VkImageAspectFlags image_aspect)
{
/* Prefer a texture copy, as some platforms don't support D32_SFLOAT_S8_UINT as a blit
* destination. */
if (dst_offset_x == 0 && dst_offset_y == 0 &&
dst_texture.device_format_get() == src_texture.device_format_get() &&
src_texture.width_get() == dst_texture.width_get() &&
src_texture.height_get() == dst_texture.height_get())
{
src_texture.copy_to(dst_texture, image_aspect);
return;
}
render_graph::VKBlitImageNode::CreateInfo blit_image = {};
blit_image.src_image = src_texture.vk_image_handle();
blit_image.dst_image = dst_texture.vk_image_handle();
blit_image.filter = VK_FILTER_NEAREST;
VkImageBlit &region = blit_image.region;
region.srcSubresource.aspectMask = image_aspect;
region.srcSubresource.mipLevel = 0;
region.srcSubresource.baseArrayLayer = 0;
region.srcSubresource.layerCount = 1;
region.srcOffsets[0].x = 0;
region.srcOffsets[0].y = 0;
region.srcOffsets[0].z = 0;
region.srcOffsets[1].x = src_texture.width_get();
region.srcOffsets[1].y = src_texture.height_get();
region.srcOffsets[1].z = 1;
region.dstSubresource.aspectMask = image_aspect;
region.dstSubresource.mipLevel = 0;
region.dstSubresource.baseArrayLayer = 0;
region.dstSubresource.layerCount = 1;
region.dstOffsets[0].x = clamp_i(dst_offset_x, 0, dst_texture.width_get());
region.dstOffsets[0].y = clamp_i(dst_offset_y, 0, dst_texture.height_get());
region.dstOffsets[0].z = 0;
region.dstOffsets[1].x = clamp_i(
dst_offset_x + src_texture.width_get(), 0, dst_texture.width_get());
region.dstOffsets[1].y = clamp_i(
dst_offset_y + src_texture.height_get(), 0, dst_texture.height_get());
region.dstOffsets[1].z = 1;
context.render_graph().add_node(blit_image);
}
void VKFrameBuffer::blit_to(eGPUFrameBufferBits planes,
int src_slot,
FrameBuffer *dst,
int dst_slot,
int dst_offset_x,
int dst_offset_y)
{
BLI_assert(dst);
BLI_assert_msg(ELEM(planes, GPU_COLOR_BIT, GPU_DEPTH_BIT),
"VKFrameBuffer::blit_to only supports a single color or depth aspect.");
UNUSED_VARS_NDEBUG(planes);
VKContext &context = *VKContext::get();
if (!context.has_active_framebuffer()) {
BLI_assert_unreachable();
return;
}
VKFrameBuffer &dst_framebuffer = *unwrap(dst);
if (planes & GPU_COLOR_BIT) {
const GPUAttachment &src_attachment = attachments_[GPU_FB_COLOR_ATTACHMENT0 + src_slot];
const GPUAttachment &dst_attachment =
dst_framebuffer.attachments_[GPU_FB_COLOR_ATTACHMENT0 + dst_slot];
if (src_attachment.tex && dst_attachment.tex) {
VKTexture &src_texture = *unwrap(unwrap(src_attachment.tex));
VKTexture &dst_texture = *unwrap(unwrap(dst_attachment.tex));
blit_aspect(context,
dst_texture,
src_texture,
dst_offset_x,
dst_offset_y,
VK_IMAGE_ASPECT_COLOR_BIT);
}
}
if (planes & GPU_DEPTH_BIT) {
/* Retrieve source texture. */
const GPUAttachment &src_attachment = attachments_[GPU_FB_DEPTH_STENCIL_ATTACHMENT].tex ?
attachments_[GPU_FB_DEPTH_STENCIL_ATTACHMENT] :
attachments_[GPU_FB_DEPTH_ATTACHMENT];
const GPUAttachment &dst_attachment =
dst_framebuffer.attachments_[GPU_FB_DEPTH_STENCIL_ATTACHMENT].tex ?
dst_framebuffer.attachments_[GPU_FB_DEPTH_STENCIL_ATTACHMENT] :
dst_framebuffer.attachments_[GPU_FB_DEPTH_ATTACHMENT];
if (src_attachment.tex && dst_attachment.tex) {
VKTexture &src_texture = *unwrap(unwrap(src_attachment.tex));
VKTexture &dst_texture = *unwrap(unwrap(dst_attachment.tex));
blit_aspect(context,
dst_texture,
src_texture,
dst_offset_x,
dst_offset_y,
VK_IMAGE_ASPECT_DEPTH_BIT);
}
}
}
/** \} */
/* -------------------------------------------------------------------- */
/** \name Update attachments
* \{ */
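/** Update the frame-buffer size from the mip size of the first attached texture. */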
void VKFrameBuffer::update_size()
{
if (!dirty_attachments_) {
return;
}
for (int i = 0; i < GPU_FB_MAX_ATTACHMENT; i++) {
GPUAttachment &attachment = attachments_[i];
if (attachment.tex) {
int size[3];
GPU_texture_get_mipmap_size(attachment.tex, attachment.mip, size);
size_set(size[0], size[1]);
return;
}
}
}
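/** Update `srgb_` based on the format of the first used color attachment. */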
void VKFrameBuffer::update_srgb()
{
for (int i : IndexRange(GPU_FB_MAX_COLOR_ATTACHMENT)) {
VKTexture *texture = unwrap(unwrap(color_tex(i)));
if (texture) {
srgb_ = (texture->format_flag_get() & GPU_FORMAT_SRGB) != 0;
return;
}
}
}
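/** Return the number of color attachment slots to reserve: the highest used slot plus one. */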
int VKFrameBuffer::color_attachments_resource_size() const
{
int size = 0;
for (int color_slot : IndexRange(GPU_FB_MAX_COLOR_ATTACHMENT)) {
if (color_tex(color_slot) != nullptr) {
size = max_ii(color_slot + 1, size);
}
}
return size;
}
/** \} */
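/** Mark this frame-buffer as not rendering, without recording any commands. */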
void VKFrameBuffer::rendering_reset()
{
is_rendering_ = false;
}
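/**
* Begin rendering using a native VkRenderPass/VkFramebuffer pair. This path is used on devices
* that don't support dynamic rendering.
*/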
void VKFrameBuffer::rendering_ensure_render_pass(VKContext &context)
{
render_pass_free();
depth_attachment_format_ = VK_FORMAT_UNDEFINED;
stencil_attachment_format_ = VK_FORMAT_UNDEFINED;
render_graph::VKResourceAccessInfo access_info;
Vector<VkAttachmentDescription> vk_attachment_descriptions;
Vector<VkAttachmentReference> color_attachments;
Vector<VkAttachmentReference> input_attachments;
Vector<VkImageView> vk_image_views;
uint32_t max_layer_count = 1;
/* Color attachments */
VkAttachmentReference depth_attachment_reference = {0u};
for (int color_attachment_index :
IndexRange(GPU_FB_COLOR_ATTACHMENT0, GPU_FB_MAX_COLOR_ATTACHMENT))
{
const GPUAttachment &attachment = attachments_[color_attachment_index];
if (attachment.tex == nullptr) {
continue;
}
VKTexture &color_texture = *unwrap(unwrap(attachment.tex));
BLI_assert_msg(color_texture.usage_get() & GPU_TEXTURE_USAGE_ATTACHMENT,
"Texture is used as an attachment, but doesn't have the "
"GPU_TEXTURE_USAGE_ATTACHMENT flag.");
GPUAttachmentState attachment_state = attachment_states_[color_attachment_index];
uint32_t layer_base = max_ii(attachment.layer, 0);
int layer_count = color_texture.layer_count();
if (attachment.layer == -1 && layer_count != 1) {
max_layer_count = max_ii(max_layer_count, layer_count);
}
VKImageViewInfo image_view_info = {
eImageViewUsage::Attachment,
IndexRange(layer_base,
layer_count != 1 ? max_ii(layer_count - layer_base, 1) : layer_count),
IndexRange(attachment.mip, 1),
{{'r', 'g', 'b', 'a'}},
false,
srgb_ && enabled_srgb_,
VKImageViewArrayed::DONT_CARE};
const VKImageView &image_view = color_texture.image_view_get(image_view_info);
// TODO: Use VK_IMAGE_LAYOUT_SHADER_READ_ONLY_OPTIMAL for readonly attachments.
VkImageLayout vk_image_layout = (attachment_state == GPU_ATTACHMENT_READ) ?
VK_IMAGE_LAYOUT_GENERAL :
VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL;
uint32_t attachment_reference = color_attachment_index - GPU_FB_COLOR_ATTACHMENT0;
/* The depth attachment should always come right after the last color attachment. If not, shaders
* cannot be reused between frame-buffers with and without a depth/stencil attachment. */
depth_attachment_reference.attachment = attachment_reference + 1;
VkAttachmentDescription vk_attachment_description = {};
vk_attachment_description.format = image_view.vk_format();
vk_attachment_description.samples = VK_SAMPLE_COUNT_1_BIT;
vk_attachment_description.initialLayout = vk_image_layout;
vk_attachment_description.finalLayout = vk_image_layout;
vk_attachment_descriptions.append(std::move(vk_attachment_description));
vk_image_views.append(image_view.vk_handle());
switch (attachment_state) {
case GPU_ATTACHMENT_WRITE: {
color_attachments.append({attachment_reference, vk_image_layout});
access_info.images.append(
{color_texture.vk_image_handle(),
VK_ACCESS_COLOR_ATTACHMENT_READ_BIT | VK_ACCESS_COLOR_ATTACHMENT_WRITE_BIT,
VK_IMAGE_ASPECT_COLOR_BIT,
layer_base});
break;
}
case GPU_ATTACHMENT_READ: {
input_attachments.append({attachment_reference, vk_image_layout});
access_info.images.append({color_texture.vk_image_handle(),
VK_ACCESS_COLOR_ATTACHMENT_READ_BIT,
VK_IMAGE_ASPECT_COLOR_BIT,
layer_base});
break;
}
case GPU_ATTACHMENT_IGNORE: {
input_attachments.append({VK_ATTACHMENT_UNUSED, VK_IMAGE_LAYOUT_UNDEFINED});
break;
}
}
}
/* Update the color attachment size attribute. This is used to generate the correct number of
* color blend states in the graphics pipeline. */
color_attachment_size = color_attachments.size();
/* Depth attachment */
bool has_depth_attachment = false;
for (int depth_attachment_index : IndexRange(GPU_FB_DEPTH_ATTACHMENT, 2)) {
const GPUAttachment &attachment = attachments_[depth_attachment_index];
if (attachment.tex == nullptr) {
continue;
}
has_depth_attachment = true;
bool is_stencil_attachment = depth_attachment_index == GPU_FB_DEPTH_STENCIL_ATTACHMENT;
VKTexture &depth_texture = *unwrap(unwrap(attachment.tex));
BLI_assert_msg(depth_texture.usage_get() & GPU_TEXTURE_USAGE_ATTACHMENT,
"Texture is used as an attachment, but doesn't have the "
"GPU_TEXTURE_USAGE_ATTACHMENT flag.");
VkImageAspectFlags depth_texture_aspect = to_vk_image_aspect_flag_bits(
depth_texture.device_format_get());
bool is_depth_stencil_attachment = depth_texture_aspect & VK_IMAGE_ASPECT_STENCIL_BIT;
VkImageLayout vk_image_layout = is_depth_stencil_attachment ?
VK_IMAGE_LAYOUT_DEPTH_STENCIL_ATTACHMENT_OPTIMAL :
VK_IMAGE_LAYOUT_DEPTH_ATTACHMENT_OPTIMAL;
GPUAttachmentState attachment_state = attachment_states_[GPU_FB_DEPTH_ATTACHMENT];
VkImageView depth_image_view = VK_NULL_HANDLE;
uint32_t layer_base = max_ii(attachment.layer, 0);
if (attachment_state == GPU_ATTACHMENT_WRITE) {
VKImageViewInfo image_view_info = {eImageViewUsage::Attachment,
IndexRange(layer_base, 1),
IndexRange(attachment.mip, 1),
{{'r', 'g', 'b', 'a'}},
is_stencil_attachment,
false,
VKImageViewArrayed::DONT_CARE};
depth_image_view = depth_texture.image_view_get(image_view_info).vk_handle();
}
VkAttachmentDescription vk_attachment_description = {};
vk_attachment_description.format = to_vk_format(depth_texture.device_format_get());
vk_attachment_description.samples = VK_SAMPLE_COUNT_1_BIT;
vk_attachment_description.initialLayout = vk_image_layout;
vk_attachment_description.finalLayout = vk_image_layout;
vk_attachment_descriptions.append(std::move(vk_attachment_description));
depth_attachment_reference.layout = vk_image_layout;
vk_image_views.append(depth_image_view);
access_info.images.append({depth_texture.vk_image_handle(),
VK_ACCESS_DEPTH_STENCIL_ATTACHMENT_READ_BIT |
VK_ACCESS_DEPTH_STENCIL_ATTACHMENT_WRITE_BIT,
is_stencil_attachment ?
static_cast<VkImageAspectFlags>(VK_IMAGE_ASPECT_DEPTH_BIT |
VK_IMAGE_ASPECT_STENCIL_BIT) :
static_cast<VkImageAspectFlags>(VK_IMAGE_ASPECT_DEPTH_BIT),
0});
VkFormat vk_format = to_vk_format(depth_texture.device_format_get());
depth_attachment_format_ = vk_format;
if (is_stencil_attachment) {
stencil_attachment_format_ = vk_format;
}
}
/* Sub-pass description. */
VkSubpassDescription vk_subpass_description = {};
vk_subpass_description.pipelineBindPoint = VK_PIPELINE_BIND_POINT_GRAPHICS;
vk_subpass_description.colorAttachmentCount = color_attachments.size();
vk_subpass_description.pColorAttachments = color_attachments.data();
vk_subpass_description.inputAttachmentCount = input_attachments.size();
vk_subpass_description.pInputAttachments = input_attachments.data();
if (has_depth_attachment) {
vk_subpass_description.pDepthStencilAttachment = &depth_attachment_reference;
}
VKDevice &device = VKBackend::get().device;
/* Render-pass create info. */
VkRenderPassCreateInfo vk_render_pass_create_info = {};
vk_render_pass_create_info.sType = VK_STRUCTURE_TYPE_RENDER_PASS_CREATE_INFO;
vk_render_pass_create_info.subpassCount = 1;
vk_render_pass_create_info.pSubpasses = &vk_subpass_description;
vk_render_pass_create_info.attachmentCount = vk_attachment_descriptions.size();
vk_render_pass_create_info.pAttachments = vk_attachment_descriptions.data();
vkCreateRenderPass(device.vk_handle(), &vk_render_pass_create_info, nullptr, &vk_render_pass);
debug::object_label(vk_render_pass, name_);
/* Frame buffer create info */
VkFramebufferCreateInfo vk_framebuffer_create_info = {};
vk_framebuffer_create_info.sType = VK_STRUCTURE_TYPE_FRAMEBUFFER_CREATE_INFO;
vk_framebuffer_create_info.renderPass = vk_render_pass;
vk_framebuffer_create_info.attachmentCount = vk_image_views.size();
vk_framebuffer_create_info.pAttachments = vk_image_views.data();
vk_framebuffer_create_info.width = width_;
vk_framebuffer_create_info.height = height_;
vk_framebuffer_create_info.layers = max_layer_count;
vkCreateFramebuffer(device.vk_handle(), &vk_framebuffer_create_info, nullptr, &vk_framebuffer);
debug::object_label(vk_framebuffer, name_);
/* Begin rendering */
render_graph::VKBeginRenderingNode::CreateInfo begin_rendering(access_info);
VkRenderPassBeginInfo &begin_info = begin_rendering.node_data.vk_render_pass_begin_info;
begin_info.sType = VK_STRUCTURE_TYPE_RENDER_PASS_BEGIN_INFO;
begin_info.renderPass = vk_render_pass;
begin_info.framebuffer = vk_framebuffer;
render_area_update(begin_info.renderArea);
context.render_graph().add_node(begin_rendering);
/* Load/store operations are not supported inside a render pass.
* Supporting them would require duplicating render passes and frame-buffers for suspend/resume
* rendering, and after suspension all graphics pipelines would need to be created using the
* resume handles. Due to command reordering it is unclear when this switch would need to be
* made, so the graphics pipelines would need to be doubled.
*
* This all adds a lot of complexity just to support clearing ops on legacy platforms. An easier
* solution is to use #vkCmdClearAttachments right after the begin rendering.
*/
if (use_explicit_load_store_) {
render_graph::VKClearAttachmentsNode::CreateInfo clear_attachments = {};
for (int attachment_index : IndexRange(GPU_FB_MAX_ATTACHMENT)) {
GPULoadStore &load_store = load_stores[attachment_index];
if (load_store.load_action != GPU_LOADACTION_CLEAR) {
continue;
}
bool is_depth = attachment_index < GPU_FB_COLOR_ATTACHMENT0;
if (is_depth) {
build_clear_attachments_depth_stencil(
GPU_DEPTH_BIT, load_store.clear_value[0], 0, clear_attachments);
}
else {
build_clear_attachments_color(&load_store.clear_value, false, clear_attachments);
}
}
if (clear_attachments.attachment_count != 0) {
render_area_update(clear_attachments.vk_clear_rect.rect);
clear_attachments.vk_clear_rect.baseArrayLayer = 0;
clear_attachments.vk_clear_rect.layerCount = 1;
context.render_graph().add_node(clear_attachments);
}
}
}
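/**
* Begin rendering using dynamic rendering, using local reads for input attachments when the
* device supports `dynamic_rendering_local_read`.
*/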
void VKFrameBuffer::rendering_ensure_dynamic_rendering(VKContext &context,
const VKExtensions &extensions)
{
const VKDevice &device = VKBackend::get().device;
const bool supports_local_read = device.extensions_get().dynamic_rendering_local_read;
depth_attachment_format_ = VK_FORMAT_UNDEFINED;
stencil_attachment_format_ = VK_FORMAT_UNDEFINED;
render_graph::VKResourceAccessInfo access_info;
render_graph::VKBeginRenderingNode::CreateInfo begin_rendering(access_info);
begin_rendering.node_data.vk_rendering_info.sType = VK_STRUCTURE_TYPE_RENDERING_INFO;
begin_rendering.node_data.vk_rendering_info.layerCount = 1;
render_area_update(begin_rendering.node_data.vk_rendering_info.renderArea);
color_attachment_formats_.clear();
for (int color_attachment_index :
IndexRange(GPU_FB_COLOR_ATTACHMENT0, GPU_FB_MAX_COLOR_ATTACHMENT))
{
const GPUAttachment &attachment = attachments_[color_attachment_index];
if (attachment.tex == nullptr) {
continue;
}
VKTexture &color_texture = *unwrap(unwrap(attachment.tex));
BLI_assert_msg(color_texture.usage_get() & GPU_TEXTURE_USAGE_ATTACHMENT,
"Texture is used as an attachment, but doesn't have the "
"GPU_TEXTURE_USAGE_ATTACHMENT flag.");
/* To support `gpu_Layer` we need to set the layerCount to the number of layers it can
* access.
*/
int layer_count = color_texture.layer_count();
if (attachment.layer == -1 && layer_count != 1) {
begin_rendering.node_data.vk_rendering_info.layerCount = max_ii(
begin_rendering.node_data.vk_rendering_info.layerCount, layer_count);
}
VkRenderingAttachmentInfo &attachment_info =
begin_rendering.node_data
.color_attachments[begin_rendering.node_data.vk_rendering_info.colorAttachmentCount++];
attachment_info.sType = VK_STRUCTURE_TYPE_RENDERING_ATTACHMENT_INFO;
VkImageView vk_image_view = VK_NULL_HANDLE;
uint32_t layer_base = max_ii(attachment.layer, 0);
GPUAttachmentState attachment_state = attachment_states_[color_attachment_index];
VkFormat vk_format = to_vk_format(color_texture.device_format_get());
if (attachment_state == GPU_ATTACHMENT_WRITE) {
VKImageViewInfo image_view_info = {
eImageViewUsage::Attachment,
IndexRange(layer_base,
layer_count != 1 ? max_ii(layer_count - layer_base, 1) : layer_count),
IndexRange(attachment.mip, 1),
{{'r', 'g', 'b', 'a'}},
false,
srgb_ && enabled_srgb_,
VKImageViewArrayed::DONT_CARE};
const VKImageView &image_view = color_texture.image_view_get(image_view_info);
vk_image_view = image_view.vk_handle();
vk_format = image_view.vk_format();
}
attachment_info.imageView = vk_image_view;
attachment_info.imageLayout = supports_local_read ? VK_IMAGE_LAYOUT_RENDERING_LOCAL_READ_KHR :
VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL;
set_load_store(attachment_info, load_stores[color_attachment_index]);
access_info.images.append(
{color_texture.vk_image_handle(),
VK_ACCESS_COLOR_ATTACHMENT_READ_BIT | VK_ACCESS_COLOR_ATTACHMENT_WRITE_BIT,
VK_IMAGE_ASPECT_COLOR_BIT,
layer_base});
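    /* Without VK_EXT_dynamic_rendering_unused_attachments, attachments that have no image view
     * must be reported as VK_FORMAT_UNDEFINED so graphics pipelines stay compatible with the
     * rendering info. */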
    color_attachment_formats_.append(
        (!extensions.dynamic_rendering_unused_attachments && vk_image_view == VK_NULL_HANDLE) ?
            VK_FORMAT_UNDEFINED :
            vk_format);
    begin_rendering.node_data.vk_rendering_info.pColorAttachments =
        begin_rendering.node_data.color_attachments;
  }
  color_attachment_size = color_attachment_formats_.size();
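  /* The depth and depth/stencil slots are mutually exclusive; only the first bound slot is
   * processed (note the `break` at the end of the loop body). */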
  for (int depth_attachment_index : IndexRange(GPU_FB_DEPTH_ATTACHMENT, 2)) {
    const GPUAttachment &attachment = attachments_[depth_attachment_index];
    if (attachment.tex == nullptr) {
      continue;
    }
    bool is_stencil_attachment = depth_attachment_index == GPU_FB_DEPTH_STENCIL_ATTACHMENT;
    VKTexture &depth_texture = *unwrap(unwrap(attachment.tex));
    BLI_assert_msg(depth_texture.usage_get() & GPU_TEXTURE_USAGE_ATTACHMENT,
                   "Texture is used as an attachment, but doesn't have the "
                   "GPU_TEXTURE_USAGE_ATTACHMENT flag.");
    bool is_depth_stencil_attachment = to_vk_image_aspect_flag_bits(
                                           depth_texture.device_format_get()) &
                                       VK_IMAGE_ASPECT_STENCIL_BIT;
    VkImageLayout vk_image_layout = is_depth_stencil_attachment ?
                                        VK_IMAGE_LAYOUT_DEPTH_STENCIL_ATTACHMENT_OPTIMAL :
                                        VK_IMAGE_LAYOUT_DEPTH_ATTACHMENT_OPTIMAL;
    GPUAttachmentState attachment_state = attachment_states_[GPU_FB_DEPTH_ATTACHMENT];
    VkImageView depth_image_view = VK_NULL_HANDLE;
    if (attachment_state == GPU_ATTACHMENT_WRITE) {
      VKImageViewInfo image_view_info = {eImageViewUsage::Attachment,
                                         IndexRange(max_ii(attachment.layer, 0), 1),
                                         IndexRange(attachment.mip, 1),
                                         {{'r', 'g', 'b', 'a'}},
                                         is_stencil_attachment,
                                         false,
                                         VKImageViewArrayed::DONT_CARE};
      depth_image_view = depth_texture.image_view_get(image_view_info).vk_handle();
    }
    VkFormat vk_format = (!extensions.dynamic_rendering_unused_attachments &&
                          depth_image_view == VK_NULL_HANDLE) ?
                             VK_FORMAT_UNDEFINED :
                             to_vk_format(depth_texture.device_format_get());
    /* TODO: We should be able to use a single attachment info and point both #pDepthAttachment
     * and #pStencilAttachment to the same struct. However, the stencil clear/load-store ops
     * might differ from the depth ones. */
    {
      VkRenderingAttachmentInfo &attachment_info = begin_rendering.node_data.depth_attachment;
      attachment_info.sType = VK_STRUCTURE_TYPE_RENDERING_ATTACHMENT_INFO;
      attachment_info.imageView = depth_image_view;
      attachment_info.imageLayout = vk_image_layout;
      set_load_store(attachment_info, load_stores[depth_attachment_index]);
      depth_attachment_format_ = vk_format;
      begin_rendering.node_data.vk_rendering_info.pDepthAttachment =
          &begin_rendering.node_data.depth_attachment;
    }
    if (is_stencil_attachment) {
      VkRenderingAttachmentInfo &attachment_info = begin_rendering.node_data.stencil_attachment;
      attachment_info.sType = VK_STRUCTURE_TYPE_RENDERING_ATTACHMENT_INFO;
      attachment_info.imageView = depth_image_view;
      attachment_info.imageLayout = vk_image_layout;
      set_load_store(attachment_info, load_stores[depth_attachment_index]);
      stencil_attachment_format_ = vk_format;
      begin_rendering.node_data.vk_rendering_info.pStencilAttachment =
          &begin_rendering.node_data.stencil_attachment;
    }
    access_info.images.append({depth_texture.vk_image_handle(),
                               VK_ACCESS_DEPTH_STENCIL_ATTACHMENT_READ_BIT |
                                   VK_ACCESS_DEPTH_STENCIL_ATTACHMENT_WRITE_BIT,
                               is_stencil_attachment ?
                                   static_cast<VkImageAspectFlags>(VK_IMAGE_ASPECT_DEPTH_BIT |
                                                                   VK_IMAGE_ASPECT_STENCIL_BIT) :
                                   static_cast<VkImageAspectFlags>(VK_IMAGE_ASPECT_DEPTH_BIT),
                               0});
    break;
  }
  context.render_graph().add_node(begin_rendering);
}

void VKFrameBuffer::rendering_ensure(VKContext &context)
{
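  /* Early exit when already rendering and the framebuffer state hasn't changed. */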
  if (!dirty_state_ && is_rendering_) {
    return;
  }
  if (is_rendering_) {
    rendering_end(context);
  }

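  /* In debug builds, validate the framebuffer configuration before rendering starts. */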
#ifndef NDEBUG
  if (G.debug & G_DEBUG_GPU) {
    char message[256];
    message[0] = '\0';
    BLI_assert_msg(this->check(message), message);
  }
#endif

  const VKExtensions &extensions = VKBackend::get().device.extensions_get();
  is_rendering_ = true;
  if (extensions.dynamic_rendering) {
    rendering_ensure_dynamic_rendering(context, extensions);
  }
  else {
    rendering_ensure_render_pass(context);
  }

  dirty_attachments_ = false;
  dirty_state_ = false;
}

VkFormat VKFrameBuffer::depth_attachment_format_get() const
{
  return depth_attachment_format_;
}

VkFormat VKFrameBuffer::stencil_attachment_format_get() const
{
  return stencil_attachment_format_;
}

Span<VkFormat> VKFrameBuffer::color_attachment_formats_get() const
{
  return color_attachment_formats_;
}

void VKFrameBuffer::rendering_end(VKContext &context)
{
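  /* With explicit load/store actions, rendering must still be started even when nothing was
   * drawn, so the requested load/store operations (e.g. clears) are recorded. */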
  if (!is_rendering_ && use_explicit_load_store_) {
    rendering_ensure(context);
  }

  if (is_rendering_) {
    const VKExtensions &extensions = VKBackend::get().device.extensions_get();
    render_graph::VKEndRenderingNode::CreateInfo end_rendering = {};
    end_rendering.vk_render_pass = VK_NULL_HANDLE;
    if (!extensions.dynamic_rendering) {
      BLI_assert(vk_render_pass);
      end_rendering.vk_render_pass = vk_render_pass;
    }
    context.render_graph().add_node(end_rendering);
    is_rendering_ = false;
  }
}

} // namespace blender::gpu