This commit adds the "is volume scatter" output to the light path node
in the shader editor.
All the functional code for this feature already exists in Cycles SVM
and OSL, but the output wasn't exposed on the node.
EEVEE does not support the feature, so its output will
always be zero.
Pull Request: https://projects.blender.org/blender/blender/pulls/134343
This patch adds the texture pool functionality that was previously
only available in the DRW module to the GPU module.
This removes the reliance on the global `DST` variable for the management
of these temporary textures.
Moreover, this can be extended with GPU-backend-specific behavior
to reduce the amount of memory needed to render.
The implementation is mostly copy-pasted from the draw implementation,
but with more documentation. It is also simplified, since the
`DRW_texture_pool_query` functionality is not needed.
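The reuse pattern of a texture pool can be sketched as follows. This is a minimal illustration, not the GPU module code: `TexturePoolSketch`, `TextureSketch` and the frame-based eviction in `reset()` are hypothetical, and a real pool would allocate GPU memory instead of a plain struct.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical stand-in for a GPU texture.
struct TextureSketch {
  int width;
  int height;
  int format;  // Hypothetical format id.
};

class TexturePoolSketch {
  struct Entry {
    std::unique_ptr<TextureSketch> texture;  // Heap allocated: pointers stay stable.
    bool in_use;
    int unused_frames;  // Resets seen since the texture was last acquired.
  };
  std::vector<Entry> entries_;

 public:
  // Reuse a released texture with matching specs, or allocate a new one.
  TextureSketch *acquire(int width, int height, int format)
  {
    for (Entry &entry : entries_) {
      const TextureSketch &tex = *entry.texture;
      if (!entry.in_use && tex.width == width && tex.height == height && tex.format == format) {
        entry.in_use = true;
        entry.unused_frames = 0;
        return entry.texture.get();
      }
    }
    entries_.push_back(
        Entry{std::make_unique<TextureSketch>(TextureSketch{width, height, format}), true, 0});
    return entries_.back().texture.get();
  }

  // Return a texture to the pool; its memory is kept for later reuse.
  void release(TextureSketch *texture)
  {
    for (Entry &entry : entries_) {
      if (entry.texture.get() == texture) {
        assert(entry.in_use && "texture released twice");
        entry.in_use = false;
        return;
      }
    }
    assert(false && "texture does not belong to this pool");
  }

  // Called once per frame: free textures unused for more than `max_unused_frames`.
  void reset(int max_unused_frames)
  {
    for (Entry &entry : entries_) {
      if (!entry.in_use) {
        entry.unused_frames++;
      }
    }
    entries_.erase(std::remove_if(entries_.begin(),
                                  entries_.end(),
                                  [&](const Entry &entry) {
                                    return !entry.in_use &&
                                           entry.unused_frames > max_unused_frames;
                                  }),
                   entries_.end());
  }

  size_t size() const { return entries_.size(); }
};
```

Storing each texture behind a `unique_ptr` keeps the returned pointers valid when the entry vector grows, which matters because callers hold on to acquired textures for the whole frame.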
Pull Request: https://projects.blender.org/blender/blender/pulls/134403
There seems to be a pattern where this commonly failed.
This patch adds the async flush (which is effectively not async)
when there was no previous call to `async_flush_to_host`.
This is only done on Intel Macs (or any Mac that has a non-unified
memory architecture).
Pull Request: https://projects.blender.org/blender/blender/pulls/134216
The removal of the loose uniform caused the shader to fail to compile.
This patch adds a new define for this type of shader and adds
back the loose uniform.
Note that these shaders might no longer work on Metal, as
the source is no longer parsed.
Pull Request: https://projects.blender.org/blender/blender/pulls/134341
Using sub-pixel differentials for bump mapping helps reduce
artifacts when objects are moving or when textures have high-frequency
details.
Currently we scale the differentials by 0.1 because it seems to work well
in practice; we can adjust the value in the future if it turns out to be
impractical.
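Why a smaller differential helps with high-frequency detail can be illustrated with a plain finite-difference slope estimate. This is a sketch only, not the Cycles bump node: `float3`, `height_slope` and `bump_filter_scale` are hypothetical names, and the 0.1 factor is the value mentioned above.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical minimal vector type for the sketch.
struct float3 {
  float x, y, z;
};

inline float3 operator+(float3 a, float3 b) { return {a.x + b.x, a.y + b.y, a.z + b.z}; }
inline float3 operator*(float3 a, float s) { return {a.x * s, a.y * s, a.z * s}; }

// Scale applied to the ray differential (0.1 as described above).
constexpr float bump_filter_scale = 0.1f;

// Finite-difference slope of the height along dPdx, sampling at
// P + scale * dPdx instead of the full pixel footprint P + dPdx.
// The result is normalized per unit of dPdx so different scales compare.
template<typename HeightFn>
float height_slope(HeightFn height, float3 P, float3 dPdx, float scale)
{
  assert(scale > 0.0f);
  const float h_center = height(P);
  const float h_offset = height(P + dPdx * scale);
  return (h_offset - h_center) / scale;
}
```

For a height such as `sin(10 x)` at `x = 0` with `dPdx = (1, 0, 0)`, the analytic slope is 10: the full-footprint estimate gives `sin(10)`, roughly -0.54, while the 0.1-scaled estimate gives `sin(1) / 0.1`, roughly 8.4.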
Ref: #122892
Pull Request: https://projects.blender.org/blender/blender/pulls/133991
The previous implementation used the resource state tracker, which is a hash
table lookup. `is_link_to_buffer` is a bit cheaper as it compares
already loaded data.
This PR also changes the resource locking when reordering render graph
nodes: reordering can be done without locking resources. No measurable
speedup was detected.
Pull Request: https://projects.blender.org/blender/blender/pulls/134032
This PR fixes an issue where shader compilation could stall. This could
be seen in the viewport (sometimes the first EEVEE render was not shown) but
was more prominent when running test cases.
Pull Request: https://projects.blender.org/blender/blender/pulls/134020
Native tile input wasn't part of the MTLCapability struct, but was stored locally
in the shader generator and checked in MTLFramebuffer. This PR moves it
to the MTLCapability struct and disables it when workarounds are forced.
Pull Request: https://projects.blender.org/blender/blender/pulls/133818
This allows running with the `--debug-gpu` option (which
clears textures with NaN and 0xF0F0F0F0) without asserts,
even when the texture atomic workaround is enabled.
The refactor 9c0321ae9b
had the wrong mental model of the backing texture
layout for the atomic workaround.
For 3D textures, the layout flattens the 3D texture
and reinterprets its linear location as a 2D
linear location. This breaks the Z slices of the 3D texture
into non-contiguous regions in 2D.
Comments have been added to avoid future confusion.
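The layout described above can be modelled with a small mapping function. This is a hypothetical sketch, not the backend code: `texel_3d_to_2d`, `Int2` and the 2D backing width `width_2d` are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstdint>

struct Int2 {
  int64_t x, y;
};

// Flatten a texel of a w x h x d texture to its linear index, then re-wrap
// that index into a 2D texture that is `width_2d` texels wide.
Int2 texel_3d_to_2d(int64_t x, int64_t y, int64_t z, int64_t w, int64_t h, int64_t width_2d)
{
  assert(x >= 0 && x < w && y >= 0 && y < h && z >= 0 && width_2d > 0);
  const int64_t linear = x + y * w + z * w * h;
  return {linear % width_2d, linear / width_2d};
}
```

With a 3x3xN texture stored 4 texels wide, slice `z = 1` starts at 2D `(1, 2)` and ends at `(1, 4)`: it begins mid-row, so a Z slice does not map to a rectangular region in the 2D texture.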
Pull Request: https://projects.blender.org/blender/blender/pulls/133830
VkBufferViews could be used after they were freed, because
they were not managed by the discard pool. Detected when looking into
failing render tests (pointcloud_motion.blend).
This part of the API is used by motion blur in EEVEE. Fixes the following
render tests:
- `eevee_next_motion_blur_vulkan`
- `eevee_next_pointcloud_vulkan`
- `eevee_next_hair_vulkan`
Related: #133546
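The idea behind a discard pool can be sketched as deferred destruction keyed on GPU progress. This is a minimal sketch: `DiscardPoolSketch` and the timeline-value bookkeeping are assumptions for illustration, and the destroy callback stands in for a call such as `vkDestroyBufferView`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <utility>
#include <vector>

// Instead of destroying a GPU object immediately, queue it together with the
// timeline value of the last submission that may still use it.
class DiscardPoolSketch {
  struct Pending {
    uint64_t timeline;
    std::function<void()> destroy;  // Would call e.g. vkDestroyBufferView.
  };
  std::vector<Pending> pending_;

 public:
  void discard(uint64_t last_used_timeline, std::function<void()> destroy)
  {
    assert(destroy);
    pending_.push_back({last_used_timeline, std::move(destroy)});
  }

  // Destroy only what the GPU is guaranteed to have finished using.
  void destroy_completed(uint64_t completed_timeline)
  {
    std::vector<Pending> still_pending;
    for (Pending &pending : pending_) {
      if (pending.timeline <= completed_timeline) {
        pending.destroy();
      }
      else {
        still_pending.push_back(std::move(pending));
      }
    }
    pending_ = std::move(still_pending);
  }

  size_t size() const { return pending_.size(); }
};
```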
Pull Request: https://projects.blender.org/blender/blender/pulls/133856
This was caused by the subpass input workaround for non-tile-based
GPUs using `texelFetch` on an `image`. This was supported before
the cleanup 9c0321ae9b,
but it goes against the GLSL specification and was removed in the
cleanup.
Using `imageLoad` instead of `texelFetch` fixes the crash.
However, rendering seems to be broken for other reasons.
In RenderDoc the debug stack got corrupted when render graphs were
reused, because the previous usage didn't clear the stack. This PR clears
the debug stack when render graphs are reset.
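A minimal sketch of the fix, assuming a render graph that keeps a stack of debug group names next to its nodes; `RenderGraphSketch` and its members are hypothetical.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical render graph that records debug group names (as shown in
// tools like RenderDoc) alongside its nodes.
struct RenderGraphSketch {
  std::vector<int> nodes;
  std::vector<std::string> debug_stack;

  void debug_group_begin(std::string name) { debug_stack.push_back(std::move(name)); }

  void debug_group_end()
  {
    assert(!debug_stack.empty());
    debug_stack.pop_back();
  }

  // Called before the graph is reused. Clearing the debug stack here keeps
  // unclosed groups from a previous use from leaking into the next one.
  void reset()
  {
    nodes.clear();
    debug_stack.clear();
  }
};
```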
Previously, there was a `StringRef.copy` method which would copy the string into
the given buffer. However, its behavior was not defined for the case when the
buffer was too small; it left the responsibility of making sure the buffer is
large enough to the caller.
Unfortunately, in practice that easily hides bugs in builds without asserts,
which don't come up much in testing. Now, the method is replaced with
`StringRef.copy_utf8_truncated`, which has much better defined semantics and
also makes sure that the string remains valid UTF-8.
This also renames `unsafe_copy` to `copy_unsafe` to make the naming more consistent
with `copy_utf8_truncated`.
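The truncation rule can be sketched as a free function. This is not Blender's `StringRef.copy_utf8_truncated` (its exact signature is not reproduced here); `copy_utf8_truncated_sketch` is hypothetical and assumes `src` is valid UTF-8.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string_view>

// Copy as much of `src` as fits into `dst` (including the null terminator)
// without splitting a multi-byte UTF-8 character.
void copy_utf8_truncated_sketch(std::string_view src, char *dst, size_t dst_size)
{
  assert(dst != nullptr && dst_size > 0);
  size_t n = src.size() < dst_size - 1 ? src.size() : dst_size - 1;
  // If the byte at the cut position is a continuation byte (0b10xxxxxx), the
  // cut would split a character: move back to the start of that character.
  while (n > 0 && n < src.size() && (static_cast<unsigned char>(src[n]) & 0xC0) == 0x80) {
    n--;
  }
  if (n > 0) {
    std::memcpy(dst, src.data(), n);
  }
  dst[n] = '\0';
}
```

For example, copying `"h\xC3\xA9llo"` ("héllo") into a 3-byte buffer yields `"h"` rather than `"h\xC3"`, which would be invalid UTF-8.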
Pull Request: https://projects.blender.org/blender/blender/pulls/133677
An attribute name can be a path built from multiple object/property names,
each of which can be 64 characters long.
This was fixed in cff53fdb53, so Cycles
can handle this, but EEVEE needs an additional change.
Pull Request: https://projects.blender.org/blender/blender/pulls/131183
Recently it came to our attention that macOS 13 doesn't always work, because
texture atomics are not supported by that version of the OS.
Development mostly happens on newer versions of the OS, without the
ability to check whether it still works on older versions.
This PR makes it possible to disable some Metal capabilities to better check
how Blender works on those OS versions. The capabilities that will be disabled
are texture gathering and texture atomics. It doesn't disable the
capabilities that are required to start Blender, which are still
part of the `MTLCapabilities` struct.
This allows us to reproduce issues like #129571.
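The gating logic can be sketched as follows. The struct and field names are hypothetical and only mirror the idea: optional features with fallback paths are switched off when workarounds are forced, while required ones are left alone.

```cpp
#include <cassert>

// Hypothetical subset of a capabilities struct.
struct CapabilitiesSketch {
  bool supports_required_feature = true;  // Needed to start at all: never disabled.
  bool supports_texture_gather = false;   // Optional: has a fallback path.
  bool supports_texture_atomics = false;  // Optional: has a fallback path.
};

// Detect capabilities, then optionally disable the optional ones so that the
// fallback paths used on older OS versions can be exercised on newer systems.
CapabilitiesSketch init_capabilities(bool device_has_gather,
                                     bool device_has_atomics,
                                     bool force_workarounds)
{
  CapabilitiesSketch caps;
  caps.supports_texture_gather = device_has_gather;
  caps.supports_texture_atomics = device_has_atomics;
  if (force_workarounds) {
    caps.supports_texture_gather = false;
    caps.supports_texture_atomics = false;
  }
  assert(caps.supports_required_feature);
  return caps;
}
```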
Pull Request: https://projects.blender.org/blender/blender/pulls/133636
This PR implements a new threading model for building render graphs,
based on tests performed last month. For our workload, multithreaded
command building will block in the driver or device, so it is better to
use a single thread for command building.
The internal workings are documented at https://developer.blender.org/docs/features/gpu/vulkan/render_graph/
- When a context is activated on a thread, the context asks for a
  render graph it can use by calling `VKDevice::render_graph_new`.
- Parts of the GPU backend that require GPU commands add a
  specific render graph node to the render graph. The nodes also
  contain a reference to all resources they need, including the
  access they need and the image layout.
- When the context is flushed, the render graph is submitted to the
  device by calling `VKDevice::render_graph_submit`.
- The device puts the render graph in `VKDevice::submission_pool`.
- A single background thread gets the next render
  graph to send to the GPU (`VKDevice::submission_runner`).
- The commands of the render graph are reordered to comply with
  Vulkan-specific command order rules and to reduce possible bottlenecks
  (`VKScheduler`).
- The required barriers are generated (`VKCommandBuilder::groups_extract_barriers`).
  This is a separate step to reduce resource locking, giving other
  threads access to the resource states while they are building
  render graph nodes.
- GPU commands and pipeline barriers are recorded to a VkCommandBuffer
  (`VKCommandBuilder::record_commands`).
- When completed, the command buffer can be submitted to the device
  queue (`vkQueueSubmit`).
- Render graphs that have been submitted can be reused by another
  thread. This is done by pushing the render graph to the
  `VKDevice::unused_render_graphs` queue.
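The flow above can be sketched with standard threads. The method and member names mirror the description, but the implementation is hypothetical: it uses `std::thread` and a condition variable, and stands in for reordering, barrier extraction and recording with a simple log of node ids.

```cpp
#include <cassert>
#include <condition_variable>
#include <deque>
#include <memory>
#include <mutex>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical, heavily simplified render graph: a list of node ids.
struct RenderGraph {
  std::vector<int> nodes;
};

class DeviceSketch {
  std::mutex mutex_;
  std::condition_variable cv_;
  std::deque<std::unique_ptr<RenderGraph>> submission_pool_;
  std::deque<std::unique_ptr<RenderGraph>> unused_render_graphs_;
  bool stop_ = false;
  std::thread runner_;

  // Single background thread: processes render graphs in submission order.
  void submission_runner()
  {
    for (;;) {
      std::unique_ptr<RenderGraph> graph;
      {
        std::unique_lock<std::mutex> lock(mutex_);
        cv_.wait(lock, [this] { return stop_ || !submission_pool_.empty(); });
        if (submission_pool_.empty()) {
          return;  // Stopping and nothing left to submit.
        }
        graph = std::move(submission_pool_.front());
        submission_pool_.pop_front();
      }
      // A real backend would reorder nodes, extract barriers, record a
      // command buffer and call vkQueueSubmit here; the sketch only logs.
      submitted.insert(submitted.end(), graph->nodes.begin(), graph->nodes.end());
      graph->nodes.clear();
      // Hand the graph back so another thread can reuse it.
      std::lock_guard<std::mutex> lock(mutex_);
      unused_render_graphs_.push_back(std::move(graph));
    }
  }

 public:
  std::vector<int> submitted;  // Written by the runner only; read after shutdown().

  DeviceSketch() { runner_ = std::thread(&DeviceSketch::submission_runner, this); }
  ~DeviceSketch() { shutdown(); }

  std::unique_ptr<RenderGraph> render_graph_new()
  {
    std::lock_guard<std::mutex> lock(mutex_);
    if (unused_render_graphs_.empty()) {
      return std::make_unique<RenderGraph>();
    }
    std::unique_ptr<RenderGraph> graph = std::move(unused_render_graphs_.back());
    unused_render_graphs_.pop_back();
    return graph;
  }

  void render_graph_submit(std::unique_ptr<RenderGraph> graph)
  {
    assert(graph);
    {
      std::lock_guard<std::mutex> lock(mutex_);
      assert(!stop_);
      submission_pool_.push_back(std::move(graph));
    }
    cv_.notify_one();
  }

  // Drain the pending submissions and join the runner thread.
  void shutdown()
  {
    {
      std::lock_guard<std::mutex> lock(mutex_);
      stop_ = true;
    }
    cv_.notify_all();
    if (runner_.joinable()) {
      runner_.join();
    }
  }
};
```

Because only one thread consumes the submission pool, graphs are processed in the order they were submitted, and the mutex is held only while moving graphs between queues, not while commands are built.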
Pull Request: https://projects.blender.org/blender/blender/pulls/132681
The Vulkan shader compiler accesses the cache folder from multiple threads.
The GHOST part isn't thread-safe and can overwrite a previously returned
cache path. This resulted in crashes when performing background
rendering, failing test cases, loading of incorrect shaders, etc.
This PR fixes this by caching the cache folder location in the
VKShaderCompiler, which is loaded on the main thread when the Vulkan
backend is initialized.
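The fix pattern can be sketched as copying the path once at initialization. `query_cache_dir_unsafe`, `ShaderCompilerSketch`, the example path and the `.spv` file naming are all hypothetical; the unsafe query only models a function that returns a pointer into a shared buffer.

```cpp
#include <cassert>
#include <string>
#include <thread>
#include <vector>

// Hypothetical stand-in for the non-thread-safe query: it returns a pointer
// into a shared static buffer that the next call may overwrite.
const char *query_cache_dir_unsafe()
{
  static std::string buffer;
  buffer = "/tmp/blender_cache";  // Hypothetical path.
  return buffer.c_str();
}

class ShaderCompilerSketch {
  // Copied once, on the main thread, when the backend is initialized.
  // Afterwards it is only read, so compile threads can share it safely.
  const std::string cache_dir_;

 public:
  ShaderCompilerSketch() : cache_dir_(query_cache_dir_unsafe()) {}

  std::string cache_file_path(const std::string &shader_hash) const
  {
    assert(!shader_hash.empty());
    return cache_dir_ + "/" + shader_hash + ".spv";
  }
};
```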
Pull Request: https://projects.blender.org/blender/blender/pulls/133535
Memory areas were requested to be preferably host visible. On some
platforms this would fail to allocate. It is best not to prefer
host-visible memory for typically large allocations.
This PR also gives the caller the responsibility to set the allocation flags.
Pull Request: https://projects.blender.org/blender/blender/pulls/133528
The issue was twofold: the `draw_tests` library was missing a link
dependency on `gpu_tests`, and `gpu_tests` would only be generated
if `WITH_GPU_BACKEND_TESTS` or `WITH_VULKAN_BACKEND` was also ON, due
to a superfluous condition.
Pull Request: https://projects.blender.org/blender/blender/pulls/133511
Cycles uses pixel buffers to update the display. To get things
working, the Vulkan backend downloaded the GPU-allocated pixel buffer to the
CPU, copied it to a GPU-allocated staging buffer and updated the display
texture using the staging buffer. Needless to say, a (CPU->)GPU->CPU->GPU
round trip is a bottleneck.
This PR fixes this by allowing the pixel buffer to act as a staging
buffer as well.
Viewport and final image rendering performance of the Vulkan backend is now
similar to OpenGL.
| **Render** | **GPU Backend** | **Path tracing** | **Display** |
| ---------- | --------------- | ---------------- | ----------- |
| Viewport | OpenGL | 2.7 | 0.06 |
| Viewport | Vulkan | 2.7 | 0.04 |
| Image | OpenGL | 3.9 | 0.02 |
| Image | Vulkan | 3.9 | 0.02 |
Tested on:
```
Operating system: Linux-6.8.0-49-generic-x86_64-with-glibc2.39 64 Bits, X11 UI
Graphics card: AMD Radeon Pro W7700 (RADV NAVI32) Advanced Micro Devices radv Mesa 24.3.1 - kisak-mesa PPA Vulkan Backend
```
Pull Request: https://projects.blender.org/blender/blender/pulls/133485
Color-to-grayscale conversions should take the color space into account,
and these are considered to be in the scene linear color space.
Note that the RGB to BW node implementation is used for implicit conversions,
so that is covered as well.
No change with the default configuration.
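A colorspace-aware conversion amounts to a weighted sum using the luminance weights of the scene linear space. This sketch assumes the weights are supplied by the color management configuration; `LumaCoefficients`, `rgb_to_grayscale` and the Rec.709 defaults are illustrative only.

```cpp
#include <cassert>
#include <cmath>

struct RGB {
  float r, g, b;
};

// Luminance weights of the scene linear color space. Assumed to come from
// the color management configuration; the Rec.709 values are an example.
struct LumaCoefficients {
  float r = 0.2126f;
  float g = 0.7152f;
  float b = 0.0722f;
};

// Weighted sum using the scene linear weights instead of fixed weights.
float rgb_to_grayscale(const RGB &color, const LumaCoefficients &luma)
{
  assert(luma.r >= 0.0f && luma.g >= 0.0f && luma.b >= 0.0f);
  return color.r * luma.r + color.g * luma.g + color.b * luma.b;
}
```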
Pull Request: https://projects.blender.org/blender/blender/pulls/133368
VKRenderGraphNode is 892 bytes, and most of the bytes are used by
specific node types. By storing large structs in separate vectors we can
reduce the needed memory and improve cache prefetching.
With this change VKRenderGraphNode is reduced to 64 bytes.
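The storage split can be sketched as a small node that only holds a type and an index into per-type vectors. This is not the actual VKRenderGraphNode layout; `NodeSketch`, `CopyImageData` and `RenderGraphStorageSketch` are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical large payload used by only one node type.
struct CopyImageData {
  uint64_t src_image;
  uint64_t dst_image;
  uint32_t regions[64];  // Stand-in for a large fixed-size array.
};

enum class NodeType : uint8_t { Draw, CopyImage };

// The node only stores its type and an index into the per-type storage.
struct NodeSketch {
  NodeType type;
  uint32_t data_index;
};
static_assert(sizeof(NodeSketch) <= 8, "node should stay small");

struct RenderGraphStorageSketch {
  std::vector<NodeSketch> nodes;
  std::vector<CopyImageData> copy_image_data;

  void add_copy_image(const CopyImageData &data)
  {
    nodes.push_back({NodeType::CopyImage, uint32_t(copy_image_data.size())});
    copy_image_data.push_back(data);
  }

  const CopyImageData &copy_image(const NodeSketch &node) const
  {
    assert(node.type == NodeType::CopyImage);
    return copy_image_data[node.data_index];
  }
};
```

Iterating `nodes` now touches only a few bytes per node, and the large payloads are only loaded for the node types that need them.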
On shader_balls.blend (50 frames), end-user performance improved by up to
2%.
| **Platform** | **Before** | **After** |
| ---------------- | ---------- | --------- |
| AMD W7700 | 1409 ms | 1383 ms |
| NVIDIA RTX 6000 | 1443 ms | 1428 ms |
Pull Request: https://projects.blender.org/blender/blender/pulls/133317
Images used to be tracked with ownership in order to reset swap chain
images to their original layout. This isn't used anymore, as we always mark
them as `VK_IMAGE_LAYOUT_UNDEFINED` to make the first pipeline barrier a
no-op.
This change reduces unneeded complexity and saves a few CPU cycles.
Pull Request: https://projects.blender.org/blender/blender/pulls/133197