This makes no sense to have in the draw namespace.
Also take the opportunity for making the coordinates
a float2 and rename them to something more descriptive.
They are actually already some literals with the `f` suffix
that are in our shader codebase and we never had problem in
the past 5 years (or even 8 years).
So I think it is safe to do and improves convergence of codestyles.
Pull Request: https://projects.blender.org/blender/blender/pulls/137352
This is just the shader change.
It allows more freedom for the UI team to tweak the appearance.
The is not functional changes in this patch.
Rel #126334
This port is not so straightforward.
This shader is used in different configurations and is
available to python bindings. So we need to keep
compatibility with different attributes configurations.
This is why attributes are loaded per component and a
uniform sets the length of the component.
Since this shader can be used from both the imm and batch
API, we need to inject some workarounds to bind the buffers
correctly.
The end result is still less versatile than the previous
metal workaround (i.e.: more attribute fetch mode supported),
but it is also way less code.
### Limitations:
The new shader has some limitation:
- Both `color` and `pos` attributes need to be `F32`.
- Each attribute needs to be 4byte aligned.
- Fetch type needs to be `GPU_FETCH_FLOAT`.
- Primitive type needs to be `GPU_PRIM_LINES`, `GPU_PRIM_LINE_STRIP` or `GPU_PRIM_LINE_LOOP`.
- If drawing using an index buffer, it must contain no primitive restart.
Rel #127493
Co-authored-by: Jeroen Bakker <jeroen@blender.org>
Pull Request: https://projects.blender.org/blender/blender/pulls/129315
Add a new shader specifically for node sockets rather than using the
keyframe shader.
Motivation:
1. Allow easier addition of new socket shapes
2. Simplify socket drawing by avoiding special handling of multi-inputs
3. Support multi-inputs for all socket types (diamond, square, etc.)
The new shader is tweaked to look the same to the old ones.
**Comparison**
The biggest difference is that the multi socket is now more consistent
with the other sockets.
For single sockets there can be small size differences depending on zoom
level because the old socket shader always aligned the sockets to the
pixel grid. This could cause a bit of jiggling compared to the rest of
the node when slowly zooming. Therefore I left it out of the new shader
and it now scales strictly linear with the view.
**Multi Socket Types**
While there currently is no need for (.) internally, there are a few
obvious use-cases for multi-input field (diamond) sockets like
generalized math nodes with an arbitrary number of inputs (Add,
Multiply, Minimum etc.).
Co-authored-by: Jacques Lucke <jacques@blender.org>
Pull Request: https://projects.blender.org/blender/blender/pulls/119243
This removes the need for the geometry shader and the
workaround path for Metal.
Note that creating 2 batches for each stroke might become
a bottleneck in bigger scenes. But currently the bottleneck
is always be the fill algorithm. It can be optimized further
if needed.
Rel #127493
Pull Request: https://projects.blender.org/blender/blender/pulls/129274
VSE timeline, when many (hundreds/thousands) of thumbnails were visible, was
very slow to redraw. This PR makes them 3-10x faster to redraw, by stopping
doing things that are slow :) Part of #126087 thumbnail improvements task.
- No longer do mute semitransparency or corner rounding on the CPU, do it in
shader instead.
- Stop creating a separate GPU texture for each thumbnail, on every repaint,
and drawing each thumbnail as a separate draw call. Instead, put thumbnails
into a single texture atlas (using a simple shelf packing algorithm), and
draw them in batch, passing data via UBO. The atlas is still re-created every
frame, but that does not seem to be a performance issue. Thumbnails are
cropped horizontally based on how much of their parts are visible (e.g. a
narrow strip on screen), so realistically the atlas size is kinda
proportional to screen size, and ends up being just several megabytes of data
transfer between CPU -> GPU each frame.
On this Sprite Fright edit timeline view (612 visible thumbnails), time taken
to repaint the timeline window:
- Mac (M1 Max, Metal): 68.1ms -> 4.7ms
- Windows (Ryzen 5950X, RTX 3080Ti, OpenGL): 23.7ms -> 6.8ms
This also fixes a visual issue with thumbnails, where when strips are very
tall, the "rounded corners" that were poked right into the thumbnail bitmap
on the CPU were showing up due to actual bitmap being scaled up a lot.
Pull Request: https://projects.blender.org/blender/blender/pulls/126972
This PR introduces the concept of primitive expansion draws.
This allows to create a drawcall that will generate N amount of new
primitive for an original primitive in a `gpu::Batch`. The intent is to
phase out the use of geometry shader for this purpose.
This adds a new `Frequency::GEOMETRY` only available for SSBOs.
The resources using this will be fed the current `gpu::Batch` VBOs
using name matching.
A dedicated slot is reserved for the index buffer, which has its own
internal lib to decode the index buffer content.
A new attribute lib is added to ease the loading of unaligned attribute.
This should be revisited and made obsolete once more refactor
lands.
It is similar to the Metal backend SSBO vertex fetch path but it is
defined on a different level. The main difference is that this PR is
backend independant and modify the draw module instead of the GPU
module. However, it doesn't cover all possible attribute conversion
cases. This will only be added if needed.
This system is less automatic than the Metal backend one and needs
more care to make sure the data matches what the shader expects.
The Metal system will be removed once all its usage have been
converted.
This PR only shows example usage for workbench shadows. Cleanup PRs
will follow this one.
Rel #105221
Pull Request: https://projects.blender.org/blender/blender/pulls/125782
This allows much easier debugging of shader programs.
Usage is as simple as adding `printf` calls inside shaders.
example: `printf("Formating %d\n", my_var);`
Contrary to the `drw_print`, this is not limited
to draw manager shader dispatch/draws. It is compatible
with any shader inside blender.
Most notably, this doesn't need a viewport to display.
So this can be used to debug render pipeline.
Data formating is currently limited to only `%x`, `%d`,
`%u` and `%f`. This could be easily extended if this is
really needed.
There is no type checking, so values are directly reinterpreted
as specified by the printf format.
The current approach for making this work is to bind a
storage buffer inside `GPU_shader_bind`, making it
available to any shader that needs it. The storage buffer
is downloaded back to CPU after a frame or a render
step and the content printed to the console.
This scheduling means that you cannot rely on these printfs
to detect crashes. We could add a mode to force flushing
at shader binding to avoid this limitation.
The values are written from the shaders in binary form and
only formated on the CPU. This avoid issues with manual
printing like with `drw_print`.
Pull Request: https://projects.blender.org/blender/blender/pulls/125071
This adds support for attaching gizmos for input values. The goal is to make it
easier for users to set input values intuitively in the 3D viewport.
We went through multiple different possible designs until we settled on the one
implemented here. We picked it for it's flexibility and ease of use when using
geometry node assets. The core principle in the design is that **gizmos are
attached to existing input values instead of being the input value themselves**.
This actually fits the existing concept of gizmos in Blender well, but may be a
bit unintutitive in a node setup at first. The attachment is done using links in
the node editor.
The most basic usage of the node is to link a Value node to the new Linear Gizmo
node. This attaches the gizmo to the input value and allows you to change it
from the 3D view. The attachment is indicated by the gizmo icon in the sockets
which are controlled by a gizmo as well as the back-link (notice the double
link) when the gizmo is active.
The core principle makes it straight forward to control the same node setup from
the 3D view with gizmos, or by manually changing input values, or by driving the
input values procedurally.
If the input value is controlled indirectly by other inputs, it's often possible
to **automatically propagate** the gizmo to the actual input.
Backpropagation does not work for all nodes, although more nodes can be
supported over time.
This patch adds the first three gizmo nodes which cover common use cases:
* **Linear Gizmo**: Creates a gizmo that controls a float or integer value using
a linear movement of e.g. an arrow in the 3D viewport.
* **Dial Gizmo**: Creates a circular gizmo in the 3D viewport that can be
rotated to change the attached angle input.
* **Transform Gizmo**: Creates a simple gizmo for location, rotation and scale.
In the future, more built-in gizmos and potentially the ability for custom
gizmos could be added.
All gizmo nodes have a **Transform** geometry output. Using it is optional but
it is recommended when the gizmo is used to control inputs that affect a
geometry. When it is used, Blender will automatically transform the gizmos
together with the geometry that they control. To achieve this, the output should
be merged with the generated geometry using the *Join Geometry* node. The data
contained in *Transform* output is not visible geometry, but just internal
information that helps Blender to give a better user experience when using
gizmos.
The gizmo nodes have a multi-input socket. This allows **controlling multiple
values** with the same gizmo.
Only a small set of **gizmo shapes** is supported initially. It might be
extended in the future but one goal is to give the gizmos used by different node
group assets a familiar look and feel. A similar constraint exists for
**colors**. Currently, one can choose from a fixed set of colors which can be
modified in the theme settings.
The set of **visible gizmos** is determined by a multiple factors because it's
not really feasible to show all possible gizmos at all times. To see any of the
geometry nodes gizmos, the "Active Modifier" option has to be enabled in the
"Viewport Gizmos" popover. Then all gizmos are drawn for which at least one of
the following is true:
* The gizmo controls an input of the active modifier of the active object.
* The gizmo controls a value in a selected node in an open node editor.
* The gizmo controls a pinned value in an open node editor. Pinning works by
clicking the gizmo icon next to the value.
Pull Request: https://projects.blender.org/blender/blender/pulls/112677
* Better separation between drawing backdrop and main line.
* Pass u and v coordinates of line to fragment shader for further processing.
* Remove `colorGradient` which can also be computed from `lineUV`.
* Simplify drawing potentially more than one parallel line as is done in #112677.
VSE timeline strips now have rounded corners. Strip corner rounding radius is
4, 6 or 8px depending on strip height (if strip is too narrow to fit
rounding, then rounding is turned off).
This is achieved with a dedicated GPU shader for drawing most of VSE
strip widget, that it could do proper rounded corner masking.
More details and images in the PR.
Pull Request: https://projects.blender.org/blender/blender/pulls/122576
Overlay texts were previously drawn with two sets of shadows:
- 3px blur,
- 5px blur, slightly offset
But since the shadow color was always set to black, it was still
causing legibility issues when the text itself was dark (set
via theme for example).
This PR adds a new "outline" BLF text decoration, and uses that
for the overlays. And it picks text/outline color depending
on the "background" color of the view.
Details:
- Instead of "shadow level" integer where the only valid options
are 0, 3 or 5, have a FontShadowType enum.
- Add a new FontShadowType::Outline enum entry, that does a 1px
outline by doing a 3x3 dilation in the font shader.
- BLF_draw_default_shadowed is changed to do outline, instead of
drawing the shadow twice.
- In the font shader, instead of encoding shadow type in signs of
the glyph_size, pass that as a "flags" vertex attribute. Put
font texture channel count into the same flags, so that the
vertex size stays the same.
- Well actually, vertex size becomes smaller by 4 bytes, since turns
out glyph_mode vertex attribute was not used for anything at all.
Images in the PR.
Co-authored-by: Harley Acheson <harley.acheson@gmail.com>
Pull Request: https://projects.blender.org/blender/blender/pulls/121383
- Expected results should come before actual result.
- Add test case for 8192 bytes as apple has a push constants size of 4096.
- Add more variation to the first order test data.
Improvements detected when working on vulkan backend and validated they
work on metal and opengl as well.
Pull Request: https://projects.blender.org/blender/blender/pulls/120557
Resolves an issue with stroke rendering in
Metal using the geometry shader fallback
path. Stroke rendering now matches OpenGL
which should enable the GPencil fill tool to
function correctly at all zoom levels.
Authored by Apple: Michael Parkin-White
Pull Request: https://projects.blender.org/blender/blender/pulls/119660
Simplifies/optimizes the "font" shader. It runs faster now too, but primarily
this is so that it loads/initializes faster.
* Instead of doing blur via individual bilinear samples (where each sample is 4
texel fetches), do raw texel fetches of the kernel footprint and compute final
result by shifting the kernel weights according to bilinear fraction weight.
For 5x5 blur, this reduces number of texel fetches from 64 down to 36.
* Instead of checking "is the texel inside the glyph box? if so, then fetch it",
first fetch it, and then set result to zero if it was outside. Simplifies the
branching code flow in the compiled GPU shader.
* Avoid costly integer modulo/division for "unwrapping" the font texture. The
texture width is always power of two size, so division/modulo can be replaced
by masking and a shift. Setup uniforms to contain the needed data.
### Fixes
* The 3x3 blur was not doing a 3x3 blur, due to a copy-pasta typo (one of the
sample offsets was repeated twice, and thus another sample offset was
missing).
* Blur towards left/top edges of the glyphs had artifacts, because float->int
casting in GLSL rounds towards zero, but the code actually wanted to round
towards floor.
Image of how the blur has changed in the PR.
### First time initialization
* Windows 10, NVIDIA RTX 3080Ti, OpenGL: 274.4ms -> 51.3ms
* macOS, Apple M1 Max, Metal: 456ms -> 289ms (this is including PSO creation
time).
### Shader performance/complexity
Performance I only measured on macOS (M1 Max), by making a BLF text that is
scaled up to cover most of screen via Python. Using Xcode Metal profiler,
drawing that text with 5x5 shadow blur: 1.5ms -> 0.3ms.
More performance analysis details in PR.
Pull Request: https://projects.blender.org/blender/blender/pulls/119653
This optimizes a few loops that become significant bottlenecks during
viewport rendering of scenes with large numbers of curves.
To render a curves object, Blender needs to generate a potentially
very large (but trivial) index buffer. As previously implemented,
this index buffer is generated in an extremely inefficient manner,
with a single-threaded loop and an explicit function call per entry.
The buffer then needs to be pushed onto the GPU, which is also a fairly
slow task.
The PR generates the index buffer directly on the GPU with compute
shader.
Pull Request: https://projects.blender.org/blender/blender/pulls/116617
Adds API to allow usage of specialization constants in shaders.
Specialization constants are dynamic runtime constants which can
be compiled into a shader pipeline state object (PSO) to improve
runtime performance by reducing shader complexity through
shader compiler constant-folding.
This API allows specialization constant values to be specified
along with a default value if no constant value has been declared.
Each GPU backend is then responsible for caching PSO permutations
against the current specialization configuration.
This patch adds support for specialization constants in the
Metal backend and provides a generalised high-level solution
which can be adopted by other graphics APIs supporting
this feature.
Authored by Apple: Michael Parkin-White
Authored by Blender: Clément Foucault (files in gpu/test folder)
Pull Request: https://projects.blender.org/blender/blender/pulls/115193
This modify the GBuffer layout to store less bits per closures.
This allows packing all closures into 64 bits or 96 bits.
In turn, this reduces the amount of data stored for most
usual materials.
Moreover, this contain some groundwork for the getting rid of the
hard-coded closure type. But evaluation shaders still use
the hard-coded types.
This adds tests for checking packing and unpacking of the gbuffer
doesn't loose any data.
Related to #115966
Pull Request: https://projects.blender.org/blender/blender/pulls/116476
Improvements to the drawing of shadows, used with blocks, menus, nodes,
etc. Improvements to shape, especially at the top corner or at extremes
of widget roundness. Allows transparent objects to have shadows. This
is a nice refactor that removes a lot of code.
Pull Request: https://projects.blender.org/blender/blender/pulls/111794
This adds correct object bounds estimation.
This works by creating an occupancy texture where one
bit represents one froxel. A geometry pre-pass fill this
occupancy texture and doesn't do any shading. Each bit
set to 0 will not be considered occupied by the object
volume and will discard the material compute shader for
this froxel.
There is 2 method of computing the occupancy map:
- Atomic XOR: For each fragment we compute the amount of
froxels **center** in-front of it. We then convert that
into occupancy bitmask that we apply to the occupancy
texture using `imageAtomicXor`. This is straight forward
and works well for any manifold geometry.
- Hit List: For each fragment we write the fragment depth
in a list (contained in one array texture). This list
is then processed by a fullscreen pass (see
`eevee_occupancy_convert_frag.glsl`) that sorts and
converts all the hits to the occupancy bits. This
emulate Cycles behavior by considering only back-face
hits as exit events and front-face hits as entry events.
The result stores it to the occupancy texture using
bit-wise `OR` operation to compose it with other non-hit
list objects. This also decouple the hit-list evaluation
complexity from the material evaluation shader.
## Limitations
### Fast
- Non-manifolds geometry objects are rendered incorrectly.
- Non-manifolds geometry objects will affect other objects
in front of them.
### Accurate
- Limited to 16 hits per layer for now.
- Non-manifolds geometry objects will affect other objects
in front of them.
Pull Request: https://projects.blender.org/blender/blender/pulls/113731
Adjust the width, dash length and amount of anti-aliasing of node links
so they look the same independent of the UI scaling.
Adding another parameter to the shader exceeded the limit of 16
attributes. Therefore the parameters to describe the dashes (length,
factor, alpha) are passed in together as a vector.
Ref #102919
Pull Request: https://projects.blender.org/blender/blender/pulls/111270
Splitting interface stages based on the interpolation of its
attributes.
Then naming convention that have been used:
* use `interp` as instance name when using smooth interpolation
* use `interp_noperspective` when using no perpective interpolation
* use `interp_flat` when using flat interpolation
The same suffix will be added to the struct names.
The naming convention will be added to the GLSL code style and
applied to other shaders as well.
Pull Request: https://projects.blender.org/blender/blender/pulls/111210
Listing the "Blender Foundation" as copyright holder implied the Blender
Foundation holds copyright to files which may include work from many
developers.
While keeping copyright on headers makes sense for isolated libraries,
Blender's own code may be refactored or moved between files in a way
that makes the per file copyright holders less meaningful.
Copyright references to the "Blender Foundation" have been replaced with
"Blender Authors", with the exception of `./extern/` since these this
contains libraries which are more isolated, any changed to license
headers there can be handled on a case-by-case basis.
Some directories in `./intern/` have also been excluded:
- `./intern/cycles/` it's own `AUTHORS` file is planned.
- `./intern/opensubdiv/`.
An "AUTHORS" file has been added, using the chromium projects authors
file as a template.
Design task: #110784
Ref !110783.
Add a High Dynamic Range option in the Color Management > Display panel.
This enables display of extended color ranges above 1.0 for the 3D
viewport, image editor and render previews.
This requires a monitor that can display HDR colors, and a view
transform designed for HDR output. The Standard view transform works,
but Filmic does not as it was designed to bring values into the 0..1
range for SDR displays.
This patch is limited to allowing the display to visualize extended
colors, but does not include future looking work to better integrate HDR
into the full workflow.
It is implemented by rendering to high bit-depth texture formats for
the user interface, and uncapping the color range in color management.
Authored by Apple: Michael Parkin-White
Pull Request: https://projects.blender.org/blender/blender/pulls/105662
A lot of files were missing copyright field in the header and
the Blender Foundation contributed to them in a sense of bug
fixing and general maintenance.
This change makes it explicit that those files are at least
partially copyrighted by the Blender Foundation.
Note that this does not make it so the Blender Foundation is
the only holder of the copyright in those files, and developers
who do not have a signed contract with the foundation still
hold the copyright as well.
Another aspect of this change is using SPDX format for the
header. We already used it for the license specification,
and now we state it for the copyright as well, following the
FAQ:
https://reuse.software/faq/
The goal is to solve confusion of the "All rights reserved" for licensing
code under an open-source license.
The phrase "All rights reserved" comes from a historical convention that
required this phrase for the copyright protection to apply. This convention
is no longer relevant.
However, even though the phrase has no meaning in establishing the copyright
it has not lost meaning in terms of licensing.
This change makes it so code under the Blender Foundation copyright does
not use "all rights reserved". This is also how the GPL license itself
states how to apply it to the source code:
<one line to give the program's name and a brief idea of what it does.>
Copyright (C) <year> <name of author>
This program is free software ...
This change does not change copyright notice in cases when the copyright
is dual (BF and an author), or just an author of the code. It also does
mot change copyright which is inherited from NaN Holding BV as it needs
some further investigation about what is the proper way to handle it.
**What are push constants?**
Push constants is a way to quickly provide a small amount of uniform data to shaders.
It should be much quicker than UBOs but a huge limitation is the size of data - spec
requires 128 bytes to be available for a push constant range.
**What are the challenges with push constants?**
The challenge with push constants is that the limited available size. According to
the Vulkan spec each platform should at least have 128 bytes reserved for push
constants. Current Mesa/AMD drivers supports 256 bytes, but Mesa/Intel is only 128
bytes.
**What is our solution?**
Some shaders of Blender uses more than these boundaries. When more data is needed
push constants will not be used, but the shader will be patched to use an uniform
buffer instead. This mechanism will be part of the Vulkan backend and shader
developers should not see any difference on API level.
**Known limitations**
Current state of the vulkan backend does not track resources that are in the
command queue. This patch includes some test cases that identified this issue as
well. See #104771.
Pull Request #104880
GPencil 3D stroke rendering uses a geometry shader.
This is unsupported by the Metal backend, so implement
fix for this failing compilation by shifting geometry shader
logic into the Vertex shader for Metal backend.
Authored by Apple: Michael Parkin-White
Ref #96261
Pull Request #105143
Resolves issue with nearest filtering on UI Icons. Note that as
Metal does not support LOD bias as a parameter on a sampler
object, the original code has been modified to perform LOD
biasing at the shader level.
As GPU_SAMPLER_ICON is not widely used, it is more
efficient to apply directly to the affected shaders, rather
than workaround passing in the sampler LOD bias as a
separate value e.g. uniform or push constant.
Original PR feedback addressed to also refactor ICON
shaders to use consistent style for single and multi
Icon rendering.
Authored by Apple: Michael Parkin-White
Ref #96261
Pull Request #105145
Cycles fallback display shader previously did not use viewport.
This would crash or cause the display not to show when using
GPU backends other than OpenGL, if another display shader
was unavailable.
Now use ShaderCreateInfo for Cycles fallback display.
Authored by Apple: Michael Parkin-White
Ref #96261
Pull Request #104987
The stoke shader of grease pencil uses a geometry shader stage. Apple
devices don't support shaders with geometry shader stage. In the
OpenGL driver there was a pass-through implemented so it didn't fail.
When using the metal backend this needs to be solved more explicitly.
This change patches the grease pencil shader to support both the
backends supporting a geometry stage and those without.
Fixes#105059
Pull Request #105116
This patch adds initial support for compute shaders to
the vulkan backend. As the development is oriented to the test-
cases we have the implementation is limited to what is used there.
It has been validated that with this patch that the following test
cases are running as expected
- `GPUVulkanTest.gpu_shader_compute_vbo`
- `GPUVulkanTest.gpu_shader_compute_ibo`
- `GPUVulkanTest.gpu_shader_compute_ssbo`
- `GPUVulkanTest.gpu_storage_buffer_create_update_read`
- `GPUVulkanTest.gpu_shader_compute_2d`
This patch includes:
- Allocating VkBuffer on device.
- Uploading data from CPU to VkBuffer.
- Binding VkBuffer as SSBO to a compute shader.
- Execute compute shader and altering VkBuffer.
- Download the VkBuffer to CPU ram.
- Validate that it worked.
- Use device only vertex buffer as SSBO
- Use device only index buffer as SSBO
- Use device only image buffers
GHOST API has been changed as the original design was created before
we even had support for compute shaders in blender. The function
`GHOST_getVulkanBackbuffer` has been separated to retrieve the command
buffer without a backbuffer (`GHOST_getVulkanCommandBuffer`). In order
to do correct command buffer processing we needed access to the queue
owned by GHOST. This is returned as part of the `GHOST_getVulkanHandles`
function.
Open topics (not considered part of this patch)
- Memory barriers & command buffer encoding
- Indirect compute dispatching
- Rest of the test cases
- Data conversions when requested data format is different than on device.
- GPUVulkanTest.gpu_shader_compute_1d is supported on AMD devices.
NVIDIA doesn't seem to support 1d textures.
Pull-request: #104518