This is a new implementation of the draw manager using modern
rendering practices and GPU driven culling.
This only ports features that are not considered deprecated or to be
removed.
The old DRW API is kept working along side this new one, and does not
interfeer with it. However this needed some more hacking inside the
draw_view_lib.glsl. At least the create info are well separated.
The reviewer might start by looking at `draw_pass_test.cc` to see the
API in usage.
Important files are `draw_pass.hh`, `draw_command.hh`,
`draw_command_shared.hh`.
In a nutshell (for a developper used to old DRW API):
- `DRWShadingGroups` are replaced by `Pass<T>::Sub`.
- Contrary to DRWShadingGroups, all commands recorded inside a pass or
sub-pass (even binds / push_constant / uniforms) will be executed in order.
- All memory is managed per object (except for Sub-Pass which are managed
by their parent pass) and not from draw manager pools. So passes "can"
potentially be recorded once and submitted multiple time (but this is
not really encouraged for now). The only implicit link is between resource
lifetime and `ResourceHandles`
- Sub passes can be any level deep.
- IMPORTANT: All state propagate from sub pass to subpass. There is no
state stack concept anymore. Ensure the correct render state is set before
drawing anything using `Pass::state_set()`.
- The drawcalls now needs a `ResourceHandle` instead of an `Object *`.
This is to remove any implicit dependency between `Pass` and `Manager`.
This was a huge problem in old implementation since the manager did not
know what to pull from the object. Now it is explicitly requested by the
engine.
- The pases need to be submitted to a `draw::Manager` instance which can
be retrieved using `DRW_manager_get()` (for now).
Internally:
- All object data are stored in contiguous storage buffers. Removing a lot
of complexity in the pass submission.
- Draw calls are sorted and visibility tested on GPU. Making more modern
culling and better instancing usage possible in the future.
- Unit Tests have been added for regression testing and avoid most API
breakage.
- `draw::View` now contains culling data for all objects in the scene
allowing caching for multiple views.
- Bounding box and sphere final setup is moved to GPU.
- Some global resources locations have been hardcoded to reduce complexity.
What is missing:
- ~~Workaround for lack of gl_BaseInstanceARB.~~ Done
- ~~Object Uniform Attributes.~~ Done (Not in this patch)
- Workaround for hardware supporting a maximum of 8 SSBO.
Reviewed By: jbakker
Differential Revision: https://developer.blender.org/D15817
This was an oversight as the matrix multiplication present in original
code was reversed.
Reviewed By: jbakker
Differential Revision: https://developer.blender.org/D15858
This was caused by un-wanted normalization. This is a requirement of
the MikkTspace. The issue is that g_data.N is expected to be normalized
by many other functions and overriden by bump displacement.
Adding a new global variable containing the interpolated normal fixes the
issue AND make it match cycles behavior better (mix between bump and
interpolated normal).
Workaround the issue by adding an intermediate function. This is usually
the case when working with attributes.
Reviewed By: jbakker
Differential Revision: https://developer.blender.org/D15860
This uses the same sample classification approach as used for PMJ,
because it turns out to also work equally well with Sobol-Burley.
This also implements a fallback (random classification) that should
work "okay" for other samplers, though there are no other samplers
at the moment.
Differential Revision: https://developer.blender.org/D15845
This patch implements the dilate/erode node for the realtime compositor.
Differential Revision: https://developer.blender.org/D15790
Reviewed By: Clement Foucault
Since rB6269d66da, creating formats no longer depends solely on the
shader, but now depends on the dimensions used to fill the VBOs.
This allows 3D shaders to work flawlessly when assigned dimensions are
2D.
So there's no real benefit to us having shaders that are limited to 2D
use anymore.
This limitation makes it difficult to implement other builtin shaders
as they indirectly require a 2D version.
So this commit removes the 2D versions of the builtin sahders used in
Python , renames the string enums but keeps the old enums working for
backward compatibility.
(This brings parts of the changes reviewed in D15836).
This particular GPU driver does not constant fold all the way in order
to discard the unused branches.
To workaround that, we introduce a series of material flag that generates
defines that only keep used branches.
Reviewed By: jbakker
Differential Revision: https://developer.blender.org/D15852
Upcoming cryptomatte patch would need access to these defines. So moving
them from film_lib to shader shared. We cannot include the film_lib as
it requires images/textures to be bound that we don't need.
At the same time fixes incorrect casing (`lAYER` => `LAYER`).
Full support for translation and compilation of shaders in Metal, using
GPUShaderCreateInfo. Includes render pipeline state creation and management,
enabling all standard GPU viewport rendering features in Metal.
Authored by Apple: Michael Parkin-White, Marco Giordano
Ref T96261
Reviewed By: fclem
Maniphest Tasks: T96261
Differential Revision: https://developer.blender.org/D15563
- Adding in compatibility paths to support minimum per-vertex strides for vertex formats. OpenGL supports a minimum stride of 1 byte, in Metal, this minimum stride is 4 bytes. Meaing a vertex format must be atleast 4-bytes in size.
- Replacing transform feedback compile-time check to conditional look-up, given TF is supported on macOS with Metal.
- 3D texture size safety check added as a general capability, rather than being in the gl backend only. Also required for Metal.
Authored by Apple: Michael Parkin-White
Ref T96261
Reviewed By: fclem
Maniphest Tasks: T96261
Differential Revision: https://developer.blender.org/D14510
Implementation also contains a number of optimisations and feature enablements specific to the Metal API and Apple Silicon GPUs.
Ref T96261
Reviewed By: fclem
Maniphest Tasks: T96261
Differential Revision: https://developer.blender.org/D15369
As pointed out in D15827 comment, the unique_ptr usage in
ShaderNodetreeWrap related code does not sound very useful. Looking at
it, whole ShaderNodetreeWrap does not make much sense - it's only
ever created, and then immediately just one thing is fetched from it.
This very much sounds like "a function", so make it just that -
header file contains just a `create_mtl_node_tree` function, and the
whole implementation is hidden from the users. Which I've also
simplified into just a handful of freestanding functions.
No functionality or performance changes, but the code does get ~80
lines shorter.
Several visual tweaks to node links to make them overall fit in
better with the look of the node editor:
- Change the link thickness with the zoom level to a certain degree.
- Remove the fuzziness of the node link and its shadow/outline.
- The link outline color can now be made transparent.
- Add circles at the end of dragged links when connecting to sockets.
- Improve the banding of the color interpolation along the link.
- Adjust the spacing of dashes along straight node links.
Reviewed By: Pablo Vazquez, Hans Goudey
Differential Revision: http://developer.blender.org/D15036
The material indices have been moved out of MPoly since f1c0249f34.
That conversion happens in file reading code currently, so the material
indices have to be accessed the new way everywhere.
Fixes issues in importers written in C++ (T100737):
- Materials had one reference count too much. Affected Collada,
Alembic, USD, OBJ importers, looks like "since forever".
- Active material index was not properly set on imported meshes.
Regression since 3.3 (D15145). Affected Alembic, USD, OBJ. Note:
now it sets the first material as the active one, whereas
previously the last one was set as active. First one sounds more
"intuitive" to me.
Reviewed By: Bastien Montagne
Differential Revision: https://developer.blender.org/D15831
The "set default" callback doesn't need to be defined since it falls
back to clearing the memory, but since "construct" is optional, it
needs to be defined. Mistake in 25237d2625.
Blender may not apply certain UI data changes immediately when done via BPY.
This is a rather typical gotcha, better to have it documented.
Reviewed By: campbellbarton
Differential Revision: https://developer.blender.org/D15614
The multi-dimensional Sobol pattern required us to carefully use as low
dimensions as possible, as quality goes down in higher dimensions. Now that we
have two sampling patterns that are at least as good, there is no need to keep
it around and the implementation can be simplified.
Differential Revision: https://developer.blender.org/D15788
Fix two issues in the previous implementation:
* Only power-of-two prefixes were progressively stratified, not suffixes.
This resulted in unnecessarily increased noise when using non-power-of-two
sample counts.
* In order to try to get away with just a single sample pattern, the code
used a combination of sample index shuffling and Cranley-Patterson rotation.
Index shuffling is normally fine, but due to the sample patterns themselves
not being quite right (as described above) this actually resulted in
additional increased noise. Cranley-Patterson, on the other hand, always
increases noise with randomized (t,s) nets like PMJ02, and should be avoided
with these kinds of sequences.
Addressed with the following changes:
* Replace the sample pattern generation code with a much simpler algorithm
recently published in the paper "Stochastic Generation of (t, s) Sample
Sequences". This new implementation is easier to verify, produces fully
progressively stratified PMJ02, and is *far* faster than the previous code,
being O(N) in the number of samples generated.
* It keeps the sample index shuffling, which works correctly now due to the
improved sample patterns. But it now uses a newer high-quality hash instead
of the original Laine-Karras hash.
* The scrambling distance feature cannot (to my knowledge) be implemented with
any decorrelation strategy other than Cranley-Patterson, so Cranley-Patterson
is still used when that feature is enabled. But it is now disabled otherwise,
since it increases noise.
* In place of Cranley-Patterson, multiple independent patterns are generated
and randomly chosen for different pixels and dimensions as described in the
original PMJ paper. In this patch, the pattern selection is done via
hash-based shuffling to ensure there are no repeats within a single pixel
until all patterns have been used.
The combination of these fixes brings the quality of Cycles' PMJ sampler in
line with the previously submitted Sobol-Burley sampler in D15679. They are
essentially indistinguishable in terms of quality/noise, which is expected
since they are both randomized (0,2) sequences.
Differential Revision: https://developer.blender.org/D15746
Use lowercase rgba channel names which still by-passes lossy nature
of DWA compression and which also keeps external compositing tools
happy.
Thanks Steffen Dünner for testing this patch!
Differential Revision: https://developer.blender.org/D15834
The old way of creating shaders is being replaced by using
`GPUShaderCreateInfo` for more portability.
Reviewed By: fclem
Differential Revision: https://developer.blender.org/D14634
With the new `attrs_info_get` method, we can get information about
the attributes used in a `GPUShader` and thus have more freedom in the
automatic creation of `GPUVertFormat`s
Reviewed By: fclem, campbellbarton
Differential Revision: https://developer.blender.org/D15764
The DWA compression code in OpenEXR has hardcoded rules which decides
which channels are lossy or lossless. There is no control over these
rules via API.
This change makes it so channel names of xyzw is used for cryptomatte
passes in Cycles. This works around the hardcoded rules in the DWA code
making it so lossless compression is used. It is important to use lower
case y channel name as the upper case Y uses lossy compression.
The change in the channel naming also makes it so the write code uses
32bit for the cryptomatte even when saving half-float EXR.
Fixes T96933: Cryptomatte layers saved incorrectly with EXR DWA compression
Fixes T88049: Cryptomatte EXR Output Bit Depth should always be 32bit
Differential Revision: https://developer.blender.org/D15823
An ID created with regualr ID management code should never ever be
directly freed directly.
For embedded nodetrees, there is a dedicated function.
Reviewed By: aras_p
Differential Revision: https://developer.blender.org/D15827