Commit Graph

39 Commits

Author SHA1 Message Date
Brecht Van Lommel
fb2ba20b67 Refactor: Use more typed MEM_calloc<> and MEM_malloc<>
Pull Request: https://projects.blender.org/blender/blender/pulls/137822
2025-04-22 11:22:18 +02:00
Brecht Van Lommel
637c6497e9 Refactor: Use more typed MEM_calloc<>, avoid unnecessary size_t cast
Handle some cases that were missed in previous refactor. And eliminate
unnecessary size_t casts as these could hide issues.

Pull Request: https://projects.blender.org/blender/blender/pulls/137404
2025-04-21 17:59:41 +02:00
Campbell Barton
12e3a046a5 Cleanup: remove unused defines 2025-03-29 11:49:08 +11:00
Hans Goudey
9b1a5a1c43 Refactor: Draw: Further changes to mesh buffer extraction
Followup to 9b70851d91.

Return buffers by value rather than creating an empty/uninitialized
buffer first, then initializing it in an extraction function. This generally
makes the code easier to follow. And avoiding these half-created buffers
is an essential step to adding some sort of more global cache.

Pull Request: https://projects.blender.org/blender/blender/pulls/136570
2025-03-27 16:52:55 +01:00
Hans Goudey
e5d9b7a43f Cleanup: GPU: Remove unused index buffer function
This information is set when creating the indices,
there should be no need to compute it later.

Pull Request: https://projects.blender.org/blender/blender/pulls/135986
2025-03-14 16:55:55 +01:00
Brecht Van Lommel
c7502b092d Cleanup: Various clang-tidy warnings in gpu
Pull Request: https://projects.blender.org/blender/blender/pulls/133734
2025-01-31 17:03:18 +01:00
Clément Foucault
324517fd78 Cleanup: GPU: Fix clang tidy warnings
Removes some other things like:
- `TRUST_NO_ONE` which was the same as `#ifndef NDEBUG`.
- Replace `reinterpret_cast` by `unwrap`

Pull Request: https://projects.blender.org/blender/blender/pulls/129631
2024-10-31 15:18:29 +01:00
Hans Goudey
da1ea4cdd1 Revert "Draw: Avoid temporary copy for mesh triangulation index buffer"
This reverts commit 108ab1df2d.

This causes issues when duplicating objects that I don't have time
to investigate right now.
2024-05-23 23:43:34 -04:00
Hans Goudey
108ab1df2d Draw: Avoid temporary copy for mesh triangulation index buffer
The mesh triangulation data is stored in CPU memory with the same format
as the triangles GPU index buffer. Because of that we can skip creating a
temporary copied owned by the GPU API. One way to do that is to just
upload the data directly and avoid keeping a reference to it. However, we
can only upload GPU data from the main thread with OpenGL, so instead
reference the data and keep track of whether to free it.

When drawing a mesh with a single material and 1.8 million faces, this
change gives a 12-15% improvement in framerate, from about 32 to 37 FPS.

Part of #116901.

Pull Request: https://projects.blender.org/blender/blender/pulls/122175
2024-05-23 19:59:36 +02:00
Hans Goudey
07eac30070 Mesh: Points index buffer improvement
A continuation of #116901. This one doesn't have a performance impact
in my testing. It also adds a bit more code compared to main so it isn't
really obviously "better" like the previous refactors. However it does
get us closer to removing the "extractors" callback iteration loop
(`edit_data` is the only other enabled by default), and I'd argue that
the final code is easier to iterate on in the future since it's more
self-contained.

I made an effort to avoid storing restart indices in the index buffers.
Though this requires a bit more calculation on the CPU (particularly
because the hidden gaps in the IBO need to be compressed out), it
reduces overall CPU->GPU traffic and removes the need to strip the
restart indices on Metal.

Pull Request: https://projects.blender.org/blender/blender/pulls/122084
2024-05-22 16:08:55 +02:00
Hans Goudey
0f46e02310 Mesh: Draw triangle index buffer creation improvements
This PR is another step in the refactor described by #116901.
This change applies to the triangle index buffer. The main improvement
is the ability to recognize when the mesh corner triangles index array
can be uploaded directly (when there is a single material and no hidden
faces). In that case the index data should be copied directly to the
GPU rather than to a temporary array owned by the IBO first. Though
that isn't implemented yet since it will be handled by the GPU module
later, the code is now structured to make that change simple from the
data extraction perspective.

Other than that, the main change is to not use the extractor iterator
framework anymore, and to set index data directly instead of using GPU
API functions. Though we're mainly bottlenecked by memory-bandwidth
anyway, it's nice to avoid function call overhead.

We also now avoid creating the array of sorted triangle indices when
there is a single material and no hidden faces. And we don't use
restart indices for the single-material case anymore. For Metal that's
nice because we can avoid `strip_restart_indices`.

I didn't notice significant performance improvements in my test files
beyond a few percent here and there. With a hacked implementation of
the copy-directly-to-the-gpu optimization, I did see more consistent
improvements though.

Pull Request: https://projects.blender.org/blender/blender/pulls/119130
2024-04-11 04:49:27 +02:00
Hans Goudey
fe76d8c946 Refactor: Remove unnecessary C wrappers for vertex and index buffers
Now that all relevant code is C++, the indirection from the C struct
`GPUVertBuf` to the C++ `blender::gpu::VertBuf` class just adds
complexity and necessitates a wrapper API, making more cleanups like
use of RAII or other C++ types more difficult.

This commit replaces the C wrapper structs with direct use of the
vertex and index buffer base classes. In C++ we can choose which parts
of a class are private, so we don't risk exposing too many
implementation details here.

Pull Request: https://projects.blender.org/blender/blender/pulls/119825
2024-03-24 16:38:30 +01:00
Hans Goudey
8b514bccd1 Cleanup: Move remaining GPU headers to C++
Pull Request: https://projects.blender.org/blender/blender/pulls/119807
2024-03-23 01:24:18 +01:00
laurynas
aa3ffca8dc Fix #119247: Curves: Extra point in evaluated spline of Curves geometry
In bf17fc8d79  after extending buffer to multiple of 4 there appeared trailing
space in buffer not covered by shader's `for` loop.

Pull Request: https://projects.blender.org/blender/blender/pulls/119346
2024-03-12 15:01:10 +01:00
Campbell Barton
ed5fb3eaba Cleanup: various non-functional C++ changes 2024-03-05 11:32:42 +11:00
laurynas
bf17fc8d79 Fix: GPU: Ensures length of curves GPUIndexBuf to be multiple of 4
Exception is thrown in gpu_storage_buffer.cc

To reproduce create legacy Bezier curve and convert it to new Curves.
Code is from #116617

Pull Request: https://projects.blender.org/blender/blender/pulls/118951
2024-03-03 16:39:11 +01:00
Eugene Kuznetsov
7f43699ebf DRW: Curves: Indexbuf optimization for large numbers of curves
This optimizes a few loops that become significant bottlenecks during
viewport rendering of scenes with large numbers of curves.

To render a curves object, Blender needs to generate a potentially
very large (but trivial) index buffer. As previously implemented,
this index buffer is generated in an extremely inefficient manner,
with a single-threaded loop and an explicit function call per entry.
The buffer then needs to be pushed onto the GPU, which is also a fairly
slow task.

The PR generates the index buffer directly on the GPU with compute
shader.

Pull Request: https://projects.blender.org/blender/blender/pulls/116617
2024-02-25 17:22:58 +01:00
Hans Goudey
0618de49ad Cleanup: Replace MIN/MAX macros with C++ functions
Use `std::min` and `std::max` instead. Though keep MIN2 and MAX2
just for C code that hasn't been moved to C++ yet.

Pull Request: https://projects.blender.org/blender/blender/pulls/117384
2024-01-22 15:58:18 +01:00
Campbell Barton
611930e5a8 Cleanup: use std::min/max instead of MIN2/MAX2 macros 2023-11-07 16:33:19 +11:00
Sergey Sharybin
c1bc70b711 Cleanup: Add a copyright notice to files and use SPDX format
A lot of files were missing copyright field in the header and
the Blender Foundation contributed to them in a sense of bug
fixing and general maintenance.

This change makes it explicit that those files are at least
partially copyrighted by the Blender Foundation.

Note that this does not make it so the Blender Foundation is
the only holder of the copyright in those files, and developers
who do not have a signed contract with the foundation still
hold the copyright as well.

Another aspect of this change is using SPDX format for the
header. We already used it for the license specification,
and now we state it for the copyright as well, following the
FAQ:

    https://reuse.software/faq/
2023-05-31 16:19:06 +02:00
Campbell Barton
13c815085b Cleanup: spelling in comments 2023-05-24 11:21:18 +10:00
Jeroen Bakker
f828ecf4ba GPU: Use same read back API as SSBOs
The GPU module has 2 different styles when reading back data from
GPU buffers. The SSBOs used a memcpy to copy the data to a
pre-allocated buffer. IndexBuf/VertBuf gave back a driver/platform
controlled pointer to the memory.

Readback is done for test cases returning mapped pointers is not safe.
For this reason we settled on using the same approach as the SSBO.
Copy the data to a caller pre-allocated buffer.

Reason why this API is currently changed is that the Vulkan API is more
strict on mapping/unmapping buffers that can lead to potential issues
down the road.

Pull Request #104571
2023-02-13 08:34:19 +01:00
Campbell Barton
8bdd4b4685 Cleanup: use function style casts for C++ 2022-09-30 14:51:49 +10:00
Campbell Barton
f68cfd6bb0 Cleanup: replace C-style casts with functional casts for numeric types 2022-09-25 20:17:08 +10:00
Campbell Barton
6c6a53fad3 Cleanup: spelling in comments, formatting, move comments into headers 2022-09-06 16:25:20 +10:00
Jason Fielder
5f4409b02e Metal: MTLIndexBuf class implementation.
Implementation also contains a number of optimisations and feature enablements specific to the Metal API and Apple Silicon GPUs.

Ref T96261

Reviewed By: fclem

Maniphest Tasks: T96261

Differential Revision: https://developer.blender.org/D15369
2022-09-01 21:45:12 +02:00
Clément Foucault
b47c5505aa Fix T96892 Overlay: Hiding all of a mesh in edit mode causes visual glitch
This is caused by the geometry shader used by the edit mode line drawing.
If the drawcall uses indexed drawing and if the index buffer only contains
restart indices, it seems the result is 1 glitchy invocation of the
geometry shader.

Workaround by tagging these special case index buffers and bypassing
their drawcall.
2022-05-10 23:36:16 +02:00
Campbell Barton
c434782e3a File headers: SPDX License migration
Use a shorter/simpler license convention, stops the header taking so
much space.

Follow the SPDX license specification: https://spdx.org/licenses

- C/C++/objc/objc++
- Python
- Shell Scripts
- CMake, GNUmakefile

While most of the source tree has been included

- `./extern/` was left out.
- `./intern/cycles` & `./intern/atomic` are also excluded because they
  use different header conventions.

doc/license/SPDX-license-identifiers.txt has been added to list SPDX all
used identifiers.

See P2788 for the script that automated these edits.

Reviewed By: brecht, mont29, sergey

Ref D14069
2022-02-11 09:14:36 +11:00
Kévin Dietrich
eed45d2a23 OpenSubDiv: add support for an OpenGL evaluator
This evaluator is used in order to evaluate subdivision at render time, allowing for
faster renders of meshes with a subdivision surface modifier placed at the last
position in the modifier list.

When evaluating the subsurf modifier, we detect whether we can delegate evaluation
to the draw code. If so, the subdivision is first evaluated on the GPU using our own
custom evaluator (only the coarse data needs to be initially sent to the GPU), then,
buffers for the final `MeshBufferCache` are filled on the GPU using a set of
compute shaders. However, some buffers are still filled on the CPU side, if doing so
on the GPU is impractical (e.g. the line adjacency buffer used for x-ray, whose
logic is hardly GPU compatible).

This is done at the mesh buffer extraction level so that the result can be readily used
in the various OpenGL engines, without having to write custom geometry or tesselation
shaders.

We use our own subdivision evaluation shaders, instead of OpenSubDiv's vanilla one, in
order to control the data layout, and interpolation. For example, we store vertex colors
as compressed 16-bit integers, while OpenSubDiv's default evaluator only work for float
types.

In order to still access the modified geometry on the CPU side, for use in modifiers
or transform operators, a dedicated wrapper type is added `MESH_WRAPPER_TYPE_SUBD`.
Subdivision will be lazily evaluated via `BKE_object_get_evaluated_mesh` which will
create such a wrapper if possible. If the final subdivision surface is not needed on
the CPU side, `BKE_object_get_evaluated_mesh_no_subsurf` should be used.

Enabling or disabling GPU subdivision can be done through the user preferences (under
Viewport -> Subdivision).

See patch description for benchmarks.

Reviewed By: campbellbarton, jbakker, fclem, brecht, #eevee_viewport

Differential Revision: https://developer.blender.org/D12406
2021-12-27 16:35:54 +01:00
Aaron Carlisle
c1279768a7 Cleanup: Clang-Tidy modernize-redundant-void-arg 2021-12-08 00:31:20 -05:00
Germano Cavalcante
0eb9351296 Refactor: use 'BLI_task_parallel_range' in Draw Cache
One drawback to trying to predict the number of threads that will be
used in the `task_graph` is that we are only sure of the number when the
threads are running.

Using `BLI_task_parallel_range` allows the driver to
choose the best thread distribution through `parallel_reduce`.

The benefit is most evident on hardware with fewer cores.

This is the result on an 4-core laptop:
||before:|after:
|---|---|---|
|large_mesh_editing:|Average: 5.203638 FPS|Average: 5.398925 FPS
||rdata 15ms iter 43ms (frame 193ms)|rdata 14ms iter 36ms (frame 187ms)

Differential Revision: https://developer.blender.org/D11558
2021-06-11 10:49:50 -03:00
Germano Cavalcante
2330cec2c6 Refactor: Draw Cache: use 'BLI_task_parallel_range'
This is an adaptation of {D11488}.

A disadvantage of manually setting the iter ranges per thread is that
we don't know how many threads are running in the background and so we
don't know how to best distribute the ranges.

To solve this limitation we can use `parallel_reduce` and thus let the
driver choose the best distribution of ranges among the threads.

This proved to be especially beneficial for computers with few cores.

**Benchmarking:**
Here's the result on an 4-core laptop:
||master:|PATCH:
|---|---|---|
|large_mesh_editing:|Average: 5.203638 FPS|Average: 5.398925 FPS
||rdata 15ms iter 43ms (frame 193ms)|rdata 14ms iter 36ms (frame 187ms)

Here's the result on an 8-core PC:
||master:|PATCH:
|---|---|---|
|large_mesh_editing:|Average: 15.267482 FPS|Average: 15.906881 FPS
||rdata 9ms iter 28ms (frame 65ms)|rdata 9ms iter 25ms (frame 63ms)
|large_mesh_editing_ledge: |Average: 15.145966 FPS|Average: 15.520474 FPS
||rdata 9ms iter 29ms (frame 65ms)|rdata 9ms iter 25ms (frame 64ms)
|looptris_test:|Average: 4.001917 FPS|Average: 4.061105 FPS
||rdata 12ms iter 90ms (frame 236ms)|rdata 12ms iter 87ms (frame 230ms)
|subdiv_mesh_cage_and_final:|Average: 1.917769 FPS|Average: 1.971790 FPS
||rdata 7ms iter 37ms (frame 261ms)|rdata 7ms iter 31ms (frame 258ms)
||rdata 7ms iter 38ms (frame 252ms)|rdata 7ms iter 33ms (frame 249ms)
|subdiv_mesh_final_only:|Average: 6.387240 FPS|Average: 6.591251 FPS
||rdata 3ms iter 25ms (frame 151ms)|rdata 3ms iter 16ms (frame 145ms)
|subdiv_mesh_final_only_ledge:|Average: 6.247393 FPS|Average: 6.596024 FPS
||rdata 3ms iter 26ms (frame 158ms)|rdata 3ms iter 16ms (frame 148ms)

**Notes:**
- The improvement can only be noticed if all extracts are multithreaded.
- This patch touches different areas of the code, so it can be split into another patch if the idea is accepted.

These screenshots show how threads behave in a quadcore:
Master:
{F10164664}
Patch:
{F10164666}

Differential Revision: https://developer.blender.org/D11558
2021-06-11 10:45:12 -03:00
Jeroen Bakker
259b9c73d0 GPU: Thread safe index buffer builders.
Current index builder is designed to be used in a single thread.
This makes all index buffer extractions single threaded.
This patch adds a thread safe solution enabling multithreaded
building of index buffers.

To reduce locking the solution would provide a task/thread local
index buffer builder (called sub builder).
When a thread is finished this thread local index buffer builder
can be joined with the initial index buffer builder.

`GPU_indexbuf_subbuilder_init`: Initialized a sub builder. The
index list is shared between the parent and sub buffer, but the
counters are localized. Ensuring that updating counters would
not need any locking.

`GPU_indexbuf_subbuilder_finish`: merge the information of the
sub builder back to the parent builder. Needs to be invoked outside
the worker thread, or when sure that all worker threads have been
finished. Internal the function is not thread safe.

For testing purposes the extract_points extractor has been migrated to
the new API. Herefore changes to the mesh extractor were needed.

* When creating tasks, the task number of current task is stored in
  ExtractTaskData including the total number of tasks.
* Adding two functions  in `MeshExtract`.
** `task_init` will initialize the task specific userdata.
** `task_finish` should merge back the task specific userdata back.
* adding task_id parameter to the iteration functions so they can
  access the correct task data without any need for locking.

There is no noticeable change in end user performance.

Reviewed By: mano-wii

Differential Revision: https://developer.blender.org/D11499
2021-06-08 16:36:06 +02:00
Germano Cavalcante
223016a408 GPUIndexBuf: Find the minimum and maximum index through the builder
Moving the bounds code to the builder can be useful
for future optimizations like building multithreaded.

Reviewed By: fclem, jbakker

Differential Revision: https://developer.blender.org/D11455
2021-06-07 08:41:38 -03:00
Jeroen Bakker
87055dc71b GPU: Compute Pipeline.
With the compute pipeline calculation can be offloaded to the GPU.
This patch only adds the framework for compute. So no changes for users at
this moment.

NOTE: As this is an OpenGL4.3 feature it must always have a fallback.

Use `GPU_compute_shader_support` to check if compute pipeline can be used.
Check `gpu_shader_compute*` test cases for usage.

This patch also adds support for shader storage buffer objects and device only
vertex/index buffers.

An alternative that had been discussed was adding this to the `GPUBatch`, this
was eventually not chosen as it would lead to more code when used as part of a
shading group. The idea is that we add an `eDRWCommandType` in the near
future.

Reviewed By: fclem

Differential Revision: https://developer.blender.org/D10913
2021-05-26 16:49:30 +02:00
Sybren A. Stüvel
16732def37 Cleanup: Clang-Tidy modernize-use-nullptr
Replace `NULL` with `nullptr` in C++ code.

No functional changes.
2020-11-06 18:08:25 +01:00
Clément Foucault
cd849076d2 Fix T80681 Wireframe is not visible on square planes with 16384 quads
Fix wrong logic.
2020-09-15 14:56:02 +02:00
Clément Foucault
4ea93029c6 GPUIndexBuf: GL backend Isolation
This is part of the Vulkan backend task T68990.

There is no real change, only making some code re-organisation.
This also make the IndexBuf completely abstract from outside the
GPU module.
2020-09-06 22:13:06 +02:00
Clément Foucault
84d67bd0a9 Cleanup: GPU: Rename GPU_element to GPU_index_buffer
Makes it follow the functions names.
2020-09-06 22:13:06 +02:00