Similar to 8d0920ec6d and aef0e72e5a.
In a test case with 2 million curves and 15 million points
I observed a 10x performance improvement, from 2.2s to 0.2s
to copy the data from Blender to Cycles.
Similar to 8d0920ec6d.
In a test case with 8 million points and 3 attributes, I observed
around a 9x performance improvement, from 1.8s to 0.2s to copy
the data from Blender to Cycles. For some attribute types, using
implicit sharing could remove the need to copy entirely, but removing
the overhead from the RNA API makes sense anyway.
Only use the denoised buffer for access of denoised passes, and
access the rest of the passes from the original render buffer.
This allows in-place modification of the guiding passes needed
by the denoiser without affecting the final render result pixels.
Pull Request: https://projects.blender.org/blender/blender/pulls/106668
Barycentric coordinate convention was changed at some point, which is
reflected in `motion_triangle_shader.h` (1c2c468abc) but not in
`motion_triangle.h`, this is also fixed by sharing functions between the
two header files.
Pull Request: https://projects.blender.org/blender/blender/pulls/106629
Similar to 4bcd59d644. This probably got worse recently with
the generic attribute refactors for `Mesh`, but the final performance is
probably much better than older versions too.
Timings extracting attributes from a 16 million vertex grid (seconds):
- Corner float attribute: 0.72 -> 0.19
- Face float attribute: 0.60 -> 0.07
- UV map: 3.18 -> 0.05
Implements #95967.
Currently the `MPoly` struct is 12 bytes, and stores the index of a
face's first corner and the number of corners/verts/edges. Polygons
and corners are always created in order by Blender, meaning each
face's corners will be after the previous face's corners. We can take
advantage of this fact and eliminate the redundancy in mesh face
storage by only storing a single integer corner offset for each face.
The size of the face is then encoded by the offset of the next face.
The size of a single integer is 4 bytes, so this reduces memory
usage by 3 times.
The same method is used for `CurvesGeometry`, so Blender already has
an abstraction to simplify using these offsets called `OffsetIndices`.
This class is used to easily retrieve a range of corner indices for
each face. This also gives the opportunity for sharing some logic with
curves.
Another benefit of the change is that the offsets and sizes stored in
`MPoly` can no longer disagree with each other. Storing faces in the
order of their corners can simplify some code too.
Face/polygon variables now use the `IndexRange` type, which comes with
quite a few utilities that can simplify code.
Some:
- The offset integer array has to be one longer than the face count to
avoid a branch for every face, which means the data is no longer part
of the mesh's `CustomData`.
- We lose the ability to "reference" an original mesh's offset array
until more reusable CoW from #104478 is committed. That will be added
in a separate commit.
- Since they aren't part of `CustomData`, poly offsets often have to be
copied manually.
- To simplify using `OffsetIndices` in many places, some functions and
structs in headers were moved to only compile in C++.
- All meshes created by Blender use the same order for faces and face
corners, but just in case, meshes with mismatched order are fixed by
versioning code.
- `MeshPolygon.totloop` is no longer editable in RNA. This API break is
necessary here unfortunately. It should be worth it in 3.6, since
that's the best way to allow loading meshes from 4.0, which is
important for an LTS version.
Pull Request: https://projects.blender.org/blender/blender/pulls/105938
This patch refactors the texture samples code by mainly splitting the
eGPUSamplerState enum into multiple smaller enums and packing them
inside a GPUSamplerState struct. This was done because many members of
the enum were mutually exclusive, which was worked around during setting
up the samplers in the various backends, and additionally made the API
confusing, like the GPU_texture_wrap_mode function, which had two
mutually exclusive parameters.
The new structure also improved and clarified the backend sampler cache,
reducing the cache size from 514 samplers to just 130 samplers, which
also slightly improved the initialization time. Further, the
GPU_SAMPLER_MAX signal value was naturally incorporated into the
structure using the GPU_SAMPLER_STATE_TYPE_INTERNAL type.
The only expected functional change is in the realtime compositor, which
now supports per-axis repetition control, utilizing new API functions
for that purpose.
This patch is loosely based on an older patch D14366 by Ethan Hall.
Pull Request: https://projects.blender.org/blender/blender/pulls/105642
The renderdoc integration used to be behind the `--debug-gpu`
command line option. When using `--debug-gpu` outside renderdoc
error messages where displayed that aren't relevant.
This PR adds a specific command line option for the renderdoc
integration. This option will also enable `--debug-gpu`.
Pull Request: https://projects.blender.org/blender/blender/pulls/106541
Logic for the recently included fractional scaling support [0] was
difficult to reason about as it depended on two different callbacks
one that listened to a preferred scale, another that tracked which
physical displays the window overlapped.
Checking if fractional scaling was in used depended on the order
the callbacks ran - which is undefined.
In practice - mixing non-fractional and fractional displays would
flicker when the window was moved between monitors.
Resolve this problem with the following changes:
- When the fractional-scale manager is supported,
only respond to the scale from it's preferred_scale callback.
- When no fractional-scale manager is available,
set the scale based on the scale of overlapping outputs.
- Add support for postponing the buffers commit call to prevent
flickering when changing the windows scale.
Other changes:
- Use a lock before setting the pending frame state from
wp_fractional_scale_handle_preferred_scale.
- Ensure pending actions that themselves trigger pending actions
run in the time gwl_window_pending_actions_handle is called.
- Rename GWL_Window::scale -> GWL_WindowFrame::buffer_scale.
[0]: cde99075e8
Depend on fractional-scale when searching for wayland-protocols
This will impact builders that don't use Blender's `../lib/` and
have wayland-protocols older than v1.31.
Make it a native Cycles light option instead of counter-acting the inverse
area calculation in Hydra.
Differential Revision: https://developer.blender.org/D16838
HdRenderDelegate got a change with the interface, adding gpuSupported. It
currently is just a dummy implementation without checking for anything
GPU-related.
Differential Revision: https://developer.blender.org/D17207
The problem is that `set_geometry()` otherwise ends up implicitly
casting `Geometry*` to bool. In Blender this worked because the
geometry header was always included before the object header.
Differential Revision: https://developer.blender.org/D16737
Use raw Blender structs and mesh data rather than using the RNA API.
There isn't any benefit from using the RNA when Cycles is compiled
with Blender anyway, and a profile showed that the majority of time
was spent in Blender RNA API functions.
This gives a significant improvement in performance when ingesting
meshes. Here are some tests of the runtime of the `create_mesh`
function (in seconds):
| | Before | After |
| ------------------------- | ------ | ----- |
| Grid | 0.66 | 0.11 |
| Many realized cubes | 2.60 | 0.48 |
| Large curve to mesh setup | 4.18 | 1.14 |
Also change to resizing the arrays and filling them by index rather
than appending. This makes the parallel aspect of the logic clearer,
and makes the loops easier to parallelize in the future, and makes
it easier to have a performance benefit when an attribute like
`sharp_face` doesn't exist.
Pull Request: https://projects.blender.org/blender/blender/pulls/106275
Previously, fractional scaling was detected but set an integer buffer
scale which the compositor would down-scale causing blurry output.
Now the fractional scaling interface is used when available to set the
DPI and set the internal buffers size & viewport transformation to
ensure 1:1 pixels from Blender to the Wayland output.
Tested to work with multiple monitors with mixed
fractional/non-fractional scale.
Note that this change causes a regression for when fractional scaling
is set on a compositor without support for fractional-scale-v1.
Supporting fractional scaling in both cases is possible but overly
complicated. This case already wasn't working so well - with blurry
output due to image scaling, now the DPI wont be accurate in this case
although Blender is still usable.
The goal is to solve confusion of the "All rights reserved" for licensing
code under an open-source license.
The phrase "All rights reserved" comes from a historical convention that
required this phrase for the copyright protection to apply. This convention
is no longer relevant.
However, even though the phrase has no meaning in establishing the copyright
it has not lost meaning in terms of licensing.
This change makes it so code under the Blender Foundation copyright does
not use "all rights reserved". This is also how the GPL license itself
states how to apply it to the source code:
<one line to give the program's name and a brief idea of what it does.>
Copyright (C) <year> <name of author>
This program is free software ...
This change does not change copyright notice in cases when the copyright
is dual (BF and an author), or just an author of the code. It also does
mot change copyright which is inherited from NaN Holding BV as it needs
some further investigation about what is the proper way to handle it.
This patch replaces `dispatchThreadgroups` with `dispatchThreads` which takes care of non-uniform threadgroup bounds. This allows us to remove the bounds guards in the integrator kernel entry points.
Pull Request: https://projects.blender.org/blender/blender/pulls/106217
This patch fixes a MetalRT issue where viable shadow hits are discounted based on the false assumption that hits are ordered by distance. With this patch, the following unit tests now pass:
- openvdb smoke
- shadow catcher pt transparent lamp only 0.8
- shadow catcher pt transparent lamp only 1.0
Pull Request: https://projects.blender.org/blender/blender/pulls/106276
For example
```
OIIOOutputDriver::~OIIOOutputDriver()
{
}
```
becomes
```
OIIOOutputDriver::~OIIOOutputDriver() {}
```
Saves quite some vertical space, which is especially handy for
constructors.
Pull Request: https://projects.blender.org/blender/blender/pulls/105594
The DPI returned by the GHOST/Wayland didn't account the buffer being
rendered at a higher (non-fractional) resolution, then scaled down.
This caused the software cursor and UI to rendered very small.
A fractional scale of 101% would show the UI just over 50% of the size
(making the UI to be close to half the scale it should have been).
Resolve by accounting for down-scaling of the buffer to it's
fractional size.
This is a workaround fix for Open PGL 0.4.1 when the first volume
samples are collected in a later training iteration.
The problem is fixed in Open PGL > 0.5.0 and the workaround
can be removed after upgrading Open PGL.
This PR adds pre-checks when enabling validation layers.
For validation layers to work some platforms require that
the Vulkan SDK is installed. Validation layers are activated
when running blender with `--debug-gpu`.
Sometimes we expect users to run with `--debug-gpu` for
narrowing down an issue and we cannot expect them to have
the Vulkan SDK installed.
This patch will check if the `VK_LAYER_PATH` is available
and that the configuration file of the validation layer is
present. If this isn't the case we don't activate the
requested validation layer.
Pull Request: https://projects.blender.org/blender/blender/pulls/105922
Calling render (for example) with an existing window open now activates
the window on Wayland. Tested to work on GNOME & KDE.
Use the xdg-activation protocol which typically brings the
window to the foreground.
Partially resolves#102985.