Blender already had its own copy of OpenSubDiv containing some local fixes
and code-style. This code still used gl-calls. This PR updates the calls
to use GPU module. This allows us to use OpenSubDiv to be usable on other
backends as well.
This PR was tested on OpenGL, Vulkan and Metal. Metal can be enabled,
but Vulkan requires some API changes to work with loose geometry.

# Considerations
**ShaderCreateInfo**
intern/opensubdiv now requires access to GPU module. This to create buffers
in the correct context and trigger correct dispatches. ShaderCreateInfo is used
to construct the shader for cross compilation to Metal/Vulkan. However opensubdiv
shader caching structures are still used.
**Vertex buffers vs storage buffers**
Implementation tries to keep as close to the original OSD implementation. If
they used storage buffers for data, we will use GPUStorageBuf. If it uses vertex
buffers, we will use gpu::VertBuf.
**Evaluator const**
The evaluator cannot be const anymore as the GPU module API only allows
updating SSBOs when constructing. API could be improved to support updating
SSBOs.
Current implementation has a change to use reads out of bounds when constructing
SSBOs. An API change is in the planning to remove this issue. This will be fixed in
an upcoming PR. We wanted to land this PR as the visibility of the issue is not
common and multiple other changes rely on this PR to land.
Pull Request: https://projects.blender.org/blender/blender/pulls/135296
Make the ray self primitives store and restore reliable for cases when
the intersect_shadow kernel is called multiple times:
- Light object and primitive are stored in dedicated fields in the
state. This adds 2 integers per state.
- The self object and primitive are used from the previous intersection
when the intersect_shadow is called multiple times.
There is more detailed explanation added in the code.
The issue was introduced by the light refactor to be objects in #134846.
Pull Request: https://projects.blender.org/blender/blender/pulls/135573
With auto mode, integrator_intersect_subsurface still ended up being
compiled in large GRF mode on Intel Arc B580, while normal GRF provides
the best performance for this kernel.
When using Alembic procedurals, the Mesh Sequence Cache attempts to
replace the original geometry with a plain old cube. However, it never
frees this new cube geometry. Transfer ownership to the underlying
GeometrySet instead.
Investigating the scenario also showed that the `~AlembicProcedural`
dtor was removing an item from the `nodes` vector while iterating over
it, which triggers debug asserts on at least MSVC. I believe the removal
is unnecessary since this is the dtor and ASAN appears clean now.
Pull Request: https://projects.blender.org/blender/blender/pulls/134085
The default was large GRF mode for all kernels and normal GRF for
intersection kernels.
path_array kernels also benefit from normal GRF, being almost 2x faster
in this mode, as measured on my Arc B580. This translates to a much
smaller 1-3% speedup in overall rendering.
Instead of manually adding them to the list of kernels to compile in
normal GRF mode, I've switched to auto that provides the same result.
The crash has been introduced by the refactor of lights to be
objects in #134846.
We can make such cases easier to catch at compile time in the
future, but for now applying the minimal patch which solves the
problem without going deeper into refactor.
Pull Request: https://projects.blender.org/blender/blender/pulls/135570
Is harmless from functional perspective, but uses more resources and
potentially slower than it should be. Although, probably something
hard to measure in practice, but still better not follow this anti-
pattern.
Pull Request: https://projects.blender.org/blender/blender/pulls/135529
The general idea is to keep the 'old', C-style MEM_callocN signature, and slowly
replace most of its usages with the new, C++-style type-safer template version.
* `MEM_cnew<T>` allocation version is renamed to `MEM_callocN<T>`.
* `MEM_cnew_array<T>` allocation version is renamed to `MEM_calloc_arrayN<T>`.
* `MEM_cnew<T>` duplicate version is renamed to `MEM_dupallocN<T>`.
Similar templates type-safe version of `MEM_mallocN` will be added soon
as well.
Following discussions in !134452.
NOTE: For now static type checking in `MEM_callocN` and related are slightly
different for Windows MSVC. This compiler seems to consider structs using the
`DNA_DEFINE_CXX_METHODS` macro as non-trivial (likely because their default
copy constructors are deleted). So using checks on trivially
constructible/destructible instead on this compiler/system.
Pull Request: https://projects.blender.org/blender/blender/pulls/134771
This is preparation for #129495.
Currently, all of OSL is managed by the OSLShaderManager. This makes it so that the general OSL setup is handled by a general OSLManager, and both the OSLShaderManager and (in the future) the Camera can use it to manage their scripts.
Pull Request: https://projects.blender.org/blender/blender/pulls/135050
`OpenSubdiv_Buffer` is a wrapper that was introduced at the time
that Blender couldn't use CPP directly. It contains a pointer to
a VertBuf and callbacks to use GPU module on that buffer.
This PR replaces OpenSubdiv_Buffer with `blender::gpu::VertBuf` and
removes the wrapper.
NOTE: OpenSubdiv tests are added to blender_test executable to make the
library dependencies not to complicated.
Pull Request: https://projects.blender.org/blender/blender/pulls/135389
Now ccl_device sets inlining and ccl_device_inline forces inlining.
This matches more closely with what is currently done for cuda and metal
backends.
I've measured from 1% to 6% overall performance improvement in rendering
benchmark scenes on Arc B580, as well as a small decrease in compile
time.
Various fixes in the HIP-RT BVH building related on making sure
curves motion blur is supported and is working correctly, as well
as properly handle motion pass configuration when path tracing is
to ignore motion blur (and instead write vector pass).
This PR contains #134797 with fixes needed to fully finish it:
moving commits from that PR here made it easier to ensure all
moving parts are tested without mental overhead.
Fixes#134510
Co-authored-by: Sahar A. Kashi <sahar.alipourkashi@amd.com>
Co-authored-by: Sergey Sharybin <sergey@blender.org>
Co-authored-by: Brecht Van Lommel <brecht@blender.org>
Pull Request: https://projects.blender.org/blender/blender/pulls/135125
Rewrite the ONEAPI Blender texture allocation code to make use of
1D images backed by linear USM memory. This increases parity
with the CUDA implementation and sets the ground work for enabling
host USM allocations in Blender. By enabling this functionality,
previously failing benchmarks are now passing.
Together with the previous commit, no functional changes are expected.
These have bugs in with the latest HIP-RT and HIP SDK, so just disable them
as we do not expect a fix in time, and rolling back would re-introduce other
bugs. As RDNA1 does not have hardware raytracing, it is also less important
to use HIP-RT.
Note that only RDNA2+ is officially supported by HIP, so these GPUs working
at all is somewhat lucky.
Fix#134979Fix#134978Fix#134975
Pull Request: https://projects.blender.org/blender/blender/pulls/135179
HI-DPI screens now select larger custom cursors on Wayland,
previously small cursors would be scaled up.
This only works well when all outputs have the same scaling
as custom-cursors don't support sending multiple sized cursors
to GHOST at once, see code-comments for details.
Float->byte rendered image dithering uses triangle noise algorithm. Keep
the algorithm the same, just make some improvements and fix some issues:
1) The hash function for noise was using "trig" hash from "On generating
random numbers" (Rey 1998), but that is not a great quality hash, plus it
can produce very different results between CPUs/GPUs. Replace it with
"iqint3" (recommended by "Hash Functions for GPU Rendering", JCGT 2020),
which is same performance on GPU, faster on CPU, and much better quality.
This is the same hash as Cycles already uses elsewhere. Also it is purely
integer based, so exactly the same results on all platforms.
2) For the above point, replace `dither_random_value` to take integer
pixel coordinates and adjust calling code accordingly. Some previous
callers were (accidentally?) passing integer coordinates already. Other
places actually get a tiny bit simpler, since they now no longer need an
extra multiplication.
3) The CPU dithering path was wrongly introducing bias, i.e. making the
image lighter. The CPU path also needs dither noise to be in [-1..+1]
range (not [-0.5..+1.5]!) just like GPU path does, since the later
float->byte conversion already does rounding.
4) The CPU dithering path was using thread-slice-local Y coordinate,
meaning the dithering pattern was repeating vertically. The more CPU cores
you use, the worse the repetition.
5) Change the way that uniform noise is converted to triangle noise.
Previous implementation was based on one shadertoy from 2015, change it
to another shadertoy from 2020. The new one fixes issues with the old way,
and it just works on the CPU too, so now both CPU and GPU code paths are
exactly the same.
6) Cleanup: remove DitherContext, just a single float is enough
Performance and image comparisons in the PR.
Pull Request: https://projects.blender.org/blender/blender/pulls/135224
After the recent HIP SDK 6.3 update on Windows, the minimum GPU driver
required to use HIP in Cycles has increased.
This commit increases the required driver version listed in the UI and
adds a check to avoid showing HIP devices if they're below a certain
driver version number as they don't work properly.
Pull Request: https://projects.blender.org/blender/blender/pulls/134965
Previously point cloud rendering was disabled on the HIPRT backend due
to unexpected performance regressions introduce by it.
With the recent update to HIP SDK 6.3 and HIPRT 2.5, these performance
regressions have been resolved and so this commit re-enables
point cloud rendering on HIPRT.
Pull Request: https://projects.blender.org/blender/blender/pulls/134902
Dropping the inlining hint for `light_tree_pdf` and reverting to the
default inlining thresholds for DPC++ compiler gives a ~4% speedup on
classroom and other scenes on Arc B580.
Pull Request: https://projects.blender.org/blender/blender/pulls/135042
This implements three improvements to the energy preservation and albedo
scaling logic, which help the Principled BSDF pass the white-furnace test
when using the coat layers at high roughness.
Specifically, at roughness 0.3, the albedo scaling brings it from 60% at
the edge to 95%, and with the energy preservation it's 99.8%.
Pull Request: https://projects.blender.org/blender/blender/pulls/134620