This is preparation for #129495.
Currently, all of OSL is managed by the OSLShaderManager. This moves the general OSL setup into a general OSLManager, so that both the OSLShaderManager and (in the future) the Camera can use it to manage their scripts.
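A minimal sketch of the intended ownership, assuming each user holds a shared reference to one manager. All class names here (`OSLManagerSketch`, `ShaderManagerSketch`, `CameraSketch`) and the name->source script registry are hypothetical and do not reflect the actual Cycles classes:

```cpp
#include <map>
#include <memory>
#include <string>
#include <utility>

// Hypothetical shared OSL state. The real manager handles the general OSL
// setup; here it is reduced to a name->source map of registered scripts.
class OSLManagerSketch {
 public:
  // Returns true if the script was newly registered.
  bool add_script(const std::string &name, const std::string &source)
  {
    return scripts_.emplace(name, source).second;
  }
  bool has_script(const std::string &name) const
  {
    return scripts_.count(name) != 0;
  }

 private:
  std::map<std::string, std::string> scripts_;
};

// Shader scripts go through the shared manager instead of owning OSL setup.
class ShaderManagerSketch {
 public:
  explicit ShaderManagerSketch(std::shared_ptr<OSLManagerSketch> osl) : osl_(std::move(osl)) {}
  bool add_shader_script(const std::string &name, const std::string &source)
  {
    return osl_->add_script(name, source);
  }

 private:
  std::shared_ptr<OSLManagerSketch> osl_;
};

// A possible future camera user of the same manager (hypothetical).
class CameraSketch {
 public:
  explicit CameraSketch(std::shared_ptr<OSLManagerSketch> osl) : osl_(std::move(osl)) {}
  bool set_camera_script(const std::string &name, const std::string &source)
  {
    return osl_->add_script(name, source);
  }

 private:
  std::shared_ptr<OSLManagerSketch> osl_;
};
```

Keeping the OSL setup in one object means the shader and camera code paths see the same registered scripts without duplicating initialization.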
Pull Request: https://projects.blender.org/blender/blender/pulls/135050
`OpenSubdiv_Buffer` is a wrapper that was introduced at a time when
Blender couldn't use C++ directly. It contains a pointer to
a VertBuf and callbacks to use the GPU module on that buffer.
This PR replaces `OpenSubdiv_Buffer` with `blender::gpu::VertBuf` and
removes the wrapper.
NOTE: OpenSubdiv tests are added to the blender_test executable to keep the
library dependencies from becoming too complicated.
Pull Request: https://projects.blender.org/blender/blender/pulls/135389
Now ccl_device only marks functions as inline (a hint) and ccl_device_inline
forces inlining. This more closely matches what is currently done for the
CUDA and Metal backends.
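A rough sketch of the two levels, assuming that the plain variant uses the `inline` keyword as a hint and the forced variant uses the usual compiler attributes. The macro names `example_device` and `example_device_inline` are hypothetical stand-ins, not the actual Cycles definitions for any backend:

```cpp
// Forced-inlining attribute for common compilers.
#if defined(_MSC_VER) && !defined(__clang__)
#  define EXAMPLE_FORCEINLINE __forceinline
#elif defined(__GNUC__) || defined(__clang__)
#  define EXAMPLE_FORCEINLINE inline __attribute__((always_inline))
#else
#  define EXAMPLE_FORCEINLINE inline
#endif

// Only an inline hint: the compiler's cost model decides whether to inline.
#define example_device inline
// Always inline, regardless of the compiler's heuristics.
#define example_device_inline EXAMPLE_FORCEINLINE

example_device float lerp_hinted(float a, float b, float t)
{
  return a + (b - a) * t;
}

example_device_inline float lerp_forced(float a, float b, float t)
{
  return a + (b - a) * t;
}
```

The trade-off is that functions declared with the hint-only variant may stay out-of-line when the compiler judges inlining too expensive.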
I've measured a 1% to 6% overall performance improvement in the rendering
benchmark scenes on Arc B580, as well as a small decrease in compile
time.
Various fixes in the HIP-RT BVH building, related to making sure
curve motion blur is supported and works correctly, as well
as properly handling the motion pass configuration when path tracing is
set to ignore motion blur (and write the vector pass instead).
This PR contains #134797 with fixes needed to fully finish it:
moving commits from that PR here made it easier to ensure all
moving parts are tested without mental overhead.
Fixes #134510
Co-authored-by: Sahar A. Kashi <sahar.alipourkashi@amd.com>
Co-authored-by: Sergey Sharybin <sergey@blender.org>
Co-authored-by: Brecht Van Lommel <brecht@blender.org>
Pull Request: https://projects.blender.org/blender/blender/pulls/135125
Rewrite the oneAPI Blender texture allocation code to make use of
1D images backed by linear USM memory. This increases parity
with the CUDA implementation and lays the groundwork for enabling
host USM allocations in Blender. With this functionality,
previously failing benchmarks now pass.
Together with the previous commit, no functional changes are expected.
These GPUs run into bugs with the latest HIP-RT and HIP SDK, so just disable
HIP-RT for them, as we do not expect a fix in time, and rolling back would
re-introduce other bugs. As RDNA1 does not have hardware raytracing, it is
also less important to use HIP-RT there.
Note that only RDNA2+ is officially supported by HIP, so these GPUs working
at all is somewhat lucky.
Fix #134979
Fix #134978
Fix #134975
Pull Request: https://projects.blender.org/blender/blender/pulls/135179
On Wayland, HiDPI screens now select larger custom cursors;
previously, small cursors would be scaled up.
This only works well when all outputs have the same scaling,
as custom cursors don't support sending cursors of multiple sizes
to GHOST at once; see code comments for details.
Float->byte rendered image dithering uses a triangle noise algorithm. Keep
the algorithm the same, just make some improvements and fix some issues:
1) The hash function for noise was using the "trig" hash from "On generating
random numbers" (Rey 1998), but that is not a great quality hash, plus it
can produce very different results between CPUs/GPUs. Replace it with
"iqint3" (recommended by "Hash Functions for GPU Rendering", JCGT 2020),
which has the same performance on GPU, is faster on CPU, and has much better
quality. This is the same hash that Cycles already uses elsewhere. It is also
purely integer based, so it gives exactly the same results on all platforms.
2) As part of the above, change `dither_random_value` to take integer
pixel coordinates and adjust the calling code accordingly. Some previous
callers were (accidentally?) passing integer coordinates already. Other
places actually get a tiny bit simpler, since they no longer need an
extra multiplication.
3) The CPU dithering path was wrongly introducing bias, i.e. making the
image lighter. The CPU path also needs dither noise to be in the [-1..+1]
range (not [-0.5..+1.5]!), just like the GPU path does, since the later
float->byte conversion already does rounding.
4) The CPU dithering path was using a thread-slice-local Y coordinate,
meaning the dithering pattern repeated vertically. The more CPU cores
you use, the worse the repetition.
5) Change the way that uniform noise is converted to triangle noise.
The previous implementation was based on a Shadertoy from 2015; change it
to another Shadertoy from 2020. The new one fixes issues with the old way,
and it also works on the CPU, so now both CPU and GPU code paths are
exactly the same.
6) Cleanup: remove DitherContext; a single float is enough.
Performance and image comparisons are in the PR.
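A CPU-side sketch of the pieces in the list above: the iqint3 hash on integer pixel coordinates, a uniform-to-triangle conversion, and rounding without an extra bias. The triangle conversion shown is the standard inverse-CDF mapping and is not necessarily the one from the 2020 Shadertoy; `float_to_byte_dithered` and treating `dither` as an amplitude in byte units are assumptions for illustration:

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>

// "iqint3" integer hash (Inigo Quilez), as described in "Hash Functions for
// GPU Rendering" (JCGT 2020). Integer-only, so identical on every platform.
inline std::uint32_t hash_iqint3(std::uint32_t x, std::uint32_t y)
{
  const std::uint32_t qx = 1103515245u * ((x >> 1u) ^ y);
  const std::uint32_t qy = 1103515245u * ((y >> 1u) ^ x);
  return 1103515245u * (qx ^ (qy >> 3u));
}

// Use the top 24 bits so the result is exactly representable, in [0, 1).
inline float hash_to_unit_float(std::uint32_t h)
{
  return float(h >> 8) * (1.0f / 16777216.0f);
}

// Inverse-CDF mapping from uniform [0, 1) to triangle-distributed [-1, 1).
inline float uniform_to_triangle(float u)
{
  const float r = u * 2.0f - 1.0f;
  return std::copysign(1.0f - std::sqrt(1.0f - std::fabs(r)), r);
}

// Hypothetical float->byte conversion. `x`, `y` must be image-global pixel
// coordinates; `dither` scales the noise amplitude in byte units (assumed).
inline std::uint8_t float_to_byte_dithered(float value, int x, int y, float dither)
{
  const std::uint32_t h = hash_iqint3(std::uint32_t(x), std::uint32_t(y));
  const float noise = uniform_to_triangle(hash_to_unit_float(h));
  // Zero-mean noise only: the rounding below already provides the +0.5
  // offset, so adding one here would bias the image towards lighter values.
  const long rounded = std::lround(value * 255.0f + noise * dither);
  return std::uint8_t(std::clamp(rounded, 0L, 255L));
}
```

Because the noise is zero-mean and rounding is the only offset, averaging a flat mid-grey region returns the input level; passing slice-local instead of image-global `y` would make the same noise rows repeat per thread slice.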
Pull Request: https://projects.blender.org/blender/blender/pulls/135224
After the recent HIP SDK 6.3 update on Windows, the minimum GPU driver
required to use HIP in Cycles has increased.
This commit increases the required driver version listed in the UI and
adds a check to avoid showing HIP devices if they're below a certain
driver version number as they don't work properly.
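A minimal sketch of such a gate, assuming a dotted three-part version string. The helper names are hypothetical, and neither the format nor any concrete number here reflects the actual minimum HIP driver or how the driver reports its version on Windows:

```cpp
#include <array>
#include <cstddef>
#include <sstream>
#include <string>

// Parse a dotted version string into three integers.
// Missing components count as 0; malformed input throws from std::stoi.
inline std::array<int, 3> parse_version(const std::string &text)
{
  std::array<int, 3> parts = {0, 0, 0};
  std::istringstream stream(text);
  std::string item;
  for (std::size_t i = 0; i < parts.size() && std::getline(stream, item, '.'); i++) {
    parts[i] = std::stoi(item);
  }
  return parts;
}

// Lexicographic comparison: major first, then minor, then patch.
inline bool driver_is_supported(const std::string &installed, const std::string &minimum)
{
  return parse_version(installed) >= parse_version(minimum);
}
```

A device whose driver fails this check would then simply be left out of the device list, rather than being offered and failing at render time.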
Pull Request: https://projects.blender.org/blender/blender/pulls/134965
Previously, point cloud rendering was disabled on the HIPRT backend due
to unexpected performance regressions introduced by it.
With the recent update to HIP SDK 6.3 and HIPRT 2.5, these performance
regressions have been resolved and so this commit re-enables
point cloud rendering on HIPRT.
Pull Request: https://projects.blender.org/blender/blender/pulls/134902
Dropping the inlining hint for `light_tree_pdf` and reverting to the
default inlining thresholds for DPC++ compiler gives a ~4% speedup on
classroom and other scenes on Arc B580.
Pull Request: https://projects.blender.org/blender/blender/pulls/135042
This implements three improvements to the energy preservation and albedo
scaling logic, which help the Principled BSDF pass the white-furnace test
when using the coat layers at high roughness.
Specifically, at roughness 0.3, the albedo scaling brings it from 60% at
the edge to 95%, and with the energy preservation it's 99.8%.
Pull Request: https://projects.blender.org/blender/blender/pulls/134620
Dragging and dropping bitmap images (i.e. directly from another
application such as a web browser, not from an image file in
Finder) into the Blender window on macOS would segfault because the
dropped image was [autorelease]d even though its data was meant to
outlive the function scope.
Fixed by removing the superfluous autorelease and adding a comment note.
The only caller of this function (GHOST_SystemCocoa::handleDraggingEvent)
already properly [release]s the image in question.
Note that currently, non-file/bitmap image drag and drop is not
implemented on the WM side, and as such this feature/GHOST event does
not do anything practical.
Pull Request: https://projects.blender.org/blender/blender/pulls/135076
Remove GP legacy obtype and unused functions
A few hidden bugs are fixed along the way:
- Outliner drag-drop for GP material/effect elements now works
- Correct stats are shown in status bar.
Pull Request: https://projects.blender.org/blender/blender/pulls/133957
This is an intermediate step towards making lights actual geometry.
Light is now a subclass of Geometry, which simplifies some code.
The geometry is not added to the BVH yet, which would be the next
step and improve light intersection performance with many lights.
This makes object attributes work on lights.
Co-authored-by: Lukas Stockner <lukas@lukasstockner.de>
Pull Request: https://projects.blender.org/blender/blender/pulls/134846
The attribute handling code in the kernel is currently highly duplicated, since
it needs to handle five different data types and we couldn't use templates
back then.
Now we can, so we might as well make use of them and get rid of ~1000 lines.
There are also some small fixes for the GPU OSL code:
- Wrong derivative for .w component when converting float2/float3->float4
- Different conversion for float2->float (CPU averages, GPU used to take .x)
- Removed useless code for converting to float2, not used by OSL
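A sketch of how a single template can replace per-type copies, including the two OSL conversion fixes. The `float2`/`float3`/`float4` structs, `Dual<T>`, and the choice of z = 0 and w = 1 for the converted float4 are assumptions for illustration; the averaging for float2->float follows the CPU behavior noted above, and the zero derivatives for .w follow from .w being a constant:

```cpp
struct float2 { float x, y; };
struct float3 { float x, y, z; };
struct float4 { float x, y, z, w; };

// A value with its screen-space derivatives, as OSL expects for attributes.
template<typename T> struct Dual {
  T val, dx, dy;
};

// Per-type conversion helpers; overloads replace hand-written copies.
inline float to_float(float v) { return v; }
inline float to_float(float2 v) { return (v.x + v.y) * 0.5f; }  // Average, as on the CPU.

inline float4 to_float4(float2 v, float w) { return {v.x, v.y, 0.0f, w}; }  // z = 0 assumed.
inline float4 to_float4(float3 v, float w) { return {v.x, v.y, v.z, w}; }

// One template per target type instead of one function per source type.
template<typename T> Dual<float> convert_to_float(const Dual<T> &in)
{
  return {to_float(in.val), to_float(in.dx), to_float(in.dy)};
}

// .w is a constant (1 assumed here), so its derivatives must be zero.
template<typename T> Dual<float4> convert_to_float4(const Dual<T> &in)
{
  return {to_float4(in.val, 1.0f), to_float4(in.dx, 0.0f), to_float4(in.dy, 0.0f)};
}
```

Applying the same conversion to the value and both derivatives in one place is what makes a per-component mistake, like a non-zero .w derivative, easy to spot.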
Pull Request: https://projects.blender.org/blender/blender/pulls/134694
This change brings the following improvements on the user level:
- Support of GPUs with gfx12 architecture
- New HIP-RT library which in addition to the gfx12 support brings
various bug-fixes.
The known limitation of gfx12 is that OpenImageDenoise does not yet
support this GPU architecture. This means that while Cycles will take
full advantage of gfx12 (including hardware accelerated ray-tracing),
denoising will only be possible on the CPU, or on a secondary gfx11 or older GPU.
This requires a change in OIDN and it is too late to do
it for Blender 4.4, but it is something to look forward to in Blender 4.5.
The gfx12 changes for the pre-compiled kernels are rather trivial,
so they come together with the bigger HIP-RT change (in the same PR).
On the development side this change brings the following improvements:
- One step compile and link (much simpler CMake rules)
- Embedding BVH binaries in hiprt dll (which makes it easier to package
and load, without relying on special path configuration)
Co-authored-by: Sahar Kashi <sahar.kashi@amd.com>
Co-authored-by: Sergey Sharybin <sergey@blender.org>
Co-authored-by: Brecht Van Lommel <brecht@blender.org>
Pull Request: https://projects.blender.org/blender/blender/pulls/133129
Followup to 48e26c3afe, and discussions in !134771 about keeping
'C-style' and 'C++ template type-safe style' implementations of our
guardedalloc separated. And it makes `MEM_freeN<T>` code simpler.
Also skip type-checking in `MEM_freeN<T>` only with MSVC, as clang-cl on
windows-arm64 does work fine with DNA structs using
`DNA_DEFINE_CXX_METHODS`.
Pull Request: https://projects.blender.org/blender/blender/pulls/134861
The main goal of these changes is to improve static (i.e. build-time)
checks on whether given data can be allocated and freed with `malloc`
and `free` (C-style), or requires proper C++-style construction and
destruction (`new` and `delete`).
* Add new `MEM_malloc_arrayN_aligned` API.
* Make `MEM_freeN` a template function in C++, which does static assert on
type triviality.
* Add `MEM_SAFE_DELETE`, similar to `MEM_SAFE_FREE` but calling
`MEM_delete`.
The changes to `MEM_freeN` were painful but useful, as they allowed fixing a bunch
of invalid calls in the existing codebase already.
It also highlighted a fair number of places where it is called to free incomplete
type pointers, which is likely a sign of badly designed code (there should
rather be an API to destroy and free these data then, if the data type is not fully
publicly exposed). For now, these are 'worked around' by explicitly casting the
freed pointers to `void *` in these cases - which also makes them easy to search for.
Some of these will be addressed separately (see blender/blender!134765).
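A simplified sketch of the static check, built on plain `malloc`/`free`. The names `example_free_n`, `example_new`, `example_delete` and `EXAMPLE_SAFE_DELETE` are hypothetical stand-ins for `MEM_freeN`, `MEM_new`, `MEM_delete` and `MEM_SAFE_DELETE`, whose real implementations use guardedalloc and differ in detail:

```cpp
#include <cstddef>
#include <cstdlib>
#include <new>
#include <type_traits>
#include <utility>

// Untyped overload: no checks at all. Casting a pointer to an incomplete type
// to `void *` selects this one, which keeps such call sites easy to search for.
inline void example_free_n(void *ptr)
{
  std::free(ptr);
}

// Typed overload: refuse at compile time to `free` data that needs a
// destructor call. Requires `T` to be a complete type.
template<typename T> void example_free_n(T *ptr)
{
  static_assert(std::is_trivial_v<T>, "Use example_delete() for non-trivial types");
  std::free(ptr);
}

// C++-style allocation: raw memory plus placement new.
template<typename T, typename... Args> T *example_new(Args &&...args)
{
  static_assert(alignof(T) <= alignof(std::max_align_t), "Over-aligned types need an aligned allocator");
  void *buffer = std::malloc(sizeof(T));
  if (buffer == nullptr) {
    throw std::bad_alloc();
  }
  return new (buffer) T(std::forward<Args>(args)...);
}

// Matching destruction: run the destructor, then release the memory.
template<typename T> void example_delete(T *ptr)
{
  if (ptr != nullptr) {
    ptr->~T();
    std::free(ptr);
  }
}

// Like a SAFE_FREE macro, but destructs and then clears the pointer.
#define EXAMPLE_SAFE_DELETE(ptr) \
  do { \
    example_delete(ptr); \
    (ptr) = nullptr; \
  } while (0)
```

Calling `example_free_n` on a pointer to a type with a user-provided destructor fails to compile, which is the kind of invalid call the static assert is meant to catch.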
Finally, MSVC seems to consider structs defining new/delete operators (e.g. by
using the `MEM_CXX_CLASS_ALLOC_FUNCS` macro) as non-trivial. This does not
seem to follow the definition of type triviality, so for now static type checking in
`MEM_freeN` has been disabled for Windows. We'll likely have to do the same
with type-safe `MEM_[cm]allocN` API being worked on in blender/blender!134771
Based on ideas from Brecht in blender/blender!134452
Pull Request: https://projects.blender.org/blender/blender/pulls/134463
There is a bug in Embree that makes BVH updates crash. Disabling multithreaded
BVH updates after the initial BVH build appears to work around it, at the cost
of some performance.
This will not affect performance of the initial BVH build, transforming objects
or editing a single mesh. It will only affect performance when multiple smaller
meshes are edited together, as those can no longer have their BVH updated in
parallel or benefit from parallelization over many primitives.
Pull Request: https://projects.blender.org/blender/blender/pulls/134747
Plotting happens from the given root node into a graphviz file.
Supports plotting from both scene level LightTreeNode and the kernel
level KernelLightTreeNode.
An external Graphviz command (such as `dot`) must be used to convert the
generated file to an image.
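A minimal sketch of the idea, assuming a simple binary node with a text label. `ExampleTreeNode`, `write_dot_node` and `to_dot` are hypothetical; the real LightTreeNode and KernelLightTreeNode carry more data (such as bounds and energy), which would be formatted into the label:

```cpp
#include <initializer_list>
#include <memory>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical binary tree node.
struct ExampleTreeNode {
  std::string label;
  std::unique_ptr<ExampleTreeNode> left;
  std::unique_ptr<ExampleTreeNode> right;
};

// Emit one node and its parent->child edges, depth first.
// Returns the next unused node id.
inline int write_dot_node(std::ostream &out, const ExampleTreeNode &node, int id)
{
  out << "  n" << id << " [label=\"" << node.label << "\"];\n";
  int next_id = id + 1;
  for (const ExampleTreeNode *child : {node.left.get(), node.right.get()}) {
    if (child != nullptr) {
      out << "  n" << id << " -> n" << next_id << ";\n";
      next_id = write_dot_node(out, *child, next_id);
    }
  }
  return next_id;
}

// Format the (sub)tree below `root` as a Graphviz digraph.
inline std::string to_dot(const ExampleTreeNode &root)
{
  std::ostringstream out;
  out << "digraph light_tree {\n";
  write_dot_node(out, root, 0);
  out << "}\n";
  return out.str();
}
```

After writing the string to a file, it can be rendered with, for example, `dot -Tpng light_tree.dot -o light_tree.png`. Labels containing quotes would need escaping, which this sketch skips.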
Pull Request: https://projects.blender.org/blender/blender/pulls/134738
The original report stumbled upon this issue with a trickier
configuration, where light linking is combined with light trees.
However, the actual contributing factor was a mesh with an emission
shader that is not assigned to any triangles. This triggered a
bug in BoundBox::transformed(), which turned invalid (empty) bounds
into non-empty bounds by growing over the transformed corners.
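A simplified sketch of a guarded transform, assuming an axis-aligned box that is empty when min > max on any axis. `ExampleBoundBox`, `Vec3` and `Affine` are hypothetical stand-ins for the Cycles types:

```cpp
#include <algorithm>
#include <cfloat>

struct Vec3 { float x, y, z; };

// Hypothetical affine transform: 3x3 matrix plus translation.
struct Affine {
  float m[3][3];
  Vec3 t;
  Vec3 apply(const Vec3 &p) const
  {
    return {m[0][0] * p.x + m[0][1] * p.y + m[0][2] * p.z + t.x,
            m[1][0] * p.x + m[1][1] * p.y + m[1][2] * p.z + t.y,
            m[2][0] * p.x + m[2][1] * p.y + m[2][2] * p.z + t.z};
  }
};

struct ExampleBoundBox {
  Vec3 min{FLT_MAX, FLT_MAX, FLT_MAX};
  Vec3 max{-FLT_MAX, -FLT_MAX, -FLT_MAX};  // Empty by default.

  bool valid() const { return min.x <= max.x && min.y <= max.y && min.z <= max.z; }

  void grow(const Vec3 &p)
  {
    min = {std::min(min.x, p.x), std::min(min.y, p.y), std::min(min.z, p.z)};
    max = {std::max(max.x, p.x), std::max(max.y, p.y), std::max(max.z, p.z)};
  }

  // Transform by growing over all 8 transformed corners, but keep empty
  // bounds empty instead of growing over the +/-FLT_MAX "corners".
  ExampleBoundBox transformed(const Affine &tfm) const
  {
    ExampleBoundBox result;
    if (!valid()) {
      return result;
    }
    for (int i = 0; i < 8; i++) {
      const Vec3 corner{(i & 1) ? max.x : min.x, (i & 2) ? max.y : min.y, (i & 4) ? max.z : min.z};
      result.grow(tfm.apply(corner));
    }
    return result;
  }
};
```

Without the `valid()` check, even the identity transform would turn the empty box into one spanning -FLT_MAX..FLT_MAX on every axis, which then passes `valid()`.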
Additionally, fix incorrect handling of shared nodes, which only
worked for leaf nodes. This was due to how the measure
was accumulated: add() can be called with an empty
measure.
Pull Request: https://projects.blender.org/blender/blender/pulls/134699
194e233d86 caused a discussion in the chat about the initialization
behavior of `MEM_new()`, and the agreement was to never rely on
zero-initialization. Noted this in the API comment now.
Some people found the existing comment useful but it still left some
questions. Tried to clarify that now.
This is a crucial memory management function, it's important to have
behavior documented well, even if a full explanation is out-of-scope.
Also added another link in case people want to check more details.
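For reference, a small illustration of the C++ rules behind this, assuming an allocation helper that ends in `new (buffer) T(args...)`, which with no arguments is the value-initializing `T()`. The struct names are hypothetical:

```cpp
// No user-provided constructor: `new T()` value-initializes, so members are
// zeroed, but plain `new T` would leave them indeterminate.
struct PlainPair {
  int a;
  int b;
};

// User-provided constructor that skips `b`: even `new T()` leaves `b`
// indeterminate, since value-initialization just calls this constructor.
struct PartialInit {
  int a;
  int b;
  PartialInit() : a(1) {}
};

// The style that does not rely on zero-initialization: explicit defaults
// that hold no matter how the object is created.
struct ExplicitDefaults {
  int a = 0;
  int b = 0;
};
```

Whether members end up zeroed depends on details of the type that are easy to change later, which is why stating defaults explicitly is the safer convention.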
Pull Request: https://projects.blender.org/blender/blender/pulls/134577
This patch modifies the logic behind handling NDOF device events on
macOS so that it can benefit from the `Blender` profile available in
3Dconnexion driver v10.8.7 and later.
A new device command was introduced: `kConnexionCmdAppEvent`, which is
sent by the driver upon getting an appropriate NDOF device button
input. This allows the driver to consume all NDOF device input and then
send appropriate app events based on its configuration, instead of
forwarding raw data to the application directly.
When using 3Dconnexion driver versions prior to v10.8.7, the behavior
is unchanged. This approach allows for supporting all of the SpaceMouse
Enterprise buttons, long presses included (solving issue #119206 on macOS).
Co-authored-by: Sergey Sharybin <sergey@blender.org>
Pull Request: https://projects.blender.org/blender/blender/pulls/126694