openSubdiv_init() would detect available evaluators before any OpenGL context
exists, causing a crash with libepoxy. This test however is redundant as we
already check the requirements on the Blender side through the GPU API.
To simplify things, completely remove the device detection in the opensubdiv
module and reduce the evaluators to just CPU and GPU. The plan here is to move
to the GPU module abstraction over OpenGL/Metal/Vulkan and so all these
different backends no longer make sense.
This also removes the user preference for OpenSubdiv compute device, which was
not used for the new GPU subdivision implementation.
Ref D15291
Differential Revision: https://developer.blender.org/D15470
Simplify logic for freeing a NULL pointer. While no null-pointer
de-reference was performed, this wasn't so obvious as the pointer
was passed to MEM_lockfree_allocN_len before checking for NULL.
NOTE: T99744 claimed that a NULL pointer free was a vulnerability,
while I can't see evidence for this - exiting early makes it clearer
the memory isn't accessed.
*Details*
- Add MEMHEAD_LEN macro, avoids redundant NULL check.
- Use "UNLIKELY(..)" hints for error cases
(freeing NULL pointer and checking if `leak_detector_has_run`).
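The early-exit pattern described above can be sketched as follows. This is a
minimal illustration, not the allocator's actual code: `MemHead`,
`MEMHEAD_FROM_PTR`, `sketch_malloc`, `sketch_free` and `mem_in_use` are
hypothetical names, and only `MEMHEAD_LEN` and `UNLIKELY` are named in the
commit message (their definitions here are assumptions).

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

/* Branch hint, falls back to the plain expression on other compilers. */
#if defined(__GNUC__) || defined(__clang__)
#  define UNLIKELY(x) __builtin_expect(!!(x), 0)
#else
#  define UNLIKELY(x) (x)
#endif

/* Hypothetical header stored directly before each user allocation.
 * The padding member keeps the user pointer aligned on common platforms. */
typedef struct MemHead {
  size_t len;
  size_t pad;
} MemHead;

#define MEMHEAD_FROM_PTR(ptr) (((MemHead *)(ptr)) - 1)
/* Reads the stored length, the caller guarantees a valid header. */
#define MEMHEAD_LEN(memh) ((memh)->len)

static size_t mem_in_use = 0;

void *sketch_malloc(size_t len)
{
  MemHead *memh = malloc(sizeof(MemHead) + len);
  if (UNLIKELY(memh == NULL)) {
    return NULL;
  }
  memh->len = len;
  memh->pad = 0;
  mem_in_use += len;
  return memh + 1;
}

void sketch_free(void *ptr)
{
  /* Exit before touching the header, so it is obvious that a NULL
   * pointer is never used to compute or read a header address. */
  if (UNLIKELY(ptr == NULL)) {
    fprintf(stderr, "Attempt to free NULL pointer\n");
    return;
  }
  MemHead *memh = MEMHEAD_FROM_PTR(ptr);
  assert(mem_in_use >= MEMHEAD_LEN(memh));
  mem_in_use -= MEMHEAD_LEN(memh);
  free(memh);
}
```

Because the NULL check comes first, the length macro does not need its own
NULL check.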
For transparency, volume and light intersection rays, adjust these distances
rather than the ray start position. This way we increment the start distance
by the smallest possible float increment to avoid self intersections, and be
sure it works as the distance compared against will be exactly the same as
before, due to the ray start position and direction remaining the same.
Fix T98764, T96537, hair ray tracing precision issues.
Differential Revision: https://developer.blender.org/D15455
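The idea of advancing the start distance by the smallest float increment can
be sketched as below. This is an illustration under assumptions: the `Ray`
layout, `next_float_up` and `ray_continue_after_hit` are hypothetical names,
and the bit-level increment stands in for what the standard `nextafterf`
would do towards positive infinity.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical ray layout: origin, direction and a valid distance range. */
typedef struct Ray {
  float P[3];
  float D[3];
  float tmin;
  float tmax;
} Ray;

/* Return the next representable float above f. */
float next_float_up(float f)
{
  uint32_t bits;
  if (f != f) {
    return f; /* NaN stays NaN. */
  }
  if (f == 0.0f) {
    bits = 1u; /* Smallest positive subnormal, also covers -0.0f. */
  }
  else {
    memcpy(&bits, &f, sizeof(bits));
    if (bits == 0x7f800000u) {
      return f; /* +infinity has no successor. */
    }
    /* Positive floats grow with their bit pattern, negative ones shrink. */
    bits = (f > 0.0f) ? bits + 1u : bits - 1u;
  }
  memcpy(&f, &bits, sizeof(f));
  return f;
}

/* Continue a ray past a hit at distance hit_t without moving its origin.
 * P and D are untouched, so later hits report distances computed exactly
 * as before, and the previous hit now falls just outside the range. */
void ray_continue_after_hit(Ray *ray, float hit_t)
{
  assert(hit_t >= ray->tmin);
  ray->tmin = next_float_up(hit_t);
}
```

Offsetting the origin instead would change every distance computed
afterwards, which is what makes an exact comparison unreliable.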
This was added for Metal, but also gives good results with CUDA and OptiX.
Also enable it for future Apple GPUs instead of only M1 and M2, since this has
been shown to help across multiple GPUs, so the better bet seems to be to
enable it rather than disable it.
Also moves some of the logic outside of the Metal device code, and always
enables the code in the kernel since other devices don't do dynamic compile.
Time per sample with OptiX + RTX A6000:
                     new      old
barbershop_interior  0.0730s  0.0727s
bmw27                0.0047s  0.0053s
classroom            0.0428s  0.0464s
fishy_cat            0.0102s  0.0108s
junkshop             0.0366s  0.0395s
koro                 0.0567s  0.0578s
monster              0.0206s  0.0223s
pabellon             0.0158s  0.0174s
sponza               0.0088s  0.0100s
spring               0.1267s  0.1280s
victor               0.0524s  0.0531s
wdas_cloud           0.0817s  0.0816s
Ref D15331, T87836
The Metal backend now compiles and caches a second set of kernels which are
optimized for scene contents, enabled for Apple Silicon.
The implementation supports doing this both for intersection and shading
kernels. However this is currently only enabled for intersection kernels that
are quick to compile, and already give a good speedup. Enabling this for
shading kernels would be faster still, however this also causes long wait
times and would need a good user interface to control this.
M1 Max samples per minute (macOS 13.0):
                     PSO_GENERIC  PSO_SPECIALIZED_INTERSECT  PSO_SPECIALIZED_SHADE
barbershop_interior  83.4         89.5                       93.7
bmw27                1486.1       1671.0                     1825.8
classroom            175.2        196.8                      206.3
fishy_cat            674.2        704.3                      719.3
junkshop             205.4        212.0                      257.7
koro                 310.1        336.1                      342.8
monster              376.7        418.6                      424.1
pabellon             273.5        325.4                      339.8
sponza               830.6        929.6                      1142.4
victor               86.7         96.4                       96.3
wdas_cloud           111.8        112.7                      183.1
Code contributed by Jason Fielder, Morteza Mostajabodaveh and Michael Jones
Differential Revision: https://developer.blender.org/D14645
This is useful when using an armature as a camera rig, to avoid creating and
targeting an empty object.
Differential Revision: https://developer.blender.org/D7012
When the solve is successful, the light sample needs to be updated since the
effective shading point is now on the last refractive interface. Spread was
not taken into account, creating false caustics.
Differential Revision: https://developer.blender.org/D15449
With choices Default, Lower Memory and Faster Render. For convenience, and
to help communicate what the various settings do.
Differential Revision: https://developer.blender.org/D15446
This patch partitions the active indices into chunks prior to sorting by
material in order to trade off some material coherence for better locality.
On Apple Silicon GPUs (particularly higher end M1-family GPUs), we observe
overall render time speedups of up to 15%. The partitioning is implemented by
repeating the range of `shader_sort_key` for each partition, and encoding a
"locator" key which distributes the indices into sorted chunks.
Reviewed By: brecht
Differential Revision: https://developer.blender.org/D15331
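One way to read the "locator" key described above is sketched below. This is
an assumption-laden illustration: `locator_key`, `KeyedIndex`,
`sort_active_indices`, `partition_size` and `max_shaders` are hypothetical
names, and a CPU `qsort` is swapped in for whatever parallel sort the GPU
code uses.

```c
#include <assert.h>
#include <stdlib.h>

/* Combined key: the range [0, max_shaders) of shader sort keys is
 * repeated once per partition, so partition 0 owns keys
 * [0, max_shaders), partition 1 owns [max_shaders, 2 * max_shaders). */
static int locator_key(int index, int shader_sort_key, int partition_size, int max_shaders)
{
  return (index / partition_size) * max_shaders + shader_sort_key;
}

typedef struct KeyedIndex {
  int key;
  int index;
} KeyedIndex;

static int compare_keyed(const void *a, const void *b)
{
  const KeyedIndex *ka = (const KeyedIndex *)a;
  const KeyedIndex *kb = (const KeyedIndex *)b;
  if (ka->key != kb->key) {
    return (ka->key > kb->key) - (ka->key < kb->key);
  }
  /* Tie-break on the index to make the order deterministic. */
  return (ka->index > kb->index) - (ka->index < kb->index);
}

/* Sort active indices by material within each partition.
 * shader_sort_key is looked up by index value. */
void sort_active_indices(int *active,
                         int num_active,
                         const int *shader_sort_key,
                         int partition_size,
                         int max_shaders)
{
  KeyedIndex *tmp = malloc(sizeof(KeyedIndex) * (size_t)num_active);
  if (tmp == NULL) {
    return; /* Leave the input order unchanged on allocation failure. */
  }
  for (int i = 0; i < num_active; i++) {
    tmp[i].index = active[i];
    tmp[i].key = locator_key(
        active[i], shader_sort_key[active[i]], partition_size, max_shaders);
  }
  qsort(tmp, (size_t)num_active, sizeof(KeyedIndex), compare_keyed);
  for (int i = 0; i < num_active; i++) {
    active[i] = tmp[i].index;
  }
  free(tmp);
}
```

With a partition size of 4 and two materials, indices 0..7 with alternating
keys 1,0,1,0,... come out as 1,3,0,2 followed by 5,7,4,6: each chunk is
sorted by material, but indices never leave their chunk.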
Quiet warning from [0], no functional change as this information
was always ignored.
Key release events shouldn't have associated text. This was cleared
for wmEvents, so there is no reason to pass it from GHOST.
[0]: d6fef73ef1
- Remove references to `ISTEXTINPUT` as any keyboard event with its
utf8_buf set can be handled as text input.
- Update references to the key repeat flag.
The `ascii` member was only kept for historical reasons as some platforms
didn't support utf8 when it was first introduced.
Remove the `ascii` struct members since many checks used this as a
fall-back for utf8_buf not being set which isn't needed.
There are a few cases where it's convenient to access the ASCII value
of an event (or nil) so a function has been added to do that.
*Details*
- WM_event_utf8_to_ascii() has been added for the few cases an event's
ASCII value needs to be accessed, this just avoids having to do
multi-byte character checks in-line.
- RNA Event.ascii remains, using utf8_buf[0] for single byte characters.
- GHOST_TEventKeyData.ascii has been removed.
- To avoid regressions, non-ASCII Latin1 characters from GHOST are
converted into multi-byte UTF8. When building X11 without
XInput & X_HAVE_UTF8_STRING it seems like these could still occur.
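The two conversions mentioned in the details above can be sketched as
follows. These are hypothetical stand-ins, not the functions from the commit:
`sketch_utf8_to_ascii` and `sketch_latin1_to_utf8` are invented names, and
the exact behavior of `WM_event_utf8_to_ascii()` is assumed from the
description (the ASCII value for single byte characters, otherwise nil).

```c
#include <assert.h>
#include <stddef.h>

/* Return the ASCII value when the buffer starts with a single byte
 * character, otherwise '\0'. Lead bytes of multi-byte UTF-8 sequences
 * always have the high bit set, so one comparison is enough. */
char sketch_utf8_to_ascii(const char *utf8_buf)
{
  const unsigned char c = (unsigned char)utf8_buf[0];
  return (c < 0x80) ? (char)c : '\0';
}

/* Encode one Latin1 character as NUL-terminated UTF-8 and return the
 * number of bytes written, excluding the terminator. Latin1 code
 * points map directly to U+0000..U+00FF, so values of 0x80 and above
 * always need exactly two bytes. */
size_t sketch_latin1_to_utf8(unsigned char latin1, char r_utf8[3])
{
  if (latin1 < 0x80) {
    r_utf8[0] = (char)latin1;
    r_utf8[1] = '\0';
    return 1;
  }
  r_utf8[0] = (char)(0xC0 | (latin1 >> 6));
  r_utf8[1] = (char)(0x80 | (latin1 & 0x3F));
  r_utf8[2] = '\0';
  return 2;
}
```

For example, Latin1 0xE9 (e with acute accent) becomes the two bytes 0xC3
0xA9, for which the ASCII accessor returns nil.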
While GHOST/SDL doesn't support non-ASCII text input,
use the UTF8 buffer to be consistent with all other back-ends.
Move the conversion from SDL_KeyboardEvent to ASCII into a function.
Also only lookup this value on key press (not release).
Measurements showed on average a 1.08x speedup for a 1.04x increase in
memory usage which is an acceptable trade off for a default setting,
although discoverability of such settings influencing memory usage could
be improved.
Reviewed By: brecht
Differential Revision: https://developer.blender.org/D15429
The current specific CentOS7 workaround we have for AoT, which is to
disable __FAST_MATH__ by using -fhonor-nans, now also fixes the
compilation issue for JIT as well since at least driver 23570.
Simplify logic for initializing the wl_buffer, ensure the cursor's
custom data is never left in a half-initialized state.
Also remove the need for multiple calls to close when handling errors.
To avoid Cycles not showing any hair by default, and to avoid very slow renders
due to many overlaps with the previous 1 meter default in the node.
Fixes T97584, T99319
Differential Revision: https://developer.blender.org/D15405
with a very high min-driver version requirement, placeholder until JIT
CentOS runtime compilation issue gets fixed in a defined version.
min-driver version check can be worked around by setting
CYCLES_ONEAPI_ALL_DEVICES environment variable.
Add logging to all Wayland listener callbacks as it can be difficult
to detect the cause of problems.
Using break-points often isn't practical for debugging interactive
windowing / compositor issues.
Logging needs to be enabled on the command line, e.g:
blender --log "ghost.wl.*" --log-level 2 --log-show-basename
Add macros from BLI_utildefines, mainly to avoid repetition
(ELEM, UNPACK*, CLAMP* & ARRAY_SIZE).
Also add macros LIKELY/UNLIKELY as there are quite a lot of checks
for unlikely situations for GHOST/Wayland (e.g. not having a keyboard
or mouse).
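For readers unfamiliar with these helpers, simplified versions might look
like the sketch below. These are not the BLI_utildefines definitions: the
real ELEM is variadic, so the fixed-arity `ELEM3` here is a hypothetical
stand-in, and the other bodies are assumptions about typical
implementations.

```c
#include <assert.h>
#include <stddef.h>

/* Number of elements in a true array (not a pointer). */
#define ARRAY_SIZE(arr) (sizeof(arr) / sizeof((arr)[0]))

/* Clamp a variable in place to the range [lo, hi]. */
#define CLAMP(a, lo, hi) \
  do { \
    if ((a) < (lo)) { \
      (a) = (lo); \
    } \
    else if ((a) > (hi)) { \
      (a) = (hi); \
    } \
  } while (0)

/* Hypothetical fixed-arity stand-in for ELEM: true when v equals any
 * of the three candidates. Note that v is evaluated up to three times. */
#define ELEM3(v, a, b, c) ((v) == (a) || (v) == (b) || (v) == (c))

/* Branch hints, these fall back to the plain expression elsewhere. */
#if defined(__GNUC__) || defined(__clang__)
#  define LIKELY(x) __builtin_expect(!!(x), 1)
#  define UNLIKELY(x) __builtin_expect(!!(x), 0)
#else
#  define LIKELY(x) (x)
#  define UNLIKELY(x) (x)
#endif

/* Hypothetical use: bail out early when a seat has no keyboard. */
int sketch_key_repeat_rate(const int *keyboard_rate, int fallback)
{
  if (UNLIKELY(keyboard_rate == NULL)) {
    return fallback;
  }
  int rate = *keyboard_rate;
  CLAMP(rate, 1, 100);
  return rate;
}
```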
Pass in arguments for internal grab logic instead of accessing
some values from the window and other values as arguments.
While more verbose it's simpler to reason about.
We used it only to access device id for explicitly allowing Arc GPUs.
It made the backend require ze_loader.dll which could be problematic if
we end up using direct linking.
I've replaced filtering based on PCI device id by using other HW properties
instead (EUs, threads per EU), that are now available through Level-Zero.
Initially the oneAPI implementation waited after each memory
operation, even if there was no need for this. Now the implementation
waits only if it is really necessary, which improved performance
noticeably for some scenes and slightly for the rest of them.
Add intern/wayland_dynload which is used when WITH_GHOST_WAYLAND_DYNLOAD
is enabled (off by default). When enabled, systems without Wayland
installed will fall back to X11.
This allows Blender to dynamically load:
- libwayland-client
- libwayland-cursor
- libwayland-egl
- libdecor-0 (when WITH_GHOST_WAYLAND_LIBDECOR is enabled).