Commit Graph

1400 Commits

Author SHA1 Message Date
Sergey Sharybin
2050eca8bb Merge branch 'blender-v4.4-release' 2025-02-13 18:43:51 +01:00
Sergey Sharybin
ee8b9a3799 Fix: Adding new objects in HIP-RT rendered viewport errors out
The issue was caused by c0ba800f64.

Simple solution for now: check the data size and free the memory to
allow the device memory to be re-allocated. Seems the safest for the
upcoming release.

Ideally we'd need to avoid having these manual tricks with the device
memory pointers.

Pull Request: https://projects.blender.org/blender/blender/pulls/134516
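
A minimal sketch of the "check the data size and free" approach described in this message. `DeviceBuffer` and `ensure_capacity` are hypothetical names used for illustration, and `std::malloc`/`std::free` stand in for the device allocator; this is not the actual Cycles HIP-RT code.

```cpp
#include <cstddef>
#include <cstdlib>

// Hypothetical stand-in for a device-side buffer (not the Cycles device_memory class).
struct DeviceBuffer {
  void *device_pointer = nullptr;
  std::size_t device_size = 0;
};

// If the existing allocation does not match the requested size, free it so
// that the buffer can be allocated again with the new size.
void ensure_capacity(DeviceBuffer &buf, std::size_t required_size)
{
  if (buf.device_pointer != nullptr && buf.device_size != required_size) {
    std::free(buf.device_pointer); /* Stand-in for a device free call. */
    buf.device_pointer = nullptr;
    buf.device_size = 0;
  }
  if (buf.device_pointer == nullptr && required_size > 0) {
    buf.device_pointer = std::malloc(required_size); /* Stand-in for a device alloc. */
    buf.device_size = (buf.device_pointer != nullptr) ? required_size : 0;
  }
}
```

Calling `ensure_capacity` again after the data has grown releases the stale allocation before a new one is made, which is the manual pointer handling the message says should ideally be avoided.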
2025-02-13 18:42:05 +01:00
Campbell Barton
61b2bf4953 Merge branch 'blender-v4.4-release' 2025-02-13 11:23:45 +11:00
Campbell Barton
c83c62439e Cleanup: correct typo 2025-02-13 11:14:50 +11:00
Brecht Van Lommel
68a510fe9b Merge branch 'blender-v4.4-release' 2025-02-12 21:50:28 +01:00
Nikita Sirgienko
2bab4ae370 Cycles: oneAPI: Optimize texture access by using GPU HW sampler
The current usage of software-based texture operations in
the oneAPI implementation puts additional register pressure on
the GPU compiler during register allocation. And it also creates
code that requires maintenance. This commit is intended to address
this situation by utilizing a recently productized SYCL bindless
texture API to enable HW-based texture operations using
Intel GPUs' hardware sampler.

This currently translates to 1-11% rendering speedups (scene-specific)
on my Arc A770 and Arc B580. At the moment, there are small
performance regressions with NanoVDB texture operations on Arc B580
and small performance regressions in shade surface MNEE and Raytrace
kernels on Arc A770, but they look recoverable and will be handled
in the future.

Pull Request: https://projects.blender.org/blender/blender/pulls/133457
2025-02-12 21:47:34 +01:00
Sean Kim
a2b75c87c8 Merge branch 'blender-v4.4-release' 2025-02-11 14:10:10 -08:00
Sergey Sharybin
a535a1a027 Fix #132782: MetalRT: Missing Geometry in Cycles preview on MacOS 15.2
The issue also happens on macOS 15.3.

This is a Metal driver bug; a fix is coming in macOS 15.4. Until then,
disable refitting in the viewport. There is no perceptible benefit from
refitting, so while this might be less than ideal, it sidesteps
the problem while still benefiting from HWRT.

Pull Request: https://projects.blender.org/blender/blender/pulls/134399
2025-02-11 21:42:38 +01:00
Brecht Van Lommel
2c34786474 Merge branch 'blender-v4.4-release' 2025-02-11 20:43:17 +01:00
Brecht Van Lommel
9ad19396f5 Fix: Cycles Metal invalid storage mode check
Pull Request: https://projects.blender.org/blender/blender/pulls/134337
2025-02-11 20:42:01 +01:00
Brecht Van Lommel
21a90f26b6 Cleanup: Fix C++20 deprecation warnings in Cycles
Pull Request: https://projects.blender.org/blender/blender/pulls/134338
2025-02-11 16:42:03 +01:00
Nikita Sirgienko
bee534eea5 Build: Upgrade Intel Graphics Compiler to 2.1.14 on Linux
This corresponds to the latest rolling 2448.13 release:
https://dgpu-docs.intel.com/releases/packages.html?release=Rolling+2448.13&os=Ubuntu+24.04

Graphics compiler upgrades require increasing the minimum required
driver (compute-runtime) version to the corresponding one to guarantee
compatibility, which is XX.XX.31740.15 in this release, so we bump this
requirement accordingly.

Co-authored-by: Xavier Hallade <me@ph0b.com>
Pull Request: https://projects.blender.org/blender/blender/pulls/134051
2025-02-05 15:00:04 +01:00
Xavier Hallade
e7589f8973 Fix: Cycles: Missing texture transfers in oneAPI backend
Since 2cfe2e0bfe, textures were neither
allocated nor transferred to the device.

This fix improves the situation reported in
https://projects.blender.org/blender/blender/issues/133953 but is not
enough to make all unit tests pass.
2025-02-03 20:20:21 +01:00
Brecht Van Lommel
c4c0c23c5a Fix: Cycles: Always try to alloc MEM_DEVICE_ONLY on device
Regardless of what mem info reports. We can't move this to the host, so
might as well try because the free memory might not be a reliable predictor
of success.

Pull Request: https://projects.blender.org/blender/blender/pulls/132912
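
A minimal sketch of the policy described in this message, under the assumption that other memory types may fall back to the host. `MemType`, `Placement`, `place_allocation` and the `try_device_alloc` callback are hypothetical names, not the Cycles API.

```cpp
#include <cstddef>

enum class MemType { DeviceOnly, Other };
enum class Placement { Device, Host, Failed };

// Decide where an allocation ends up. `reported_free` is what the memory info
// query claims is available; `try_device_alloc` is a hypothetical callback
// that performs the real device allocation and reports success.
Placement place_allocation(MemType type,
                           std::size_t size,
                           std::size_t reported_free,
                           bool (*try_device_alloc)(std::size_t))
{
  const bool seems_to_fit = size <= reported_free;

  /* Device-only memory cannot live on the host, so always attempt the device
   * allocation, even when the reported free memory suggests it will not fit. */
  if (type == MemType::DeviceOnly || seems_to_fit) {
    if (try_device_alloc(size)) {
      return Placement::Device;
    }
    if (type == MemType::DeviceOnly) {
      return Placement::Failed;
    }
  }
  return Placement::Host;
}
```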
2025-01-29 14:12:25 +01:00
Brecht Van Lommel
e8ebcb3ee3 Fix: Cycles: Check if memory is host mapped without access to device_mem_map
This avoids concurrency issues.

Pull Request: https://projects.blender.org/blender/blender/pulls/132912
2025-01-29 14:12:23 +01:00
Brecht Van Lommel
8b7fce492e Refactor: Cycles: Change API so host and device memory are freed together
With host mapped memory these can be shared, and we can't get back the
original host pointer unless we make a copy which is inefficient.

Also add asserts to verify this doesn't happen.

Pull Request: https://projects.blender.org/blender/blender/pulls/132912
2025-01-29 14:12:19 +01:00
Brecht Van Lommel
c0ba800f64 Refactor: Cycles: Avoid double host alloc in HIP-RT
This code should be changed to not modify host pointers directly. But as
long as we are going to do it, avoid unnecessary alloc and immediate free.

Pull Request: https://projects.blender.org/blender/blender/pulls/132912
2025-01-29 14:12:16 +01:00
Brecht Van Lommel
b06def6b3e Refactor: Cycles: Remove confusing test for condition that should not happen
Device shouldn't have to allocate host pointer.

Pull Request: https://projects.blender.org/blender/blender/pulls/132912
2025-01-29 14:12:12 +01:00
Brecht Van Lommel
1ec04e0eec Fix: Cycles: Only move textures to host on one device at a time
This was not thread safe. And it's better to do them one by one to avoid
moving more than is needed, when another thread already freed up enough.

Thanks to Jorn Visser for investigating and finding this problem.

Pull Request: https://projects.blender.org/blender/blender/pulls/132912
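
A minimal sketch of moving textures one at a time under a lock, as described in this message. `Texture`, `MemoryPool` and `make_room` are hypothetical names, and the bookkeeping is reduced to a byte counter; the real code differs.

```cpp
#include <cstddef>
#include <mutex>
#include <vector>

// Hypothetical texture record: its size and whether it was moved to the host.
struct Texture {
  std::size_t size;
  bool on_host;
};

// Hypothetical pool shared by several device threads.
struct MemoryPool {
  std::mutex move_mutex;
  std::size_t free_bytes = 0;
  std::vector<Texture> textures;
};

// Free up device memory until `needed` bytes are available.
// Returns true when enough memory is free afterwards.
bool make_room(MemoryPool &pool, std::size_t needed)
{
  /* Only one thread moves textures at a time. */
  const std::lock_guard<std::mutex> lock(pool.move_mutex);

  for (Texture &texture : pool.textures) {
    /* Re-check before every move: another thread may already have freed
     * enough, in which case nothing more needs to be moved. */
    if (pool.free_bytes >= needed) {
      break;
    }
    if (!texture.on_host) {
      texture.on_host = true;
      pool.free_bytes += texture.size;
    }
  }
  return pool.free_bytes >= needed;
}
```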
2025-01-29 14:12:09 +01:00
Brecht Van Lommel
cd3d3b2646 Refactor: Cycles: Delay load_texture_info() to enqueue
Doing it immediately after moving textures to the host is less efficient, and
interacts in confusing ways.

Pull Request: https://projects.blender.org/blender/blender/pulls/132912
2025-01-29 14:12:06 +01:00
Brecht Van Lommel
fec593ec3b Fix: Cycles: Avoid unnecessary move to host with multi-device
If one of the devices already used host mapped memory but another did not,
it would previously realloc both.

Thanks to Jorn Visser for investigating and finding this problem.

Pull Request: https://projects.blender.org/blender/blender/pulls/132912
2025-01-29 14:12:02 +01:00
Brecht Van Lommel
2cfe2e0bfe Fix: Cycles: Re-copy memory from host to device without realloc
Should be a bit more efficient, and it fixes host memory fallback bugs,
where host memory was incorrectly freed during re-copy. For the case
where memory should get reallocated on the host, a new mem_move_to_host
was added.

Thanks to Jorn Visser for investigating and finding this problem.

Pull Request: https://projects.blender.org/blender/blender/pulls/132912
2025-01-29 14:11:50 +01:00
Brecht Van Lommel
0e8a7c751a Refactor: Cycles: Simplify util_guarded_mem_alloc/free calls
Pull Request: https://projects.blender.org/blender/blender/pulls/132912
2025-01-29 14:11:47 +01:00
Brecht Van Lommel
1fc73188e3 Cleanup: Code style 2025-01-29 14:10:13 +01:00
Brecht Van Lommel
bd0cca5d6d Fix: Cycles Metal RT assert with persistent data render
Pull Request: https://projects.blender.org/blender/blender/pulls/133490
2025-01-23 15:29:41 +01:00
Brecht Van Lommel
f2bf9d747e Cleanup: Cycles: Remove some unused kernel entry points on CPU 2025-01-13 10:07:37 +01:00
Brecht Van Lommel
2bf6d0fd71 Cleanup: Cycles: Remove unnecessary SSE4.2 CPU kernel
This is the minimum requirement, so just the regular kernel already
includes these instructions if supported by the CPU architecture.
2025-01-13 10:07:37 +01:00
Xavier Hallade
ce463bd6b1 Cycles: oneAPI: optimize device<->host copies
There is a large overhead when doing copies between a device and non-USM host memory.
Using the prepare/release API avoids it, as presented in the optimization guide:
https://www.intel.com/content/www/us/en/docs/oneapi/optimization-guide-gpu/2025-0/optimizing-data-transfers.html

This currently translates to a 4-5% overall rendering speedup on my Arc B580 in most scenes.

Pull Request: https://projects.blender.org/blender/blender/pulls/132859
2025-01-09 21:00:12 +01:00
Stefan Werner
a79d95099f Cycles: Fix OneAPI crash after unique_ptr refactor
Memory was freed too early, probably a typo.
2025-01-07 09:37:47 +01:00
Michael Jones
fd06944d15 Fix #131458: Cycles Metal workaround for binary archives crash
There is a macOS bug that causes `[binaryArchive serializeToURL]` to crash sometimes. The fix is coming in macOS 15.4.

Pull Request: https://projects.blender.org/blender/blender/pulls/132688
2025-01-06 14:12:22 +01:00
Brecht Van Lommel
d48e73977c Fix: Build errors on Linux/GCC after recent Cycles refactoring 2025-01-03 11:52:13 +01:00
Brecht Van Lommel
9971648783 Refactor: Cycles: Replace new/delete by unique_ptr, in simple cases
Pull Request: https://projects.blender.org/blender/blender/pulls/132361
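
A minimal sketch of the kind of change this title describes: an owning raw pointer replaced by `std::unique_ptr`. `Kernel` and `Device` here are hypothetical types for illustration, not the Cycles classes.

```cpp
#include <memory>
#include <string>
#include <utility>

// Hypothetical owned resource.
struct Kernel {
  std::string name;
  explicit Kernel(std::string kernel_name) : name(std::move(kernel_name)) {}
};

// Hypothetical owner. With a raw pointer this class would need a destructor
// calling delete, plus care in every place the pointer is reassigned.
class Device {
 public:
  explicit Device(const std::string &kernel_name)
      : kernel_(std::make_unique<Kernel>(kernel_name))
  {
  }

  /* Assigning a new unique_ptr destroys the previous Kernel automatically. */
  void reload(const std::string &kernel_name)
  {
    kernel_ = std::make_unique<Kernel>(kernel_name);
  }

  const std::string &kernel_name() const
  {
    return kernel_->name;
  }

 private:
  std::unique_ptr<Kernel> kernel_;
};
```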
2025-01-03 10:23:30 +01:00
Brecht Van Lommel
a8654a1dbe Refactor: Cycles: Make CPU kernel globals storage more sane
Pull Request: https://projects.blender.org/blender/blender/pulls/132361
2025-01-03 10:23:27 +01:00
Brecht Van Lommel
57ff24cb99 Refactor: Cycles: Add const keyword to more function parameters
Pull Request: https://projects.blender.org/blender/blender/pulls/132361
2025-01-03 10:23:24 +01:00
Brecht Van Lommel
dd51c8660b Refactor: Cycles: Add const keyword where possible, using clang-tidy
Check was misc-const-correctness, combined with readability-isolate-declaration
as suggested by the docs.

Temporarily clang-format "QualifierAlignment: Left" was used to get consistency
with the prevailing order of keywords.

Pull Request: https://projects.blender.org/blender/blender/pulls/132361
2025-01-03 10:23:20 +01:00
Brecht Van Lommel
689633d802 Refactor: Cycles: Avoid unsafe memcpy and memcmp
Pull Request: https://projects.blender.org/blender/blender/pulls/132361
2025-01-03 10:23:15 +01:00
Brecht Van Lommel
d9150484a2 Cleanup: Cycles: Remove some unnecessary #if 0 and #if 1
Pull Request: https://projects.blender.org/blender/blender/pulls/132361
2025-01-03 10:23:09 +01:00
Brecht Van Lommel
60bec183cb Refactor: Cycles: Replace foreach() by range based for loops
Pull Request: https://projects.blender.org/blender/blender/pulls/132361
2025-01-03 10:23:05 +01:00
Brecht Van Lommel
d0c2e68e5f Refactor: Cycles: Automated clang-tidy fixups in Cycles
* Use .empty() and .data()
* Use nullptr instead of 0
* No else after return
* Simple class member initialization
* Add override for virtual methods
* Include C++ instead of C headers
* Remove some unused includes
* Use default constructors
* Always use braces
* Consistent names in definition and declaration
* Change typedef to using

Pull Request: https://projects.blender.org/blender/blender/pulls/132361
2025-01-03 10:22:55 +01:00
Brecht Van Lommel
5c46063607 Refactor: Cycles: Make kernel headers work by themselves
Shuffle around some code and add more includes so that individual
header files compile without errors.

Pull Request: https://projects.blender.org/blender/blender/pulls/132361
2025-01-03 10:22:50 +01:00
Brecht Van Lommel
f53e13411b Refactor: Cycles: Use #pragma once
Pull Request: https://projects.blender.org/blender/blender/pulls/132361
2025-01-03 10:22:45 +01:00
Brecht Van Lommel
3c2a6fbb9c Refactor: Cycles: Use nullptr instead of NULL
Pull Request: https://projects.blender.org/blender/blender/pulls/132361
2025-01-03 10:22:43 +01:00
Brecht Van Lommel
4e777476b5 Refactor: Cycles: Replace std::bind by lambdas
Pull Request: https://projects.blender.org/blender/blender/pulls/132361
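
A minimal sketch of the same callback written with `std::bind` and with a lambda. `Session` and both factory functions are hypothetical names chosen for illustration.

```cpp
#include <functional>

// Hypothetical object that receives progress updates.
struct Session {
  int samples_done = 0;
  void add_samples(int count)
  {
    samples_done += count;
  }
};

// Before: the member function is bound with a placeholder for its argument.
std::function<void(int)> make_callback_bind(Session &session)
{
  return std::bind(&Session::add_samples, &session, std::placeholders::_1);
}

// After: the lambda states the capture and the parameter explicitly.
std::function<void(int)> make_callback_lambda(Session &session)
{
  return [&session](int count) { session.add_samples(count); };
}
```

Both callbacks refer to the `Session` without owning it, so the session has to outlive them in either form.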
2025-01-03 10:22:35 +01:00
salipourto
4e5a9c5dfb Cycles: Handling SDK/ROCm 6+ lack of backward compatibility with pre ROCm 6
This commit introduces proper handling of ROCm 5 and ROCm 6 runtimes on
Linux, based on the version of the ROCm compiler used at build time.
Previously, HIPEW (the HIP equivalent of Cuda Wrangler) defaulted to
loading the ROCm 5 runtime. If ROCm 5 was unavailable, it would attempt
to load ROCm 6. However, ROCm 6 introduces changes in certain
structures and functions that are not backward compatible, leading to
potential issues when kernels compiled with the ROCm 6 compiler are
executed on the ROCm 5 runtime.

### Summary of Changes:

**Separation of Structures and Functions:**
Structures and functions are now separated into hipew5 and hipew6 to
accommodate the differences between ROCm versions.

**Build-Time Version Detection:**
The ROCm version is determined during build time, and the corresponding
hipew5 or hipew6 is included accordingly.

**Runtime Default to ROCm 6:**
By default, HIPEW now loads the ROCm 6 runtime and
includes hipew6 (Linux only).

**JIT Compilation Behavior:**
Since ROCm 6 is the default version, JIT compilation is supported only
when the ROCm 6 compiler is detected at runtime.

**HIP-RT Update:**
HIP-RT has been updated to load the ROCm 6 runtime by default.

These changes ensure compatibility and stability when switching
between ROCm versions, avoiding issues caused by runtime
and compiler mismatches.

Co-authored-by: Alaska <alaskayou01@gmail.com>
Co-authored-by: Sergey Sharybin <sergey@blender.org>
Pull Request: https://projects.blender.org/blender/blender/pulls/130153
2024-12-17 16:19:36 +01:00
Alaska
c42894a695 Fix: Various issues with Cycles HIP JIT compilation
On Linux, Cycles HIP has a JIT compilation feature.
This feature is used when Cycles cannot find a precompiled kernel
for your GPU, which is most common when using hardware that wasn't
out at the time that a version of Blender was released.

There were various issues with this JIT compilation system; this commit
aims to solve them. The changes include:
- Enable `WITH_NANOVDB` when Blender is built with NanoVDB.
  - This fixes an issue where VDB objects would not render.
- Enable some extra debug options for developers when desired
(This is so we match the CUDA implementation of the same feature).
- Reduce the optimization level from -O3 to the default.
  - This is to avoid any extra issues that may occur as a result
  of an increased optimization level that isn't tested with
  precompiled kernels.
- Reduce the optimization level even further to -O1 for Vega.
  - This was done on precompiled kernels to work around some issues,
  so I decided to apply it to JIT kernels as well.
  - Note: Although Vega is not officially supported, this may help
  people that unofficially use Vega.
- Added some previously missing compiler arguments and fixed errors that
were introduced when enabling these compiler arguments.
- Fixed an issue where JIT compilation would fail if Blender was
installed in a path that had a space in it.

Pull Request: https://projects.blender.org/blender/blender/pulls/131853
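
A minimal sketch of why a space in the install path can break a JIT compile command, and how quoting avoids it. `quote_path` and `build_compile_command` are hypothetical helpers, and the command layout is an assumption, not the command Cycles actually builds. The sketch only wraps paths in double quotes; paths that themselves contain quotes or other shell metacharacters would need real escaping.

```cpp
#include <string>

// Wrap a path in double quotes so that a shell treats it as one argument
// even when it contains spaces. Simplification: no escaping of embedded quotes.
std::string quote_path(const std::string &path)
{
  return "\"" + path + "\"";
}

// Assemble a compile command line from a compiler, a source file and an
// output file. Unquoted, "/opt/my blender/kernel.cpp" would be split into
// two arguments at the space.
std::string build_compile_command(const std::string &compiler,
                                  const std::string &source,
                                  const std::string &output)
{
  return quote_path(compiler) + " " + quote_path(source) + " -o " + quote_path(output);
}
```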
2024-12-17 01:02:39 +01:00
Lukas Stockner
0de1cea5c5 Cycles: Use fused OptiX OSL programs
Based on #123377 by @brecht, but Gitea doesn't like the rebase,
so here's a new PR.

The purpose here is to switch to fused OptiX programs for OSL execution
on CUDA. On the one hand, this makes the code simpler, but there's
also another advantage: how memory allocation is managed.

OSL shaders need memory to store intermediate values, but how much is
needed depends on the complexity of the shader. With the split program
approach, Cycles had to provide that memory, so we had to allocate a
certain amount (2 KiB, to be precise) statically and show an error if
the shader would need more. If the shader used less (which is the case
for the vast majority), the memory was just wasted.

By switching to fused kernels, OSL knows the required amount during JIT
codegen, so it can allocate only what's required, which avoids this
waste. One still needs to set a maximum, and in theory, OSL would also
support spilling over into a Cycles-provided alternative memory region.
However, we currently don't implement that - instead, we default to the
same 2048 limit as before and let advanced users override it via the
CYCLES_OSL_GROUPDATA_ALLOC environment variable if really needed.

Co-authored-by: Brecht Van Lommel <brecht@blender.org>
Pull Request: https://projects.blender.org/blender/blender/pulls/130149
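
A minimal sketch of reading an override such as `CYCLES_OSL_GROUPDATA_ALLOC` with a fallback to the default of 2048. The function names and the parsing rules (plain decimal digits, zero and invalid input rejected) are assumptions for illustration; the message does not describe how Cycles parses the variable.

```cpp
#include <cctype>
#include <cstddef>
#include <cstdlib>

// Default limit mentioned in the commit message.
constexpr std::size_t kDefaultGroupdataLimit = 2048;

// Parse a decimal override. Anything that is missing, not a plain positive
// number, or followed by trailing characters falls back to the default.
std::size_t parse_groupdata_limit(const char *value)
{
  if (value == nullptr || !std::isdigit(static_cast<unsigned char>(value[0]))) {
    return kDefaultGroupdataLimit;
  }
  char *end = nullptr;
  const unsigned long long parsed = std::strtoull(value, &end, 10);
  if (*end != '\0' || parsed == 0) {
    return kDefaultGroupdataLimit;
  }
  return static_cast<std::size_t>(parsed);
}

// Read the override from the environment variable named in the message.
std::size_t groupdata_limit_from_environment()
{
  return parse_groupdata_limit(std::getenv("CYCLES_OSL_GROUPDATA_ALLOC"));
}
```

Keeping the parsing separate from the `std::getenv` call makes the fallback behavior easy to check without touching the process environment.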
2024-11-26 23:58:32 +01:00
Patrick Mours
6f0ed29378 Cycles: Add OptiX 8.1 support
The function table symbol declared in the headers was renamed starting
in OptiX 8.1, from `g_optixFunctionTable` to
`g_optixFunctionTable_<ABI version>`. This adds support for that by
using the new macro for the name when available (OptiX 8.1 and later) and
falling back to the old name when it is not (before OptiX 8.1).

Pull Request: https://projects.blender.org/blender/blender/pulls/130451
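
A minimal, self-contained sketch of the "use the macro when available, otherwise fall back to the old name" pattern. Everything prefixed with `MOCK_` or `g_mock` is a hypothetical stand-in for the SDK headers, and the ABI numbers are arbitrary; the actual OptiX macro name is not given in the message.

```cpp
// Hypothetical function table type.
struct FunctionTable {
  int abi_version;
};

/* --- Mock SDK header: set MOCK_NEW_SDK to 0 to mimic the older naming. --- */
#define MOCK_NEW_SDK 1
#if MOCK_NEW_SDK
/* Newer header: versioned symbol, exposed through a macro. */
#  define MOCK_FUNCTION_TABLE_SYMBOL g_mockFunctionTable_2
FunctionTable g_mockFunctionTable_2 = {2};
#else
/* Older header: fixed symbol name, no macro. */
FunctionTable g_mockFunctionTable = {1};
#endif

/* --- Client code: works with both header versions. --- */
#ifdef MOCK_FUNCTION_TABLE_SYMBOL
#  define FUNCTION_TABLE MOCK_FUNCTION_TABLE_SYMBOL
#else
#  define FUNCTION_TABLE g_mockFunctionTable
#endif

// Access the table through the resolved name.
int active_abi_version()
{
  return FUNCTION_TABLE.abi_version;
}
```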
2024-11-18 17:20:49 +01:00
Sergey Sharybin
aec4ba39b9 Merge branch 'blender-v4.3-release' 2024-11-04 17:54:52 +01:00
Michael Jones
d1368883ed Cycles: MetalRT: Fix logic bug when deciding if HW RT should be used
Don't try to use MetalRT by default unless the device explicitly reports that RT is supported. We shouldn't just rely on an assumption that it's supported for M3 and beyond, ad infinitum.

Pull Request: https://projects.blender.org/blender/blender/pulls/129688
2024-11-04 17:54:12 +01:00
Clément Foucault
47f7aaa2cc Merge branch 'blender-v4.3-release' 2024-11-01 12:16:38 +01:00