Commit Graph

140 Commits

Author SHA1 Message Date
Michael Jones
9558fa5196 Cycles: Metal host-side code
This patch adds the Metal host-side code:

- Add all core host-side Metal backend files (device_impl, queue, etc)
- Add MetalRT BVH setup files
- Integrate with Cycles device enumeration code
- Revive `path_source_replace_includes` in util/path (required for MSL compilation)

This patch also includes a couple of small kernel-side fixes:

- Add an implementation of `lgammaf` for Metal [Nemes, Gergő (2010), "New asymptotic expansion for the Gamma function", Archiv der Mathematik](https://users.renyi.hu/~gergonemes/)
- include "work_stealing.h" inside the Metal context class because it accesses state now

Ref T92212

Reviewed By: brecht

Maniphest Tasks: T92212

Differential Revision: https://developer.blender.org/D13423
2021-12-07 15:52:21 +00:00
Thomas Dinges
83a4d51997 Cleanup: Remove unused show_samples() device code in Cycles. 2021-11-17 11:16:48 +01:00
Hans Goudey
9e611c5616 Merge branch 'blender-v3.0-release' 2021-11-05 16:33:08 -05:00
Brecht Van Lommel
97ff37bf54 Cycles: perform CPU film reading in the kernel, to use AVX2 half conversion
Adds a bunch of CPU kernel function to process on row of pixels, and use those
instead of calling unoptimized implementations.

Fixes T92598
2021-11-05 22:04:36 +01:00
Thomas Dinges
5327413b37 Cleanup: Remove Cycles device checks for half float.
All supported devices support half float now, so we can remove the check.

Differential Revision: https://developer.blender.org/D13021
2021-11-01 10:18:30 +01:00
Brecht Van Lommel
fd25e883e2 Cycles: remove prefix from source code file names
Remove prefix of filenames that is the same as the folder name. This used
to help when #includes were using individual files, but now they are always
relative to the cycles root directory and so the prefixes are redundant.

For patches and branches, git merge and rebase should be able to detect the
renames and move over code to the right file.
2021-10-26 15:37:04 +02:00
Brian Savery
044a77352f Cycles: add HIP device support for AMD GPUs
NOTE: this feature is not ready for user testing, and not yet enabled in daily
builds. It is being merged now for easier collaboration on development.

HIP is a heterogenous compute interface allowing C++ code to be executed on
GPUs similar to CUDA. It is intended to bring back AMD GPU rendering support
on Windows and Linux.

https://github.com/ROCm-Developer-Tools/HIP.

As of the time of writing, it should compile and run on Linux with existing
HIP compilers and driver runtimes. Publicly available compilers and drivers
for Windows will come later.

See task T91571 for more details on the current status and work remaining
to be done.

Credits:

Sayak Biswas (AMD)
Arya Rafii (AMD)
Brian Savery (AMD)

Differential Revision: https://developer.blender.org/D12578
2021-09-28 19:18:55 +02:00
Brecht Van Lommel
d7f803f522 Fix T91641: crash rendering with 16k environment map in Cycles
Protect against integer overflow.
2021-09-23 17:48:16 +02:00
Campbell Barton
4d66cbd140 Cleanup: spelling in comments 2021-09-22 14:54:01 +10:00
Brecht Van Lommel
0803119725 Cycles: merge of cycles-x branch, a major update to the renderer
This includes much improved GPU rendering performance, viewport interactivity,
new shadow catcher, revamped sampling settings, subsurface scattering anisotropy,
new GPU volume sampling, improved PMJ sampling pattern, and more.

Some features have also been removed or changed, breaking backwards compatibility.
Including the removal of the OpenCL backend, for which alternatives are under
development.

Release notes and code docs:
https://wiki.blender.org/wiki/Reference/Release_Notes/3.0/Cycles
https://wiki.blender.org/wiki/Source/Render/Cycles

Credits:
* Sergey Sharybin
* Brecht Van Lommel
* Patrick Mours (OptiX backend)
* Christophe Hery (subsurface scattering anisotropy)
* William Leeson (PMJ sampling pattern)
* Alaska (various fixes and tweaks)
* Thomas Dinges (various fixes)

For the full commit history, see the cycles-x branch. This squashes together
all the changes since intermediate changes would often fail building or tests.

Ref T87839, T87837, T87836
Fixes T90734, T89353, T80267, T80267, T77185, T69800
2021-09-21 14:55:54 +02:00
Sergey Sharybin
ff51c2e89a Cleanup: Use named unused arguments in Cycles Device 2021-05-21 11:19:33 +02:00
Brecht Van Lommel
0456223cde Fix T87793: Cycles OptiX crash hiding objects in viewport render 2021-05-19 18:30:43 +02:00
Brecht Van Lommel
3e472d87a8 Cycles OpenCL: disable AO preview kernels
These seem to be causing some stability issues, and really are just not that
useful in practice. Compiling them is slow already, so it does not improve
the user experience much to show an AO preview if it's not nearly instant.
2021-05-19 18:30:43 +02:00
Brecht Van Lommel
91c44fe885 Cycles: disable NanoVDB for AMD OpenCL
It is causing issue with AMD OpenCL drivers, due to a potential driver bug.

Ref T84461
2021-03-30 00:00:17 +02:00
Brecht Van Lommel
10d2cbfa36 Fix T84872: OptiX GPU + CPU rendering uses branched path samples
Branched path tracing is not supported for OptiX, and it would still use the
number of AA samples from there when branched path was enabled by the user
earlier but auto disabled and hidden in the UI when using OptiX.

Ref D10159
2021-01-20 14:59:23 +01:00
Patrick Mours
bfb6fce659 Cycles: Add CPU+GPU rendering support with OptiX
Adds support for building multiple BVH types in order to support using both CPU and OptiX
devices for rendering simultaneously. Primitive packing for Embree and OptiX is now
standalone, so it only needs to be run once and can be shared between the two. Additionally,
BVH building was made a device call, so that each device backend can decide how to
perform the building. The multi-device for instance creates a special multi-BVH that holds
references to several sub-BVHs, one for each sub-device.

Reviewed By: brecht, kevindietrich

Differential Revision: https://developer.blender.org/D9718
2020-12-11 13:24:29 +01:00
Brecht Van Lommel
f75b09e7e6 Cycles: abort rendering when --cycles-device not found
Rather than just printing a message and falling back to the CPU. For render
farms it's better to avoid a potentially slow render on the CPU if the intent
was to render on the GPU.

Ref T82193, D9086
2020-10-29 16:01:38 +01:00
Stefan Werner
009971ba7a Cycles: Separate Embree device for each CPU Device.
Before, Cycles was using a shared Embree device across all instances.
This could result in crashes when viewport rendering and material
preview were using Cycles simultaneously.

Fixes issue T80042

Maniphest Tasks: T80042

Differential Revision: https://developer.blender.org/D8772
2020-09-01 21:00:55 +02:00
Kévin Dietrich
c82166ffcd Cycles: move some Scene related methods out of Session
This moves `Session::get_requested_device_features`,
`Session::load_kernels`, and `Session::update_scene` out of `Session`
and into `Scene`, as mentioned in D8544.

Reviewed By: brecht

Differential Revision: https://developer.blender.org/D8590
2020-08-18 11:50:37 +02:00
Brecht Van Lommel
93791381fe Cleanup: reduce hardcoded numbers in denoising neighbor tiles code 2020-07-10 17:10:05 +02:00
Brecht Van Lommel
0a3bde6300 Cycles: add denoising settings to the render properties
Enabling render and viewport denoising is now both done from the render
properties. View layers still can individually be enabled/disabled for
denoising and have their own denoising parameters.

Note that the denoising engine also affects how denoising data passes are
output even if no denoising happens on the render itself, to make the passes
compatible with the engine.

This includes internal refactoring for how denoising parameters are passed
along, trying to avoid code duplication and unclear naming.

Ref T76259
2020-06-24 15:17:36 +02:00
Brecht Van Lommel
207338bb58 Cycles: port curve-ray intersection from Embree for use in Cycles GPU
This keeps render results compatible for combined CPU + GPU rendering.
Peformance and quality primitives is quite different than before. There
are now two options:

* Rounded Ribbon: render hair as flat ribbon with (fake) rounded normals, for
  fast rendering. Hair curves are subdivided with a fixed number of user
  specified subdivisions.

  This gives relatively good results, especially when used with the Principled
  Hair BSDF and hair viewed from a typical distance. There are artifacts when
  viewed closed up, though this was also the case with all previous primitives
  (but different ones).

* 3D Curve: render hair as 3D curve, for accurate results when viewing hair
  close up. This automatically subdivides the curve until it is smooth.

  This gives higher quality than any of the previous primitives, but does come
  at a performance cost and is somewhat slower than our previous Thick curves.

The main problem here is performance. For CPU and OpenCL rendering performance
seems usually quite close or better for similar quality results.

However for CUDA and Optix, performance of 3D curve intersection is problematic,
with e.g. 1.45x longer render time in Koro (though there is no equivalent quality
and rounded ribbons seem fine for that scene). Any help or ideas to optimize this
are welcome.

Ref T73778

Depends on D8012

Maniphest Tasks: T73778

Differential Revision: https://developer.blender.org/D8013
2020-06-22 13:28:01 +02:00
Brecht Van Lommel
e50f1ddc65 Cycles: use TBB for task pools and task scheduler
No significant performance improvement is expected, but it means we have a
single thread pool throughout Blender. And it should make adding more
parallellization in the future easier.

After previous refactoring commits this is basically a drop-in replacement.
One difference is that the task pool had a mechanism for scheduling tasks to
the front of the queue to minimize memory usage. TBB has a smarter algorithm
to balance depth-first and breadth-first scheduling of tasks and we assume that
removes the need to manually provide hints to the scheduler.

Fixes T77533
2020-06-22 13:27:37 +02:00
Patrick Mours
9f7d84b656 Cycles: Add support for P2P memory distribution (e.g. via NVLink)
This change modifies the multi-device implementation to support memory distribution
across devices, to reduce the overall memory footprint of large scenes and allow scenes to
fit entirely into combined GPU memory that previously had to fall back to host memory.

Reviewed By: brecht

Differential Revision: https://developer.blender.org/D7426
2020-06-08 17:55:49 +02:00
Brecht Van Lommel
53981c7fb6 Cleanup: refactor adaptive sampling to more easily change some parameters
No functional changes yet, this is work towards making CPU and GPU results
match more closely.
2020-04-07 20:29:48 +02:00
Dalai Felinto
2d1cce8331 Cleanup: make format after SortedIncludes change 2020-03-19 09:33:58 +01:00
Patrick Mours
38589de10c Cycles: Add support for denoising in the viewport
The OptiX denoiser can be a great help when rendering in the viewport, since it is really fast
and needs few samples to produce convincing results. This patch therefore adds support for
using any Cycles denoiser in the viewport also (but only the OptiX one is selectable because
the NLM one is too slow to be usable currently). It also adds support for denoising on a
different device than rendering (so one can e.g. render with the CPU but denoise with OptiX).

Reviewed By: #cycles, brecht

Differential Revision: https://developer.blender.org/D6554
2020-02-11 18:03:43 +01:00
Patrick Mours
70a32adfeb Fix assert in Cycles memory statistics when using OptiX on multiple GPUs
The acceleration structure built by OptiX may be different between GPUs, so cannot assume the memory size is the same for all.
This fixes that by moving the memory management for all OptiX acceleration structures into the responsibility of each device (was already the case for BLAS previously, now for TLAS too).
2019-11-28 13:57:02 +01:00
Patrick Mours
a2b52dc571 Cycles: add Optix device backend
This uses hardware-accelerated raytracing on NVIDIA RTX graphics cards.

It is still currently experimental. Most features are supported, but a few
are still missing like baking, branched path tracing and using CPU memory.
https://wiki.blender.org/wiki/Reference/Release_Notes/2.81/Cycles#NVIDIA_RTX

For building with Optix support, the Optix SDK must be installed. See here for
build instructions:
https://wiki.blender.org/wiki/Building_Blender/CUDA

Differential Revision: https://developer.blender.org/D5363
2019-09-13 11:50:11 +02:00
Campbell Barton
e12c08e8d1 ClangFormat: apply to source, most of intern
Apply clang format as proposed in T53211.

For details on usage and instructions for migrating branches
without conflicts, see:

https://wiki.blender.org/wiki/Tools/ClangFormat
2019-04-17 06:21:24 +02:00
Brecht Van Lommel
e691929686 Merge branch 'blender2.7' 2019-03-17 12:54:19 +01:00
Brecht Van Lommel
e17f7af0ce Cleanup: remove Cycles advanced shading features toggle.
It's effectively always enabled, only not on some unsupported OpenCL devices.
For testing those it's not useful to disable these features. This is replaced
by the more fine grained feature toggles that we have now.
2019-03-17 01:58:39 +01:00
Jeroen Bakker
5051e580e4 Merge branch 'blender2.7' 2019-03-15 16:28:33 +01:00
Jeroen Bakker
2f6257fd7f Cycles/OpenCL: Compile Kernels During Scene Update
The main goals of this change is faster starting when using foreground
rendering.

This patch will build kernels in parallel to the update process of
the scene. When these optimized kernels are not available (yet) an AO
kernel will be used.

These AO kernels are fast to compile (3-7 seconds) and can be
reused by all scenes. When the final kernels become available we
will switch to these kernels.

In background mode the AO kernels will not be used.
Some kernels are being used during Scene update (displace, background
light). When these kernels are being used the process can halt until
these become available.

Reviewed By: brecht, #cycles

Maniphest Tasks: T61752

Differential Revision: https://developer.blender.org/D4428
2019-03-15 16:18:21 +01:00
Jeroen Bakker
15edae617f Merge branch 'blender2.7' 2019-02-26 14:07:57 +01:00
Jeroen Bakker
dabe5cd31a T61971: Compilation Displacement/Background Kernel
Displacement and Background kernels are selectively used, but always compiled. This patch will not compile these kernels when they are not needed.

Displacement kernel is only used for true displacement.
Background kernel is only used when there is a (Cycles)Light of type `LIGHT_BACKGROUND`.

Reviewed By: brecht, #cycles

Tags: #cycles

Maniphest Tasks: T61971

Differential Revision: https://developer.blender.org/D4412
2019-02-26 14:06:25 +01:00
Brecht Van Lommel
f4b1f1f0be Merge branch 'blender2.7' 2019-01-30 18:36:54 +01:00
Brecht Van Lommel
001414fb2f Cycles: delay CUDA and OpenCL initialization to avoid driver crashes.
We've had many reported crashes on Windows where we suspect there is a
corrupted OpenCL driver. The purpose here is to keep Blender generally
usable in such cases.

Now it always shows None / CUDA / OpenCL in the preferences, and only when
selecting one will it reveal if there are any GPUs available. This should
avoid crashes when opening the preferences or on startup.

Differential Revision: https://developer.blender.org/D4265
2019-01-29 17:00:02 +01:00
Brecht Van Lommel
63c0653170 Merge branch 'master' into blender2.8 2018-11-29 23:54:30 +01:00
Brecht Van Lommel
a8b8da5567 Fix T58183: crash with CPU + GPU rendering after profiling changes.
Multi-device was not passing along profiler to the CPU.
2018-11-29 23:43:27 +01:00
Campbell Barton
9893fee4e6 Merge branch 'master' into blender2.8 2018-11-29 12:55:58 +11:00
Lukas Stockner
7fa6f72084 Cycles: Add sample-based runtime profiler that measures time spent in various parts of the CPU kernel
This commit adds a sample-based profiler that runs during CPU rendering and collects statistics on time spent in different parts of the kernel (ray intersection, shader evaluation etc.) as well as time spent per material and object.

The results are currently not exposed in the user interface or per Python yet, to see the stats on the console pass the "--cycles-print-stats" argument to Cycles (e.g. "./blender -- --cycles-print-stats").

Unfortunately, there is no clear way to extend this functionality to CUDA or OpenCL, so it is CPU-only for now.

Reviewers: brecht, sergey, swerner

Reviewed By: brecht, swerner

Differential Revision: https://developer.blender.org/D3892
2018-11-29 02:45:24 +01:00
Sergey Sharybin
78a6689aea Merge branch 'master' into blender2.8 2018-11-09 14:34:33 +01:00
Sergey Sharybin
2330cadb0f Cycles: Cleanup, don't use strict C prototypes
Those are more like a legacy of language, which is not
needed in C++.
2018-11-09 12:04:41 +01:00
Sergey Sharybin
cb4b5e12ab Cycles: Cleanup, spacing after preprocessor
It is supposed to be two spaces before comment stating which if
else/endif statements corresponds to. Was mainly violated in the
header guards.
2018-11-09 11:34:54 +01:00
Sergey Sharybin
fc12a736bb Merge branch 'master' into blender2.8 2018-10-31 11:49:04 +01:00
Sergey Sharybin
e0cc3e9809 Cycles: Fix wrong BVH used when disabling AVX2 in debug settings
Mainly useful for debugging. Previously, when AVX2 was disabled
in the debug panel but BVH layout was kept on BVH8 nothing was
rendered.

Needed to make it so supported BVH layout mask for devices is
queried in "dynamic", so it is possible to use DebugFlags there.
2018-10-31 11:46:52 +01:00
Campbell Barton
de777ad9e6 Merge branch 'master' into blender2.8 2018-07-06 10:18:52 +02:00
Campbell Barton
1daa20ad9f Cleanup: strip trailing space for cycles 2018-07-06 10:17:58 +02:00
Campbell Barton
2bc952fdb6 Merge branch 'master' into blender2.8 2018-02-18 22:33:05 +11:00