Commit Graph

2660 Commits

Author SHA1 Message Date
Patrick Mours
f565620435 Fix T92985: CUDA errors with Cycles film convert kernels
rB3a4c8f406a3a3bf0627477c6183a594fa707a6e2 changed the macros that create the film
convert kernel entry points, but in the process accidentally changed the parameter definition
to one of those (which caused CUDA launch and misaligned address errors) and changed the
implementation as well. This restores the correct implementation from before.

In addition, the `ccl_gpu_kernel_threads` macro did not work as intended and caused the
generated launch bounds to end up with an incorrect input for the second parameter (it was
set to "thread_num_registers", rather than the result of the block number calculation). I'm
not entirely sure why, as the macro definition looked sound to me. Decided to simply go with
two separate macros instead, to simplify and solve this.

Also changed how state is captured with the `ccl_gpu_kernel_lambda` macro slightly, to avoid
a compiler warning (expression has no effect) that otherwise occurred.

Maniphest Tasks: T92985

Differential Revision: https://developer.blender.org/D13175
2021-11-10 15:49:50 +01:00
Michael Jones
3a4c8f406a Cycles: Adapt shared kernel/device/gpu layer for MSL
This patch adapts the shared kernel entrypoints so that they can be compiled as MSL (Metal Shading Language). Where possible, the adaptations avoid changes in common code.

In MSL, kernel function inputs are explicitly bound to resources. In the case of argument buffers, we declare a struct containing the kernel arguments, accessible via device pointer. This differs from CUDA and HIP where kernel function arguments are declared as traditional C-style function parameters. This patch adapts the entrypoints declared in kernel.h so that they can be translated via a new `ccl_gpu_kernel_signature` macro into the required parameter struct + kernel entrypoint pairing for MSL.

MSL buffer attribution must be applied to function parameters or non-static class data members. To allow universal access to the integrator state, kernel data, and texture fetch adapters, we wrap all of the shared kernel code in a `MetalKernelContext` class. This is achieved by bracketing the appropriate kernel headers with "context_begin.h" and "context_end.h" on Metal. When calling deeper into the kernel code, we must reference the context class (e.g. `context.integrator_init_from_camera`). This extra prefixing is performed by a set of defines in "context_end.h". These will require explicit maintenance if entrypoints change. We invite discussion on more maintainable ways to enforce correctness.

Lambda expressions are not supported on MSL, so a new `ccl_gpu_kernel_lambda` macro generates an inline function object and optionally capturing any required state. This yields the same behaviour. This approach is applied to all parallel_... implementations which are templated by operation. The lambda expressions in the film_convert... kernels don't adapt cleanly to use function objects. However, these entrypoints can be macro-generated more concisely to avoid lambda expressions entirely, instead relying on constant folding to handle the pixel/channel conversions.

A separate implementation of `gpu_parallel_active_index_array` is provided for Metal to workaround some subtle differences in SIMD width, and also to encapsulate some required thread parameters which must be declared as explicit entrypoint function parameters.

Ref T92212

Reviewed By: brecht

Maniphest Tasks: T92212

Differential Revision: https://developer.blender.org/D13109
2021-11-09 21:43:10 +00:00
Brecht Van Lommel
5f44298280 Fix T92645: Cycles OSL crash due use of uninitialized pointer
Thanks to Ilja Razinkov for identifying the problem and solution.
2021-11-09 15:29:41 +01:00
Patrick Mours
440a3475b8 Cycles: Improve OptiX denoising with dark images and fix crash when denoiser is destroyed
Adds a pass before denoising that calculates the intensity of the image, which can be
passed into the OptiX denoiser for more optimal results for very dark or very bright images.

In addition this also fixes a crash that sometimes occurred on exit. The OptiX denoiser object
has to be destroyed before the OptiX device context object (since it references that). But in
C++ the destructor function of a class is called before its fields are destructed, so
"~OptiXDevice" was always called before "OptiXDevice::~Denoiser" and therefore
"optixDeviceContextDestroy" was called before "optixDenoiserDestroy", hence the crash.

Differential Revision: https://developer.blender.org/D13160
2021-11-09 14:49:00 +01:00
Brecht Van Lommel
c56cf50bd0 Fix T92876: Cycles incorrect volume emission + absorption handling 2021-11-09 13:04:58 +01:00
Brecht Van Lommel
97ff37bf54 Cycles: perform CPU film reading in the kernel, to use AVX2 half conversion
Adds a bunch of CPU kernel function to process on row of pixels, and use those
instead of calling unoptimized implementations.

Fixes T92598
2021-11-05 22:04:36 +01:00
Brecht Van Lommel
d1a9425a2f Fix T91733, T92486: Cycles wrong shadow catcher with volumes
Changes:
* After hitting a shadow catcher, re-initialize the volume stack taking
  into account shadow catcher ray visibility. This ensures that volume objects
  are included in the stack only if they are shadow catchers.
* If there is a volume to be shaded in front of the shadow catcher, the split
  is now performed in the shade_volume kernel after volume shading is done.
* Previously the background pass behind a shadow catcher was done as part of
  the regular path, now it is done as part of the shadow catcher path.

For a shadow catcher path with volumes and visible background, operations are
done in this order now:

* intersect_closest
* shade_volume
* shadow catcher split
* intersect_volume_stack
* shade_background
* shade_surface

The world volume is currently assumed to be CG, that is it does not exist in
the footage. We may consider adding an option to control this, or change the
default. With a volume object this control is already possible.

This includes refactoring to centralize the logic for next kernel scheduling
in intersect_closest.h.

Differential Revision: https://developer.blender.org/D13093
2021-11-05 20:50:19 +01:00
Brecht Van Lommel
4b56eed0f7 Fix T92566: Cycles distant lights too dim in reflections 2021-11-05 20:24:13 +01:00
Brecht Van Lommel
f24ad274cb Fix T92503: Cycles OSL crash with object attributes
Can't cast to float4 because it might not have correct alignment.
2021-11-05 20:07:03 +01:00
Brecht Van Lommel
5c34e34195 Fix part of T91797: Cycles CPU and GPU render differences with camera inside volume 2021-11-04 19:03:49 +01:00
Brecht Van Lommel
ffe115d1a8 Fix T92450: Cycles wrong render with overlapping glass, transparency and volumes
We need to store the continuation probability used to make the termination
decision in intersect_closest, instead of recomputing it in shade_surface.
Because otherwise a shade_volume in between can change the throughput and
change the probability.
2021-11-04 16:39:49 +01:00
Brecht Van Lommel
48e2a15160 Fix T77681, T92634: noise texture artifacts with high detail
We run into float precision issues here, clamp the number of octaves to
one less, which has little to no visual difference. This was empirically
determined to work up to 16 before, but with additional inputs like
roughness only 15 appears to work.

Also adds misisng clamp for the geometry nodes implementation.
2021-11-02 18:56:25 +01:00
William Leeson
0b060905d9 Fix T92575: Cycles black pixels when rendering with > 65k samples
Differential Revision: https://developer.blender.org/D13039
2021-11-01 08:36:50 +01:00
Brecht Van Lommel
35f4d254fd Fix T92513: Cycles stereo pole merge not rotating along with camera 2021-10-28 22:38:07 +02:00
Brecht Van Lommel
f2cc38a62b Fix T92255: Cycles Christensen-Burley render errors with scaled objects 2021-10-28 21:53:30 +02:00
Brecht Van Lommel
673984b222 Fix T92158: Cycles crash with Fast GI and area light MIS 2021-10-28 21:33:52 +02:00
William Leeson
82cf25dfbf Cycles: Scrambling distance for the PMJ sampler
Adds scrambling distance to the PMJ sampler. This is based
on the work by Mathieu Menuet in D12318 who created the original
implementation for the Sobol sampler.

Reviewed By: brecht

Maniphest Tasks: T92181

Differential Revision: https://developer.blender.org/D12854
2021-10-27 14:21:15 +02:00
William Leeson
7b1c5712f8 Cycles: Replace saturate with saturatef
saturate is depricated in favour of __saturatef this replaces saturate
with __saturatef on CUDA by createing a saturatef function which replaces
all instances of saturate and are hooked up to the correct function on all
platforms.

Reviewed By: brecht

Differential Revision: https://developer.blender.org/D13010
2021-10-27 14:05:46 +02:00
Brecht Van Lommel
d89c4999a7 Fix Cycles runtime GPU kernel compilation after recent refactor 2021-10-26 16:22:50 +02:00
Brecht Van Lommel
dde11219c6 Cleanup: remove files that should not have been added in file renames 2021-10-26 16:22:50 +02:00
William Leeson
366262bef5 Distance Scrambling for for Cycles X - Sobol version
Cycles:Distance Scrambling for Cycles Sobol Sampler

This option implements micro jittering an is based on the INRIA
research paper [[ https://hal.inria.fr/hal-01325702/document | on micro jittering ]]
and work by Lukas Stockner for implementing the scrambling distance.
It works by controlling the correlation between pixels by either using
a user supplied value or an adaptive algorithm to limit the maximum
deviation of the sample values between pixels.

This is a follow up of https://developer.blender.org/D12316

The PMJ version can be found here: https://developer.blender.org/D12511

Reviewed By: leesonw

Differential Revision: https://developer.blender.org/D12318
2021-10-26 16:11:27 +02:00
Brecht Van Lommel
fd25e883e2 Cycles: remove prefix from source code file names
Remove prefix of filenames that is the same as the folder name. This used
to help when #includes were using individual files, but now they are always
relative to the cycles root directory and so the prefixes are redundant.

For patches and branches, git merge and rebase should be able to detect the
renames and move over code to the right file.
2021-10-26 15:37:04 +02:00
Brecht Van Lommel
d7d40745fa Cycles: changes to source code folders structure
* Split render/ into scene/ and session/. The scene/ folder now contains the
  scene and its nodes. The session/ folder contains the render session and
  associated data structures like drivers and render buffers.
* Move top level kernel headers into new folders kernel/camera/, kernel/film/,
  kernel/light/, kernel/sample/, kernel/util/
* Move integrator related kernel headers into kernel/integrator/
* Move OSL shaders from kernel/shaders/ to kernel/osl/shaders/

For patches and branches, git merge and rebase should be able to detect the
renames and move over code to the right file.
2021-10-26 15:36:39 +02:00
Brecht Van Lommel
75704091fc Cycles: add additive AO support through Fast GI settings
Add a Fast GI Method, either Replace for the existing behavior, or Add
to add ambient occlusion like the old world settings.

This replaces the old Ambient Occlusion settings in the world properties.
2021-10-26 14:56:43 +02:00
Brecht Van Lommel
eb1fed9d60 Cycles: restore Denoising Depth pass, when enabling Denoising Data passes
This is still useful in some cases even if not used by OpenImageDenoise. In
the future this may be replaced with a more generic system to control render
passes and filtering, but for now this just does what it did before.
2021-10-26 14:48:44 +02:00
Brecht Van Lommel
16a8d0fab0 Cycles: change Position render pass to be not antialiased
Similar to the Depth, for compositing the interpolated values between a far
and near object can be non-sensical.
2021-10-26 14:48:44 +02:00
Brecht Van Lommel
c4b02bb6bc Fix Cycles HIP binaries always recompiling 2021-10-22 14:32:24 +02:00
Brecht Van Lommel
282516e53e Cleanup: refactor float/half conversions for clarity 2021-10-22 13:03:03 +02:00
Sayak Biswas
d092933abb Cycles: various fixes for HIP and compilation of HIP binaries
* Additional structs added to the hipew loader for device props
* Adds hipRTC functions to the loader for future usage
* Enables CPU+GPU usage for HIP
* Cleanup to the adaptive kernel compilation process
* Fix for kernel compilation failures with HIP with latest master

Ref T92393, D12958
2021-10-22 12:15:29 +02:00
Brecht Van Lommel
be558d2d97 Fix T92363: OptiX fails with ambient occlusion node, after recent changes
This triggered a compiler bug where it does not handle the sub.s16 PTX
instruction. Instead refactor the code so we don't need to do uint16_t
subtraction at all.

Also update OptiX device to remove the AO pass direct callable.

Thanks Patrick Mours for figuring this out.
2021-10-21 21:25:34 +02:00
Brecht Van Lommel
df00463764 Cycles: add shadow path compaction for GPU rendering
Similar to main path compaction that happens before adding work tiles, this
compacts shadow paths before launching kernels that may add shadow paths.

Only do it when more than 50% of space is wasted.

It's not a clear win in all scenes, some are up to 1.5% slower. Likely caused
by different order of scheduling kernels having an unpredictable performance
impact. Still feels like compaction is just the right thing to avoid cases
where a few shadow paths can hold up a lot of main paths.

Differential Revision: https://developer.blender.org/D12944
2021-10-21 15:38:03 +02:00
Brecht Van Lommel
7d111f4ac2 Cleanup: remove unused code 2021-10-20 18:15:21 +02:00
Brecht Van Lommel
52c5300214 Cleanup: some renaming to better distinguish main and shadow paths 2021-10-20 17:50:31 +02:00
Brecht Van Lommel
cccfa597ba Cycles: make ambient occlusion pass take into account transparency again
Taking advantage of the new decoupled main and shadow paths. For CPU we
just store two nested structs in the integrator state, one for direct light
shadows and one for AO. For the GPU we restrict the number of shade surface
states to be executed based on available space in the shadow paths queue.

This also helps improve performance in benchmark scenes with an AO pass,
since it is no longer needed to use the shader raytracing kernel there,
which has worse performance.

Differential Revision: https://developer.blender.org/D12900
2021-10-20 17:50:31 +02:00
Sayak Biswas
ba4e227def HIP device code cleanup and fix for high VRAM usage
This patch cleans up code for HIP device and makes it more consistent with the CUDA code.
It also fixes the issue with high VRAM usage on AMD cards using HIP allowing better performance and usage on cards like 6600XT.
Added a check in intern/cycles/kernel/bvh/bvh_util.h to prevent compiler error with hipcc

Reviewed By: brecht, leesonw

Maniphest Tasks: T92124

Differential Revision: https://developer.blender.org/D12834
2021-10-20 14:04:28 +02:00
Brecht Van Lommel
fd77a28031 Cycles: bake transparent shadows for hair
These transparent shadows can be expansive to evaluate. Especially on the
GPU they can lead to poor occupancy when only some pixels require many kernel
launches to trace and evaluate many layers of transparency.

Baked transparency allows tracing a single ray in many cases by accumulating
the throughput directly in the intersection program without recording hits
or evaluating shaders. Transparency is baked at curve vertices and
interpolated, for most shaders this will look practically the same as actual
shader evaluation.

Fixes T91428, performance regression with spring demo file due to transparent
hair, and makes it render significantly faster than Blender 2.93.

Differential Revision: https://developer.blender.org/D12880
2021-10-19 15:11:09 +02:00
Brecht Van Lommel
d06828f0b8 Cycles: avoid intermediate stack array for writing shadow intersections
Helps save one OptiX payload and is a bit more efficient.

Differential Revision: https://developer.blender.org/D12909
2021-10-19 15:10:55 +02:00
Brecht Van Lommel
943e73b07e Cycles: decouple shadow paths from main path on GPU
The motivation for this is twofold. It improves performance (5-10% on most
benchmark scenes), and will help  to bring back transparency support for the
ambient occlusion pass.

* Duplicate some members from the main path state in the shadow path state.
* Add shadow paths incrementally to the array similar to what we do for
  the shadow catchers.
* For the scheduling, allow running shade surface and shade volume kernels
  as long as there is enough space in the shadow paths array. If not, execute
  shadow kernels until it is empty.

* Add IntegratorShadowState and ConstIntegratorShadowState typedefs that
  can be different between CPU and GPU. For GPU both main and shadow paths
  juse have an integer for SoA access. Bt with CPU it's a different pointer
  type so we get type safety checks in code shared between CPU and GPU.
* For CPU, add a separate IntegratorShadowStateCPU struct embedded in
  IntegratorShadowState.
* Update various functions to take the shadow state, and make SVM take either
  type of state using templates.

Differential Revision: https://developer.blender.org/D12889
2021-10-19 15:09:29 +02:00
Brecht Van Lommel
a395a1b36b Cleanup: fix compiler warnings 2021-10-19 12:59:05 +02:00
Sergey Sharybin
c107a3c4d9 Fix invalid principled diffuse in Cycles OSL
Need to initialize components for the full Diffuse BSDF.

Steps to reproduce:
- Default cube scene
- Switch to Cycles renderer
- Enable OSL backend
- Start viewport render
- Observe cube being much black

Differential Revision: https://developer.blender.org/D12921
2021-10-19 12:10:29 +02:00
Sergey Sharybin
765eba5a6e Cleanup: More readable Cycles OSL BSDF definition
A  Clang-Format configuration to make the closure definition block to
be properly recognized as such.

Also small wrapper macro to avoid comma in the actual definition code
which was causing unwanted indentation of parameters definition.

Requires Clang-Format 7 or newer. The version we ship in the libs is
12, so for recommended development setup it should all be good.

Differential Revision: https://developer.blender.org/D12920
2021-10-19 11:59:26 +02:00
Campbell Barton
695dc07cb1 Cleanup: clang-format 2021-10-19 18:31:15 +11:00
Brecht Van Lommel
41eba47a87 Revert "Cycles: optimize volume stack copying for shadow catcher/compaction"
This reverts commit 3065d26097. Causing crashes
in the spring scene.
2021-10-18 22:38:33 +02:00
Brecht Van Lommel
a9cb330815 Cleanup: minor refactoring in preparation of main and shadow path decoupling
Ref D12889
2021-10-18 19:02:10 +02:00
Brecht Van Lommel
2430f75279 Cycles: reduce GPU state memory a little
* isect Ng is no longer needed for shadows, for main path needed for SSS only
* Reduce rng_offset and queued_kernel to 16 bits

Ref D12889
2021-10-18 19:02:10 +02:00
Brecht Van Lommel
3065d26097 Cycles: optimize volume stack copying for shadow catcher/compaction
Only copy the number of items used instead of the max items.

Ref D12889
2021-10-18 19:02:10 +02:00
Brecht Van Lommel
fc4b1fede3 Cleanup: consistently use uint32_t for path flag 2021-10-18 19:02:10 +02:00
Brecht Van Lommel
1df3b51988 Cycles: replace integrator state argument macros
* Rename struct KernelGlobals to struct KernelGlobalsCPU
* Add KernelGlobals, IntegratorState and ConstIntegratorState typedefs
  that every device can define in its own way.
* Remove INTEGRATOR_STATE_ARGS and INTEGRATOR_STATE_PASS macros and
  replace with these new typedefs.
* Add explicit state argument to INTEGRATOR_STATE and similar macros

In preparation for decoupling main and shadow paths.

Differential Revision: https://developer.blender.org/D12888
2021-10-18 19:02:10 +02:00
Charlie Jolly
78b5050ff4 Cycles: Voronoi noise, fix uninitialised variable
Caused a debug crash in Windows MSVS.

Reviewed By: brecht

Differential Revision: https://developer.blender.org/D12873
2021-10-15 15:01:10 +01:00
Brecht Van Lommel
2f36762def Cleanup: refactor BVH2 shadow intersection for upcoming changes 2021-10-15 15:42:44 +02:00