Commit Graph

1915 Commits

Author SHA1 Message Date
Lukas Stockner
fa3d50af95 Cycles: Improve denoising speed on GPUs with small tile sizes
Previously, the NLM kernels would be launched once per offset with one thread per pixel.
However, with the smaller tile sizes that are now feasible, there wasn't enough work to fully occupy GPUs which results in a significant slowdown.

Therefore, the kernels are now launched in a single call that handles all offsets at once.
This has two downsides: Memory accesses to accumulating buffers are now atomic, and more importantly, the temporary memory now has to be allocated for every shift at once, increasing the required memory.
On the other hand, of course, the smaller tiles significantly reduce the size of the memory.

The main bottleneck right now is the construction of the transformation - there is nothing to be parallelized there, one thread per pixel is the maximum.
I tried to parallelize the SVD implementation by storing the matrix in shared memory and launching one block per pixel, but that wasn't really going anywhere.

To make the new code somewhat readable, the handling of rectangular regions was cleaned up a bit and commented, it should be easier to understand what's going on now.
Also, some variables have been renamed to make the difference between buffer width and stride more apparent, in addition to some general style cleanup.
2017-11-30 07:37:08 +01:00
Mathieu Menuet
83e80db56e Fix T53349: AO bounces not working correct with OpenCL. 2017-11-26 15:53:00 +01:00
Brecht Van Lommel
e50ed90e4d Fix T53348: Cycles difference between gradient texture on CPU and GPU. 2017-11-23 17:14:04 +01:00
Brecht Van Lommel
d77f1d6538 Fix T53313: bevel shader with transmission render artifacts. 2017-11-22 01:59:21 +01:00
Stefan Werner
58a15b2bfe Cycles: Fixed compilation of CUDA kernels. Follow-up fix for my last commit. 2017-11-21 10:43:40 +01:00
Mai Lavelle
d8f80fbe72 Cycles: Fix OSL brick node after recent fix 2017-11-21 04:30:12 -05:00
Stefan Werner
1febc85855 Cycles: Workaround for performance loss with the CUDA 9.0 SDK.
CUDA 9.0.176 apparently caused some slow down on high-end Pascal cards that can be mitigated by increasing the number of registers. See https://developer.blender.org/F1142667 for a detailed comparison.
2017-11-21 10:29:11 +01:00
Mai Lavelle
9325b9bf15 Fix T53365: OpenCL has wrong shading of brick texture
Looks like some weird compiler difference with signed vs unsigned ints.
2017-11-21 00:42:55 -05:00
Brecht Van Lommel
d089875c4c Fix build with OSL 1.9.x, automatically aligns to 16 bytes now. 2017-11-20 23:24:24 +01:00
Sergey Sharybin
51e2844387 Cycles: Fix wrong behavior of sharpness in Cubic SSS
Was giving difference when using sharpness of 1.0 and 0.999 even though the
result was expected to be really close to each other.

This SSS profile will probably be removed in the future in favor of more
physically bases Burley, but for the time being don't see anything wrong
fixing an existing code.
2017-11-20 11:40:55 +01:00
Lukas Stockner
40f528a7da Cycles: Add per-tile render time debug pass
Reviewers: sergey, brecht

Differential Revision: https://developer.blender.org/D2920
2017-11-17 16:40:24 +01:00
Lukas Stockner
a0c02e4d1b Cycles: Add Volume Direct and Volume Indirect passes for volume-scattered light
No color pass because it's hard to define what to use as color in a volume.

Reviewers: sergey, brecht

Differential Revision: https://developer.blender.org/D2903
2017-11-17 16:39:45 +01:00
Lukas Stockner
f78e963858 Cycles: Refactor PassType from bitflag to index in order to allow for more passes 2017-11-17 16:34:19 +01:00
Mai Lavelle
470b4cb62f Cycles: Fix crash with split branched path tracing
ShaderData memory was getting clobbered in the branched path code paths.

Was caused by 087331c495
2017-11-16 04:59:31 -05:00
Lukas Stockner
212a8d9e5a Cycles: Make per-object random value output also work for Lamps 2017-11-14 04:17:54 +01:00
Lukas Stockner
d8066fb0f1 Cycles: Refactor closure roughness detection to fix a potential bug with Denoising of specular shaders 2017-11-14 04:17:54 +01:00
Brecht Van Lommel
a466d7ae24 Cycles: better distance sampling for chromatic volume extinction.
Previously we picked one of the RGB channels with equal probability, but this
works poorly in a dense volume after many bounces. Now we take into account
the throughput and single scattering albedo.

This makes it a little more practical to do brute force SSS with volumes, but
is still very inefficient because we do direct light sampling at every volume
bounce even when inside an opaque mesh. In theory there could be a light inside
the mesh so we can't automatically disable direct lighting.
2017-11-10 01:37:10 +01:00
Brecht Van Lommel
21a535840d Fix T53270: crash with multiscatter GGX after recent refactoring.
In fact this was an existing issue when exceeding the number of available
closure, but it's more common now that we set the number to 0 for shadows
and emission
2017-11-09 20:28:00 +01:00
Mai Lavelle
087331c495 Cycles: Replace __MAX_CLOSURE__ build option with runtime integrator variable
Goal is to reduce OpenCL kernel recompilations.

Currently viewport renders are still set to use 64 closures as this seems to
be faster and we don't want to cause a performance regression there. Needs
to be investigated.

Reviewed By: brecht

Differential Revision: https://developer.blender.org/D2775
2017-11-09 01:04:06 -05:00
Brecht Van Lommel
26f39e6359 Cycles: add bevel shader, for raytrace based rounded edges.
The algorithm averages normals from nearby surfaces. It uses the same
sampling strategy as BSSRDFs, casting rays along the normal and two
orthogonal axes, and combining the samples with MIS.

The main concern here is that we are introducing raytracing inside
shader evaluation, which could be quite bad for GPU performance and
stack memory usage. In practice it doesn't seem so bad though.

Note that using this feature can easily slow down renders 20%, and
that if you care about performance then it's better to use a bevel
modifier. Mainly this is useful for baking, and for cases where the
mesh topology makes it difficult for the bevel modifier to work well.

Differential Revision: https://developer.blender.org/D2803
2017-11-07 22:35:12 +01:00
Brecht Van Lommel
f79f386731 Code refactor: rename subsurface to local traversal, for reuse. 2017-11-07 22:35:12 +01:00
Brecht Van Lommel
d0af56fe3b Cycles: antialias normal baking if the mesh has a bump map. 2017-11-07 22:35:12 +01:00
Brecht Van Lommel
e74b229342 Fix incorrect MIS weights in Cycles with multiple lights.
This causes some difference in the classroom scene, where ray visibility
tricks are used and break the MIS balance. Otherwise there doesn't seem
to be much effect, but better to use the right formulas. Problem originally
identified by Lukas.
2017-11-07 22:35:12 +01:00
Brecht Van Lommel
8a72be7697 Cycles: reduce closure memory usage for emission/shadow shader data.
With a Titan Xp, reduces path trace local memory from 1092MB to 840MB.
Benchmark performance was within 1% with both RX 480 and Titan Xp.

Original patch was implemented by Sergey.

Differential Revision: https://developer.blender.org/D2249
2017-11-05 20:48:33 +01:00
Brecht Van Lommel
c571be4e05 Code refactor: sum transparent and absorption weights outside closures. 2017-11-05 18:13:44 +01:00
Brecht Van Lommel
2c02a04c46 Code refactor: remove emission and background closures, sum directly. 2017-11-05 18:13:44 +01:00
Brecht Van Lommel
cac3d4d166 Cycles: fix inefficient attribute map storage, saves 615MB in victor scene. 2017-11-05 18:00:48 +01:00
Brecht Van Lommel
5801ef71e4 Code refactor: device memory cleanups, preparing for mapped host memory. 2017-11-05 15:22:04 +01:00
Sergey Sharybin
71f46bc367 Cycles: Add utility function to distinguish between scatter and absorption volume ID 2017-11-01 11:10:51 +01:00
Sergey Sharybin
5d7138c08a Cycles: Cleanup, make it more obvious what preprocessor belongs to 2017-11-01 11:10:10 +01:00
Sergey Sharybin
7f45acee80 Cycles: Cleanup, delete trailing whitespace 2017-11-01 11:06:55 +01:00
Brecht Van Lommel
bbc7eb8ae5 Cycles: restore SOBOL_SKIP hack, for some cases where it helps still. 2017-10-29 16:44:20 +01:00
Brecht Van Lommel
171c4e982f Cycles: use AO factor to let user adjust intensity of AO bounces.
We are already using the AO distance, so might as well offer this extra
control over the intensity. Useful when an interior scene is supposed to
be significantly darker than the background shader.
2017-10-25 21:46:23 +02:00
Brecht Van Lommel
7ad9333fad Code refactor: store device/interp/extension/type in each device_memory. 2017-10-24 01:03:59 +02:00
Brecht Van Lommel
d85a0a722e Fix part of T53038: principled BSDF clearcoat weight has no effect with 0 roughness. 2017-10-18 23:35:54 +02:00
Campbell Barton
99520e3f92 Cleanup: use 'e' prefix for enum typedefs
Convention was only followed loosely,
apply to DNA where changes aren't likely to conflict.

(Skipped ModifierType for eg).
2017-10-17 13:49:20 +11:00
Brecht Van Lommel
2e50add164 Fix OpenCL performance regression after cubic interpolation.
Reorganize code to reduce register pressure.
2017-10-15 17:46:50 +02:00
Sergey Sharybin
8d73ba58b6 Cycles: Fix compilation of sm_20 and sm_21 kernels
Was broken since the bicubic commit for GPU support.
2017-10-10 12:26:02 +05:00
Brecht Van Lommel
cdb0b3b1dc Code refactor: use DeviceInfo to enable QBVH and decoupled volume shading. 2017-10-08 13:17:33 +02:00
Brecht Van Lommel
f61c340bc1 Cycles: OpenCL bicubic and tricubic texture interpolation support. 2017-10-08 02:55:44 +02:00
Brecht Van Lommel
c040dedc12 Fix incorrect MIS with principled BSDF and specular roughness 0. 2017-10-07 22:10:02 +02:00
Brecht Van Lommel
d7eabc6765 Code cleanup: simplify cmake kernel install. 2017-10-07 15:32:20 +02:00
Brecht Van Lommel
2d92988f6b Cycles: CUDA bicubic and tricubic texture interpolation support.
While cubic interpolation is quite expensive on the CPU compared to linear
interpolation, the difference on the GPU is quite small.
2017-10-07 15:30:57 +02:00
Brecht Van Lommel
23098cda99 Code refactor: make texture code more consistent between devices.
* Use common TextureInfo struct for all devices, except CUDA fermi.
* Move image sampling code to kernels/*/kernel_*_image.h files.
* Use arrays for data textures on Fermi too, so device_vector<Struct> works.
2017-10-07 14:53:14 +02:00
Sergey Sharybin
837383ac78 Cycles: Cleanup, indendation 2017-10-06 19:33:59 +05:00
Sergey Sharybin
a950af8e24 Fix T53012: Shadow catcher creates artifacts on contact area
The issue was caused by light sample being evaluated to nan at some point.
This is root of the cause which is to be fixed, but is very hard to trace down
especially via ssh (the issue only happens on AVX2 release build). Will give it
a closer look when back to my AVX2 machine.

For until then this is a good check to have anyway, it corresponds to what's
happening in regular radiance sum.
2017-10-06 17:27:34 +05:00
Sergey Sharybin
0d3c8d0701 Cycles: Cleanup, indentation and wrapping 2017-10-06 16:54:37 +05:00
Brecht Van Lommel
4537e85584 Fix T53001: more workarounds for crash in AMD compiler with recent drivers. 2017-10-05 17:57:58 +02:00
Brecht Van Lommel
fb99ea79f8 Code refactor: split displace/background into separate kernels, remove luma. 2017-10-05 17:57:58 +02:00
Brecht Van Lommel
6da6f8d33f Cycles: CUDA faster rendering of small tiles, using multiple samples like OpenCL.
The work size is still very conservative, and this doesn't help for progressive
refine. For that we will need to render multiple tiles at the same time. But this
should already help for denoising renders that require too much memory with big
tiles, and just generally soften the performance dropoff with small tiles.

Differential Revision: https://developer.blender.org/D2856
2017-10-04 21:58:47 +02:00