Commit Graph

1920 Commits

Author SHA1 Message Date
Brecht Van Lommel
7a6967cbe6 Fix mistake in previous fix for T53600, shows we really need a smarter solution. 2017-12-29 00:07:49 +01:00
Brecht Van Lommel
948515c21a Fix T53600: Cycles shader mixing issue with principled BSDF and zero weights.
SVM nodes need to read all data to get the right offset for the following node.
This is quite weak, a more generic solution would be good in the future.
2017-12-25 23:59:20 +01:00
Lukas Stockner
bf1dc39679 Fix T53567: Negative pixel values causing artifacts with denoising
Now negative color values are clamped to zero before the actual denoising.
2017-12-21 14:24:23 +01:00
Sergey Sharybin
2e8914549b Cycles: Fix difference in image Clip extension method between CPU and GPU
Our own implementation was behaving different comparing to OSL and GPU,
namely on the border pixels OSL and CUDA was doing interpolation with
black, but we were clamping coordinate.

This partially fixes issue reported in T53452.

Similar change should also be done for 3D interpolation perhaps, but this
is to be investigated separately.
2017-12-08 12:03:11 +01:00
Sergey Sharybin
f31fb4a014 Cycles: Cleanup, split 2D interpolation function 2017-12-08 11:22:04 +01:00
Lukas Stockner
fa3d50af95 Cycles: Improve denoising speed on GPUs with small tile sizes
Previously, the NLM kernels would be launched once per offset with one thread per pixel.
However, with the smaller tile sizes that are now feasible, there wasn't enough work to fully occupy GPUs which results in a significant slowdown.

Therefore, the kernels are now launched in a single call that handles all offsets at once.
This has two downsides: Memory accesses to accumulating buffers are now atomic, and more importantly, the temporary memory now has to be allocated for every shift at once, increasing the required memory.
On the other hand, of course, the smaller tiles significantly reduce the size of the memory.

The main bottleneck right now is the construction of the transformation - there is nothing to be parallelized there, one thread per pixel is the maximum.
I tried to parallelize the SVD implementation by storing the matrix in shared memory and launching one block per pixel, but that wasn't really going anywhere.

To make the new code somewhat readable, the handling of rectangular regions was cleaned up a bit and commented, it should be easier to understand what's going on now.
Also, some variables have been renamed to make the difference between buffer width and stride more apparent, in addition to some general style cleanup.
2017-11-30 07:37:08 +01:00
Mathieu Menuet
83e80db56e Fix T53349: AO bounces not working correct with OpenCL. 2017-11-26 15:53:00 +01:00
Brecht Van Lommel
e50ed90e4d Fix T53348: Cycles difference between gradient texture on CPU and GPU. 2017-11-23 17:14:04 +01:00
Brecht Van Lommel
d77f1d6538 Fix T53313: bevel shader with transmission render artifacts. 2017-11-22 01:59:21 +01:00
Stefan Werner
58a15b2bfe Cycles: Fixed compilation of CUDA kernels. Follow-up fix for my last commit. 2017-11-21 10:43:40 +01:00
Mai Lavelle
d8f80fbe72 Cycles: Fix OSL brick node after recent fix 2017-11-21 04:30:12 -05:00
Stefan Werner
1febc85855 Cycles: Workaround for performance loss with the CUDA 9.0 SDK.
CUDA 9.0.176 apparently caused some slow down on high-end Pascal cards that can be mitigated by increasing the number of registers. See https://developer.blender.org/F1142667 for a detailed comparison.
2017-11-21 10:29:11 +01:00
Mai Lavelle
9325b9bf15 Fix T53365: OpenCL has wrong shading of brick texture
Looks like some weird compiler difference with signed vs unsigned ints.
2017-11-21 00:42:55 -05:00
Brecht Van Lommel
d089875c4c Fix build with OSL 1.9.x, automatically aligns to 16 bytes now. 2017-11-20 23:24:24 +01:00
Sergey Sharybin
51e2844387 Cycles: Fix wrong behavior of sharpness in Cubic SSS
Was giving difference when using sharpness of 1.0 and 0.999 even though the
result was expected to be really close to each other.

This SSS profile will probably be removed in the future in favor of more
physically bases Burley, but for the time being don't see anything wrong
fixing an existing code.
2017-11-20 11:40:55 +01:00
Lukas Stockner
40f528a7da Cycles: Add per-tile render time debug pass
Reviewers: sergey, brecht

Differential Revision: https://developer.blender.org/D2920
2017-11-17 16:40:24 +01:00
Lukas Stockner
a0c02e4d1b Cycles: Add Volume Direct and Volume Indirect passes for volume-scattered light
No color pass because it's hard to define what to use as color in a volume.

Reviewers: sergey, brecht

Differential Revision: https://developer.blender.org/D2903
2017-11-17 16:39:45 +01:00
Lukas Stockner
f78e963858 Cycles: Refactor PassType from bitflag to index in order to allow for more passes 2017-11-17 16:34:19 +01:00
Mai Lavelle
470b4cb62f Cycles: Fix crash with split branched path tracing
ShaderData memory was getting clobbered in the branched path code paths.

Was caused by 087331c495
2017-11-16 04:59:31 -05:00
Lukas Stockner
212a8d9e5a Cycles: Make per-object random value output also work for Lamps 2017-11-14 04:17:54 +01:00
Lukas Stockner
d8066fb0f1 Cycles: Refactor closure roughness detection to fix a potential bug with Denoising of specular shaders 2017-11-14 04:17:54 +01:00
Brecht Van Lommel
a466d7ae24 Cycles: better distance sampling for chromatic volume extinction.
Previously we picked one of the RGB channels with equal probability, but this
works poorly in a dense volume after many bounces. Now we take into account
the throughput and single scattering albedo.

This makes it a little more practical to do brute force SSS with volumes, but
is still very inefficient because we do direct light sampling at every volume
bounce even when inside an opaque mesh. In theory there could be a light inside
the mesh so we can't automatically disable direct lighting.
2017-11-10 01:37:10 +01:00
Brecht Van Lommel
21a535840d Fix T53270: crash with multiscatter GGX after recent refactoring.
In fact this was an existing issue when exceeding the number of available
closure, but it's more common now that we set the number to 0 for shadows
and emission
2017-11-09 20:28:00 +01:00
Mai Lavelle
087331c495 Cycles: Replace __MAX_CLOSURE__ build option with runtime integrator variable
Goal is to reduce OpenCL kernel recompilations.

Currently viewport renders are still set to use 64 closures as this seems to
be faster and we don't want to cause a performance regression there. Needs
to be investigated.

Reviewed By: brecht

Differential Revision: https://developer.blender.org/D2775
2017-11-09 01:04:06 -05:00
Brecht Van Lommel
26f39e6359 Cycles: add bevel shader, for raytrace based rounded edges.
The algorithm averages normals from nearby surfaces. It uses the same
sampling strategy as BSSRDFs, casting rays along the normal and two
orthogonal axes, and combining the samples with MIS.

The main concern here is that we are introducing raytracing inside
shader evaluation, which could be quite bad for GPU performance and
stack memory usage. In practice it doesn't seem so bad though.

Note that using this feature can easily slow down renders 20%, and
that if you care about performance then it's better to use a bevel
modifier. Mainly this is useful for baking, and for cases where the
mesh topology makes it difficult for the bevel modifier to work well.

Differential Revision: https://developer.blender.org/D2803
2017-11-07 22:35:12 +01:00
Brecht Van Lommel
f79f386731 Code refactor: rename subsurface to local traversal, for reuse. 2017-11-07 22:35:12 +01:00
Brecht Van Lommel
d0af56fe3b Cycles: antialias normal baking if the mesh has a bump map. 2017-11-07 22:35:12 +01:00
Brecht Van Lommel
e74b229342 Fix incorrect MIS weights in Cycles with multiple lights.
This causes some difference in the classroom scene, where ray visibility
tricks are used and break the MIS balance. Otherwise there doesn't seem
to be much effect, but better to use the right formulas. Problem originally
identified by Lukas.
2017-11-07 22:35:12 +01:00
Brecht Van Lommel
8a72be7697 Cycles: reduce closure memory usage for emission/shadow shader data.
With a Titan Xp, reduces path trace local memory from 1092MB to 840MB.
Benchmark performance was within 1% with both RX 480 and Titan Xp.

Original patch was implemented by Sergey.

Differential Revision: https://developer.blender.org/D2249
2017-11-05 20:48:33 +01:00
Brecht Van Lommel
c571be4e05 Code refactor: sum transparent and absorption weights outside closures. 2017-11-05 18:13:44 +01:00
Brecht Van Lommel
2c02a04c46 Code refactor: remove emission and background closures, sum directly. 2017-11-05 18:13:44 +01:00
Brecht Van Lommel
cac3d4d166 Cycles: fix inefficient attribute map storage, saves 615MB in victor scene. 2017-11-05 18:00:48 +01:00
Brecht Van Lommel
5801ef71e4 Code refactor: device memory cleanups, preparing for mapped host memory. 2017-11-05 15:22:04 +01:00
Sergey Sharybin
71f46bc367 Cycles: Add utility function to distinguish between scatter and absorption volume ID 2017-11-01 11:10:51 +01:00
Sergey Sharybin
5d7138c08a Cycles: Cleanup, make it more obvious what preprocessor belongs to 2017-11-01 11:10:10 +01:00
Sergey Sharybin
7f45acee80 Cycles: Cleanup, delete trailing whitespace 2017-11-01 11:06:55 +01:00
Brecht Van Lommel
bbc7eb8ae5 Cycles: restore SOBOL_SKIP hack, for some cases where it helps still. 2017-10-29 16:44:20 +01:00
Brecht Van Lommel
171c4e982f Cycles: use AO factor to let user adjust intensity of AO bounces.
We are already using the AO distance, so might as well offer this extra
control over the intensity. Useful when an interior scene is supposed to
be significantly darker than the background shader.
2017-10-25 21:46:23 +02:00
Brecht Van Lommel
7ad9333fad Code refactor: store device/interp/extension/type in each device_memory. 2017-10-24 01:03:59 +02:00
Brecht Van Lommel
d85a0a722e Fix part of T53038: principled BSDF clearcoat weight has no effect with 0 roughness. 2017-10-18 23:35:54 +02:00
Campbell Barton
99520e3f92 Cleanup: use 'e' prefix for enum typedefs
Convention was only followed loosely,
apply to DNA where changes aren't likely to conflict.

(Skipped ModifierType for eg).
2017-10-17 13:49:20 +11:00
Brecht Van Lommel
2e50add164 Fix OpenCL performance regression after cubic interpolation.
Reorganize code to reduce register pressure.
2017-10-15 17:46:50 +02:00
Sergey Sharybin
8d73ba58b6 Cycles: Fix compilation of sm_20 and sm_21 kernels
Was broken since the bicubic commit for GPU support.
2017-10-10 12:26:02 +05:00
Brecht Van Lommel
cdb0b3b1dc Code refactor: use DeviceInfo to enable QBVH and decoupled volume shading. 2017-10-08 13:17:33 +02:00
Brecht Van Lommel
f61c340bc1 Cycles: OpenCL bicubic and tricubic texture interpolation support. 2017-10-08 02:55:44 +02:00
Brecht Van Lommel
c040dedc12 Fix incorrect MIS with principled BSDF and specular roughness 0. 2017-10-07 22:10:02 +02:00
Brecht Van Lommel
d7eabc6765 Code cleanup: simplify cmake kernel install. 2017-10-07 15:32:20 +02:00
Brecht Van Lommel
2d92988f6b Cycles: CUDA bicubic and tricubic texture interpolation support.
While cubic interpolation is quite expensive on the CPU compared to linear
interpolation, the difference on the GPU is quite small.
2017-10-07 15:30:57 +02:00
Brecht Van Lommel
23098cda99 Code refactor: make texture code more consistent between devices.
* Use common TextureInfo struct for all devices, except CUDA fermi.
* Move image sampling code to kernels/*/kernel_*_image.h files.
* Use arrays for data textures on Fermi too, so device_vector<Struct> works.
2017-10-07 14:53:14 +02:00
Sergey Sharybin
837383ac78 Cycles: Cleanup, indendation 2017-10-06 19:33:59 +05:00