Commit Graph

6887 Commits

Author SHA1 Message Date
Mai Lavelle
ec8ae4d5e9 Cycles: Pack kernel textures into buffers for OpenCL
Image textures were being packed into a single buffer for OpenCL, which
limited the amount of memory available for images to the size of one
buffer (usually 4gb on AMD hardware). By packing textures into multiple
buffers that limit is removed, while simultaneously reducing the number
of buffers that need to be passed to each kernel.

Benchmarks were within 2%.

Fixes T51554.

Differential Revision: https://developer.blender.org/D2745
2017-08-08 07:12:04 -04:00
Sergey Sharybin
451ccf7396 Cycles: Cleanup, move curve intersection functions to own file
This way curve file becomes much shorter and it's also easier to write a
benchmark application to check performance before/after future changes.
2017-08-07 20:53:30 +02:00
Sergey Sharybin
77a7a7f455 Cycles: Cleanup, trailign whitespace 2017-08-07 20:53:30 +02:00
Sergey Sharybin
95fe9b2617 Cycles: Cleanup, remove bvh prefix from curve functions
Those are nothing to do with BVH, and can be used separately.
2017-08-07 20:53:30 +02:00
Sergey Sharybin
a4bbce8949 Cycles: Fix compilation error on NVidia OpenCL after recent refactor
Still need to verify this is proper thing to do for AMD OpenCL. At least now
i can compile OpenCL kernel on my laptop with sm21 card.
2017-08-07 20:52:24 +02:00
Brecht Van Lommel
fc38276d74 Fix Cycles shadow catcher objects influencing each other.
Since all the shadow catchers are already assumed to be in the footage,
the shadows they cast on each other are already in the footage too. So
don't just let shadow catchers skip self, but all shadow catchers.

Another justification is that it should not matter if the shadow catcher
is modeled as one object or multiple separate objects, the resulting
render should be the same.

Differential Revision: https://developer.blender.org/D2763
2017-08-07 17:54:26 +02:00
Brecht Van Lommel
dc4d850d10 Fix Windows build errors with recent Cycles SIMD refactoring. 2017-08-07 17:54:26 +02:00
Sergey Sharybin
580741b317 Cycles: Cleanup, space after keyword 2017-08-07 14:47:51 +02:00
Brecht Van Lommel
ee77c1e917 Code refactor: use float4 instead of intrinsics for CPU denoise filtering.
Differential Revision: https://developer.blender.org/D2764
2017-08-07 14:01:24 +02:00
Brecht Van Lommel
a24fbf3323 Code refactor: add, remove, optimize various SSE functions.
* Remove some unnecessary SSE emulation defines.
* Use full precision float division so we can enable it.
* Add sqrt(), sqr(), fabs(), shuffle variations, mask().
* Optimize reduce_add(), select().

Differential Revision: https://developer.blender.org/D2764
2017-08-07 14:01:24 +02:00
Brecht Van Lommel
a8cc0d707e Code refactor: split defines into separate header, changes to SSE type headers.
I need to use some macros defined in util_simd.h for float3/float4, to emulate
SSE4 instructions on SSE2. But due to issues with order of header includes this
was not possible, this does some refactoring to make it work.

Differential Revision: https://developer.blender.org/D2764
2017-08-07 14:01:24 +02:00
Brecht Van Lommel
5e4bad2c00 Cycles: remove option to disable transparent shadows globally.
We already detect this automatically based on shading nodes and per shader
settings, and performance of this option is ok now all devices.

Differential Revision: https://developer.blender.org/D2767
2017-08-07 14:01:24 +02:00
Brecht Van Lommel
2a74f36dac Fix Cycles CUDA adaptive megakernel build error. 2017-08-07 00:27:08 +02:00
Brecht Van Lommel
45dcd20ca9 Cycles: CUDA split performance tweaks, still far from megakernel.
On Pabellon, 25.8s mega, 35.4s split before, 32.7s split after.
2017-08-05 14:32:59 +02:00
Brecht Van Lommel
cd023b6cec Cycles: remove min bounces, modify RR to terminate less.
Differential Revision: https://developer.blender.org/D2766
2017-08-04 23:11:03 +02:00
Sergey Sharybin
0d01cf4488 Cycles: Extra tweaks to performance of header expansion
Two main things here:

1. Replace all unsafe for #line directive characters into a single loop,
   avoiding multiple iterations and multiple temporary strings created.

2. Don't merge token char by char but calculate start and end point and
   then copy all substring at once.

This gives about 15% speedup of source processing time. At this point
(with all previous commits from today) we've shrinked down compiled
sources size from 108 MB down to ~5.5 MB and lowered processing time
from 4.5 sec down to 0.047 sec on my laptop running Linux (this was a
constant time which Blender will always spent first time loading kernel,
even if we've got compiled clbin).
2017-08-03 08:07:06 +02:00
Campbell Barton
ba98f06acc mikktspace: minor optimization
Add a safe version of normalize since all uses of normalize
did zero length checks, move this into a function.

Also avoid unnecessary conversion.

Gives minor speedup here (approx 3-5%).
2017-08-03 07:03:59 +10:00
Sergey Sharybin
f879cac032 Cycles: Avoid some expensive operations in header expansions
Basically gather lines as-is during traversal, avoiding allocating
memory for all the lines in headers.

Brings additional performance improvement abut 20%.
2017-08-02 20:59:19 +02:00
Sergey Sharybin
a280697e77 Cycles: Support "precompiled" headers in include expansion algorithm
The idea here is that it is possible to mark certain include statements
as "precompiled" which means all subsequent includes of that file will
be replaced with an empty string.

This is a way to deal with tricky include pattern happening in single
program OpenCL split kernel which was including bunch of headers about
10 times.

This brings preprocessing time from ~1sec to ~0.1sec on my laptop.
2017-08-02 20:59:19 +02:00
Sergey Sharybin
4ad39964fd Cycles: Speed up #include expansion algorithm
The idea is to re-use files which were already processed. Gives about 4x speedup
of processing time (~4.5sec vs 1.0sec) on my laptop for the whole OpenCL kernel.

For users it will mean lower delay before OpenCL rendering might start.
2017-08-02 20:59:19 +02:00
Campbell Barton
5c963128ea Cleanup: remove check for old GCC&PPC 2017-07-27 07:29:16 +10:00
Jeff Knox
e93804318f Fix T51450: viewport render time keeps increasing after render is done.
Reviewed By: brecht

Differential Revision: https://developer.blender.org/D2747
2017-07-25 01:47:04 +02:00
James Fulop
c3c0495b30 Fix T51948: pen pressure not detected with some Wacom tablets.
Generalizes current conditions, QT implements it the same way.
2017-07-24 13:54:36 +02:00
Brecht Van Lommel
9e929c911e Fix Cycles multi scatter GGX different render results with Clang and GCC.
The order of evaluation of function arguments is undefined, and the order
was reversed between these compilers. This was causing regressions tests
to give different results between Linux and macOS.
2017-07-23 23:25:12 +02:00
Brecht Van Lommel
e982ebd6d4 Fix T52152: allow zero roughness for Cycles principled BSDF, don't clamp. 2017-07-22 23:58:51 +02:00
Brecht Van Lommel
ec831ee7d1 Fix Cycles denoising NaNs with a 1 sample renders.
This was causing different render results with different compilers. We
can't do much useful with 1 sample, but better for debugging.
2017-07-22 23:58:51 +02:00
Brecht Van Lommel
b4528d8897 Fix use of uninitialized value in Cycles, probably did not cause a bug. 2017-07-22 21:29:10 +02:00
Brecht Van Lommel
db8bc1d982 Fix a few harmless maybe uninitialized warnings with GCC 5.4.
GCC seems to detect uninitialized into function calls now, but then isn't
always smart enough to see that it is actually initialized. Disabling this
warning entirely seems a bit too much, so initialize a bit more now.
2017-07-21 00:54:58 +02:00
Brecht Van Lommel
2b132fc3f7 Fix T52135: Cycles should not keep generated/packed images in memory after render. 2017-07-20 23:47:05 +02:00
Brecht Van Lommel
a4cd7b7297 Fix potential memory leak in Cycles loading of packed/generated images. 2017-07-20 23:46:58 +02:00
Brecht Van Lommel
3b12a71972 Fix T52125: principled BSDF missing with macOS OpenCL. 2017-07-20 15:15:43 +02:00
Stefan Werner
c1ca3c8038 Cycles: fixed the SM_2x CUDA kernel build that I broke in my previous commit 2017-07-20 13:28:34 +02:00
Stefan Werner
4bc6faf9c8 Fix T52107: Color management difference when using multiple and different GPUs together
This commit unifies the flattened texture slot names for bindless and regular CUDA textures. Texture indices are now identical across all CUDA architectures, where before Fermi used different indices, which lead to problems when rendering on multi-GPU setups mixing Fermi with newer hardware.
2017-07-20 10:03:27 +02:00
Brecht Van Lommel
8b2785bda5 Fix T49498: continuous grab issues on macOS, particularly with gaming mouses.
Change the implementation so it no longer takes over the mouse cursor motion
from the OS, instead only move it when warping, similar to Windows and X11.
Probably the reason it was not done this way originally is that you then get
a 500ms delay after warping, but we can use a trick to avoid that and get much
smoother mouse motion than before.
2017-07-18 16:11:33 +02:00
Sergey Sharybin
5f35682f3a Fix T52021: Shadow catcher renders wrong when catcher object is behind transparent object
Tweaked the path radiance summing and alpha to accommodate for possible contribution of
light by transparent surface bounces happening prior to shadow catcher intersection.

This commit will change the way how shadow catcher results looks when was behind semi
transparent object, but the old result seemed to be fully wrong: there were big artifacts
when alpha-overing the result on some actual footage.
2017-07-18 09:46:21 +02:00
Sergey Sharybin
d8906f30d3 Cycles: Remove meaningless camera ray check
In branched path tracing main loop is always a camera ray, with varying
number of transparent bounces.
2017-07-18 09:27:36 +02:00
Mai Lavelle
87164114a3 Cycles: Enable SSS from Principled BSDF only when actually in use
This gives speed up for the split kernel in scenes using the principled BSDF
but without subsurface scattering.
2017-07-12 04:40:24 -04:00
Mai Lavelle
1f933c94a7 Cycles: Fix comparison in principled BSDF
Could have lead to black pixels.
2017-07-11 23:41:22 -04:00
Brecht Van Lommel
29ec0b1162 Fix T52027: OSL getattribute() crash, when optimizer calls it before rendering. 2017-07-11 22:39:51 +02:00
Sergey Sharybin
e26f61a2b5 Cycles: Disable OpenCL clFlush workarounds
This is something which was reported to work fine by Mai, Benjamin and
confirmed by myself. Disabling this workaround gains us some speedup:

                      Before           Now
bmw27                04:28.42        04:07.79
classroom            09:26.48        08:54.53
fishy_cat            08:44.01        08:18.70
koro                 09:17.98        08:57.18
pavillon_barcelone   12:26.64        11:52.81

Test environment is:
- Ubuntu 16.04, with all updates installed
- AMD RX 480 GPU
- amdgpu pro driver version 17.10-450821
2017-07-11 12:16:58 +02:00
Sergey Sharybin
1dfc4be6ab Opensubdiv: Fix compilation error with older Opensubdiv versions 2017-07-11 11:05:39 +02:00
Brecht Van Lommel
0584c5ba8e Fix T51959: Windows + Intel GPU offset between UI drawing and mouse.
Unfortunately this means disabling the code that ensures the title
bar is properly scaled with DPI, however better to have that as a
cosmetic issue than Blender being unusable with a lot of Intel GPUs.
2017-07-08 22:11:07 +02:00
Brecht Van Lommel
3361f2107b Fix T51967: OSL crash after rendering finished (mainly on Windows). 2017-07-08 02:46:06 +02:00
Sergey Sharybin
fee7f688c3 Cycles: Fix ambiguity in call of min() function 2017-07-07 10:40:19 +02:00
Sergey Sharybin
aaec4ed07d Fix T51980: Motion Tracking - png image files appear in the Blender program directory when using refine
Residue of debug code remained form some older bug fix.
2017-07-07 09:27:24 +02:00
Mai Lavelle
9c3f1ad003 Cycles: Add artificial memory limit debug option for OpenCL 2017-07-06 05:25:46 -04:00
Mai Lavelle
95b345b2fe Revert "Cycles: use std::min and max for extra overloads"
We already have this in util_algorithm.h

This reverts commit cff172c762.
2017-07-06 04:21:29 -04:00
Mai Lavelle
f9963f29e8 Cycles: Dont allow global size to fall to zero 2017-07-05 20:19:15 -04:00
Mai Lavelle
222b96e5c7 Cycles: Detect out of memory before buffer allocation in OpenCL devices 2017-07-05 20:19:12 -04:00
Mai Lavelle
cff172c762 Cycles: use std::min and max for extra overloads 2017-07-05 19:43:34 -04:00