All the changes are mainly giving explicit tips on inlining functions,
so they match how inlining worked with previous toolkit.
This make kernel compiled by CUDA 8 render in average with same speed
as previous kernels. Some scenes are somewhat faster, some of them are
somewhat slower. But slowdown is within 1% so far.
On a positive side it allows us to enable newer generation cards on
buildbots (so GTX 10x0 will be officially supported soon).
This adds support for ngons and attributes on subdivision meshes. Ngons are
needed for proper attribute interpolation as well as correct Catmull-Clark
subdivision. Several changes are made to achieve this:
- new primitive `SubdFace` added to `Mesh`
- 3 more textures are used to store info on patches from subd meshes
- Blender export uses loop interface instead of tessface for subd meshes
- `Attribute` class is updated with a simplified way to pass primitive counts
around and to support ngons.
- extra points for ngons are generated for O(1) attribute interpolation
- curves are temporally disabled on subd meshes to avoid various bugs with
implementation
- old unneeded code is removed from `subd/`
- various fixes and improvements
Reviewed By: brecht
Differential Revision: https://developer.blender.org/D2108
This commit adds a new distribution to the Glossy, Anisotropic and Glass BSDFs that implements the
multiple-scattering microfacet model described in the paper "Multiple-Scattering Microfacet BSDFs with the Smith Model".
Essentially, the improvement is that unlike classical GGX, which only models single scattering and assumes
the contribution of multiple bounces to be zero, this new model performs a random walk on the microsurface until
the ray leaves it again, which ensures perfect energy conservation.
In practise, this means that the "darkening problem" - GGX materials becoming darker with increasing
roughness - is solved in a physically correct and efficient way.
The downside of this model is that it has no (known) analytic expression for evalation. However, it can be
evaluated stochastically, and although the correct PDF isn't known either, the properties of MIS and the
balance heuristic guarantee an unbiased result at the cost of slightly higher noise.
Reviewers: dingto, #cycles, brecht
Reviewed By: dingto, #cycles, brecht
Subscribers: bliblubli, ace_dragon, gregzaal, brecht, harvester, dingto, marcog, swerner, jtheninja, Blendify, nutel
Differential Revision: https://developer.blender.org/D2002
The original quad intersection test works by just testing against the two triangles that define the quad.
However, in this case it's actually faster to use the same test that's also used for portals: Determining
the distance to the plane in which the quad lies, calculating the hitpoint and checking whether it's in the
quad by projecting onto the sides.
Reviewers: brecht, sergey, dingto
Reviewed By: dingto
Differential Revision: https://developer.blender.org/D2045
Seems particular CUDA implementations has some precision issues,
which made integer coordinate (which was expected to always be
positive) to go negative.
This patch adds the "Hilbert Spiral", a custom-designed continuous space-filling curve, as a tile order for rendering in Cycles.
It essentially works by dividing the tiles into tile blocks which are processed in a spiral outwards from the center. Inside each
block, the tiles are processed in a regular Hilbert curve pattern. By rotating that pattern according to the spiral direction,
a continuous curve is obtained, which helps with cache coherency and therefore rendering speed.
The curve is a compromise between the faster-rendering Bottom-to-Top etc. orders and the Center order, which is a bit slower,
but starts with the more important areas. The Hilbert Spiral also starts in the center (unless huge tiles are used) and is still
marginally slower than Bottom-to-Top, but noticeably faster than Center.
Reviewers: sergey, #cycles, dingto
Reviewed By: #cycles, dingto
Subscribers: iscream, gregzaal, sergey, mib2berlin
Differential Revision: https://developer.blender.org/D1166
Recent changes to kernel broke compilation of the kernels again, need some
other kind of solution for this issue.
Don't have much time for this currently, but will be addressed before the
release.
Meanwhile it's better to have some buildbot builds instead of totally failing
one.
This more a workaround for CUDA optimizer which can't optimize clamp(x, 0, 1)
into a single instruction and uses 4 instructions instead.
Original patch by @lockal with own modification:
Don't make changes outside of the kernel. They don't make any difference
anyway and term saturate() has a bit different meaning outside of kernel.
This gives around 2% of speedup in Barcelona file, but in more complex shader
setups with lots of math nodes with clamping speedup could be much nicer.
Subscribers: dingto
Projects: #cycles
Differential Revision: https://developer.blender.org/D1224
This inconsistency drove me totally crazy, it's really confusing
when it's inconsistent especially when you work on both Cycles and
Blender sides.
Shouldn;t cause merge PITA, it's whitespace changes only, Git should
be able to merge it nicely.
OpenCL doesn't let you to get address of vector components, which
is kinda annoying. On the other hand, maybe now compiler will have
more chances to optimize something out.
This way Cycles finally becomes feature-full on image projections
compared to Blender Internal and Gooseberry Project Team could
finally finish the movie.
OpenCL apparently does not support templates, so the idea of generic
function for swapping is a bit of a failure. Now it is either inlined
into the code (in triangle intersection) or has specific implementation
for QBVH.
This is probably even better, because we can't create QBVH-specific
function in util_math anyway.
Most of them are not currently used but are essential for the further work.
- CPU kernels with SSE2 support will now have sse3b, sse3f and sse3i
- Added templatedversions of min4, max4 which are handy to use with register
variables.
- Added util_swap function which gets arguments by pointers.
So hopefully it'll be a portable version of std::swap.
Using this paper: Sven Woop, Watertight Ray/Triangle Intersection
http://jcgt.org/published/0002/01/05/paper.pdf
This change is expected to address quite reasonable amount of reports from the
bug tracker, plus it might help reducing the noise in some scenes.
Unfortunately, it's currently about 7% slower than the previous solution with
pre-computed triangle plane equations, but maybe with some smart tweaks to the
code (tests reshuffle, using SIMD in a nice way or so) we can avoid the speed
regression.
But perhaps smartest thing to do here would be to change single triangle / ray
intersection with multiple triangles / ray intersections. That's how Embree does
this and it's watertight single ray intersection is not any faster that this.
Currently only triangle intersection is modified accordingly to the paper, in
the future we would also want to modify the node / ray intersection.
Reviewers: brecht, juicyfruit
Subscribers: dingto, ton
Differential Revision: https://developer.blender.org/D819
This basically records all volumes steps, which can then later be used multiple
time to take scattering samples, without having to step through the volume
again. From the paper:
"Importance Sampling Techniques for Path Tracing in Participating Media"
This works only on the CPU, due to usage of malloc/free.
This is mostly work towards enabling the __KERNEL_SSE__ option to start using
SIMD operations for vector math operations. This 4.1 kernel performes about 8%
faster with that option but overall is still slower than without the option.
WITH_CYCLES_OPTIMIZED_KERNEL_SSE41 is the cmake flag for testing this kernel.
Alignment of int3, int4, float3, float4 to 16 bytes seems to give a slight 1-2%
speedup on tested systems with the current kernel already, so is enabled now.
This to avoids build conflicts with libc++ on FreeBSD, these __ prefixed values
are reserved for compilers. I apologize to anyone who has patches or branches
and has to go through the pain of merging this change, it may be easiest to do
these same replacements in your code and then apply/merge the patch.
Ref T37477.
New features:
* Bump mapping now works with SSS
* Texture Blur factor for SSS, see the documentation for details:
http://wiki.blender.org/index.php/Doc:2.6/Manual/Render/Cycles/Nodes/Shaders#Subsurface_Scattering
Work in progress for feedback:
Initial implementation of the "BSSRDF Importance Sampling" paper, which uses
a different importance sampling method. It gives better quality results in
many ways, with the availability of both Cubic and Gaussian falloff functions,
but also tends to be more noisy when using the progressive integrator and does
not give great results with some geometry. It works quite well for the
non-progressive integrator and is often less noisy there.
This code may still change a lot, so unless you're testing it may be best to
stick to the Compatible falloff function.
Skin test render and file that takes advantage of the gaussian falloff:
http://www.pasteall.org/pic/show.php?id=57661http://www.pasteall.org/pic/show.php?id=57662http://www.pasteall.org/blend/23501
RGB color components gave non-grey results when you might no expect it.
What happens is that some of the color channels are zero in the direct light
pass because their channel is zero in the color pass. The direct light pass is
defined as lighting divided by the color pass, and we can't divide by zero. We
do a division after all samples are added together to ensure that multiplication
in the compositor gives the exact combined pass even with antialiasing, DoF, ..
Found a simple tweak here, instead of setting such channels to zero it will set
it to the average of other non-zero color channels, which makes the results look
like the expected grey.
* Revert r57203 (len() renaming)
There seems to be a problem with nVidia OpenCL after this and I haven't figured out the real cause yet.
Better to selectively enable native length() later, after figuring out what's wrong.
This fixes [#35612].