This replaces sequential ray moving followed with scene intersection with
single BVH traversal, which gives us all possible intersections.
Only implemented for CPU, due to qsort and a bigger memory usage on GPU
which we rather avoid. GPU still uses the regular bvh volume intersection code, while CPU now uses the new code.
This improves render performance for scenes with:
a) Camera inside volume mesh
b) SSS mesh intersecting a volume mesh/domain
In simple volume files (not much geometry) performance is roughly the same
(slightly faster). In files with a lot of geometry, the performance
increase is larger. bmps.blend with a volume shader and camera inside the
mesh, it renders ~10% faster here.
Patch by Sergey and myself.
Differential Revision: https://developer.blender.org/D1264
This patch adds support for light portals: objects that help sampling the
environment light, therefore improving convergence. Using them tor other
lights in a unidirectional pathtracer is virtually useless.
The sampling is done with the area-preserving code already used for area lamps.
MIS is used both for combination of different portals and for combining portal-
and envmap-sampling.
The direction of portals is considered, they aren't used if the sampling point
is behind them.
Reviewers: sergey, dingto, #cycles
Reviewed By: dingto, #cycles
Subscribers: Lapineige, nutel, jtheninja, dsisco11, januz, vitorbalbio, candreacchio, TARDISMaker, lichtwerk, ace_dragon, marcog, mib2berlin, Tunge, lopataasdf, lordodin, sergey, dingto
Differential Revision: https://developer.blender.org/D1133
This more a workaround for CUDA optimizer which can't optimize clamp(x, 0, 1)
into a single instruction and uses 4 instructions instead.
Original patch by @lockal with own modification:
Don't make changes outside of the kernel. They don't make any difference
anyway and term saturate() has a bit different meaning outside of kernel.
This gives around 2% of speedup in Barcelona file, but in more complex shader
setups with lots of math nodes with clamping speedup could be much nicer.
Subscribers: dingto
Projects: #cycles
Differential Revision: https://developer.blender.org/D1224
This way we can get rid of inefficient memory usage caused by BVH boundbox
part being unused by leaf nodes but still being allocated for them. Doing
such split allows to save 6 of float4 values for QBVH per leaf node and 3
of float4 values for regular BVH per leaf node.
This translates into following memory save using 01.01.01.G rendered
without hair:
Device memory size Device memory peak Global memory peak
Before the patch: 4957 5051 7668
With the patch: 4467 4562 7332
The measurements are done against current master. Still need to run speed tests
and it's hard to predict if it's faster or not: on the one hand leaf nodes are
now much more coherent in cache, on the other hand they're not so much coherent
with regular nodes anymore.
Reviewers: brecht, juicyfruit
Subscribers: venomgfx, eyecandy
Differential Revision: https://developer.blender.org/D1236
Issue was caused by MSVC not being able to optimize some code out in the same
way as GCC/Clang does, so now that parts of code are explicitly unfolded in
order to help compilers out.
This makes speed loss much less drastic on my laptop. That's probably as good
as we can do with MSVC without investing infinite amount of time looking trying
to workaround the optimizer.
Originally we thought it's needed in order to distinguish builtin file from
filename which starts with '@', but the filepath is actually full path there
and it's unlikely to have file system where '@' is a proper root character.
Surprisingly this does not give visible speed differences, but it's still
nice to get rid of redundant check.
Our own implementation is in fact the same performance as in fast_math from
OpenShadingLanguage, but implementation from fast_math is using explicit madd
function, which increases chance of compiler deciding to use intrinsics.
This patch is based on some work done in D788 and re-formulation from Beckmann
implementation in OpenShadingLanguage.
Skipping texture lookup helps a lot on GPUs where it's more expensive to access
texture memory than to do some extra calculation in threads.
CPU code still uses lookup-table based approach since this seems to be still
faster (at least on computers i've got access to).
This change gives about 2% speedup on BMW scene with GTX560TI.
It did not preserve stratification too well and lookup-table approach was
working much better. There are now also some more interesting forumlation
from Wenzel and OpenShadingLanguage which should work better than old code.
This argument was unused and got nicely optimized out. But once it
starts to be using registers are getting stressed really crazy,
causing slow down of render.
It's not that bad because this typo could only caused not really
efficient BVH traversal, causing higher render times. Not as if
it was causing render artifacts.
This inconsistency drove me totally crazy, it's really confusing
when it's inconsistent especially when you work on both Cycles and
Blender sides.
Shouldn;t cause merge PITA, it's whitespace changes only, Git should
be able to merge it nicely.
Issue was caused by accident in c8a9a56 which not only disabled glossy
reflection if Glossy visibility is disabled, but also Diffuse reflection.
Quite safe and should go to final release branch.
Issue was introduced in 01ee21f where i didn't notice *_setup()
function only doing partial initialization, and some of parameters
are expected to be initialized by callee function.
This was hitting only some setups, so tests with benchmark scenes
didn't unleash issues. Now it should all be fine.
This is to go to the 2.74 branch and we actually might re-AHOY.
Fix T44007: Cycles Volumetrics: block artifacts with overlapping volumes
The issue was caused by uninitialized parameters of some closures, which
lead to unpredictable behavior of shader_merge_closures().
The fix was really flacky, in terms during speed benchmarks i had
abort() in the fallback block to be sure it never runs in production
scenes, but that affected on the optimization as well. Without this
abort there's quite bad slowdown of 5-7% on the renders even tho
the Pleucker fallback was never run.
This is all weird and for now reverting the change which affects on
all the production scenes and will look into alternative fixes for
the original issue with precision loss on huge planes.
This reverts commit 9489205c5c.
This merges consecutive empty steps in the decoupled record function,
which can lead to fewer iterations in the scatter functions.
Only helps slightly though (1%), but doesn't hurt to have this.
Differential Revision: https://developer.blender.org/D873
The issue was caused by numerical instability whrn having ray origin close to a huge
triangle, which could have aused bad ray distance check.
Watertight Woop intersection isn't really addressing such cases, it's dealing with
small triangles far away from the ray origin instead, so it's a bit tricky yo make
it working reliably.
While we're quite close to the release it's safer to do check in Pleaucker coordinates
if ray close to a huge triangle. Likely this additional check combined with some other
tweaks to the code doesn't cause measurable slowdown in the scenes tested here.
After the release we can play a bit more with this code in order to make it more
stable without Pleucker fallback.
Seems it's just another issue with the compiler, worked around by explicitly
telling not to inline some function.
In theory we can unify this with CPU, but we're quite close to the release
so better be safe than sorry.
The following commits were supposed to add anti-alias and help with OSL
baking:
7b16fda3791b92dfa961
However they introduced other issues (artifacts mostly), see T43550 .
Leaving the code ifdef'ed for now.