The issue here was mainly coming from minimal pixel width feature
which is quite commonly enabled in production shots.
This feature will use some probabilistic heuristic in the curve
intersection function to check whether we need to return intersection
or not. This probability is calculated for every intersection check.
Now, when we use multiple BVH nodes for curve primitives we increase
probability of that primitive to be considered a good intersection
for us. This is similar to increasing minimal width of curve.
What is worst here is that change in the intersection probability
fully depends on exact layout of BVH, meaning probability might
change differently depending on a view angle, the way how builder
binned the primitives and such. This makes it impossible to do
simple check like dividing probability by number of BVH steps.
Other solution might have been to split BVH into fully independent
trees, but that will increase memory usage of all the static
objects in the scenes, which is also not something desirable.
For now used most simple but robust approach: store BVH primitives
time and test it in curve intersection functions. This solves the
regression, but has two downsides:
- Uses more memory.
which isn't surprising, and ANY solution to this problem will
use more memory.
What we still have to do is to avoid this memory increase for
cases when we don't use BVH motion steps.
- Reduces number of maximum available textures on pre-kepler cards.
There is not much we can do here, hardware gets old but we need
to move forward on more modern hardware..
The change was delivering broken topology for certain cases.
The assumption that new edge only connects new vertices was
wrong.
Reverting to a commit which was giving correct render results
but was using more memory.
This reverts commit af1e48e8ab.
The bug T46099 no longer applies since the addition of `dist_squared_to_projected_aabb_simple`
Has also been added comments that relates to an occlusion bug with the ruler. I'll investigate this.
When importing an Alembic file with grouped transforms, it would badly name the transforms, taking the name of the parent instead of its own.
Patch by @maxime.robinot
Differential Revision: https://developer.blender.org/D2507
Quite simple fix for now which only deals with this case. Maybe we want to do
some "clipping" on image load time so regular textures wouldn't give NaN as
well.
This allows us to use faster math and still have reliable
isnan/isfinite tests.
Only do it for host side, kernels stays unchanged.
Thanks Lukas Stockner for the tip!
The issue was caused by usage of non-initialized image user, which
could have different settings, causing some random image being loaded
or not loaded at all.
This caused non-deterministic behavior of Cycles image loading because
it was querying image information from several places.
This fixes crash reported in T50616, but it's not a complete fix
because preview rendering in material is wrong (same wrong as in
2.78a release).
We now assert that we now file version of libraries (needed for
do_version after linking step), so for missing libraries, set dummy
numbers (using version of main .blend file actually).
Works similar to regular Cycles tests, just does OpenGL render to
get output image.
Seems to work fine with the only funny effect: Blender window will
pop up for each of the tests. This is current limitation of our
OpenGL context. Might be changed in the future.
Seems CUDA failed to de-duplicate the array across multiple inlined
versions of the shadow_blocked(). Helped it a bit with that now.
Gives about 100MB memory improvement on a scenes after previous
commit and brings up memory "regression" to only 100MB comparing to
the master branch now.
This commit enables record-all behavior of transparent shadows
rays.
Render times difference goes as following:
GTX 1080 render time
BMW -0.5%
Fishy Cat -0.0%
Pabellon Barcelona -11.6%
Classroom +1.2%
Koro -58.6%
Kernel will now use some extra VRAM memory to store the intersection
array (200MB on my configuration). This we can optimize out with some
further commits.
The idea is to record all possible transparent intersections when
shooting transparent ray on GPU (similar to what we were doing on
CPU already).
This avoids need of doing whole ray-to-scene intersections queries
for each intersection and speeds up a lot cases like transparent
hair in the cost of extra memory.
This commit is a base ground for now and this feature is kept
disabled for until some further tweaks.
Now we break the traversal cycle and then perform volume attenuation
and check with zero throughput. Not sure it makes any measurable sense
at this moment, but in the future it might help de-duplicating some
extra logic here.