Seems CUDA failed to de-duplicate the array across multiple inlined
versions of the shadow_blocked(). Helped it a bit with that now.
Gives about 100MB memory improvement on a scenes after previous
commit and brings up memory "regression" to only 100MB comparing to
the master branch now.
This commit enables record-all behavior of transparent shadows
rays.
Render times difference goes as following:
GTX 1080 render time
BMW -0.5%
Fishy Cat -0.0%
Pabellon Barcelona -11.6%
Classroom +1.2%
Koro -58.6%
Kernel will now use some extra VRAM memory to store the intersection
array (200MB on my configuration). This we can optimize out with some
further commits.
The idea is to record all possible transparent intersections when
shooting transparent ray on GPU (similar to what we were doing on
CPU already).
This avoids need of doing whole ray-to-scene intersections queries
for each intersection and speeds up a lot cases like transparent
hair in the cost of extra memory.
This commit is a base ground for now and this feature is kept
disabled for until some further tweaks.
Now we break the traversal cycle and then perform volume attenuation
and check with zero throughput. Not sure it makes any measurable sense
at this moment, but in the future it might help de-duplicating some
extra logic here.
Removed unnecessary call to DM_update_tessface_data(). This call is
already performed by DM_ensure_tessface(dm). The call being performed
twice caused a failing BLI_assert().
Reviewed by: Kévin Dietrich
With the new names the arguments (yup, zup) are in the same order as
they appear in the function name. The old names used copy_src_dst(dst,
src), which I found very confusing. Furthermore, now it is clear from
where to where the copy is made.
This makes the function names a little bit longer, though. If that is
a real issue, we can just name them zup_from_yup(zup, yup).
Reviewed by: Kévin Dietrich
Speedup is mainly gained by multi-threading. Gives about 3x
fps gain on an edit shot file.
There is still some room for improvements, will happen in one
of the upcoming commits.
Derived mesh for particles did not include tessellated faces when it
was expected to. Now added explicit function to copy CDDM with tess
faces without need to re-tessellate the result.
It is not necessary to add MOTO library dependency when we use
WITH_IK_SOLVER (now it uses Eigen) or we use WITH_MOD_BOOLEAN (it was
used by bsp intern library some time ago but it is not present in the
code anymore).
Reviewers: mont29, sergey
Subscribers: mont29, sergey
Differential Revision: https://developer.blender.org/D2477
No reason to not make this private to this file, and it gave conflict
when using bpy as module and loading it in a GLib application (which
also has a g_atexit var).
The title says it all actually. Use BLI task to loop over vertices
and distort their locations. Gives 2x FPS increase in a file with
just time-dependent displace modifier on my desktop.
This version will give less spin locks and now well-tested by render engines.
This should reduce amount of threading overhead when having multiple objects
with displace modifier enabled.
In the future this will also help us threading the modifier.
There are more modifiers which could benefit from this, but let's first
investigate the new behavior with one of them.
We (the Microsoft C++ team) use the Blender project as part of our "Real world code" tests.
I noticed a place in WIN32 specific code (dvpapi.cpp:85) where a string literal is losing
its const-ness when being passed to BLI_dynlib_open(). This is not permitted when using the
/permissive- conformance compiler switch (see our blog
https://blogs.msdn.microsoft.com/vcblog/2016/11/16/permissive-switch/)
My suggested fix is to add const and propagate it where needed. Another possible fix would be
to explicitly cast away the const.
Reviewers: mont29, sergey, LazyDodo
Subscribers: Blendify, sergey, mont29, LazyDodo
Tags: #platform:_windows
Differential Revision: https://developer.blender.org/D2495
Instead of reference the vertex first and test the bitmap afterwards. Test the bitmap first and reference the vertex after.
In a mesh with 31146 vertices and the entire bitmap disabled, the loop time is 243% faster
With all bitmap enabled, the time becomes 463473% faster!!!
One possible reason for this huge difference in peformance is that maybe the compiler is not putting the function "BM_vert_at_index" inline (I dont know if buildbot do this, but it's good to investigate).
Looks like `object_map` and `mem_arena` may be NULL sometimes...
Also, cleaned up function pointers declaration of Nearest2dUserData,
those were warning out in gcc. Please, *always* use typdef defined
prototypes for function pointers, it is sooooo much cleaner and clearer
that way. And easy to convert from compatible functions too.
BKE_lamp_free was somehow missing the refactor of datablocks handling
(which, among other things, completely separated ID refcounting and
linking management from ID freeing itself).
Either forgot during development, or lost during merge...
The code looks for the closest element between its centers. In the case of islands, the center of each vertex is the center of the island.
The solution here is to skip the search for islands when the operation is translation