Commit Graph

66651 Commits

Author SHA1 Message Date
Sergey Sharybin
da31a82832 Cycles: Solve speed regression by casting opaque ray first 2017-02-08 14:00:48 +01:00
Sergey Sharybin
04cf1538b5 Cycles: Fix compilation error on OpenCL 2017-02-08 14:00:48 +01:00
Sergey Sharybin
31a025f51e Cycles: Split shadow functions to avoid some duplicated calculations 2017-02-08 14:00:48 +01:00
Sergey Sharybin
dde40989f3 Cycles: Store shadow intersections in the kernel globals
Seems CUDA failed to de-duplicate the array across multiple inlined
versions of the shadow_blocked(). Helped it a bit with that now.

Gives about 100MB memory improvement on a scenes after previous
commit and brings up memory "regression" to only 100MB comparing to
the master branch now.
2017-02-08 14:00:48 +01:00
Sergey Sharybin
7447950bc3 Cycles: Speedup transparent shadows on CUDA
This commit enables record-all behavior of transparent shadows
rays.

Render times difference goes as following:

               GTX 1080 render time
BMW                  -0.5%
Fishy Cat            -0.0%
Pabellon Barcelona   -11.6%
Classroom            +1.2%
Koro                 -58.6%

Kernel will now use some extra VRAM memory to store the intersection
array (200MB on my configuration). This we can optimize out with some
further commits.
2017-02-08 14:00:48 +01:00
Sergey Sharybin
9830eeb44b Cycles: Implement record-all transparent shadow function for GPU
The idea is to record all possible transparent intersections when
shooting transparent ray on GPU (similar to what we were doing  on
CPU already).

This avoids need of doing whole ray-to-scene intersections queries
for each intersection and speeds up a lot cases like transparent
hair in the cost of extra memory.

This commit is a base ground for now and this feature is kept
disabled for until some further tweaks.
2017-02-08 14:00:48 +01:00
Sergey Sharybin
9c3d202e56 Cycles: Use an utility function to sort intersections array 2017-02-08 14:00:48 +01:00
Sergey Sharybin
58a10122d0 Cycles: Make GPU version of shadow_blocked() closer to CPU
Now we break the traversal cycle and then perform volume attenuation
and check with zero throughput. Not sure it makes any measurable sense
at this moment, but in the future it might help de-duplicating some
extra logic here.
2017-02-08 14:00:48 +01:00
Sergey Sharybin
98a1855803 Cycles: De-duplicate transparent shadows attenuation
Fair amount of code was duplicated for CPU and GPU, now we are
using inlined function to avoid such duplication.
2017-02-08 14:00:48 +01:00
Sybren A. Stüvel
8cda364d6f Fix T49249: Alembic export with multiple hair systems crash blender
Removed unnecessary call to DM_update_tessface_data(). This call is
already performed by DM_ensure_tessface(dm). The call being performed
twice caused a failing BLI_assert().

Reviewed by: Kévin Dietrich
2017-02-08 12:26:36 +01:00
Sybren A. Stüvel
ac38d5652b Alembic export: avoid infinite loops trying to find parent objects.
Also added some assertions for debugging purposes

Reviewed by: Kévin Dietrich
2017-02-08 12:26:36 +01:00
Sybren A. Stüvel
95e7f93fa2 Alembic export: only create transform writer if the object should be exported
Reviewed by: Kévin Dietrich
2017-02-08 12:26:36 +01:00
Sybren A. Stüvel
b320873382 Alembic: #undef'ed the correct macro
TEST_RET is not defined anywhere in Blender's sources, and LAYER_CMP
is no longer used after this function ends.
2017-02-08 12:26:36 +01:00
Sybren A. Stüvel
ce9df09067 Alembic: Use getXForm() in check, because it's used in rest of the function too
This makes the code within the function consistent.
2017-02-08 12:26:36 +01:00
Sybren A. Stüvel
82df7100c8 Alembic: Renamed copy_zup_yup to copy_yup_from_zup (and same for zup_from_yup)
With the new names the arguments (yup, zup) are in the same order as
they appear in the function name. The old names used copy_src_dst(dst,
src), which I found very confusing. Furthermore, now it is clear from
where to where the copy is made.

This makes the function names a little bit longer, though. If that is
a real issue, we can just name them zup_from_yup(zup, yup).

Reviewed by: Kévin Dietrich
2017-02-08 12:26:36 +01:00
Sergey Sharybin
69dbeeca48 Cleanup: Use const qualifier in some of color management code 2017-02-07 17:49:54 +01:00
Sergey Sharybin
b641d016e1 Sequencer: Some extra speedup in color space conversion
Use the new utility from coloranagement which multi-threads byte to
float conversion.

Gives extra 10% speedup from quick tests.
2017-02-07 17:49:54 +01:00
Sergey Sharybin
ce629c5dd9 Color management: Add utility function to convert byte to float with processor applied 2017-02-07 17:49:54 +01:00
Sergey Sharybin
e5bb005369 Sequencer: Speedup conversion to sequencer space
Speedup is mainly gained by multi-threading. Gives about 3x
fps gain on an edit shot file.

There is still some room for improvements, will happen in one
of the upcoming commits.
2017-02-07 17:49:54 +01:00
Sergey Sharybin
5d6177111d Color management: Implement threaded byte buffer conversion
The title says it all actually: now we can convert byte buffer
directly, without need of temporary float buffer.
2017-02-07 17:49:54 +01:00
Germano Cavalcante
03be3102c7 Param is_cached not being used in bvhtree_from_mesh_edges_setup_data
This could cause bugs in the memory release
2017-02-07 11:03:10 -03:00
Sergey Sharybin
03544eccb4 Fix missing hair after rendering with different viewport/render settings
Derived mesh for particles did not include tessellated faces when it
was expected to. Now added explicit function to copy CDDM with tess
faces without need to re-tessellate the result.
2017-02-07 14:21:29 +01:00
Sergey Sharybin
53896d4235 Fix T49253: Cycles blackbody is wrong on AVX2 CPU on Windows
Seems to be bug in optimizer, but managed to reshuffle in a way
which should also give some speedup.
2017-02-07 13:05:19 +01:00
Bastien Montagne
1158800d1b PIL_time_utildefines: also show total time in TIMEIT_AVERAGED. 2017-02-07 10:14:46 +01:00
Bastien Montagne
f7eaaf35b4 Fix (unreported) Object previews being written even for skipped objects. 2017-02-06 20:58:18 +01:00
Bastien Montagne
e217839fd3 Cleanup writefile code a bit.
Modernize some of it a bit, saves quite some lines of blabla (using
shile instead of for loops... tsssts...).
2017-02-06 20:43:14 +01:00
Jorge Bernal
dbdc346e9f CMake: Remove MOTO library dependency when it is not needed
It is not necessary to add MOTO library dependency when we use
WITH_IK_SOLVER (now it uses Eigen) or we use WITH_MOD_BOOLEAN (it was
used by bsp intern library some time ago but it is not present in the
code anymore).

Reviewers: mont29, sergey

Subscribers: mont29, sergey

Differential Revision: https://developer.blender.org/D2477
2017-02-06 19:29:42 +01:00
Germano Cavalcante
0170c682fe Specify the correct size of the BVHTree of edges
~edge_num~ edges_num_active
Not always all the edges enter in the build
2017-02-06 14:59:31 -03:00
Germano Cavalcante
e3f99329d8 Standardization and style for BKE_bvhutils
Add `bvhtree_from_mesh_edges_ex` and callbacks to nearest_to_ray (Similar to the other functions of this code)
2017-02-06 14:11:06 -03:00
Bastien Montagne
ac8348d033 Fix 'public' global 'g_atexit' var in Blender.
No reason to not make this private to this file, and it gave conflict
when using bpy as module and loading it in a GLib application (which
also has a g_atexit var).
2017-02-06 17:42:30 +01:00
Sergey Sharybin
9e97b00873 Fix compilation error after recent change 2017-02-06 15:29:13 +01:00
Sergey Sharybin
c7f40caa2c Add shortcuts for unsigned int, short, long and char
Feel free to use those in the new code.

And stay away from simple "unsigned".
2017-02-06 15:04:13 +01:00
Sergey Sharybin
c5cc9e046d Use hash instead of linear lookup in armature deform
This avoids calling linear lookup 100s of time when dealing with
real-life character.

Still some tweaks possible.
2017-02-06 14:47:36 +01:00
Sergey Sharybin
d0015cba02 Multi-thread displace modifier
The title says it all actually. Use BLI task to loop over vertices
and distort their locations. Gives 2x FPS increase in a file with
just time-dependent displace modifier on my desktop.
2017-02-06 14:21:29 +01:00
Sergey Sharybin
89f3837d68 Displace modifier: Use special version of texture sampling
This version will give less spin locks and now well-tested by render engines.

This should reduce amount of threading overhead when having multiple objects
with displace modifier enabled.

In the future this will also help us threading the modifier.

There are more modifiers which could benefit from this, but let's first
investigate the new behavior with one of them.
2017-02-06 12:37:08 +01:00
Sergey Sharybin
385fe4f0ce Add special texture sampling function which takes image pool argument
Using image pool will reduce number of thread locks when acquiring image.
Useful when it's needed to sample texture fewzillion times a second.
2017-02-06 12:23:03 +01:00
Sergey Sharybin
223aff987a Fix memory leak when building without audaspace 2017-02-06 11:18:20 +01:00
Phil Christensen
351c409317 C++ conformance fixes (MSVC /permissive-)
We (the Microsoft C++ team) use the Blender project as part of our "Real world code" tests.
I noticed a place in WIN32 specific code (dvpapi.cpp:85) where a string literal is losing
its const-ness when being passed to BLI_dynlib_open().  This is not permitted when using the
/permissive- conformance compiler switch (see our blog
https://blogs.msdn.microsoft.com/vcblog/2016/11/16/permissive-switch/)

My suggested fix is to add const and propagate it where needed.  Another possible fix would be
to explicitly cast away the const.

Reviewers: mont29, sergey, LazyDodo

Subscribers: Blendify, sergey, mont29, LazyDodo

Tags: #platform:_windows

Differential Revision: https://developer.blender.org/D2495
2017-02-06 10:44:56 +01:00
Germano Cavalcante
22156d951d fix T50602: Avoid crash when executing transform_snap_context_project_view3d_mixed with dist_px NULL 2017-02-06 01:01:39 -03:00
Germano Cavalcante
da08aa4b96 Cleaning of the last commit: lack of attention with the debug of time X(
This was a stupid mistake
2017-02-04 19:06:41 -03:00
Germano Cavalcante
75aa866211 Optimize BVHTree creation of vertices that have BLI_bitmap test
Instead of reference the vertex first and test the bitmap afterwards. Test the bitmap first and reference the vertex after.

In a mesh with 31146 vertices and the entire bitmap disabled, the loop time is 243% faster
With all bitmap enabled, the time becomes 463473% faster!!!

One possible reason for this huge difference in peformance is that maybe the compiler is not putting the function "BM_vert_at_index" inline (I dont know if buildbot do this, but it's good to investigate).
2017-02-04 19:01:29 -03:00
Germano Cavalcante
47caf343c0 fix T50592: Scene.raycast not working
Ray_start and ray_normal values were being ignored
2017-02-04 18:17:15 -03:00
Bastien Montagne
a2c469edc2 Fix (unreported) crash in new snap code.
Looks like `object_map` and `mem_arena` may be NULL sometimes...

Also, cleaned up function pointers declaration of Nearest2dUserData,
those were warning out in gcc. Please, *always* use typdef defined
prototypes for function pointers, it is sooooo much cleaner and clearer
that way. And easy to convert from compatible functions too.
2017-02-04 21:51:27 +01:00
Bastien Montagne
6663099810 Fix T50590: BI lamp doesn't hold a texture in this case.
BKE_lamp_free was somehow missing the refactor of datablocks handling
(which, among other things, completely separated ID refcounting and
linking management from ID freeing itself).

Either forgot during development, or lost during merge...
2017-02-04 21:31:52 +01:00
Germano Cavalcante
c367e23d46 Snap System: Use callbaks to differentiate how referenced vertives of DerivedMeshs and Bmeshs
Before it was informed the type of object in the `userdata`, and a same function ran between the types to obtain the coordinates of the vertices
2017-02-03 20:08:57 -03:00
Germano Cavalcante
a0561a05ef Remove flag: SNAP_OBJECT_USE_CACHE from snap_context
Since the cache is created in one way or another, this flag is not really making a difference
More details here: D2496
2017-02-03 19:03:31 -03:00
Germano Cavalcante
21f3767809 fix T46892: snap to closest point now works with Individual Origins
The code looks for the closest element between its centers. In the case of islands, the center of each vertex is the center of the island.
The solution here is to skip the search for islands when the operation is translation
2017-02-03 13:15:44 -03:00
Germano Cavalcante
0b4a9caf51 Forgotten in committee ddf99214dc
In obect mode, the rotation matrix need to be restored to the initial value if a snap point is not found
2017-02-03 12:57:02 -03:00
Sergey Sharybin
0e459ad1a3 Buildbot: Re-enable cuda support for OSX 2017-02-03 16:11:05 +01:00
Bastien Montagne
52696a0d3f Fix T50125: Shortcut keys missing in menus for Clear Location, Rotation, and Scale.
Menu entries and shortcuts did not have exact same behavior, now they do
(using shortcuts' behavior).
2017-02-03 16:10:00 +01:00