This save a little memory and copying in the kernel by storing only a 4x3
matrix instead of a 4x4 matrix. We already did this in a few places, and
those don't need to be special exceptions anymore now.
This is in preparation of making Transform affine only, and also gives us
a little extra type safety so we don't accidentally treat it as a regular
4x4 matrix.
The purpose of the previous code refactoring is to make the code more readable,
but combined with this change benchmarks also render about 2-3% faster with an
NVIDIA Titan Xp.
This is an issue with which value to trust: fps vs. tbr. They both cam be
somewhat broken. Currently the idea is:
- If file was saved with FFmpeg AND we are decoding with FFmpeg we trust tbr.
- If we are decoding with Libav we use fps (there does not seem to be tbr in
Libav, unless i'm missing something).
- All other cases we use fps.
Seems to work all good for files from T53857, T54148 and T51153. Ideally we
would need to collect some amount of regression files to make further tweaks
more scientific.
Reviewers: mont29
Reviewed By: mont29
Differential Revision: https://developer.blender.org/D3083
Instead of creating a new instancing shading group without attrib, we now have instancing calls. The benefits is that they can be culled.
They can be used in conjuction with the standard and generate calls but shader must support it (which is generally not the case).
We store a pointer to the actual count so that the number can be tweaked between redraw.
This will makes multi layer rendering more efficient.
around the volume.
We generate a tight mesh around the active voxels of the volume in order
to effectively skip empty space, and start volume ray marching as close
to interesting volume data as possible. See code comments for details on
how the mesh generation algorithm works.
This gives up to 2x speedups in some scenes.
Reviewed by: brecht, dingto
Reviewers: #cycles
Subscribers: lvxejay, jtheninja, brecht
Differential Revision: https://developer.blender.org/D3038
This is more important now that we will have tigther volume bounds that
we hit multiple times. It also avoids some noise due to RR previously
affecting these surfaces, which shouldn't have been the case and should
eventually be fixed for transparent BSDFs as well.
For non-volume scenes I found no performance impact on NVIDIA or AMD.
For volume scenes the noise decrease and fixed artifacts are worth the
little extra render time, when there is any.
Now the only missing bit seems to be in Cycles to pass depsgraph to
builtin_image_float_pixels().
Ideally we could get evaluation context instead of using depsgraph + settings.
But for the other rna EvaluationContext functions this is how we are doing.
Reviewers: sergey, brecht
Differential Revision: https://developer.blender.org/D3087
This still does not make point density to work in Cycles, but at least it pass
the depsgraph down the line.
Note this was working fine before the depsgraph/render refactor to pass
evaluated depsgraph to the engines.
User notes
----------
Compositing, rendering of multi-layers in Eevee should be fully working now.
Development notes
-----------------
Up until now we were still using the same depsgraph for rendering and viewport
evaluation. And we had to go out of our ways to be sure the depsgraphs were
updated.
Now we iterate over the (to be rendered) view layers and create a depsgraph to
each one, fully evaluated and call the render engines (Cycles, Eevee, ...) with
this viewlayer/depsgraph/evaluation context.
At this time we are not handling data persistency, Depsgraph is created from
scratch prior to rendering each frame. So I got rid of most of the partial
update calls we had during the render pipeline.
Cycles: Brecht Van Lommel did a patch to tackle some of the required Cycles
changes but this commit mark these changes as TODOs. Basically Cycles needs to
render one layer at a time.
Reviewers: sergey, brecht
Differential Revision: https://developer.blender.org/D3073
Offscreen contexts are not attached to a window and can only be used for rendering to frambuffer objects.
CGL implementation : Brecht Van Lommel (brecht)
GLX implementation : Clément Foucault (fclem)
WGL implementation : Germano Cavalcante (mano-wii)
Other implementation are just place holder for now.
Turns out to be the call that was destroying performance.
I get 18ms->6ms improvement of drawing time with 10 000 unique objects.
And we can still improve upon this!