This is only a (hacky) partial fix, actually, since `RNA_property_animated()` will still
not work in those cases... Better that than nothing, though.
Thanks to Campbell for review.
Pretty much straightforward change which gives around 30%
speedup on my laptop and around 2x speedup on desktop in
the BI (which uses gts580). Tested with huge blurs (like
10% of blur) which was rather common during Caminandes.
For now OpenCL is only limited for blur size more than
100 pixels.
This is a bit experimental still, feedback is welcome.
Reviewers: jbakker, lukastoenne
Subscribers: ton
Differential Revision: https://developer.blender.org/D576
Ray actually should have infinite length, so we can detect camera in a volume
which is bigger that the far clipping of the camera.
This might also give some speedup (wouldn't expect much tho) because we don't
need to re-calculate ray direction and length after every bounce now.
basically we skip all non-volume objects now in the volume stack function.
Depending on the show it might give some percent of speedup.
Most of the speedup would be gained in the scenes when having SSS object
intersecting the volume and taking a reasonable amount of frame space.
Single precision exponent on 64bit linux tends to be order of magnitude slower
than double precision version even with single<->double precision conversion.
Some feedback in the mailing lists also suggests that logf() is also slow, but
this i didn't confirm here in the studio yet.
Depending on the shader setup it gives ~3% with the secret agent shot and up to
around 15% with the bmw scene here.
Quite straightforward change, the only annoying thing is that we can't use
indentation for include directive just because of the way headers inlineing
works for OpenCL.
Might do smarter job in path_source_replace_includes() but don't want to
spend time on this yet.
* On sm_30 and above there is no change (was not inlined already before), this just fixes a speed regression from yesterday. 6359c36ba4
* On sm_2x (tested with sm_21), I get a nice 8% speedup in the bmw scene with this. As a bonus, cubin compilation time and memory usage is significantly reduced. Regular cubin size went from 2.5MB to 2.0MB, Experimental one from 3.8MB to 2.5MB.
Currently only summed number of traversal steps and intersections used by the
camera ray intersection pass is implemented, but in the future we will support
more debug passes which would help checking what things makes the scene slow.
Example of such extra passes could be number of bounces, time spent on the
shader tree evaluation and so.
Implementation from the Cycles side is pretty much straightforward, could only
mention here that it's a build-time option disabled by default.
From the blender side it's implemented as a PASS_DEBUG with several subtypes
possible. This way we don't need to create an extra DNA pass type for each of
the debug passes, saving us a bits.
Reviewers: campbellbarton
Reviewed By: campbellbarton
Differential Revision: https://developer.blender.org/D813