Single precision exponent on 64bit linux tends to be order of magnitude slower
than double precision version even with single<->double precision conversion.
Some feedback in the mailing lists also suggests that logf() is also slow, but
this i didn't confirm here in the studio yet.
Depending on the shader setup it gives ~3% with the secret agent shot and up to
around 15% with the bmw scene here.
Quite straightforward change, the only annoying thing is that we can't use
indentation for include directive just because of the way headers inlineing
works for OpenCL.
Might do smarter job in path_source_replace_includes() but don't want to
spend time on this yet.
* On sm_30 and above there is no change (was not inlined already before), this just fixes a speed regression from yesterday. 6359c36ba4
* On sm_2x (tested with sm_21), I get a nice 8% speedup in the bmw scene with this. As a bonus, cubin compilation time and memory usage is significantly reduced. Regular cubin size went from 2.5MB to 2.0MB, Experimental one from 3.8MB to 2.5MB.
Currently only summed number of traversal steps and intersections used by the
camera ray intersection pass is implemented, but in the future we will support
more debug passes which would help checking what things makes the scene slow.
Example of such extra passes could be number of bounces, time spent on the
shader tree evaluation and so.
Implementation from the Cycles side is pretty much straightforward, could only
mention here that it's a build-time option disabled by default.
From the blender side it's implemented as a PASS_DEBUG with several subtypes
possible. This way we don't need to create an extra DNA pass type for each of
the debug passes, saving us a bits.
Reviewers: campbellbarton
Reviewed By: campbellbarton
Differential Revision: https://developer.blender.org/D813
Was hooked up last year for testing purposes, as we already had some code for it, but the closure itself is not really good nor really useful, so let's remove it.
This way there's much less cross-references between objects and meshes
device update functions.
The only thing remained s the object bounds calculation which is needed
by bvh update. This could also be decoupled, but it's not that crucial
yet because its's how it used to be for ages now.