This gives few percent extra memory saving for the CUDA kernel when
using regular path tracing.
Still more like an experiment, but will be handy in the future.
Gives few percent of memory improvement for regular feature set kernel
and could give significant memory improvement for Experimental kernel.
It could also give some degree of performance improvement, but this I
didn't really measure reliably yet.
Code is ifdef-ed for now, since it's only working on Linux and requires
CUDA toolkit to be installed (other platform only use precompiled
kernels).
This is just an experiment for now and a base for the proper feature
support in the future (with runtime compilation using CUDA 7?).
This just replaces internal argument `experimental` with `requested_features`
making it possible to access particular requested settings when building
kernels.
Currently only image loading benefits of this and will give magenta color
when image manager detects it's running out of memory.
This isn't ideal solution and can't handle all cases. For example, OOM
killer might kill process before it realized it run out of memory, but
in other cases this could prevent some crashes.
Reviewers: juicyfruit, dingto
Differential Revision: https://developer.blender.org/D1502
Previous fix didn't work well enough because on Windows Python has different
environment than Blender ans setting variables in there made no effect from
Blender point of view.
Currently only two mappings are supported by API, which is Repeat (old behavior)
and new Clip behavior. Internally this extension is being converted to periodic
flag which was already supported but wasn't exposed.
There's no support for OpenCL yet because of the way how we pack images into a
single texture.
Those settings are not exposed to UI or anywhere else and there should be no
functional changes so far.
This means render devices now might skip building baking kernels in cases when
only actual render-related functionality is used.
For now it's only implemented for OpenCL split kernel device and mainly needed
to work around some compiler-specific bugs which crashes on building the kernel.
Using OpenCL for baking might still crash the driver, but at least there is now
higher probability of that GPU will be usable to render the scene.
Real fix should actually be done in the driver side.
The idea is to make all kernels as small as possible to work around possible
issues with buggy drivers which might fail building feature-complete kernels.
It's indeed just a workaround to make at last simple test scenes to render
on OpenCL. Real fix should happen from the driver side.
Requires having latest El Capitan beta 3 OSX due to ome crucial fixes made in the
compiler. Supports same features as NVidia OpenCL apart from CMJ (there's no
experimental feature set support in megakernel yet).
Uses megakernel internally, which works much better than the split kernel. Split
kernel is not supported on OSX still, needs to be investigated still.
Some more details can be found there:
http://wiki.blender.org/index.php/Dev:2.6/Source/Render/Cycles/OpenCL#AMD_on_OSX
This is not really supported by OpenCL but might happen in certain
configurations. There might be some remained cases when this happens
but so far can not find any,
it is the same issue as described in the previous commit, original changes
in this area were wrong and only worked on a bugger optimus driver which
simply appeared to work by co-incident and in fact used wrong device..
It was annoying copy-paste happened across OpenCL device constructor, device
enumeration and split kernel checks. Now those areas are using an utility
function which returns pairs of platform and device IDs for devices which are
supported by Cycles and enumeration is happening inside that list.
This makes it so filtering is happening in a single place, so there's no need
to keep 3 different functions in sync.
This commit also fixes a bug with wrong enumeration of devices caused by recent
fixes. Those fixes were in fact wrong and only happened to appear to be working
on laptop with optimus card on Linux. Root of those issues is in fact in bad
Linux driver for optimus cards.
They'll be checked for the version later and that check will fail anyway,
so better to not allow user to see unsupported device in the list.
Also corrected one more issue with the device enumeration.
Skipped devices did not reflect in the device number, which might result in bad
array indices.
This might also resolve T45037, and need to be ported to a release branch.
Round-up was only enabled for viewport render, which was for a long time hardcoded to
use 64 closures. This was done in order to avoid unnecessary kernel re-compilations
when tweaking the shader tree.
We could enable selective closure compilation in the viewport later if it'll give
measurable speed improvements, but even then round-up is to happen outside of the
device level,
This commit also removes early output which happened in cases when max closure did
not change. It was wrong because other requested kernel features might have been
changed.
This features are now based on the scene settings, so scenes without those features
used are rendered even faster.
This gives about 30% speedup on the AMD A10 APU here, but at the same time it does
not mean such an improvement will happen on all the hardware. That being said, the
Tonga device here seems to have no measurable difference.
In any case it seems handy to have for the future, when we'll want to support SSS
in the kernel or to port selective compilation/split kernel to CUDA devices.