There appears to be a compiler bug in CUDA toolkit 4.2; un-inlining a few functions seems to avoid it.
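
For illustration only, a minimal sketch of what un-inlining a helper can look like. The names `scale_and_offset`, `apply`, `DEVICE_FN`, and `NOINLINE` are hypothetical and not taken from the affected code. The `__noinline__` qualifier is a request to the compiler, not a guarantee, so whether it sidesteps a given miscompilation has to be checked case by case.

```cpp
// Hypothetical example: the function and macro names are illustrative only.
// The macros let the same file build with nvcc or with a plain C++ compiler.
#ifdef __CUDACC__
#define DEVICE_FN __host__ __device__
#define NOINLINE  __noinline__   // ask nvcc not to inline this function
#else
#define DEVICE_FN
#define NOINLINE
#endif

// A small helper that the compiler would normally inline into its callers.
// Marking it NOINLINE requests a real function call instead.
DEVICE_FN NOINLINE float scale_and_offset(float x, float a, float b) {
    return a * x + b;
}

#ifdef __CUDACC__
// Kernel that calls the un-inlined helper once per element.
__global__ void apply(float* data, int n, float a, float b) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        data[i] = scale_and_offset(data[i], a, b);
    }
}
#endif
```

Routing the qualifier through a macro keeps the workaround easy to revert once a fixed toolkit is in use.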