Nice article. Small suggestion: the synchronization behavior of the default stream has changed across CUDA versions since the article was written. For example, CUDA 7 introduced an optional per-thread default stream (enabled with the nvcc flag --default-stream per-thread), which does not implicitly synchronize with other streams. It would be useful to add a short recap of the behavior in each version.