GPU: Elegantly implementing sleep on NVIDIA GPUs

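During testing, or in other situations, we may need the GPU to sleep for a while after it finishes some step. This can be done with clock64(), a function provided by CUDA's C programming interface. Here is the description of clock64() from the CUDA manual: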

when executed in device code, returns the value of a per-multiprocessor counter that is incremented every clock cycle. Sampling this counter at the beginning and at the end of a kernel, taking the difference of the two samples, and recording the result per thread provides a measure for each thread of the number of clock cycles taken by the device to completely execute the thread, but not of the number of clock cycles the device actually spent executing thread instructions. The former number is greater than the latter since threads are time sliced.

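clock64() returns the value of the clock-cycle counter of the SM on which the thread is running. Sampling it at the start and at the end of a thread and taking the difference gives the total number of clock cycles the thread took to execute. This is somewhat larger than the number of cycles the thread actually spent running, because the threads on an SM are time-sliced.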
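The sampling described above can be sketched as follows. This is a minimal illustration, not code from the original post: the kernel name timed_kernel, the placeholder loop, and the buffer names are all assumptions made for the example.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Each thread samples the SM cycle counter before and after its work
// and stores the difference. The loop is only a placeholder workload.
__global__ void timed_kernel(long long *cycles, float *out) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    long long start = clock64();
    float x = (float)i;
    for (int k = 0; k < 1000; ++k)
        x = x * 1.0001f + 0.5f;
    long long stop = clock64();
    out[i] = x;                 // store the result so the loop is not optimized away
    cycles[i] = stop - start;   // elapsed cycles as seen by this thread
}

int main() {
    const int n = 64;
    long long *d_cycles;
    float *d_out;
    cudaMalloc(&d_cycles, n * sizeof(long long));
    cudaMalloc(&d_out, n * sizeof(float));

    timed_kernel<<<1, n>>>(d_cycles, d_out);

    long long h_cycles[n];
    // cudaMemcpy on the default stream waits for the kernel to finish.
    cudaMemcpy(h_cycles, d_cycles, sizeof(h_cycles), cudaMemcpyDeviceToHost);
    printf("thread 0 took %lld cycles\n", h_cycles[0]);

    cudaFree(d_cycles);
    cudaFree(d_out);
    return 0;
}
```

The reported count includes any cycles during which the thread was waiting to be scheduled, which is the distinction the manual draws.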

So, to implement a delay function on the device elegantly, we call clock64() on the device. Its prototype is long long int clock64(). The implementation is as follows:

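#define CLOCK_RATE 1695000  /* SM clock in kHz; modify for different device */
__device__ void sleep(float t) {
    /* clock64() returns long long int; clock_t may be narrower on some platforms */
    long long t0 = clock64();
    long long t1 = t0;
    /* busy-wait until t seconds' worth of clock cycles have elapsed */
    while ((t1 - t0)/(CLOCK_RATE*1000.0f) < t)
        t1 = clock64();
}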

The value of CLOCK_RATE in the code above can be obtained as follows:

cudaDeviceProp prop;
cudaGetDeviceProperties(&prop, 0);   /* properties of device 0 */
int clock_rate = prop.clockRate;     /* clockRate is an int, in kHz */

The clock rate obtained here is in kilohertz, so to get a delay measured in seconds the sleep function has to use CLOCK_RATE*1000.0f.
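Instead of hard-coding the clock rate in a #define, it can be queried at run time and converted to a cycle count on the host. The following is a minimal sketch under that assumption. The names seconds_to_cycles, sleep_cycles and sleep_kernel are hypothetical, and the clock rate is read with cudaDeviceGetAttribute and cudaDevAttrClockRate, which reports the peak clock in kilohertz. Because the GPU may run below its peak clock, the delay should be treated as approximate.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Convert a delay in seconds to clock cycles; clock_khz is in kilohertz.
__host__ __device__ inline long long seconds_to_cycles(double seconds, int clock_khz) {
    return (long long)(seconds * clock_khz * 1000.0);
}

// Busy-wait on the device until the given number of cycles has elapsed.
__device__ void sleep_cycles(long long cycles) {
    long long t0 = clock64();
    while (clock64() - t0 < cycles) { }
}

__global__ void sleep_kernel(long long cycles) {
    sleep_cycles(cycles);
}

int main() {
    int device = 0, clock_khz = 0;
    cudaDeviceGetAttribute(&clock_khz, cudaDevAttrClockRate, device);

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    // Time a kernel that is asked to sleep for half a second.
    cudaEventRecord(start);
    sleep_kernel<<<1, 1>>>(seconds_to_cycles(0.5, clock_khz));
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    printf("requested 500 ms, measured %.1f ms\n", ms);

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    return 0;
}
```

Comparing integer cycle counts inside the loop avoids a floating-point division on every iteration, and the event timing gives a quick check of how close the actual delay is to the requested one on a given device.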

The complete code is available.
