共计 3594 个字符,预计需要花费 9 分钟才能阅读完成。
本文首发于个人博客 https://kezunlin.me/post/8d877e63/,欢迎阅读!
cpp caffe net run in multiple threads
Guide
set_mode
Caffe fails to use GPU in a new thread???
see here
the `Caffe::mode_` variable that controls this is thread-local,
so ensure you’re calling `caffe.set_mode_gpu()` in each thread
before running any Caffe functions. That should solve your issue.
Caffe set_mode GPU 在多线程下失效
在 main thread 中设置 GPU 模式,在 worker thread 中调用网络进行检测,GPU 模式不起效,默认仍然使用 CPU 模式,所以速度很慢,和 GPU 相比慢了
10 倍左右。解决方案:在子线程中 set_mode, 然后调用网络进行检测。(1) 创建网络在 main thread。static 网络存储在全局静态数据区。worker thread 可以直接使用。(2) 在 worker thread 中检测,需要在子线程中 set_mode, 然后调用网络进行检测。结论:(1)caffe 的 set_mode 所在的线程必须和使用 nets 进行 forward 的线程相同。否则默认使用 CPU 模式,速度会很慢。(2)caffe 的 nets 初始化可以在 main thread 也可以在 worker thread。
code example
#include <iostream>
#include <string>
#include <thread>
#include <gtest/gtest.h>
#include <glog/logging.h>
#include <boost/date_time/posix_time/posix_time.hpp>
// opencv
#include <opencv2/core.hpp>
#include <opencv2/highgui.hpp>
#include <opencv2/imgproc.hpp>
using namespace std;
#include "algorithm/algorithm.h"
using namespace kezunlin::algorithm;
#pragma region net-demo
void topwire_demo(bool run_in_worker_thread)
{if (run_in_worker_thread) {CaffeApi::set_mode(true, 0, 1234);// set in worker thread-1, use GPU-0
}
// do net detect
// ...
}
void railway_demo(bool run_in_worker_thread)
{if (run_in_worker_thread) {CaffeApi::set_mode(true, 0, 1234);// set in worker thread-1, use GPU-0
}
// do net detect
// ...
}
void sidewall_demo(bool run_in_worker_thread)
{if (run_in_worker_thread) {CaffeApi::set_mode(true, 0, 1234);// set in worker thread-1, use GPU-0
}
// do net detect
// ...
}
void lockcatch_demo(bool run_in_worker_thread)
{if (run_in_worker_thread) {CaffeApi::set_mode(true, 0, 1234);// set in worker thread-1, use GPU-0
}
// do net detect
// ...
}
#pragma endregion
#pragma region worker-thread-demo
void worker_thread_topwire_demo(bool run_in_worker_thread)
{std::thread thr(topwire_demo, run_in_worker_thread);
thr.join();}
void worker_thread_railway_demo(bool run_in_worker_thread)
{std::thread thr(railway_demo, run_in_worker_thread);
thr.join();}
void worker_thread_sidewall_demo(bool run_in_worker_thread)
{std::thread thr(sidewall_demo, run_in_worker_thread);
thr.join();}
void worker_thread_lockcatch_demo(bool run_in_worker_thread)
{std::thread thr(lockcatch_demo, run_in_worker_thread);
thr.join();}
#pragma endregion
enum DETECT_TYPE {
SET_IN_MAIN_DETECT_IN_MAIN, // 主线程 set_mode,主线程检测,40ms 左右,使用 GPU
SET_IN_WORKER_DETECT_IN_WORKER, // 子线程 set_mode,子线程检测,40ms 左右,使用 GPU
SET_IN_MAIN_DETECT_IN_WORKER // 主线程 set_mode,子线程检测,400ms 左右,慢了 10 倍左右,没有使用 GPU
};
void thread_demo()
{
DETECT_TYPE detect_type = SET_IN_MAIN_DETECT_IN_MAIN;
detect_type = SET_IN_WORKER_DETECT_IN_WORKER;
detect_type = SET_IN_MAIN_DETECT_IN_WORKER;
init_algorithm_api();
switch (detect_type)
{
case SET_IN_MAIN_DETECT_IN_MAIN:
topwire_demo(false);
railway_demo(false);
sidewall_demo(false);
lockcatch_demo(false);
break;
case SET_IN_WORKER_DETECT_IN_WORKER:
worker_thread_topwire_demo(true);
worker_thread_railway_demo(true);
worker_thread_sidewall_demo(true);
worker_thread_lockcatch_demo(true);
break;
case SET_IN_MAIN_DETECT_IN_WORKER:
worker_thread_topwire_demo(false);
worker_thread_railway_demo(false);
worker_thread_sidewall_demo(false);
worker_thread_lockcatch_demo(false);
break;
default:
break;
}
free_algorithm_api();}
void test_algorithm_api()
{thread_demo();
}
TEST(algorithn_test, test_algorithm_api) {test_algorithm_api();
}
- SET_IN_MAIN_DETECT_IN_MAIN, // 主线程 set_mode,主线程检测,40ms 左右,使用 GPU
- SET_IN_WORKER_DETECT_IN_WORKER, // 子线程 set_mode,子线程检测,40ms 左右,使用 GPU
- SET_IN_MAIN_DETECT_IN_WORKER // 主线程 set_mode,子线程检测,400ms 左右,慢了 10 倍左右,没有使用 GPU
Reference
- Caffe fails to use GPU in a new thread
History
- 20180712: created.
Copyright
- Post author: kezunlin
- Post link: https://kezunlin.me/post/8d877e63/
- Copyright Notice: All articles in this blog are licensed under CC BY-NC-SA 3.0 unless stating additionally.
正文完