Training an object detection model with TuriCreate on a GPU in Colab

I have spent several days trying to train an object detection model with TuriCreate on Google Colab using the GPU.

According to the TuriCreate repository, to use a GPU during training you have to follow these instructions:

https://github.com/apple/turicreate/blob/main/LinuxGPU.md

However, every time I start training, the shell prints the following before training begins:

"Using CPU to create model."

My Colab notebook is structured as follows:

Setting up the CUDA environment

!wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/cuda-ubuntu1804.pin
!sudo mv cuda-ubuntu1804.pin /etc/apt/preferences.d/cuda-repository-pin-600
!sudo apt-key adv --fetch-keys https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/7fa2af80.pub
!sudo add-apt-repository "deb https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/ /"
!sudo apt-get update
!wget http://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1804/x86_64/nvidia-machine-learning-repo-ubuntu1804_1.0.0-1_amd64.deb
!sudo apt install ./nvidia-machine-learning-repo-ubuntu1804_1.0.0-1_amd64.deb
!sudo apt-get update
!wget https://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1804/x86_64/libnvinfer7_7.1.3-1+cuda11.0_amd64.deb
!sudo apt install ./libnvinfer7_7.1.3-1+cuda11.0_amd64.deb
!sudo apt-get update

# Install development and runtime libraries (~4GB)
!sudo apt-get install --no-install-recommends \
    cuda-11-0 \
    libcudnn8=8.0.4.30-1+cuda11.0  \
    libcudnn8-dev=8.0.4.30-1+cuda11.0

# Install TensorRT. Requires that libcudnn8 is installed above.
!sudo apt-get install -y --no-install-recommends libnvinfer7=7.1.3-1+cuda11.0 \
    libnvinfer-dev=7.1.3-1+cuda11.0 \
    libnvinfer-plugin7=7.1.3-1+cuda11.0
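After the apt installs, a rough way to check from inside the notebook whether the CUDA runtime and cuDNN libraries can be located at all is Python's ctypes.util.find_library. This is a minimal sketch, not part of the original setup; the base names checked (cudart, cublas, cudnn) are an assumption about what a CUDA 11.0 + cuDNN 8 install provides, and find_library only approximates the dynamic loader's search (ldconfig cache, then compiler/linker lookups).

```python
import ctypes.util

# Base names of libraries a CUDA 11.0 + cuDNN 8 install is assumed to provide
# (assumption: adjust to whatever your framework actually loads).
CUDA_LIBS = ("cudart", "cublas", "cudnn")


def find_cuda_libs(names=CUDA_LIBS):
    """Map each library base name to the file name find_library reports, or None."""
    return {name: ctypes.util.find_library(name) for name in names}


if __name__ == "__main__":
    for name, path in find_cuda_libs().items():
        print(f"{name:8s} -> {path or 'NOT FOUND'}")
```

A NOT FOUND entry suggests the library is not registered where the loader looks, though a found entry does not by itself prove that TuriCreate or TensorFlow will pick it up.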

Checking the installation with nvidia-smi

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.57.02    Driver Version: 460.32.03    CUDA Version: 11.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla K80           Off  | 00000000:00:04.0 Off |                    0 |
| N/A   33C    P8    27W / 149W |      0MiB / 11441MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
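The header of the nvidia-smi table reports the driver version and the highest CUDA version that driver supports (11.2 here), which is what the CUDA 11.0 toolkit installed above has to fit under. A small sketch that pulls those two fields out of the header text with a regular expression; the helper names are hypothetical, and the comparison is a conservative major.minor check that ignores CUDA 11's minor-version compatibility rules.

```python
import re
from typing import Optional, Tuple

# Matches e.g. "Driver Version: 460.32.03    CUDA Version: 11.2"
_HEADER_RE = re.compile(r"Driver Version:\s*([\d.]+)\s+CUDA Version:\s*([\d.]+)")


def parse_smi_header(text: str) -> Optional[Tuple[str, str]]:
    """Return (driver_version, max_cuda_version) from nvidia-smi output, or None."""
    m = _HEADER_RE.search(text)
    return (m.group(1), m.group(2)) if m else None


def toolkit_supported(max_cuda: str, toolkit: str) -> bool:
    """True if the toolkit's major.minor is not newer than what the driver reports."""
    def as_tuple(v: str) -> Tuple[int, ...]:
        return tuple(int(p) for p in v.split(".")[:2])
    return as_tuple(toolkit) <= as_tuple(max_cuda)


header = "| NVIDIA-SMI 470.57.02    Driver Version: 460.32.03    CUDA Version: 11.2     |"
driver, max_cuda = parse_smi_header(header)
print(driver, max_cuda, toolkit_supported(max_cuda, "11.0"))  # 460.32.03 11.2 True
```

In a live notebook the header text would come from running nvidia-smi (for example via subprocess) instead of the hard-coded string used here.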

Installing dependencies

!pip install turicreate
!pip uninstall -y tensorflow
!pip install tensorflow-gpu

Setting the bash environment variable

!echo export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH >> ~/.bashrc
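Appending to ~/.bashrc only edits a file that interactive bash shells read when they start; it does not change the environment of the Python kernel that is already running TuriCreate. A small stdlib sketch to check what the kernel process itself has in LD_LIBRARY_PATH; the helper name is hypothetical and the directory is the one from the command above.

```python
import os
from typing import Mapping, Optional

CUDA_LIB_DIR = "/usr/local/cuda/lib64"  # directory appended to ~/.bashrc above


def on_ld_library_path(directory: str, env: Optional[Mapping[str, str]] = None) -> bool:
    """Check whether `directory` is one of the entries in LD_LIBRARY_PATH."""
    env = os.environ if env is None else env
    entries = [p.rstrip("/") for p in env.get("LD_LIBRARY_PATH", "").split(":") if p]
    return directory.rstrip("/") in entries


print("kernel sees", CUDA_LIB_DIR, "on LD_LIBRARY_PATH:", on_ld_library_path(CUDA_LIB_DIR))
```

Setting os.environ["LD_LIBRARY_PATH"] from inside the kernel would not help either, because the glibc dynamic loader reads that variable when the process starts.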

Training

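import turicreate as tc

# -1 asks TuriCreate to use all available GPUs
tc.config.set_num_gpus(-1)
# train_sf / valid_sf: SFrames with 'image' and 'annotations' columns, loaded earlier (not shown)
model = tc.object_detector.create(train_sf)
scores = model.evaluate(valid_sf)
print(scores['mean_average_precision'])
model.export_coreml('model.mlmodel')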

This is the output:

TuriCreate currently only supports using one GPU. Setting 'num_gpus' to 1.
Using 'image' as feature column
Using 'annotations' as annotations column
Using CPU to create model.
Setting 'batch_size' to 32

I can't figure out what I'm missing.

Answer:

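I managed to solve it: the problem was caused by the TensorFlow version preinstalled on the Colab machine. Replacing it with TensorFlow 2.4.0 fixed it: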

!pip uninstall -y tensorflow
!pip uninstall -y tensorflow-gpu
!pip install turicreate
!pip install tensorflow==2.4.0
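Before calling tc.object_detector.create again, it can help to confirm which TensorFlow distribution is actually installed. A minimal stdlib sketch using importlib.metadata (Python 3.8+); the exact 2.4.0 check mirrors the pin in the answer, and the helper names are hypothetical. Since TensorFlow 2.1 the plain tensorflow package includes GPU support, which is consistent with the answer dropping tensorflow-gpu. If TensorFlow was already imported in the session, the runtime must be restarted for the new version to load.

```python
from importlib import metadata
from typing import Dict, Optional

PINNED = "2.4.0"  # version pinned in the answer above


def installed_version(dist: str) -> Optional[str]:
    """Return the installed version of a pip distribution, or None if absent."""
    try:
        return metadata.version(dist)
    except metadata.PackageNotFoundError:
        return None


def check_tensorflow(versions: Dict[str, Optional[str]], pinned: str = PINNED) -> bool:
    """True if 'tensorflow' is at the pinned version and no 'tensorflow-gpu' is left over."""
    return versions.get("tensorflow") == pinned and versions.get("tensorflow-gpu") is None


if __name__ == "__main__":
    found = {d: installed_version(d) for d in ("tensorflow", "tensorflow-gpu", "turicreate")}
    print(found, "OK" if check_tensorflow(found) else "mismatch")
```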
