How to prepare training data for image classification

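I am new to machine learning and have run into some problems with image classification. I am trying to tell cats from dogs using a simple classification technique, K-nearest neighbors.

My code so far is: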

import os
import cv2
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
%matplotlib inline

DATADIR = "/Users/me/Desktop/ds2/ML_image_classification/kagglecatsanddogs_3367a/PetImages"
CATEGORIES = ['Dog', 'Cat']
IMG_SIZE = 30

data = []
categories = []

for category in CATEGORIES:
    path = os.path.join(DATADIR, category)
    categ_id = CATEGORIES.index(category)
    for img in os.listdir(path):
        try:
            img_array = cv2.imread(os.path.join(path, img), 0)
            new_array = cv2.resize(img_array, (IMG_SIZE, IMG_SIZE))
            data.append(new_array)
            categories.append(categ_id)
        except Exception as e:
            # print(e)
            pass

print(data[0])

s1 = pd.Series(data)
s2 = pd.Series(categories)
frame = {'Img array': s1, 'category': s2}
df = pd.DataFrame(frame)

from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3)
knn = KNeighborsClassifier()
knn.fit(X_train, y_train)

When I try to fit the data, I get this error:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-76-9d98d7b11202> in <module>
      2 from sklearn.neighbors import KNeighborsClassifier
      3 
----> 4 X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3)
      5 
      6 print(X_train)

~/opt/anaconda3/lib/python3.7/site-packages/sklearn/model_selection/_split.py in train_test_split(*arrays, **options)
   2094         raise TypeError("Invalid parameters passed: %s" % str(options))
   2095 
-> 2096     arrays = indexable(*arrays)
   2097 
   2098     n_samples = _num_samples(arrays[0])

~/opt/anaconda3/lib/python3.7/site-packages/sklearn/utils/validation.py in indexable(*iterables)
    228         else:
    229             result.append(np.array(X))
--> 230     check_consistent_length(*result)
    231     return result
    232 

~/opt/anaconda3/lib/python3.7/site-packages/sklearn/utils/validation.py in check_consistent_length(*arrays)
    203     if len(uniques) > 1:
    204         raise ValueError("Found input variables with inconsistent numbers of"
--> 205                          " samples: %r" % [int(l) for l in lengths])
    206 
    207 

ValueError: Found input variables with inconsistent numbers of samples: [24946, 22451400]

How do I prepare the training data correctly? By the way, I don't want to use deep learning yet. That will be my next step.

Any help would be appreciated.

Answer:

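If you are not using deep learning for image classification, you have to prepare the data in a form suitable for supervised classification.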
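For scikit-learn, that form is a 2-D feature array with one row per image, plus a label array with one entry per row. The numbers in the error message are consistent with the images having been flattened into a single long vector: 24946 images x 30 x 30 pixels = 22451400. Below is a minimal sketch of the difference, using random stand-in arrays rather than the real images (the count of 5 images and the labels are arbitrary assumptions):

```python
import numpy as np

IMG_SIZE = 30   # same size as in the question
n_images = 5    # hypothetical small count, for illustration only

rng = np.random.default_rng(0)
# Stand-in for the list of resized grayscale images built in the question's loop
data = [rng.integers(0, 256, size=(IMG_SIZE, IMG_SIZE)) for _ in range(n_images)]
categories = [0, 1, 0, 1, 0]  # made-up labels, one per image

# Flattening everything gives one long vector: n_images * 900 "samples"
wrong = np.array(data).flatten()

# Reshaping keeps one row per image: (n_images, 900)
X = np.array(data).reshape(len(data), -1)
y = np.array(categories)

print(wrong.shape)  # (4500,)
print(X.shape)      # (5, 900)
print(y.shape)      # (5,)
```

With X shaped this way, X and y have the same number of samples, which is what train_test_split checks for.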

Steps

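1) Resize all images to the same size. You can loop over the images, resize each one, and save it.

2) Get the pixel vector of each image and build a dataset. For example, if your cat images are in a "Cat" folder and your dog images are in a "Dog" folder, iterate over all the images in each folder and read their pixel values, keeping the rows for each folder in its own data frame (catdf and dogdf). At the same time, label the data as "cat" (cat=1) and "non-cat" (non-cat=0).

3) Concatenate catdf and dogdf, then shuffle the data frame: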

data = pd.concat([catdf, dogdf])
data = data.sample(frac=1)

Now you have a dataset with image labels.

4) Split the dataset into a training set and a test set, and fit the model.
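The steps above can be sketched end to end as follows. This is only an illustration under stated assumptions: fake_images and to_frame are hypothetical helpers, and the images are random arrays standing in for the resized grayscale images that cv2.imread and cv2.resize produce in the question. The column name "label" is also an assumption.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

IMG_SIZE = 30
rng = np.random.default_rng(0)

def fake_images(n, low, high):
    """Hypothetical stand-in for loading and resizing n grayscale images."""
    return rng.integers(low, high, size=(n, IMG_SIZE, IMG_SIZE))

def to_frame(images, label):
    """Flatten each image into one row of pixel columns and attach its label."""
    df = pd.DataFrame(images.reshape(len(images), -1))
    df["label"] = label
    return df

# Steps 1-2: same-size images, one pixel vector per row, cat=1 and non-cat=0
catdf = to_frame(fake_images(40, 0, 128), 1)
dogdf = to_frame(fake_images(40, 128, 256), 0)

# Step 3: concatenate and shuffle
data = pd.concat([catdf, dogdf], ignore_index=True)
data = data.sample(frac=1, random_state=0)

# Step 4: separate features from labels, split, and fit
X = data.drop(columns="label").to_numpy()
y = data["label"].to_numpy()

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)
knn = KNeighborsClassifier()
knn.fit(X_train, y_train)
accuracy = knn.score(X_test, y_test)

print(X_train.shape, X_test.shape)  # (56, 900) (24, 900)
```

The synthetic "cats" are dark and the "dogs" are bright, so they are trivially separable; the score on this data says nothing about how KNN on raw pixels will do on real photos.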
