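I am trying to implement digit classification using the kNN algorithm.
# Pseudocode. Note: the digits are in the same order as the labels, i.e.
train = {digit0, digit0, digit1, digit2, ...}
label = {0, 0, 1, 2, ...}
labels.shape = (10000,)
train.shape = (784*1000)
I have a large dataset of 10000 digits from 0 to 9. Each digit is a 28×28 pixel image with a corresponding label, and the labels and digits are in the same order. I need to extract the digits 0 and 1 from the dataset and run kNN classification on those 28×28 images for k in {1,2,3,4,5}. I need help extracting these digits.
Any suggestions would be greatly appreciated.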
Answer:
For a numpy array, you can use a boolean mask built from the labels:
selected = train[ (label == 0) | (label == 1) ]
import numpy as np

train = np.array(['digit0-1', 'digit0-2', 'digit1-1', 'digit2-1'])
label = np.array([0, 0, 1, 2])

selected = train[ (label == 0) | (label == 1) ]
print(selected)
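The same mask works when train holds the actual pixel data. Here is a minimal sketch, assuming each row of train is one flattened 28×28 image, so the shape is (N, 784). The random arrays are stand-ins for the real dataset, and np.isin is just a shorter way to write the two comparisons.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in data (assumption): 20 flattened 28x28 images, one per row
train = rng.integers(0, 256, size=(20, 784))
label = np.arange(20) % 10          # labels 0..9, repeated twice

mask = np.isin(label, [0, 1])       # same as (label == 0) | (label == 1)

selected = train[mask]              # rows whose label is 0 or 1
selected_labels = label[mask]       # matching labels, in the same order

print(selected.shape)
print(selected_labels)
```

If your array is stored the other way round, with one image per column, apply the mask to the columns instead: train[:, mask].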
For a pandas DataFrame, it works similarly:
selected = train['item'][ (label['val'] == 0) | (label['val'] == 1) ]
import pandas as pd

train = pd.DataFrame({'item': ['digit0-1', 'digit0-2', 'digit1-1', 'digit2-1']})
label = pd.DataFrame({'val': [0, 0, 1, 2]})

selected = train['item'][ (label['val'] == 0) | (label['val'] == 1) ]
print(selected)
Or, if you keep all the data in a single DataFrame:
import pandas as pd

df = pd.DataFrame({
    'train': ['digit0-1', 'digit0-2', 'digit1-1', 'digit2-1'],
    'label': [0, 0, 1, 2]
})

selected = df['train'][ (df['label'] == 0) | (df['label'] == 1) ]
print(selected)
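With a single DataFrame you may also want to keep the labels next to the selected items, since kNN needs both. One possible variant uses Series.isin() for the mask and .loc to keep whole rows. The column names are the same illustrative ones as in the example above.

```python
import pandas as pd

df = pd.DataFrame({
    'train': ['digit0-1', 'digit0-2', 'digit1-1', 'digit2-1'],
    'label': [0, 0, 1, 2],
})

# isin() builds the same boolean mask as (label == 0) | (label == 1)
mask = df['label'].isin([0, 1])

# .loc keeps both columns, so items and labels stay aligned
subset = df.loc[mask]
print(subset)
```

This scales more comfortably than chaining | comparisons if you later want more than two digits.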
For a plain list, you can use zip() to pair each item with its label:
train = ['digit0-1', 'digit0-2', 'digit1-1', 'digit2-1']
label = [0, 0, 1, 2]

selected = []
for t, l in zip(train, label):
    if l in (0, 1):
        selected.append(t)

print(selected)
A list comprehension works as well:
train = ['digit0-1', 'digit0-2', 'digit1-1', 'digit2-1']
label = [0, 0, 1, 2]

selected = [t for t, l in zip(train, label) if l in (0, 1)]
print(selected)
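Once the 0s and 1s are extracted, the kNN step for k in {1,2,3,4,5} could look roughly like this. It is only a sketch under assumptions: knn_predict is a hypothetical brute-force helper (Euclidean distance, majority vote), not your implementation, and the data is synthetic stand-in data rather than real digit images.

```python
import numpy as np

def knn_predict(train_x, train_y, test_x, k):
    """Brute-force kNN with Euclidean distance and majority vote."""
    # Squared distances between every test row and every training row
    d = ((test_x[:, None, :] - train_x[None, :, :]) ** 2).sum(axis=2)
    # Indices of the k closest training rows for each test row
    nearest = np.argsort(d, axis=1)[:, :k]
    votes = train_y[nearest]                  # shape (n_test, k)
    # Majority vote; on a tie, argmax picks the smaller label
    return np.array([np.bincount(v).argmax() for v in votes])

rng = np.random.default_rng(0)

# Stand-in data (assumption): "0" images are dark, "1" images are bright
zeros = rng.normal(0.2, 0.05, size=(30, 784))
ones = rng.normal(0.8, 0.05, size=(30, 784))
x = np.vstack([zeros, ones])
y = np.array([0] * 30 + [1] * 30)

# Hold out every third sample for testing
test = np.arange(len(y)) % 3 == 0
accuracy = {}
for k in (1, 2, 3, 4, 5):
    pred = knn_predict(x[~test], y[~test], x[test], k)
    accuracy[k] = (pred == y[test]).mean()
    print(k, accuracy[k])
```

With real data you would pass the selected rows and their labels in place of x and y. Note that even values of k can produce ties with two classes, and this sketch resolves them in favour of the smaller label.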