StratifiedShuffleSplit ValueError: The least populated class in y has only 1 member, which is too few

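I am trying to get my stratified shuffle split to work. I have two sets of data, features and labels, and I want to return a list called results that should contain lists of all the accuracy/precision/recall/F1 scores.

However, I think I'm just confused and lost about how to return the results. Can anyone spot what I'm doing wrong here?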

```python
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.ensemble import AdaBoostClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import StratifiedShuffleSplit, cross_validate
from sklearn.metrics import make_scorer, accuracy_score, precision_score, recall_score, f1_score, confusion_matrix

# One stratified split: 67% train, 33% test
sss = StratifiedShuffleSplit(n_splits=1, random_state=42, test_size=0.33)
clf_obj = RandomForestClassifier(n_estimators=10)

# Several metrics at once; cross_validate reports each under "test_<name>"
scoring = {'accuracy': make_scorer(accuracy_score),
           'precision': make_scorer(precision_score),
           'recall': make_scorer(recall_score),
           'f1_score': make_scorer(f1_score)}

results = cross_validate(estimator=clf_obj,
                         X=features,
                         y=labels,
                         cv=sss,
                         scoring=scoring)
```

What confuses me is that I get this error:

ValueError: The least populated class in y has only 1 member, which is too few. The minimum number of groups for any class cannot be less than 2.

But I don't understand what is happening with my X and y values. The first error I can see seems to be related to the scoring parameter:

---> 29 scoring=scoring)

… but as far as I can tell, I have filled in the arguments to cross_validate() correctly?

Full traceback:

```
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-27-2af4c433ccc9> in <module>
     27                             y=labels,
     28                             cv=sss,
---> 29                             scoring=scoring)

/anaconda3/lib/python3.7/site-packages/sklearn/model_selection/_validation.py in cross_validate(estimator, X, y, groups, scoring, cv, n_jobs, verbose, fit_params, pre_dispatch, return_train_score, return_estimator, error_score)
    238             return_times=True, return_estimator=return_estimator,
    239             error_score=error_score)
--> 240         for train, test in cv.split(X, y, groups))
    241 
    242     zipped_scores = list(zip(*scores))

/anaconda3/lib/python3.7/site-packages/sklearn/externals/joblib/parallel.py in __call__(self, iterable)
    915             # remaining jobs.
    916             self._iterating = False
--> 917             if self.dispatch_one_batch(iterator):
    918                 self._iterating = self._original_iterator is not None
    919 

/anaconda3/lib/python3.7/site-packages/sklearn/externals/joblib/parallel.py in dispatch_one_batch(self, iterator)
    752             tasks = BatchedCalls(itertools.islice(iterator, batch_size),
    753                                  self._backend.get_nested_backend(),
--> 754                                  self._pickle_cache)
    755             if len(tasks) == 0:
    756                 # No more tasks available in the iterator: tell caller to stop.

/anaconda3/lib/python3.7/site-packages/sklearn/externals/joblib/parallel.py in __init__(self, iterator_slice, backend_and_jobs, pickle_cache)
    208 
    209     def __init__(self, iterator_slice, backend_and_jobs, pickle_cache=None):
--> 210         self.items = list(iterator_slice)
    211         self._size = len(self.items)
    212         if isinstance(backend_and_jobs, tuple):

/anaconda3/lib/python3.7/site-packages/sklearn/model_selection/_validation.py in <genexpr>(.0)
    233                         pre_dispatch=pre_dispatch)
    234     scores = parallel(
--> 235         delayed(_fit_and_score)(
    236             clone(estimator), X, y, scorers, train, test, verbose, None,
    237             fit_params, return_train_score=return_train_score,

/anaconda3/lib/python3.7/site-packages/sklearn/model_selection/_split.py in split(self, X, y, groups)
   1313         """
   1314         X, y, groups = indexable(X, y, groups)
-> 1315         for train, test in self._iter_indices(X, y, groups):
   1316             yield train, test
   1317 

/anaconda3/lib/python3.7/site-packages/sklearn/model_selection/_split.py in _iter_indices(self, X, y, groups)
   1693         class_counts = np.bincount(y_indices)
   1694         if np.min(class_counts) < 2:
-> 1695             raise ValueError("The least populated class in y has only 1"
   1696                              " member, which is too few. The minimum"
   1697                              " number of groups for any class cannot"

ValueError: The least populated class in y has only 1 member, which is too few. The minimum number of groups for any class cannot be less than 2.
```

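Answer:

The error message actually says it all:

ValueError: The least populated class in y has only 1 member, which is too few. The minimum number of groups for any class cannot be less than 2.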

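You probably have a class in your y with only one sample, so a stratified split is impossible: StratifiedShuffleSplit requires at least 2 members of every class (see the `if np.min(class_counts) < 2` check in the last frame of the traceback). The arrow pointing at `scoring=scoring)` only marks the end of the cross_validate(...) call; the exception is actually raised inside `cv.split(X, y, groups)`, so your scoring argument is not the problem.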
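To find the offending class, you can count how many samples each label has. Below is a minimal stdlib-only sketch, assuming `labels` is an iterable of hashable class labels (a list, NumPy array or pandas Series all work); the helper name `rare_classes` is hypothetical. If NumPy is at hand, `np.unique(labels, return_counts=True)` gives the same counts.

```python
from collections import Counter


def rare_classes(labels, min_count=2):
    """Return {class: count} for every class with fewer than min_count samples.

    StratifiedShuffleSplit refuses to split when any class has fewer than
    2 members, so the default min_count=2 flags exactly those classes.
    """
    counts = Counter(labels)
    return {cls: n for cls, n in counts.items() if n < min_count}


# Hypothetical labels: class "c" appears only once.
example_labels = ["a", "a", "b", "b", "b", "c"]
print(rare_classes(example_labels))  # {'c': 1}
```

An empty result means every class has at least two samples and this particular check should pass.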

What you can do is remove that (single) sample from your data. In any case, a class represented by only one sample is of no use for classification.
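Removing such samples has to be done on features and labels together so that rows and labels stay aligned. A minimal sketch, assuming both are plain Python sequences of equal length; the helper name `drop_rare_classes` is hypothetical. For NumPy arrays you would more likely build the same `keep` positions and index with `features[keep]`.

```python
from collections import Counter


def drop_rare_classes(features, labels, min_count=2):
    """Remove every sample whose class has fewer than min_count members.

    features and labels are assumed to be equal-length sequences; the same
    positions are kept in both, so each row keeps its own label.
    """
    if len(features) != len(labels):
        raise ValueError("features and labels must have the same length")
    counts = Counter(labels)
    keep = [i for i, y in enumerate(labels) if counts[y] >= min_count]
    return [features[i] for i in keep], [labels[i] for i in keep]


# Hypothetical data: the single sample of class "c" is dropped.
X = [[0.1], [0.2], [0.3], [0.4], [0.5], [0.6]]
y = ["a", "a", "b", "b", "b", "c"]
X_kept, y_kept = drop_rare_classes(X, y)
print(len(X_kept), y_kept)  # 5 ['a', 'a', 'b', 'b', 'b']
```

The filtered data can then be passed as `X=` and `y=` to cross_validate in place of the original features and labels.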
