Scikit-learn: "The least populated class in y has only 1 member"

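I am trying to run a random forest regression with Scikit-learn. After loading the data with Pandas, the first step is to split it into training and test sets. However, I get the following error:

The least populated class in y has only 1 member

I searched Google for this error and found various instances of it, but I still cannot work out what it means.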

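import joblib  # on older scikit-learn versions: from sklearn.externals import joblib
import pandas as pd
from sklearn import preprocessing
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import make_pipeline

# Load the tab-separated training data and separate the target column
training_file = "training_data.txt"
data = pd.read_csv(training_file, sep='\t')
y = data.Result
X = data.drop('Result', axis=1)

# This call raises the ValueError
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=123, stratify=y)

pipeline = make_pipeline(preprocessing.StandardScaler(), RandomForestRegressor(n_estimators=100))
hyperparameters = { 'randomforestregressor__max_features' : ['auto', 'sqrt', 'log2'],
                    'randomforestregressor__max_depth' : [None, 5, 3, 1] }

model = GridSearchCV(pipeline, hyperparameters, cv=10)
model.fit(X_train, y_train)
prediction = model.predict(X_test)
joblib.dump(model, 'ms5000.pkl')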

The train_test_split call produces the following stack trace:

Traceback (most recent call last):
  File "/Users/justin.shapiro/Desktop/IPML_Model/model_definition.py", line 18, in <module>
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.22, random_state=123, stratify=y)
  File "/Library/Python/2.7/site-packages/sklearn/model_selection/_split.py", line 1700, in train_test_split
    train, test = next(cv.split(X=arrays[0], y=stratify))
  File "/Library/Python/2.7/site-packages/sklearn/model_selection/_split.py", line 953, in split
    for train, test in self._iter_indices(X, y, groups):
  File "/Library/Python/2.7/site-packages/sklearn/model_selection/_split.py", line 1259, in _iter_indices
    raise ValueError("The least populated class in y has only 1"
ValueError: The least populated class in y has only 1 member, which is too few. The minimum number of groups for any class cannot be less than 2.

Here is a sample of my dataset:

var1    var2    var3    var4    var5    var6    var7    var8    Result
high    5000.0  0       60      1000    75      0.23    0.75    17912.0
mid     5000.0  0       60      1000    50      0.23    0.75    18707.0
low     5000.0  0       60      1000    25      0.23    0.75    17912.0
high    5000.0  5       60      1000    75      0.23    0.75    18577.0
mid     5000.0  5       60      1000    50      0.23    0.75    19407.0
low     5000.0  5       60      1000    25      0.23    0.75    18577.0

What does this error mean, and how do I fix it?


Answer:

The error occurs on this line:

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.22, random_state=123, stratify=y)

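Try removing stratify=y. Stratification treats every distinct value in y as a class and needs at least two samples of each class so that it can place one in each split. Result is a continuous regression target, so at least one of its values occurs only once, and that single-member "class" triggers the error.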
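A minimal sketch of the fix, using synthetic data as a hypothetical stand-in for training_data.txt (the column names and value ranges are assumptions, not the asker's real data). It reproduces the error with a continuous target, then performs the plain random split. The last part shows one possible alternative if you still want the target distribution roughly balanced across the splits: stratify on quantile bins of y instead of on y itself. The choice of four bins is arbitrary.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical data: two numeric features and a continuous target.
rng = np.random.RandomState(123)
X = pd.DataFrame({
    "var2": rng.uniform(1000, 5000, size=50),
    "var6": rng.uniform(0, 100, size=50),
})
y = pd.Series(rng.uniform(17000, 20000, size=50), name="Result")

# Diagnostic: the size of the smallest "class" that stratify=y would see.
smallest_class = y.value_counts().min()

# Stratifying on the raw continuous target fails.
try:
    train_test_split(X, y, test_size=0.2, random_state=123, stratify=y)
    stratify_error = None
except ValueError as exc:
    stratify_error = str(exc)

# Fix: a plain random split without stratify.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=123
)

# Optional alternative: stratify on quantile bins of the target.
y_bins = pd.qcut(y, q=4, labels=False)
Xb_train, Xb_test, yb_train, yb_test = train_test_split(
    X, y, test_size=0.2, random_state=123, stratify=y_bins
)
```

Checking y.value_counts().min() on your own data before splitting tells you whether stratify=y can work at all: any value below 2 will raise this error.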
