How do I stratify train and test data in Scikit-Learn?

IT Technology · xiaolong · May 27, 2025

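I am trying to implement a classification algorithm for the Iris dataset downloaded from Kaggle. In the species column, the classes (Iris-setosa, Iris-versicolor, Iris-virginica) appear in sorted order. How can I stratify the train and test data using Scikit-Learn?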

Answer:

If you want to shuffle your data and split it with a test fraction of 0.3, you can use

from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, shuffle=True)

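where X is your data, y is the corresponding labels, test_size is the fraction of the data held out for testing (0.3 means 30%), and shuffle=True (the default) shuffles the data before splitting.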
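To see why shuffling matters when the labels are sorted, here is a minimal sketch. It assumes the copy of Iris bundled with scikit-learn (load_iris) rather than the Kaggle CSV from the question; the labels there are the integers 0, 1, 2 instead of species names, also in sorted order. The random_state value is an arbitrary choice made only for reproducibility.

```python
from collections import Counter

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

# Assumption: scikit-learn's bundled Iris data stands in for the Kaggle file.
# Its labels are sorted: 50 samples of class 0, then 50 of class 1, then 50 of class 2.
X, y = load_iris(return_X_y=True)

# Without shuffling, the test set is simply the last 30% of the rows,
# so it contains a single class.
_, _, _, y_test_ordered = train_test_split(X, y, test_size=0.3, shuffle=False)
print("shuffle=False:", Counter(y_test_ordered.tolist()))

# With shuffling, rows are drawn at random, so the classes get mixed,
# though their proportions are not guaranteed to match the full dataset.
_, _, _, y_test_shuffled = train_test_split(
    X, y, test_size=0.3, shuffle=True, random_state=0
)
print("shuffle=True: ", Counter(y_test_shuffled.tolist()))
```

Shuffling alone mixes the classes but leaves the per-class counts to chance, which is the gap that stratification closes.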

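To make sure the split preserves the proportions of the values in a particular column, pass that column to the stratify parameter. For class labels such as the Iris species, that column is usually y itself (stratify=y).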

X_train, X_test, y_train, y_test = train_test_split(
    X, y,
    test_size=0.3,
    shuffle=True,
    stratify=X['YOUR_COLUMN_LABEL'],
)
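As a hedged illustration of the stratified call, the sketch below again assumes scikit-learn's bundled Iris data (load_iris) in place of the Kaggle CSV, and stratifies on the label array y. The random_state value is arbitrary and only fixes which rows are picked; the per-class counts do not depend on it.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

# Assumption: bundled Iris data, 150 samples, 50 per class, labels 0/1/2.
X, y = load_iris(return_X_y=True)

# stratify=y asks for the same class proportions in both splits.
X_train, X_test, y_train, y_test = train_test_split(
    X, y,
    test_size=0.3,
    shuffle=True,
    stratify=y,
    random_state=42,
)

# Count how many samples of each class ended up in each split.
train_counts = np.bincount(y_train)
test_counts = np.bincount(y_test)
print("train:", train_counts)  # 35 samples of each class
print("test: ", test_counts)   # 15 samples of each class
```

With a pandas DataFrame loaded from the Kaggle file, the same idea would apply by passing the species column to stratify. Note that stratify requires shuffle=True; scikit-learn raises an error if stratify is given together with shuffle=False.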

geopandas machine-learning multiclass-classification python scikit-learn
