How to hierarchically predict sub-classes from clusters of an already-predicted class in Python

Suppose I have the following dataframe:

       Student_Id  Math  Physical  Arts Class Sub_Class
    0        id_1     6         7     9     A         x
    1        id_2     9         7     1     A         y
    2        id_3     3         5     5     C         x
    3        id_4     6         8     9     A         x
    4        id_5     6         7    10     B         z
    5        id_6     9         5    10     B         z
    6        id_7     3         5     6     C         x
    7        id_8     3         4     6     C         x
    8        id_9     6         8     9     A         x
    9       id_10     6         7    10     B         z
    10      id_11     9         5    10     B         z
    11      id_12     3         5     6     C         x
    12      id_13     3         4     6     C         x

I want to use a RandomForestClassifier: first train it with Class as the target variable and predict Class on the test dataset:

       Student_Id Class Sub_Class predicted_class
    11      id_12     C         x               C
    8        id_9     A         x               A
    3        id_4     A         x               A

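Then it should take each predicted class in the test dataset, train only on the training rows that belong to that class, and predict the sub-class, adding the groups one at a time: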

1. First, it picks class 'C', trains only on class 'C', and predicts the sub-class:

       Student_Id Class Sub_Class predicted_class predicted_Sub_Class
    11      id_12     C         x               C                   x

2. Next, it picks class 'A', trains only on class 'A', and predicts the sub-class:

       Student_Id Class Sub_Class predicted_class predicted_Sub_Class
    8        id_9     A         x               A                   x
    3        id_4     A         x               A                   y

3. Finally, it combines all of them:

       Student_Id Class Sub_Class predicted_class predicted_Sub_Class
    11      id_12     C         x               C                   x
    8        id_9     A         x               A                   x
    3        id_4     A         x               A                   y

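In summary, I don't want to train and predict Class and Sub_Class separately. I want to predict Class first and then use that prediction to train a model per class cluster, because I think this may improve the results.

I can't figure out how to do the second part: looping over the classes and training a model for each one to get the sub-class.

Sample code so far, without the second part: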

    import pandas as pd
    from sklearn import metrics
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
    from sklearn.model_selection import train_test_split

    # Create the dataframe
    data = [
        ["id_1", 6, 7, 9, "A", "x"],
        ["id_2", 9, 7, 1, "A", "y"],
        ["id_3", 3, 5, 5, "C", "x"],
        ["id_4", 6, 8, 9, "A", "x"],
        ["id_5", 6, 7, 10, "B", "z"],
        ["id_6", 9, 5, 10, "B", "z"],
        ["id_7", 3, 5, 6, "C", "x"],
        ["id_8", 3, 4, 6, "C", "x"],
        ["id_9", 6, 8, 9, "A", "x"],
        ["id_10", 6, 7, 10, "B", "z"],
        ["id_11", 9, 5, 10, "B", "z"],
        ["id_12", 3, 5, 6, "C", "x"],
        ["id_13", 3, 4, 6, "C", "x"],
    ]
    df = pd.DataFrame(data, columns=['Student_Id', 'Math', 'Physical', 'Arts', 'Class', 'Sub_Class'])

    # Split into train and test
    training_data, testing_data = train_test_split(df, test_size=0.2, random_state=25)

    # First predict (classify) the Class ---------------------------------------
    # Create train data
    X_train = training_data[['Math', 'Physical', 'Arts']]
    y_train = training_data['Class']  # 1-D target avoids sklearn's column-vector warning

    # Create test data
    X_test = testing_data[['Math', 'Physical', 'Arts']]
    y_test = testing_data['Class']

    # Random forest classifier for predicting Class
    rfc = RandomForestClassifier(n_estimators=50).fit(X_train, y_train)
    predictions = rfc.predict(X_test)
    rfc_table = testing_data[['Student_Id', 'Class', 'Sub_Class']]
    rfc_table = rfc_table.assign(predicted_class=predictions)

    # Next train for Sub_Class -------------------------------------------------

Answer:

You can do it like this:

    # Training function: takes a df, writes the predicted sub-class into it,
    # and returns the fitted per-class models
    def train_sub(df):
        # Dictionary of trained models, keyed by class
        models = {}
        # Iterate over every unique class in df
        for i in df['Class'].unique():
            # Index of the rows whose Class equals i
            temp_idx = df[df['Class'] == i].index
            train_idx, test_idx = train_test_split(temp_idx, test_size=0.2, random_state=25)
            X_train = df.loc[train_idx, ['Math', 'Physical', 'Arts']]
            y_train = df.loc[train_idx, 'Sub_Class']  # 1-D target
            # Held-out rows of this class (not used further below)
            X_test = df.loc[test_idx, ['Math', 'Physical', 'Arts']]
            y_test = df.loc[test_idx, 'Sub_Class']

            # Train a model to classify the sub-classes within this class
            temp_model = RandomForestClassifier(n_estimators=50).fit(X_train, y_train)

            # Write the predictions into df for all rows of this class
            df.loc[temp_idx, 'Predicted_subClass'] = temp_model.predict(df.loc[temp_idx, ['Math', 'Physical', 'Arts']])
            # Store the model in the dictionary
            models[i] = temp_model
        return models

    # Call the function
    models = train_sub(df)

    # Inspect the result
    df
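The function above groups rows by their true Class and predicts on the same dataframe it was fitted on. A variant closer to the three steps in the question, where each test row is routed by its predicted class and the sub-class model sees only training rows, might look like the sketch below. The helper name `predict_hierarchical`, the `FEATURES` constant, and the fixed `random_state` are assumptions made for illustration and are not part of the original post.

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

FEATURES = ["Math", "Physical", "Arts"]  # assumed feature columns


def predict_hierarchical(train, test, features=FEATURES, seed=25):
    """Predict Class first, then Sub_Class with one model per predicted class.

    Hypothetical helper: a sketch of the question's steps 1-3, not a
    definitive implementation.
    """
    # Stage 1: a single classifier for Class, fitted on training rows only.
    class_model = RandomForestClassifier(n_estimators=50, random_state=seed)
    class_model.fit(train[features], train["Class"])

    result = test[["Student_Id", "Class", "Sub_Class"]].copy()
    result["predicted_class"] = class_model.predict(test[features])

    # Stage 2: for each predicted class, fit on the training rows of that
    # class and predict Sub_Class for the test rows routed to it.
    parts = []
    for cls, group in result.groupby("predicted_class", sort=False):
        train_cls = train[train["Class"] == cls]
        sub_model = RandomForestClassifier(n_estimators=50, random_state=seed)
        sub_model.fit(train_cls[features], train_cls["Sub_Class"])
        group = group.copy()
        group["predicted_Sub_Class"] = sub_model.predict(
            test.loc[group.index, features]
        )
        parts.append(group)

    # Stage 3: stack the per-class pieces and restore the test row order.
    return pd.concat(parts).loc[test.index]


data = [
    ["id_1", 6, 7, 9, "A", "x"], ["id_2", 9, 7, 1, "A", "y"],
    ["id_3", 3, 5, 5, "C", "x"], ["id_4", 6, 8, 9, "A", "x"],
    ["id_5", 6, 7, 10, "B", "z"], ["id_6", 9, 5, 10, "B", "z"],
    ["id_7", 3, 5, 6, "C", "x"], ["id_8", 3, 4, 6, "C", "x"],
    ["id_9", 6, 8, 9, "A", "x"], ["id_10", 6, 7, 10, "B", "z"],
    ["id_11", 9, 5, 10, "B", "z"], ["id_12", 3, 5, 6, "C", "x"],
    ["id_13", 3, 4, 6, "C", "x"],
]
df = pd.DataFrame(
    data, columns=["Student_Id", "Math", "Physical", "Arts", "Class", "Sub_Class"]
)
training_data, testing_data = train_test_split(df, test_size=0.2, random_state=25)

result = predict_hierarchical(training_data, testing_data)
print(result)
```

Because a row can only be assigned a sub-class that occurs under its predicted class, a mistake in the first stage carries over to the second. Whether this beats a single model trained directly on Sub_Class depends on the data, so it is worth comparing both on a held-out set.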
