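I am trying to convert several columns that contain a lot of categorical data, but I get an error when I use OneHotEncoder.
1) Split the columns into X_census and Y_census (X_census holds the categorical values):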
X_census = df[['workclass', 'education', 'marital-status', 'occupation', 'relationship', 'race', 'sex', 'native-country']]
Y_census = df['income']
2) Use LabelEncoder on the categorical values in X_census:
from sklearn.preprocessing import LabelEncoder
le = LabelEncoder()
X_1 = X_census.apply(le.fit_transform)
X_2 = X_1.to_numpy()
3) Now apply OneHotEncoder to X_2 to convert the categorical values into numeric values:
from sklearn.preprocessing import OneHotEncoder
from sklearn.compose import ColumnTransformer
oh = OneHotEncoder()
onehotencoder_census = ColumnTransformer(transformers=[('OneHot', oh, X_2[:])], remainder='passthrough')
X_census = onehotencoder_census.fit_transform(X_census)  # the error occurs here!
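Answer:
You can use pandas.get_dummies, which one-hot encodes every string (object) column of a DataFrame and leaves numeric columns unchanged: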
import pandas as pd
df = pd.DataFrame({
    "marital_status": ['S', 'M', 'D', 'S', 'M', 'D', 'S', 'M', 'D'],
    "sex": ["male", "female", "male", "female", "male", "female", "male", "female", "male"],
    "education": ['grad', 'post-grad', 'grad', 'post-grad', 'grad', 'post-grad', 'grad', 'post-grad', 'grad'],
    "income": [125, 135, 120, 110, 90, 150, 180, 130, 110]
})
pd.get_dummies(df)
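If you would rather stay with scikit-learn, the likely cause of the error in step 3 is the third element of the transformer tuple: ColumnTransformer expects a column selector there (names, indices, or a mask), not the data array X_2[:]. The LabelEncoder step also looks unnecessary, since OneHotEncoder in recent scikit-learn versions accepts string columns directly. Below is a minimal sketch of that approach. The toy rows, the numeric `age` column, and the names `categorical_cols` and `X_encoded` are assumptions made for illustration and do not come from the question.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder

# Toy data standing in for the census DataFrame (hypothetical values).
df = pd.DataFrame({
    "workclass": ["Private", "State-gov", "Private", "Self-emp"],
    "sex": ["Male", "Female", "Female", "Male"],
    "age": [39, 50, 38, 53],  # assumed numeric column, passed through as-is
    "income": ["<=50K", ">50K", "<=50K", ">50K"],
})

categorical_cols = ["workclass", "sex"]
X_census = df.drop(columns="income")
Y_census = df["income"]

# The third tuple element selects columns by name; it is not the data itself.
onehotencoder_census = ColumnTransformer(
    transformers=[
        ("OneHot", OneHotEncoder(handle_unknown="ignore"), categorical_cols)
    ],
    remainder="passthrough",  # keep non-categorical columns such as age
)

# Fit on the DataFrame so the column names can be resolved.
X_encoded = onehotencoder_census.fit_transform(X_census)
print(X_encoded.shape)
```

Here the three workclass categories and the two sex categories become five indicator columns, and `age` is appended unchanged. Compared with get_dummies, the fitted transformer can be reused on new data, which is convenient when the same encoding has to be applied to a test set.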