If I want to add second-degree polynomial terms to my logistic regression model (which has two predictors), along the lines of what I tried below:
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures

df_poly = df[['Y', 'x0', 'x1']].copy()
X_train, X_test, Y_train, Y_test = train_test_split(
    df_poly.drop('Y', axis=1), df_poly['Y'], test_size=0.20, random_state=10)
poly = PolynomialFeatures(degree=2, interaction_only=False, include_bias=False)
lr = LogisticRegression()
pipe = Pipeline([('polynomial_features', poly), ('logistic_regression', lr)])
pipe.fit(X_train, Y_train)
then I get coefficients for x0, x1, x0^2, x1^2, and x0*x1.
However, I would like to adjust this so that only x0, x1, x0^2, and x0*x1 are fitted; in other words, I want to exclude the x1^2 term. Is there a way to do this with the sklearn library?
Answer:
I would use a combination of ColumnTransformer, PolynomialFeatures, and FunctionTransformer.
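A minimal sketch of that idea (the toy DataFrame X, with x0 = 0..9 and x1 = 10..19 inferred from the output shown below, and the name custom_poly are illustrative assumptions): PolynomialFeatures with interaction_only=True produces x0, x1, and the interaction x0*x1 but drops both squares, and a FunctionTransformer then adds x0^2 for only the column that should be squared.

import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import FunctionTransformer, PolynomialFeatures

# Toy predictors matching the output shown below (assumption): x0 = 0..9, x1 = 10..19.
X = pd.DataFrame({'x0': np.arange(10), 'x1': np.arange(10, 20)})

custom_poly = ColumnTransformer(
    transformers=[
        # x0, x1 and the interaction x0*x1; interaction_only=True drops both squares
        ('interactions',
         PolynomialFeatures(degree=2, interaction_only=True, include_bias=False),
         ['x0', 'x1']),
        # add x0^2 explicitly for the one column we do want squared, so x1^2 never appears
        ('x0_squared', FunctionTransformer(lambda x: x ** 2), ['x0']),
    ]
)

custom_poly.fit_transform(X)

On this toy data the transformed feature matrix has the columns x0, x1, x0*x1, x0^2: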
array([[  0.,  10.,   0.,   0.],
       [  1.,  11.,  11.,   1.],
       [  2.,  12.,  24.,   4.],
       [  3.,  13.,  39.,   9.],
       [  4.,  14.,  56.,  16.],
       [  5.,  15.,  75.,  25.],
       [  6.,  16.,  96.,  36.],
       [  7.,  17., 119.,  49.],
       [  8.,  18., 144.,  64.],
       [  9.,  19., 171.,  81.]])
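Assuming this matches the intended approach, the transformer can then replace the plain PolynomialFeatures step in the question's pipeline; a sketch reusing custom_poly from above and X_train/Y_train from the question:

from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

# Same pipeline as in the question, but with the ColumnTransformer as the feature step,
# so the logistic regression only ever sees x0, x1, x0*x1 and x0^2.
pipe = Pipeline([
    ('custom_features', custom_poly),
    ('logistic_regression', LogisticRegression()),
])
pipe.fit(X_train, Y_train)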