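When I save a polynomial regression model, how should I handle the polynomial degree? That information is not saved with the model.
Now, if I try to predict: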
themodel.predict(X_val)
I get:
ValueError: matmul: Input operand 1 has a mismatch in its core dimension 0, with gufunc signature (n?,k),(k,m?)->(n?,m?) (size 6 is different from 2)
I have to do this instead:
pol_feat = PolynomialFeatures(degree=2)
themodel.predict(pol_feat.fit_transform(X_val))
for it to work. So how do I store this information so that I can use the saved model for prediction?
Answer:
You also need to pickle the fitted PolynomialFeatures:
import joblib
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

# train and pickle
poly_reg = PolynomialFeatures(degree=2)
X_poly = poly_reg.fit_transform(X_train)
poly_reg_model = LinearRegression().fit(X_poly, y_train)
joblib.dump(poly_reg_model, 'themodel')
joblib.dump(poly_reg, 'poilynomia_features_model')

# load and predict
poilynomia_features_model = joblib.load('poilynomia_features_model')
themodel = joblib.load('themodel')
X_val_prep = poilynomia_features_model.transform(X_val)
predictions = themodel.predict(X_val_prep)
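The ValueError in the question comes down to shapes: the regression was fitted on expanded features, not on the raw ones. A minimal sketch, assuming the data has 2 input features (consistent with "size 6 is different from 2" in the traceback); the array values here are made up for illustration:

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

# Hypothetical validation data: 3 samples, 2 raw features.
X_val = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])

poly = PolynomialFeatures(degree=2)
X_val_poly = poly.fit_transform(X_val)

# Degree 2 on 2 features yields 6 columns: 1, x0, x1, x0^2, x0*x1, x1^2.
print(X_val.shape)       # (3, 2)
print(X_val_poly.shape)  # (3, 6)
print(X_val_poly[0])     # [1. 1. 2. 1. 2. 4.]
```

A LinearRegression fitted on the 6-column matrix holds 6 coefficients, so passing it the raw 2-column X_val fails in the matrix product.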
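But it is better practice to wrap all the steps in a single pipeline, which applies the polynomial expansion itself whenever you call fit or predict: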
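from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures

pipeline = Pipeline(steps=[('poilynomia', PolynomialFeatures(degree=2)), ('lr', LinearRegression())])
pipeline.fit(X_train, y_train)
pipeline.predict(X_val)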
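To tie this back to saving: the pipeline is a single object, so one joblib.dump stores the degree together with the fitted coefficients. A rough sketch, assuming synthetic training data and a temporary file path (the data, the step name 'poly', and the file name 'poly_pipeline.joblib' are made up for illustration):

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures

# Hypothetical data: y is an exact quadratic function of two features.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(50, 2))
y_train = 1.0 + 2.0 * X_train[:, 0] - X_train[:, 1] ** 2
X_val = rng.normal(size=(5, 2))

pipeline = Pipeline(steps=[
    ("poly", PolynomialFeatures(degree=2)),
    ("lr", LinearRegression()),
])
pipeline.fit(X_train, y_train)

# One file holds both the feature expansion (with its degree) and the model.
path = os.path.join(tempfile.mkdtemp(), "poly_pipeline.joblib")
joblib.dump(pipeline, path)

# Later, possibly in another process: load and predict on raw features.
loaded = joblib.load(path)
predictions = loaded.predict(X_val)

# The degree is recoverable from the loaded object.
print(loaded.named_steps["poly"].degree)
```

With this layout there is no separate transformer file to keep in sync, and the caller never has to remember which degree was used at training time.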