I have the following code, which tries to value stocks from non-price fundamental features.
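```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression

# df holds the fundamentals; feature_list names the non-price feature columns
price = df.loc[:, 'regularMarketPrice']
features = df.loc[:, feature_list]

X_train, X_test, y_train, y_test = train_test_split(
    features, price, test_size=0.15, random_state=1)

# scikit-learn expects a 2-D feature matrix; this branch only runs when
# feature_list is a single column label (which yields a 1-D Series)
if len(X_train.shape) < 2:
    X_train = np.array(X_train).reshape(-1, 1)
    X_test = np.array(X_test).reshape(-1, 1)

model = LinearRegression()
model.fit(X_train, y_train)

print('Train Score:', model.score(X_train, y_train))
print('Test Score:', model.score(X_test, y_test))

y_predicted = model.predict(X_test)
```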
My df (which is very large) never contains a 'regularMarketPrice' below 0. However, some of the values in y_predicted occasionally come out below 0.
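One way to see how this happens: a fitted line has no floor, so for feature values outside the range it was trained on it can cross zero even when every training target is positive. A minimal sketch on hypothetical toy data (one feature, y = 2x + 1), not the asker's dataset:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical toy data: one feature, strictly positive target (y = 2x + 1).
X_train = np.arange(1, 6, dtype=float).reshape(-1, 1)   # x = 1..5
y_train = 2 * X_train.ravel() + 1                        # y = 3..11, all > 0

model = LinearRegression().fit(X_train, y_train)

# A feature value below the training range: the fitted line keeps going down.
X_new = np.array([[-3.0]])
y_new = model.predict(X_new)
print(y_train.min(), y_new)  # the prediction is negative, about -5
```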
Is there a way in scikit-learn to declare that any value below 0 is an invalid prediction? I hope this would make my model more accurate.
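As far as I know, `LinearRegression` has no option that forbids negative predictions (its `positive=True` flag constrains the coefficients, not the outputs, and the intercept stays unconstrained). The simplest workaround is to post-process the predictions with `np.clip`. A sketch, where the helper name `clip_to_nonnegative` and the numbers are hypothetical:

```python
import numpy as np

def clip_to_nonnegative(y_pred):
    """Hypothetical helper: replace negative predictions with 0 (post-processing only)."""
    return np.clip(y_pred, 0.0, None)

# Made-up example values, not real prices.
y_true = np.array([2.0, 0.5, 10.0])
y_pred = np.array([1.5, -0.7, 11.0])
y_clipped = clip_to_nonnegative(y_pred)

# Every true price is >= 0, so raising a negative prediction to 0 can only
# shrink its absolute error; non-negative predictions are left unchanged.
err_before = np.abs(y_pred - y_true)
err_after = np.abs(y_clipped - y_true)
print(y_clipped)
print(np.all(err_after <= err_before))  # True
```

Clipping never makes an individual prediction worse here, but it does not fix the model; it only hides the out-of-range outputs.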
Please comment if anything needs further explanation.
Answer:
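If you need predictions that are never negative, ordinary linear regression is the wrong model. Consider a generalized linear model (GLM) instead, for example Poisson regression.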
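Why a Poisson GLM cannot produce negative predictions: scikit-learn's `PoissonRegressor` uses a log link, so the prediction is the exponential of a linear function of the features. Written with intercept $\beta_0$ and coefficient vector $\beta$ (symbols introduced here for illustration):

```latex
\log \hat{y}(x) = \beta_0 + x^\top \beta
\quad\Longrightarrow\quad
\hat{y}(x) = \exp\!\left(\beta_0 + x^\top \beta\right) > 0
```

The model also requires non-negative targets, which holds for prices.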
```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import PoissonRegressor

price = df.loc[:, 'regularMarketPrice']
features = df.loc[:, feature_list]

X_train, X_test, y_train, y_test = train_test_split(
    features, price, test_size=0.15, random_state=1)

# Reshape to 2-D only when a single feature column was selected
if len(X_train.shape) < 2:
    X_train = np.array(X_train).reshape(-1, 1)
    X_test = np.array(X_test).reshape(-1, 1)

model = PoissonRegressor()
model.fit(X_train, y_train)

# For PoissonRegressor, score() returns D² (fraction of Poisson deviance explained), not R²
print('Train Score:', model.score(X_train, y_train))
print('Test Score:', model.score(X_test, y_test))

y_predicted = model.predict(X_test)
```
With this model, all predicted values are greater than or equal to 0.
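A quick check of that claim, as a sketch on hypothetical toy data rather than the asker's dataset: fit both models on a strictly positive target, predict at feature values including one outside the training range, and confirm that `PoissonRegressor.predict` equals `exp(intercept_ + X @ coef_)`.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, PoissonRegressor

# Hypothetical toy data: strictly positive target (y = 2x + 1).
X_train = np.arange(1, 6, dtype=float).reshape(-1, 1)
y_train = 2 * X_train.ravel() + 1

# -3.0 lies below the training range, where a straight line goes negative.
X_new = np.array([[-3.0], [0.0], [10.0]])

lin = LinearRegression().fit(X_train, y_train)
poi = PoissonRegressor().fit(X_train, y_train)

lin_pred = lin.predict(X_new)
poi_pred = poi.predict(X_new)

# Log link: the Poisson prediction is exp(intercept + x . coef).
manual = np.exp(poi.intercept_ + X_new @ poi.coef_)

print(lin_pred)   # first entry is negative
print(poi_pred)   # every entry is positive
print(np.allclose(poi_pred, manual))  # True
```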