I want to try all the regression algorithms on my dataset and pick the best one. I decided to start with linear regression, but I got an error. I then tried scaling the data, but that only produced another error.
Here is my code:
```python
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression

train_df = pd.read_csv('train.csv', index_col='ID')
train_df.head()

target = 'Result'
X = train_df.drop(target, axis=1)
y = train_df[target]

# Trying to scale and get even worse error
#ss = StandardScaler()
#df_scaled = pd.DataFrame(ss.fit_transform(train_df), columns=train_df.columns)
#X = df_scaled.drop(target, axis=1)
#y = df_scaled[target]

model = LogisticRegression()
model.fit(X, y)
LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
                   intercept_scaling=1, l1_ratio=None, max_iter=10000,
                   multi_class='auto', n_jobs=None, penalty='l2',
                   random_state=None, solver='lbfgs', tol=0.0001, verbose=10,
                   warm_start=False)

print(X.iloc[10])
print(model.predict([X.iloc[10]]))
print(y[10])
```
Here is the error message (followed by the output of the three prints):
```
ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
A      0
B    -19
C    -19
D    -19
E      0
F    -19
Name: 10, dtype: int64
[0]
-19
```
Here is a sample of the dataset:
```
ID,A,B,C,D,E,F,Result
0,-18,18,18,-2,-12,-3,-19
1,-19,-8,0,18,18,1,0
2,0,-11,18,0,-19,18,18
3,18,-15,-12,18,-11,-4,-17
4,-17,18,-11,-17,-18,-19,18
5,18,-14,-19,-14,-15,-19,18
6,18,-17,18,18,18,-2,-1
7,-1,-11,0,18,18,18,18
8,18,-19,-18,-19,-19,18,18
9,18,18,0,0,18,18,0
10,0,-19,-19,-19,0,-19,-19
11,-19,0,-19,18,-19,-19,-6
12,-6,18,0,0,0,18,-15
13,-15,-19,-6,-19,-19,0,0
14,0,-15,0,18,18,-19,18
15,18,-19,18,-8,18,-2,-4
16,-4,-4,18,-19,18,18,18
17,18,0,18,-4,-10,0,18
18,18,0,18,18,18,18,-19
```
What am I doing wrong?
Answer:
You are using LogisticRegression, which despite its name is a classification model: a special case of the linear model family for a categorical dependent variable.
That is not necessarily wrong, since you may have intended it, but it means you need enough data per class and enough iterations for the model to converge (your warning says it has not yet done so).
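If classification really is what you want, note that the commented-out scaling attempt in your question standardizes the whole DataFrame, including the Result column, which turns your class labels into floats and breaks LogisticRegression. Scale only the features, for example with a pipeline. A minimal sketch, with made-up toy values standing in for train.csv:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Toy feature matrix and labels standing in for train.csv
X = np.array([[-18, 18,  18,  -2, -12,  -3],
              [-19, -8,   0,  18,  18,   1],
              [  0, -11, 18,   0, -19,  18],
              [ 18, -15, -12, 18, -11,  -4]])
y = np.array([-19, 0, 18, -17])  # class labels stay untouched

# The pipeline standardizes the features (not the target) before
# the classifier, during both fit and predict.
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=10000))
model.fit(X, y)
print(model.predict(X[:1]))
```

Scaled features also help lbfgs converge in far fewer iterations, which is exactly what the ConvergenceWarning is hinting at.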
However, my guess is that what you actually want is LinearRegression from sklearn, which is meant for a continuous dependent variable.
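If the goal is regression on Result as a continuous value, swapping the estimator is enough; there is no iterative solver to tune, so no convergence warning. A minimal sketch, inlining the first rows of your sample data so it runs as-is:

```python
import io
import pandas as pd
from sklearn.linear_model import LinearRegression

# First rows of train.csv from the question, inlined for a
# self-contained example
csv = """ID,A,B,C,D,E,F,Result
0,-18,18,18,-2,-12,-3,-19
1,-19,-8,0,18,18,1,0
2,0,-11,18,0,-19,18,18
3,18,-15,-12,18,-11,-4,-17
4,-17,18,-11,-17,-18,-19,18
"""
train_df = pd.read_csv(io.StringIO(csv), index_col='ID')

X = train_df.drop('Result', axis=1)
y = train_df['Result']

model = LinearRegression()  # continuous target; fits in closed form
model.fit(X, y)
print(model.predict(X.iloc[[0]]))  # double brackets keep a DataFrame row
```

Passing `X.iloc[[0]]` (a one-row DataFrame) instead of `[X.iloc[10]]` also avoids mixing feature names in and out of the prediction call.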