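I am trying to train a LinearSVC model and evaluate it with cross_val_score on a linearly separable dataset that I created, but I get an error.
Here is a reproducible example: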
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# creating the dataset
x1 = 2 * np.random.rand(100, 1)
y1 = 5 + 3 * x1 + np.random.randn(100, 1)
lable1 = np.zeros((100, 1))
x2 = 2 * np.random.rand(100, 1)
y2 = 15 + 3 * x2 + np.random.randn(100, 1)
lable2 = np.ones((100, 1))
x = np.concatenate((x1, x2))
y = np.concatenate((y1, y2))
lable = np.concatenate((lable1, lable2))
x = np.reshape(x, (len(x),))
y = np.reshape(y, (len(y),))
lable = np.reshape(lable, (len(lable),))
d = {'x':x, 'y':y, 'lable':lable}
df = pd.DataFrame(data=d)
df.plot(kind="scatter", x="x", y="y")

# preparing data and model
train_set, test_set = train_test_split(df, test_size=0.2, random_state=42)
X = train_set.drop("lable", axis=1)
y = train_set["lable"].copy()
scaler = StandardScaler()
scaler.fit_transform(X)
linear_svc = LinearSVC(C=5, loss="hinge", random_state=42)
linear_svc.fit(X, y)

# evaluation
scores = cross_val_score(linear_svc, X, y, scoring="neg_mean_squared_error", cv=10)
rmse_scores = np.sqrt(-scores)
print("Mean:", rmse_scores.mean())
Output:
Mean: 0.0
/usr/local/lib/python3.7/dist-packages/sklearn/svm/_base.py:947: ConvergenceWarning: Liblinear failed to converge, increase the number of iterations.
  "the number of iterations.", ConvergenceWarning)
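Answer:
This is not an error but a warning, and it already tells you what to try:
increase the number of iterations
The default for max_iter is 1000 (see the LinearSVC documentation).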
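To check whether a given iteration budget is enough, one option is to capture the warning programmatically instead of reading it off the console. The following is only a rough sketch: the helper name `converges`, the seeded synthetic data, and the values tried are my own assumptions for illustration, not something from the question.

```python
import warnings

import numpy as np
from sklearn.exceptions import ConvergenceWarning
from sklearn.svm import LinearSVC


def converges(X, y, max_iter):
    """Return True if LinearSVC fits without emitting a ConvergenceWarning."""
    model = LinearSVC(C=5, loss="hinge", random_state=42, max_iter=max_iter)
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")  # record every warning raised during fit
        model.fit(X, y)
    return not any(issubclass(w.category, ConvergenceWarning) for w in caught)


# Synthetic stand-in for the question's data (hypothetical seed and layout).
rng = np.random.default_rng(0)
x1 = 2 * rng.random(100)
x2 = 2 * rng.random(100)
X = np.column_stack([
    np.concatenate([x1, x2]),
    np.concatenate([5 + 3 * x1 + rng.standard_normal(100),
                    15 + 3 * x2 + rng.standard_normal(100)]),
])
y = np.concatenate([np.zeros(100), np.ones(100)])

default_max_iter = LinearSVC().max_iter
print("default max_iter:", default_max_iter)
for n in (default_max_iter, 100_000):
    print(n, "converged:", converges(X, y, n))
```

Whether the default budget is enough depends on the data and on C, so treat the printed result as a diagnostic for your own dataset rather than a general rule.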
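Also, LinearSVC is a classifier, so using scoring="neg_mean_squared_error" (a regression metric) in cross_val_score is not meaningful; see the documentation for a rough list of the metrics relevant to each type of problem.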
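One way to see why the regression metric adds nothing here: when the labels are coded as 0 and 1, each squared error is either 0 or 1, so the MSE is just the fraction of misclassified samples, i.e. 1 minus the accuracy. An RMSE of 0.0 therefore only says that no sample was misclassified. A small sketch with made-up labels (the arrays below are purely illustrative):

```python
import numpy as np
from sklearn.metrics import accuracy_score, mean_squared_error

# Hypothetical 0/1 labels and predictions: two mistakes out of eight.
y_true = np.array([0, 0, 1, 1, 1, 0, 1, 0])
y_pred = np.array([0, 1, 1, 1, 0, 0, 1, 0])

mse = mean_squared_error(y_true, y_pred)
acc = accuracy_score(y_true, y_pred)

# For 0/1 labels, the two numbers carry the same information.
print("MSE:", mse)
print("1 - accuracy:", 1 - acc)
```

This identity holds only for this particular 0/1 coding; with other label values the MSE depends on the arbitrary numbers chosen for the classes, which is another reason to prefer a classification metric.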
So, with the following changes:
linear_svc = LinearSVC(C=5, loss="hinge", random_state=42, max_iter=100000)
scores = cross_val_score(linear_svc, X, y, scoring="accuracy", cv=10)
your code runs fine, without any errors or warnings.
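A side note on the question's code: `scaler.fit_transform(X)` returns the scaled array, and that return value is discarded, so the model is actually fitted on the unscaled X. If the intent was to scale the features, one possible approach is to put the scaler and the classifier in a pipeline, so that scaling is re-fitted inside each cross-validation fold. This is a sketch under that assumption; the seeded data and the names `model` and `labels` are mine.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

# Synthetic data in the spirit of the question (hypothetical seed).
rng = np.random.default_rng(42)
x1 = 2 * rng.random(100)
x2 = 2 * rng.random(100)
X = np.column_stack([
    np.concatenate([x1, x2]),
    np.concatenate([5 + 3 * x1 + rng.standard_normal(100),
                    15 + 3 * x2 + rng.standard_normal(100)]),
])
labels = np.concatenate([np.zeros(100), np.ones(100)])

# Scaling happens inside the pipeline, so each CV fold fits its own scaler
# on the training part only.
model = make_pipeline(
    StandardScaler(),
    LinearSVC(C=5, loss="hinge", random_state=42, max_iter=100000),
)
scores = cross_val_score(model, X, labels, scoring="accuracy", cv=10)
print("Mean accuracy:", scores.mean())
```

Standardized features also tend to make liblinear converge in fewer iterations, so this may reduce the need for a very large max_iter, though that is worth checking on your own data.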