    import numpy as np
    from sklearn.model_selection import train_test_split
    from sklearn.preprocessing import StandardScaler
    from sklearn.linear_model import SGDRegressor
    from sklearn.model_selection import RandomizedSearchCV

    train, test, train_label, test_label = train_test_split(
        feature_data, target_data, test_size=0.20, random_state=3)

    sc = StandardScaler()
    train_std = sc.fit_transform(train)
    test_Std = sc.transform(test)

    pipe = SGDRegressor()
    parameters = {'sgd__loss': ['squared_loss', 'huber'],
                  'sgd__n_iter': np.ceil(10**6 / len(train_label)),
                  'sgd__alpha': 10.0**np.arange(1, 7),
                  }
    g_search = RandomizedSearchCV(pipe, param_distributions=parameters, random_state=2)
    g_fit = g_search.fit(train_std, train_label)
Training data:

    train_std
    Out[46]:
    array([[ 1.99470848,  2.39114909,  0.96705   , ...,  0.23698853,  0.89215521, -0.74111955],
           [-0.50742363, -0.54567689, -0.29516734, ...,  0.00491999, -0.73959331,  0.42680023],
           [-0.46965669, -0.10483307,  0.90566027, ..., -0.34272278,  0.69705485,  0.56151837],
           ...,
           [-0.05849323,  0.11803686,  0.45737245, ...,  0.24026818,  0.75026404, -0.3829142 ],
           [ 0.83045625,  0.66257208, -0.01582026, ...,  0.32870492, -0.27844698, -0.83648146],
           [-0.0886727 ,  0.46158079,  1.36521081, ..., -0.10050365, -0.68638412, -0.04006983]])
Training labels:

    train_label
    Out[47]:
    24429     1.863
    32179    18.296
    42715     1.417
    6486      6.562
    39407    18.669
              ...
    42602     6.002
    6557      2.921
    30305    11.835
    4718      1.212
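Error: object of type 'numpy.float64' has no len()
The error is raised by the g_fit line, when fitting on the training data.
I am trying to use RandomizedSearchCV with SGDRegressor, but the fit on the training data fails with the error above.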
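Answer:
The error is caused by the value stored under the key 'sgd__n_iter', namely np.ceil(10**6/len(train_label)). That expression is a single numpy.float64 scalar, whereas every value in param_distributions has to be a list of candidates (or a distribution to sample from).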
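To see where the message comes from, the scalar can be compared with its one-element list form outside of scikit-learn. This is only an illustrative sketch: n_train is a made-up stand-in for len(train_label), and describe_len is a hypothetical helper, not part of any library.

```python
import numpy as np

n_train = 800  # hypothetical stand-in for len(train_label)

# np.ceil on a plain float returns a numpy.float64 scalar, not a sequence.
value = np.ceil(10**6 / n_train)


def describe_len(obj):
    """Return len(obj), or the error message if obj has no length."""
    try:
        return len(obj)
    except TypeError as exc:
        return str(exc)


scalar_result = describe_len(value)    # the scalar has no length
list_result = describe_len([value])    # a one-element list does

print(type(value).__name__)
print(scalar_result)
print(list_result)
```

The scalar case yields the same "has no len()" message as in the question, which suggests that the search took the length of each entry in param_distributions; wrapping the value in a list gives it a length of one.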
You therefore have two ways to fix this:
- Wrap the value in a list: [np.ceil(10**6/len(train_label))]
- Pass it directly to the SGDRegressor constructor instead of putting it in the param_distributions dictionary.
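I also noticed a few inconsistencies in your code (for example, the 'sgd__' prefixes only make sense when the estimator is a Pipeline step named 'sgd'), so here is a shorter, slightly cleaned-up version:

    import numpy as np
    from sklearn.pipeline import Pipeline
    from sklearn.linear_model import SGDRegressor
    from sklearn.model_selection import RandomizedSearchCV, train_test_split
    from sklearn.preprocessing import StandardScaler
    from sklearn.datasets import make_regression

    # Synthetic data standing in for feature_data / target_data
    n_samples = 1000
    n_features = 50
    X, y = make_regression(n_samples=n_samples, n_features=n_features)
    X_train, X_test, y_train, y_test = train_test_split(X, y)

    # Scaling is part of the pipeline, so it is refit on every CV split
    pipe = Pipeline([('scaler', StandardScaler()), ('sgd', SGDRegressor())])

    # Every value is a sequence of candidates, including the single n_iter value.
    # Note: 'squared_loss' and n_iter are the names used by older scikit-learn
    # releases; newer ones use 'squared_error' and max_iter.
    parameters = {'sgd__loss': ['squared_loss', 'huber'],
                  'sgd__n_iter': [np.ceil(10**6 / n_samples)],
                  'sgd__alpha': 10.0**np.arange(1, 7)}

    g_search = RandomizedSearchCV(pipe, param_distributions=parameters, random_state=2)
    g_search.fit(X_train, y_train)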