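I am trying to run multiple linear regression on a dataset. The dataset is prepared and train_test_split is done, but when I try to fit the training data with a linear regressor, I get the following error:
I have also attached my code below. Please take a look and help me resolve this error.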
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
%matplotlib inline

dataset = pd.read_csv('50_Startups.csv')
dataset.head()

x = dataset.iloc[:,:-1]
y = dataset.iloc[:,:4]

states = pd.get_dummies(x['State'], drop_first=True)
states.head()

x = x.drop('State', axis=1)
x.head()
x = pd.concat([x, states], axis=1)

from sklearn.model_selection import train_test_split
x_train, X_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=0)

from sklearn.linear_model import LinearRegression
regressor = LinearRegression()
regressor.fit(x_train, y_train)
Answer:
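You are getting the error because you selected the wrong y (target) values. dataset.iloc[:,:4] is a slice, so it returns the first four columns as a DataFrame instead of the single Profit column. Selecting the target by name works: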
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
%matplotlib inline

dataset = pd.read_csv('50_Startups.csv')
dataset.head()

# Features: every column except the last one
x = dataset.iloc[:,:-1]
# Target: the Profit column only
y = dataset['Profit']

# One-hot encode State on the feature frame (not on the full dataset),
# so that Profit is not included among the features
x = pd.get_dummies(x, prefix=['State'])

from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=0)

from sklearn.linear_model import LinearRegression
regressor = LinearRegression()
regressor.fit(x_train, y_train)
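To see the difference between the two selections without the CSV at hand, here is a minimal sketch on a tiny made-up frame. The column names and their order (three numeric columns, then State, then Profit) are my assumption about the layout of 50_Startups.csv, and the numbers are invented purely for illustration.

```python
import pandas as pd

# Hypothetical stand-in for 50_Startups.csv: the column layout is assumed,
# and the values are invented for illustration only.
df = pd.DataFrame({
    "R&D Spend": [100.0, 200.0, 300.0, 400.0],
    "Administration": [50.0, 60.0, 70.0, 80.0],
    "Marketing Spend": [10.0, 20.0, 30.0, 40.0],
    "State": ["New York", "California", "Florida", "New York"],
    "Profit": [150.0, 260.0, 370.0, 480.0],
})

# A slice (":4") keeps the first four columns and returns a DataFrame.
wrong_y = df.iloc[:, :4]
print(wrong_y.shape)  # (4, 4)

# A single integer position returns one column as a Series.
right_y = df.iloc[:, 4]

# Selecting by name is equivalent here and does not depend on column order.
by_name = df["Profit"]

# Encode State on the features only; drop_first=True (as in the question)
# drops one dummy column per encoded variable.
features = pd.get_dummies(
    df.drop(columns="Profit"), columns=["State"], drop_first=True
)
print(list(features.columns))
```

Under that assumed layout, the sliced version still contains the text column State, while the integer and by-name selections both give the numeric Profit column.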