This is my data after loading the CSV file into a pandas DataFrame. I am using multivariate linear regression to make predictions.
       area  bedrooms  age   price
    0  2600       3.0   20  550000
    1  3000       4.0   15  565000
    2  3200       NaN   18  610000
    3  3600       3.0   30  595000
    4  4000       5.0    8  760000

    import pandas as pd
    import numpy as np
    from sklearn import linear_model
    import math

    df = pd.read_csv("/home/alie/Documents/house.csv",delimiter=",",converters={"price":int})
    d = math.floor(df['bedrooms'].mean())
    df.bedrooms = df.bedrooms.fillna(d)
    reg = linear_model.LinearRegression()
    df.columns = df.columns.str.strip().str.lower().str.replace(' ', '_').str.replace('(', '').str.replace(')', '')
    reg.fit(df[['area', 'bedrooms', 'age'],df.price])
When I run reg.fit, I get the error below. Any help resolving it would be appreciated.
    TypeError                                 Traceback (most recent call last)
    <ipython-input-51-05a6adc5f668> in <module>
          9 reg = linear_model.LinearRegression()
         10 df.columns = df.columns.str.strip().str.lower().str.replace(' ', '_').str.replace('(', '').str.replace(')', '')
    ---> 11 reg.fit(df[['area', 'bedrooms', 'age'],df.price])

    ~/anaconda3/lib/python3.7/site-packages/pandas/core/frame.py in __getitem__(self, key)
       2925             if self.columns.nlevels > 1:
       2926                 return self._getitem_multilevel(key)
    -> 2927             indexer = self.columns.get_loc(key)
       2928             if is_integer(indexer):
       2929                 indexer = [indexer]

    ~/anaconda3/lib/python3.7/site-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
       2655                                  'backfill or nearest lookups')
       2656             try:
    -> 2657                 return self._engine.get_loc(key)
       2658             except KeyError:
       2659                 return self._engine.get_loc(self._maybe_cast_indexer(key))

    pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

    pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

    TypeError: '(['area', 'bedrooms', 'age'], 0    550000
    1    565000
    2    610000
    3    595000
    4    760000
    Name: price, dtype: int64)' is an invalid key
Answer:
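An "invalid key" TypeError means the DataFrame was indexed with something it cannot use as a column key. Look at line 11: the closing square bracket is in the wrong place, so both the column list and df.price end up inside df[...] as a single tuple, and fit never receives a second argument. Close the column selection before the comma:

    reg.fit(df[['area', 'bedrooms', 'age']], df.price)

Note the double brackets: the inner list selects the three feature columns, and the outer pair is the indexing operator. Written this way, fit receives two separate objects, the feature DataFrame and the target Series.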
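To check the corrected call end to end, here is a minimal sketch. It assumes the five rows shown in the question and builds them inline instead of reading house.csv; the names X, y, new_house, and prediction, and the feature values used for the prediction, are made up for illustration.

```python
import math

import numpy as np
import pandas as pd
from sklearn import linear_model

# Sample rows copied from the question (built inline; the original reads a CSV).
df = pd.DataFrame({
    "area": [2600, 3000, 3200, 3600, 4000],
    "bedrooms": [3.0, 4.0, np.nan, 3.0, 5.0],
    "age": [20, 15, 18, 30, 8],
    "price": [550000, 565000, 610000, 595000, 760000],
})

# Fill the missing bedroom count with the floored column mean, as in the question.
df["bedrooms"] = df["bedrooms"].fillna(math.floor(df["bedrooms"].mean()))

# Double brackets select several columns and return a DataFrame (features).
X = df[["area", "bedrooms", "age"]]
# Single brackets select one column and return a Series (target).
y = df["price"]

reg = linear_model.LinearRegression()
reg.fit(X, y)  # two separate arguments: features, then target

# Hypothetical house to predict; column names must match the training features.
new_house = pd.DataFrame({"area": [3000], "bedrooms": [3], "age": [40]})
prediction = reg.predict(new_house)
print(prediction)
```

Keeping X and y in their own variables makes it harder to misplace a bracket, because the indexing and the fit call are no longer on the same line.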