saleprice_scaled = StandardScaler().fit_transform(df_train['SalePrice'][:,np.newaxis])
Why is newaxis used here? I know what newaxis does, but I can't see its purpose in this particular case.
Answer:
df_train['SalePrice'] is a pandas.Series (a vector, i.e. a 1D array) with shape (N,), where N is the number of elements.
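A quick way to confirm this is a minimal sketch with a small hypothetical df_train whose SalePrice values are made up purely for illustration (they are not from the real dataset):

```python
import pandas as pd

# Hypothetical stand-in for the real training data; the prices are made up
df_train = pd.DataFrame({"SalePrice": [200000, 180000, 220000, 140000]})

prices = df_train["SalePrice"]
print(type(prices).__name__)  # Series
print(prices.ndim)            # 1
print(prices.shape)           # (4,)  <- a single axis, no "column" axis
```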
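Since version 0.17, scikit-learn estimators and transformers expect 2D input of shape (n_samples, n_features); passing a 1D array (vector) was deprecated in 0.17 and raises a ValueError from 0.19 onward.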
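In that 2D layout each row is one sample and each column is one feature, and StandardScaler standardizes every column independently. A minimal sketch with arbitrary made-up numbers:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# 4 samples (rows) x 2 features (columns); the values are arbitrary
X = np.array([[1.0, 100.0],
              [2.0, 200.0],
              [3.0, 300.0],
              [4.0, 400.0]])

X_scaled = StandardScaler().fit_transform(X)

print(X_scaled.shape)         # (4, 2): same shape as the input
print(X_scaled.mean(axis=0))  # approximately [0, 0]: each column centred separately
print(X_scaled.std(axis=0))   # approximately [1, 1]: each column has unit (population) std
```

Both columns come out identical here because the second column is just the first multiplied by 100: standardization removes each feature's own location and scale.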
df_train['SalePrice'][:,np.newaxis] converts the 1D array (shape (N,)) into a 2D array with N rows and 1 column (shape (N, 1)).
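NumPy offers several equivalent spellings of this reshape. A minimal sketch; it assumes a recent pandas release (2.0 or later, as far as I know) in which multi-dimensional indexing such as series[:, np.newaxis] is no longer accepted directly on a Series, so the values are converted to a NumPy array first with to_numpy():

```python
import numpy as np
import pandas as pd

s = pd.Series([4, 7, 1, 7, 7], name="a")

# Convert to a plain NumPy array first; indexing the Series itself with
# [:, np.newaxis] is deprecated or removed in newer pandas versions
values = s.to_numpy()

col_newaxis = values[:, np.newaxis]  # insert a length-1 axis at position 1 -> column
col_none = values[:, None]           # np.newaxis is simply an alias for None
col_reshape = values.reshape(-1, 1)  # -1 lets NumPy infer the number of rows
row = values[np.newaxis, :]          # new axis first -> a single row instead

print(values.shape)       # (5,)
print(col_newaxis.shape)  # (5, 1)
print(row.shape)          # (1, 5)
print(np.array_equal(col_newaxis, col_reshape))  # True
```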
Demo:
In [21]: df = pd.DataFrame(np.random.randint(10, size=(5, 3)), columns=list('abc'))

In [22]: df
Out[22]:
   a  b  c
0  4  3  8
1  7  5  6
2  1  3  9
3  7  5  7
4  7  0  6

In [23]: from sklearn.preprocessing import StandardScaler

In [24]: df['a'].shape
Out[24]: (5,)    # <--- 1D array

In [25]: df['a'][:, np.newaxis].shape
Out[25]: (5, 1)  # <--- 2D array
pandas can do the same thing directly: selecting with a list of column labels returns a one-column DataFrame:
In [26]: df[['a']].shape
Out[26]: (5, 1)  # <--- 2D array

In [27]: StandardScaler().fit_transform(df[['a']])
Out[27]:
array([[-0.5 ],
       [ 0.75],
       [-1.75],
       [ 0.75],
       [ 0.75]])
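To check that Out[27] is ordinary standardization, this sketch rebuilds column a from the demo (values 4, 7, 1, 7, 7) and compares StandardScaler with the formula z = (x - mean) / std. StandardScaler uses the population standard deviation (ddof=0), which differs from the default of pandas' Series.std() (ddof=1). Series.to_frame() is shown as a second pandas way to get a one-column DataFrame.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Column 'a' copied from the demo DataFrame
df = pd.DataFrame({"a": [4, 7, 1, 7, 7]})

two_d_brackets = df[["a"]]        # list of labels -> DataFrame, shape (5, 1)
two_d_frame = df["a"].to_frame()  # Series -> one-column DataFrame, shape (5, 1)

scaled = StandardScaler().fit_transform(two_d_brackets)

# Manual z-score using the population standard deviation (ddof=0)
a = df["a"].to_numpy(dtype=float)
manual = (a - a.mean()) / a.std(ddof=0)

print(scaled.ravel())  # approximately [-0.5, 0.75, -1.75, 0.75, 0.75], as in Out[27]
print(np.allclose(scaled.ravel(), manual))  # True
```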
Here is what happens if we pass a 1D array:
In [28]: StandardScaler().fit_transform(df['a'])
C:\Users\Max\Anaconda4\lib\site-packages\sklearn\utils\validation.py:429: DataConversionWarning: Data with input dtype int32 was converted to float64 by StandardScaler.
  warnings.warn(msg, _DataConversionWarning)
C:\Users\Max\Anaconda4\lib\site-packages\sklearn\preprocessing\data.py:586: DeprecationWarning: Passing 1d arrays as data is deprecated in 0.17 and will raise ValueError in 0.19. Reshape your data either using X.reshape(-1, 1) if your data has a single feature or X.reshape(1, -1) if it contains a single sample.
  warnings.warn(DEPRECATION_MSG_1D, DeprecationWarning)
C:\Users\Max\Anaconda4\lib\site-packages\sklearn\preprocessing\data.py:649: DeprecationWarning: Passing 1d arrays as data is deprecated in 0.17 and will raise ValueError in 0.19. Reshape your data either using X.reshape(-1, 1) if your data has a single feature or X.reshape(1, -1) if it contains a single sample.
  warnings.warn(DEPRECATION_MSG_1D, DeprecationWarning)
Out[28]: array([-0.5 ,  0.75, -1.75,  0.75,  0.75])
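The warnings above announce a ValueError from 0.19 on and offer two reshape options. The sketch below tries both, assuming a current scikit-learn (0.19 or later). reshape(-1, 1) treats the values as one feature observed in 5 samples, which is what a single column like SalePrice needs. reshape(1, -1) treats them as one sample with 5 features, so each "feature" has a single value and zero variance; StandardScaler then leaves that feature's scale at 1 instead of dividing by zero, and every output is 0.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

a = np.array([4, 7, 1, 7, 7], dtype=float)

# Under scikit-learn 0.19+ a 1D array is rejected outright
try:
    StandardScaler().fit_transform(a)
    raised = False
except ValueError as exc:
    raised = True
    print("ValueError:", str(exc).splitlines()[0])

# One feature, five samples: the right choice for a single column
as_column = StandardScaler().fit_transform(a.reshape(-1, 1))

# One sample, five features: each feature is centred to exactly 0,
# and its zero variance leaves the scale at 1
as_row = StandardScaler().fit_transform(a.reshape(1, -1))

print(as_column.ravel())  # approximately [-0.5, 0.75, -1.75, 0.75, 0.75]
print(as_row)             # [[0. 0. 0. 0. 0.]]
```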