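I'm following this tutorial: http://ahmedbesbes.com/how-to-score-08134-in-titanic-kaggle-challenge.html
Everything went fine until I reached the last part of the middle section:
"As you can see, the features range in different intervals. Let's normalize all of them in the unit interval. All of them except the PassengerId that we need for the submission."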
In [48]:
    def scale_all_features():
        global combined
        features = list(combined.columns)
        features.remove('PassengerId')
        combined[features] = combined[features].apply(lambda x: x/x.max(), axis=0)
        print 'Features scaled successfully !'

In [49]:
    scale_all_features()
Features scaled successfully !
Even though I typed this code word for word into my Python script:
    #Cell 48
    GreatDivide.split()
    def scale_all_features():
        global combined
        features = list(combined.columns)
        features.remove('PassengerId')
        combined[features] = combined[features].apply(lambda x: x/x.max(), axis=0)
        print 'Features scaled successfully !'

    #Cell 49
    GreatDivide.split()
    scale_all_features()
it keeps raising this error:
    --------------------------------------------------48--------------------------------------------------
    --------------------------------------------------49--------------------------------------------------
    Traceback (most recent call last):
      File "KaggleTitanic[2-FE]--[01].py", line 350, in <module>
        scale_all_features()
      File "KaggleTitanic[2-FE]--[01].py", line 332, in scale_all_features
        combined[features] = combined[features].apply(lambda x: x/x.max(), axis=0)
      File "/usr/local/lib/python2.7/dist-packages/pandas/core/frame.py", line 4061, in apply
        return self._apply_standard(f, axis, reduce=reduce)
      File "/usr/local/lib/python2.7/dist-packages/pandas/core/frame.py", line 4157, in _apply_standard
        results[i] = func(v)
      File "KaggleTitanic[2-FE]--[01].py", line 332, in <lambda>
        combined[features] = combined[features].apply(lambda x: x/x.max(), axis=0)
      File "/usr/local/lib/python2.7/dist-packages/pandas/core/ops.py", line 651, in wrapper
        return left._constructor(wrap_results(na_op(lvalues, rvalues)),
      File "/usr/local/lib/python2.7/dist-packages/pandas/core/ops.py", line 592, in na_op
        result[mask] = op(x[mask], y)
    TypeError: ("unsupported operand type(s) for /: 'str' and 'str'", u'occurred at index Ticket')
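What is the problem here? The previous 49 cells all ran without trouble, so if something were wrong, it should have shown up earlier, right?
Answer:
You can use the following to make sure the arithmetic is applied only to numeric columns.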
    numeric_cols = combined.columns[combined.dtypes != 'object']
    combined.loc[:, numeric_cols] = combined[numeric_cols] / combined[numeric_cols].max()
There is no need for that apply call.
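The traceback says the failure occurred at index Ticket. That column still holds strings when the scaling cell runs, and the earlier cells never attempted arithmetic on it, which is why nothing failed before. The sketch below shows one way to list the offending columns and then scale only the numeric ones. It uses a small made-up frame standing in for the tutorial's `combined`, and the helper name `scale_numeric_features` and its `skip` parameter are hypothetical. It is written for Python 3 and a recent pandas, not the Python 2.7 setup in the traceback. Unlike the one-liner above, it also leaves PassengerId unscaled, as the tutorial intends.

```python
import pandas as pd

# Made-up stand-in for the tutorial's `combined` DataFrame (assumption).
combined = pd.DataFrame({
    "PassengerId": [1, 2, 3],
    "Age": [22.0, 38.0, 26.0],
    "Fare": [7.25, 71.28, 7.92],
    "Ticket": ["A/5 21171", "PC 17599", "STON/O2. 3101282"],
})

# Diagnose: which columns are not numeric and would break x / x.max()?
non_numeric = combined.select_dtypes(exclude=["number"]).columns.tolist()
print(non_numeric)


def scale_numeric_features(df, skip=("PassengerId",)):
    """Return a copy of df with each numeric column (except those in
    `skip`) divided by its own maximum. Hypothetical helper."""
    scaled = df.copy()
    numeric_cols = [
        col for col in scaled.select_dtypes(include=["number"]).columns
        if col not in skip
    ]
    # Column-wise division: each column is divided by its own max.
    scaled[numeric_cols] = scaled[numeric_cols] / scaled[numeric_cols].max()
    return scaled


scaled = scale_numeric_features(combined)
print(scaled)
```

Dividing by the maximum only maps a column into the unit interval when its values are non-negative and the maximum is non-zero. If the string columns are supposed to be features as well, they need to be encoded or dropped in an earlier step, so it is worth checking whether a tutorial cell that processes Ticket was skipped.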