This is part of the dataset:
a b c result
0 1 1 positive
0 0 1 negative
0 1 1 negative
0 0 0 positive

result = [1 if v == 'positive' else 0 for v in data['result'].tolist()]
Output = result
X = data[["a", "b", "c"]]
y = np.reshape(Output, (X.shape[0], 1))
I am trying to predict the class of X using cross-validation from sklearn. This part of the code works:
logreg = LogisticRegression(penalty='l2')
y_pred_class = cross_val_predict(logreg, X, y, cv=10, method='predict')
But when I try to compute the class probabilities with the following code:
y_pred_prob = cross_val_predict(logreg, X, y, cv=10, method='predict_proba')
I get this error:
index 1 is out of bounds for axis 1 with size 1
Do you know what the problem is?
Answer:
When you call method="predict", you get a warning:
/usr/local/lib/python3.8/dist-packages/sklearn/utils/validation.py:72: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().
  return f(**kwargs)
If you follow the warning's advice, the error with method="predict_proba" goes away as well. You only need to change this line
y = np.reshape(Output, (X.shape[0], 1))
to
y = np.reshape(Output, (X.shape[0],))
or even
y = np.array(result)
or skip the list comprehension entirely and stay in pandas:
y = data["result"].replace({"positive": 1, "negative": 0})
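
For reference, here is a minimal end-to-end sketch of the fix. The data is synthetic (100 random rows with the same column names as the question), since the full dataset is not shown, so treat it as an illustration rather than a reproduction. It also uses Series.map instead of replace and the default LogisticRegression() (which applies an L2 penalty); both are substitutions on my part, chosen to avoid warnings in newer pandas and scikit-learn versions.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict

# Synthetic stand-in for the question's data (assumption: same columns, 100 rows).
rng = np.random.default_rng(0)
n = 100
data = pd.DataFrame({
    "a": rng.integers(0, 2, size=n),
    "b": rng.integers(0, 2, size=n),
    "c": rng.integers(0, 2, size=n),
    # Alternate the labels so both classes have enough rows for cv=10.
    "result": np.where(np.arange(n) % 2 == 0, "positive", "negative"),
})

# Double brackets select several columns and return a DataFrame.
X = data[["a", "b", "c"]]

# Build y as a 1-D array of shape (n_samples,), not a column vector.
y = data["result"].map({"positive": 1, "negative": 0}).to_numpy()

logreg = LogisticRegression()  # the default penalty is L2

y_pred_class = cross_val_predict(logreg, X, y, cv=10, method="predict")
y_pred_prob = cross_val_predict(logreg, X, y, cv=10, method="predict_proba")

# One column per class; column 1 holds the probability of class 1 ("positive").
prob_positive = y_pred_prob[:, 1]
print(y.shape, y_pred_prob.shape)
```

If y already exists as a column vector, y.ravel() (as the warning suggests) gives the same 1-D shape.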