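Error when using a pandas DataFrame (data must be 1-dimensional)

I am trying to produce conformal predictions for a model using my own data, but I get the following error at icp.calibrate: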

Exception: Data must be 1-dimensional

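Below you can find the most recent traceback for this issue. Unfortunately, I am not sure what it actually means for my code. I am using pandas DataFrames to hold the data.

Code: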

    from sklearn.tree import DecisionTreeRegressor
    from nonconformist.cp import IcpRegressor
    from nonconformist.base import RegressorAdapter
    from nonconformist.nc import RegressorNc, AbsErrorErrFunc, RegressorNormalizer, NcFactory
    from sklearn.model_selection import train_test_split
    import numpy as np
    import pandas as pd

    # -----------------------------------------------------------------------------
    # Setup training, calibration and test data
    # -----------------------------------------------------------------------------
    df = pd.read_csv("prepared_data.csv")

    # Initial split into train/test data
    train = df.loc[df['split'] == 'train']
    valid = df.loc[df['split'] == 'valid']

    # Proper Validation Set (Split the Validation set into features and target)
    X_valid = valid.drop(['expression'], axis=1)
    y_valid = valid.drop(columns=['new_host', 'split', 'sequence'])

    # Create Training Set (Split the Training set into features and target)
    X_train = valid.drop(['expression'], axis=1)
    y_train = valid.drop(columns=['new_host', 'split', 'sequence'])

    # Split Training set into further training set and calibration set
    X_train, X_cal, y_train, y_cal = train_test_split(X_train, y_train, test_size=0.2)

    # -----------------------------------------------------------------------------
    # Train and calibrate underlying model
    # -----------------------------------------------------------------------------
    underlying_model = RegressorAdapter(DecisionTreeRegressor(min_samples_leaf=5))
    print("Underlying model loaded")
    model = RegressorAdapter(underlying_model)
    nc = RegressorNc(model, AbsErrorErrFunc())
    print("Nonconformity Function Applied")
    icp = IcpRegressor(nc)  # Create an inductive conformal Regressor
    print("ICP Regressor Created")

    # Dataset Review
    print('{} instances, {} features, {} classes'.format(y_train.size,
                                                       X_train.shape[1],
                                                       np.unique(y_train).size))
    icp.fit(X_train, y_train)
    icp.calibrate(X_cal, y_cal)

Sample DataFrame:

    new_host  split  sequence   expression
    FALSE     train  AQVPYGVS   0.039267878
    FALSE     train  ASVPYGVSI  0.039267878
    FALSE     train  STNLYGSGR  0.261456561
    FALSE     valid  NLYGSGLVR  0.265188519
    FALSE     valid  SLGPSNLYG  0.419680588
    FALSE     valid  ATSLGTTNG  0.145710993

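I have tried several ways of splitting the dataset, but the problem persists. Here, I want to split the data into training and test sets based on each observation's value in the split column. Then, in a second step, I split the training set further into a training set and a calibration set, where X_train holds my features and y_train my target.

Traceback: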

    Traceback (most recent call last)
    <ipython-input-68-083e5dd0b0b6> in <module>
          4 print(type(y_cal))
          5 print(y_cal.index)
    ----> 6 icp.calibrate(X_cal, y_cal)
          7 print("ICP Calibrated")

    ~/.local/lib/python3.8/site-packages/nonconformist/icp.py in calibrate(self, x, y, increment)
        102                 else:
        103                         self.categories = np.array([0])
    --> 104                         cal_scores = self.nc_function.score(self.cal_x, self.cal_y)
        105                         self.cal_scores = {0: np.sort(cal_scores)[::-1]}
        106

    ~/.local/lib/python3.8/site-packages/nonconformist/nc.py in score(self, x, y)
        370                         norm = np.ones(n_test)
        371
    --> 372                 return self.err_func.apply(prediction, y) / norm
        373
        374

    ~/.local/lib/python3.8/site-packages/nonconformist/nc.py in apply(self, prediction, y)
        156
        157         def apply(self, prediction, y):
    --> 158                 return np.abs(prediction - y)
        159
        160         def apply_inverse(self, nc, significance):

    ~/.local/lib/python3.8/site-packages/pandas/core/series.py in __array_ufunc__(self, ufunc, method, *inputs, **kwargs)
        633
        634         # for binary ops, use our custom dunder methods
    --> 635         result = ops.maybe_dispatch_ufunc_to_dunder_op(
        636             self, ufunc, method, *inputs, **kwargs
        637         )

    pandas/_libs/ops_dispatch.pyx in pandas._libs.ops_dispatch.maybe_dispatch_ufunc_to_dunder_op()

    ~/.local/lib/python3.8/site-packages/pandas/core/ops/common.py in new_method(self, other)
         62         other = item_from_zerodim(other)
         63
    ---> 64         return method(self, other)
         65
         66     return new_method

    ~/.local/lib/python3.8/site-packages/pandas/core/ops/__init__.py in wrapper(left, right)
        503         result = arithmetic_op(lvalues, rvalues, op, str_rep)
        504
    --> 505         return _construct_result(left, result, index=left.index, name=res_name)
        506
        507     wrapper.__name__ = op_name

    ~/.local/lib/python3.8/site-packages/pandas/core/ops/__init__.py in _construct_result(left, result, index, name)
        476     # We do not pass dtype to ensure that the Series constructor
        477     #  does inference in the case where `result` has object-dtype.
    --> 478     out = left._constructor(result, index=index)
        479     out = out.__finalize__(left)
        480

    ~/.local/lib/python3.8/site-packages/pandas/core/series.py in __init__(self, data, index, dtype, name, copy, fastpath)
        303                     data = data.copy()
        304             else:
    --> 305                 data = sanitize_array(data, index, dtype, copy, raise_cast_failure=True)
        306
        307                 data = SingleBlockManager(data, index, fastpath=True)

    ~/.local/lib/python3.8/site-packages/pandas/core/construction.py in sanitize_array(data, index, dtype, copy, raise_cast_failure)
        480     elif subarr.ndim > 1:
        481         if isinstance(data, np.ndarray):
    --> 482             raise Exception("Data must be 1-dimensional")
        483         else:
        484             subarr = com.asarray_tuplesafe(data, dtype=dtype)

    Exception: Data must be 1-dimensional

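Answer:

pandas.DataFrame.drop() returns a pandas.DataFrame object, which is inherently two-dimensional. So when you assign y_train = valid.drop(...), you still have a two-dimensional object, even though it contains only one column. A pandas.Series, on the other hand, is one-dimensional, and you get one by selecting a specific column (i.e. valid['expression'] returns a one-dimensional pandas.Series).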
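As a quick check of the difference, here is a minimal sketch that rebuilds the three valid rows from the sample data frame by hand and compares the two ways of selecting the target. The variable names y_2d and y_1d are made up for this illustration and do not appear in the question.

```python
import pandas as pd

# The three 'valid' rows from the sample data frame, rebuilt by hand.
valid = pd.DataFrame({
    "new_host": [False, False, False],
    "split": ["valid", "valid", "valid"],
    "sequence": ["NLYGSGLVR", "SLGPSNLYG", "ATSLGTTNG"],
    "expression": [0.265188519, 0.419680588, 0.145710993],
})

# Dropping every other column still leaves a DataFrame (2-D, one column).
y_2d = valid.drop(columns=["new_host", "split", "sequence"])

# Selecting the column by name gives a Series (1-D).
y_1d = valid["expression"]

print(type(y_2d).__name__, y_2d.ndim, y_2d.shape)  # DataFrame 2 (3, 1)
print(type(y_1d).__name__, y_1d.ndim, y_1d.shape)  # Series 1 (3,)
```

Printing .ndim or .shape on y_cal just before icp.calibrate should show in the same way whether the target is still two-dimensional.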

Changing y_train = valid.drop(...) to y_train = valid['expression'] should fix it.

Also, FYI, you are using the valid data frame to build X_train and y_train (I assume you meant to use the train data frame).
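Putting both points together, the data setup might look roughly like the sketch below. It is only a sketch under a few assumptions: the six sample rows stand in for prepared_data.csv, random_state=0 is added just to make the split repeatable, and the nonconformist model code is left out. The string columns (split, sequence) would presumably still need to be dropped or encoded before a decision tree can be fitted on them.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Stand-in for pd.read_csv("prepared_data.csv"), using the sample rows.
df = pd.DataFrame({
    "new_host": [False] * 6,
    "split": ["train", "train", "train", "valid", "valid", "valid"],
    "sequence": ["AQVPYGVS", "ASVPYGVSI", "STNLYGSGR",
                 "NLYGSGLVR", "SLGPSNLYG", "ATSLGTTNG"],
    "expression": [0.039267878, 0.039267878, 0.261456561,
                   0.265188519, 0.419680588, 0.145710993],
})

train = df.loc[df["split"] == "train"]
valid = df.loc[df["split"] == "valid"]

# Features stay a DataFrame; the target is selected as a 1-D Series.
X_train = train.drop(columns=["expression"])  # built from train, not valid
y_train = train["expression"]
X_valid = valid.drop(columns=["expression"])
y_valid = valid["expression"]

# Split the training data further into proper training and calibration sets.
# random_state is an assumption added here for reproducibility.
X_train, X_cal, y_train, y_cal = train_test_split(
    X_train, y_train, test_size=0.2, random_state=0
)

print(y_train.ndim, y_cal.ndim, y_valid.ndim)
```

With the targets selected this way, y_cal reaches icp.calibrate as a one-dimensional Series rather than a one-column DataFrame.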
