I am trying to use my own data to produce conformal predictions for a model, but I get the following error at icp.calibrate:
Exception: Data must be 1-dimensional
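Below you can find the most recent traceback for this error. Unfortunately, I am not sure what it actually means in terms of my code. I am working with a pandas DataFrame.
Code: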
from sklearn.tree import DecisionTreeRegressor
from nonconformist.cp import IcpRegressor
from nonconformist.base import RegressorAdapter
from nonconformist.nc import RegressorNc, AbsErrorErrFunc, RegressorNormalizer, NcFactory
from sklearn.model_selection import train_test_split
import numpy as np
import pandas as pd

# -----------------------------------------------------------------------------
# Setup training, calibration and test data
# -----------------------------------------------------------------------------
df = pd.read_csv ("prepared_data.csv")

# Initial split into train/test data
train = df.loc[df['split']== 'train']
valid = df.loc[df['split']== 'valid']

# Proper Validation Set (Split the Validation set into features and target)
X_valid = valid.drop(['expression'], axis = 1)
y_valid = valid.drop(columns = ['new_host', 'split', 'sequence'])

# Create Training Set (Split the Training set into features and target)
X_train = valid.drop(['expression'], axis = 1)
y_train = valid.drop(columns = ['new_host', 'split', 'sequence'])

# Split Training set into further training set and calibration set
X_train, X_cal, y_train, y_cal = train_test_split(X_train, y_train, test_size =0.2)

# -----------------------------------------------------------------------------
# Train and calibrate underlying model
# -----------------------------------------------------------------------------
underlying_model = RegressorAdapter(DecisionTreeRegressor(min_samples_leaf=5))
print("Underlying model loaded")
model = RegressorAdapter(underlying_model)
nc = RegressorNc(model, AbsErrorErrFunc())
print("Nonconformity Function Applied")
icp = IcpRegressor(nc) # Create an inductive conformal Regressor
print("ICP Regressor Created")

#Dataset Review
print('{} instances, {} features, {} classes'.format(y_train.size, X_train.shape[1], np.unique(y_train).size))
icp.fit(X_train, y_train)
icp.calibrate(X_cal, y_cal)
#Example dataframe
new_host  split  sequence   expression
FALSE     train  AQVPYGVS   0.039267878
FALSE     train  ASVPYGVSI  0.039267878
FALSE     train  STNLYGSGR  0.261456561
FALSE     valid  NLYGSGLVR  0.265188519
FALSE     valid  SLGPSNLYG  0.419680588
FALSE     valid  ATSLGTTNG  0.145710993
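I have tried several ways of splitting the dataset, but the problem persists. In this case, I want to split the data into training and test sets according to each observation's value in the split column. After that, in a second step, I split the training set further into a training set and a calibration set, where my features are X_train and my target is y_train.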
#Traceback error
Traceback (most recent call last)
<ipython-input-68-083e5dd0b0b6> in <module>
      4 print(type(y_cal))
      5 print(y_cal.index)
----> 6 icp.calibrate(X_cal, y_cal)
      7 print("ICP Calibrated")

~/.local/lib/python3.8/site-packages/nonconformist/icp.py in calibrate(self, x, y, increment)
    102 else:
    103 self.categories = np.array([0])
--> 104 cal_scores = self.nc_function.score(self.cal_x, self.cal_y)
    105 self.cal_scores = {0: np.sort(cal_scores)[::-1]}
    106

~/.local/lib/python3.8/site-packages/nonconformist/nc.py in score(self, x, y)
    370 norm = np.ones(n_test)
    371
--> 372 return self.err_func.apply(prediction, y) / norm
    373
    374

~/.local/lib/python3.8/site-packages/nonconformist/nc.py in apply(self, prediction, y)
    156
    157 def apply(self, prediction, y):
--> 158 return np.abs(prediction - y)
    159
    160 def apply_inverse(self, nc, significance):

~/.local/lib/python3.8/site-packages/pandas/core/series.py in __array_ufunc__(self, ufunc, method, *inputs, **kwargs)
    633
    634 # for binary ops, use our custom dunder methods
--> 635 result = ops.maybe_dispatch_ufunc_to_dunder_op(
    636 self, ufunc, method, *inputs, **kwargs
    637 )

pandas/_libs/ops_dispatch.pyx in pandas._libs.ops_dispatch.maybe_dispatch_ufunc_to_dunder_op()

~/.local/lib/python3.8/site-packages/pandas/core/ops/common.py in new_method(self, other)
     62 other = item_from_zerodim(other)
     63
---> 64 return method(self, other)
     65
     66 return new_method

~/.local/lib/python3.8/site-packages/pandas/core/ops/__init__.py in wrapper(left, right)
    503 result = arithmetic_op(lvalues, rvalues, op, str_rep)
    504
--> 505 return _construct_result(left, result, index=left.index, name=res_name)
    506
    507 wrapper.__name__ = op_name

~/.local/lib/python3.8/site-packages/pandas/core/ops/__init__.py in _construct_result(left, result, index, name)
    476 # We do not pass dtype to ensure that the Series constructor
    477 # does inference in the case where `result` has object-dtype.
--> 478 out = left._constructor(result, index=index)
    479 out = out.__finalize__(left)
    480

~/.local/lib/python3.8/site-packages/pandas/core/series.py in __init__(self, data, index, dtype, name, copy, fastpath)
    303 data = data.copy()
    304 else:
--> 305 data = sanitize_array(data, index, dtype, copy, raise_cast_failure=True)
    306
    307 data = SingleBlockManager(data, index, fastpath=True)

~/.local/lib/python3.8/site-packages/pandas/core/construction.py in sanitize_array(data, index, dtype, copy, raise_cast_failure)
    480 elif subarr.ndim > 1:
    481 if isinstance(data, np.ndarray):
--> 482 raise Exception("Data must be 1-dimensional")
    483 else:
    484 subarr = com.asarray_tuplesafe(data, dtype=dtype)

Exception: Data must be 1-dimensional
Answer:
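pandas.DataFrame.drop() returns a pandas.DataFrame object, which is inherently two-dimensional. So when you assign y_train = valid.drop(), you still have a two-dimensional array (albeit with only one column). A pandas.Series object, on the other hand, is one-dimensional, and you can get a pandas.Series by referencing a specific column (i.e. valid['expression'] returns a one-dimensional pandas.Series).
Changing y_train = valid.drop() to y_train = valid['expression'] should fix it.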
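To make the difference concrete, here is a minimal sketch that compares the two ways of extracting the target. The small `valid` frame is a hypothetical stand-in built from the sample rows in the question, not the real prepared_data.csv.

```python
import pandas as pd

# Hypothetical stand-in for the `valid` rows of prepared_data.csv
valid = pd.DataFrame({
    "new_host": [False, False, False],
    "split": ["valid", "valid", "valid"],
    "sequence": ["NLYGSGLVR", "SLGPSNLYG", "ATSLGTTNG"],
    "expression": [0.265188519, 0.419680588, 0.145710993],
})

# drop() returns a DataFrame, so the result is 2-D even with one column left
y_2d = valid.drop(columns=["new_host", "split", "sequence"])

# Selecting the column by name returns a 1-D Series
y_1d = valid["expression"]

print(type(y_2d).__name__, y_2d.shape)  # DataFrame (3, 1)
print(type(y_1d).__name__, y_1d.shape)  # Series (3,)
```

The (3, 1) shape is what ends up inside `np.abs(prediction - y)` in the traceback, whereas the (3,) Series lines up with a one-dimensional array of predictions.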
Also, FYI, you are using the valid dataframe for both X_train and y_train (I assume you meant to use the train dataframe).
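Putting both points together, the data preparation might look roughly like the sketch below. It assumes the column names from the sample dataframe; the inline `df` is a hypothetical replacement for `pd.read_csv("prepared_data.csv")`, and `random_state=0` is added only to make the split repeatable.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical stand-in for prepared_data.csv, built from the sample rows
df = pd.DataFrame({
    "new_host": [False] * 6,
    "split": ["train"] * 3 + ["valid"] * 3,
    "sequence": ["AQVPYGVS", "ASVPYGVSI", "STNLYGSGR",
                 "NLYGSGLVR", "SLGPSNLYG", "ATSLGTTNG"],
    "expression": [0.039267878, 0.039267878, 0.261456561,
                   0.265188519, 0.419680588, 0.145710993],
})

train = df.loc[df["split"] == "train"]
valid = df.loc[df["split"] == "valid"]

# Features stay a DataFrame (2-D); the target is a single column (1-D Series)
X_train = train.drop(columns=["expression"])  # train, not valid
y_train = train["expression"]
X_valid = valid.drop(columns=["expression"])
y_valid = valid["expression"]

# Split the training rows further into proper training and calibration sets
X_train, X_cal, y_train, y_cal = train_test_split(
    X_train, y_train, test_size=0.2, random_state=0
)

print(y_train.ndim, y_cal.ndim)  # 1 1
```

Note that the feature frames here still contain string columns (split, sequence), which would need to be dropped or encoded before a DecisionTreeRegressor can be fitted; that step is outside what the answer covers.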