I am trying to split my data into training and validation sets, using the train_test_split function from the cuml.preprocessing.model_selection module.
But I get an error:
    ---------------------------------------------------------------------------
    AttributeError                            Traceback (most recent call last)
    <ipython-input-317-4e11838456ea> in <module>
    ----> 1 X_train, X_test, y_train, y_test = train_test_split(train_dfIF,train_y, test_size=0.20, random_state=42)

    /opt/conda/lib/python3.7/site-packages/cuml/preprocessing/model_selection.py in train_test_split(X, y, test_size, train_size, shuffle, random_state, seed, stratify)
        454         X_train = X.iloc[0:train_size]
        455         if y is not None:
    --> 456             y_train = y.iloc[0:train_size]
        457
        458         if hasattr(X, "__cuda_array_interface__") or \

    AttributeError: 'cupy.core.core.ndarray' object has no attribute 'iloc'
This happens even though I am not using iloc anywhere in my own code.
Here is my code:
    from cuml.preprocessing.model_selection import train_test_split

    X_train, X_test, y_train, y_test = train_test_split(train_dfIF, train_y, test_size=0.20, random_state=42)
Here train_dfIF is a cudf DataFrame and train_y is a cupy array.
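Answer:
You cannot (currently) pass an array as the y argument if your X argument is a DataFrame. As the traceback shows, when X is a DataFrame the function slices y with .iloc, and a cupy array has no such attribute. I suggest passing either two DataFrames or two arrays, rather than one DataFrame and one array.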
    from cuml.preprocessing.model_selection import train_test_split
    import cudf
    import cupy as cp

    df = cudf.DataFrame({
        "a": range(5),
        "b": range(5)
    })
    y = cudf.Series(range(5))

    # train_test_split(df, y.values, test_size=0.20, random_state=42)  # fail
    X_train, X_test, y_train, y_test = train_test_split(df, y, test_size=0.20, random_state=42)  # succeed
    X_train, X_test, y_train, y_test = train_test_split(df.values, y.values, test_size=0.20, random_state=42)  # succeed
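If you want the mismatch to surface with a clearer message before the split is attempted, a small guard can help. The following is only a sketch: check_matching_inputs is a hypothetical helper, not part of cuML. It assumes that "DataFrame-like" can be detected by the presence of an .iloc attribute, which is the attribute the traceback shows the split relying on.

```python
def check_matching_inputs(X, y):
    """Hypothetical guard (not part of cuML).

    Raises TypeError when exactly one of X and y exposes `.iloc`,
    i.e. when a DataFrame/Series-like object is mixed with a plain array.
    Returns None when both inputs are of the same kind.
    """
    x_is_frame = hasattr(X, "iloc")
    y_is_frame = hasattr(y, "iloc")
    if x_is_frame != y_is_frame:
        raise TypeError(
            "X and y must both be DataFrame/Series-like or both be arrays; "
            f"got {type(X).__name__} and {type(y).__name__}"
        )


# Possible usage with the objects from the question (requires a GPU and
# cudf/cuml, so it is left as comments here):
#
#   check_matching_inputs(train_dfIF, train_y)   # raises TypeError
#   train_y = cudf.Series(train_y)               # wrap the cupy array
#   check_matching_inputs(train_dfIF, train_y)   # passes
```

For the data in the question, either wrapping train_y in a cudf.Series or passing train_dfIF.values together with train_y should give the function two inputs of the same kind, matching the two succeeding calls above.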