I'm trying to read data from a TSV file for use with Hyperas, but no matter what I do, I seem to get the same error:
    Traceback (most recent call last):
      File "/path/to/cnn_search.py", line 233, in <module>
        trials=trials)
      File "~/miniconda3/lib/python3.6/site-packages/hyperas/optim.py", line 67, in minimize
        verbose=verbose)
      File "~/miniconda3/lib/python3.6/site-packages/hyperas/optim.py", line 133, in base_minimizer
        return_argmin=True),
      File "~/miniconda3/lib/python3.6/site-packages/hyperopt/fmin.py", line 312, in fmin
        return_argmin=return_argmin,
      File "~/miniconda3/lib/python3.6/site-packages/hyperopt/base.py", line 635, in fmin
        return_argmin=return_argmin)
      File "~/miniconda3/lib/python3.6/site-packages/hyperopt/fmin.py", line 325, in fmin
        rval.exhaust()
      File "~/miniconda3/lib/python3.6/site-packages/hyperopt/fmin.py", line 204, in exhaust
        self.run(self.max_evals - n_done, block_until_done=self.async)
      File "~/miniconda3/lib/python3.6/site-packages/hyperopt/fmin.py", line 178, in run
        self.serial_evaluate()
      File "~/miniconda3/lib/python3.6/site-packages/hyperopt/fmin.py", line 97, in serial_evaluate
        result = self.domain.evaluate(spec, ctrl)
      File "~/miniconda3/lib/python3.6/site-packages/hyperopt/base.py", line 840, in evaluate
        rval = self.fn(pyll_rval)
      File "~/temp_model.py", line 218, in keras_fmin_fnct
    AttributeError: 'list' object has no attribute 'shape'
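Based on other questions I've seen, this error comes from passing a plain Python list where a NumPy array is expected. So I tried converting the TSV data I read into NumPy arrays at every step: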
    from hyperas import optim
    ...
    import numpy as np
    import csv

    def data():
        dataPath="/path/to/fm.labeled.10m.txt"
        X = []
        Y = []
        with open(dataPath) as dP:
            reader = csv.reader(dP, delimiter="\t")
            for row in reader:
                # Skip the first two columns; the last column is the label
                X.append(np.array(row[2:-1]))
                # Label
                Y.append(row[-1])
        encoder = LabelBinarizer()
        Y_categorical = encoder.fit_transform(Y)
        # Split the data into training and test sets
        X_train, X_test, Y_train, Y_test = train_test_split(X, Y_categorical, test_size=0.25)
        X_train_np = np.array(X_train)
        X_test_np = np.array(X_test)
        Y_train_np = np.array([np.array(y) for y in Y_train])
        Y_test_np = np.array([np.array(y) for y in Y_test])
        return X_train_np, Y_train_np, X_test_np, Y_test_np
    ...
    trials = Trials()
    best_run, best_model = optim.minimize(model=model_name, data=data, algo=tpe.suggest, max_evals=numRuns, trials=trials)
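I also suspect there is a more efficient way to do this that doesn't create so many intermediate arrays, which would be welcome because I'll be reading millions of rows.
What am I doing wrong?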
Edit: The Hyperopt wiki describes Trials.
Answer:
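Have you considered using np.genfromtxt('your_file.tsv')? It works well for reading CSV and TSV data, and I've had good experiences with it recently. Also, if you need a more detailed answer, you should provide more information about your specific problem (data types, layout, and so on).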
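As a rough sketch of the genfromtxt suggestion, the following assumes the layout implied by the question: tab-separated rows where the first two columns are skipped, the middle columns are numeric features, and the last column is a text label. The helper name load_tsv and the choice of float32 are illustrative assumptions, not part of the original code.

```python
import numpy as np


def load_tsv(path):
    """Hypothetical helper: load features and labels from a TSV file.

    Assumed layout: columns 0-1 are ignored, the last column is the
    label, and everything in between is a numeric feature.
    """
    # Read every field as a string, since the label column is assumed
    # to be non-numeric and genfromtxt would otherwise turn it into NaN.
    raw = np.genfromtxt(path, delimiter="\t", dtype=str)
    # A file with a single row comes back 1-D; force a 2-D table.
    raw = np.atleast_2d(raw)
    # Slice the feature columns and convert them to floats in one step.
    X = raw[:, 2:-1].astype(np.float32)
    # Keep the labels as strings for a label encoder to consume later.
    y = raw[:, -1]
    return X, y
```

Calling X, y = load_tsv("/path/to/fm.labeled.10m.txt") yields two NumPy arrays, so X.shape is available without building per-row arrays by hand. Note that this approach still holds the whole file in memory as a string array before conversion, so for millions of rows it may not be the most memory-efficient option.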