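I'm still new to machine learning and am trying to shuffle the datasets X_Train and Y_Train so that I can build mini-batches from them later. Shuffling X_Train works fine, but shuffling Y_Train always gives me this error: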
can only concatenate list (not "int") to list
import numpy as np
import math

def create_datasets():
    dataset = np.genfromtxt('winequality-white.csv', delimiter=';')
    dataset = dataset[1:, :]
    X_Test = dataset[:64, :-1]
    X_Train = dataset[64:, :-1]
    m = X_Train.shape[0]
    Y_Test = dataset[:64, -1:]
    Y_Train = dataset[64:, -1:].reshape(m, 1)
    return X_Train, Y_Train, X_Test, Y_Test, m

def shuffling(X_Train, Y_Train, m, minibatch_size):
    permutation = list(np.random.permutation(m))
    shuffled_X = X_Train[permutation, :].T
    shuffled_Y = Y_Train[permutation, :]
    n_comp_minibatches = math.floor(m / minibatch_size)  # Total n. of minibatches with 64 elements
    minibatches = []
Can anyone tell me what is going wrong?
Answer:
Your code looks fine to me. Here is a complete version:
import math
import numpy as np

def shuffling(X_Train, Y_Train, m, minibatch_size):
    # Apply the same random permutation to X and Y so the rows stay paired.
    permutation = list(np.random.permutation(m))
    shuffled_X = X_Train[permutation, :]
    shuffled_Y = Y_Train[permutation, :]
    # Number of complete mini-batches (each with minibatch_size rows).
    n_comp_minibatches = math.floor(m / minibatch_size)
    minibatches = [(shuffled_X[i*minibatch_size:(i+1)*minibatch_size],
                    shuffled_Y[i*minibatch_size:(i+1)*minibatch_size])
                   for i in range(n_comp_minibatches)]
    return minibatches
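If you also want to include the last, incomplete mini-batch, just use range(n_comp_minibatches + 1) in the list comprehension (when m is an exact multiple of minibatch_size, that extra batch will be empty). Also, it is usually more convenient to work with data in [rows, features] layout than in [features, rows] layout, which is why I skipped the transpose, but that is up to you.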
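As a rough sketch of the partial-batch variant, the version below uses math.ceil for the batch count, so the final short batch is included and no empty batch appears when m divides evenly. The name make_minibatches, the rng parameter, and the toy arrays are my own assumptions for illustration, not part of the original code; it also assumes the [rows, features] layout described above.

```python
import math
import numpy as np

def make_minibatches(X, Y, minibatch_size, rng=None):
    """Shuffle X and Y with one shared permutation and split them into batches.

    Assumes rows are samples: X has shape (m, n_features), Y has shape (m, 1).
    (Hypothetical helper, written only to illustrate the idea.)
    """
    rng = np.random.default_rng() if rng is None else rng
    m = X.shape[0]
    permutation = rng.permutation(m)  # kept as a NumPy array, not a list
    shuffled_X = X[permutation]
    shuffled_Y = Y[permutation]
    # ceil counts the trailing partial batch, if there is one.
    n_batches = math.ceil(m / minibatch_size)
    return [
        (shuffled_X[i * minibatch_size:(i + 1) * minibatch_size],
         shuffled_Y[i * minibatch_size:(i + 1) * minibatch_size])
        for i in range(n_batches)
    ]

# Toy data: 10 samples, 2 features; the label of row i is i.
X = np.arange(20).reshape(10, 2)
Y = np.arange(10).reshape(10, 1)
batches = make_minibatches(X, Y, minibatch_size=4, rng=np.random.default_rng(0))
print([bx.shape[0] for bx, _ in batches])  # batch sizes: 4, 4, 2
```

Because X and Y are indexed with the same permutation, each row of a batch's X still lines up with the matching row of its Y.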