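This is my training data, and I want to use the Keras library to predict 'y' from X_data. I have run into errors many times. I know it has to do with the shape of the data, but I have been stuck for a while. I hope you can help.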
X_data =

    0     [construction, materials, labour, charges, con...
    1     [catering, catering, lunch]
    2     [passenger, transport, local, transport, passe...
    3     [goods, transport, road, transport, goods, inl...
    4     [rental, rental, aircrafts]
    5     [supporting, transport, cargo, handling, agenc...
    6     [postal, courier, postal, courier, local, deli...
    7     [electricity, charges, reimbursement, electric...
    8     [facility, management, facility, management, p...
    9     [leasing, leasing, aircrafts]
    10    [professional, technical, business, selling, s...
    11    [telecommunications, broadcasting, information...
    12    [support, personnel, search, contract, tempora...
    13    [maintenance, repair, installation, maintenanc...
    14    [manufacturing, physical, inputs, owned, other...
    15    [accommodation, hotel, accommodation, hotel, i...
    16    [leasing, rental, leasing, renting, motor, veh...
    17    [real, estate, rental, leasing, involving, pro...
    18    [rental, transport, vehicles, rental, road, ve...
    19    [cleaning, sanitary, pad, vending, machine]
    20    [royalty, transfer, use, ip, intellectual, pro...
    21    [legal, accounting, legal, accounting, legal, ...
    22    [veterinary, clinic, health, care, relation, a...
    23    [human, health, social, care, inpatient, medic...
    Name: Data, dtype: object
This is my training target:
y =

    0      1
    1      1
    2      1
    3      1
    4      1
    5      1
    6      1
    7      1
    8      1
    9      1
    10     1
    11     1
    12     1
    13     1
    14     1
    15    10
    16     2
    17    10
    18     2
    19     2
    20    10
    21    10
    22    10
    23    10
I am using this model:
    top_words = 5000
    length = len(X_data)
    embedding_vecor_length = 32
    model = Sequential()
    model.add(Embedding(embedding_vecor_length, top_words, input_length=length))
    model.add(LSTM(100))
    model.add(Dense(1, activation='sigmoid'))
    model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
    print(model.summary())
    model.fit(X_data, y, epochs=3, batch_size=32)

    ValueError: Error when checking input: expected embedding_8_input to have shape (None, 24) but got array with shape (24, 1)
What is wrong with using this data in this model? I want to use the input X_data to predict 'y'.
Answer:
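You need to convert your pandas dataframe to NumPy arrays. The arrays will be ragged (rows of different lengths), so you need to pad them. You also need to set up a word-vector dictionary, because you cannot pass words directly to a neural network. Some examples are here, here, and here. You will need to do your own research; there is not much that can be done with the data sample you provided.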
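The two preprocessing steps above (mapping words to integers, then padding) can be sketched in plain Python as follows. This is only an illustration under assumptions: `docs` is a hypothetical stand-in for a few rows of X_data, and `build_vocab` and `encode_and_pad` are made-up helper names, not Keras functions. Older Keras versions ship `Tokenizer` and `pad_sequences` utilities that serve the same purpose.

```python
# Hypothetical sample standing in for a few rows of X_data
docs = [
    ["catering", "catering", "lunch"],
    ["rental", "rental", "aircrafts"],
    ["cleaning", "sanitary", "pad", "vending", "machine"],
]

def build_vocab(token_lists):
    """Map each distinct word to an integer index, reserving 0 for padding."""
    vocab = {}
    for tokens in token_lists:
        for word in tokens:
            if word not in vocab:
                vocab[word] = len(vocab) + 1
    return vocab

def encode_and_pad(token_lists, vocab, max_len):
    """Replace words by their indices, then left-pad (or truncate) to max_len."""
    rows = []
    for tokens in token_lists:
        ids = [vocab[word] for word in tokens][-max_len:]
        rows.append([0] * (max_len - len(ids)) + ids)
    return rows

vocab = build_vocab(docs)
max_len = max(len(tokens) for tokens in docs)
X = encode_and_pad(docs, vocab, max_len)

for row in X:
    print(row)
```

After this step every row has the same length, so the list can be turned into a rectangular 2-D NumPy array, which is the kind of input an Embedding layer expects.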
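    length = len(X_data)

is the number of data samples you have. Keras does not care about that here; it wants to know how many words each input contains (this must be the same for every sample, which is why padding was mentioned above).
So what you pass to the network is the number of columns you have: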
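    # Assuming you have correctly converted X_data to a NumPy array of word indices
    # Embedding takes (input_dim, output_dim): vocabulary size first, then vector length
    model.add(Embedding(top_words, embedding_vecor_length, input_length=X_data.shape[1]))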
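To make the difference between the number of samples and the number of columns concrete, here is a small sketch with a hypothetical padded input. `X_padded` is an illustrative name, and a plain list of lists stands in for the NumPy array.

```python
# Hypothetical padded input: 3 samples, each padded to 5 word indices
X_padded = [
    [0, 0, 1, 1, 2],
    [0, 0, 3, 3, 4],
    [5, 6, 7, 8, 9],
]

n_samples = len(X_padded)        # what len(X_data) returns: number of rows
sequence_len = len(X_padded[0])  # what input_length needs: words per row

# Every row must have the same length, otherwise the data is not rectangular
assert all(len(row) == sequence_len for row in X_padded)

print(n_samples, sequence_len)
```

With a NumPy array these two numbers are X_data.shape[0] and X_data.shape[1] respectively.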
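Your class labels need to be one-hot encoded (binary vectors). Note that to_categorical treats each label as a column index starting at 0, so labels that run from 1 to 10 should be shifted to 0 to 9 first:

    from keras.utils import to_categorical
    # Assumes y is a pandas Series or NumPy array with labels 1..10
    y = to_categorical(y - 1, num_classes=10)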
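As a rough illustration of what one-hot encoding produces, here is a plain-Python sketch. The `one_hot` helper is hypothetical and only mimics the idea; the label values are a few examples in the style of the question's y, assumed to range from 1 to 10.

```python
def one_hot(labels, num_classes):
    """Return one row per label with a single 1 at the label's index."""
    rows = []
    for label in labels:
        row = [0] * num_classes
        row[label] = 1
        rows.append(row)
    return rows

# A few label values in the style of the question's y (assumed range 1..10)
y_sample = [1, 1, 10, 2, 10]

# Shift to 0..9 so that 10 classes fit into exactly 10 columns
shifted = [label - 1 for label in y_sample]
Y = one_hot(shifted, num_classes=10)

for row in Y:
    print(row)
```

Each row has as many entries as there are classes, which is why the size of the final Dense layer has to match the number of columns.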
Your last dense layer now has 10 units, assuming you have 10 classes, and the correct activation function for a multi-class problem is softmax:
    model.add(Dense(10, activation='softmax'))
Your loss function must now be categorical_crossentropy, because this is a multi-class problem:
    model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
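To show why softmax and categorical cross-entropy go together, here is a minimal plain-Python sketch of the two formulas. It is not Keras code: the function names and the three-class `logits` are made up for illustration.

```python
import math

def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    largest = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(value - largest) for value in logits]
    total = sum(exps)
    return [e / total for e in exps]

def categorical_crossentropy(one_hot_target, probs):
    """Negative log-probability assigned to the true class."""
    return -sum(t * math.log(p) for t, p in zip(one_hot_target, probs) if t)

# Hypothetical raw outputs of a 3-class model for a single sample
logits = [2.0, 0.5, -1.0]
probs = softmax(logits)

# The true class is the first one, written as a one-hot vector
loss = categorical_crossentropy([1, 0, 0], probs)

print(probs, loss)
```

The loss shrinks as the probability assigned to the true class approaches 1. A single sigmoid unit with binary_crossentropy can only express two classes, which is why both the last layer and the loss have to change.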