How do I train an NLP classifier with the Keras library?

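Here is my training data. I want to use the Keras library to predict 'y' from X_data. I keep running into errors; I know they are related to the shape of the data, but I have been stuck for a while. I hope you can help.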

X_data =
0     [construction, materials, labour, charges, con...
1     [catering, catering, lunch]
2     [passenger, transport, local, transport, passe...
3     [goods, transport, road, transport, goods, inl...
4     [rental, rental, aircrafts]
5     [supporting, transport, cargo, handling, agenc...
6     [postal, courier, postal, courier, local, deli...
7     [electricity, charges, reimbursement, electric...
8     [facility, management, facility, management, p...
9     [leasing, leasing, aircrafts]
10    [professional, technical, business, selling, s...
11    [telecommunications, broadcasting, information...
12    [support, personnel, search, contract, tempora...
13    [maintenance, repair, installation, maintenanc...
14    [manufacturing, physical, inputs, owned, other...
15    [accommodation, hotel, accommodation, hotel, i...
16    [leasing, rental, leasing, renting, motor, veh...
17    [real, estate, rental, leasing, involving, pro...
18    [rental, transport, vehicles, rental, road, ve...
19    [cleaning, sanitary, pad, vending, machine]
20    [royalty, transfer, use, ip, intellectual, pro...
21    [legal, accounting, legal, accounting, legal, ...
22    [veterinary, clinic, health, care, relation, a...
23    [human, health, social, care, inpatient, medic...
Name: Data, dtype: object

Here is my training target:

y =
0      1
1      1
2      1
3      1
4      1
5      1
6      1
7      1
8      1
9      1
10     1
11     1
12     1
13     1
14     1
15    10
16     2
17    10
18     2
19     2
20    10
21    10
22    10
23    10

I am using this model:

top_words = 5000
length = len(X_data)
embedding_vecor_length = 32
model = Sequential()
model.add(Embedding(embedding_vecor_length, top_words, input_length=length))
model.add(LSTM(100))
model.add(Dense(1, activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
print(model.summary())
model.fit(X_data, y, epochs=3, batch_size=32)

ValueError: Error when checking input: expected embedding_8_input to have shape (None, 24) but got array with shape (24, 1)

What is wrong with using this data in this model? I want to use the input X_data to predict 'y'.

Answer:

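You need to convert your pandas data into numpy arrays. The rows are ragged (each has a different number of words), so you need to pad them to a common length. You also need to build a dictionary that maps words to integer indices, which the Embedding layer then turns into word vectors, because you cannot pass words directly to a neural network. Some examples are here, here, and here. You will need to do some research of your own; not much can be done with the data sample you provided.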
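The two preprocessing steps described above (word-to-index mapping, then padding) can be sketched in plain Python. This is a minimal illustration rather than the asker's actual pipeline: the three sample rows are copied from the rows of X_data that are shown in full, while `build_vocab`, `encode_and_pad`, and the `PAD` constant are hypothetical names introduced only for this example. Padding and truncation are applied on the left, which is an assumption of this sketch.

```python
# Sample rows in the same shape as X_data: one list of tokens per row.
samples = [
    ["catering", "catering", "lunch"],
    ["rental", "rental", "aircrafts"],
    ["cleaning", "sanitary", "pad", "vending", "machine"],
]

PAD = 0  # index reserved for padding, so real words start at 1


def build_vocab(rows):
    """Map each distinct word to an integer index, in order of first appearance."""
    vocab = {}
    for row in rows:
        for word in row:
            if word not in vocab:
                vocab[word] = len(vocab) + 1
    return vocab


def encode_and_pad(rows, vocab, max_len):
    """Replace words by indices, then left-pad or truncate each row to max_len."""
    padded = []
    for row in rows:
        ids = [vocab[word] for word in row][-max_len:]
        padded.append([PAD] * (max_len - len(ids)) + ids)
    return padded


vocab = build_vocab(samples)
max_len = max(len(row) for row in samples)
X = encode_and_pad(samples, vocab, max_len)

for row in X:
    print(row)
```

Every row of `X` now has the same length, so wrapping it in `numpy.asarray` would give a 2-D integer array with one row per sample. Keras versions contemporary with the question also ship `Tokenizer` and `pad_sequences` helpers that cover the same two steps.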

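length = len(X_data) is the number of data samples you have. Keras does not care about that; it wants to know how many words each input contains (every sample must have the same number, which is why padding was mentioned above).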

So the input length you give the network is the number of columns you have:

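# Assuming X_data has been correctly converted to a padded numpy array of word indices.
# Embedding takes the vocabulary size first, then the embedding vector length.
model.add(Embedding(top_words, embedding_vecor_length, input_length=X_data.shape[1]))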
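To make the difference between the number of samples and the number of columns concrete, here is a small sketch in plain Python. The `batch_shape` helper and the `padded` data are hypothetical and exist only for illustration; the second value it returns plays the role of `X_data.shape[1]`.

```python
def batch_shape(rows):
    """Return (n_samples, sequence_length) for a padded list of index rows."""
    lengths = {len(row) for row in rows}
    if len(lengths) != 1:
        raise ValueError(f"rows are ragged, lengths found: {sorted(lengths)}")
    return len(rows), lengths.pop()


# Hypothetical padded input: 4 samples, each 6 word indices long.
padded = [
    [0, 0, 0, 1, 1, 2],
    [0, 0, 0, 3, 3, 4],
    [0, 5, 6, 7, 8, 9],
    [0, 0, 0, 0, 3, 2],
]

n_samples, sequence_length = batch_shape(padded)
print(n_samples, sequence_length)
```

In the question, `len(X_data)` is 24 because the data has 24 rows, which is why the model expected inputs of shape (None, 24). The value that belongs in `input_length` is the second one, the padded sequence length.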

Your class values need to be one-hot encoded, that is, turned into a binary matrix with one column per class.

from keras.utils import to_categorical
y = to_categorical(y)
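One detail worth checking: when no class count is given, `to_categorical` sizes its output as the largest label plus one, so raw labels such as 1, 2 and 10 would produce 11 columns, most of them unused. A possible workaround, sketched here in plain Python, is to remap the labels to contiguous indices starting at 0 before encoding. The `y_raw` excerpt, `class_index`, and `one_hot` are hypothetical names used only for this illustration.

```python
# A short excerpt using the label values visible in the question's y.
y_raw = [1, 1, 10, 2, 10, 2]

# Map each distinct label to a contiguous class index starting at 0.
classes = sorted(set(y_raw))
class_index = {label: i for i, label in enumerate(classes)}


def one_hot(labels, class_index):
    """Return one row per label with a single 1 in the column of its class."""
    rows = []
    for label in labels:
        row = [0] * len(class_index)
        row[class_index[label]] = 1
        rows.append(row)
    return rows


y_onehot = one_hot(y_raw, class_index)
num_classes = len(classes)

print(class_index)
print(y_onehot[2])
```

With this remapping, `num_classes` is the natural width for the final dense layer, and `class_index` can be inverted later to translate predictions back to the original label values.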

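Your final dense layer now has 10 units, assuming you have 10 classes (its width must match the number of columns in the one-hot encoded y), and the correct activation function is softmax, which is the one suited to multi-class problems.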

model.add(Dense(10, activation='softmax'))

Your loss function must now be categorical_crossentropy, because this is a multi-class problem.

model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
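To show why softmax and categorical cross-entropy go together with one-hot targets, here is a hand-rolled sketch for a single sample in plain Python. It illustrates the underlying formulas only and is not Keras' implementation, which works on batches and handles numerical edge cases; the function names and the example scores are hypothetical.

```python
import math


def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    shift = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - shift) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def categorical_crossentropy(y_true, y_pred):
    """Negative log-probability assigned to the true class of one sample."""
    return -sum(t * math.log(p) for t, p in zip(y_true, y_pred) if t)


probs = softmax([2.0, 0.5, 0.1])  # hypothetical scores for 3 classes
loss = categorical_crossentropy([1, 0, 0], probs)

print(probs)
print(loss)
```

Because the target is one-hot, only the probability of the true class contributes to the loss, so the loss shrinks as the softmax output puts more mass on the correct class.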
