Generating n-grams (bigrams or trigrams) in Keras/TensorFlow

I want to generate n-grams from a sequence of tokens:

bigram: "1 3 4 5" --> { (1,3), (3,4), (4,5) }

After searching, I found this thread, which uses the following code:

def find_ngrams(input_list, n):
    return zip(*[input_list[i:] for i in range(n)])

If I use this code during training, I think it will seriously hurt performance, so I am looking for a better option.
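For reference, here is a minimal sketch of what that helper returns on the example tokens from the question. The variable names `tokens`, `bigrams`, and `trigrams` are only illustrative. Note that in Python 3 `zip` is lazy, so the result has to be materialized with `list` before it can be inspected.

```python
def find_ngrams(input_list, n):
    # Zip n progressively shifted slices of the list;
    # zip stops at the shortest slice, so no padding is produced.
    return zip(*[input_list[i:] for i in range(n)])


# Tokens from the question's example string "1 3 4 5".
tokens = [int(t) for t in "1 3 4 5".split()]

bigrams = list(find_ngrams(tokens, 2))   # materialize the lazy zip object
trigrams = list(find_ngrams(tokens, 3))

print(bigrams)   # [(1, 3), (3, 4), (4, 5)]
print(trigrams)  # [(1, 3, 4), (3, 4, 5)]
```

This runs in plain Python on one sequence at a time, outside the TensorFlow graph, which is the per-example overhead the question is concerned about.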

Answer:

If you need the bigrams in string format:

import tensorflow as tf
tf.enable_eager_execution()

sentence = ['this is example sentence']
tokens = tf.string_split(sentence).values
x = tokens[:-1] + ' ' + tokens[1:]
# tf.Tensor([b'this is' b'is example' b'example sentence'], shape=(3,), dtype=string)
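The snippet above uses TensorFlow 1.x calls (`tf.enable_eager_execution`, `tf.string_split`). As a hedged sketch, assuming TensorFlow 2.x is installed, the same result can be obtained with `tf.strings.split` and `tf.strings.ngrams`, which also handles trigrams without manual slicing. This is one possible approach, not the original answer's method.

```python
import tensorflow as tf  # assumption: TensorFlow 2.x, eager execution on by default

sentence = tf.constant(["this is example sentence"])

# Split on whitespace; the result is a RaggedTensor with one row of tokens per input string.
tokens = tf.strings.split(sentence)

# Build n-grams per row, joining the tokens of each n-gram with a space.
bigrams = tf.strings.ngrams(tokens, ngram_width=2, separator=" ")
trigrams = tf.strings.ngrams(tokens, ngram_width=3, separator=" ")

print(bigrams.to_list())
# [[b'this is', b'is example', b'example sentence']]
print(trigrams.to_list())
# [[b'this is example', b'is example sentence']]
```

Because these are TensorFlow ops, they can be placed inside a `tf.data` pipeline or a model's preprocessing step rather than run as a Python loop.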

You can also use tensorflow-transform to generate n-grams.

import tensorflow_transform as tft
tft.ngrams(tensor, (1,2), " ")

Note: as of January 22, 2019, tensorflow-transform only supports Python 2.
