How to create embeddings for a column containing lists of categorical values

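I'm having trouble deciding how to create embeddings for a categorical feature in my deep neural network model. The feature consists of a variable-size set of tags.

The feature looks like this: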

    column = [['Adventure', 'Animation', 'Comedy'],
              ['Adventure', 'Comedy'],
              ['Adventure', 'Children', 'Comedy']]

I'd like to do this with TensorFlow. I know the tf.feature_column module should work; I just don't know which version to use.

Thanks!

Answer:

First, you need to pad the feature lists to the same length.

    import itertools
    import numpy as np

    # zip_longest pads the shorter rows with 'UNK'; .T restores one row per example
    column = np.array(list(itertools.zip_longest(*column, fillvalue='UNK'))).T
    print(column)

    # Output:
    # [['Adventure' 'Animation' 'Comedy']
    #  ['Adventure' 'Comedy' 'UNK']
    #  ['Adventure' 'Children' 'Comedy']]
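For readers who want to see the padding step on its own, here is a minimal sketch in plain Python with no NumPy dependency. The helper name `pad_rows` is hypothetical and is not part of the answer; it assumes the same 'UNK' filler used above.

```python
def pad_rows(rows, fill_value="UNK"):
    """Right-pad every row with fill_value up to the length of the longest row."""
    width = max((len(row) for row in rows), default=0)
    return [list(row) + [fill_value] * (width - len(row)) for row in rows]


column = [
    ["Adventure", "Animation", "Comedy"],
    ["Adventure", "Comedy"],
    ["Adventure", "Children", "Comedy"],
]

padded = pad_rows(column)
for row in padded:
    print(row)
```

This returns a list of equal-length lists rather than a NumPy array, so wrap it in `np.array(...)` if an array is needed downstream.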

Then you can use tf.feature_column.embedding_column to create an embedding for the categorical feature. The input to embedding_column must be a CategoricalColumn created by any of the categorical_column_* functions.

    import tensorflow as tf

    # If you have a large vocabulary stored in a file, you can use
    # tf.feature_column.categorical_column_with_vocabulary_file instead.
    cat_fc = tf.feature_column.categorical_column_with_vocabulary_list(
        'cat_data',  # key identifying the input feature
        ['Adventure', 'Animation', 'Comedy', 'Children'],  # vocabulary list
        dtype=tf.string,
        default_value=-1)  # ID assigned to out-of-vocabulary values

    cat_column = tf.feature_column.embedding_column(
        categorical_column=cat_fc,
        dimension=5,
        combiner='mean')

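categorical_column_with_vocabulary_list ignores 'UNK' because 'UNK' is not in the vocabulary list. dimension specifies the size of the embedding vector, and combiner specifies how to reduce multiple entries in a single row into one vector; embedding_column uses 'mean' by default.

The result looks like this: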
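To make the 'mean' combiner concrete, here is a rough pure-Python sketch of what it computes conceptually. It is an illustration, not TensorFlow's implementation: the 2-dimensional `embedding_table` values are made up, the helper `mean_embedding` is hypothetical, and returning zeros for a row with no known labels is an assumption of this sketch.

```python
vocabulary = ["Adventure", "Animation", "Comedy", "Children"]
index_of = {label: i for i, label in enumerate(vocabulary)}

# Hypothetical embedding table: one made-up 2-d vector per vocabulary entry.
embedding_table = [
    [1.0, 0.0],  # Adventure
    [0.0, 1.0],  # Animation
    [2.0, 2.0],  # Comedy
    [4.0, 0.0],  # Children
]


def mean_embedding(labels):
    """Average the vectors of in-vocabulary labels; unknown labels (ID -1) are skipped."""
    ids = [index_of.get(label, -1) for label in labels]
    vectors = [embedding_table[i] for i in ids if i >= 0]
    if not vectors:
        # Assumption of this sketch: a row with no known labels maps to zeros.
        return [0.0] * len(embedding_table[0])
    return [sum(values) / len(vectors) for values in zip(*vectors)]


# 'UNK' is skipped, so only Adventure and Comedy are averaged.
print(mean_embedding(["Adventure", "Comedy", "UNK"]))  # [1.5, 1.0]
```

Because the padding value is skipped rather than averaged in, a padded row yields the same vector as the unpadded row would.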

    # Note: tf.Session and tf.feature_column.input_layer are TensorFlow 1.x APIs.
    tensor = tf.feature_column.input_layer({'cat_data': column}, [cat_column])

    with tf.Session() as session:
        session.run(tf.global_variables_initializer())
        session.run(tf.tables_initializer())
        print(session.run(tensor))

    # Output:
    # [[-0.694761   -0.0711766   0.05720187  0.01770079 -0.09884425]
    #  [-0.8362482   0.11640486 -0.01767573 -0.00548441 -0.05738768]
    #  [-0.71162754 -0.03012567  0.15568805  0.00752804 -0.1422816 ]]
