def unpack_dict(matrix, map_index_to_word):
    table = sorted(map_index_to_word, key=map_index_to_word.get)
    data = matrix.data
    indices = matrix.indices
    indptr = matrix.indptr
    num_doc = matrix.shape[0]
    return [{k:v for k,v in zip([table[word_id] for word_id in indices[indptr[i]:indptr[i+1]] ], data[indptr[i]:indptr[i+1]].tolist())} \
               for i in range(num_doc) ]

wiki['tf_idf'] = unpack_dict(tf_idf, map_index_to_word)
map_index_to_word is a word-to-index dictionary containing several thousand words. tf_idf is a sparse TF-IDF matrix. A screenshot of the wiki data frame is shown below.
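To make the two inputs concrete, here is a minimal sketch of how a word-to-index dictionary and the three CSR arrays (data, indices, indptr) fit together. The three-word vocabulary, the numbers, and the helper row_as_dict are all made up for illustration, and plain Python lists stand in for the real arrays.

```python
# Hypothetical three-word vocabulary: word -> column index.
map_index_to_word = {"banana": 1, "apple": 0, "cherry": 2}

# Sorting the words by their index gives a list where table[index] is the word.
table = sorted(map_index_to_word, key=map_index_to_word.get)

# CSR layout of a made-up 2 x 3 matrix, written as plain lists:
#   row 0: apple=0.5, cherry=1.5
#   row 1: banana=2.0
data = [0.5, 1.5, 2.0]   # non-zero values, stored row by row
indices = [0, 2, 1]      # column index of each stored value
indptr = [0, 2, 3]       # row i occupies positions indptr[i]:indptr[i + 1]


def row_as_dict(i):
    """Return row i as a {word: weight} dictionary (illustrative helper)."""
    start, end = indptr[i], indptr[i + 1]
    return {table[j]: v for j, v in zip(indices[start:end], data[start:end])}


print(table)
print(row_as_dict(0))
print(row_as_dict(1))
```

The slice indptr[i]:indptr[i + 1] is the same one that unpack_dict applies to both indices and data for each document.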
Answer:
[{k: v for k, v in zip([table[word_id] for word_id in indices[indptr[i]:indptr[i + 1]]],
                       data[indptr[i]:indptr[i + 1]].tolist())}
 for i in range(num_doc)]
is equivalent to the following code:
final_list = []
for i in range(num_doc):
    new_list = []
    for word_id in indices[indptr[i]:indptr[i + 1]]:
        new_list.append(table[word_id])
    new_dict = {}
    for k, v in zip(new_list, data[indptr[i]:indptr[i + 1]].tolist()):
        new_dict[k] = v
    final_list.append(new_dict)
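One way to convince yourself that the two forms agree is to run both on a tiny input and compare the results. The sketch below is only an illustration: toy_matrix and toy_vocab are made-up values, and a SimpleNamespace holding array.array objects is used as a stand-in for the real sparse matrix so that the snippet runs with the standard library alone (array.array supports slicing and .tolist(), which is all the code needs).

```python
from array import array
from types import SimpleNamespace


def unpack_comprehension(matrix, map_index_to_word):
    """Nested-comprehension form, as in the question."""
    table = sorted(map_index_to_word, key=map_index_to_word.get)
    data, indices, indptr = matrix.data, matrix.indices, matrix.indptr
    num_doc = matrix.shape[0]
    return [{k: v for k, v in zip([table[word_id] for word_id in indices[indptr[i]:indptr[i + 1]]],
                                  data[indptr[i]:indptr[i + 1]].tolist())}
            for i in range(num_doc)]


def unpack_loops(matrix, map_index_to_word):
    """Explicit-loop form, as in the answer."""
    table = sorted(map_index_to_word, key=map_index_to_word.get)
    data, indices, indptr = matrix.data, matrix.indices, matrix.indptr
    num_doc = matrix.shape[0]
    final_list = []
    for i in range(num_doc):
        # Words present in document i.
        new_list = []
        for word_id in indices[indptr[i]:indptr[i + 1]]:
            new_list.append(table[word_id])
        # Pair each word with its weight.
        new_dict = {}
        for k, v in zip(new_list, data[indptr[i]:indptr[i + 1]].tolist()):
            new_dict[k] = v
        final_list.append(new_dict)
    return final_list


# Assumption: a hand-built stand-in exposing the same four attributes
# (data, indices, indptr, shape) that unpack_dict reads from the matrix.
toy_matrix = SimpleNamespace(
    data=array("d", [0.5, 1.5, 2.0]),
    indices=array("i", [0, 2, 1]),
    indptr=array("i", [0, 2, 3]),
    shape=(2, 3),
)
toy_vocab = {"apple": 0, "banana": 1, "cherry": 2}  # hypothetical vocabulary

result_a = unpack_comprehension(toy_matrix, toy_vocab)
result_b = unpack_loops(toy_matrix, toy_vocab)
print(result_a == result_b)
print(result_a)
```

A scipy.sparse.csr_matrix exposes the same data, indices, indptr, and shape attributes, so either function should accept one in place of the stand-in.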