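I'm building a multi-label text classification algorithm. Below is a snippet of my labels.txt file. I'd like to convert these records into a dictionary whose keys are the ids and whose values are tuples or lists of the categories, i.e. {id: (cat1, cat2)}. The records are not separated by blank lines. I can't figure out how to convert this kind of data into a dict.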
B0027DQHA0
  Movies & TV, TV
  Music, Classical
0756400120
  Books, Literature & Fiction, Anthologies & Literary Collections, General
  Books, Literature & Fiction, United States
  Books, Science Fiction & Fantasy, Science Fiction, Anthologies
  Books, Science Fiction & Fantasy, Science Fiction, Short Stories
B0000012D5
  Music, Blues
  Music, Pop
  Music, R&B
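Answer:
If the category lines are always indented with spaces and the ID lines are not, you can use that to tell them apart: loop over the lines and append each category line to a list stored in a dict under the most recently seen ID: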
r = '''B0027DQHA0
  Movies & TV, TV
  Music, Classical
0756400120
  Books, Literature & Fiction, Anthologies & Literary Collections, General
  Books, Literature & Fiction, United States
  Books, Science Fiction & Fantasy, Science Fiction, Anthologies
  Books, Science Fiction & Fantasy, Science Fiction, Short Stories
B0000012D5
  Music, Blues
  Music, Pop
  Music, R&B'''
d = {}
for line in r.splitlines():
    if line.startswith(' '):
        # Indented line: a category path belonging to the current ID.
        d.setdefault(key, []).append(line.lstrip())
    else:
        # Unindented line: the ID that starts a new record.
        key = line
print(d)
This outputs:
{'B0027DQHA0': ['Movies & TV, TV', 'Music, Classical'], '0756400120': ['Books, Literature & Fiction, Anthologies & Literary Collections, General', 'Books, Literature & Fiction, United States', 'Books, Science Fiction & Fantasy, Science Fiction, Anthologies', 'Books, Science Fiction & Fantasy, Science Fiction, Short Stories'], 'B0000012D5': ['Music, Blues', 'Music, Pop', 'Music, R&B']}
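The question also mentions tuples as values and a file as the data source. The following is a minimal sketch of the same grouping idea wrapped in a function. The name `parse_labels` is hypothetical, and the sketch assumes that any leading whitespace (spaces or tabs) marks a category line and that blank lines can be ignored; the sample data is shortened and includes one tab-indented line purely for illustration.

```python
def parse_labels(lines):
    """Group indented category lines under the most recent unindented ID line.

    `lines` is any iterable of strings, e.g. an open file or text.splitlines().
    Returns a dict mapping each ID to a tuple of category strings.
    """
    labels = {}
    current_id = None
    for raw in lines:
        line = raw.rstrip("\n")
        if not line.strip():
            continue  # assumption: blank lines carry no information
        if line[0].isspace():
            # Indented (space or tab): a category path for the current ID.
            if current_id is None:
                raise ValueError("category line appears before any ID")
            labels[current_id].append(line.strip())
        else:
            # Unindented: a new ID starts a new record.
            current_id = line.strip()
            labels.setdefault(current_id, [])
    # Convert the lists to tuples, matching the {id: (cat1, cat2)} shape.
    return {key: tuple(cats) for key, cats in labels.items()}


sample = """B0027DQHA0
  Movies & TV, TV
  Music, Classical
B0000012D5
\tMusic, Blues
  Music, Pop
"""
result = parse_labels(sample.splitlines())
print(result)
```

To read from the file directly, the open file object can be passed in, for example `with open("labels.txt", encoding="utf-8") as f: labels = parse_labels(f)`; the UTF-8 encoding here is an assumption about the file. Raising an error for a category line that appears before any ID is a design choice that avoids the `NameError` the shorter loop would hit on such input.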