Pandas: The most efficient way to create a dictionary of dictionaries from DataFrame columns

import pandas as pd
import numpy as np
import random

labels = ["c1","c2","c3"]
c1 = ["one","one","one","two","two","three","three","three","three"]
c2 = [random.random() for i in range(len(c1))]
c3 = ["alpha","beta","gamma","alpha","gamma","alpha","beta","gamma","zeta"]
DF = pd.DataFrame(np.array([c1,c2,c3])).T
DF.columns = labels

The DataFrame looks like this:

      c1               c2     c3
0    one   0.440958516531  alpha
1    one   0.476439953723   beta
2    one   0.254235673552  gamma
3    two   0.882724336464  alpha
4    two    0.79817899139  gamma
5  three   0.677464637887  alpha
6  three   0.292927670096   beta
7  three  0.0971956881825  gamma
8  three   0.993934915508   zeta

The only way I can think of to create the dictionary is:

D_greek_value = {}
for greek in set(DF["c3"]):
    D_c1_c2 = {}
    for i in range(DF.shape[0]):
        row = DF.iloc[i,:]
        if row[2] == greek:
            D_c1_c2[row[0]] = row[1]
    D_greek_value[greek] = D_c1_c2
D_greek_value

The resulting dictionary looks like this:

{'alpha': {'one': '0.67919712421',
  'three': '0.67171020684',
  'two': '0.571150669821'},
 'beta': {'one': '0.895090207979', 'three': '0.489490074662'},
 'gamma': {'one': '0.964777504708',
  'three': '0.134397632659',
  'two': '0.10302290374'},
 'zeta': {'three': '0.0204226923557'}}

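I don't want to assume that c1 comes in blocks (with all the "one" rows together). I'm working with a csv file of a few hundred MB, and I feel like my approach is completely wrong. If you have any ideas, please help!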

Answer:

If I understand correctly, you can let groupby do most of the work:

>>> import pprint
>>> result = df.groupby("c3")[["c1","c2"]].apply(lambda x: dict(x.values)).to_dict()
>>> pprint.pprint(result)
{'alpha': {'one': 0.440958516531,
           'three': 0.677464637887,
           'two': 0.8827243364640001},
 'beta': {'one': 0.47643995372299996, 'three': 0.29292767009599996},
 'gamma': {'one': 0.254235673552,
           'three': 0.0971956881825,
           'two': 0.79817899139},
 'zeta': {'three': 0.993934915508}}
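For anyone who wants to run the one-liner above end to end, here is a minimal self-contained sketch. It assumes the frame is built directly from a dict of columns (so c2 stays numeric) and reuses the values from the table in the question; the variable name `sub` is just an illustrative choice.

```python
import pprint

import pandas as pd

# Values copied from the example table in the question; c2 is kept as floats.
df = pd.DataFrame({
    "c1": ["one", "one", "one", "two", "two", "three", "three", "three", "three"],
    "c2": [0.440958516531, 0.476439953723, 0.254235673552, 0.882724336464,
           0.79817899139, 0.677464637887, 0.292927670096, 0.0971956881825,
           0.993934915508],
    "c3": ["alpha", "beta", "gamma", "alpha", "gamma", "alpha", "beta", "gamma", "zeta"],
})

# Group by c3, turn each (c1, c2) sub-frame into a dict, then collect
# the per-group dicts into one outer dict keyed by the c3 value.
result = (
    df.groupby("c3")[["c1", "c2"]]
      .apply(lambda sub: dict(sub.values))
      .to_dict()
)
pprint.pprint(result)
```

Because the grouping is by value, the rows for a given c1 or c3 label do not need to be adjacent in the frame.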

Some explanation: first we group by c3 and select the c1 and c2 columns. This gives us the groups that we want to turn into dictionaries:

>>> grouped = df.groupby("c3")[["c1", "c2"]]
>>> grouped.apply(lambda x: print(x,"\n","--")) # for display purposes only
      c1                   c2
0    one    0.679926178687387
3    two  0.11495090934413166
5  three   0.7458197179794177 
 --
      c1                   c2
0    one    0.679926178687387
3    two  0.11495090934413166
5  three   0.7458197179794177 
 --
      c1                   c2
1    one  0.12943266757277916
6  three  0.28944292691097817 
 --
      c1                   c2
2    one  0.36642834809341274
4    two   0.5690944224514624
7  three   0.7018221838129789 
 --
      c1                  c2
8  three  0.7195852795555373 
 --
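If you would rather inspect the groups without calling print inside apply, iterating over the groupby object is one option. This is only a sketch on a small made-up frame (the c2 values here are placeholders, not the ones from the session above), and `group_sizes` is a helper name introduced for illustration.

```python
import pandas as pd

# Small illustrative frame; the c2 values are placeholders.
df = pd.DataFrame({
    "c1": ["one", "one", "one", "two", "two", "three", "three", "three", "three"],
    "c2": [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9],
    "c3": ["alpha", "beta", "gamma", "alpha", "gamma", "alpha", "beta", "gamma", "zeta"],
})

group_sizes = {}
# Each iteration yields the group key and the rows belonging to that key.
for key, sub in df.groupby("c3"):
    sub = sub[["c1", "c2"]]  # keep only the columns we want in the inner dict
    group_sizes[key] = len(sub)
    print(key)
    print(sub)
    print("--")
```

Each sub-frame keeps the original row index, which is why the rows of a group show up with non-consecutive index labels.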

Given any one of these subframes, say the second-to-last one, we need a way to turn it into a dictionary. For example:

>>> d3
      c1        c2
2    one  0.366428
4    two  0.569094
7  three  0.701822

If we try dict or to_dict directly, we don't get what we want, because the index and the column labels get in the way:

>>> dict(d3)
{'c1': 2      one
4      two
7    three
Name: c1, dtype: object, 'c2': 2    0.366428
4    0.569094
7    0.701822
Name: c2, dtype: float64}
>>> d3.to_dict()
{'c1': {2: 'one', 4: 'two', 7: 'three'}, 'c2': {2: 0.36642834809341279, 4: 0.56909442245146236, 7: 0.70182218381297889}}

But we can ignore all of that by dropping down to the underlying data with .values, which can then be passed to dict:

>>> d3.values
array([['one', 0.3664283480934128],
       ['two', 0.5690944224514624],
       ['three', 0.7018221838129789]], dtype=object)
>>> dict(d3.values)
{'three': 0.7018221838129789, 'one': 0.3664283480934128, 'two': 0.5690944224514624}
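To make the sub-frame step easy to try in isolation, here is a small sketch that rebuilds something like d3 by hand and converts it three ways. The dict(d3.values) form is the one used in this answer; the zip and set_index variants are alternatives I am adding for comparison, not part of the original approach.

```python
import pandas as pd

# Hand-built stand-in for the sub-frame d3 shown above (index labels 2, 4, 7).
d3 = pd.DataFrame(
    {
        "c1": ["one", "two", "three"],
        "c2": [0.3664283480934128, 0.5690944224514624, 0.7018221838129789],
    },
    index=[2, 4, 7],
)

# Each row of d3.values is a (c1, c2) pair, which dict() reads as key, value.
via_values = dict(d3.values)

# Alternative: pair the two columns up explicitly.
via_zip = dict(zip(d3["c1"], d3["c2"]))

# Alternative: make c1 the index, then convert the c2 Series.
via_series = d3.set_index("c1")["c2"].to_dict()

print(via_values)
```

All three produce the same mapping here. If a c1 label repeats within a group, each of them keeps only the last value seen for that key.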

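So if we apply this to every group, we get a Series whose index is the c3 keys we want and whose values are dictionaries, which we can convert to a dictionary with .to_dict():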

>>> result = df.groupby("c3")[["c1", "c2"]].apply(lambda x: dict(x.values))
>>> result
c3
alpha    {'three': '0.7458197179794177', 'one': '0.6799...
beta     {'one': '0.12943266757277916', 'three': '0.289...
gamma    {'three': '0.7018221838129789', 'one': '0.3664...
zeta                       {'three': '0.7195852795555373'}
dtype: object
>>> result.to_dict()
{'zeta': {'three': '0.7195852795555373'}, 'gamma': {'three': '0.7018221838129789', 'one': '0.36642834809341274', 'two': '0.5690944224514624'}, 'beta': {'one': '0.12943266757277916', 'three': '0.28944292691097817'}, 'alpha': {'three': '0.7458197179794177', 'one': '0.679926178687387', 'two': '0.11495090934413166'}}
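One detail worth noting: the numbers in this last output are quoted strings. That is most likely a side effect of how the question builds the frame, since np.array([c1, c2, c3]) has to hold a single dtype and so stores the floats as text. A minimal sketch, assuming you want real floats in the inner dictionaries, converts c2 before grouping; `was_string` is just an illustrative name.

```python
import random

import numpy as np
import pandas as pd

c1 = ["one", "one", "one", "two", "two", "three", "three", "three", "three"]
c2 = [random.random() for _ in c1]
c3 = ["alpha", "beta", "gamma", "alpha", "gamma", "alpha", "beta", "gamma", "zeta"]

# Same construction as in the question: mixing str and float lists in one
# NumPy array turns every element into a string.
DF = pd.DataFrame(np.array([c1, c2, c3])).T
DF.columns = ["c1", "c2", "c3"]
was_string = isinstance(DF["c2"].iloc[0], str)
print(was_string)

# Convert c2 back to floats, then apply the groupby approach from the answer.
DF["c2"] = DF["c2"].astype(float)
result = (
    DF.groupby("c3")[["c1", "c2"]]
      .apply(lambda sub: dict(sub.values))
      .to_dict()
)
print(result)
```

Building the frame from a dict of columns, or reading it with pd.read_csv, avoids the string round trip in the first place.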
