Average of dictionaries

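I am using the iris dataset from sklearn. I split the data, apply a Perceptron, and record the scores in a dictionary that maps the sample size used to fit the model (the key) to the corresponding scores (training score and test score, as a tuple).

Because I run the loop 3 times, this produces 3 dictionaries. How can I find the average of the scores across the 3 iterations? I tried storing the dictionaries in a list and averaging them, but that did not work.

For example, if the dictionaries are: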

{21: (0.85, 0.82), 52: (0.80, 0.62), 73: (0.82, 0.45), 94: (0.81, 0.78)}
{21: (0.95, 0.91), 52: (0.80, 0.89), 73: (0.84, 0.87), 94: (0.79, 0.41)}
{21: (0.809, 0.83), 52: (0.841, 0.77), 73: (0.84, 0.44), 94: (0.79, 0.33)}

The output should be {21:(0.869,0.853),52.....}, where the first element of the value for key 21 is (0.85+0.95+0.809)/3 and the second element is (0.82+0.91+0.83)/3.
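The element-wise average described above can be sketched in plain Python, without pandas. This is only an illustration: `average_scores` is a hypothetical helper name, and the sketch assumes every dictionary has the same keys and that each value is a (train, test) tuple.

```python
from statistics import mean

# The three example dictionaries from the question.
runs = [
    {21: (0.85, 0.82), 52: (0.80, 0.62), 73: (0.82, 0.45), 94: (0.81, 0.78)},
    {21: (0.95, 0.91), 52: (0.80, 0.89), 73: (0.84, 0.87), 94: (0.79, 0.41)},
    {21: (0.809, 0.83), 52: (0.841, 0.77), 73: (0.84, 0.44), 94: (0.79, 0.33)},
]


def average_scores(dicts):
    """Average the (train, test) tuples element-wise for each key.

    Hypothetical helper; assumes all dictionaries share the same keys.
    """
    averaged = {}
    for key in dicts[0]:
        # zip(*...) regroups [(tr1, te1), (tr2, te2), ...] into
        # (tr1, tr2, ...) and (te1, te2, ...).
        train_scores, test_scores = zip(*(d[key] for d in dicts))
        averaged[key] = (mean(train_scores), mean(test_scores))
    return averaged


avg = average_scores(runs)
# Round only for display; avg keeps the unrounded means.
print({k: tuple(round(x, 3) for x in v) for k, v in avg.items()})
```

If the keys could differ between runs, iterating over the union of keys and skipping missing entries would be one way to adapt the sketch.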

import numpy as np
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.linear_model import Perceptron
from sklearn.model_selection import train_test_split

score_list=shape_list=[]
iris = load_iris()
props=[0.2,0.5,0.7,0.9]
df = pd.DataFrame(data= np.c_[iris['data'], iris['target']], columns= iris['feature_names'] + ['target'])
y=df[list(df.loc[:,df.columns.values =='target'])]
X=df[list(df.loc[:,df.columns.values !='target'])]

# number of trials
for i in range(3):
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, train_size=0.7)
    results = {}
    for i in props:
        size = int(i*len(X_train))
        ix = np.random.choice(X_train.index, size=size, replace = False)
        sampleX = X_train.loc[ix]
        sampleY = y_train.loc[ix]
        # apply the model
        modelP = Perceptron(tol=1e-3)
        modelP.fit(sampleX, sampleY)
        train_score = modelP.score(sampleX,sampleY)
        test_score = modelP.score(X_test,y_test)
        # store in the dictionary
        results[size] = (train_score, test_score)
    print(results)

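Also, if anyone knows statistics: is there a way to compute the standard error of the trials and print the average standard error for each sample size (each dictionary key)?

Answer:

- Update the existing loop to save each results dictionary to a list, rl.
- Load rl into a dataframe, since you are already using pandas.
- Expand the columns of tuples into separate columns.
- Use .agg to get the metrics.
- Tested with python 3.8 and pandas 1.3.1.
  - f-strings (e.g. f'TrS{c}', f'TeS{c}') require python >= 3.6.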

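Updates to the existing code

# select the columns for X and y
y = df.loc[:, 'target']
X = df.loc[:, iris['feature_names']]

# number of trials
rl = list()  # add: save the results to a list
for i in range(3):
    ...
    results = {}
    for i in props:
        ...
        ...
    rl.append(results)  # add: append the results

New code to get the metrics

Converting the metrics into a list of tuples is easier than converting them into a tuple of tuples, because a tuple is immutable once created. Tuples can be appended to an existing list, but not to an existing tuple.
- It is therefore easier to build lists of tuples with defaultdict, and then use map to convert each value to a tuple.
- k[3:] requires that the number always starts at index 3 of the column name.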
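The two points above can be illustrated in isolation. In the sketch below, `metrics_dict` is a hypothetical stand-in for `metrics.to_dict()`, filled with a few of the rounded values from this answer. The regular expression is a swapped-in alternative to `k[3:]` that takes the trailing digits whatever the prefix length is. It is not what the answer itself uses.

```python
import re
from collections import defaultdict

# Hypothetical stand-in for metrics.to_dict(): column name -> {'mean': ..., 'std': ...}
metrics_dict = {
    'TrS21': {'mean': 0.667, 'std': 0.126},
    'TeS21': {'mean': 0.630, 'std': 0.068},
    'TrS52': {'mean': 0.679, 'std': 0.059},
    'TeS52': {'mean': 0.637, 'std': 0.078},
}

# Lists are mutable, so (mean, std) tuples can be appended one column at a time.
dd = defaultdict(list)
for name, stats in metrics_dict.items():
    # Take the trailing digits of the column name instead of slicing name[3:].
    size = int(re.search(r'\d+$', name).group())
    dd[size].append(tuple(stats.values()))

# Freeze each list into a tuple once all columns have been collected.
result = {size: tuple(pairs) for size, pairs in dd.items()}
print(result)
```

With a fixed three-character prefix such as TrS or TeS, the slice and the regular expression give the same key. The regular expression only matters if the prefixes change length.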

import pandas as pd
from collections import defaultdict

# convert rl to a dataframe
rl = [{21: (0.5714285714285714, 0.6888888888888889), 52: (0.6153846153846154, 0.7111111111111111), 73: (0.7123287671232876, 0.6222222222222222), 94: (0.7127659574468085, 0.6)},
      {21: (0.6190476190476191, 0.6444444444444445), 52: (0.6923076923076923, 0.6444444444444445), 73: (0.3698630136986301, 0.35555555555555557), 94: (0.7978723404255319, 0.7777777777777778)},
      {21: (0.8095238095238095, 0.5555555555555556), 52: (0.7307692307692307, 0.5555555555555556), 73: (0.7534246575342466, 0.5777777777777777), 94: (0.6170212765957447, 0.7555555555555555)}]
df = pd.DataFrame(rl)

# display(df)
                                         21                                        52                                         73                                        94
0  (0.5714285714285714, 0.6888888888888889)  (0.6153846153846154, 0.7111111111111111)   (0.7123287671232876, 0.6222222222222222)                 (0.7127659574468085, 0.6)
1  (0.6190476190476191, 0.6444444444444445)  (0.6923076923076923, 0.6444444444444445)  (0.3698630136986301, 0.35555555555555557)  (0.7978723404255319, 0.7777777777777778)
2  (0.8095238095238095, 0.5555555555555556)  (0.7307692307692307, 0.5555555555555556)   (0.7534246575342466, 0.5777777777777777)  (0.6170212765957447, 0.7555555555555555)

# expand the tuples
for c in df.columns:
    df[[f'TrS{c}', f'TeS{c}']] = pd.DataFrame(df[c].tolist(), index= df.index)
    df.drop(c, axis=1, inplace=True)

# get the mean and std
metrics = df.agg(['mean', 'std']).round(3)

# display(metrics)
      TrS21  TeS21  TrS52  TeS52  TrS73  TeS73  TrS94  TeS94
mean  0.667  0.630  0.679  0.637  0.612  0.519  0.709  0.711
std   0.126  0.068  0.059  0.078  0.211  0.143  0.090  0.097

# convert to a dict
dd = defaultdict(list)
for k, v in metrics.to_dict().items():
    dd[int(k[3:])].append(tuple(v.values()))

dd = dict(zip(dd, map(tuple, dd.values())))
print(dd)

[out]:
{21: ((0.667, 0.126), (0.63, 0.068)), 52: ((0.679, 0.059), (0.637, 0.078)), 73: ((0.612, 0.211), (0.519, 0.143)), 94: ((0.709, 0.09), (0.711, 0.097))}
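The aggregation above reports the standard deviation, while the question asks about the standard error. A minimal sketch of the difference follows. It assumes the usual definition of the standard error of the mean, the sample standard deviation divided by the square root of the number of trials, and it uses only the standard library. `summarize` is a hypothetical helper, not part of the answer.

```python
from math import sqrt
from statistics import mean, stdev

# Result dictionaries from the three trials shown in the answer.
rl = [
    {21: (0.5714285714285714, 0.6888888888888889), 52: (0.6153846153846154, 0.7111111111111111),
     73: (0.7123287671232876, 0.6222222222222222), 94: (0.7127659574468085, 0.6)},
    {21: (0.6190476190476191, 0.6444444444444445), 52: (0.6923076923076923, 0.6444444444444445),
     73: (0.3698630136986301, 0.35555555555555557), 94: (0.7978723404255319, 0.7777777777777778)},
    {21: (0.8095238095238095, 0.5555555555555556), 52: (0.7307692307692307, 0.5555555555555556),
     73: (0.7534246575342466, 0.5777777777777777), 94: (0.6170212765957447, 0.7555555555555555)},
]


def summarize(dicts):
    """Return {size: ((train_mean, train_sem), (test_mean, test_sem))}.

    Hypothetical helper; assumes all dictionaries share the same keys.
    """
    n = len(dicts)
    summary = {}
    for size in dicts[0]:
        per_score = []
        # zip(*...) yields the train scores first, then the test scores.
        for scores in zip(*(d[size] for d in dicts)):
            # stdev is the sample standard deviation (n - 1 in the denominator).
            per_score.append((mean(scores), stdev(scores) / sqrt(n)))
        summary[size] = tuple(per_score)
    return summary


summary = summarize(rl)
for size, ((tr_mean, tr_sem), (te_mean, te_sem)) in summary.items():
    print(f'{size}: train {tr_mean:.3f} +/- {tr_sem:.3f}, test {te_mean:.3f} +/- {te_sem:.3f}')
```

pandas offers the same quantity through `DataFrame.sem`, so staying within the dataframe approach is also possible. With only three trials, these estimates are rough.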
