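Pandas: apply a rolling function on each row to compute a new column value

I am trying to compute a new column Y on each row by looking at the values of column X in the previous 9 rows plus the current row. In other words, the value of Y on each row should be the percentage of the last 10 records (including the current one) in which X is greater than 1. Below is the code I am using, but the result differs from what I expect.

[Edited]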

    import numpy as np
    import pandas as pd

    def count_pcnt(x):
        return (np.sum(x > 1) / len(x)) * 100.0

    def run():
        df = pd.DataFrame(data={'X': ['8.12', '7.13', '-5.30', '3.21', '4.21',
                                      '3.14', '8.65', '7.33', '-5.10', '3.01']})
        df['Y'] = df['X'].rolling(window=10, min_periods=1).apply(
            lambda x: count_pcnt(x)).apply(int)

Expected result [Edited]

           X   Y(%)
    0   8.12  100
    1   7.13  100
    2  -5.30  66.67
    3   3.21  75
    4   4.21  80
    5   3.14  83.33
    6   8.65  85.71
    7   7.33  87.50
    8  -5.10  77.77
    9   3.01  80

Actual result

           X    Y
    0   8.12  100
    1   7.13  100
    2  -5.30    0
    3   3.21    0
    4   4.21    0
    5   3.14    0
    6   8.65    0
    7   7.33    0
    8  -5.10    0
    9   3.01    0

Update: I used the option recommended in the answer below and it worked. There are other options, but I find this one the most concise.

    # w is the window size (10 here); the parentheses allow the chain to span lines
    df['Y'] = (df['X'].astype(float)
                      .rolling(window=w, min_periods=1)
                      .apply(lambda x: (x > 1).mean()) * 100)

If you want to compute the column value from the next 10 rows instead of the previous 10, here is a solution (thanks to jezrael for providing it):

    df['Y'] = (df['X'].astype(float)
                      .iloc[::-1]
                      .rolling(window=10, min_periods=1)
                      .apply(lambda x: (x > 1).mean()) * 100)[::-1]
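The reversal trick can be checked on the sample data from the question. The following is a minimal sketch, assuming the same ten values; the variable name `forward` is only illustrative and does not appear in the original post.

```python
import pandas as pd

df = pd.DataFrame({'X': ['8.12', '7.13', '-5.30', '3.21', '4.21',
                         '3.14', '8.65', '7.33', '-5.10', '3.01']})
x = df['X'].astype(float)

# Reverse the series so that "previous rows" in the reversed order
# are the "next rows" in the original order, roll, then reverse back.
forward = (
    x.iloc[::-1]
     .rolling(window=10, min_periods=1)
     .apply(lambda s: (s > 1).mean())
     .iloc[::-1] * 100
)

df['Y_next'] = forward
print(df)
```

With this layout the last row only sees itself (3.01 > 1, so 100%), the second-to-last row sees -5.10 and 3.01 (50%), and the first row sees all ten values (80%).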

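Answer:

You can use the following approach:

First convert column X to float with astype
Add the min_periods parameter to Series.rolling
Use a lambda function with (x>1).mean() instead of the custom function; the output is the same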

    df = pd.DataFrame(data={'X': ['8.12', '7.13', '-5.30', '3.21', '4.21',
                                  '3.14', '8.65', '7.33', '-5.10', '3.01']})
    w = 10
    # Parentheses let the method chain span several lines
    df['Y'] = (df['X'].astype(float)
                      .rolling(window=w, min_periods=1)
                      .apply(lambda x: (x > 1).mean()) * 100)
    print(df)

          X           Y
    0  8.12  100.000000
    1  7.13  100.000000
    2 -5.30   66.666667
    3  3.21   75.000000
    4  4.21   80.000000
    5  3.14   83.333333
    6  8.65   85.714286
    7  7.33   87.500000
    8 -5.10   77.777778
    9  3.01   80.000000
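The first step of this answer matters because the sample column holds strings rather than numbers. A small sketch of that check follows; the use of `is_numeric_dtype` and the variable names are assumptions added for illustration, not part of the original answer.

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype

df = pd.DataFrame({'X': ['8.12', '7.13', '-5.30']})

# The raw column holds strings, so it is not a numeric dtype.
raw_is_numeric = is_numeric_dtype(df['X'])

# After astype(float) the values are real floats and can be compared with 1.
converted = df['X'].astype(float)
converted_is_numeric = is_numeric_dtype(converted)

print(raw_is_numeric, converted_is_numeric)  # prints: False True
```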

Solution with the custom function:

    # Assumes numpy is imported as np and df is defined as above
    def count_pcnt(x):
        return (np.sum(x > 1) / len(x)) * 100.0

    w = 10
    df['Y'] = df['X'].astype(float).rolling(window=w, min_periods=1).apply(count_pcnt)
    print(df)

           X           Y
    0   8.12  100.000000
    1   7.13  100.000000
    2  -5.30   66.666667
    3   3.21   75.000000
    4   4.21   80.000000
    5   3.14   83.333333
    6   8.65   85.714286
    7   7.33   87.500000
    8  -5.10   77.777778
    9   3.01   80.000000

Edit:

The function can be changed to:

    def count_pcnt(x):
        return ((x > 1).sum() / len(x)) * 100.0

Or:

    def count_pcnt(x):
        return (x > 1).mean() * 100.0
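To confirm that these variants agree, they can be compared on the sample data. The sketch below is only illustrative: the names `pcnt_sum`, `pcnt_mean` and `by_flags` are made up for this example, and the last variant (a rolling mean over 0/1 flags, which avoids `apply` entirely) is an alternative not mentioned in the original answer.

```python
import numpy as np
import pandas as pd

x = pd.Series([8.12, 7.13, -5.30, 3.21, 4.21, 3.14, 8.65, 7.33, -5.10, 3.01])

def pcnt_sum(s):
    # Count of values above 1 divided by the window length
    return ((s > 1).sum() / len(s)) * 100.0

def pcnt_mean(s):
    # Mean of a boolean mask is the fraction of True values
    return (s > 1).mean() * 100.0

by_sum = x.rolling(window=10, min_periods=1).apply(pcnt_sum)
by_mean = x.rolling(window=10, min_periods=1).apply(pcnt_mean)

# Alternative without apply: turn the condition into 0/1 flags first,
# then take the built-in rolling mean of those flags.
by_flags = (x > 1).astype(float).rolling(window=10, min_periods=1).mean() * 100

same = np.allclose(by_sum, by_mean) and np.allclose(by_sum, by_flags)
print(same)  # prints: True
```

On larger frames the flag-based variant may be faster because it uses the built-in rolling mean instead of calling a Python function per window, though that depends on the data and the pandas version.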
