How do I access multiple columns in a rolling operator?

5b51 2022/1/14 8:21:46 python 3113 characters 536 views Source: www.jb51.cc/python

I want to do some rolling-window calculations in pandas that need to work on two columns at the same time. Here is a simple example that shows the problem:

import pandas as pd

df = pd.DataFrame({
    'x': [1, 2, 3, 2, 1, 5, 4, 6, 7, 9],
    'y': [4, 3, 4, 6, 5, 9, 1, 3, 1, 2]
})

windowSize = 4
result = []

for i in range(1,len(df)+1):
    if i < windowSize:
        result.append(None)
    else:
        x = df.x.iloc[i-windowSize:i]
        y = df.y.iloc[i-windowSize:i]
        m = y.mean()
        r = sum(x[y > m]) / sum(x[y <= m])
        result.append(r)

print(result)

Is there a way to solve this in pandas without the for loop? Any help is appreciated.

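A vectorized approach with NumPy that avoids the Python loop:

import numpy as np

windowSize = 4
a = df.values
X = strided_app(a[:,0],windowSize,1)   # rolling windows over column x
Y = strided_app(a[:,1],windowSize,1)   # rolling windows over column y
M = Y.mean(1)                          # mean of y within each window
mask = Y>M[:,None]                     # True where y exceeds its window mean
sums = np.einsum('ij,ij->i',X,mask)    # per-window sum of x where y > mean
rest_sums = X.sum(1) - sums            # per-window sum of x where y <= mean
out = sums/rest_sums                   # one value per full window, no leading None entries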

strided_app is a helper taken from another post (linked as "here" in the original answer).
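The definition of strided_app is not included in this post. A minimal sketch of such a helper is given below, assuming that the signature strided_app(a, L, S) means "window length L, step S". That reading is inferred from the calls above and is not confirmed by the source. The sketch is built on np.lib.stride_tricks.as_strided and ends with a check against the explicit loop from the question.

```python
import numpy as np
import pandas as pd

def strided_app(a, L, S):
    """Assumed helper: 2D view of 1D array `a`, rows of length L starting every S elements."""
    a = np.asarray(a)
    nrows = (a.size - L) // S + 1
    n = a.strides[0]  # byte step between consecutive elements of `a`
    # Read-only view: rows overlap in memory, so writing through it would be unsafe
    return np.lib.stride_tricks.as_strided(
        a, shape=(nrows, L), strides=(S * n, n), writeable=False)

df = pd.DataFrame({
    'x': [1, 2, 3, 2, 1, 5, 4, 6, 7, 9],
    'y': [4, 3, 4, 6, 5, 9, 1, 3, 1, 2],
})
windowSize = 4

a = df.values
X = strided_app(a[:, 0], windowSize, 1)
Y = strided_app(a[:, 1], windowSize, 1)
mask = Y > Y.mean(1)[:, None]
sums = np.einsum('ij,ij->i', X, mask)
out = sums / (X.sum(1) - sums)

# Reference: the loop from the question, keeping only the full windows
expected = []
for i in range(windowSize, len(df) + 1):
    x = df.x.iloc[i - windowSize:i]
    y = df.y.iloc[i - windowSize:i]
    m = y.mean()
    expected.append(x[y > m].sum() / x[y <= m].sum())

assert np.allclose(out, expected)
```

Unlike the loop in the question, the vectorized result has len(df) - windowSize + 1 entries, because the incomplete leading windows are not padded with None.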

Runtime test

Approaches:

# @kazemakase's solution
def rolling_window_sum(df,windowSize=4):
    rw = rolling_window(df.values.T,windowSize)
    m = np.mean(rw[1],axis=-1,keepdims=True)
    a = np.sum(rw[0] * (rw[1] > m),axis=-1)
    b = np.sum(rw[0] * (rw[1] <= m),axis=-1)
    result = a / b
    return result    

# Proposed in this post    
def strided_einsum(df,windowSize=4):
    a = df.values
    X = strided_app(a[:,0],windowSize,1)
    Y = strided_app(a[:,1],windowSize,1)
    M = Y.mean(1)
    mask = Y>M[:,None]
    sums = np.einsum('ij,ij->i',X,mask)
    rest_sums = X.sum(1) - sums
    out = sums/rest_sums
    return out
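rolling_window_sum relies on a rolling_window helper that is not shown in this post. The sketch below uses a stand-in that assumes rolling_window(a, window) returns sliding windows along the last axis. It is implemented with NumPy's sliding_window_view (available from NumPy 1.20), which is a substitute chosen for illustration and not necessarily the helper @kazemakase used.

```python
import numpy as np
import pandas as pd
from numpy.lib.stride_tricks import sliding_window_view

def rolling_window(a, window):
    # Hypothetical stand-in: windows of length `window` along the last axis
    return sliding_window_view(a, window, axis=-1)

def rolling_window_sum(df, windowSize=4):
    rw = rolling_window(df.values.T, windowSize)  # shape (2, n_windows, windowSize)
    m = np.mean(rw[1], axis=-1, keepdims=True)    # window means of the second column
    a = np.sum(rw[0] * (rw[1] > m), axis=-1)      # x summed where y > mean
    b = np.sum(rw[0] * (rw[1] <= m), axis=-1)     # x summed where y <= mean
    return a / b

df = pd.DataFrame({
    'x': [1, 2, 3, 2, 1, 5, 4, 6, 7, 9],
    'y': [4, 3, 4, 6, 5, 9, 1, 3, 1, 2],
})
res = rolling_window_sum(df)
print(res)
```

With this stand-in, the function returns one ratio per full window, the same layout as the strided_einsum output.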

Timings:

In [46]: df = pd.DataFrame(np.random.randint(0,9,(1000000,2)))  # upper bound 9 is assumed; the value is missing in this copy

In [47]: %timeit rolling_window_sum(df)
10 loops, best of 3: 90.4 ms per loop

In [48]: %timeit strided_einsum(df)
10 loops, best of 3: 62.2 ms per loop

For a further speedup, the Y.mean(1) step, which is a windowed average, can be computed with SciPy's 1D uniform filter. For windowSize = 4, M can alternatively be computed as:

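# scipy.ndimage.filters is a deprecated namespace; import from scipy.ndimage directly
from scipy.ndimage import uniform_filter1d as unif1d

# For size 4 the filter output at index i averages elements i-2..i+1,
# so [2:-1] keeps only the positions whose window lies fully inside the data
M = unif1d(a[:,1].astype(float),windowSize)[2:-1]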
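The timings in this post refer to a function named strided_einsum_unif_filter whose body is not shown. The following is a sketch of what such a function might look like, not the author's code. It assumes SciPy is installed, uses NumPy's sliding_window_view in place of strided_app to stay self-contained, and generalizes the [2:-1] slice to other window sizes under the assumption that the filter window at index i starts at i - windowSize // 2.

```python
import numpy as np
import pandas as pd
from numpy.lib.stride_tricks import sliding_window_view
from scipy.ndimage import uniform_filter1d as unif1d

def strided_einsum_unif_filter(df, windowSize=4):
    a = df.values
    X = sliding_window_view(a[:, 0], windowSize)
    Y = sliding_window_view(a[:, 1], windowSize)
    # Keep only filter outputs whose window lies fully inside the data;
    # for windowSize = 4 this is the same as the slice [2:-1]
    lo = windowSize // 2
    hi = lo + len(a) - windowSize + 1
    M = unif1d(a[:, 1].astype(float), windowSize)[lo:hi]
    mask = Y > M[:, None]
    sums = np.einsum('ij,ij->i', X, mask)
    return sums / (X.sum(1) - sums)

df = pd.DataFrame({
    'x': [1, 2, 3, 2, 1, 5, 4, 6, 7, 9],
    'y': [4, 3, 4, 6, 5, 9, 1, 3, 1, 2],
})
res = strided_einsum_unif_filter(df)

# Cross-check the filtered means against the direct per-window mean
Y = sliding_window_view(df.values[:, 1], 4)
M = unif1d(df.values[:, 1].astype(float), 4)[2:-1]
assert np.allclose(M, Y.mean(1))
```

One caveat: the filter works in floating point, so for window sizes where the mean is not exactly representable, a value that ties with its window mean may fall on the other side of the y > M comparison than it would with Y.mean(1).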

The gain is noticeable, roughly 20% less time per call:

In [65]: %timeit strided_einsum(df)
10 loops, best of 3: 61.5 ms per loop

In [66]: %timeit strided_einsum_unif_filter(df)
10 loops, best of 3: 49.4 ms per loop
