
Why is pandas apply with a lambda slower than a loop here?

apply uses loops under the hood, so if you need better performance, the best and fastest option is a vectorized solution.
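To make the per-row work visible, here is a minimal sketch on a hypothetical three-row frame (the name `df` and its values are illustrative assumptions, not part of the original answer). With `axis=1`, the lambda is called once per row, and each row is handed over as a `Series` object, which is likely a large part of the overhead seen in the timings.

```python
import pandas as pd

# Hypothetical three-row frame, used only for illustration.
df = pd.DataFrame({'Lower': [1, 5, 3], 'Mid': [2, 4, 3], 'Upper': [3, 6, 3]})

# With axis=1 the function receives each row as a pandas Series,
# so every call pays for Series construction and label-based lookups.
row_types = df.apply(lambda row: type(row).__name__, axis=1)

print(row_types.tolist())
```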

No loops, just a vectorized solution that chains two conditions:

m1 = all_actions['Lower'] <= all_actions['Mid']
m2 = all_actions['Mid'] <= all_actions['Upper']
qualified_actions = m1 & m2

Thanks to Jon Clements for another solution:

all_actions.Mid.between(all_actions.Lower, all_actions.Upper)
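As a quick sanity check, the sketch below runs all three approaches on a small hypothetical frame (`df` and its values are assumptions made for illustration) and confirms they produce the same result. `between` includes both boundaries by default, which matches the `<=` comparisons.

```python
import pandas as pd

# Hypothetical four-row frame, used only for illustration.
df = pd.DataFrame({'Lower': [1, 5, 3, 7],
                   'Mid':   [2, 4, 3, 9],
                   'Upper': [3, 6, 3, 8]})

# Row-wise apply with a chained comparison.
by_apply = df.apply(lambda row: row['Lower'] <= row['Mid'] <= row['Upper'], axis=1)

# Two vectorized conditions combined with &.
by_mask = (df['Lower'] <= df['Mid']) & (df['Mid'] <= df['Upper'])

# Series.between with Series bounds (boundaries included by default).
by_between = df['Mid'].between(df['Lower'], df['Upper'])

print(by_mask.tolist())  # [True, False, True, False]
```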

Timings:

import numpy as np
import pandas as pd

np.random.seed(2017)
N = 45000
all_actions = pd.DataFrame(np.random.randint(50, size=(N, 3)), columns=['Lower', 'Mid', 'Upper'])

#print (all_actions)
In [85]: %%timeit
    ...: qualified_actions = []
    ...: for row in all_actions.index:
    ...:     if all_actions.ix[row,'Lower'] <= all_actions.ix[row, 'Mid'] <= all_actions.ix[row,'Upper']:
    ...:         qualified_actions.append(True)
    ...:     else:
    ...:         qualified_actions.append(False)
    ...: 
    ...: 
__main__:259: DeprecationWarning: 
.ix is deprecated. Please use
.loc for label based indexing or
.iloc for positional indexing

See the documentation here:
http://pandas.pydata.org/pandas-docs/stable/indexing.html#ix-indexer-is-deprecated
1 loop, best of 3: 579 ms per loop

In [86]: %%timeit
    ...: (all_actions.apply(lambda row: row['Lower'] <= row['Mid'] <= row['Upper'], axis=1))
    ...: 
1 loop, best of 3: 1.17 s per loop

In [87]: %%timeit
    ...: ((all_actions['Lower'] <= all_actions['Mid']) & (all_actions['Mid'] <= all_actions['Upper']))
    ...: 
1000 loops, best of 3: 509 µs per loop


In [90]: %%timeit
    ...: (all_actions.Mid.between(all_actions.Lower, all_actions.Upper))
    ...: 
1000 loops, best of 3: 520 µs per loop
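The `.ix` indexer used in the loop timing was removed in pandas 1.0, so that cell does not run on current versions. The sketch below shows two ways the same loop might be written today, assuming a smaller `N` than in the timings to keep it quick. The names `qualified_loc`, `qualified_zip`, and `vectorized` are illustrative, and no timings are claimed for them.

```python
import numpy as np
import pandas as pd

np.random.seed(2017)
N = 1000  # smaller than the N used in the timings, chosen only to keep this quick
all_actions = pd.DataFrame(np.random.randint(50, size=(N, 3)),
                           columns=['Lower', 'Mid', 'Upper'])

# Same row-by-row loop as the timed cell, with .loc in place of the removed .ix.
qualified_loc = []
for row in all_actions.index:
    ok = (all_actions.loc[row, 'Lower']
          <= all_actions.loc[row, 'Mid']
          <= all_actions.loc[row, 'Upper'])
    qualified_loc.append(bool(ok))

# Plain Python loop over the three columns in parallel, with no per-row indexing.
qualified_zip = [bool(lo <= mid <= up)
                 for lo, mid, up in zip(all_actions['Lower'],
                                        all_actions['Mid'],
                                        all_actions['Upper'])]

# Vectorized reference result.
vectorized = all_actions['Mid'].between(all_actions['Lower'], all_actions['Upper'])
```

Both loops still touch every row in Python, so the vectorized forms remain the ones to prefer when performance matters.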
Category: Other | 2022/1/1 18:27:45 | 667 views
