
python – Pandas: How to compare columns of lists row-wise in a DataFrame with Pandas (not a for loop)?

5b51 2022/1/14 8:23:02 python Word count 7365 Views 591 Source www.jb51.cc/python


Overview

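import pandas as pd

# Rebuilt from the 20-row table shown below
df = pd.DataFrame({
    'A': [['gener'], ['gener'], ['system'], ['system'],
          ['gutter'], ['gutter'], ['gutter'], ['gutter'], ['gutter'], ['gutter'],
          ['aluminum'], ['aluminum'], ['aluminum'], ['aluminum'], ['aluminum'],
          ['aluminum'], ['aluminum'], ['aluminum'], ['aluminum'],
          ['aluminum', 'toledo']],
    'B': [['gutter'], ['gutter'], ['gutter', 'system'], ['gutter', 'guard', 'system'],
          ['ohio', 'gutter'], ['gutter', 'toledo'], ['toledo', 'gutter'],
          ['gutter'], ['gutter'], ['gutter'],
          ['how', 'to', 'instal', 'aluminum', 'gutter'], ['aluminum', 'gutter'],
          ['aluminum', 'gutter', 'color'], ['aluminum', 'gutter'],
          ['aluminum', 'adrian', 'ohio'], ['aluminum', 'bowl', 'green', 'ohio'],
          ['aluminum', 'maume', 'ohio'], ['aluminum', 'perrysburg', 'ohio'],
          ['aluminum', 'tecumseh', 'ohio'], ['aluminum', 'toledo', 'ohio']],
}, columns=['A', 'B'])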

What it looks like

I have a DataFrame with two columns of lists.

A                                      B
0              [gener]                               [gutter]
1              [gener]                               [gutter]
2             [system]                       [gutter,system]
3             [system]                [gutter,guard,system]
4             [gutter]                         [ohio,gutter]
5             [gutter]                       [gutter,toledo]
6             [gutter]                       [toledo,gutter]
7             [gutter]                               [gutter]
8             [gutter]                               [gutter]
9             [gutter]                               [gutter]
10          [aluminum]    [how,to,instal,aluminum,gutter]
11          [aluminum]                     [aluminum,gutter]
12          [aluminum]              [aluminum,gutter,color]
13          [aluminum]                     [aluminum,gutter]
14          [aluminum]       [aluminum,adrian,ohio]
15          [aluminum]  [aluminum,bowl,green,ohio]
16          [aluminum]        [aluminum,maume,ohio]
17          [aluminum]   [aluminum,perrysburg,ohio]
18          [aluminum]     [aluminum,tecumseh,ohio]
19  [aluminum,toledo]       [aluminum,toledo,ohio]

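Given columns of lists, is there a pandas function that operates on the whole column at once to check each row for an intersection, returning either a boolean or the intersecting values as a new Series?

For example, I would like pandas to have an equivalent of this: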

def intersection(df, col1, col2, return_type='boolean'):
    """Compare the lists in col1 and col2 row by row."""
    s = []
    for _, row in df.iterrows():
        if return_type == 'boolean':
            # True if any item of col2 also appears in col1
            s.append(any(phrase in row[col1] for phrase in row[col2]))
        elif return_type == 'word':
            # Comma-joined items common to both lists
            s.append(','.join(set(row[col1]) & set(row[col2])))
        else:
            raise ValueError("return_type must be 'boolean' or 'word'")
    # Reuse the frame's index so the result aligns on assignment
    return pd.Series(s, index=df.index)

#Create column C in df
df['C'] = intersection(df,'A','B','word')

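...without having to write my own function or resort to loops. I feel there must be a simpler way to compare the lists in two columns of the same row and see whether they intersect.

I can do it with a for loop, but that looks ugly to me.

A for loop that returns the boolean values: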

for _, row in df.iterrows():
    # True if any item of B also appears in A
    print(any(phrase in row['A'] for phrase in row['B']))

Produces:

False
False
True
True
True
True
True
True
True
True
True
True
True
True
True
True
True
True
True
True
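
The loop above tests whether the two lists share at least one item, which is not the same as asking whether every item of A appears in B. On the question's data the two tests happen to agree, so here is a minimal sketch on a hypothetical two-row frame (the rows are made up for illustration) where they differ, written with zip instead of iterrows:

```python
import pandas as pd

# Hypothetical frame, chosen so the two tests disagree on row 1.
df = pd.DataFrame({
    "A": [["gutter"], ["aluminum", "toledo"]],
    "B": [["ohio", "gutter"], ["aluminum", "ohio"]],
})

pairs = list(zip(df["A"], df["B"]))

# Overlap test: at least one item of A appears in B.
has_overlap = pd.Series(
    [not set(a).isdisjoint(b) for a, b in pairs], index=df.index
)

# Subset test: every item of A appears in B.
is_subset = pd.Series([set(a) <= set(b) for a, b in pairs], index=df.index)

print(has_overlap.tolist())  # [True, True]
print(is_subset.tolist())    # [True, False]
```

Which test is wanted depends on whether a partial match such as row 1 should count as a hit.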

Or, using sets to find the intersecting words:

for _, row in df.iterrows():
    # Comma-joined items common to A and B (set order is arbitrary)
    print(repr(','.join(set(row['A']) & set(row['B']))))

''
''
'system'
'system'
'gutter'
'gutter'
'gutter'
'gutter'
'gutter'
'gutter'
'aluminum'
'aluminum'
'aluminum'
'aluminum'
'aluminum'
'aluminum'
'aluminum'
'aluminum'
'aluminum'
'toledo,aluminum'

To check whether every item in df.A is contained in df.B:

>>> df.apply(lambda row: all(i in row.B for i in row.A), axis=1)
# OR: ~(df['A'].apply(set) - df['B'].apply(set)).astype(bool)
0     False
1     False
2      True
3      True
4      True
5      True
6      True
7      True
8      True
9      True
10     True
11     True
12     True
13     True
14     True
15     True
16     True
17     True
18     True
19     True
dtype: bool
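
The set-difference variant in the comment above may be easier to follow when split into steps. This is a sketch on a hypothetical three-row subset of the data; the names `missing` and `contained` are introduced here for illustration, and the per-row difference is built with a list comprehension rather than Series arithmetic:

```python
import pandas as pd

# Hypothetical three-row subset of the question's data.
df = pd.DataFrame({
    "A": [["gener"], ["system"], ["aluminum", "toledo"]],
    "B": [["gutter"], ["gutter", "system"], ["aluminum", "toledo", "ohio"]],
})

# Items of A that are missing from B, computed row by row.
missing = pd.Series(
    [set(a) - set(b) for a, b in zip(df["A"], df["B"])], index=df.index
)

# An empty set is falsy, so "nothing missing" means A is contained in B.
contained = ~missing.astype(bool)

print(contained.tolist())  # [False, True, True]
```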

To get the intersection:

df['intersection'] = [list(set(a).intersection(set(b))) for a,b in zip(df.A,df.B)]

>>> df
                    A                                B       intersection
0             [gener]                         [gutter]                 []
1             [gener]                         [gutter]                 []
2            [system]                  [gutter,system]           [system]
3            [system]            [gutter,guard,system]           [system]
4            [gutter]                    [ohio,gutter]           [gutter]
5            [gutter]                  [gutter,toledo]           [gutter]
6            [gutter]                  [toledo,gutter]           [gutter]
7            [gutter]                         [gutter]           [gutter]
8            [gutter]                         [gutter]           [gutter]
9            [gutter]                         [gutter]           [gutter]
10         [aluminum]  [how,to,instal,aluminum,gutter]         [aluminum]
11         [aluminum]                [aluminum,gutter]         [aluminum]
12         [aluminum]          [aluminum,gutter,color]         [aluminum]
13         [aluminum]                [aluminum,gutter]         [aluminum]
14         [aluminum]           [aluminum,adrian,ohio]         [aluminum]
15         [aluminum]       [aluminum,bowl,green,ohio]         [aluminum]
16         [aluminum]            [aluminum,maume,ohio]         [aluminum]
17         [aluminum]       [aluminum,perrysburg,ohio]         [aluminum]
18         [aluminum]         [aluminum,tecumseh,ohio]         [aluminum]
19  [aluminum,toledo]           [aluminum,toledo,ohio]  [aluminum,toledo]
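
The question also asked for the intersecting words as a comma-separated string (its column C). A minimal sketch of that variant, on a hypothetical three-row subset of the data, follows; the `sorted()` call is an addition not present in the original code, used because set iteration order is not guaranteed:

```python
import pandas as pd

# Hypothetical three-row subset of the question's data.
df = pd.DataFrame({
    "A": [["gener"], ["system"], ["aluminum", "toledo"]],
    "B": [["gutter"], ["gutter", "system"], ["aluminum", "toledo", "ohio"]],
})

# sorted() fixes the word order, since set iteration order is arbitrary.
df["C"] = [
    ",".join(sorted(set(a) & set(b))) for a, b in zip(df["A"], df["B"])
]

print(df["C"].tolist())  # ['', 'system', 'aluminum,toledo']
```

Because the list comprehension yields one value per row in order, assigning it directly to a column keeps the rows aligned whatever the frame's index is.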

