This groups the names:
from fuzzywuzzy import fuzz

combined_list = ['rakesh', 'zakesh', 'bikash', 'zikash', 'goldman LLC', 'oldman LLC']
combined_list.append('bakesh')
print('input names:', combined_list)

grs = []  # groups of names whose pairwise similarity ratio is > 80
for name in combined_list:
    for g in grs:
        # join the first group in which every member is similar enough
        if all(fuzz.ratio(name, w) > 80 for w in g):
            g.append(name)
            break
    else:
        # the inner loop did not break: no group matched, so start a new one
        grs.append([name])
print('output groups:', grs)

# flatten the groups back into a single list
outlist = [el for g in grs for el in g]
print('output list:', outlist)
This produces:
input names: ['rakesh', 'zakesh', 'bikash', 'zikash', 'goldman LLC', 'oldman LLC', 'bakesh']
output groups: [['rakesh', 'zakesh', 'bakesh'], ['bikash', 'zikash'], ['goldman LLC', 'oldman LLC']]
output list: ['rakesh', 'zakesh', 'bakesh', 'bikash', 'zikash', 'goldman LLC', 'oldman LLC']
As you can see, the names are grouped correctly, but the order may not be the one you want.
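For experimenting without installing fuzzywuzzy, here is a minimal sketch of the same greedy grouping wrapped in a function. It rests on some assumptions: `similarity` is a hypothetical stand-in for `fuzz.ratio` built on the standard library's `difflib.SequenceMatcher`, so scores may differ slightly from fuzzywuzzy on other inputs, and `group_similar` and its `threshold` parameter are names introduced here for illustration only. The alphabetical sort at the end is just one possible ordering, not necessarily the one you need.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Return a 0-100 similarity score (assumed stdlib stand-in for fuzz.ratio)."""
    return round(100 * SequenceMatcher(None, a, b).ratio())


def group_similar(names, threshold=80):
    """Greedily put each name into the first group whose members all score above threshold."""
    groups = []
    for name in names:
        for group in groups:
            if all(similarity(name, member) > threshold for member in group):
                group.append(name)
                break
        else:
            # The inner loop did not break: no existing group accepted the name.
            groups.append([name])
    return groups


names = ['rakesh', 'zakesh', 'bikash', 'zikash', 'goldman LLC', 'oldman LLC', 'bakesh']
groups = group_similar(names)

# Hypothetical reordering: sort the names alphabetically within each group.
sorted_groups = [sorted(g) for g in groups]

print('groups:', groups)
print('sorted within groups:', sorted_groups)
```

Because the grouping is greedy, the result depends on the order of the input: each name joins the first group that accepts it, and groups appear in the order their first member was seen. Any reordering is therefore easiest to apply afterwards, on the finished groups.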