This solution requires preprocessing your corpus once, but after that each query is a very fast dictionary lookup.
from collections import defaultdict
from stemming.porter2 import stem

# Preprocessing: group every word in the corpus by its Porter2 stem.
with open('/usr/share/dict/words') as f:
    words = f.read().splitlines()

stems = defaultdict(list)
for word in words:
    word_stem = stem(word)
    stems[word_stem].append(word)

if __name__ == '__main__':
    # Lookup: stem the query and fetch all words that share that stem.
    word = 'leukocyte'
    word_stem = stem(word)
    print(stems[word_stem])
For the /usr/share/dict/words corpus, this produces:
['leukocyte', "leukocyte's", 'leukocytes']
It uses the stemming module, which can be installed with:
pip install stemming
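
If you want to reuse the index for several queries, the same idea can be wrapped in two small functions. This is only a sketch: `build_stem_index`, `related_words`, and `toy_stem` are names made up for illustration, and `toy_stem` is a deliberately crude stand-in so the example runs without any third-party package. In practice you would pass `stemming.porter2.stem` instead.

```python
from collections import defaultdict


def toy_stem(word):
    """Hypothetical stand-in stemmer: lowercases and strips a possessive or plural ending."""
    word = word.lower()
    for suffix in ("'s", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word


def build_stem_index(words, stem_fn):
    """Group words by stem (the one-time preprocessing step)."""
    index = defaultdict(list)
    for word in words:
        index[stem_fn(word)].append(word)
    return dict(index)  # plain dict, so looking up an unknown stem adds no key


def related_words(index, word, stem_fn):
    """Return every indexed word that shares a stem with `word`."""
    return index.get(stem_fn(word), [])


corpus = ["leukocyte", "leukocyte's", "leukocytes", "lever"]
index = build_stem_index(corpus, toy_stem)
print(related_words(index, "leukocytes", toy_stem))  # ['leukocyte', "leukocyte's", 'leukocytes']
print(related_words(index, "zebra", toy_stem))       # []
```

Converting the `defaultdict` to a plain `dict` and using `.get` means a query for an unknown word returns an empty list without silently growing the index.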