How to extract text from a PDF in Python 3.7
Using Tika worked for me!
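# Requires the tika package (pip install tika) and a Java runtime
from tika import parser
# from_file returns a dict; its 'content' key holds the extracted text
rawText = parser.from_file('January2019.pdf')
# Split the extracted text into a list of lines
rawList = rawText['content'].splitlines()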
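
One thing to watch for: as far as I can tell, 'content' can come back as None when Tika finds no extractable text (for example, a scanned PDF with no text layer), and then .splitlines() raises an AttributeError. Here is a minimal sketch of a more defensive version. The helper names content_to_lines and extract_pdf_lines are my own, not part of tika, and dropping blank lines is just an assumption about what you want from the output.

```python
def content_to_lines(parsed):
    """Return the non-empty, stripped lines from a Tika parse result dict."""
    # 'content' may be missing or None when no text could be extracted,
    # so fall back to an empty string before splitting.
    text = parsed.get("content") or ""
    return [line.strip() for line in text.splitlines() if line.strip()]


def extract_pdf_lines(path):
    """Parse the file at `path` with Tika and return its cleaned lines."""
    # Imported lazily so content_to_lines can be used without tika installed.
    from tika import parser
    return content_to_lines(parser.from_file(path))


# Hypothetical usage, with the file name from the answer above:
# rawList = extract_pdf_lines('January2019.pdf')
```

Keeping the cleanup separate from the Tika call also makes it easy to check the line handling on a plain dict without starting the Tika server.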