You can use this:
re.sub(r'[^\sa-zA-Z0-9]', '', text).lower().strip()
For example (Python 3):
>>> import re
>>> def removePunctuation(s):
...     return re.sub(r'[^\sa-zA-Z0-9]', '', s).lower().strip()
...
>>> print(removePunctuation('Hi, you!'))
hi you
>>> print(removePunctuation(' No under_score!'))
no underscore
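Alternatively, if you also want to keep non-ASCII letters and digits (in Python 3, \W is Unicode-aware for str patterns), remove every non-word character except whitespace, plus the underscore:
re.sub(r'(?!\s)[\W_]', '', text).lower().strip()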
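To see how the two patterns differ, here is a small Python 3 sketch. The helper remove_punctuation, the pattern names and the sample string are illustrative only, not part of the original answer. Both patterns give the same result on plain ASCII input, but only the second one keeps accented letters:

```python
import re

# First pattern: keep only ASCII letters, ASCII digits and whitespace.
ASCII_ONLY = re.compile(r'[^\sa-zA-Z0-9]')
# Second pattern: drop every non-word character that is not whitespace, plus "_".
# For str patterns in Python 3, \W is Unicode-aware, so letters like "é" survive.
UNICODE_AWARE = re.compile(r'(?!\s)[\W_]')

def remove_punctuation(text, pattern):
    """Hypothetical helper: delete matches of `pattern`, lowercase, trim outer whitespace."""
    return pattern.sub('', text).lower().strip()

sample = 'Café, naïve_test!'
print(remove_punctuation(sample, ASCII_ONLY))     # caf navetest
print(remove_punctuation(sample, UNICODE_AWARE))  # café naïvetest
```

If you only ever process ASCII text, either pattern works; the lookahead version is the safer choice when input may contain accented or non-Latin letters.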