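Another solution (not Pythonic, but very fast) is to use string.translate, though note that this won't work for unicode strings. It's also worth noting that you can speed up Dana's code by moving the characters into a set, so each membership test is a hash lookup instead of a linear search through the string. Here are the timings for the various solutions given: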
import string, re, timeit

# Python 2 code: relies on the print statement, string.maketrans and string.translate.

# Precomputed values (for the set-based and translate versions)
letter_set = frozenset(string.ascii_lowercase + string.ascii_uppercase)
tab = string.maketrans(string.ascii_lowercase + string.ascii_uppercase,
                       string.ascii_lowercase * 2)
deletions = ''.join(ch for ch in map(chr, range(256)) if ch not in letter_set)

s = "A235th@#$&( er Ra{}|?>ndom"

# From unwind's filter approach
def test_filter(s):
    return filter(lambda x: x in string.ascii_lowercase, s.lower())

# using set instead (and contains)
def test_filter_set(s):
    return filter(letter_set.__contains__, s).lower()

# Tomalak's solution
def test_regex(s):
    return re.sub('[^a-z]', '', s.lower())

# Dana's
def test_str_join(s):
    return ''.join(c for c in s.lower() if c in string.ascii_lowercase)

# Modified to use a set.
def test_str_join_set(s):
    return ''.join(c for c in s.lower() if c in letter_set)

# Translate approach.
def test_translate(s):
    return string.translate(s, tab, deletions)

# Check each test_* function's result, then time 200000 calls of it (in name order).
for test in sorted(globals()):
    if test.startswith("test_"):
        assert globals()[test](s) == 'atherrandom'
        print "%30s : %s" % (test, timeit.Timer("f(s)",
                "from __main__ import %s as f, s" % test).timeit(200000))
Which gives me (total seconds for 200,000 calls each):
test_filter : 2.57138351271
test_filter_set : 0.981806765698
test_regex : 3.10069885233
test_str_join : 2.87172979743
test_str_join_set : 2.43197956381
test_translate : 0.335367566218
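The code above is Python 2: in Python 3 the module-level `string.maketrans` and `string.translate` functions no longer exist. A minimal Python 3 sketch of the same translate idea, assuming the input can be handled as ASCII-compatible `bytes`, uses `bytes.maketrans` and `bytes.translate`, which keep the "256-entry table plus deletion list" behaviour of the original. The names `TABLE`, `DELETIONS` and `keep_letters_lower` are illustrative, not from the original answer.

```python
import string

# 256-byte translation table: ASCII uppercase letters map to lowercase,
# lowercase letters map to themselves, every other byte is left unchanged.
TABLE = bytes.maketrans(
    string.ascii_letters.encode("ascii"),
    (string.ascii_lowercase * 2).encode("ascii"),
)

# Every byte value that is not an ASCII letter is listed here, so translate() deletes it.
DELETIONS = bytes(b for b in range(256) if chr(b) not in string.ascii_letters)


def keep_letters_lower(data: bytes) -> bytes:
    """Lowercase ASCII letters and drop every other byte (hypothetical helper)."""
    return data.translate(TABLE, DELETIONS)


print(keep_letters_lower(b"A235th@#$&( er Ra{}|?>ndom"))  # b'atherrandom'
```

For `str` input, one option (an assumption about what you want to happen to non-ASCII characters) is `keep_letters_lower(s.encode("ascii", "ignore")).decode("ascii")`, which simply drops anything outside ASCII before translating.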
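[Edit] Also updated the filter solution. (Note that using set.__contains__ here makes a big difference, since it avoids the extra Python-level function call to the lambda for every character.)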
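The `set.__contains__` trick carries over to Python 3 with one difference: `filter` returns an iterator there rather than a string, so the result has to be joined back together. A small sketch comparing the lambda and bound-method forms under that assumption (function names are illustrative; no timings are claimed, run it to measure on your own machine):

```python
import string
import timeit

LETTERS = frozenset(string.ascii_letters)


def filter_lambda(s: str) -> str:
    # Calls a Python-level lambda once per character.
    return "".join(filter(lambda c: c in LETTERS, s)).lower()


def filter_contains(s: str) -> str:
    # Passes the set's bound __contains__ method directly, so no lambda is called.
    return "".join(filter(LETTERS.__contains__, s)).lower()


if __name__ == "__main__":
    s = "A235th@#$&( er Ra{}|?>ndom"
    for f in (filter_lambda, filter_contains):
        assert f(s) == "atherrandom"
        seconds = timeit.timeit(lambda: f(s), number=200000)
        print(f"{f.__name__:>20} : {seconds:.3f}")
```

Both functions filter on the original characters and lowercase afterwards, mirroring `test_filter_set` above rather than `test_filter`.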