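Have you considered using BioPython? It has a sequence reader that can parse FASTA files. If you are interested in writing the code yourself, you can also look at BioPython's source. A simple generator-based reader looks like this: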
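```python
def read_fasta(fp):
    """Yield (header, sequence) tuples; the header keeps its leading '>'."""
    name, seq = None, []
    for line in fp:
        line = line.rstrip()
        if line.startswith(">"):
            # A new header: emit the previous record, if any, then start a new one.
            if name:
                yield (name, ''.join(seq))
            name, seq = line, []
        else:
            # Sequence lines may be wrapped; collect them and join later.
            seq.append(line)
    # Emit the last record (no following header triggers it inside the loop).
    if name:
        yield (name, ''.join(seq))


with open('f.fasta') as fp:
    for name, seq in read_fasta(fp):
        print(name, seq)
```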
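The generator above can be tried without a file on disk by wrapping a string in `io.StringIO`, since it only needs an iterable of lines. The sketch below is a hypothetical variant, `read_fasta_records`, that also drops the leading `>`, skips blank lines, and splits the header into an ID and a description. It assumes the common convention that the ID is the first whitespace-separated token of the header; adjust `_split_header` if your files use a different layout.

```python
import io
from typing import Iterator, TextIO, Tuple


def _split_header(header: str) -> Tuple[str, str]:
    # Assumption: ID = first whitespace-delimited token, description = the rest.
    parts = header.split(None, 1)
    ident = parts[0] if parts else ""
    desc = parts[1] if len(parts) > 1 else ""
    return ident, desc


def read_fasta_records(fp: TextIO) -> Iterator[Tuple[str, str, str]]:
    """Hypothetical helper: yield (id, description, sequence) per FASTA record."""
    header, chunks = None, []
    for line in fp:
        line = line.strip()
        if not line:
            continue  # ignore blank lines between records
        if line.startswith(">"):
            if header is not None:
                yield _split_header(header) + ("".join(chunks),)
            header, chunks = line[1:], []  # drop the leading ">"
        else:
            chunks.append(line)
    # Flush the final record once the input is exhausted.
    if header is not None:
        yield _split_header(header) + ("".join(chunks),)


# In-memory example input: a wrapped sequence, a blank line, a header without description.
example = io.StringIO(
    ">seq1 first test sequence\n"
    "ACGT\n"
    "TTGA\n"
    "\n"
    ">seq2\n"
    "GGCC\n"
)

records = list(read_fasta_records(example))
for ident, desc, seq in records:
    print(ident, desc, seq)
```

Checking `header is not None` rather than truthiness means a bare `>` line (empty header) still starts a record.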