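The general solution here is to write a generator function that yields one group at a time. That way you only hold one group in memory at any given moment.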

    def get_groups(seq, group_by):
        data = []
        for line in seq:
            # Here the `startswith()` logic can be replaced with other
            # condition(s) depending on the requirement.
            if line.startswith(group_by):
                if data:
                    yield data
                    data = []
            data.append(line)

        if data:
            yield data

    with open('input.txt') as f:
        for i, group in enumerate(get_groups(f, ">"), start=1):
            print("Group #{}".format(i))
            print("".join(group))

Output:

    Group #1
    > header1 description
    data data
    data
    Group #2
    >header2 description
    more data
    data
    data
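
The comment inside `get_groups` mentions that the `startswith()` check can be swapped for other conditions. One way that might look is sketched below. The helper name `get_groups_by` and its `is_start` parameter are hypothetical (they are not part of the code above), and the sample data is an in-memory stand-in for `input.txt`.

```python
import io


def get_groups_by(seq, is_start):
    """Yield lists of consecutive lines from seq.

    A new list begins whenever is_start(line) is true.
    Both the function name and the is_start parameter are
    illustrative assumptions, not part of the original answer.
    """
    data = []
    for line in seq:
        # Close the current group when a new header-like line shows up.
        if is_start(line) and data:
            yield data
            data = []
        data.append(line)
    # Emit the trailing group, if any.
    if data:
        yield data


# In-memory stand-in for a file, so the sketch runs without input.txt.
sample = io.StringIO(">a first\nAC\nGT\n>b second\nTT\n")
groups = list(get_groups_by(sample, lambda line: line.startswith(">")))

for i, group in enumerate(groups, start=1):
    print("Group #{}".format(i))
    print("".join(group))
```

Passing a callable keeps the grouping loop unchanged while letting the caller decide what counts as the start of a group, for example a regular-expression match instead of a fixed prefix.
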
For FASTA files in general, I would recommend using the Biopython package.