如果您能熟练使用BeautifulSoup,则只需将soupselect添加到您的库中。 Soupselect是BeautifulSoup的CSS选择器扩展。
用法:
>>> from BeautifulSoup import BeautifulSoup as Soup
>>> from soupselect import select
>>> import urllib
>>> soup = Soup(urllib.urlopen('http://slashdot.org/'))
>>> select(soup, 'div.title h3')
[<h3><span><a href='//science.slashdot.org/'>Science</a>:</span></h3>,
<h3><a href='//slashdot.org/articles/07/02/28/0120220.shtml'>Star Trek</h3>,
..]