您好, 欢迎来到 !    登录 | 注册 | | 设为首页 | 收藏本站

Selenium Python-访问搜索结果的下一页

Selenium Python-访问搜索结果的下一页

问题是具有id的element的值total_results页面加载后发生变化,首先包含117,然后变为44

相反,这是一种更可靠的方法。它逐页处理,直到没有剩余的页面了:

from selenium import webdriver
from selenium.common.exceptions import NoSuchElementException

driver = webdriver.Firefox()
url = 'http://www.nice.org.uk/Search.do?searchText=bevacizumab&newsearch=true#/search/?searchText=bevacizumab&mode=&staticTitle=false&SEARCHTYPE_all2=true&SEARCHTYPE_all1=&SEARCHTYPE=GUIDANCE&TOPICLVL0_all2=true&TOPICLVL0_all1=&HIDEFILTER=TOPICLVL1&HIDEFILTER=TOPICLVL2&TREATMENTS_all2=true&TREATMENTS_all1=&GUIDANCETYPE_all2=true&GUIDANCETYPE_all1=&STATUS_all2=true&STATUS_all1=&HIDEFILTER=EGAPREFERENCE&HIDEFILTER=TOPICLVL3&DATEFILTER_ALL=ALL&DATEFILTER_PREV=ALL&custom_date_from=&custom_date_to=11-06-2014&PAGINATIONURL=%2FSearch.do%3FsearchText%40%40bevacizumab%26newsearch%40%40true%26page%40%40&SORTORDER=BESTMATCH'
driver.get(url)

page_number = 1
while True:
    try:
        link = driver.find_element_by_link_text(str(page_number))
    except NoSuchElementException:
        break
    link.click()
    print driver.current_url
    page_number += 1

基本上,这里的想法是获取下一页链接,直到没有此类链接NoSuchElementException将被抛出)。请注意,它适用于任意数量页面和结果。

它打印:

http://www.nice.org.uk/Search.do?searchText=bevacizumab&newsearch=true&page=1
http://www.nice.org.uk/Search.do?searchText=bevacizumab&newsearch=true&page=2#showfilter
http://www.nice.org.uk/Search.do?searchText=bevacizumab&newsearch=true&page=3#showfilter
http://www.nice.org.uk/Search.do?searchText=bevacizumab&newsearch=true&page=4#showfilter
http://www.nice.org.uk/Search.do?searchText=bevacizumab&newsearch=true&page=5#showfilter
python 2022/1/1 18:14:35 有520人围观

撰写回答


你尚未登录,登录后可以

和开发者交流问题的细节

关注并接收问题和回答的更新提醒

参与内容的编辑和改进,让解决方法与时俱进

请先登录

推荐问题


联系我
置顶