在python中使用scrapy执行Javascript提交表单功能

检出以下有关如何将selenium一起使用的摘要。爬网速度会变慢，因为你不仅要下载html，还可以完全访问DOM。

注意：由于先前提供的链接不再起作用，因此我已复制粘贴此代码段。

# Snippet imported from snippets.scrapy.org (which no longer works)

from scrapy.contrib.spiders import CrawlSpider, Rule
from scrapy.contrib.linkextractors.sgml import SgmlLinkExtractor
from scrapy.selector import HtmlXPathSelector
from scrapy.http import Request

from selenium import selenium

class SeleniumSpider(CrawlSpider):
    name = "SeleniumSpider"
    start_urls = ["http://www.domain.com"]

    rules = (
        Rule(SgmlLinkExtractor(allow=('\.html', )),
        callback='parse_page',follow=True),
    )

    def __init__(self):
        CrawlSpider.__init__(self)
        self.verificationErrors = []
        self.selenium = selenium("localhost", 4444, "*chrome", "http://www.domain.com")
        self.selenium.start()

    def __del__(self):
        self.selenium.stop()
        print self.verificationErrors
        CrawlSpider.__del__(self)

    def parse_page(self, response):
        item = Item()

        hxs = HtmlXPathSelector(response)
        #Do some XPath selection with Scrapy
        hxs.select('//div').extract()

        sel = self.selenium
        sel.open(response.url)

        #Wait for javscript to load in Selenium
        time.sleep(2.5)

        #Do some crawling of javascript created content with Selenium
        sel.get_text("//div")
        yield item

python 2022/1/1 18:24:22 有217人围观

撰写回答

你尚未登录，登录后可以

和开发者交流问题的细节

关注并接收问题和回答的更新提醒

参与内容的编辑和改进，让解决方法与时俱进

请先登录

推荐问题

如何在PHP变量中去除空格？

如何在PHP变量中去除空格？

php 2022-01-01 1199
我可以在php中的SESSION数组上使用array_push吗？

我可以在php中的SESSION数组上使用array_push吗？

php 2022-01-01 1191
如何使用bcrypt在PHP中对密码进行哈希处理？

如何使用bcrypt在PHP中对密码进行哈希处理？

php 2022-01-01 940
如何在PHP中使用XMLReader？

如何在PHP中使用XMLReader？

php 2022-01-01 1084
PDOException“找不到驱动程序”在PHP

PDOException“找不到驱动程序”在PHP

php 2022-01-01 1066
为什么在pom.xml的第1行中出现Unknown错误？

为什么在pom.xml的第1行中出现Unknown错误？

其他 2022-01-01 1245
__construct（）与SameAsClassName（）在PHP中的构造函数

__construct（）与SameAsClassName（）在PHP中的构造函数

php 2022-01-01 869
使用Retrofit2在POST请求中发送JSON

使用Retrofit2在POST请求中发送JSON

其他 2022-01-01 972
用单引号在PHP中打印换行符

用单引号在PHP中打印换行符

php 2022-01-01 887
可以嵌套在P元素内的HTML5元素列表？

可以嵌套在P元素内的HTML5元素列表？

其他 2022-01-01 913
为什么在PHP中通过标头（'Location ..'）重定向后必须调用'exit'？

为什么在PHP中通过标头（'Location ..'）重定向后必须调用'exit'？

php 2022-01-01 857
如何在PHP中发出异步GET请求？

如何在PHP中发出异步GET请求？

php 2022-01-01 874
如何在php中为其他所有函数调用自动调用函数

如何在php中为其他所有函数调用自动调用函数

php 2022-01-01 932
当软键盘出现在phonegap中时，输入字段隐藏

当软键盘出现在phonegap中时，输入字段隐藏

其他 2022-01-01 891
在PHP中连接n个数组的值

在PHP中连接n个数组的值

php 2022-01-01 887
在PHP中“ =>”是什么意思？

在PHP中“ =>”是什么意思？

php 2022-01-01 913
在PHP中写入新行到文件（换行）

在PHP中写入新行到文件（换行）

php 2022-01-01 844
文件上传可以在PHP中超时吗？

文件上传可以在PHP中超时吗？

php 2022-01-01 890
如何在Python中使用Selenium滚动到页面的末尾？

如何在Python中使用Selenium滚动到页面的末尾？

python 2022-01-01 881
在PHP中对关联数组进行排序

在PHP中对关联数组进行排序

php 2022-01-01 848

在python中使用scrapy执行Javascript提交表单功能

撰写回答

推荐问题

如何在PHP变量中去除空格？

我可以在php中的SESSION数组上使用array_push吗？

如何使用bcrypt在PHP中对密码进行哈希处理？

如何在PHP中使用XMLReader？

PDOException“找不到驱动程序”在PHP

为什么在pom.xml的第1行中出现Unknown错误？

__construct（）与SameAsClassName（）在PHP中的构造函数

使用Retrofit2在POST请求中发送JSON

用单引号在PHP中打印换行符

可以嵌套在P元素内的HTML5元素列表？

为什么在PHP中通过标头（'Location ..'）重定向后必须调用'exit'？

如何在PHP中发出异步GET请求？

如何在php中为其他所有函数调用自动调用函数

当软键盘出现在phonegap中时，输入字段隐藏

在PHP中连接n个数组的值

在PHP中“ =>”是什么意思？

在PHP中写入新行到文件（换行）

文件上传可以在PHP中超时吗？

如何在Python中使用Selenium滚动到页面的末尾？

在PHP中对关联数组进行排序

分类汇总

您的鼓励是对我最大的支持