Selenium-Selenium-Webdriver

Using Selenium from Node for Front-End Automation

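Preface: A recent project needed batch processing of the product data belonging to one user of a front-end application. Done by hand, the flow is: log in, load that user's product list, click a product in the list to open its detail page, perform a series of operations on the product, then save and exit. There are more than 200,000 records, so processing them one at a time by hand is not realistic, and the plan is to do it with a script.

Requirements analysis

The requirement is fairly simple and comes down to three questions: (1) how to log in, obtain the login information, and query the product data of the current user; (2) how to tell that the current record has finished processing so the script can leave the current flow; (3) how to process a batch of records asynchronously. The work is therefore to simulate the login, call the product-list query API to collect the product IDs, and then loop over that collection: jump to each product's detail page by its ID, simulate the button clicks on the page, listen for the completion action, and exit the current flow.

Introduction to Selenium

What is Selenium?

Selenium automates browsers. That's it! What you do with that power is entirely up to you. Primarily, it is for automating web applications for testing purposes, but is certainly not limited to just that. Boring web-based administration tasks can (and should!) be automated as well.

Selenium has the support of some of the largest browser vendors who have taken (or are taking) steps to make Selenium a native part of their browser. It is also the core technology in countless other browser automation tools, APIs and frameworks.

In short: Selenium drives a browser automatically, and how that capability is used is up to us. Testing is its main use, but tedious web-based administration tasks such as the one above are just as good a fit. ...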
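The third question above, processing a batch asynchronously, can be sketched as a bounded-concurrency loop. The sketch is written in Python to match the other code on this page; the same shape maps onto Promise-based code in Node. The names `process_all` and `handle` are hypothetical: `handle` stands in for whatever opens one product's detail page, clicks through it, and reports success.

```python
import asyncio
from typing import Awaitable, Callable, Iterable, List


async def process_all(
    product_ids: Iterable[str],
    handle: Callable[[str], Awaitable[bool]],  # hypothetical per-product handler
    concurrency: int = 5,
) -> List[str]:
    """Run handle(id) for every ID with at most `concurrency` running at once.

    Returns the IDs whose handler returned False or raised, so they can be retried.
    """
    semaphore = asyncio.Semaphore(concurrency)
    failed: List[str] = []

    async def run_one(product_id: str) -> None:
        async with semaphore:  # wait for a free slot before starting
            try:
                ok = await handle(product_id)
            except Exception:
                ok = False
            if not ok:
                failed.append(product_id)

    await asyncio.gather(*(run_one(pid) for pid in product_ids))
    return failed
```

Collecting the failed IDs rather than stopping at the first error lets a long run finish and leaves a short list to retry afterwards.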

Playing Flash with Selenium, chromeDriver, and Python3

While taking screenshots with selenium + chromeDriver + python3, Flash failed to load, which left the Flash region of the screenshot blank.

Requirements: selenium, chromeDriver, Python3

Problem: Chrome driven as a headless browser does not load Flash automatically.

Solution: following the answer to allow-flash-content-in-chrome-69-running-via-chromedriver, change the Flash setting under chrome://settings/content/siteDetails?site= directly and set it to Allow.

    #!/usr/bin/env python3
    # -*- coding: utf-8 -*-
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support.ui import Select


    class chromeDriver():
        def __init__(self, driver=None):
            # Window size
            self.window_width = 1680
            self.window_height = 948
            # Location of chromedriver
            self.executable_path = '/usr/local/bin/chromedriver'
            # Path to the Flash plugin
            self.flash_path = '/Users/cindy/Library/Application Support/Google/Chrome/PepperFlash/32.0.0.171/PepperFlashPlayer.plugin'
            # Reuse the driver passed in, or create a new one
            if driver:
                self.driver = driver
            else:
                self.driver = self.get_chrome_driver()

        def get_chrome_driver(self):
            # User-Agent header
            user_agent = 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.86 Safari/537.36'
            # Build the options object
            options = webdriver.ChromeOptions()
            prefs = {
                # Enable images
                "profile.managed_default_content_settings.images": 1,
                # Disable notifications
                "profile.default_content_setting_values.notifications": 2,
            }
            # Flash version and path
            options.add_argument('--ppapi-flash-version=32.0.0.171')
            options.add_argument('--ppapi-flash-path=' + self.flash_path)
            # binary_location is an attribute of the options object, not a command-line argument
            options.binary_location = '/Applications/Google Chrome.app/Contents/MacOS/Google Chrome'
            # Window size, given as width,height
            options.add_argument('--window-size=' + str(self.window_width) + ',' + str(self.window_height))
            # Maximize the window
            options.add_argument('--start-maximized')
            # Work around a rendering bug
            options.add_argument('--disable-gpu')
            # Disable popup blocking
            options.add_argument('--disable-popup-blocking')
            # Hide the "controlled by automated test software" infobar
            options.add_argument('--disable-infobars')
            # Use Chinese as the UI language
            options.add_argument('--lang=zh_CN.UTF-8')
            # Ignore Chrome certificate error warnings
            options.add_argument('--ignore-certificate-errors')
            # Replace the User-Agent
            options.add_argument('--user-agent=' + user_agent)
            options.add_argument('--no-default-browser-check')
            # Drop the switch that marks the browser as automated
            options.add_experimental_option('excludeSwitches', ['enable-automation'])
            options.add_experimental_option('prefs', prefs)
            # Create the Chrome driver (Selenium 3 style; Selenium 4 takes the
            # chromedriver path through a Service object instead)
            driver = webdriver.Chrome(options=options, executable_path=self.executable_path)
            return driver

        def get(self, web_url):
            if not web_url:
                return False
            return self.driver.get(web_url)

        def add_flash_site(self, web_url):
            if not web_url:
                return False
            self.get("chrome://settings/content/siteDetails?site=" + web_url)
            # The settings page is built from nested shadow DOMs, so walk down
            # one level at a time until the Flash permission <select> is reached
            root1 = self.driver.find_element(By.TAG_NAME, "settings-ui")
            shadow_root1 = self.expand_root_element(root1)
            root2 = shadow_root1.find_element(By.ID, "container")
            root3 = root2.find_element(By.ID, "main")
            shadow_root3 = self.expand_root_element(root3)
            root4 = shadow_root3.find_element(By.CLASS_NAME, "showing-subpage")
            shadow_root4 = self.expand_root_element(root4)
            root5 = shadow_root4.find_element(By.ID, "advancedPage")
            root6 = root5.find_element(By.TAG_NAME, "settings-privacy-page")
            shadow_root6 = self.expand_root_element(root6)
            root7 = shadow_root6.find_element(By.ID, "pages")
            root8 = root7.find_element(By.TAG_NAME, "settings-subpage")
            root9 = root8.find_element(By.TAG_NAME, "site-details")
            shadow_root9 = self.expand_root_element(root9)
            root10 = shadow_root9.find_element(By.ID, "plugins")
            shadow_root10 = self.expand_root_element(root10)
            root11 = shadow_root10.find_element(By.ID, "permission")
            Select(root11).select_by_value("allow")

        def expand_root_element(self, element):
            # Return the shadow root attached to the element
            return self.driver.execute_script("return arguments[0].shadowRoot", element)

        def get_flash_url(self, web_url):
            if not web_url:
                return False
            # Allow Flash for the site first, then open it
            self.add_flash_site(web_url)
            self.get(web_url)

        def quit_driver(self):
            self.driver.quit()


    driver = chromeDriver()
    url = 'http://your.website/'
    driver.get_flash_url(url)

Finally, do not run in headless mode, that is, do not set options.add_argument('--headless'). Otherwise the Chrome settings cannot be modified directly. ...
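The chain of find_element and shadowRoot calls in add_flash_site follows one repeated pattern, which can be written as a table of steps plus a small loop. This is a minimal sketch, not part of the original script: `descend` and `FLASH_PERMISSION_PATH` are hypothetical names, and the element IDs and tag names are copied from the code above, so they are only valid for the Chrome version the author used. The locator strings "tag name", "id", and "class name" are the values of Selenium's By.TAG_NAME, By.ID, and By.CLASS_NAME.

```python
from typing import Any, Iterable, Tuple

# (locator strategy, value, enter the element's shadow root afterwards?)
Step = Tuple[str, str, bool]

# Path to the Flash permission <select>, copied from add_flash_site above
FLASH_PERMISSION_PATH = [
    ("tag name", "settings-ui", True),
    ("id", "container", False),
    ("id", "main", True),
    ("class name", "showing-subpage", True),
    ("id", "advancedPage", False),
    ("tag name", "settings-privacy-page", True),
    ("id", "pages", False),
    ("tag name", "settings-subpage", False),
    ("tag name", "site-details", True),
    ("id", "plugins", True),
    ("id", "permission", False),
]


def descend(driver: Any, steps: Iterable[Step]) -> Any:
    """Follow `steps` from the document, entering shadow roots where flagged."""
    context = driver
    for by, value, pierce in steps:
        element = context.find_element(by, value)
        if pierce:
            shadow = driver.execute_script("return arguments[0].shadowRoot", element)
            if shadow is None:
                raise LookupError(f"{by}={value!r} has no open shadow root")
            context = shadow
        else:
            context = element
    return context
```

With a live driver, the last line of add_flash_site would then read Select(descend(self.driver, FLASH_PERMISSION_PATH)).select_by_value("allow"). Keeping the path in one list makes it easier to adjust when the settings page layout changes between Chrome versions.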

Using a Proxy IP in a Selenium Crawler

A crawl sometimes stalls halfway through. A likely cause is that the target site has banned your IP: a program issues requests far faster than a person browsing does, and that speed is what gets it banned. There are several ways to avoid the problem.

Option 1: slow down. Use sleep from the time module so the program pauses for 1s after each request, which greatly reduces the chance of the IP being banned.

Option 2: to keep throughput up, use proxy IPs instead. The proxy used here is the dynamic forwarding proxy from 16YUN (亿牛云). Below is an example of the proxy configuration for Selenium.

    from selenium import webdriver
    import string
    import zipfile

    # Proxy server
    proxyHost = "t.16yun.cn"
    proxyPort = "31111"

    # Proxy tunnel credentials
    proxyUser = "username"
    proxyPass = "password"


    def create_proxy_auth_extension(proxy_host, proxy_port, proxy_username, proxy_password,
                                    scheme='http', plugin_path=None):
        # Build a small Chrome extension that sets the proxy and answers its auth prompt
        if plugin_path is None:
            plugin_path = r'C:/{}_{}@t.16yun.zip'.format(proxy_username, proxy_password)

        manifest_json = """
        {
            "version": "1.0.0",
            "manifest_version": 2,
            "name": "16YUN Proxy",
            "permissions": [
                "proxy",
                "tabs",
                "unlimitedStorage",
                "storage",
                "<all_urls>",
                "webRequest",
                "webRequestBlocking"
            ],
            "background": {
                "scripts": ["background.js"]
            },
            "minimum_chrome_version": "22.0.0"
        }
        """

        background_js = string.Template(
            """
            var config = {
                mode: "fixed_servers",
                rules: {
                    singleProxy: {
                        scheme: "${scheme}",
                        host: "${host}",
                        port: parseInt(${port})
                    },
                    bypassList: ["foobar.com"]
                }
            };

            chrome.proxy.settings.set({value: config, scope: "regular"}, function() {});

            function callbackFn(details) {
                return {
                    authCredentials: {
                        username: "${username}",
                        password: "${password}"
                    }
                };
            }

            chrome.webRequest.onAuthRequired.addListener(
                callbackFn,
                {urls: ["<all_urls>"]},
                ['blocking']
            );
            """
        ).substitute(
            host=proxy_host,
            port=proxy_port,
            username=proxy_username,
            password=proxy_password,
            scheme=scheme,
        )

        # Pack both files into the extension archive
        with zipfile.ZipFile(plugin_path, 'w') as zp:
            zp.writestr("manifest.json", manifest_json)
            zp.writestr("background.js", background_js)

        return plugin_path


    proxy_auth_plugin_path = create_proxy_auth_extension(
        proxy_host=proxyHost,
        proxy_port=proxyPort,
        proxy_username=proxyUser,
        proxy_password=proxyPass)

    option = webdriver.ChromeOptions()
    option.add_argument("--start-maximized")
    option.add_extension(proxy_auth_plugin_path)

    driver = webdriver.Chrome(options=option)
    driver.get("http://httpbin.org/ip")

The example can be copied as is, but the proxy details in it are ones of mine that have expired. To run the program, open a proxy account through their customer service and substitute the new proxy details. ...
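The first approach above, slowing down with time.sleep, has no code in the original, so here is a minimal sketch of it. The helper name `throttled` and its parameters are hypothetical. `fetch` stands in for whatever loads one page (for example a function wrapping driver.get), and the `sleep` parameter exists only so the pause can be replaced in tests.

```python
import time
from typing import Callable, Iterable, Iterator, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def throttled(
    items: Iterable[T],
    fetch: Callable[[T], R],          # hypothetical: loads one page or record
    delay: float = 1.0,               # seconds to wait between requests
    sleep: Callable[[float], None] = time.sleep,
) -> Iterator[R]:
    """Call fetch(item) for each item, pausing `delay` seconds between calls."""
    for index, item in enumerate(items):
        if index:  # no pause before the first request
            sleep(delay)
        yield fetch(item)
```

A crawl loop would then look like for page in throttled(urls, load_page): ..., where load_page is the caller's own function. Because the helper is a generator, each pause happens only when the next result is requested.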