Python: Scraping Free Proxy IPs for Your Own Use, Chapter 2: Scraping Kuaidaili

First published at: https://mp.weixin.qq.com/s/O0...

How to use the IPs

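Now that we have found free proxy IPs, how do we actually use them? Copying them one by one is out of the question; that would be silly.

We can use a crawler to scrape these free proxy IPs.

Once scraped, they go into a database, and later a program can pull proxy IPs straight from the database whenever we need them.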
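
As a rough illustration of the "pull from the database later" step, here is a minimal sketch. It assumes each stored document has `ip` and `port` fields and that `collection` behaves like a pymongo collection (anything with a compatible `find()` works). The helper names `to_requests_proxies` and `pick_proxy` are hypothetical and used only for this example.

```python
import random


def to_requests_proxies(doc):
    """Turn a stored {"ip": ..., "port": ...} document into a requests-style proxies dict."""
    # Assumption: the free proxies are plain HTTP proxies.
    address = f"http://{doc['ip']}:{doc['port']}"
    return {"http": address, "https": address}


def pick_proxy(collection):
    """Pick one stored proxy at random, or return None if nothing is stored yet."""
    # The projection {"_id": 0} hides MongoDB's internal id field.
    docs = list(collection.find({}, {"_id": 0}))
    if not docs:
        return None
    return to_requests_proxies(random.choice(docs))
```

The returned dict can be passed to `requests.get(url, proxies=..., timeout=10)`. Free proxies fail often, so in practice the caller would probably retry with another pick.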

The idea is simple and clear.

Next, we start scraping proxy IPs from various sites...

Scraping Kuaidaili

Preparation

URL: https://www.kuaidaili.com/free/
OS: Windows
Browser: Google Chrome
Language: Python
Version: 3.x
Database: MongoDB

Analyzing the URL

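First, open the URL and take a look: https://www.kuaidaili.com/free/

Click through to page 2

and look at the URL.

The URL changed, so this is most likely a GET request. Press F12 to open the developer tools.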

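Then click page 1 again.

Now look at the screenshot below and follow the steps shown in it.

We can see that

this is the URL we want. The page number sits in the path, as in https://www.kuaidaili.com/free/inha/{page}/, which is the address the code below requests.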
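
The URL pattern can be captured in a small helper. This is only a sketch: the pattern is the one requested in the code further down, while the names `BASE_URL` and `page_url` are hypothetical and introduced just for illustration.

```python
# URL pattern taken from the request in the article's code; {page} is the page number.
BASE_URL = "https://www.kuaidaili.com/free/inha/{page}/"


def page_url(page):
    """Build the listing URL for a given 1-based page number."""
    if page < 1:
        raise ValueError("page numbers start at 1")
    return BASE_URL.format(page=page)
```

For example, `page_url(2)` gives `https://www.kuaidaili.com/free/inha/2/`, the kind of address that appears in the browser after clicking page 2.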

Code implementation

import requests
from lxml import etree
import pymongo
import time


class get_ip:
    def __init__(self):
        # Connect to the local MongoDB; proxies are stored in IPS.ip_table
        _mongo = pymongo.MongoClient(host="127.0.0.1", port=27017)
        db = _mongo.IPS
        self.ip_table = db["ip_table"]

    def get_html(self, page):
        # Request headers copied from a Chrome session
        headers = {
            'Connection': 'keep-alive',
            'sec-ch-ua': '"Google Chrome";v="87", " Not;A Brand";v="99", "Chromium";v="87"',
            'sec-ch-ua-mobile': '?0',
            'Upgrade-Insecure-Requests': '1',
            'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.88 Safari/537.36',
            'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9',
            'Sec-Fetch-Site': 'same-origin',
            'Sec-Fetch-Mode': 'navigate',
            'Sec-Fetch-User': '?1',
            'Sec-Fetch-Dest': 'document',
            'Referer': f'https://www.kuaidaili.com/free/inha/{page}/',
            'Accept-Language': 'zh-CN,zh-TW;q=0.9,zh;q=0.8,en;q=0.7',
        }
        response = requests.get(f'https://www.kuaidaili.com/free/inha/{page}/', headers=headers)
        html = etree.HTML(response.text)
        # Each table row holds one proxy
        trs = html.xpath('//div[@id="list"]//table/tbody/tr')
        print(trs)
        print(response)
        for tr in trs:
            ip = tr.xpath('td[@data-title="IP"]/text()')[0]
            port = tr.xpath('td[@data-title="PORT"]/text()')[0]
            print(f"{ip}:{port}")
            # The original stored the port under the misspelled key "post"
            _find = self.ip_table.find_one({"ip": ip, "port": port})
            # Deduplicate: insert only if this ip/port pair is not stored yet
            if not _find:
                print("IP not stored yet")
                self.ip_table.insert_one({"ip": ip, "port": port})
            else:
                print("IP already exists")


g = get_ip()
# Walk the listing pages, pausing 3 seconds between requests
for i in range(1, 9999):
    g.get_html(i)
    time.sleep(3)
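
The row-extraction step can be tried offline, without the network or MongoDB. The sketch below reuses the same XPath expressions on an inline HTML sample. The sample markup is an assumption modeled on those expressions, not a copy of the real page, and the function name `parse_proxies` is hypothetical.

```python
from lxml import etree

# Made-up sample shaped to match the XPath expressions used in the article's code.
# The addresses come from the reserved documentation range 192.0.2.0/24.
SAMPLE_HTML = """
<div id="list">
  <table>
    <thead><tr><th>IP</th><th>PORT</th></tr></thead>
    <tbody>
      <tr><td data-title="IP">192.0.2.10</td><td data-title="PORT">8080</td></tr>
      <tr><td data-title="IP">192.0.2.11</td><td data-title="PORT">3128</td></tr>
    </tbody>
  </table>
</div>
"""


def parse_proxies(html_text):
    """Return a list of (ip, port) string tuples found in the listing table."""
    root = etree.HTML(html_text)
    rows = root.xpath('//div[@id="list"]//table/tbody/tr')
    proxies = []
    for row in rows:
        ip = row.xpath('td[@data-title="IP"]/text()')
        port = row.xpath('td[@data-title="PORT"]/text()')
        # Skip rows missing either cell instead of raising IndexError.
        if ip and port:
            proxies.append((ip[0].strip(), port[0].strip()))
    return proxies


print(parse_proxies(SAMPLE_HTML))
```

Checking that both cells exist before indexing is a small design choice: if the site changes its markup, the function returns an empty list rather than crashing in the middle of a crawl.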
