About python: A High-Quality Scraper Should Of Course Scrape High-Quality Wallpapers

Every day my wallpaper is the sky-blue one that ships with Windows. It's really dull. Is it interesting? Nope~

So, as everyone knows, I'm a blogger who loves high quality, so naturally I have to grab some high-quality wallpapers. No ulterior motive.

Alright, enough chatter. Let's start today's high-quality journey~

Get all of these installed:

Python 3.6
PyCharm
requests
parsel

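Use the developer tools (press F12, or right-click and choose Inspect) to find where the image URL comes from;
Request a wallpaper's detail page and read its page source to get the image URL (one image per detail page);
Request the list page to get each wallpaper's detail-page URL and title.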

Wallpaper list page URL: http://www.netbian.com/1920×1…
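The crawler code in this post requests list pages 2 through 11 using the pattern `index_{page}.htm`. A minimal sketch of generating those URLs, assuming that pattern holds for every page from 2 on (the helper name `list_page_urls` is hypothetical, and page 1's URL is not covered):

```python
# Hypothetical helper: builds list-page URLs, assuming the site keeps
# the pattern index_<page>.htm for pages 2 and above.
BASE = 'http://www.netbian.com/1920x1080/'

def list_page_urls(first, last):
    """Return list-page URLs for pages first..last (inclusive)."""
    return [f'{BASE}index_{page}.htm' for page in range(first, last + 1)]

for url in list_page_urls(2, 4):
    print(url)
```

`list_page_urls(2, 11)` yields the same ten pages as the crawler's `range(2, 12)` loop.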

Page source / response.text: the web page's text data
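The commented-out `response.content.decode('gbk')` line in the crawler suggests the pages are GBK-encoded. A stdlib-only illustration of why decoding with the wrong codec garbles `response.text`, using a made-up title string rather than real page data:

```python
# Illustration only: a made-up snippet standing in for a GBK-encoded page body.
raw_bytes = '<b>彼岸桌面壁纸</b>'.encode('gbk')

# Decoding with the wrong codec (latin-1 here) turns the Chinese text into mojibake...
wrong = raw_bytes.decode('latin-1')
# ...while decoding with the page's actual encoding recovers the original text.
right = raw_bytes.decode('gbk')

print(wrong == right)                 # False: the two decodings differ
print(right == '<b>彼岸桌面壁纸</b>')  # True
```

In the crawler itself, `response.encoding = response.apparent_encoding` asks requests to guess the encoding from the bytes instead of hard-coding `'gbk'`.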

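Parsing options: CSS selectors, XPath, bs4 (BeautifulSoup), re (regular expressions)
Data to extract: 1. the wallpaper detail-page URL, e.g. /desk/23397.htm; 2. the wallpaper title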
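The crawler uses parsel CSS selectors (`.list li`, `b::text`, `a::attr(href)`), but `re` from the list above can handle simple markup too. A stdlib-only sketch that pulls the title and detail-page URL out of each `<li>`, then turns the relative href into an absolute URL with `urljoin`. The HTML snippet is an assumed, simplified stand-in for the list page, not copied from the site, and `parse_list_page` is a hypothetical name:

```python
import re
from urllib.parse import urljoin

# Assumed, simplified list-page markup (not the real page source).
SAMPLE_HTML = '''
<div class="list"><ul>
  <li><a href="/desk/23397.htm"><img src="x.jpg"><b>Wallpaper A</b></a></li>
  <li class="ad"><a href="/ad.htm"><img src="ad.jpg"></a></li>
  <li><a href="/desk/23396.htm"><img src="y.jpg"><b>Wallpaper B</b></a></li>
</ul></div>
'''
PAGE_URL = 'http://www.netbian.com/1920x1080/index_2.htm'

LI_RE = re.compile(r'<li\b[^>]*>(.*?)</li>', re.S)  # first pass: each <li> body
HREF_RE = re.compile(r'<a href="([^"]+)"')          # link to the detail page
TITLE_RE = re.compile(r'<b>(.*?)</b>')              # wallpaper title

def parse_list_page(page_html, page_url):
    """Return (title, absolute detail URL) pairs from a list page."""
    items = []
    for li in LI_RE.findall(page_html):
        title = TITLE_RE.search(li)
        href = HREF_RE.search(li)
        if title and href:  # skip <li> items without a title, like `if title:`
            items.append((title.group(1), urljoin(page_url, href.group(1))))
    return items

for title, url in parse_list_page(SAMPLE_HTML, PAGE_URL):
    print(title, url)
```

Regular expressions break easily when markup changes, which is one reason the crawler uses parsel selectors instead.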

Images are saved as binary data
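Because an image is binary, the file must be opened in `'wb'` mode and filled with `response.content` rather than `response.text`. Titles may also contain characters that Windows forbids in file names, and `open()` fails if the target folder is missing. A sketch of a save helper; the name `save_image` and the replace-with-underscore rule are assumptions for illustration, and the demo writes fake bytes to a temporary folder:

```python
import os
import re
import tempfile

# Characters that are not allowed in Windows file names.
INVALID_CHARS = re.compile(r'[\\/:*?"<>|]')

def save_image(content, title, folder='img'):
    """Write image bytes to <folder>/<sanitized title>.jpg and return the path."""
    os.makedirs(folder, exist_ok=True)  # create the folder if it is missing
    safe_title = INVALID_CHARS.sub('_', title).strip() or 'untitled'
    path = os.path.join(folder, safe_title + '.jpg')
    with open(path, mode='wb') as f:    # 'wb': write raw bytes
        f.write(content)
    return path

# Demo with fake bytes in a throwaway folder.
with tempfile.TemporaryDirectory() as tmp:
    saved_path = save_image(b'\xff\xd8\xff', 'a/b:c?', folder=tmp)
    with open(saved_path, 'rb') as f:
        saved_bytes = f.read()
print(os.path.basename(saved_path))  # a_b_c_.jpg
```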

Dear viewers: That's it? That's all? Where's the code? Not even posting the code, what's that supposed to mean?

Don't panic, here it comes, here it comes.

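I won't break it down piece by piece. With the comments in the code plus the steps above, I trust a clever reader like you can follow along; if that really doesn't do it, I'll post a video walkthrough at the end.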

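import os        # built-in module: file paths and folders
import re        # built-in module: regular expressions (used to clean file names)
import time      # built-in module: timing
import requests  # third-party HTTP request module: pip install requests
import parsel    # third-party data parsing module: pip install parsel

time_1 = time.time()
# To use a module, you first need to know what it's for
# Request headers: disguise the Python code as a browser when sending requests to the server
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/89.0.4389.114 Safari/537.36'}
os.makedirs('img', exist_ok=True)  # open() fails if the img folder does not exist
for page in range(2, 12):  # list pages 2 to 11
    print(f'==================== Crawling page {page} ====================')
    url = f'http://www.netbian.com/1920x1080/index_{page}.htm'
    response = requests.get(url=url, headers=headers, timeout=10)
    # Garbled text? The page has to be decoded with the right encoding
    # html_data = response.content.decode('gbk')
    response.encoding = response.apparent_encoding  # detect the encoding automatically
    # Page source / page text data: response.text
    # print(response.text)
    # Parse the data
    selector = parsel.Selector(response.text)
    # CSS selectors extract data based on the page's tags
    # First pass: extract every li tag in the list
    lis = selector.css('.list li')
    for li in lis:
        # Detail page example: http://www.netbian.com/desk/23397.htm
        title = li.css('b::text').get()
        if title:  # skip li items that have no title
            href = 'http://www.netbian.com' + li.css('a::attr(href)').get()
            response_1 = requests.get(url=href, headers=headers, timeout=10)
            selector_1 = parsel.Selector(response_1.text)
            img_url = selector_1.css('.pic img::attr(src)').get()
            # Images are binary data: use .content and write in 'wb' mode
            img_content = requests.get(url=img_url, headers=headers, timeout=10).content
            # Replace characters that Windows does not allow in file names
            file_name = re.sub(r'[\\/:*?"<>|]', '_', title)
            with open(os.path.join('img', file_name + '.jpg'), mode='wb') as f:
                f.write(img_content)
            print('Saving:', title)

time_2 = time.time()
use_time = round(time_2 - time_1)
print(f'Total time: {use_time} seconds')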

Feel free to run it yourself, and remember to like, coin, and favorite!

Published in: python

2021-12-23