How to view all of a user's comments on NetEase Cloud Music

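Sometimes you want to read the comments a particular user has written, only to find they are set to be visible only to that user. In that case, we can try writing a Python program instead. Here is a worked example: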

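We can see (for example, in the browser's developer tools) that the comments are loaded by a POST request to
music.163.com/weapi/v1/resource/comments/R_SO_4_26075485?csrf_token=
which sends two form parameters, `params` and `encSecKey`.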
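To make the request shape concrete, here is a minimal sketch that builds (but does not send) such a POST with Python's standard library. The function name `build_comment_request` and the `PLACEHOLDER_*` values are hypothetical. The real `params` and `encSecKey` strings come from the page's encryption step, which is not reproduced here, and the User-Agent is shortened.

```python
from urllib.parse import parse_qs, urlencode
from urllib.request import Request

COMMENT_URL = 'https://music.163.com/weapi/v1/resource/comments/R_SO_4_26075485?csrf_token='


def build_comment_request(params, enc_sec_key):
    # Hypothetical helper: the two fields go into an ordinary form-encoded body
    body = urlencode({'params': params, 'encSecKey': enc_sec_key}).encode('utf-8')
    return Request(
        COMMENT_URL,
        data=body,
        headers={
            'User-Agent': 'Mozilla/5.0',  # shortened browser-like UA
            'Content-Type': 'application/x-www-form-urlencoded',
        },
        method='POST',
    )


# Placeholder values only; real ones must come from the encryption step
req = build_comment_request('PLACEHOLDER_PARAMS', 'PLACEHOLDER_KEY')
print(req.get_method())
print(parse_qs(req.data.decode('utf-8')))
```

Nothing is sent over the network here. The point is only that the comment request is a plain form POST carrying the two fields.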


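In other words, all we have to do is imitate the browser and send the same POST request to NetEase's server to get the comments.
Note the POST URL: the string of digits after R_SO_4_ is the id of the song. The parameters that have to be passed also need a closer look (covered later).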
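The id-in-URL relationship described above can be expressed as two small helpers. Both names are hypothetical, and the sketch assumes the id is always a run of digits directly after `R_SO_4_`, as in the example URL.

```python
import re

COMMENT_API = 'https://music.163.com/weapi/v1/resource/comments/R_SO_4_{}?csrf_token='


def comment_url(song_id):
    # Hypothetical helper: the digits after R_SO_4_ are the song id
    return COMMENT_API.format(song_id)


def song_id_from_comment_url(url):
    # Hypothetical helper: recover the id, or None if the pattern is absent
    m = re.search(r'R_SO_4_(\d+)', url)
    return m.group(1) if m else None


url = comment_url(26075485)
print(url)
print(song_id_from_comment_url(url))
```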
**Step 1**

The code is as follows:

```python
import random
import re
import time

import requests

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 6.3; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/67.0.3396.87 Safari/537.36'
}

baseUrl = 'https://music.163.com'


def getHtml(url):
    # Fetch a page with a browser-like User-Agent and return its HTML
    r = requests.get(url, headers=headers)
    return r.text


def getUrl():
    # Start from the newest playlists
    startUrl = 'https://music.163.com/discover/playlist/?order=new'
    html = getHtml(startUrl)
    # Each match: (playlist title, playlist href, author name, author href)
    pattern = re.compile('<li>.*?<p.*?class="dec">.*?<.*?title="(.*?)".*?href="(.*?)".*?>.*?span class="s-fc4".*?title="(.*?)".*?href="(.*?)".*?</li>', re.S)
    result = re.findall(pattern, html)

    # Number of playlist pages (unused below)
    pageNum = re.findall(r'<span class="zdot".*?class="zpgi">(.*?)</a>', html, re.S)[0]

    info = []
    for i in result:
        data = {}
        data['title'] = i[0]
        url = baseUrl + i[1]
        print(url)
        data['url'] = url
        data['author'] = i[2]
        data['authorUrl'] = baseUrl + i[3]
        info.append(data)
        getSongSheet(url)
        time.sleep(random.randint(1, 10))
        break  # demo: only process the first playlist
```
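To see what the playlist regex in `getUrl` captures without requesting the site, the sketch below runs the same pattern over a hand-written HTML fragment. The fragment, `parse_playlists`, and the constant names are hypothetical. The fragment is only shaped to satisfy the pattern, and the real page markup may differ or change over time.

```python
import re

# Same pattern as in getUrl above, split across two string literals
PLAYLIST_PATTERN = re.compile(
    '<li>.*?<p.*?class="dec">.*?<.*?title="(.*?)".*?href="(.*?)".*?>'
    '.*?span class="s-fc4".*?title="(.*?)".*?href="(.*?)".*?</li>',
    re.S,
)

BASE_URL = 'https://music.163.com'

# Hypothetical fragment, written only to match the pattern
SAMPLE_HTML = (
    '<li><p class="dec"><a title="Playlist A" href="/playlist?id=1">Playlist A</a></p>'
    '<p><span>by</span><span class="s-fc4"><a title="Alice" href="/user/home?id=9">Alice</a></span></p></li>'
)


def parse_playlists(html):
    # Hypothetical helper mirroring the dict-building loop in getUrl
    info = []
    for title, href, author, author_href in PLAYLIST_PATTERN.findall(html):
        info.append({'title': title, 'url': BASE_URL + href,
                     'author': author, 'authorUrl': BASE_URL + author_href})
    return info


print(parse_playlists(SAMPLE_HTML))
```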
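This is another quirk of NetEase Cloud Music: when scraping, you have to remove the `#` from the page URL, because everything after `#` is a URL fragment that the browser never sends to the server. With it removed, you get the page shown here:
![](https://upload-images.jianshu.io/upload_images/7933544-ba9a4003bde734ac?imageMogr2/auto-orient/strip|imageView2/2/w/951/format/webp)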
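A minimal sketch of the `#` removal, assuming the web client's hash routes look like `https://music.163.com/#/discover/playlist/?order=new`. The route is moved out of the fragment into the path and query. `strip_hash_route` is a hypothetical helper name.

```python
from urllib.parse import urlsplit, urlunsplit


def strip_hash_route(url):
    # Hypothetical helper: turn 'https://host/#/route?q=1' into 'https://host/route?q=1'
    parts = urlsplit(url)
    if parts.fragment.startswith('/'):
        route = urlsplit(parts.fragment)  # the fragment holds path + query
        return urlunsplit((parts.scheme, parts.netloc, route.path, route.query, ''))
    return url  # no hash route: leave the URL unchanged


print(strip_hash_route('https://music.163.com/#/discover/playlist/?order=new'))
```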
**Step 2**
```python
def getSongSheet(url):
    # Get the id of every song in the playlist; it is the key for the POST that follows
    html = getHtml(url)
    result = re.findall(r'<li><a.*?href="/song\?id=(.*?)">(.*?)</a></li>', html, re.S)
    result.pop()  # drop the last match, as in the original code
    musicList = []
    for i in result:
        data = {}
        headers1 = {
            'Referer': 'https://music.163.com/song?id={}'.format(i[0]),
            'User-Agent': 'Mozilla/5.0 (Windows NT 6.3; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/67.0.3396.87 Safari/537.36'
        }
        musicUrl = baseUrl + '/song?id=' + i[0]
        print(musicUrl)
        data['musicUrl'] = musicUrl  # song URL
        data['title'] = i[1]         # song title
        musicList.append(data)
        postUrl = 'https://music.163.com/weapi/v1/resource/comments/R_SO_4_{}?csrf_token='.format(i[0])
        # get_params / get_encSecKey: the encryption helpers analysed later in the article
        param = {'params': get_params(1),
                 'encSecKey': get_encSecKey()}
        r = requests.post(postUrl, data=param, headers=headers1)
        # Total number of comments
        total = int(r.json()['total'])
        # 20 comments per page: whole pages, plus one more if there is a remainder
        comment_TatalPage = total // 20
        if total % 20 != 0:
            comment_TatalPage += 1
        print(comment_TatalPage)
        comment_data, hotComment_data = getMusicComments(comment_TatalPage, postUrl, headers1)
        # If a duplicate ID shows up when saving to the database, check whether only one record was scraped
        saveToMongoDB(str(i[1]), comment_data, hotComment_data)
        print('End!')

        time.sleep(random.randint(1, 10))
        break  # demo: only process the first song
```
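Using the song id, the code builds `postUrl`, sends a POST for the first page of comments (how to construct the POST so that it returns the data we want is explained later), reads the total number of comments, derives the total number of pages from it, and then calls the function that fetches the song's comments.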
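The two pure pieces of this step can be tested offline: pulling `(id, title)` pairs out of a playlist page and turning the comment total into a page count. This is a sketch. `extract_songs`, `page_count` and the HTML sample are hypothetical, and the 20-per-page figure comes from the code above. Unlike the original, it does not drop the last regex match with `result.pop()`.

```python
import re

# Same pattern as in getSongSheet
SONG_PATTERN = re.compile(r'<li><a.*?href="/song\?id=(.*?)">(.*?)</a></li>', re.S)


def extract_songs(html):
    # Hypothetical helper: list of (song id, song title) tuples
    return SONG_PATTERN.findall(html)


def page_count(total, per_page=20):
    # Whole pages, plus one more if there is a remainder
    pages = total // per_page
    if total % per_page != 0:
        pages += 1
    return pages


# Hypothetical playlist fragment
sample = ('<ul class="f-hide"><li><a href="/song?id=26075485">Song A</a></li>'
          '<li><a href="/song?id=123">Song B</a></li></ul>')
print(extract_songs(sample))
print(page_count(45))
```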
**Step 3**
```python
def getMusicComments(comment_TatalPage, postUrl, headers1):
    commentinfo = []
    hotcommentinfo = []
    # Walk through every page of comments
    for j in range(1, comment_TatalPage + 1):
        r = getPostApi(j, postUrl, headers1)
        body = r.json()
        # Regular comments on this page
        for i in body['comments']:
            commentinfo.append({
                'content': i['content'],
                'author': i['user']['nickname'],
                'likedCount': i['likedCount'],
            })
        # Hot comments can only be scraped from the first page
        if j == 1:
            for i in body['hotComments']:
                hotcommentinfo.append({
                    'content': i['content'],
                    'author': i['user']['nickname'],
                    'likedCount': i['likedCount'],
                })
        print('Page {} done...'.format(j))
        time.sleep(random.randint(1, 10))
    print(commentinfo)
    print('\n-----------------------------------------------------------\n')
    print(hotcommentinfo)
    return commentinfo, hotcommentinfo
```
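The per-page parsing inside `getMusicComments` can be isolated into a function that works on already-decoded JSON, which makes it easy to check against a sample. The field names (`comments`, `hotComments`, `content`, `user.nickname`, `likedCount`) are the ones the code above reads. The sample body and the helper names are hypothetical, and `.get(..., [])` is a defensive choice the original does not make.

```python
def pick_fields(raw_comments):
    # Keep only the three fields the scraper stores
    return [
        {'content': c['content'],
         'author': c['user']['nickname'],
         'likedCount': c['likedCount']}
        for c in raw_comments
    ]


def parse_comment_page(body, first_page):
    # body: decoded JSON of one comment-page response (hypothetical helper)
    comments = pick_fields(body.get('comments', []))
    # Hot comments are only collected from the first page
    hot = pick_fields(body.get('hotComments', [])) if first_page else []
    return comments, hot


# Hypothetical response body
sample_body = {
    'total': 1,
    'comments': [{'content': 'nice', 'user': {'nickname': 'u1'}, 'likedCount': 3}],
    'hotComments': [{'content': 'classic', 'user': {'nickname': 'u2'}, 'likedCount': 99}],
}
comments, hot = parse_comment_page(sample_body, first_page=True)
print(comments)
print(hot)
```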

   
