Target URL: https://y.qq.com/n/yqq/song/001qvvgF38HVc4.html#comment_box
QQ Music, Jay Chou's "Won't Cry" (说好不哭)
Open Charles and refresh the page. Copy the text of one comment and search for it in Charles, which quickly turns up the comment API:
https://c.y.qq.com/base/fcgi-bin/fcg_global_comment_h5.fcg?g_tk=160454710&loginUin=1808163167&hostUin=0&format=json&inCharset=utf8&outCharset=GB2312&notice=0&platform=yqq.json&needNewCode=0&cid=205360772&reqtype=2&biztype=1&topid=237773700&cmd=8&needmusiccrit=0&pagenum=0&pagesize=25&lasthotcommentid=&domain=qq.com&ct=24&cv=10101010
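Paging through the comments shows that only two parameters change: pagenum, the page index (starting at 0), and lasthotcommentid, the ID of the last comment on the previous page (empty for the first page).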
https://c.y.qq.com/base/fcgi-bin/fcg_global_comment_h5.fcg?g_tk=160454710&loginUin=1808163167&hostUin=0&format=json&inCharset=utf8&outCharset=GB2312&notice=0&platform=yqq.json&needNewCode=0&cid=205360772&reqtype=2&biztype=1&topid=237773700&cmd=8&needmusiccrit=0&pagenum=1&pagesize=25&lasthotcommentid=song_237773700_3559701714_1573875409&domain=qq.com&ct=24&cv=10101010
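To double-check which parameters really change between pages, a short sketch can diff the query strings of the two captured URLs with Python's standard urllib.parse. The helper names query_params and changed_params are made up for illustration.

```python
from urllib.parse import urlsplit, parse_qs

# The two comment-API URLs captured in Charles (first and second page).
URL_PAGE_1 = "https://c.y.qq.com/base/fcgi-bin/fcg_global_comment_h5.fcg?g_tk=160454710&loginUin=1808163167&hostUin=0&format=json&inCharset=utf8&outCharset=GB2312&notice=0&platform=yqq.json&needNewCode=0&cid=205360772&reqtype=2&biztype=1&topid=237773700&cmd=8&needmusiccrit=0&pagenum=0&pagesize=25&lasthotcommentid=&domain=qq.com&ct=24&cv=10101010"
URL_PAGE_2 = "https://c.y.qq.com/base/fcgi-bin/fcg_global_comment_h5.fcg?g_tk=160454710&loginUin=1808163167&hostUin=0&format=json&inCharset=utf8&outCharset=GB2312&notice=0&platform=yqq.json&needNewCode=0&cid=205360772&reqtype=2&biztype=1&topid=237773700&cmd=8&needmusiccrit=0&pagenum=1&pagesize=25&lasthotcommentid=song_237773700_3559701714_1573875409&domain=qq.com&ct=24&cv=10101010"


def query_params(url):
    """Return the query string of url as a dict of single values."""
    # keep_blank_values=True so an empty lasthotcommentid is kept as ''
    parsed = parse_qs(urlsplit(url).query, keep_blank_values=True)
    return {name: values[0] for name, values in parsed.items()}


def changed_params(url_a, url_b):
    """Return the sorted names of parameters whose values differ."""
    a, b = query_params(url_a), query_params(url_b)
    return sorted(name for name in a.keys() | b.keys() if a.get(name) != b.get(name))


print(changed_params(URL_PAGE_1, URL_PAGE_2))  # ['lasthotcommentid', 'pagenum']
```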
So we simply request the pages in a loop:
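import json
import requests

page = 0               # pagenum: 0-based page index
lasthotcommentid = ''  # empty for the first page
while True:
    url = 'https://c.y.qq.com/base/fcgi-bin/fcg_global_comment_h5.fcg?g_tk=160454710&loginUin=1808163167&hostUin=0&format=json&inCharset=utf8&outCharset=GB2312&notice=0&platform=yqq.json&needNewCode=0&cid=205360772&reqtype=2&biztype=1&topid=237773700&cmd=8&needmusiccrit=0&pagenum=%s&pagesize=25&lasthotcommentid=%s&domain=qq.com&ct=24&cv=10101010' % (page, lasthotcommentid)
    response = requests.get(url, verify=False)
    json_data = json.loads(response.text)
    commentsArr = json_data['comment']['commentlist']
    if page == 0:
        commenttotal = json_data['comment']['commenttotal']
        print('Total comments: %s' % commenttotal)
    # Stop when a page comes back empty (assumed end-of-list signal)
    if not commentsArr:
        break
    saveComments(commentsArr)  # defined below; define it before running this loop
    # The last comment's ID becomes lasthotcommentid for the next page
    # (assumes the ID is stored in the 'commentid' field)
    lasthotcommentid = commentsArr[-1]['commentid']
    page += 1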
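The loop above mixes URL building, HTTP requests and paging. Below is a minimal sketch that separates them, assuming that an empty commentlist marks the end of the comments and that each comment's ID is in a commentid field (neither is confirmed by the capture). BASE_PARAMS, page_params, iter_comment_pages and the max_pages safety cap are hypothetical names; fetch is passed in so the paging logic can be tried without the network.

```python
# Fixed query parameters copied from the captured URL; only pagenum and
# lasthotcommentid change from page to page.
BASE_PARAMS = {
    "g_tk": "160454710", "loginUin": "1808163167", "hostUin": "0",
    "format": "json", "inCharset": "utf8", "outCharset": "GB2312",
    "notice": "0", "platform": "yqq.json", "needNewCode": "0",
    "cid": "205360772", "reqtype": "2", "biztype": "1",
    "topid": "237773700", "cmd": "8", "needmusiccrit": "0",
    "pagesize": "25", "domain": "qq.com", "ct": "24", "cv": "10101010",
}


def page_params(pagenum, lasthotcommentid):
    """Build the query parameters for one page of comments."""
    params = dict(BASE_PARAMS)
    params["pagenum"] = str(pagenum)
    params["lasthotcommentid"] = lasthotcommentid
    return params


def iter_comment_pages(fetch, max_pages=1000):
    """Yield each non-empty commentlist until an empty page is returned.

    fetch(params) must return the decoded JSON response as a dict.
    max_pages is a hypothetical safety cap against endless paging.
    """
    lasthotcommentid = ""
    for pagenum in range(max_pages):
        data = fetch(page_params(pagenum, lasthotcommentid))
        comments = data["comment"]["commentlist"]
        if not comments:  # assumed end-of-list signal
            return
        yield comments
        # Assumption: the comment ID is stored in the "commentid" field.
        lasthotcommentid = comments[-1]["commentid"]
```

For real use, fetch could be `lambda p: json.loads(requests.get('https://c.y.qq.com/base/fcgi-bin/fcg_global_comment_h5.fcg', params=p, verify=False).text)`. Passing a params dict also lets requests URL-encode lasthotcommentid instead of pasting it into the string by hand.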
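Each comment has the structure shown in the figure. Take the nickname (nick) and the content (rootcommentcontent), strip the emoji tags, and save them: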
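import re

def saveComments(commentsArr):
    # Emoji are embedded as [em]...[/em] tags; match each tag non-greedily
    emoji_pattern = re.compile(r'\[em\].*?\[/em\]', re.S)
    # Append "nickname----content" lines; the file name is an assumption
    with open('comments.txt', 'a', encoding='utf-8') as f:
        for comment in commentsArr:
            nick = comment['nick']
            rootcommentcontent = comment['rootcommentcontent']
            c = re.sub(emoji_pattern, '', rootcommentcontent)
            f.write(nick + '----' + c + '\n')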
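The emoji pattern is easy to get wrong. In the form r'\[em].*[/em].', the trailing [/em] is a character class (one of '/', 'e' or 'm'), and the greedy .* runs to the last possible match, so text between two emoji tags is deleted too. The sketch below compares that pattern with a non-greedy one on a made-up sample; the [em]e400846[/em] tag format is an assumption based on the pattern's intent.

```python
import re

# Non-greedy: a literal "[em]", as few characters as possible, then "[/em]".
EMOJI_TAG = re.compile(r"\[em\].*?\[/em\]")
# Greedy pattern with "[/em]" written as a character class, for comparison.
GREEDY = re.compile(r"\[em].*[/em].", re.S)


def strip_emoji(text):
    """Remove every [em]...[/em] tag and keep the surrounding text."""
    return EMOJI_TAG.sub("", text)


# Made-up comment containing two emoji tags.
sample = "好听[em]e400846[/em]哭了[em]e400822[/em]!"
print(strip_emoji(sample))       # 好听哭了!
print(GREEDY.sub("", sample))    # 好听!  ("哭了" between the tags is lost)
```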
Result:
Full code: https://github.com/Liangjianghao/everyDay_spider.git (qqMusic_comments)