西部世界 1080P高清下载和自动提醒后续新出的#

1、主要思路是，通过高清源头的网站提供的资源，爬取后，通过迅雷实现自动下载，
然后后续新出的，比如下周1更新后，脚本会自动捕捉后发邮件通知，并自动下载。

2、代码：

# -*- coding: utf-8 -*-
# python 3.5.2
# 测试系统，Win10
# Author:Van
# 实现《西部世界》有更新后自动下载，以及邮件通知
# V1.01
# 修改@href提取方法，因源头网站变更
# 请把对应的帐号密码修改成自己的


# from selenium import webdriver
import requests
from lxml import etree
import time
import os
from win32com.client import Dispatch
import smtplib
from email.mime.text import MIMEText
from email.header import Header
import copy

# hints
print('请确保电脑安装了迅雷')
print('如果你用的是破解版的迅雷，请先开启再运行程序')
print()
# requests
url = 'http://www.btbtdy.com/btdy/dy7280.html'
html = requests.get(url).content.decode('utf-8')

# lxml
selector = etree.HTML(html)
real_link = []

# to be easy, try 'starts-with' , very useful in this case :)
HDTV = selector.xpath('//a[starts-with(@title, "HDTV-1080P")]/text()')
for each in HDTV:
    print(each)


# the site modified the magnet link position with adding the span
# we should use: following-sibling function to catch it :)
href = selector.xpath('//a[starts-with(@title, "HDTV-1080P")]/following-sibling::span/a/@href')
print()
print('目前有 %d 集西部世界' %len(href))
print()

for each in href:
    # split to get the right magnet link
    each = 'magnet' + each.split('magnet')[-1]
    # print(each)
    real_link.append(each)

print('他们的磁链接是 :\n', real_link)
# define a temp_link in deepcopy to compare for new series
temp_link = copy.deepcopy(real_link)
print('temp_link is :', temp_link)




def addTasktoXunlei(down_url,course_infos):
    flag = False
    o = Dispatch("ThunderAgent.Agent.1")
    if down_url:
        course_path = os.getcwd()
        try:
            #AddTask("下载地址", "另存文件名", "保存目录","任务注释","引用地址","开始模式", "只从原始地址下载","从原始地址下载线程数")
            o.AddTask(down_url, '', course_path, "", "", -1, 0, 5)
            o.CommitTasks()
            flag = True
        except Exception:

            print(Exception.message)
            print(" AddTask is fail!")
    return flag

def new_href():
    # to judge if there is a new series of WestWorld
    time.sleep(2)
    if len(real_link) > len(temp_link):
        print('西部世界1080P有更新!')
        print('现在一共有 %d 集了。' %len(real_link))
        return True
    else:
        return False

def send_email(htm):
    # send email to notice new WestWorld is coming
    sender = 'xxxxxxxx@163.com'
    receiver = 'xxxxxxxx@qq.com,xxxxxxxx@163.com'
    subject = '西部世界 1080P有更新！'
    smtpserver = 'smtp.163.com'
    username = 'xxxxxxxx@163.com'
    password = 'xxxxxxxx'
    msg = MIMEText(htm, 'html', 'utf-8')
    msg['Subject'] = Header(subject, 'UTF-8')
    msg['From'] = sender
    msg['To'] = ','.join(receiver)
    smtp = smtplib.SMTP()
    smtp.connect(smtpserver)
    smtp.login(username, password)
    smtp.sendmail(sender, receiver, msg.as_string())
    smtp.quit()

def new_download():
    # only download the new WestWorld series
    if len(real_link) > len(temp_link):
        # 2个地址数据的差集
        new_link = list(set(real_link).difference(set(temp_link)))
        for i in new_link:
            addTasktoXunlei(i, course_infos=None)



if __name__ == '__main__':
    # download the exiting series of WestWorld
    # send_email('最新更新磁链接：'+ str(real_link))
    for i in real_link:
        addTasktoXunlei(i, course_infos=None)

    # to get the later WestWorld for each hour
    while 1:
        if new_href():
            send_email('所有的下载地址（磁链接）：'+ str(real_link))
            new_download()
            time.sleep(15)
            # wait for an hour
            temp_link = real_link
            print(temp_link)
            print('神剧很好看吧，亲，耐心等下一集！~！')

3、代码分析，其中用到了deepcopy，这个功能很有用，并配合了2个数组的差集，使得可以规避定时器，而让脚本直接比较temp_link的内容，而扑捉到网站有新的更新了。另外，在地址识别的时候，一开始用.xpath 没显示内容，有点奇怪，后来根据特性，使用了strats_with识别了内容。另外，原始的邮件发送函数，是一个接收人，如果要多发，则receiver的格式为list，并修改 msg['To'] = ','.join(receiver)

4、邮件的作用是可以利用微信绑定来推送，相对短信，更觉方便。

5、感谢：
@陌提供了163发送email的代码
@何方提供了高清网站源
@其他人，交流了细节

6、可改进点：
邮件的地址内容显示的是一个列表，有待改进。

7、github对应仓库:

https://github.com/vansnowpea/WestWorld-auto-download-email-xunlei-

最后编辑于：2017.12.04 05:31:54

西部世界 1080P高清下载和自动提醒后续新出的【Python】

西部世界 1080P高清下载和自动提醒后续新出的#

推荐阅读更多精彩内容