MetaGPT Agent Development Primer 3: Subscription Agent (OSS) in Practice

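Task for this lesson
By building an OSS (Open Source Software) subscription agent by hand, we will see how MetaGPT can handle some of the problems that come up in everyday work.

The main tasks are:

Implement two Actions for OSS:
- Action 1: crawl the Github Trending page and collect each project's name, URL, and description
- Action 2: independently crawl the Huggingface Papers page: first collect the link to each Paper (the href attribute of the title element), then follow the link to the Paper's detail page (for example: https://huggingface.co/papers/2312.03818) and extract the Paper's title and abstract
OSS automatically generates an outline for the summary, splits the content into chunks by second-level heading, summarizes each chunk, and assembles the results into a news digest;
OSS sends this digest to the notification channels on a schedule (try implementing email delivery)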
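
The "split by second-level heading" step in the task list can be sketched without any MetaGPT dependency. The helper below is only an illustration under the assumption that the digest is Markdown and that second-level headings start with `## `; the name `split_by_h2` is hypothetical and not part of MetaGPT.

```python
def split_by_h2(markdown: str) -> list[tuple[str, str]]:
    """Split a Markdown document into (heading, body) chunks at each '## ' line."""
    chunks: list[tuple[str, str]] = []
    heading, body = "", []
    for line in markdown.splitlines():
        if line.startswith("## "):
            # Close the previous chunk before starting a new one.
            if heading or any(item.strip() for item in body):
                chunks.append((heading, "\n".join(body).strip()))
            heading, body = line[3:].strip(), []
        else:
            body.append(line)
    # Flush the last chunk.
    if heading or any(item.strip() for item in body):
        chunks.append((heading, "\n".join(body).strip()))
    return chunks


doc = "# Title\n\nintro\n\n## A\ntext a\n\n## B\ntext b\n"
for heading, body in split_by_h2(doc):
    print(repr(heading), "->", repr(body))
```

Text before the first `## ` heading is returned as a chunk with an empty heading, so nothing is lost; each chunk could then be summarized separately.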

Steps for building a subscription agent with MetaGPT

image.png

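As the figure above shows, building a subscription agent with MetaGPT takes roughly the following steps:

Implement the OSS Agent (based on Role), along with the crawling Action and analysis Action it needs
Implement the trigger (how the Agent is prompted to run its Actions, such as crawling and analysis)
Implement the callback (what happens once the Agent finishes, such as pushing to Discord or WeChat, or sending an email)
Finally, SubscriptionRunner ties the OSS Agent, trigger, and callback together
You can also skip SubscriptionRunner and write your own loop around role.run(). SubscriptionRunner, however, is a reusable pattern.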
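
To make the relationship between the three pieces concrete, here is a minimal sketch in plain asyncio. It is not how SubscriptionRunner is implemented; `run_subscription`, `handle`, and the demo functions are hypothetical stand-ins that only show the data flow: the trigger yields a request, the agent turns it into a report, and the callback delivers the report.

```python
import asyncio
from typing import AsyncIterator, Awaitable, Callable


async def run_subscription(
    handle: Callable[[str], Awaitable[str]],
    trigger: AsyncIterator[str],
    callback: Callable[[str], Awaitable[None]],
) -> None:
    """For every item the trigger yields, run the handler and pass its result to the callback."""
    async for request in trigger:
        result = await handle(request)
        await callback(result)


async def demo() -> list[str]:
    delivered: list[str] = []

    async def trigger() -> AsyncIterator[str]:
        # A real trigger would sleep between yields; this one fires twice and stops.
        for url in ("https://github.com/trending", "https://huggingface.co/papers"):
            yield url

    async def handle(url: str) -> str:
        # Stand-in for role.run(): crawl and analyse.
        return f"report for {url}"

    async def callback(report: str) -> None:
        # Stand-in for pushing to Discord or sending an email.
        delivered.append(report)

    await run_subscription(handle, trigger(), callback)
    return delivered


print(asyncio.run(demo()))
```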

Implementation

Code

The code below (oss.py) crawls and summarizes GitHub Trending, posts the result to Discord, and also sends it by email


import asyncio
import os

import fire
import discord
import aiohttp
from bs4 import BeautifulSoup
from typing import Any

from metagpt.actions import Action
from metagpt.config import CONFIG
from metagpt.environment import Environment
from metagpt.logs import logger
from metagpt.roles import Role
from metagpt.roles.role import RoleReactMode
from metagpt.schema import Message
from metagpt.subscription import SubscriptionRunner


class CrawlOSSTrending(Action):
    async def run(self, url: str = "https://github.com/trending"):
        # return "https://github.com/trending"
        async with aiohttp.ClientSession() as client:
            async with client.get(url, proxy=CONFIG.global_proxy) as response:
                response.raise_for_status()
                html = await response.text()

        soup = BeautifulSoup(html, 'html.parser')

        repositories = []

        for article in soup.select('article.Box-row'):
            repo_info = {'name': article.select_one('h2 a').text.strip().replace("\n", "").replace(" ", ""),
                         'url': "https://github.com" + article.select_one('h2 a')['href'].strip()}

            # Description
            description_element = article.select_one('p')
            repo_info['description'] = description_element.text.strip() if description_element else None

            # Language
            language_element = article.select_one('span[itemprop="programmingLanguage"]')
            repo_info['language'] = language_element.text.strip() if language_element else None

            # Stars and Forks
            stars_element = article.select('a.Link--muted')[0]
            forks_element = article.select('a.Link--muted')[1]
            repo_info['stars'] = stars_element.text.strip()
            repo_info['forks'] = forks_element.text.strip()

            # Today's Stars
            today_stars_element = article.select_one('span.d-inline-block.float-sm-right')
            repo_info['today_stars'] = today_stars_element.text.strip() if today_stars_element else None

            repositories.append(repo_info)

        return repositories


class CrawlOSSHuggingfacePapers(Action):
    # Placeholder; the real crawl is added in the Huggingface Papers section below.
    async def run(self, msg: str) -> str:
        logger.info(f"{msg}")
        return msg



TRENDING_ANALYSIS_PROMPT = """# Requirements
You are a GitHub Trending Analyst, aiming to provide users with insightful and personalized recommendations based on the latest
GitHub Trends. Based on the context, fill in the following missing information, generate engaging and informative titles, 
ensuring users discover repositories aligned with their interests.

# The title about Today's GitHub Trending
## Today's Trends: Uncover the Hottest GitHub Projects Today! Explore the trending programming languages and discover key domains capturing developers' attention. From ** to **, witness the top projects like never before.
## The Trends Categories: Dive into Today's GitHub Trending Domains! Explore featured projects in domains such as ** and **. Get a quick overview of each project, including programming languages, stars, and more.
## Highlights of the List: Spotlight noteworthy projects on GitHub Trending, including new tools, innovative projects, and rapidly gaining popularity, focusing on delivering distinctive and attention-grabbing content for users.
---
# Format Example

```
# [Title]

## Today's Trends
Today, ** and ** continue to dominate as the most popular programming languages. Key areas of interest include **, ** and **.
The top popular projects are Project1 and Project2.

## The Trends Categories
1. Generative AI
    - [Project1](https://github/xx/project1): [detail of the project, such as star total and today, language, ...]
    - [Project2](https://github/xx/project2): ...
...

## Highlights of the List
1. [Project1](https://github/xx/project1): [provide specific reasons why this project is recommended].
...
```

---
# Github Trending
{trending}
"""


class AnalysisOSSTrending(Action):

    async def run(
            self,
            trending: Any
    ):
        return await self._aask(TRENDING_ANALYSIS_PROMPT.format(trending=trending))


class OssWatcher(Role):
    name: str = "XiaoGang"
    profile: str = "OssWatcher"
    goal: str = "Generate an insightful GitHub Trending and Huggingface papers analysis report."
    constraints: str = "Only analyze based on the provided GitHub Trending and Huggingface papers data."

    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        self._init_actions([CrawlOSSTrending, AnalysisOSSTrending])
        self._set_react_mode(RoleReactMode.BY_ORDER.value)

    async def _act(self) -> Message:
        logger.info(f"{self._setting}: to do {self.rc.todo}")
        todo = self.rc.todo
        msg = self.get_memories(k=1)[0]  # find the most recent messages
        new_msg = await todo.run(msg.content)
        msg = Message(content=str(new_msg), role=self.profile, cause_by=type(todo))
        self.rc.memory.add(msg)  # add the new message to memory
        return msg

async def discord_callback(msg: Message):
    intents = discord.Intents.default()
    intents.message_content = True
    client = discord.Client(intents=intents, proxy=CONFIG.global_proxy)
    token = os.environ["DISCORD_TOKEN"]
    channel_id = int(os.environ["DISCORD_CHANNEL_ID"])
    async with client:
        await client.login(token)
        channel = await client.fetch_channel(channel_id)
        lines = []
        for i in msg.content.splitlines():
            if i.startswith(("# ", "## ", "### ")):
                if lines:
                    await channel.send("\n".join(lines))
                    lines = []
            lines.append(i)

        if lines:
            await channel.send("\n".join(lines))


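async def mail_callback(msg: Message):
    # AsyncMailer is the mail helper class shown in the email section below;
    # it must be defined in, or imported into, this file.
    async_mailer = AsyncMailer()
    await async_mailer.send(os.environ["MAIL_SENDER"], os.environ["MAIL_RECEIVER"], 'GitHub Trending Analysis', msg.content)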


async def oss_callback(discord: bool = True, mail: bool = True):
    callbacks = []
    if discord:
        callbacks.append(discord_callback)

    if mail:
        callbacks.append(mail_callback)
    if not callbacks:
        async def _print(msg: Message):
            print(msg.content)

        callbacks.append(_print)

    async def callback(msg: Message):
        await asyncio.gather(*[cb(msg) for cb in callbacks])

    return callback


async def oss_trigger():
    while True:
        yield Message(content="https://github.com/trending")
        await asyncio.sleep(3600 * 24)


async def main(discord: bool = True, mail: bool = True):
    runner = SubscriptionRunner()
    callback = await oss_callback(discord, mail)
    runner.model_rebuild()
    await runner.subscribe(OssWatcher(), oss_trigger(), callback)
    await runner.run()


if __name__ == "__main__":
    fire.Fire(main)
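
The `oss_trigger` above fires once at start-up and then every 24 hours. If the digest should go out at a fixed time of day instead, one option is to compute the delay until the next occurrence of that time. The helper below is a sketch using only the standard library; `seconds_until` is a hypothetical name, and it works with naive local datetimes.

```python
from datetime import datetime, time, timedelta


def seconds_until(target: time, now: datetime) -> float:
    """Seconds from `now` until the next occurrence of `target` (later today or tomorrow)."""
    candidate = datetime.combine(now.date(), target)
    if candidate <= now:
        # The target time has already passed today, so use tomorrow's.
        candidate += timedelta(days=1)
    return (candidate - now).total_seconds()


print(seconds_until(time(9, 0), datetime(2024, 1, 18, 8, 30)))  # 1800.0
print(seconds_until(time(9, 0), datetime(2024, 1, 18, 9, 30)))  # 84600.0
```

Inside a trigger, this could be used as `await asyncio.sleep(seconds_until(time(9, 0), datetime.now()))` before each `yield`.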

Log

2024-01-18 00:39:45.138 | INFO     | metagpt.const:get_metagpt_package_root:32 - Package root set to D:\workspace\sourcecode\MetaGPT
2024-01-18 00:39:45.281 | INFO     | metagpt.config:get_default_llm_provider_enum:126 - API: LLMProviderEnum.ZHIPUAI
2024-01-18 00:39:48.908 | INFO     | metagpt.config:get_default_llm_provider_enum:126 - API: LLMProviderEnum.ZHIPUAI
2024-01-18 00:39:48.914 | INFO     | __main__:_act:123 - XiaoGang(OssWatcher): to do CrawlOSSTrending
2024-01-18 00:39:50.138 | INFO     | __main__:_act:123 - XiaoGang(OssWatcher): to do AnalysisOSSTrending
 Here's a title for today's GitHub Trending based on the provided data:

**Today's Trends: Explore the Hottest GitHub Projects in Programming Languages and Domains**

---

## Today's Trends

Today, JavaScript and Python continue to dominate as the most popular programming languages. Key areas of interest include generative AI, personal finance, and scalability. Discover the top popular projects like never before, from **TencentARC/PhotoMaker** to **linexjlin/GPTs**.

## The Trends Categories

1. Generative AI
    * [TencentARC/PhotoMaker](https://github.com/TencentARC/PhotoMaker): A powerful photo manipulation tool using AI.
    * [linexjlin/GPTs](https://github.com/linexjlin/GPTs): A collection of leaked GPT-3 prompts.
2. Personal Finance
    * [maybe-finance/maybe](<https://github.com/maybe-finance/maybe>: A comprehensive personal finance and wealth management app.
3. Scalability
    * [binhnguyennus/awesome-scalability](<https://github.com/binhnguyennus/awesome-scalability>: A curated list of patterns for building scalable, reliable, and performant large-scale systems.

## Highlights of the List

1. **TencentARC/PhotoMaker**: This project offers a powerful photo manipulation tool that uses AI to create stunning images. With over 2,000 stars and 150 forks, it's a must-watch repository for AI-driven image processing.
2. **maybe-finance/maybe**: This comprehensive personal finance and wealth management app has earned over 10,000 stars and 741 forks. It's a great resource for anyone looking to manage their finances effectively.
3. **linexjlin/GPTs**: This repository contains a collection of leaked GPT-3 prompts, earning it 22,916 stars and 3,291 forks. It's an interesting resource for those interested in exploring AI-generated text.

Check out these projects and more in the full list above! Stay tuned for more insightful and personalized recommendations based on the latest GitHub Trends.
2024-01-18 00:40:14.373 | INFO     | metagpt.utils.cost_manager:update_cost:48 - Total running cost: $0.000 | Max budget: $10.000 | Current cost: $0.000, prompt_tokens: 2858, completion_tokens: 526

Result in Discord

image.png
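
`discord_callback` sends one message per heading block. Discord limits a regular message to 2000 characters, so an unusually long block may be rejected. The sketch below shows one way to split a block further before sending; `split_for_discord` is a hypothetical helper, not part of discord.py.

```python
DISCORD_LIMIT = 2000  # per-message character limit for regular Discord messages


def split_for_discord(text: str, limit: int = DISCORD_LIMIT) -> list[str]:
    """Split text into pieces of at most `limit` characters, preferring line boundaries."""
    pieces: list[str] = []
    current = ""
    for line in text.splitlines():
        # A single over-long line is cut hard into limit-sized slices.
        while len(line) > limit:
            if current:
                pieces.append(current)
                current = ""
            pieces.append(line[:limit])
            line = line[limit:]
        candidate = f"{current}\n{line}" if current else line
        if len(candidate) > limit:
            # Adding this line would overflow, so start a new piece with it.
            pieces.append(current)
            current = line
        else:
            current = candidate
    if current:
        pieces.append(current)
    return pieces


print(split_for_discord("aaa\nbbb\nccc", limit=7))
```

In the callback, each `channel.send("\n".join(lines))` call could then loop over `split_for_discord("\n".join(lines))`.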

Sending to email

A 163 mailbox is used here; its SMTP service has to be enabled

image.png

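MAIL_PASSWORD is not the mailbox login password; it is the credential generated when the SMTP service is enabled. Set it as the MAIL_PASSWORD environment variable.
MAIL_SENDER and MAIL_RECEIVER in the code are the sender and recipient addresses, and are also set through environment variables.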
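
Because the code reads these values with `os.environ[...]`, a missing variable only surfaces as a `KeyError` when the first mail is sent. A small start-up check can report the problem earlier. This is a sketch; `missing_env` and `REQUIRED_ENV` are hypothetical names, and DISCORD_TOKEN and DISCORD_CHANNEL_ID could be added to the tuple in the same way.

```python
import os
from typing import Mapping

REQUIRED_ENV = ("MAIL_SENDER", "MAIL_RECEIVER", "MAIL_PASSWORD")


def missing_env(names: tuple[str, ...] = REQUIRED_ENV, env: Mapping[str, str] = os.environ) -> list[str]:
    """Return the names that are unset or empty in `env`."""
    return [name for name in names if not env.get(name)]


missing = missing_env()
if missing:
    print("Missing environment variables: " + ", ".join(missing))
```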

The mail-sending class:

import asyncio
import os
from email.mime.text import MIMEText
from email.header import Header
import aiosmtplib
from email.utils import formataddr  # standard-library helper for "Name <address>" headers

from metagpt.logs import logger


class AsyncMailer:
    def __init__(self, smtp_server="smtp.163.com", smtp_port=25):
        self.smtp_server = smtp_server
        self.smtp_port = smtp_port
        self.password = os.environ["MAIL_PASSWORD"]

    async def send(self, sender, receiver, title, content) -> None:
        # MIMEText already sets Content-Type, Content-Transfer-Encoding and MIME-Version,
        # so they are not assigned again here (assigning them would add duplicate headers).
        message = MIMEText(content, 'plain', 'utf-8')
        message['From'] = formataddr((sender.split('@')[0], sender))  # sender display name
        message['To'] = formataddr((receiver.split('@')[0], receiver))  # recipient display name
        # message['Message-ID'] = Header('123456789', 'utf-8')  # mail id
        message['X-Priority'] = Header('3', 'utf-8')  # mail priority
        message['X-Mailer'] = Header('Aiosmtplib', 'utf-8')  # mail client
        message['X-AntiAbuse'] = Header('1', 'utf-8')  # anti-abuse header
        message['Subject'] = Header(title, 'utf-8')  # mail subject

        # Connect to the mail server asynchronously and log in
        smtp_connection = aiosmtplib.SMTP(hostname=self.smtp_server, port=self.smtp_port, local_hostname='localhost')
        await smtp_connection.connect()
        await smtp_connection.login(sender, self.password)

        # Send the mail asynchronously
        await smtp_connection.sendmail(sender, receiver, message.as_string())

        # Close the connection
        await smtp_connection.quit()
        logger.info("Mail sent successfully!")




async def main():
    async_mailer = AsyncMailer()
    await async_mailer.send(os.environ["MAIL_SENDER"], os.environ["MAIL_RECEIVER"], 'Mail Test', 'Hello World!')

if __name__ == '__main__':
    # Run the example
    asyncio.run(main())

Adding the mail-sending callback


async def mail_callback(msg: Message):
    async_mailer = AsyncMailer()
    await async_mailer.send(os.environ["MAIL_SENDER"], os.environ["MAIL_RECEIVER"], 'GitHub Trending Analysis', msg.content)

async def oss_callback(discord: bool = True, mail: bool = True):
    callbacks = []
    if discord:
        callbacks.append(discord_callback)

    if mail:
        callbacks.append(mail_callback)
    if not callbacks:
        async def _print(msg: Message):
            print(msg.content)

        callbacks.append(_print)

    async def callback(msg: Message):
        await asyncio.gather(*[cb(msg) for cb in callbacks])

    return callback
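
With a plain `asyncio.gather`, the first exception raised by any channel propagates out of the combined callback, so a Discord outage and an SMTP failure cannot be told apart or reported individually. One possible variation, sketched below with hypothetical names (`combine_callbacks`, `ok_channel`, `broken_channel`), passes `return_exceptions=True` and collects the failures instead. The callbacks here take a plain string rather than a Message to keep the example self-contained.

```python
import asyncio
from typing import Awaitable, Callable

Callback = Callable[[str], Awaitable[None]]


def combine_callbacks(callbacks: list[Callback]) -> Callable[[str], Awaitable[list[str]]]:
    """Build one callback that runs all channels and reports which ones failed."""

    async def callback(content: str) -> list[str]:
        results = await asyncio.gather(*(cb(content) for cb in callbacks), return_exceptions=True)
        failed = []
        for cb, result in zip(callbacks, results):
            if isinstance(result, Exception):
                # Record the failure instead of raising, so every channel's outcome is visible.
                failed.append(f"{cb.__name__}: {result}")
        return failed

    return callback


async def demo() -> tuple[list[str], list[str]]:
    sent: list[str] = []

    async def ok_channel(content: str) -> None:
        sent.append(content)

    async def broken_channel(content: str) -> None:
        raise ConnectionError("smtp unreachable")

    failed = await combine_callbacks([broken_channel, ok_channel])("report")
    return sent, failed


print(asyncio.run(demo()))
```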

Result in the mailbox

image.png

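Crawling and summarizing the Huggingface Papers page

Next we crawl the Huggingface Papers page. This is Hugging Face's papers page, which shares research papers, articles, and resources on NLP and related fields, with details on models, algorithms, experiments, and more. Here we collect the link to each Paper from Huggingface Papers, follow the link to the Paper's detail page, and extract the title and abstract. We then automatically generate an outline for the summary, summarize each section, and assemble a news digest.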

Crawling the Huggingface Papers page

Open the developer tools with F12 or via right-click | Inspect

image.png

Then locate the following part:

image.png

First, use bs4 to get the link to each paper

def hg_article_urls(html_soup):
    _urls = []
    for article in html_soup.select('article.flex.flex-col.overflow-hidden.rounded-xl.border'):
        url = article.select_one('h3 a')['href']
        _urls.append('https://huggingface.co' + url)
    return _urls

Note that the link must be located with <h3><a href> rather than a bare <a href>; that is, it should be written as above:

url = article.select_one('h3 a')['href']

This yields URLs such as https://huggingface.co/papers/2401.10020. Follow each URL to the paper's detail page to get the title and abstract.

Take https://huggingface.co/papers/2401.10020 as an example:

image.png

The code below reads the data-props attribute shown in the figure above. Because data-props holds a JSON string, json.loads is used to parse it into a Python object.

        info = soup.select_one('section.pt-8.border-gray-100')
        data_props = json.loads(info.select_one('div.SVELTE_HYDRATER.contents')['data-props'])

image.png

As the figure above shows, data_props gives access to the paper's id, upvote count, publication time, title, and abstract.
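
Page markup can change, so it may be worth reading these fields defensively rather than indexing them directly. The sketch below uses only the standard library; `parse_paper_props` is a hypothetical helper, and the sample payload is made up (it merely reuses the key names listed above), not real page data.

```python
import json


def parse_paper_props(data_props: str) -> dict:
    """Parse a data-props JSON string and pick the paper fields, tolerating missing keys."""
    paper = json.loads(data_props).get("paper", {})
    fields = ("id", "title", "upvotes", "publishedAt", "summary")
    return {name: paper.get(name) for name in fields}


# Made-up payload for illustration only.
sample = json.dumps({"paper": {"id": "0000.00000", "title": "Example", "upvotes": 3}})
print(parse_paper_props(sample))
```

Fields that are absent come back as `None` instead of raising `KeyError`, which lets the caller decide whether to skip the paper.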
The parsing helpers from this section are saved as utility code in hg_parse.py; the complete code is:

import asyncio
import json

import aiohttp
from bs4 import BeautifulSoup

from metagpt.config import CONFIG
from metagpt.logs import logger


def get_local_html_soup(url, features='html.parser'):
    with open(url, encoding="utf-8") as f:
        html = f.read()
    soup = BeautifulSoup(html, features)
    return soup


async def get_html_soup(url: str):
    async with aiohttp.ClientSession() as client:
        async with client.get(url, proxy=CONFIG.global_proxy) as response:
            response.raise_for_status()
            html = await response.text()

    soup = BeautifulSoup(html, 'html.parser')
    return url, soup


def hg_article_urls(html_soup):
    _urls = []
    for article in html_soup.select('article.flex.flex-col.overflow-hidden.rounded-xl.border'):
        url = article.select_one('h3 a')['href']
        _urls.append('https://huggingface.co' + url)
    return _urls


def hg_article_infos(_url, html_soup):
    logger.info(f'Parsing {_url}')
    _article = {}
    info = html_soup.select_one('section.pt-8.border-gray-100')
    data_props = json.loads(info.select_one('div.SVELTE_HYDRATER.contents')['data-props'])
    paper = data_props['paper']
    _article['url'] = _url
    _article['id'] = paper['id']
    _article['title'] = paper['title']
    _article['upvotes'] = paper['upvotes']
    _article['publishedAt'] = paper['publishedAt']
    _article['summary'] = paper['summary']
    return _article


async def get_hg_articles():
    _, _soup = await get_html_soup("https://huggingface.co/papers")
    hg_urls = hg_article_urls(_soup)
    _soups = await asyncio.gather(*[get_html_soup(url) for url in hg_urls])
    hg_articles = map(lambda param: hg_article_infos(param[0], param[1]), _soups)

    return list(hg_articles)

if __name__ == "__main__":
    import asyncio
    for article in asyncio.run(get_hg_articles()):
        print(article)
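
`get_hg_articles` starts one request per paper at the same time through `asyncio.gather`. If the site throttles bursts, the number of concurrent requests can be capped with a semaphore. The sketch below is a generic pattern, not part of hg_parse.py; `gather_limited` is a hypothetical helper and `fake_fetch` stands in for `get_html_soup`.

```python
import asyncio
from typing import Awaitable, Callable, Iterable, TypeVar

T = TypeVar("T")
R = TypeVar("R")


async def gather_limited(func: Callable[[T], Awaitable[R]], items: Iterable[T], limit: int = 5) -> list[R]:
    """Run func over items concurrently with at most `limit` calls in flight; keeps input order."""
    semaphore = asyncio.Semaphore(limit)

    async def guarded(item: T) -> R:
        async with semaphore:
            return await func(item)

    return await asyncio.gather(*(guarded(item) for item in items))


async def demo() -> tuple[list[int], int]:
    in_flight = 0
    peak = 0

    async def fake_fetch(n: int) -> int:
        # Stand-in for get_html_soup(url); tracks how many calls overlap.
        nonlocal in_flight, peak
        in_flight += 1
        peak = max(peak, in_flight)
        await asyncio.sleep(0.01)
        in_flight -= 1
        return n * 2

    results = await gather_limited(fake_fetch, range(10), limit=3)
    return results, peak


print(asyncio.run(demo()))
```

In hg_parse.py the gather line could then become `_soups = await gather_limited(get_html_soup, hg_urls)`.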

Add the Action that crawls the Huggingface Papers page to the earlier oss.py:

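from hg_parse import get_hg_articles  # helpers from hg_parse.py above

class CrawlOSSHuggingfacePapers(Action):
    async def run(self, msg: str) -> list:
        # msg is the trigger content passed in by OssWatcher._act; the crawl needs no input
        logger.info(f"{msg}")
        return await get_hg_articles()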

Summarizing the Huggingface Papers page

The summary Action is mostly a matter of writing the Prompt. AnalysisOSSHuggingfacePapers is modeled on the GitHub Trending Prompt:

HG_PAPERS_ANALYSIS_PROMPT = """# Requirements
You are a Huggingface Papers Analyst, aiming to provide users with insightful and personalized consultation based on the latest
Huggingface Papers abstract. Based on the context, fill in the following missing information, generate engaging and informative titles, 
ensuring users discover articles aligned with their interests.

# The title about Today's Huggingface Papers Consultation
## Today's Huggingface Papers Consultation: Uncover the Hottest Huggingface Papers Today! Explore the trending research areas and discover key domains capturing researchers' attention. From ** to **, witness the top papers like never before.
## The Papers Categories: Dive into Today's Huggingface Papers Domains! Explore featured papers in domains such as ** and **. Get a quick overview of each paper, including upvotes, and more.
## Highlights of the List: Spotlight noteworthy papers on Huggingface Papers, including new tools, new methods, innovative papers, and rapidly gaining popularity, focusing on delivering distinctive and attention-grabbing content for users.
---
# Format Example

```
# [Title]

## Today's Huggingface Papers Consultation
Today, ** and ** continue to dominate as the most popular research areas. Key areas of interest include **, ** and **.
The top popular papers are Paper1 and Paper2.

## The Papers Categories
1. Large Language Model
    - [Paper1](https://huggingface.co/papers/paper1): [Abstract of the paper, such as upvotes total ...]
    - [Paper2](https://huggingface.co/papers/paper2): ...
...

## Highlights of the List
1. [Paper1](https://huggingface.co/papers/paper1): [provide specific reasons why this paper is recommended].
...
```

---
# Huggingface Papers
{papers}
"""


class AnalysisOSSHuggingfacePapers(Action):
    async def run(
            self,
            papers: Any
    ):
        return await self._aask(HG_PAPERS_ANALYSIS_PROMPT.format(papers=papers))
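
The Action above formats the prompt with whatever `papers` value it receives, which in this tutorial is the string form of a list of dicts. Long abstracts make that string large, so one option is to render the list compactly and truncate each abstract first. This is a sketch under that assumption; `format_papers` and the 600-character default are hypothetical choices, and the dict keys follow hg_article_infos.

```python
def format_papers(papers: list[dict], max_summary_chars: int = 600) -> str:
    """Render paper dicts as a compact Markdown list, truncating long abstracts."""
    lines = []
    for paper in papers:
        # Collapse newlines and repeated spaces inside the abstract.
        summary = " ".join((paper.get("summary") or "").split())
        if len(summary) > max_summary_chars:
            summary = summary[:max_summary_chars].rstrip() + "..."
        lines.append(
            f"- [{paper.get('title')}]({paper.get('url')}) "
            f"(upvotes: {paper.get('upvotes')}): {summary}"
        )
    return "\n".join(lines)


# Made-up entry for illustration only.
papers = [{
    "title": "Example",
    "url": "https://huggingface.co/papers/0000.00000",
    "upvotes": 3,
    "summary": "A  long\nabstract text",
}]
print(format_papers(papers, max_summary_chars=6))
```

The result could be passed as `HG_PAPERS_ANALYSIS_PROMPT.format(papers=format_papers(papers))` when the Action receives the list itself rather than its string form.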

The final result of the Huggingface Papers digest delivered to Discord and email looks like this:

image.png

Last edited: 2024.01.23 13:01:17
