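Introduction to Large Models and LLMs
Large parameter count: hundreds of millions of parameters or more.
InternLM is a lightweight training framework; the pretrained models InternLM-7B and InternLM-20B have been released.
InternLM-7B
In this section we use an A100 (1/4) machine in InternStudio and the InternLM-Chat-7B model to deploy an intelligent chat demo.
Environment setup
Select the NVIDIA CUDA 11.7 clean image, which is based on Ubuntu and has Conda preinstalled.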
# Create the conda virtual environment
/root/share/install_conda_env_internlm_base.sh internlm-demo
# Activate the conda environment
conda activate internlm-demo
# Upgrade pip
python -m pip install --upgrade pip
# Install dependencies
pip install modelscope==1.9.5
pip install transformers==4.35.2
pip install streamlit==1.24.0
pip install sentencepiece==0.1.99
pip install accelerate==0.24.1
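After installation, it can be worth confirming that the pinned versions are the ones actually present in the active environment. The following is a minimal sketch using only the standard library. The helper name `check_versions` is made up for illustration, and the version pins are copied from the pip commands above.

```python
from importlib.metadata import PackageNotFoundError, version

# Version pins copied from the pip commands above.
EXPECTED = {
    "modelscope": "1.9.5",
    "transformers": "4.35.2",
    "streamlit": "1.24.0",
    "sentencepiece": "0.1.99",
    "accelerate": "0.24.1",
}

def check_versions(expected):
    """Return {package: (installed_version_or_None, matches_pin)}."""
    report = {}
    for name, pinned in expected.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            installed = None  # package is not installed in this environment
        report[name] = (installed, installed == pinned)
    return report

for name, (installed, ok) in check_versions(EXPECTED).items():
    status = "OK" if ok else "MISMATCH"
    print(f"{name:15s} pinned={EXPECTED[name]:8s} installed={installed} [{status}]")
```

Run it inside the internlm-demo environment; a MISMATCH line indicates a package that is missing or installed at a different version.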
Downloading the model
The model is about 14 GB, so the download takes roughly 10 to 20 minutes.
import torch
from modelscope import snapshot_download, AutoModel, AutoTokenizer
import os
model_dir = snapshot_download('Shanghai_AI_Laboratory/internlm-chat-7b', cache_dir='/root/model', revision='v1.0.3')
Model file list (PyTorch format):
(base) root@intern-studio:~# ls /root/model/Shanghai_AI_Laboratory/internlm-chat-7b/
README.md pytorch_model-00002-of-00008.bin pytorch_model.bin.index.json
config.json pytorch_model-00003-of-00008.bin special_tokens_map.json
configuration.json pytorch_model-00004-of-00008.bin tokenization_internlm.py
configuration_internlm.py pytorch_model-00005-of-00008.bin tokenizer.model
generation_config.json pytorch_model-00006-of-00008.bin tokenizer_config.json
modeling_internlm.py pytorch_model-00007-of-00008.bin
pytorch_model-00001-of-00008.bin pytorch_model-00008-of-00008.bin
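Because the download is large, a quick completeness check can save a failed load later. The sketch below assumes this checkpoint follows the usual Hugging Face sharded layout, in which pytorch_model.bin.index.json contains a "weight_map" from parameter names to shard file names. The helper name `missing_shards` is hypothetical.

```python
import json
from pathlib import Path

def missing_shards(model_dir):
    """List shard files named in the index that are absent from model_dir."""
    model_dir = Path(model_dir)
    index_path = model_dir / "pytorch_model.bin.index.json"
    with index_path.open(encoding="utf-8") as f:
        index = json.load(f)
    # ASSUMPTION: "weight_map" maps each parameter name to the shard storing it.
    shard_names = sorted(set(index["weight_map"].values()))
    return [name for name in shard_names if not (model_dir / name).is_file()]

model_dir = Path("/root/model/Shanghai_AI_Laboratory/internlm-chat-7b")
if (model_dir / "pytorch_model.bin.index.json").is_file():
    print("missing shards:", missing_shards(model_dir) or "none")
```

An empty result means every shard referenced by the index is on disk; it does not verify file contents.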
View the model's configuration:
# cat /root/model/Shanghai_AI_Laboratory/internlm-chat-7b/config.json
Contents of config.json:
{
  "architectures": [
    "InternLMForCausalLM"
  ],
  "auto_map": {
    "AutoConfig": "configuration_internlm.InternLMConfig",
    "AutoModel": "modeling_internlm.InternLMForCausalLM",
    "AutoModelForCausalLM": "modeling_internlm.InternLMForCausalLM"
  },
  "bias": true,
  "bos_token_id": 1,
  "eos_token_id": 2,
  "hidden_act": "silu",
  "hidden_size": 4096,
  "initializer_range": 0.02,
  "intermediate_size": 11008,
  "max_position_embeddings": 2048,
  "model_type": "internlm",
  "num_attention_heads": 32,
  "num_hidden_layers": 32,
  "pad_token_id": 2,
  "rms_norm_eps": 1e-06,
  "tie_word_embeddings": false,
  "torch_dtype": "float16",
  "transformers_version": "4.33.2",
  "use_cache": true,
  "vocab_size": 103168
}
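The configuration values are enough for a back-of-the-envelope parameter count. The sketch below assumes a LLaMA-style decoder layout (four attention projections, a three-matrix gated MLP, and two RMSNorm vectors per layer) and ignores bias terms. That layout is an assumption and is not confirmed by the configuration file itself.

```python
# Values copied from config.json above.
config = {
    "vocab_size": 103168,
    "hidden_size": 4096,
    "intermediate_size": 11008,
    "num_hidden_layers": 32,
    "tie_word_embeddings": False,
}

def estimate_parameters(cfg):
    """Rough weight count, ASSUMING a LLaMA-style decoder layout (biases ignored)."""
    h, i, v = cfg["hidden_size"], cfg["intermediate_size"], cfg["vocab_size"]
    attention = 4 * h * h   # q, k, v and output projections
    mlp = 3 * h * i         # gate, up and down projections
    norms = 2 * h           # two RMSNorm weight vectors per layer
    per_layer = attention + mlp + norms
    embeddings = v * h      # input token embeddings
    lm_head = 0 if cfg["tie_word_embeddings"] else v * h
    final_norm = h
    return cfg["num_hidden_layers"] * per_layer + embeddings + lm_head + final_norm

total = estimate_parameters(config)
bytes_fp16 = total * 2  # float16 stores each weight in 2 bytes
print(f"parameters: {total / 1e9:.2f} B")
print(f"fp16 size:  {bytes_fp16 / 1e9:.1f} GB ({bytes_fp16 / 2**30:.1f} GiB)")
```

Under these assumptions the estimate comes to about 7.3 billion parameters and roughly 14.6 GB in float16, which is consistent with the "7B" name and the 14 GB download size noted earlier.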
Command-line demo:
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM
model_name_or_path = "/root/model/Shanghai_AI_Laboratory/internlm-chat-7b"
tokenizer = AutoTokenizer.from_pretrained(model_name_or_path, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_name_or_path, trust_remote_code=True, torch_dtype=torch.bfloat16, device_map='auto')
model = model.eval()
system_prompt = """You are an AI assistant whose name is InternLM (书生·浦语).
- InternLM (书生·浦语) is a conversational language model that is developed by Shanghai AI Laboratory (上海人工智能实验室). It is designed to be helpful, honest, and harmless.
- InternLM (书生·浦语) can understand and communicate fluently in the language chosen by the user such as English and 中文.
"""
messages = [(system_prompt, '')]
print("=============Welcome to InternLM chatbot, type 'exit' to exit.=============")
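while True:
    input_text = input("User >>> ")
    # Remove every space from the input; note this also joins English words together.
    input_text = input_text.replace(' ', '')
    if input_text == "exit":
        break
    # model.chat returns the reply and the updated history.
    response, history = model.chat(tokenizer, input_text, history=messages)
    messages.append((input_text, response))
    print(f"robot >>> {response}")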
Run python cli_demo.py in the terminal to start chatting, and type exit to quit.
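Web demo
Switch to VS Code and run the web_demo.py file in the /root/code/InternLM directory. After entering the commands below, follow section 5.2 of this tutorial (configuring the local port) to map the port to your local machine, then open http://127.0.0.1:6006 in a local browser.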
bash
conda activate internlm-demo # VS Code starts in the base environment on first entry, so switch environments first
cd /root/code/InternLM
streamlit run web_demo.py --server.address 127.0.0.1 --server.port 6006
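Before opening the browser, it can help to confirm that something is actually listening on the mapped port. The sketch below is a generic TCP reachability check using the standard library. The helper name `port_is_open` is hypothetical, and the host and port simply mirror the streamlit command above.

```python
import socket

def port_is_open(host="127.0.0.1", port=6006, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts and unreachable hosts.
        return False

print("web demo reachable:", port_is_open())
```

Run it on the local machine once the port mapping is in place. A False result suggests that either the streamlit process or the port mapping is not running.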
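Lagent agent tool-calling demo
A lightweight agent framework
In this section we use an A100 (1/4) machine in InternStudio, the InternLM-Chat-7B model, and the Lagent framework to deploy an agent tool-calling demo.
Lagent is a lightweight, open-source framework for agents built on large language models. It lets users quickly turn a large language model into various kinds of agents and provides some typical tools to extend the model's abilities. With the Lagent framework, InternLM's capabilities can be put to fuller use.
Error message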
File "/root/.conda/envs/internlm-demo/lib/python3.10/site-packages/transformers/modeling_utils.py", line 3870, in _load_pretrained_model
new_error_msgs, offload_index, state_dict_index = _load_state_dict_into_meta_model(
File "/root/.conda/envs/internlm-demo/lib/python3.10/site-packages/transformers/modeling_utils.py", line 743, in _load_state_dict_into_meta_model
set_module_tensor_to_device(model, param_name, param_device, **set_module_kwargs)
File "/root/.conda/envs/internlm-demo/lib/python3.10/site-packages/accelerate/utils/modeling.py", line 317, in set_module_tensor_to_device
new_value = value.to(device)
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 86.00 MiB (GPU 0; 19.99 GiB total capacity; 19.42 GiB already allocated; 36.00 MiB free; 19.42 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
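The hint at the end of the message applies only when reserved memory is much larger than allocated memory. The sketch below parses the numbers out of the message to check that condition. The message format varies between PyTorch versions, so the regular expressions are assumptions tied to the text shown above, and comparing the gap against the requested size is an illustrative heuristic rather than an official rule.

```python
import re

# The numeric part of the OOM message from the traceback above.
oom_message = (
    "CUDA out of memory. Tried to allocate 86.00 MiB (GPU 0; 19.99 GiB total capacity; "
    "19.42 GiB already allocated; 36.00 MiB free; 19.42 GiB reserved in total by PyTorch)"
)

UNIT_TO_MIB = {"MiB": 1.0, "GiB": 1024.0}

def parse_oom(message):
    """Extract the sizes reported in the OOM message, converted to MiB."""
    patterns = {
        "requested": r"Tried to allocate ([\d.]+) (MiB|GiB)",
        "total": r"([\d.]+) (MiB|GiB) total capacity",
        "allocated": r"([\d.]+) (MiB|GiB) already allocated",
        "free": r"([\d.]+) (MiB|GiB) free",
        "reserved": r"([\d.]+) (MiB|GiB) reserved",
    }
    sizes = {}
    for name, pattern in patterns.items():
        match = re.search(pattern, message)
        if match:
            sizes[name] = float(match.group(1)) * UNIT_TO_MIB[match.group(2)]
    return sizes

sizes = parse_oom(oom_message)
slack = sizes["reserved"] - sizes["allocated"]
print(f"reserved - allocated = {slack:.0f} MiB")
# Illustrative heuristic: fragmentation is plausible only if the gap could hold the request.
print("fragmentation plausible" if slack > sizes["requested"] else "GPU memory is simply full")
```

Here reserved and allocated memory are both 19.42 GiB, so there is no gap to reclaim. The GPU was full when the load was attempted, and tuning max_split_size_mb would not be expected to help.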
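After retesting, the model loaded and ran successfully (Loading checkpoint shards: 100%|███████████████| 8/8 [00:27<00:00, 3.38s/it]) and the demo test completed. The machine's monitoring metrics at that point were:
CPU: 4.02%
Memory: 2.92 / 56 GB (5.21%)
GPU: NVIDIA A100 (1/4), 0%
GPU memory: 15166 / 20470 MiB (74.09%)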
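Example 1
Given 2x+3=10, solve for x.
Solved successfully.
The answer was correct.
Example 2
Given 2x+3y=10 and x+5y=20, solve for x and y.
Solved successfully. Execution result: [{x: -10/7, y: 30/7}]
The answer was wrong. The model replied: "From the system 2x+3y=10 and x+5y=20, elimination gives x=2, y=4."
When asked to answer again, it was still wrong:
"From the system 2x+3y=10 and x+5y=20, elimination gives x=5, y=4."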
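To see which of the values in Example 2 actually satisfy the system, the equations can be solved exactly with rational arithmetic. The sketch below uses Cramer's rule with the standard-library `Fraction` type; the helper names `solve_2x2` and `satisfies` are made up for illustration.

```python
from fractions import Fraction

def solve_2x2(a1, b1, c1, a2, b2, c2):
    """Solve a1*x + b1*y = c1 and a2*x + b2*y = c2 exactly with Cramer's rule."""
    det = Fraction(a1 * b2 - a2 * b1)
    if det == 0:
        raise ValueError("the system has no unique solution")
    x = Fraction(c1 * b2 - c2 * b1) / det
    y = Fraction(a1 * c2 - a2 * c1) / det
    return x, y

def satisfies(x, y):
    """Check a candidate (x, y) against both equations of Example 2."""
    return 2 * x + 3 * y == 10 and x + 5 * y == 20

x, y = solve_2x2(2, 3, 10, 1, 5, 20)
print(f"x = {x}, y = {y}")
print("execution result satisfies the system:", satisfies(x, y))
print("x=2, y=4 satisfies the system:", satisfies(2, 4))
print("x=5, y=4 satisfies the system:", satisfies(5, 4))
```

The exact solution is x = -10/7, y = 30/7, which matches the tool's execution result. Neither of the model's two written answers satisfies the equations, so the error lies in how the model reported the result, not in the tool call.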
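Image-text demo
InternLM-XComposer (浦语·灵笔) image-text understanding and composition demo
In this section we use an A100 (1/4) * 2 machine in InternStudio and the internlm-xcomposer-7b model to deploy an image-text understanding and composition demo.
InternLM-Xcomposer-7B model
Environment configuration
Switching the pip mirror
Switching the conda mirror
Model download
Three methods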
huggingface-cli
OpenXlab
Modelscope
Hands-on practice
A100 data-center GPU