[Containers] [chatgpt] Running ChatGLM3 with Docker: native support for tool calling (Function Call), code execution (Code Interpreter), and Agent tasks, and it now runs locally

vengomo Blog 2024-06-10

1. Project address

https://github.com/THUDM/ChatGLM3

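Introduction: ChatGLM3-6B is the latest open-source model in the ChatGLM series. While keeping the strengths of the previous two generations, such as fluent dialogue and a low deployment barrier, ChatGLM3-6B introduces the following features:

A stronger base model: ChatGLM3-6B-Base, the base model of ChatGLM3-6B, uses more diverse training data, more training steps, and a more reasonable training strategy. Evaluations on datasets covering semantics, mathematics, reasoning, code, and knowledge show that ChatGLM3-6B-Base has the strongest performance among pre-trained models under 10B parameters.

More complete feature support: ChatGLM3-6B adopts a newly designed prompt format. Beyond ordinary multi-turn dialogue, it natively supports complex scenarios such as tool calling (Function Call), code execution (Code Interpreter), and Agent tasks.

A more comprehensive open-source lineup: in addition to the dialogue model ChatGLM3-6B, the base model ChatGLM3-6B-Base and the long-context dialogue model ChatGLM3-6B-32K are also open-sourced. All of these weights are fully open for academic research, and free commercial use is also permitted after registering through a questionnaire.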

Detailed video: https://www.bilibili.com/video/BV1D84y1X7d1/?vd_source=4b290247452adda4e56d84b659b0c8a2

Video title: Running ChatGLM3 locally with Docker, with native support for tool calling (Function Call), code execution (Code Interpreter), and Agent tasks

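2. The most important step: downloading the model and the project

Hugging Face Hub is not reachable from mainland China, so the model is also published on Alibaba's ModelScope.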

https://modelscope.cn/models/ZhipuAI/chatglm3-6b

First, clone the project repository:

git clone https://github.com/THUDM/ChatGLM3

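If your connection is slow, you can go through a proxy: git clone https://ghproxy.com/https://github.com/THUDM/ChatGLM3

Download the model into the project directory. The clone takes 24G in total, including its .git folder: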

cd ChatGLM3

git clone https://www.modelscope.cn/ZhipuAI/chatglm3-6b.git chatglm3-6b-models

The model download takes a while. When it finishes, the directory size is:

24G chatglm3-6b-models/

Once the download completes, you can delete the .git folder to free up space, since it accounts for half of the total:

12G .git
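Before deleting anything, it can help to confirm how much of the clone is Git metadata. The following is a minimal sketch, not part of the original walkthrough; the helper names dir_size_bytes and git_overhead are made up for illustration, and only the Python standard library is used.

```python
import os

def dir_size_bytes(path):
    """Sum the sizes of all regular files under path (symlinks are skipped)."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            if not os.path.islink(full):
                total += os.path.getsize(full)
    return total

def git_overhead(model_dir):
    """Return (total_bytes, git_bytes) for a cloned model directory."""
    total = dir_size_bytes(model_dir)
    # A missing .git folder simply contributes 0 bytes.
    git = dir_size_bytes(os.path.join(model_dir, ".git"))
    return total, git
```

Calling git_overhead("chatglm3-6b-models") from inside the ChatGLM3 directory should report roughly the 24G and 12G figures shown above, if the clone matches the one in this post.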

ls -lha

total 12G

drwxrwxr-x 3 test test 4.0K 10月 28 23:49 .

drwxrwxr-x 7 test test 4.0K 10月 28 23:56 ..

-rw-rw-r-- 1 test test 1.3K 10月 28 23:09 config.json

-rw-rw-r-- 1 test test 2.3K 10月 28 23:09 configuration_chatglm.py

-rw-rw-r-- 1 test test 40 10月 28 23:09 configuration.json

drwxrwxr-x 9 test test 4.0K 10月 28 23:50 .git

-rw-rw-r-- 1 test test 1.5K 10月 28 23:09 .gitattributes

-rw-rw-r-- 1 test test 55K 10月 28 23:09 modeling_chatglm.py

-rw-rw-r-- 1 test test 4.1K 10月 28 23:09 MODEL_LICENSE

-rw-rw-r-- 1 test test 1.8G 10月 28 23:48 pytorch_model-00001-of-00007.bin

-rw-rw-r-- 1 test test 1.9G 10月 28 23:49 pytorch_model-00002-of-00007.bin

-rw-rw-r-- 1 test test 1.8G 10月 28 23:49 pytorch_model-00003-of-00007.bin

-rw-rw-r-- 1 test test 1.7G 10月 28 23:46 pytorch_model-00004-of-00007.bin

-rw-rw-r-- 1 test test 1.9G 10月 28 23:50 pytorch_model-00005-of-00007.bin

-rw-rw-r-- 1 test test 1.8G 10月 28 23:43 pytorch_model-00006-of-00007.bin

-rw-rw-r-- 1 test test 1005M 10月 28 23:36 pytorch_model-00007-of-00007.bin

-rw-rw-r-- 1 test test 20K 10月 28 23:09 pytorch_model.bin.index.json

-rw-rw-r-- 1 test test 15K 10月 28 23:09 quantization.py

-rw-rw-r-- 1 test test 4.4K 10月 28 23:09 README.md

-rw-rw-r-- 1 test test 12K 10月 28 23:09 tokenization_chatglm.py

-rw-rw-r-- 1 test test 244 10月 28 23:09 tokenizer_config.json

-rw-rw-r-- 1 test test 995K 10月 28 23:09 tokenizer.model
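Since the listing shows seven weight shards plus pytorch_model.bin.index.json, a quick sanity check is to confirm that every shard named in the index is actually on disk. This is a sketch under the assumption that the index follows the usual sharded-checkpoint layout with a "weight_map" object mapping parameter names to shard file names; missing_shards is a hypothetical helper name.

```python
import json
import os

def missing_shards(model_dir, index_name="pytorch_model.bin.index.json"):
    """Return the shard files referenced by the index that are not present."""
    with open(os.path.join(model_dir, index_name), encoding="utf-8") as f:
        index = json.load(f)
    # Many parameters map to the same shard, so deduplicate the file names.
    shards = sorted(set(index["weight_map"].values()))
    return [s for s in shards if not os.path.isfile(os.path.join(model_dir, s))]
```

An empty list from missing_shards("chatglm3-6b-models") suggests the download is complete. It does not verify file contents, so a truncated shard would still pass.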

3. Start a container from an NVIDIA (CUDA) image

docker run -itd --name chatglm3 -v `pwd`/ChatGLM3:/data \
--gpus=all -e NVIDIA_DRIVER_CAPABILITIES=compute,utility -e NVIDIA_VISIBLE_DEVICES=all \
-p 8501:8501 pytorch/pytorch:2.0.1-cuda11.7-cudnn8-devel
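To make the flags in this command easier to read, here is the same invocation assembled as an argument list with a comment per option. It is only an illustrative sketch: docker_run_args is a hypothetical helper, and the values are copied from the command in this post.

```python
import os
import shlex

def docker_run_args(project_dir, name="chatglm3", port=8501,
                    image="pytorch/pytorch:2.0.1-cuda11.7-cudnn8-devel"):
    """Build the docker run argument list used in this post."""
    return [
        "docker", "run",
        "-itd",                                      # interactive, TTY, detached
        "--name", name,                              # container name used by docker exec
        "-v", f"{os.path.abspath(project_dir)}:/data",  # mount the project at /data
        "--gpus=all",                                # expose all GPUs (needs the NVIDIA container toolkit)
        "-e", "NVIDIA_DRIVER_CAPABILITIES=compute,utility",
        "-e", "NVIDIA_VISIBLE_DEVICES=all",
        "-p", f"{port}:{port}",                      # publish Streamlit's default port 8501
        image,
    ]

# Print a copy-pasteable shell command instead of running it.
print(shlex.join(docker_run_args("ChatGLM3")))
```

Note that the mount uses the directory that contains ChatGLM3, so the command should be run from the parent directory rather than from inside the repository.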

docker exec -it chatglm3 bash

cd /data

pip3 config set global.index-url https://mirrors.aliyun.com/pypi/simple

pip3 config set install.trusted-host mirrors.aliyun.com

pip3 install -r requirements.txt

streamlit run web_demo2.py
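Once Streamlit is running inside the container, the page should be reachable on the published port 8501. A small reachability check from the host might look like the sketch below; port_open is a hypothetical helper, and the host and port defaults are assumptions based on the -p 8501:8501 mapping above.

```python
import socket

def port_open(host="127.0.0.1", port=8501, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False
```

If port_open() returns False while the container is up, the demo may still be loading the model, or the port mapping may differ from the one used here.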

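Modify the code:

By default the model is loaded in FP16 precision, and running the demo above needs roughly 13GB of GPU memory. If your GPU memory is limited, you can try loading the model in quantized form, as follows: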

model = AutoModel.from_pretrained("THUDM/chatglm3-6b",trust_remote_code=True).quantize(4).cuda()

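Change the model path to the local model directory and switch the model to INT4 quantization.

Otherwise the following error occurs: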

torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 108.00 MiB (GPU 0; 10.91 GiB total capacity; 10.67 GiB already allocated; 49.00 MiB free; 10.68 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
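Putting the two changes together (local model path, INT4 quantization), the loading code might look like the sketch below. It is an assumption-laden illustration rather than the demo's actual code: the MODEL_PATH environment variable and both helper names are made up here, and /data/chatglm3-6b-models assumes the mount and clone locations used earlier in this post. The quantize(4) call comes from the model's own remote code, as in the snippet above.

```python
import os

# Assumption: the weights were cloned into the project and mounted at /data.
DEFAULT_MODEL_DIR = "/data/chatglm3-6b-models"

def resolve_model_path(env=os.environ):
    """Pick the model path from a (hypothetical) MODEL_PATH variable, else the local clone."""
    return env.get("MODEL_PATH", DEFAULT_MODEL_DIR)

def load_quantized(model_path, bits=4):
    """Load tokenizer and model from a local directory with weight quantization."""
    # Imported lazily so the path helper above works without transformers installed.
    from transformers import AutoModel, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
    model = AutoModel.from_pretrained(model_path, trust_remote_code=True)
    # quantize() is provided by ChatGLM's remote code; 4 and 8 are the values used in this post.
    model = model.quantize(bits).cuda().eval()
    return tokenizer, model
```

In the web demo, the equivalent change is to replace "THUDM/chatglm3-6b" with the local directory and keep .quantize(4) in the from_pretrained chain.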

Web demo result:

streamlit run web_demo2.py

Loading checkpoint shards: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 7/7 [00:45<00:00, 6.49s/it]

GPU memory usage is about 4G:

| 0 N/A N/A 16365 C /opt/conda/bin/python 4428MiB |
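The 13GB FP16 figure and the roughly 4G INT4 reading are both close to what simple weight-size arithmetic predicts. The sketch below assumes about 6.2 billion parameters for ChatGLM3-6B, which is my approximation and not a number from this post, and it counts weights only.

```python
GIB = 2 ** 30

def weight_gib(n_params, bits):
    """Memory needed to store n_params weights at the given bit width, in GiB."""
    return n_params * bits / 8 / GIB

# Assumption: roughly 6.2e9 parameters; the exact count is not stated in the post.
N_PARAMS = 6.2e9

for label, bits in [("FP16", 16), ("INT8", 8), ("INT4", 4)]:
    print(f"{label}: {weight_gib(N_PARAMS, bits):.1f} GiB")
# FP16: 11.5 GiB
# INT8: 5.8 GiB
# INT4: 2.9 GiB
```

The gap between these weight-only numbers and the observed usage (13GB, 4428MiB) is likely taken up by activations, the KV cache, the CUDA context, and any layers left unquantized. It also explains why FP16 does not fit on the 10.91 GiB card in the error message.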

When asked, the model itself replied that it is ChatGLM3.

Running the heart-drawing program

https://github.com/THUDM/ChatGLM3/blob/main/composite_demo/README.md

Modify line 87 to load the model with 8-bit quantization:

87 self.model = AutoModel.from_pretrained(model_path, trust_remote_code=True).quantize(8).to(

cd composite_demo

pip3 install -r requirements.txt

ipython kernel install --name chatglm3-demo --user

streamlit run main.py

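Then you can see the result:

In tool mode, the demo called the weather function. It could draw a heart directly. Drawing an ellipse did not succeed, but a sine curve worked.

When drawing a square, the program raised an error, so I ran it again.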
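For readers new to tool calling, the weather-function call mentioned above boils down to the model emitting a tool name plus arguments, and the host program looking that tool up and running it. The sketch below illustrates only that idea; it is not the composite demo's implementation. The registry, the stub get_weather, and the {"name", "parameters"} JSON shape are all assumptions made for the example.

```python
import json

TOOLS = {}

def register_tool(func):
    """Register a function under its own name so it can be called by name later."""
    TOOLS[func.__name__] = func
    return func

@register_tool
def get_weather(city: str) -> str:
    """Hypothetical stand-in: a real tool would query a weather service."""
    return f"(stub) weather report for {city}"

def dispatch(tool_call_json):
    """Run the tool named in a JSON tool call and return its result as text."""
    call = json.loads(tool_call_json)
    func = TOOLS.get(call["name"])
    if func is None:
        return f"unknown tool: {call['name']}"
    return func(**call.get("parameters", {}))

print(dispatch('{"name": "get_weather", "parameters": {"city": "Beijing"}}'))
# (stub) weather report for Beijing
```

The tool's return value is then fed back to the model as an observation, so the model can phrase the final answer.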

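Summary

ChatGLM3 natively supports tool calling (Function Call), code execution (Code Interpreter), and Agent tasks, and it can run locally. This makes it possible to build richer applications.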


This article was published by a user on 2024-06-10 on 夸智网 (kuazhi.com). If you have any questions, please contact us.
Article link: https://www.kuazhi.com/post/712787898.html
