L2G3000-LMDeploy 量化摆设实践

饭宝发表于 2025-12-12 18:21:01

LMDeploy 量化摆设实践闯关任务

环境设置

conda create -n lmdeploypython=3.10 -y
conda activate lmdeploy
conda install pytorch==2.1.2 torchvision==0.16.2 torchaudio==2.1.2 pytorch-cuda=12.1 -c pytorch -c nvidia -y
pip install timm==1.0.8 openai==1.40.3 lmdeploy==0.5.3

pip install datasets==2.19.2
创建文件夹并设置开辟机共享目次的软链接。
mkdir /root/models
ln -s /root/share/new_models/Shanghai_AI_Laboratory/internlm2_5-7b-chat /root/models
ln -s /root/share/new_models/Shanghai_AI_Laboratory/internlm2_5-1_8b-chat /root/models
ln -s /root/share/new_models/OpenGVLab/InternVL2-26B /root/models
启动InternLM2_5-1_8b-chat
lmdeploy chat /root/models/internlm2_5-1_8b-chat
https://i-blog.csdnimg.cn/direct/bcab30faa02e40248bb26a26f8ae05eb.png
https://i-blog.csdnimg.cn/direct/3ca9a87916974422a66ec796f93d51ec.png
API摆设
lmdeploy serve api_server \
/root/models/internlm2_5-1_8b-chat \
--model-format hf \
--quant-policy 0 \
--server-name 0.0.0.0 \
--server-port 23333 \
--tp 1
https://i-blog.csdnimg.cn/direct/66e97b7fea7c4ef29637797e1352e104.png
以下令行情势毗连API服务器
关闭http://127.0.0.1:23333网页，但保持终端和本地窗口不动，新建一个终端。
https://i-blog.csdnimg.cn/direct/2a77429e2499404faa7f7ff856178114.png
以Gradio网页情势毗连API服务器
lmdeploy serve gradio http://localhost:23333 \
--server-name 0.0.0.0 \
--server-port 6006
https://i-blog.csdnimg.cn/direct/884216cfd37944bda66ab49733f526c9.png
https://i-blog.csdnimg.cn/direct/7ef77dd5db49469f91a7f0fe45b92ae9.png
W4A16 量化+ KV cache+KV cache 量化

lmdeploy serve api_server \
/root/models/internlm2_5-1_8b-chat-w4a16-4bit/ \
--model-format awq \
--quant-policy 4 \
--cache-max-entry-count 0.4\
--server-name 0.0.0.0 \
--server-port 23333 \
--tp 1
原模子
https://i-blog.csdnimg.cn/direct/17407c08dfbc4063ba06270209cf443d.png
量化后
https://i-blog.csdnimg.cn/direct/9efeb21dece7493a83d4f70286387342.png
量化后做kv cache
lmdeploy serve api_server \
/root/models/internlm2_5-1_8b-chat-w4a16-4bit/ \
--model-format awq \
--quant-policy 4 \
--cache-max-entry-count 0.4\
--server-name 0.0.0.0 \
--server-port 23333 \
--tp 1
https://i-blog.csdnimg.cn/direct/836bb893c18f48dba10828d35d1de440.png
https://i-blog.csdnimg.cn/direct/b473781b74ec4b399d1b1ce2f8887971.png
Function call

conda activate lmdeploy
lmdeploy serve api_server \
/root/models/internlm2_5-7b-chat \
--model-format hf \
--quant-policy 0 \
--server-name 0.0.0.0 \
--server-port 23333 \
--tp 1
touch /root/internlm2_5_func.py
from openai import OpenAI

def add(a: int, b: int):
return a + b

def mul
免责声明：如果侵犯了您的权益，请联系站长，我们会及时删除侵权内容，谢谢合作！qidao123.com:ToB企服之家，中国第一个企服评测及软件市场,开放入驻,技术点评得现金

页: [1]

qidao123.com ToB IT社区-企服评测·应用市场's Archiver

L2G3000-LMDeploy 量化摆设实践