On AI: A First Taste of ChatGLM2-6B

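I had already heard the name, and the headline said it had taken the top spot on a Chinese leaderboard: "Second-generation ChatGLM-6B model open-sourced, takes first place on the leaderboard for LLM Chinese-language ability"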

Original leaderboard URL: https://cevalbenchmark.com/static/leaderboard_zh.html

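ChatGLM2-6B supports:

FP16: the most capable, but also needs the most resources
int8: the middle ground
int4: the least capable, but also needs the fewest resources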
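The trade-off between the three options can be made concrete with some back-of-the-envelope arithmetic. The sketch below estimates the memory needed for the weights alone at each precision. The parameter count of 6.2e9 is an assumption chosen for illustration (the model name only says "6B"), and the names `ASSUMED_PARAM_COUNT` and `estimate_weight_gib` are hypothetical helpers, not part of any library. Real usage will be higher because of activations, runtime overhead, and any layers left unquantized.

```python
# Rough estimate of weight memory per precision.
# ASSUMPTION: about 6.2e9 parameters; this figure is illustrative only.
ASSUMED_PARAM_COUNT = 6.2e9

# Bits used to store one weight at each precision.
BITS_PER_WEIGHT = {"fp16": 16, "int8": 8, "int4": 4}


def estimate_weight_gib(precision: str, param_count: float = ASSUMED_PARAM_COUNT) -> float:
    """Return the approximate GiB needed to hold the weights alone."""
    bits = BITS_PER_WEIGHT[precision]
    total_bytes = param_count * bits / 8
    return total_bytes / 1024**3


if __name__ == "__main__":
    for name in BITS_PER_WEIGHT:
        print(f"{name}: ~{estimate_weight_gib(name):.1f} GiB of weights")
```

Each step down in precision halves the weight storage, which is why int4 is the option to reach for on a small machine.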

Original code

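from transformers import AutoTokenizer, AutoModel

# Load the int4-quantized checkpoint. trust_remote_code=True is needed because
# the model class ships with the checkpoint repository.
tokenizer = AutoTokenizer.from_pretrained("THUDM/chatglm2-6b-int4", trust_remote_code=True)
# .float() casts to float32; there is no .cuda() call, so the model stays on the CPU.
model = AutoModel.from_pretrained("THUDM/chatglm2-6b-int4", trust_remote_code=True).float()

# Switch to inference mode.
model = model.eval()

# First turn: "Hello"
response, history = model.chat(tokenizer, "你好", history=[])
print(response)

# Second turn, reusing the history: "What should I do if I can't sleep in the morning?"
response, history = model.chat(tokenizer, "早晨睡不着应该怎么办", history=history)
print(response)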

Since I'm broke, I wanted to see how fast this int4 model runs on a CPU.

Memory used: 5.3 GB
CPU used: 28 cores

My CPU: Intel(R) Xeon(R) CPU E5-2690 v4 @ 2.60GHz
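To put an actual number on the speed, one option is to time a single reply with the standard library. This is a minimal sketch, not the measurement method used above: `time_reply` and `fake_generate` are hypothetical names, and the stand-in generator exists only so the example runs without downloading the model.

```python
import time
from typing import Callable, Tuple


def time_reply(generate: Callable[[str], str], prompt: str) -> Tuple[str, float, float]:
    """Run generate(prompt) once and return (reply, seconds, characters per second)."""
    start = time.perf_counter()
    reply = generate(prompt)
    seconds = time.perf_counter() - start
    # Guard against a zero interval on very fast calls.
    chars_per_second = len(reply) / seconds if seconds > 0 else float("inf")
    return reply, seconds, chars_per_second


if __name__ == "__main__":
    # Stand-in generator (hypothetical) so the sketch runs without the model.
    def fake_generate(prompt: str) -> str:
        time.sleep(0.05)
        return "echo: " + prompt

    reply, seconds, rate = time_reply(fake_generate, "你好")
    print(f"{len(reply)} chars in {seconds:.2f}s ({rate:.1f} chars/s)")
```

With the model loaded as in the original code, the generator could be something like `lambda p: model.chat(tokenizer, p, history=[])[0]`, which keeps only the response from the `(response, history)` pair.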

