OpenAI 公司基于 GPT 模型的 ChatGPT 景色无两,眼看它起朱楼,眼看它宴宾客,FaceBook 终于坐不住了,公布了同样基于 LLM 的人工智能大语言模型 LLaMA,号称蕴含 70 亿、130 亿、330 亿和 650 亿这 4 种参数规模的模型,参数是指神经网络中的权重和偏置等可调整的变量,用于训练和优化神经网络的性能,70 亿意味着神经网络中有 70 亿个参数,由此类推。
在一些大型神经网络中,每个参数须要应用 32 位或 64 位浮点数进行存储,这意味着每个参数须要占用 4 字节或 8 字节的存储空间。因而,对于蕴含 70 亿个参数的神经网络,其存储空间将别离为 8 GB 或 12GB。
此外,神经网络的大小不仅取决于参数的数量,还取决于神经元的数目,层数和其余构造参数等。因而,70 亿的神经网络可能会占用更多的存储空间,具体取决于网络的构造和实现细节。
因而这种体量的模型单机跑相对够咱们喝一壶,所以本次应用最小的 LLaMA 7B 模型进行测试。
LLaMA 我的项目装置和模型配置
和 Stable-Diffusion 我的项目一模一样,FaceBook 开源的 LLaMA 我的项目默认写死应用 cuda 模式,这也就意味着必须有 NVIDIA 的 GPU 来训练和运行,不过好在大神 GeorgiGerganov 用 C++ 基于 LLaMA 我的项目重写了一个跑在 CPU 上的移植版本 llama.cpp 利用。
llama.cpp 首先适配的就是苹果的 M 系列芯片,这对于果粉来说无疑是一个重大利好,首先通过命令拉取 C ++ 版本的 LLaMA 我的项目:
git clone https://github.com/ggerganov/llama.cpp
随后进入我的项目目录:
llama.cpp
在我的项目中,须要独自建设一个模型文件夹 models:
mkdir models
随后去 huggingface 官网下载 LLaMA 的 7B 模型文件:https://huggingface.co/nyanko7/LLaMA-7B/tree/main
是的,主模型文件曾经达到了 13.5gb 之巨,如果本地硬盘空间告急,请审慎下载。
随后在 models 目录建设模型子目录 7B:
mkdir 7B
将 tokenizer.model 和 tokenizer\_checklist.chk 放入和 7B 平行的目录中:
➜ models git:(master) ✗ ls
7B tokenizer.model tokenizer_checklist.chk
随后将 checklist.chk consolidated.00.pth 和 params.json 放入 7B 目录中:
➜ 7B git:(master) ✗ ls
checklist.chk consolidated.00.pth params.json
至此,模型就配置好了。
LLaMA 模型转换
因为咱们没有应用 FaceBook 的原版我的项目,所以它的模型还须要进行转换,也就是转换为以后 C ++ 版本的 LLaMA 能够运行的模型。
这里通过 Python 脚本进行转换操作:
python3 convert-pth-to-ggml.py models/7B/ 1
第一个参数是模型所在目录,第二个参数为转换时应用的浮点类型,应用 float32,转换的后果文件会大一倍,当该参数值为 1 时,则应用 float16 这个默认值,这里咱们应用默认数据类型。
程序输入:
➜ llama.cpp git:(master) ✗ python convert-pth-to-ggml.py models/7B/ 1
{'dim': 4096, 'multiple_of': 256, 'n_heads': 32, 'n_layers': 32, 'norm_eps': 1e-06, 'vocab_size': -1}
n_parts = 1
Processing part 0
Processing variable: tok_embeddings.weight with shape: torch.Size([32000, 4096]) and type: torch.float16
Processing variable: norm.weight with shape: torch.Size([4096]) and type: torch.float16
Converting to float32
Processing variable: output.weight with shape: torch.Size([32000, 4096]) and type: torch.float16
Processing variable: layers.0.attention.wq.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.0.attention.wk.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.0.attention.wv.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.0.attention.wo.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.0.feed_forward.w1.weight with shape: torch.Size([11008, 4096]) and type: torch.float16
Processing variable: layers.0.feed_forward.w2.weight with shape: torch.Size([4096, 11008]) and type: torch.float16
Processing variable: layers.0.feed_forward.w3.weight with shape: torch.Size([11008, 4096]) and type: torch.float16
Processing variable: layers.0.attention_norm.weight with shape: torch.Size([4096]) and type: torch.float16
Converting to float32
Processing variable: layers.0.ffn_norm.weight with shape: torch.Size([4096]) and type: torch.float16
Converting to float32
Processing variable: layers.1.attention.wq.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.1.attention.wk.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.1.attention.wv.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.1.attention.wo.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.1.feed_forward.w1.weight with shape: torch.Size([11008, 4096]) and type: torch.float16
Processing variable: layers.1.feed_forward.w2.weight with shape: torch.Size([4096, 11008]) and type: torch.float16
Processing variable: layers.1.feed_forward.w3.weight with shape: torch.Size([11008, 4096]) and type: torch.float16
Processing variable: layers.1.attention_norm.weight with shape: torch.Size([4096]) and type: torch.float16
Converting to float32
Processing variable: layers.1.ffn_norm.weight with shape: torch.Size([4096]) and type: torch.float16
Converting to float32
Processing variable: layers.2.attention.wq.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.2.attention.wk.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.2.attention.wv.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.2.attention.wo.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.2.feed_forward.w1.weight with shape: torch.Size([11008, 4096]) and type: torch.float16
Processing variable: layers.2.feed_forward.w2.weight with shape: torch.Size([4096, 11008]) and type: torch.float16
Processing variable: layers.2.feed_forward.w3.weight with shape: torch.Size([11008, 4096]) and type: torch.float16
Processing variable: layers.2.attention_norm.weight with shape: torch.Size([4096]) and type: torch.float16
Converting to float32
Processing variable: layers.2.ffn_norm.weight with shape: torch.Size([4096]) and type: torch.float16
Converting to float32
Processing variable: layers.3.attention.wq.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.3.attention.wk.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.3.attention.wv.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.3.attention.wo.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.3.feed_forward.w1.weight with shape: torch.Size([11008, 4096]) and type: torch.float16
Processing variable: layers.3.feed_forward.w2.weight with shape: torch.Size([4096, 11008]) and type: torch.float16
Processing variable: layers.3.feed_forward.w3.weight with shape: torch.Size([11008, 4096]) and type: torch.float16
Processing variable: layers.3.attention_norm.weight with shape: torch.Size([4096]) and type: torch.float16
Converting to float32
Processing variable: layers.3.ffn_norm.weight with shape: torch.Size([4096]) and type: torch.float16
Converting to float32
Processing variable: layers.4.attention.wq.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.4.attention.wk.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.4.attention.wv.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.4.attention.wo.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.4.feed_forward.w1.weight with shape: torch.Size([11008, 4096]) and type: torch.float16
Processing variable: layers.4.feed_forward.w2.weight with shape: torch.Size([4096, 11008]) and type: torch.float16
Processing variable: layers.4.feed_forward.w3.weight with shape: torch.Size([11008, 4096]) and type: torch.float16
Processing variable: layers.4.attention_norm.weight with shape: torch.Size([4096]) and type: torch.float16
Converting to float32
Processing variable: layers.4.ffn_norm.weight with shape: torch.Size([4096]) and type: torch.float16
Converting to float32
Processing variable: layers.5.attention.wq.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.5.attention.wk.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.5.attention.wv.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.5.attention.wo.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.5.feed_forward.w1.weight with shape: torch.Size([11008, 4096]) and type: torch.float16
Processing variable: layers.5.feed_forward.w2.weight with shape: torch.Size([4096, 11008]) and type: torch.float16
Processing variable: layers.5.feed_forward.w3.weight with shape: torch.Size([11008, 4096]) and type: torch.float16
Processing variable: layers.5.attention_norm.weight with shape: torch.Size([4096]) and type: torch.float16
Converting to float32
Processing variable: layers.5.ffn_norm.weight with shape: torch.Size([4096]) and type: torch.float16
Converting to float32
Processing variable: layers.6.attention.wq.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.6.attention.wk.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.6.attention.wv.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.6.attention.wo.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.6.feed_forward.w1.weight with shape: torch.Size([11008, 4096]) and type: torch.float16
Processing variable: layers.6.feed_forward.w2.weight with shape: torch.Size([4096, 11008]) and type: torch.float16
Processing variable: layers.6.feed_forward.w3.weight with shape: torch.Size([11008, 4096]) and type: torch.float16
Processing variable: layers.6.attention_norm.weight with shape: torch.Size([4096]) and type: torch.float16
Converting to float32
Processing variable: layers.6.ffn_norm.weight with shape: torch.Size([4096]) and type: torch.float16
Converting to float32
Processing variable: layers.7.attention.wq.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.7.attention.wk.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.7.attention.wv.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.7.attention.wo.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.7.feed_forward.w1.weight with shape: torch.Size([11008, 4096]) and type: torch.float16
Processing variable: layers.7.feed_forward.w2.weight with shape: torch.Size([4096, 11008]) and type: torch.float16
Processing variable: layers.7.feed_forward.w3.weight with shape: torch.Size([11008, 4096]) and type: torch.float16
Processing variable: layers.7.attention_norm.weight with shape: torch.Size([4096]) and type: torch.float16
Converting to float32
Processing variable: layers.7.ffn_norm.weight with shape: torch.Size([4096]) and type: torch.float16
Converting to float32
Processing variable: layers.8.attention.wq.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.8.attention.wk.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.8.attention.wv.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.8.attention.wo.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.8.feed_forward.w1.weight with shape: torch.Size([11008, 4096]) and type: torch.float16
Processing variable: layers.8.feed_forward.w2.weight with shape: torch.Size([4096, 11008]) and type: torch.float16
Processing variable: layers.8.feed_forward.w3.weight with shape: torch.Size([11008, 4096]) and type: torch.float16
Processing variable: layers.8.attention_norm.weight with shape: torch.Size([4096]) and type: torch.float16
Converting to float32
Processing variable: layers.8.ffn_norm.weight with shape: torch.Size([4096]) and type: torch.float16
Converting to float32
Processing variable: layers.9.attention.wq.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.9.attention.wk.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.9.attention.wv.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.9.attention.wo.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.9.feed_forward.w1.weight with shape: torch.Size([11008, 4096]) and type: torch.float16
Processing variable: layers.9.feed_forward.w2.weight with shape: torch.Size([4096, 11008]) and type: torch.float16
Processing variable: layers.9.feed_forward.w3.weight with shape: torch.Size([11008, 4096]) and type: torch.float16
Processing variable: layers.9.attention_norm.weight with shape: torch.Size([4096]) and type: torch.float16
Converting to float32
Processing variable: layers.9.ffn_norm.weight with shape: torch.Size([4096]) and type: torch.float16
Converting to float32
Processing variable: layers.10.attention.wq.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.10.attention.wk.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.10.attention.wv.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.10.attention.wo.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.10.feed_forward.w1.weight with shape: torch.Size([11008, 4096]) and type: torch.float16
Processing variable: layers.10.feed_forward.w2.weight with shape: torch.Size([4096, 11008]) and type: torch.float16
Processing variable: layers.10.feed_forward.w3.weight with shape: torch.Size([11008, 4096]) and type: torch.float16
Processing variable: layers.10.attention_norm.weight with shape: torch.Size([4096]) and type: torch.float16
Converting to float32
Processing variable: layers.10.ffn_norm.weight with shape: torch.Size([4096]) and type: torch.float16
Converting to float32
Processing variable: layers.11.attention.wq.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.11.attention.wk.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.11.attention.wv.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.11.attention.wo.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.11.feed_forward.w1.weight with shape: torch.Size([11008, 4096]) and type: torch.float16
Processing variable: layers.11.feed_forward.w2.weight with shape: torch.Size([4096, 11008]) and type: torch.float16
Processing variable: layers.11.feed_forward.w3.weight with shape: torch.Size([11008, 4096]) and type: torch.float16
Processing variable: layers.11.attention_norm.weight with shape: torch.Size([4096]) and type: torch.float16
Converting to float32
Processing variable: layers.11.ffn_norm.weight with shape: torch.Size([4096]) and type: torch.float16
Converting to float32
Processing variable: layers.12.attention.wq.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.12.attention.wk.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.12.attention.wv.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.12.attention.wo.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.12.feed_forward.w1.weight with shape: torch.Size([11008, 4096]) and type: torch.float16
Processing variable: layers.12.feed_forward.w2.weight with shape: torch.Size([4096, 11008]) and type: torch.float16
Processing variable: layers.12.feed_forward.w3.weight with shape: torch.Size([11008, 4096]) and type: torch.float16
Processing variable: layers.12.attention_norm.weight with shape: torch.Size([4096]) and type: torch.float16
Converting to float32
Processing variable: layers.12.ffn_norm.weight with shape: torch.Size([4096]) and type: torch.float16
Converting to float32
Processing variable: layers.13.attention.wq.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.13.attention.wk.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.13.attention.wv.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.13.attention.wo.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.13.feed_forward.w1.weight with shape: torch.Size([11008, 4096]) and type: torch.float16
Processing variable: layers.13.feed_forward.w2.weight with shape: torch.Size([4096, 11008]) and type: torch.float16
Processing variable: layers.13.feed_forward.w3.weight with shape: torch.Size([11008, 4096]) and type: torch.float16
Processing variable: layers.13.attention_norm.weight with shape: torch.Size([4096]) and type: torch.float16
Converting to float32
Processing variable: layers.13.ffn_norm.weight with shape: torch.Size([4096]) and type: torch.float16
Converting to float32
Processing variable: layers.14.attention.wq.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.14.attention.wk.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.14.attention.wv.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.14.attention.wo.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.14.feed_forward.w1.weight with shape: torch.Size([11008, 4096]) and type: torch.float16
Processing variable: layers.14.feed_forward.w2.weight with shape: torch.Size([4096, 11008]) and type: torch.float16
Processing variable: layers.14.feed_forward.w3.weight with shape: torch.Size([11008, 4096]) and type: torch.float16
Processing variable: layers.14.attention_norm.weight with shape: torch.Size([4096]) and type: torch.float16
Converting to float32
Processing variable: layers.14.ffn_norm.weight with shape: torch.Size([4096]) and type: torch.float16
Converting to float32
Processing variable: layers.15.attention.wq.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.15.attention.wk.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.15.attention.wv.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.15.attention.wo.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.15.feed_forward.w1.weight with shape: torch.Size([11008, 4096]) and type: torch.float16
Processing variable: layers.15.feed_forward.w2.weight with shape: torch.Size([4096, 11008]) and type: torch.float16
Processing variable: layers.15.feed_forward.w3.weight with shape: torch.Size([11008, 4096]) and type: torch.float16
Processing variable: layers.15.attention_norm.weight with shape: torch.Size([4096]) and type: torch.float16
Converting to float32
Processing variable: layers.15.ffn_norm.weight with shape: torch.Size([4096]) and type: torch.float16
Converting to float32
Processing variable: layers.16.attention.wq.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.16.attention.wk.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.16.attention.wv.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.16.attention.wo.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.16.feed_forward.w1.weight with shape: torch.Size([11008, 4096]) and type: torch.float16
Processing variable: layers.16.feed_forward.w2.weight with shape: torch.Size([4096, 11008]) and type: torch.float16
Processing variable: layers.16.feed_forward.w3.weight with shape: torch.Size([11008, 4096]) and type: torch.float16
Processing variable: layers.16.attention_norm.weight with shape: torch.Size([4096]) and type: torch.float16
Converting to float32
Processing variable: layers.16.ffn_norm.weight with shape: torch.Size([4096]) and type: torch.float16
Converting to float32
Processing variable: layers.17.attention.wq.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.17.attention.wk.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.17.attention.wv.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.17.attention.wo.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.17.feed_forward.w1.weight with shape: torch.Size([11008, 4096]) and type: torch.float16
Processing variable: layers.17.feed_forward.w2.weight with shape: torch.Size([4096, 11008]) and type: torch.float16
Processing variable: layers.17.feed_forward.w3.weight with shape: torch.Size([11008, 4096]) and type: torch.float16
Processing variable: layers.17.attention_norm.weight with shape: torch.Size([4096]) and type: torch.float16
Converting to float32
Processing variable: layers.17.ffn_norm.weight with shape: torch.Size([4096]) and type: torch.float16
Converting to float32
Processing variable: layers.18.attention.wq.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.18.attention.wk.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.18.attention.wv.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.18.attention.wo.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.18.feed_forward.w1.weight with shape: torch.Size([11008, 4096]) and type: torch.float16
Processing variable: layers.18.feed_forward.w2.weight with shape: torch.Size([4096, 11008]) and type: torch.float16
Processing variable: layers.18.feed_forward.w3.weight with shape: torch.Size([11008, 4096]) and type: torch.float16
Processing variable: layers.18.attention_norm.weight with shape: torch.Size([4096]) and type: torch.float16
Converting to float32
Processing variable: layers.18.ffn_norm.weight with shape: torch.Size([4096]) and type: torch.float16
Converting to float32
Processing variable: layers.19.attention.wq.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.19.attention.wk.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.19.attention.wv.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.19.attention.wo.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.19.feed_forward.w1.weight with shape: torch.Size([11008, 4096]) and type: torch.float16
Processing variable: layers.19.feed_forward.w2.weight with shape: torch.Size([4096, 11008]) and type: torch.float16
Processing variable: layers.19.feed_forward.w3.weight with shape: torch.Size([11008, 4096]) and type: torch.float16
Processing variable: layers.19.attention_norm.weight with shape: torch.Size([4096]) and type: torch.float16
Converting to float32
Processing variable: layers.19.ffn_norm.weight with shape: torch.Size([4096]) and type: torch.float16
Converting to float32
Processing variable: layers.20.attention.wq.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.20.attention.wk.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.20.attention.wv.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.20.attention.wo.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.20.feed_forward.w1.weight with shape: torch.Size([11008, 4096]) and type: torch.float16
Processing variable: layers.20.feed_forward.w2.weight with shape: torch.Size([4096, 11008]) and type: torch.float16
Processing variable: layers.20.feed_forward.w3.weight with shape: torch.Size([11008, 4096]) and type: torch.float16
Processing variable: layers.20.attention_norm.weight with shape: torch.Size([4096]) and type: torch.float16
Converting to float32
Processing variable: layers.20.ffn_norm.weight with shape: torch.Size([4096]) and type: torch.float16
Converting to float32
Processing variable: layers.21.attention.wq.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.21.attention.wk.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.21.attention.wv.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.21.attention.wo.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.21.feed_forward.w1.weight with shape: torch.Size([11008, 4096]) and type: torch.float16
Processing variable: layers.21.feed_forward.w2.weight with shape: torch.Size([4096, 11008]) and type: torch.float16
Processing variable: layers.21.feed_forward.w3.weight with shape: torch.Size([11008, 4096]) and type: torch.float16
Processing variable: layers.21.attention_norm.weight with shape: torch.Size([4096]) and type: torch.float16
Converting to float32
Processing variable: layers.21.ffn_norm.weight with shape: torch.Size([4096]) and type: torch.float16
Converting to float32
Processing variable: layers.22.attention.wq.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.22.attention.wk.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.22.attention.wv.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.22.attention.wo.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.22.feed_forward.w1.weight with shape: torch.Size([11008, 4096]) and type: torch.float16
Processing variable: layers.22.feed_forward.w2.weight with shape: torch.Size([4096, 11008]) and type: torch.float16
Processing variable: layers.22.feed_forward.w3.weight with shape: torch.Size([11008, 4096]) and type: torch.float16
Processing variable: layers.22.attention_norm.weight with shape: torch.Size([4096]) and type: torch.float16
Converting to float32
Processing variable: layers.22.ffn_norm.weight with shape: torch.Size([4096]) and type: torch.float16
Converting to float32
Processing variable: layers.23.attention.wq.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.23.attention.wk.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.23.attention.wv.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.23.attention.wo.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.23.feed_forward.w1.weight with shape: torch.Size([11008, 4096]) and type: torch.float16
Processing variable: layers.23.feed_forward.w2.weight with shape: torch.Size([4096, 11008]) and type: torch.float16
Processing variable: layers.23.feed_forward.w3.weight with shape: torch.Size([11008, 4096]) and type: torch.float16
Processing variable: layers.23.attention_norm.weight with shape: torch.Size([4096]) and type: torch.float16
Converting to float32
Processing variable: layers.23.ffn_norm.weight with shape: torch.Size([4096]) and type: torch.float16
Converting to float32
Processing variable: layers.24.attention.wq.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.24.attention.wk.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.24.attention.wv.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.24.attention.wo.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.24.feed_forward.w1.weight with shape: torch.Size([11008, 4096]) and type: torch.float16
Processing variable: layers.24.feed_forward.w2.weight with shape: torch.Size([4096, 11008]) and type: torch.float16
Processing variable: layers.24.feed_forward.w3.weight with shape: torch.Size([11008, 4096]) and type: torch.float16
Processing variable: layers.24.attention_norm.weight with shape: torch.Size([4096]) and type: torch.float16
Converting to float32
Processing variable: layers.24.ffn_norm.weight with shape: torch.Size([4096]) and type: torch.float16
Converting to float32
Processing variable: layers.25.attention.wq.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.25.attention.wk.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.25.attention.wv.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.25.attention.wo.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.25.feed_forward.w1.weight with shape: torch.Size([11008, 4096]) and type: torch.float16
Processing variable: layers.25.feed_forward.w2.weight with shape: torch.Size([4096, 11008]) and type: torch.float16
Processing variable: layers.25.feed_forward.w3.weight with shape: torch.Size([11008, 4096]) and type: torch.float16
Processing variable: layers.25.attention_norm.weight with shape: torch.Size([4096]) and type: torch.float16
Converting to float32
Processing variable: layers.25.ffn_norm.weight with shape: torch.Size([4096]) and type: torch.float16
Converting to float32
Processing variable: layers.26.attention.wq.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.26.attention.wk.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.26.attention.wv.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.26.attention.wo.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.26.feed_forward.w1.weight with shape: torch.Size([11008, 4096]) and type: torch.float16
Processing variable: layers.26.feed_forward.w2.weight with shape: torch.Size([4096, 11008]) and type: torch.float16
Processing variable: layers.26.feed_forward.w3.weight with shape: torch.Size([11008, 4096]) and type: torch.float16
Processing variable: layers.26.attention_norm.weight with shape: torch.Size([4096]) and type: torch.float16
Converting to float32
Processing variable: layers.26.ffn_norm.weight with shape: torch.Size([4096]) and type: torch.float16
Converting to float32
Processing variable: layers.27.attention.wq.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.27.attention.wk.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.27.attention.wv.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.27.attention.wo.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.27.feed_forward.w1.weight with shape: torch.Size([11008, 4096]) and type: torch.float16
Processing variable: layers.27.feed_forward.w2.weight with shape: torch.Size([4096, 11008]) and type: torch.float16
Processing variable: layers.27.feed_forward.w3.weight with shape: torch.Size([11008, 4096]) and type: torch.float16
Processing variable: layers.27.attention_norm.weight with shape: torch.Size([4096]) and type: torch.float16
Converting to float32
Processing variable: layers.27.ffn_norm.weight with shape: torch.Size([4096]) and type: torch.float16
Converting to float32
Processing variable: layers.28.attention.wq.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.28.attention.wk.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.28.attention.wv.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.28.attention.wo.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.28.feed_forward.w1.weight with shape: torch.Size([11008, 4096]) and type: torch.float16
Processing variable: layers.28.feed_forward.w2.weight with shape: torch.Size([4096, 11008]) and type: torch.float16
Processing variable: layers.28.feed_forward.w3.weight with shape: torch.Size([11008, 4096]) and type: torch.float16
Processing variable: layers.28.attention_norm.weight with shape: torch.Size([4096]) and type: torch.float16
Converting to float32
Processing variable: layers.28.ffn_norm.weight with shape: torch.Size([4096]) and type: torch.float16
Converting to float32
Processing variable: layers.29.attention.wq.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.29.attention.wk.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.29.attention.wv.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.29.attention.wo.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.29.feed_forward.w1.weight with shape: torch.Size([11008, 4096]) and type: torch.float16
Processing variable: layers.29.feed_forward.w2.weight with shape: torch.Size([4096, 11008]) and type: torch.float16
Processing variable: layers.29.feed_forward.w3.weight with shape: torch.Size([11008, 4096]) and type: torch.float16
Processing variable: layers.29.attention_norm.weight with shape: torch.Size([4096]) and type: torch.float16
Converting to float32
Processing variable: layers.29.ffn_norm.weight with shape: torch.Size([4096]) and type: torch.float16
Converting to float32
Processing variable: layers.30.attention.wq.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.30.attention.wk.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.30.attention.wv.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.30.attention.wo.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.30.feed_forward.w1.weight with shape: torch.Size([11008, 4096]) and type: torch.float16
Processing variable: layers.30.feed_forward.w2.weight with shape: torch.Size([4096, 11008]) and type: torch.float16
Processing variable: layers.30.feed_forward.w3.weight with shape: torch.Size([11008, 4096]) and type: torch.float16
Processing variable: layers.30.attention_norm.weight with shape: torch.Size([4096]) and type: torch.float16
Converting to float32
Processing variable: layers.30.ffn_norm.weight with shape: torch.Size([4096]) and type: torch.float16
Converting to float32
Processing variable: layers.31.attention.wq.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.31.attention.wk.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.31.attention.wv.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.31.attention.wo.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.31.feed_forward.w1.weight with shape: torch.Size([11008, 4096]) and type: torch.float16
Processing variable: layers.31.feed_forward.w2.weight with shape: torch.Size([4096, 11008]) and type: torch.float16
Processing variable: layers.31.feed_forward.w3.weight with shape: torch.Size([11008, 4096]) and type: torch.float16
Processing variable: layers.31.attention_norm.weight with shape: torch.Size([4096]) and type: torch.float16
Converting to float32
Processing variable: layers.31.ffn_norm.weight with shape: torch.Size([4096]) and type: torch.float16
Converting to float32
Done. Output file: models/7B//ggml-model-f16.bin, (part 0)
能够看到,如果转换胜利,会在 models/7B/ 目录生成一个 C ++ 能够调用的 ggml-model-f16.bin 模型文件。
LLaMA 模型调用
接下来就能够调用转换后的模型了,首先在编译 C ++ 我的项目:
make
程序返回:
➜ llama.cpp git:(master) ✗ make
I llama.cpp build info:
I UNAME_S: Darwin
I UNAME_P: arm
I UNAME_M: arm64
I CFLAGS: -I. -O3 -DNDEBUG -std=c11 -fPIC -pthread -DGGML_USE_ACCELERATE
I CXXFLAGS: -I. -I./examples -O3 -DNDEBUG -std=c++17 -fPIC -pthread
I LDFLAGS: -framework Accelerate
I CC: Apple clang version 14.0.0 (clang-1400.0.29.202)
I CXX: Apple clang version 14.0.0 (clang-1400.0.29.202)
cc -I. -O3 -DNDEBUG -std=c11 -fPIC -pthread -DGGML_USE_ACCELERATE -c ggml.c -o ggml.o
c++ -I. -I./examples -O3 -DNDEBUG -std=c++17 -fPIC -pthread -c utils.cpp -o utils.o
c++ -I. -I./examples -O3 -DNDEBUG -std=c++17 -fPIC -pthread main.cpp ggml.o utils.o -o main -framework Accelerate
./main -h
usage: ./main [options]
options:
-h, --help show this help message and exit
-i, --interactive run in interactive mode
-ins, --instruct run in instruction mode (use with Alpaca models)
-r PROMPT, --reverse-prompt PROMPT
in interactive mode, poll user input upon seeing PROMPT (can be
specified more than once for multiple prompts).
--color colorise output to distinguish prompt and user input from generations
-s SEED, --seed SEED RNG seed (default: -1)
-t N, --threads N number of threads to use during computation (default: 4)
-p PROMPT, --prompt PROMPT
prompt to start generation with (default: empty)
--random-prompt start with a randomized prompt.
-f FNAME, --file FNAME
prompt file to start generation.
-n N, --n_predict N number of tokens to predict (default: 128)
--top_k N top-k sampling (default: 40)
--top_p N top-p sampling (default: 0.9)
--repeat_last_n N last n tokens to consider for penalize (default: 64)
--repeat_penalty N penalize repeat sequence of tokens (default: 1.3)
-c N, --ctx_size N size of the prompt context (default: 512)
--ignore-eos ignore end of stream token and continue generating
--memory_f16 use f16 instead of f32 for memory key+value
--temp N temperature (default: 0.8)
-b N, --batch_size N batch size for prompt processing (default: 8)
-m FNAME, --model FNAME
model path (default: models/llama-7B/ggml-model.bin)
c++ -I. -I./examples -O3 -DNDEBUG -std=c++17 -fPIC -pthread quantize.cpp ggml.o utils.o -o quantize -framework Accelerate
编译胜利后,本地会生成一个 main.cpp 文件。
随后依据编译后输入的阐明文档间接调用模型即可:
./main -m ./models/7B/ggml-model-f16.bin -p 'Hi i am'
程序输入:
➜ llama.cpp git:(master) ✗ ./main -m ./models/7B/ggml-model-f16.bin -p 'hi i am'
main: seed = 1679400707
llama_model_load: loading model from './models/7B/ggml-model-f16.bin' - please wait ...
llama_model_load: n_vocab = 32000
llama_model_load: n_ctx = 512
llama_model_load: n_embd = 4096
llama_model_load: n_mult = 256
llama_model_load: n_head = 32
llama_model_load: n_layer = 32
llama_model_load: n_rot = 128
llama_model_load: f16 = 1
llama_model_load: n_ff = 11008
llama_model_load: n_parts = 1
llama_model_load: ggml ctx size = 13365.09 MB
llama_model_load: memory_size = 512.00 MB, n_mem = 16384
llama_model_load: loading model part 1/1 from './models/7B/ggml-model-f16.bin'
llama_model_load: .................................... done
llama_model_load: model size = 12853.02 MB / num tensors = 291
system_info: n_threads = 4 / 10 | AVX = 0 | AVX2 = 0 | AVX512 = 0 | FMA = 0 | NEON = 1 | ARM_FMA = 1 | F16C = 0 | FP16_VA = 1 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 0 | VSX = 0 |
main: prompt: 'hi i am'
main: number of tokens in prompt = 6
1 -> ''13450 ->' hi'423 ->'i'25523 ->' am'
sampling parameters: temp = 0.800000, top_k = 40, top_p = 0.950000, repeat_last_n = 64, repeat_penalty = 1.300000
hi i am a pythoner, but sunk to become a ruby
说实话,推理速度切实不敢恭维,也可能是因为笔者的电脑配置太渣导致。
结语
LLaMA 7B 模型总体上须要纯英文的提醒词(prompt),对中文的理解能力还不够,劣势是的确能够单机跑起来,当然本地跑的话,缩小了网络传输数据的环节,推理效率天然也就更高,对于一般的 AI 爱好者来说,足矣。