On machine learning: using Erlangshen-Ubert-110M-Chinese from the Erlangshen series of IDEA's Fengshenbang large language models


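The setup in the official docs and code contains errors. I have submitted a PR; if upstream does not merge it, please use my versions instead. Docs: https://github.com/Yonggie/Fengshenbang-doc; code: https://github.com/Yonggie/Fengshenbang-LM.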

Installation

Follow the official instructions, except clone my repo instead of the upstream one:

git clone https://github.com/Yonggie/Fengshenbang-LM.git
cd Fengshenbang-LM
pip install --editable ./
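
After installing, it can help to confirm that the environment really has a 1.x release of PyTorch Lightning, since the code targets that line (see the notes on what I fixed further down). A minimal sketch using only the standard library; `pytorch-lightning` is the usual PyPI distribution name, and the helper names are made up for illustration:

```python
from importlib import metadata


def major_version(version: str) -> int:
    """Return the leading integer of a version string such as '1.9.5'."""
    return int(version.split(".")[0])


def lightning_is_1x(dist_name: str = "pytorch-lightning") -> bool:
    """True if the installed distribution reports a 1.x version."""
    try:
        return major_version(metadata.version(dist_name)) == 1
    except metadata.PackageNotFoundError:
        # Not installed at all, so certainly not a usable 1.x install.
        return False


if __name__ == "__main__":
    print("PyTorch Lightning 1.x installed:", lightning_is_1x())
```

If this prints False, reinstalling from the repo above (or installing a 1.x release of Lightning by hand) is the first thing to try.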

Example

The example is reproduced below:

import argparse
from fengshen import UbertPipelines

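# Build a parser and let UbertPipelines register its own arguments on it.
total_parser = argparse.ArgumentParser("TASK NAME")
total_parser = UbertPipelines.pipelines_args(total_parser)
args = total_parser.parse_args()

# Model to load; this is the 330M checkpoint.
args.pretrained_model_path = "IDEA-CCNL/Erlangshen-Ubert-330M-Chinese"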

test_data=[
    {
        "task_type": "抽取任务", 
        "subtask_type": "实体识别", 
        "text": "这也让很多业主据此认为，雅清苑是政府公务员挤对了国家的经适房政策。", 
        "choices": [ 
            {"entity_type": "小区名字"}, 
            {"entity_type": "岗位职责"}
            ],
        "id": 0}
]

# Build the pipeline from the parsed arguments and run prediction.
model = UbertPipelines(args)
result = model.predict(test_data)
for line in result:
    print(line)
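
The input is a list of dicts, one per sample. If you need to build many such samples, a small helper can keep the structure consistent. This is a sketch based only on the fields that appear in the example; `make_entity_sample` and `make_entity_batch` are hypothetical names and not part of fengshen:

```python
def make_entity_sample(task_type, subtask_type, text, entity_types, sample_id):
    """Build one input dict with the same fields as the example above."""
    return {
        "task_type": task_type,
        "subtask_type": subtask_type,
        "text": text,
        # One {"entity_type": ...} dict per candidate label.
        "choices": [{"entity_type": name} for name in entity_types],
        "id": sample_id,
    }


def make_entity_batch(task_type, subtask_type, texts, entity_types):
    """Build a list of samples, numbering them 0, 1, 2, ... in order."""
    return [
        make_entity_sample(task_type, subtask_type, text, entity_types, i)
        for i, text in enumerate(texts)
    ]
```

Calling make_entity_batch with the task strings from the example, a list of sentences, and ["小区名字", "岗位职责"] gives a list that can be passed to model.predict in place of test_data.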

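What did I fix?

For the Erlangshen series model Erlangshen-Ubert-110M-Chinese and the corresponding 330M model:

The example in the official docs has a typo and does not run: UbertPiplines has to be changed to UbertPipelines (an e was missing).
Installing the way the docs describe does not work either. The code is written against PyTorch Lightning 1.x, which the current 2.x is no longer compatible with, and the official setup.py did not pin a version; I have pinned it. Some dependencies were also missing, and I have added them in this repo.
I also made GPU use the default; the original does not enable it by default.
The model has a limitation: the batch size is fixed at 128. It can be changed in modeling_ubert.py, in class UbertDataModel(pl.LightningDataModule):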
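
The exact argument names in modeling_ubert.py are not reproduced here, so the following is only a generic argparse sketch of the pattern: a default baked into the parser, and two ways to change it without editing the default itself. The flag name `--batch_size` and the function `add_data_args` are hypothetical; check UbertDataModel for the real names, and for whether the value is exposed as an argument at all or hard-coded.

```python
import argparse


def add_data_args(parser):
    """Hypothetical stand-in for a data module registering its options."""
    group = parser.add_argument_group("data")
    group.add_argument("--batch_size", type=int, default=128)
    return parser


parser = add_data_args(argparse.ArgumentParser("TASK NAME"))

# Option 1: keep the default and override the attribute afterwards,
# the same way pretrained_model_path is set in the example.
args = parser.parse_args([])
args.batch_size = 32

# Option 2: pass the value on the command line instead.
cli_args = parser.parse_args(["--batch_size", "16"])

print(args.batch_size, cli_args.batch_size)
```

If the value turns out to be hard-coded in the class, neither option applies and the source has to be edited directly.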

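Summary

Changed setup.py of the fengshen library to pin versions; installing with the original setup.py leads to errors.
Fixed the README example and its typo (that cost me an hour; I really had not noticed that the docs were missing a letter). The updated docs are in the repo https://github.com/Yonggie/Fengshenbang-doc.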
