Deep Learning: Fine-Tuning RoBERTa with LoRA


Model fine-tuning is the process of taking an already-trained model and training it further on a specific task or dataset to improve its performance. Fine-tuning can yield significant gains when adapting a model to a particular task.

RoBERTa (Robustly optimized BERT approach) is a Transformer-based pre-trained language model proposed by Facebook AI. It is an improved and optimized version of Google's BERT (Bidirectional Encoder Representations from Transformers).

"Low-Rank Adaptation" (LoRA) is a technique for model fine-tuning and transfer learning. LoRA is usually discussed in the context of fine-tuning large language models, but it can be applied to any model built from Transformer blocks. This article shows how to use the 🤗 PEFT library to make the fine-tuning process more efficient with LoRA.

LoRA greatly reduces the number of trainable parameters, saving training time, storage, and compute, and it can be combined with other model adaptation techniques (such as prefix tuning) to further enhance the model.

However, LoRA introduces an extra layer of hyperparameter tuning (the LoRA-specific rank, alpha, and so on), and in some cases its performance does not quite match that of a fully fine-tuned model, so you need to test it against your own requirements.
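
Before diving into the code, it helps to see what LoRA actually does. Instead of updating a frozen weight matrix W, LoRA learns a low-rank update ΔW = B·A (with rank r much smaller than the matrix dimensions), scaled by alpha / r, so only A and B are trained. The following is a minimal, self-contained sketch of that idea; the class and parameter names are purely illustrative and are not part of the PEFT API:

 import torch
 import torch.nn as nn

 class LoRALinear(nn.Module):
     """Illustrative LoRA wrapper: y = base(x) + (alpha / r) * x @ A^T @ B^T"""
     def __init__(self, base: nn.Linear, r: int = 8, alpha: int = 16):
         super().__init__()
         self.base = base
         for p in self.base.parameters():   # freeze the original weights
             p.requires_grad = False
         # A starts with small random values, B with zeros,
         # so the adapter is initially a no-op (delta W = B @ A = 0)
         self.lora_A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
         self.lora_B = nn.Parameter(torch.zeros(base.out_features, r))
         self.scaling = alpha / r

     def forward(self, x):
         return self.base(x) + self.scaling * (x @ self.lora_A.T @ self.lora_B.T)

 # A 768x768 projection (as in roberta-base) gains only r * (768 + 768) = 12,288 trainable parameters
 layer = LoRALinear(nn.Linear(768, 768), r=8, alpha=16)
 print(sum(p.numel() for p in layer.parameters() if p.requires_grad))  # 12288

Conceptually, this is what get_peft_model wraps around the projection layers it targets; the real implementation additionally handles proper initialization, merging the adapters back into the base weights, and serialization.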

First, install the required packages:

 !pip install transformers datasets evaluate accelerate peft

Data preprocessing

 import torch
 from transformers import RobertaModel, RobertaTokenizer, AutoModelForSequenceClassification, TrainingArguments, Trainer, DataCollatorWithPadding
 from peft import LoraConfig, get_peft_model
 from datasets import load_dataset
 
 
 
 peft_model_name = 'roberta-base-peft'
 modified_base = 'roberta-base-modified'
 base_model = 'roberta-base'
 
 dataset = load_dataset('ag_news')
 tokenizer = RobertaTokenizer.from_pretrained(base_model)
 
 def preprocess(examples):
     tokenized = tokenizer(examples['text'], truncation=True, padding=True)
     return tokenized
 
 tokenized_dataset = dataset.map(preprocess, batched=True,  remove_columns=["text"])
 train_dataset = tokenized_dataset['train']
 # Split the original test set in half: one half for evaluation during training, one half for final testing
 eval_dataset = tokenized_dataset['test'].shard(num_shards=2, index=0)
 test_dataset = tokenized_dataset['test'].shard(num_shards=2, index=1)
 
 
 # Extract the number of classes and their names
 num_labels = dataset['train'].features['label'].num_classes
 class_names = dataset["train"].features["label"].names
 print(f"number of labels: {num_labels}")
 print(f"the labels: {class_names}")
 
 # Create an id2label mapping
 # We will need this for our classifier.
 id2label = {i: label for i, label in enumerate(class_names)}
 
 data_collator = DataCollatorWithPadding(tokenizer=tokenizer, return_tensors="pt")
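
Before moving on, a quick sanity check of the splits and the tokenized features can save debugging time later. The print statements below are just an optional inspection step, not part of the original pipeline:

 # Optional sanity check: split sizes and one decoded example
 print(len(train_dataset), len(eval_dataset), len(test_dataset))   # should be 120000 / 3800 / 3800 for ag_news
 sample = train_dataset[0]
 print(sample.keys())                                              # input_ids, attention_mask, label
 print(tokenizer.decode(sample['input_ids'][:20]))
 print(id2label[sample['label']])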

Training

We will train two models: one with LoRA and one with the full fine-tuning pipeline. This lets us see how much LoRA reduces the training time and the number of trainable parameters.

Here is the full fine-tuning setup:

 
 training_args = TrainingArguments(
     output_dir='./results',
     evaluation_strategy='steps',
     learning_rate=5e-5,
     num_train_epochs=1,
     per_device_train_batch_size=16,
 )
 

Then run the training:

 def get_trainer(model):
     return Trainer(
         model=model,
         args=training_args,
         train_dataset=train_dataset,
         eval_dataset=eval_dataset,
         data_collator=data_collator,
     )

 full_finetuning_trainer = get_trainer(
     AutoModelForSequenceClassification.from_pretrained(base_model, id2label=id2label)
 )
 
 full_finetuning_trainer.train()
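
For reference, every parameter of the full model is trainable here. A small helper (defined just for this comparison, not part of the original code) makes the numbers in the next section concrete:

 # Helper for the comparison below: count trainable vs. total parameters
 def count_parameters(model):
     trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
     total = sum(p.numel() for p in model.parameters())
     print(f"trainable: {trainable:,} / total: {total:,} ({100 * trainable / total:.2f}%)")

 count_parameters(full_finetuning_trainer.model)   # ~125M trainable, i.e. 100%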

Now let's look at LoRA with PEFT:

 model = AutoModelForSequenceClassification.from_pretrained(base_model, id2label=id2label)
 
 peft_config = LoraConfig(task_type="SEQ_CLS", inference_mode=False, r=8, lora_alpha=16, lora_dropout=0.1)
 peft_model = get_peft_model(model, peft_config)
 
 print('PEFT Model')
 peft_model.print_trainable_parameters()
 
 peft_lora_finetuning_trainer = get_trainer(peft_model)
 
 peft_lora_finetuning_trainer.train()
 peft_lora_finetuning_trainer.evaluate()

As you can see, the model has 125,537,288 parameters in total, while the LoRA model trains only 888,580 of them: with LoRA we only need to train ~0.70% of the parameters! This greatly reduces memory usage and training time.
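
Those numbers are easy to verify with a back-of-the-envelope check, assuming PEFT's default target modules for RoBERTa (the attention query and value projections, 768x768 matrices across 12 layers) and that with task_type="SEQ_CLS" the classification head remains fully trainable:

 # LoRA adapters: 2 projections (query, value) x 12 layers x r x (in + out)
 lora_params = 2 * 12 * 8 * (768 + 768)             # 294,912
 # Classification head (trained in full): dense 768x768 + 768, out_proj 768x4 + 4
 head_params = (768 * 768 + 768) + (768 * 4 + 4)    # 593,668
 print(lora_params + head_params)                    # 888,580
 print(f"{100 * (lora_params + head_params) / 125_537_288:.2f}%")   # ~0.7%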

After training is complete, we save the model:

 tokenizer.save_pretrained(modified_base)
 peft_model.save_pretrained(peft_model_name)
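
Another practical benefit: save_pretrained on a PEFT model stores only the adapter weights (plus the retrained classifier head), so the checkpoint on disk is a few megabytes instead of the roughly 500 MB of a full fp32 roberta-base checkpoint. A quick way to confirm this on your own run:

 import os

 # List the files of the saved adapter checkpoint and their sizes
 for f in sorted(os.listdir(peft_model_name)):
     size_mb = os.path.getsize(os.path.join(peft_model_name, f)) / 1e6
     print(f"{f}: {size_mb:.1f} MB")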

Finally, let's test our model:

 from peft import AutoPeftModelForSequenceClassification
 from transformers import AutoTokenizer
 
 # LOAD the Saved PEFT model
 inference_model = AutoPeftModelForSequenceClassification.from_pretrained(peft_model_name, id2label=id2label)
 tokenizer = AutoTokenizer.from_pretrained(modified_base)
 
 
 def classify(text):
   inputs = tokenizer(text, truncation=True, padding=True, return_tensors="pt")
   output = inference_model(**inputs)
 
   prediction = output.logits.argmax(dim=-1).item()
 
   print(f'\n Class: {prediction}, Label: {id2label[prediction]}, Text: {text}')
   # return id2label[prediction]
 
 classify("Kederis proclaims innocence Olympic champion Kostas Kederis today left hospital ahead of his date with IOC inquisitors claiming his ...")

 classify("Wall St. Bears Claw Back Into the Black (Reuters) Reuters - Short-sellers, Wall Street's dwindling\band of ultra-cynics, are seeing green again.")

Model evaluation

We also need to compare the performance of the PEFT model with that of the fully fine-tuned model, to see whether this approach comes with any loss of accuracy.

 from torch.utils.data import DataLoader
 import evaluate
 from tqdm import tqdm
 
 metric = evaluate.load('accuracy')
 
 def evaluate_model(inference_model, dataset):
 
     # Rename 'label' to 'labels' so the collated batches match what the model expects
     eval_dataloader = DataLoader(dataset.rename_column("label", "labels"), batch_size=8, collate_fn=data_collator)
     device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
 
     inference_model.to(device)
     inference_model.eval()
     for step, batch in enumerate(tqdm(eval_dataloader)):
         batch.to(device)
         with torch.no_grad():
             outputs = inference_model(**batch)
         predictions = outputs.logits.argmax(dim=-1)
         metric.add_batch(
             predictions=predictions,
             references=batch["labels"],
         )
 
     eval_metric = metric.compute()
     print(eval_metric)
     

First, the model without any fine-tuning, i.e. the original pre-trained model:

 evaluate_model(AutoModelForSequenceClassification.from_pretrained(base_model, id2label=id2label), test_dataset)

accuracy: 0.24868421052631579

Next, the LoRA fine-tuned model:

 evaluate_model(inference_model, test_dataset)

accuracy: 0.9278947368421052

And finally, the fully fine-tuned model:

 evaluate_model(full_finetuning_trainer.model, test_dataset)

accuracy: 0.9460526315789474

Summary

We fine-tuned and evaluated a RoBERTa model with PEFT. Fine-tuning with LoRA greatly reduces the number of trainable parameters and the training time, though its accuracy is slightly lower than that of full fine-tuning.

Code for this article:

https://avoid.overfit.cn/post/26e401b70f9840dab185a6a83aac06b0

Author: Achilles Moraites
