1. Introduction
Natural language generation (i.e., text generation) is one of the core tasks in natural language processing (NLP). In this post we introduce Contrastive Search, the current state-of-the-art decoding method for neural text generation. The method was proposed in the paper "A Contrastive Framework for Neural Text Generation", originally published at NeurIPS 2022 (paper, official implementation). Subsequently, the authors of "Contrastive Search Is What You Need For Neural Text Generation" further demonstrated that contrastive search can generate human-level text with off-the-shelf language models across 16 languages (paper, official implementation).
[Remark] For readers who are not familiar with text generation, please refer to this blog post for more details.
2. Hugging Face 🤗 Contrastive Search Demo
Contrastive search is currently supported by both the PyTorch and TensorFlow backends of 🤗 transformers. You can explore the method in this Colab notebook by choosing the section corresponding to your backend; the notebook is also linked at the top of this post. We have also built this nice demo application to visually compare contrastive search with other popular decoding methods (e.g., beam search, top-k sampling [3], and nucleus sampling [4]).
3. Environment Installation
Before running the experiments below, we first install the latest version of the transformers library, as follows:
pip install torch
pip install "transformers==4.24.0"
4. Problems with Existing Decoding Methods
Decoding methods can be divided into two categories: (i) deterministic methods and (ii) stochastic methods. Let's discuss each of them below!
4.1. Deterministic Methods
Deterministic methods, such as greedy search and beam search, generate the final text by selecting, among all candidate continuations output by the language model, the token with the highest probability. However, as pointed out in previous studies [3][4], deterministic methods often lead to _model degeneration_, i.e., the generated text is unnatural and contains undesirable repetitions.
Below is an example of text generated with the GPT-2 model and greedy search.
from transformers import AutoTokenizer, GPT2LMHeadModel
# load the GPT-2 (large) tokenizer and model, and encode the prefix text
tokenizer = AutoTokenizer.from_pretrained('gpt2-large')
input_ids = tokenizer('DeepMind Company is', return_tensors='pt').input_ids
model = GPT2LMHeadModel.from_pretrained('gpt2-large')
# greedy search is the default decoding strategy of generate()
output = model.generate(input_ids, max_length=128)
print("Output:\n" + 100 * '-')
print(tokenizer.decode(output[0], skip_special_tokens=True))
print("" + 100 * '-')
Model output:
Output:
----------------------------------------------------------------------------------------------------
DeepMind Company is a leading AI research company, with a focus on deep learning and deep learning-based systems.
The company's research is focused on the development of deep learning-based systems that can learn from large amounts of data, and that can be used to solve real-world problems.
DeepMind's research is also used by the UK government to develop new technologies for the UK's National Health Service.
DeepMind's research is also used by the UK government to develop new technologies for the UK's National Health Service.
DeepMind's research is also used by the UK government to develop new technologies
----------------------------------------------------------------------------------------------------
[Remark] We can see that the result generated by greedy search contains obvious repetitions.
4.2. Stochastic Methods
To address the issues of deterministic methods, stochastic methods generate text by introducing randomness into the decoding process. Two widely used stochastic methods are (i) top-k sampling [3] and (ii) nucleus sampling (also known as top-p sampling) [4].
Below is an example of text generated with the GPT-2 model and nucleus sampling (p=0.95).
import torch
from transformers import AutoTokenizer, GPT2LMHeadModel
tokenizer = AutoTokenizer.from_pretrained('gpt2-large')
input_ids = tokenizer('DeepMind Company is', return_tensors='pt').input_ids
model = GPT2LMHeadModel.from_pretrained('gpt2-large')
torch.manual_seed(0)
output = model.generate(input_ids, do_sample=True, max_length=128, top_p=0.95, top_k=0)
print("Output:\n" + 100 *'-')
print(tokenizer.decode(output[0], skip_special_tokens=True))
print("" + 100 *'-')
Model output:
Output:
----------------------------------------------------------------------------------------------------
DeepMind Company is a leading provider of AI-based research, development, and delivery of AI solutions for security, infrastructure, machine learning, communications, and so on."
'AI is not journalism'
Worse still was the message its researchers hoped would reach the world's media — that it was not really research, but rather a get-rich-quick scheme to profit from living forces' ignorance.
"The thing is, we know that people don't consciously assess the value of the others'
information. They understand they will get the same on their own."
One example? Given the details of today
----------------------------------------------------------------------------------------------------
[Remark] While nucleus sampling can generate text free of repetitions, the semantic coherence of the generated text is not well maintained. For instance, the generated phrase 'AI is not journalism' is incoherent with the given prefix, i.e., 'DeepMind Company'.
We note that this semantic inconsistency can be partially remedied by lowering the temperature. However, lowering the temperature brings nucleus sampling closer to greedy search, which can be seen as a trade-off between greedy search and nucleus sampling. Generally, it is quite challenging to find a prompt- and model-independent temperature that avoids the pitfalls of both.
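To illustrate this trade-off, here is a small, unofficial sketch (our own addition) that reruns the nucleus sampling example from Section 4.2 with a lowered temperature. The value 0.7 is an arbitrary choice for demonstration rather than a recommended setting; pushing the temperature towards 0 makes sampling behave increasingly like greedy search.
# Unofficial sketch: nucleus sampling with a lowered temperature (0.7 is an
# arbitrary illustrative value, not a recommendation). As the temperature
# approaches 0, the sampling distribution sharpens towards greedy search.
import torch
from transformers import AutoTokenizer, GPT2LMHeadModel
tokenizer = AutoTokenizer.from_pretrained('gpt2-large')
model = GPT2LMHeadModel.from_pretrained('gpt2-large')
input_ids = tokenizer('DeepMind Company is', return_tensors='pt').input_ids
torch.manual_seed(0)
output = model.generate(input_ids, do_sample=True, max_length=128,
                        top_p=0.95, top_k=0, temperature=0.7)
print(tokenizer.decode(output[0], skip_special_tokens=True))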
5. Contrastive Search
In this section, we introduce a new decoding method, _contrastive search_, in detail.
5.1. Decoding Objective
Given the prefix text \( x_{<t} \), we select the output token \( x_{t} \) as follows:

\[ x_{t} = \underset{v \in V^{(k)}}{\arg\max} \Big\{ (1-\alpha) \times p_{\theta}(v \mid x_{<t}) - \alpha \times \max_{x_{j} \in x_{<t}} s(h_{v}, h_{x_{j}}) \Big\} \]

In the equation above, \( V^{(k)} \) is the set of the k candidate tokens with the highest probabilities under the language model's distribution \( p_{\theta}(v \mid x_{<t}) \). The first term, _model confidence_, is the probability of the candidate token \( v \) predicted by the language model. The second term, _degeneration penalty_, measures how discriminative (i.e., dissimilar) the candidate \( v \) is with respect to the tokens in the previous context \( x_{<t} \), where the function \( s(\cdot, \cdot) \) computes the cosine similarity between two token representations. More specifically, the degeneration penalty is defined as the maximum cosine similarity between the representation \( h_{v} \) of \( v \) and the representations of all tokens in the context \( x_{<t} \). Here, the candidate representation \( h_{v} \) is computed by the language model given the concatenation of \( x_{<t} \) and \( v \). Intuitively, a larger degeneration penalty for \( v \) means it is more similar to the context (in the representation space) and therefore more likely to lead to model degeneration. The hyperparameter \( \alpha \) balances these two terms. When \( \alpha = 0 \), contrastive search degenerates to vanilla greedy search.
[Remark] When generating output, contrastive search jointly considers (i) the probability predicted by the language model, to maintain the semantic coherence between the generated text and the prefix text, and (ii) the similarity with respect to the previous context, to avoid model degeneration.
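To make the decoding objective above concrete, here is a minimal, self-contained sketch of a single contrastive decoding step. It is our own simplified illustration rather than the official implementation: the function name contrastive_step and the step-by-step loop are ours, it recomputes hidden states from scratch at every step and is therefore much slower than the built-in version, and in practice you would simply call model.generate with penalty_alpha and top_k, as shown in the next section.
# A simplified, unofficial sketch of one contrastive search step (for clarity,
# not efficiency): score each of the top-k candidates by combining model
# confidence and the degeneration penalty defined above.
import torch
import torch.nn.functional as F
from transformers import AutoTokenizer, GPT2LMHeadModel
def contrastive_step(model, input_ids, k=4, alpha=0.6):
    with torch.no_grad():
        out = model(input_ids, output_hidden_states=True)
    # model confidence: probabilities of the k most likely next tokens
    probs = F.softmax(out.logits[:, -1, :], dim=-1)
    top_probs, top_ids = probs.topk(k, dim=-1)
    context_h = out.hidden_states[-1][0]  # last-layer states of the context, [seq_len, dim]
    scores = []
    for p, v in zip(top_probs[0], top_ids[0]):
        # representation of candidate v given the concatenation of the context and v
        cand_ids = torch.cat([input_ids, v.view(1, 1)], dim=-1)
        with torch.no_grad():
            h_v = model(cand_ids, output_hidden_states=True).hidden_states[-1][0, -1]
        # degeneration penalty: max cosine similarity to any context token
        penalty = F.cosine_similarity(h_v.unsqueeze(0), context_h, dim=-1).max()
        scores.append((1 - alpha) * p - alpha * penalty)
    best = top_ids[0, torch.stack(scores).argmax()]
    return torch.cat([input_ids, best.view(1, 1)], dim=-1)
tokenizer = AutoTokenizer.from_pretrained('gpt2-large')
model = GPT2LMHeadModel.from_pretrained('gpt2-large').eval()
ids = tokenizer('DeepMind Company is', return_tensors='pt').input_ids
for _ in range(16):  # generate 16 tokens, one contrastive step at a time
    ids = contrastive_step(model, ids)
print(tokenizer.decode(ids[0], skip_special_tokens=True))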
5.2. Generating Text with Contrastive Search
Below, we use the same prefix text as in Section 4.1 and Section 4.2 (i.e., "DeepMind Company is") and generate text with contrastive search (with \( k=4 \) and \( \alpha=0.6 \)). To fully demonstrate the superior capability of contrastive search, we let the language model generate a long document of 512 tokens as follows:
from transformers import GPT2Tokenizer, GPT2LMHeadModel
model_name = 'gpt2-large'
tokenizer = GPT2Tokenizer.from_pretrained(model_name)
model = GPT2LMHeadModel.from_pretrained(model_name, pad_token_id=tokenizer.eos_token_id)
model.eval()
# prepare the prefix
prefix_text = r'DeepMind Company is'
input_ids = tokenizer(prefix_text, return_tensors='pt').input_ids
# generate the result with contrastive search
output = model.generate(input_ids, penalty_alpha=0.6, top_k=4, max_length=512)
print("Output:\n" + 100 *'-')
print(tokenizer.decode(output[0], skip_special_tokens=True))
print("" + 100 *'-')
The arguments are as follows:
- --top_k: the hyperparameter \( k \) in contrastive search.
- --penalty_alpha: the hyperparameter \( \alpha \) in contrastive search.
Model output:
Output:
----------------------------------------------------------------------------------------------------
DeepMind Company is a leader in artificial intelligence (AI). We have a long history of working with companies such as Google, Facebook, Amazon, and Microsoft to build products that improve people's lives, and today we are excited to announce that DeepMind's AlphaGo program has won the game of Go, becoming the first program to defeat a professional Go player.
The victory is a testament to the power of deep learning, and to the incredible work of our research team, which has been at the forefront of AI research for the past five years. AlphaGo is one of the most advanced Go programs ever created, and its performance is an important step towards the goal of human-level AI.
"This is the culmination of a decade of hard work," said Andy Ng, co-founder and CTO of DeepMind. "We are thrilled to have achieved this milestone and look forward to continuing to develop AI that can be used in a wide range of applications and to help people live better lives."
DeepMind's work on Go began in 2010, when it began to train a neural network to play Go using millions of games played by top Go players around the world. Since then, the team has refined the algorithm, adding more and more layers of reinforcement learning to make it better at recognizing patterns and making decisions based on those patterns. In the past year and a half, the team has made significant progress in the game, winning a record-tying 13 games in a row to move into the top four of the world rankings.
"The game of Go is a complex game in which players have to be very careful not to overextend their territory, and this is something that we have been able to improve over and over again," said Dr. Demis Hassabis, co-founder and Chief Scientific Officer of DeepMind. "We are very proud of our team's work, and we hope that it will inspire others to take the next step in their research and apply the same techniques to other problems."
In addition to the win in Go, DeepMind has also developed an AI system that can learn to play a number of different games, including poker, Go, and chess. This AI system, called Tarsier, was developed in partnership with Carnegie Mellon University and the University of California, Berkeley, and is being used to teach computer vision and machine learning to identify objects in images and recognize speech in natural language. Tarsier has been trained to play the game of Go and other games on a
----------------------------------------------------------------------------------------------------
[Remark] We see that the generated text is of exceptionally high quality. The entire document is grammatically fluent and semantically coherent. Meanwhile, the generated text also nicely maintains factual correctness. For instance, in the first paragraph, it correctly elaborates "AlphaGo" as the "first program to defeat a professional Go player".
5.3. Visual Demonstration of Contrastive Search
To better understand how contrastive search works, we present a visual comparison between greedy search (Section 4.1) and contrastive search. Specifically, we visualize the token similarity matrices of the texts generated by greedy search and by contrastive search, respectively. The similarity between two tokens is defined as the cosine similarity between their representations (i.e., the hidden states of the last transformer layer). The results of greedy search (top) and contrastive search (bottom) are shown in the figures below.
[Remark] From the result of greedy search, we see high similarity scores off the diagonal, which clearly indicates the repetitions produced by greedy search. In contrast, in the result of contrastive search, the high similarity scores mostly appear on the diagonal, which verifies that the degeneration problem is successfully addressed. This nice property of contrastive search is achieved by introducing the degeneration penalty (see Section 5.1) during the decoding process.
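The heatmaps referenced above come from the authors' own analysis; as a rough, unofficial way to reproduce such a matrix for any generated sequence, the sketch below feeds the generated token ids back through the model, takes the last-layer hidden states, and plots their pairwise cosine similarities. It reuses the model and output variables defined in Section 5.2 (the greedy output from Section 4.1 can be passed instead), assumes matplotlib is installed, and the helper name token_similarity_matrix is ours.
# Unofficial sketch: visualize the token similarity matrix of a generated text.
import torch
import torch.nn.functional as F
import matplotlib.pyplot as plt
def token_similarity_matrix(model, generated_ids):
    # hidden states of the last transformer layer for every token in the output
    with torch.no_grad():
        hidden = model(generated_ids, output_hidden_states=True).hidden_states[-1][0]
    hidden = F.normalize(hidden, p=2, dim=-1)
    return hidden @ hidden.t()  # pairwise cosine similarities, [seq_len, seq_len]
# `model` and `output` are the ones defined in Section 5.2
sim = token_similarity_matrix(model, output)
plt.imshow(sim.numpy(), cmap='viridis')
plt.colorbar()
plt.title('Token cosine-similarity matrix')
plt.show()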
6. More Generated Examples
In this section, we provide more generated examples to compare different decoding methods.
6.1. Example One: GPT-2
In this part, we use GPT-2 to generate text, with the prefix text taken from the blog post in which OpenAI released GPT-2.
In a shocking finding, scientist discovered a herd of unicorns living in a remote, previously unexplored valley, in the Andes Mountains. Even more surprising to the researchers was the fact that the unicorns spoke perfect English.
Load the language model and prepare the prefix text:
import torch
from transformers import AutoTokenizer, GPT2LMHeadModel
tokenizer = AutoTokenizer.from_pretrained('gpt2-large')
model = GPT2LMHeadModel.from_pretrained('gpt2-large')
prefix_text = r"In a shocking finding, scientist discovered a herd of unicorns living in a remote, previously unexplored valley, in the Andes Mountains. Even more surprising to the researchers was the fact that the unicorns spoke perfect English."
input_ids = tokenizer(prefix_text, return_tensors='pt').input_ids
6.1.1. Generating Text with Greedy Search
Code:
output = model.generate(input_ids, max_length=512)
print("Output:\n" + 100 *'-')
print(tokenizer.decode(output[0], skip_special_tokens=True))
print("" + 100 *'-')
Model output:
Output:
----------------------------------------------------------------------------------------------------
In a shocking finding, scientist discovered a herd of unicorns living in a remote, previously unexplored valley, in the Andes Mountains. Even more surprising to the researchers was the fact that the unicorns spoke perfect English.
The researchers, led by Dr. David R. Williams of the University of California, Santa Cruz, discovered the unicorns in the Andes Mountains of Peru. The area is known for its unique geology and is home to a number of rare species of animals.
The researchers found the unicorns in the Andes Mountains of Peru.
"We were surprised to find that the unicorns were able to communicate with each other," Williams said. "We were also surprised to find that they were able to communicate in English."
The researchers believe that the unicorns are descendants of the ancient Incas, who lived in the area around 2,000 years ago.
"The Incas were the first people to use the Andes Mountains as a refuge from the harsh climate of the Andes," Williams said. "They were also the first people to use the Andes Mountains as a place to hunt and gather food."
The researchers believe that the unicorns are descendants of the ancient Incas, who lived in the area around 2,000 years ago.
"The Incas were the first people to use the Andes Mountains as a refuge from the harsh climate of the Andes," Williams said. "They were also the first people to use the Andes Mountains as a place to hunt and gather food."
The researchers believe that the unicorns are descendants of the ancient Incas, who lived in the area around 2,000 years ago.
"The Incas were the first people to use the Andes Mountains as a refuge from the harsh climate of the Andes," Williams said. "They were also the first people to use the Andes Mountains as a place to hunt and gather food."
The researchers believe that the unicorns are descendants of the ancient Incas, who lived in the area around 2,000 years ago.
"The Incas were the first people to use the Andes Mountains as a refuge from the harsh climate of the Andes," Williams said. "They were also the first people to use the Andes Mountains as a place to hunt and gather food."
The researchers believe that the unicorns are descendants of the ancient Incas, who lived in the area around 2,000 years ago
----------------------------------------------------------------------------------------------------
6.1.2. Generating Text with Nucleus Sampling
Code:
torch.manual_seed(0)
output = model.generate(input_ids, do_sample=True, max_length=512, top_p=0.95, top_k=0)
print("Output:\n" + 100 *'-')
print(tokenizer.decode(output[0], skip_special_tokens=True))
print("" + 100 *'-')
Model output:
Output:
----------------------------------------------------------------------------------------------------
In a shocking finding, scientist discovered a herd of unicorns living in a remote, previously unexplored valley, in the Andes Mountains. Even more surprising to the researchers was the fact that the unicorns spoke perfect English. The study was published in the Journal of Zoology in March 2016.
Polygynous mammals such as unicorns have remained largely unknown to science. Professor Gustavo Giacota, from the University of Oxford who led the study, said that they had been documented as far as Eastern Siberia in Russia, but had only been seen a handful of times in the Gobi Desert.
Tiny animals with pale and shiny coats live in the presence of human beings and are hardly likely to be victims of any cruelty. However, there is some evidence of the condition occurring in both humans and animals in remote regions, which might have similarities to "black moles" that coexist on the skin.
It is thought that Unicorns could be inside themselves, that they have different scents depending on their current environment, or just fall out and there are plenty of legends of how they have survived. Experts speculate that the moths and other animals could be remnants of the Yezidi Isis and Charon, which literally is both the word which means great bird, and the Greek word for sound. It is said that the Isis and Charon taught their young the use of voice in the form of calling out to others.
The scientists think that it could be ancient folklore that has survived and is no longer attributed to a real entity
----------------------------------------------------------------------------------------------------
6.1.3. Generating Text with Contrastive Search
Code:
output = model.generate(input_ids, max_length=512, penalty_alpha=0.6, top_k=4)
print("Output:\n" + 100 *'-')
print(tokenizer.decode(output[0], skip_special_tokens=True))
print("" + 100 *'-')
Model output:
Output:
----------------------------------------------------------------------------------------------------
In a shocking finding, scientist discovered a herd of unicorns living in a remote, previously unexplored valley, in the Andes Mountains. Even more surprising to the researchers was the fact that the unicorns spoke perfect English.
According to the BBC, a team of scientists led by Dr David MacKay, from the University of Bristol, spent two years searching for the unicorn herd, which they discovered during a survey of the area.
"It's a very rare find," MacKay told the BBC. "There are a few in the Himalayas, but this is the first time we've been able to find one in such a remote area."
The team was surprised to find a herd of unicorns living in a region that has been known to be a hotbed of poaching, with many of the animals poached for their horns, which are used in traditional Chinese medicine to treat everything from rheumatism to cancer.
"We knew that the area was rich in rhino horn, but we had no idea how many there were, or what they were doing there," MacKay said. "This is an area of high poaching pressure, and we wanted to find out what was going on."
In order to do so, the team used GPS collars to track the animals as they moved around the mountain and the surrounding area. The GPS data was then compared with information gathered from local villagers, who had a wealth of information about the animals' movements, including where they were eating, what they were doing at night, and how much time they spent in the mountains each day.
After analyzing the data, the team determined that the herd consisted of at least three species of unicorns, including a male and two females. One of the females was the mother of the male, and the other two were her daughters. All three had the same horn color, which is believed to be a sign of purity in the animal kingdom.
While the discovery is exciting, it's not the first time scientists have discovered an animal that speaks English. Last year, scientists discovered a species of porcupine that can be heard by humans, and has been dubbed "Porcupine Man" for his ability to converse with the human race.
----------------------------------------------------------------------------------------------------
6.2. Example Two: OPT
In this section, we use the OPT model [5] recently released by Meta and take the first two sentences from the abstract of the famous ResNet paper [6] as the prefix text.
Deeper neural networks are more difficult to train. We present a residual learning framework to ease the training of networks that are substantially deeper than those used previously.
Load the language model and prepare the prefix text:
import torch
from transformers import AutoTokenizer, OPTForCausalLM
model_name = r'facebook/opt-1.3b'
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = OPTForCausalLM.from_pretrained(model_name)
prefix_text = r"Deeper neural networks are more difficult to train. We present a residual learning framework to ease the training of networks that are substantially deeper than those used previously."
input_ids = tokenizer(prefix_text, return_tensors='pt').input_ids
6.2.1. Generating Text with Greedy Search
Code:
output = model.generate(input_ids, max_length=256)
print("Output:\n" + 100 *'-')
print(tokenizer.decode(output[0], skip_special_tokens=True))
print("" + 100 *'-')
Model output:
Output:
----------------------------------------------------------------------------------------------------
Deeper neural networks are more difficult to train. We present a residual learning framework to ease the training of networks that are substantially deeper than those used previously. We show that the residual learning framework can be used to train deep neural networks that are significantly more difficult to train than those used previously. We also show that the residual learning framework can be used to train deep neural networks that are significantly more difficult to train than those used previously.
The paper presents a new residual learning framework for deep neural networks that is based on the concept of residuals. The residuals are the residuals of the network that are not used in the training process. The residuals are computed by taking the residuals of the network that are used in the training process and subtracting the residuals of the network that are not used in the training process. The residuals are then used to train the network. The residuals are computed by taking the residuals of the network that are used in the training process and subtracting the residuals of the network that are not used in the training process. The residuals are then used to train the network. The residuals are computed by taking the residuals of the network that are used in the training process and subtracting the residuals of the
----------------------------------------------------------------------------------------------------
6.2.2. Generating Text with Nucleus Sampling
Code:
torch.manual_seed(0)
output = model.generate(input_ids, do_sample=True, max_length=256, top_p=0.95, top_k=0)
print("Output:\n" + 100 *'-')
print(tokenizer.decode(output[0], skip_special_tokens=True))
print("" + 100 *'-')
Model output:
Output:
----------------------------------------------------------------------------------------------------
Deeper neural networks are more difficult to train. We present a residual learning framework to ease the training of networks that are substantially deeper than those used previously. The theory focuses on several aspects of learning, including the dynamics of replicative and non-replicative aspects of learning. This framework emphasizes learning by entropy. New randomized algorithms enable training networks with residual learning, so that deep networks can be deployed as reliably and as efficiently as their more conventional counterparts.
----------------------------------------------------------------------------------------------------
6.2.3. Generating Text with Contrastive Search
Code:
output = model.generate(input_ids, max_length=256, penalty_alpha=0.6, top_k=6)
print("Output:\n" + 100 *'-')
print(tokenizer.decode(output[0], skip_special_tokens=True))
print("" + 100 *'-')
Model output:
Output:
----------------------------------------------------------------------------------------------------
Deeper neural networks are more difficult to train. We present a residual learning framework to ease the training of networks that are substantially deeper than those used previously.
In this paper, we propose a model-based residual learning (MBRL) framework that is based on neural networks trained on data that is sparse in terms of dimensionality (e.g., 1, 2, 3, etc.). The network parameters are chosen such that there is a high probability of convergence, i.e., the number of iterations is large enough to minimize the variance of the residuals. This is achieved by training the network on a set of training data, in which the data is sparse in terms of dimensionality, and then discarding the nonparametric part of the data after training is complete.
We show that MBRL outperforms other methods for deep reinforcement learning (RL) and deep convolutional neural networks (CNNs) by a factor of at least 2. In addition, we show that, compared to CNNs, MBRL performs better in two-dimensional (2D) and three-dimensional (3D) cases.
----------------------------------------------------------------------------------------------------
7. Resources
For more details of contrastive search, please check our papers and code, as follows:
- A Contrastive Framework for Neural Text Generation: paper, official implementation
- Contrastive Search Is What You Need For Neural Text Generation: paper, official implementation
8. Citation
@inproceedings{su2022a,
title={A Contrastive Framework for Neural Text Generation},
author={Yixuan Su and Tian Lan and Yan Wang and Dani Yogatama and Lingpeng Kong and Nigel Collier},
booktitle={Advances in Neural Information Processing Systems},
editor={Alice H. Oh and Alekh Agarwal and Danielle Belgrave and Kyunghyun Cho},
year={2022},
url={https://openreview.net/forum?id=V88BafmH9Pj}
}
@article{su2022contrastiveiswhatyouneed,
title={Contrastive Search Is What You Need For Neural Text Generation},
author={Su, Yixuan and Collier, Nigel},
journal={arXiv preprint arXiv:2210.14140},
year={2022}
}
References
[1] Su et al., 2022 “A Contrastive Framework for Neural Text Generation”, NeurIPS 2022
[2] Su and Collier, 2022 “Contrastive Search Is What You Need For Neural Text Generation”, arXiv 2022
[3] Fan et al., 2018 “Hierarchical Neural Story Generation”, ACL 2018
[4] Holtzman et al., 2020 “The Curious Case of Neural Text Degeneration”, ICLR 2020
[5] Zhang et al., 2022 “OPT: Open Pre-trained Transformer Language Models”, arXiv 2022
[6] He et al., 2016 “Deep Residual Learning for Image Recognition”, CVPR 2016
– Written by Yixuan Su and Tian Lan
Acknowledgements
We would like to thank Joao Gante (@joaogante), Patrick von Platen (@patrickvonplaten) and Sylvain Gugger (@sgugger) for their help and guidance in integrating the contrastive search presented in this post into the transformers library.
Original English article: https://hf.co/blog/introducing-csearch
Original author: Tian Lan
Translator: Matrix Yao (姚伟峰), Deep Learning Engineer at Intel, working on the application of transformer-family models to data of various modalities and on the training and inference of large-scale models.
Reviewer/typesetter: zhongdongy (阿东)