Introduction
Speech recognition is a technology that converts spoken audio into text. Imagine how it could work in a game: issuing voice commands to operate a control panel or a game character, talking directly to NPCs, improving interactivity, and much more. This article will show you how to integrate state-of-the-art (SOTA) speech recognition into a Unity game using the Hugging Face Unity API.
You can download the sample Unity game from itch.io and try the speech recognition for yourself.
Prerequisites
This tutorial assumes basic familiarity with Unity concepts. It also requires you to have installed the Hugging Face Unity API; see the previous blog post for API installation instructions.
Steps
1. Set Up the Scene
In this tutorial, we'll set up a very simple scene where the player can click buttons to start and stop recording speech, and the recorded audio will then be recognized and converted to text.
Begin by creating a new Unity project, then create a Canvas with three UI components:
- Start Button: press to start recording speech.
- Stop Button: press to stop recording speech.
- Text (TextMeshPro): where the speech recognition result will be displayed.
2. Create the Script
Create a script called `SpeechRecognitionTest` and attach it to an empty GameObject. In the script, start by defining references to the UI components:
```
[SerializeField] private Button startButton;
[SerializeField] private Button stopButton;
[SerializeField] private TextMeshProUGUI text;
```
Assign the corresponding components in the Inspector window.
Then, use the `Start()` method to set up listeners for the start and stop buttons:
```
private void Start() {
    startButton.onClick.AddListener(StartRecording);
    stopButton.onClick.AddListener(StopRecording);
}
```
At this point, the script should look something like this:
```
using TMPro;
using UnityEngine;
using UnityEngine.UI;

public class SpeechRecognitionTest : MonoBehaviour {
    [SerializeField] private Button startButton;
    [SerializeField] private Button stopButton;
    [SerializeField] private TextMeshProUGUI text;

    private void Start() {
        startButton.onClick.AddListener(StartRecording);
        stopButton.onClick.AddListener(StopRecording);
    }

    private void StartRecording() {

    }

    private void StopRecording() {

    }
}
```
3. Record Microphone Input
Now, let's record microphone input and encode it in WAV format. Start by defining the member variables:
```
private AudioClip clip;
private byte[] bytes;
private bool recording;
```
Then, in `StartRecording()`, use `Microphone.Start()` to start recording:
```
private void StartRecording() {
    clip = Microphone.Start(null, false, 10, 44100);
    recording = true;
}
```
This records up to 10 seconds of audio at 44100 Hz.
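Passing `null` as the device name records from the default microphone. If you want to let players pick a specific device, Unity exposes the connected microphones through `Microphone.devices`. Here is a minimal sketch (the explicit device-selection logic is our own addition, not part of the original tutorial):

```
// List the connected microphones, then record from the first one explicitly.
foreach (var device in Microphone.devices) {
    Debug.Log("Microphone found: " + device);
}

// Passing a device name instead of null selects that specific microphone.
string deviceName = Microphone.devices.Length > 0 ? Microphone.devices[0] : null;
clip = Microphone.Start(deviceName, false, 10, 44100);
```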
In case the recording reaches its maximum length of 10 seconds, we want it to stop automatically. To do so, add the following to the `Update()` method:
```
private void Update() {
    if (recording && Microphone.GetPosition(null) >= clip.samples) {
        StopRecording();
    }
}
```
Then, in `StopRecording()`, truncate the recording and encode it in WAV format:
```
private void StopRecording() {
    var position = Microphone.GetPosition(null);
    Microphone.End(null);
    var samples = new float[position * clip.channels];
    clip.GetData(samples, 0);
    bytes = EncodeAsWAV(samples, clip.frequency, clip.channels);
    recording = false;
}
```
Finally, we need to implement the `EncodeAsWAV()` method to prepare the audio data for the Hugging Face API:
```
private byte[] EncodeAsWAV(float[] samples, int frequency, int channels) {
    using (var memoryStream = new MemoryStream(44 + samples.Length * 2)) {
        using (var writer = new BinaryWriter(memoryStream)) {
            writer.Write("RIFF".ToCharArray());
            writer.Write(36 + samples.Length * 2);
            writer.Write("WAVE".ToCharArray());
            writer.Write("fmt ".ToCharArray());
            writer.Write(16);
            writer.Write((ushort)1);
            writer.Write((ushort)channels);
            writer.Write(frequency);
            writer.Write(frequency * channels * 2);
            writer.Write((ushort)(channels * 2));
            writer.Write((ushort)16);
            writer.Write("data".ToCharArray());
            writer.Write(samples.Length * 2);

            foreach (var sample in samples) {
                writer.Write((short)(sample * short.MaxValue));
            }
        }
        return memoryStream.ToArray();
    }
}
```
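The 44 bytes written before the sample data follow the standard RIFF/WAVE header layout for 16-bit PCM: chunk IDs, chunk sizes, the format tag (1 = PCM), channel count, sample rate, byte rate, block alignment, and bits per sample. If you want to sanity-check the encoder, a small hypothetical helper (not part of the original tutorial) could read a few header fields back:

```
// Hypothetical helper: read back a few WAV header fields to verify the encoder.
// Requires the script's existing System.IO and UnityEngine imports.
private void LogWavHeader(byte[] wav) {
    using (var reader = new BinaryReader(new MemoryStream(wav))) {
        string riff = new string(reader.ReadChars(4)); // "RIFF"
        reader.ReadInt32();                            // total chunk size
        string wave = new string(reader.ReadChars(4)); // "WAVE"
        reader.ReadChars(4);                           // "fmt "
        reader.ReadInt32();                            // format subchunk size (16)
        reader.ReadUInt16();                           // format tag (1 = PCM)
        ushort channels = reader.ReadUInt16();         // channel count
        int frequency = reader.ReadInt32();            // sample rate
        Debug.Log($"{riff}/{wave}: {channels} channel(s) at {frequency} Hz");
    }
}
```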
The full script should now look something like this:
```
using System.IO;
using TMPro;
using UnityEngine;
using UnityEngine.UI;

public class SpeechRecognitionTest : MonoBehaviour {
    [SerializeField] private Button startButton;
    [SerializeField] private Button stopButton;
    [SerializeField] private TextMeshProUGUI text;

    private AudioClip clip;
    private byte[] bytes;
    private bool recording;

    private void Start() {
        startButton.onClick.AddListener(StartRecording);
        stopButton.onClick.AddListener(StopRecording);
    }

    private void Update() {
        if (recording && Microphone.GetPosition(null) >= clip.samples) {
            StopRecording();
        }
    }

    private void StartRecording() {
        clip = Microphone.Start(null, false, 10, 44100);
        recording = true;
    }

    private void StopRecording() {
        var position = Microphone.GetPosition(null);
        Microphone.End(null);
        var samples = new float[position * clip.channels];
        clip.GetData(samples, 0);
        bytes = EncodeAsWAV(samples, clip.frequency, clip.channels);
        recording = false;
    }

    private byte[] EncodeAsWAV(float[] samples, int frequency, int channels) {
        using (var memoryStream = new MemoryStream(44 + samples.Length * 2)) {
            using (var writer = new BinaryWriter(memoryStream)) {
                writer.Write("RIFF".ToCharArray());
                writer.Write(36 + samples.Length * 2);
                writer.Write("WAVE".ToCharArray());
                writer.Write("fmt ".ToCharArray());
                writer.Write(16);
                writer.Write((ushort)1);
                writer.Write((ushort)channels);
                writer.Write(frequency);
                writer.Write(frequency * channels * 2);
                writer.Write((ushort)(channels * 2));
                writer.Write((ushort)16);
                writer.Write("data".ToCharArray());
                writer.Write(samples.Length * 2);

                foreach (var sample in samples) {
                    writer.Write((short)(sample * short.MaxValue));
                }
            }
            return memoryStream.ToArray();
        }
    }
}
```
To test whether this code works correctly, you can add the following line to the end of the `StopRecording()` method:
```
File.WriteAllBytes(Application.dataPath + "/test.wav", bytes);
```
Now, if you click the `Start` button, speak into the microphone, and then click `Stop`, your recorded audio will be saved as a `test.wav` file in your project's Unity Assets folder.
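If you would rather verify the recording inside Unity than in an external audio player, you could load the saved file back into an `AudioClip` and play it. A minimal sketch, assuming the `test.wav` path from above (and Unity 2020.1+ for `UnityWebRequest.Result`):

```
using System.Collections;
using UnityEngine;
using UnityEngine.Networking;

// Hypothetical verification script: load the saved WAV back and play it.
public class WavPlaybackTest : MonoBehaviour {
    private IEnumerator Start() {
        string uri = "file://" + Application.dataPath + "/test.wav";
        using (var request = UnityWebRequestMultimedia.GetAudioClip(uri, AudioType.WAV)) {
            yield return request.SendWebRequest();
            if (request.result == UnityWebRequest.Result.Success) {
                var clip = DownloadHandlerAudioClip.GetContent(request);
                AudioSource.PlayClipAtPoint(clip, Vector3.zero); // play at the origin
            } else {
                Debug.LogError(request.error);
            }
        }
    }
}
```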
4. Speech Recognition
Next, we'll use the Hugging Face Unity API to run speech recognition on the encoded audio. To do so, we'll create a `SendRecording()` method:
```
using HuggingFace.API;

private void SendRecording() {
    HuggingFaceAPI.AutomaticSpeechRecognition(bytes, response => {
        text.color = Color.white;
        text.text = response;
    }, error => {
        text.color = Color.red;
        text.text = error;
    });
}
```
This sends the encoded audio to the speech recognition API, displaying the response in white if successful, or the error message in red otherwise.
Don't forget to call `SendRecording()` at the end of the `StopRecording()` method:
```
private void StopRecording() {
    /* other code */
    SendRecording();
}
```
5. Final Touches
Finally, let's improve the user experience a bit with button interactability and status messages.
The Start and Stop buttons should only be interactable when appropriate, i.e. when a recording can actually be started or stopped.
Then, while recording or waiting for the API to respond, we can use a simple response text to show the current status.
The finished script should look something like this:
```
using System.IO;
using HuggingFace.API;
using TMPro;
using UnityEngine;
using UnityEngine.UI;

public class SpeechRecognitionTest : MonoBehaviour {
    [SerializeField] private Button startButton;
    [SerializeField] private Button stopButton;
    [SerializeField] private TextMeshProUGUI text;

    private AudioClip clip;
    private byte[] bytes;
    private bool recording;

    private void Start() {
        startButton.onClick.AddListener(StartRecording);
        stopButton.onClick.AddListener(StopRecording);
        stopButton.interactable = false;
    }

    private void Update() {
        if (recording && Microphone.GetPosition(null) >= clip.samples) {
            StopRecording();
        }
    }

    private void StartRecording() {
        text.color = Color.white;
        text.text = "Recording...";
        startButton.interactable = false;
        stopButton.interactable = true;
        clip = Microphone.Start(null, false, 10, 44100);
        recording = true;
    }

    private void StopRecording() {
        var position = Microphone.GetPosition(null);
        Microphone.End(null);
        var samples = new float[position * clip.channels];
        clip.GetData(samples, 0);
        bytes = EncodeAsWAV(samples, clip.frequency, clip.channels);
        recording = false;
        SendRecording();
    }

    private void SendRecording() {
        text.color = Color.yellow;
        text.text = "Sending...";
        stopButton.interactable = false;
        HuggingFaceAPI.AutomaticSpeechRecognition(bytes, response => {
            text.color = Color.white;
            text.text = response;
            startButton.interactable = true;
        }, error => {
            text.color = Color.red;
            text.text = error;
            startButton.interactable = true;
        });
    }

    private byte[] EncodeAsWAV(float[] samples, int frequency, int channels) {
        using (var memoryStream = new MemoryStream(44 + samples.Length * 2)) {
            using (var writer = new BinaryWriter(memoryStream)) {
                writer.Write("RIFF".ToCharArray());
                writer.Write(36 + samples.Length * 2);
                writer.Write("WAVE".ToCharArray());
                writer.Write("fmt ".ToCharArray());
                writer.Write(16);
                writer.Write((ushort)1);
                writer.Write((ushort)channels);
                writer.Write(frequency);
                writer.Write(frequency * channels * 2);
                writer.Write((ushort)(channels * 2));
                writer.Write((ushort)16);
                writer.Write("data".ToCharArray());
                writer.Write(samples.Length * 2);

                foreach (var sample in samples) {
                    writer.Write((short)(sample * short.MaxValue));
                }
            }
            return memoryStream.ToArray();
        }
    }
}
```
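To connect this back to the gameplay ideas from the introduction, the recognized text can be mapped to in-game actions. The sketch below is a hypothetical extension (the `VoiceCommandHandler` class and its command table are not part of the original tutorial); you would call `HandleCommand(response)` from the success callback in `SendRecording()`:

```
using System;
using System.Collections.Generic;
using UnityEngine;

// Hypothetical example: dispatch recognized speech to game actions.
public class VoiceCommandHandler : MonoBehaviour {
    private Dictionary<string, Action> commands;

    private void Awake() {
        commands = new Dictionary<string, Action>(StringComparer.OrdinalIgnoreCase) {
            { "jump", () => Debug.Log("Player jumps") },
            { "open menu", () => Debug.Log("Menu opened") },
        };
    }

    // Call this with the ASR response, e.g. from the success callback above.
    public void HandleCommand(string recognizedText) {
        var key = recognizedText.Trim().TrimEnd('.', '!', '?');
        if (commands.TryGetValue(key, out var action)) {
            action();
        } else {
            Debug.Log("Unrecognized command: " + recognizedText);
        }
    }
}
```

A dictionary keyed on normalized phrases keeps the dispatch simple; for fuzzier matching you would need something more tolerant of ASR variations.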
Congratulations! You can now integrate SOTA speech recognition into your Unity games!
If you have any questions, or want to get more involved with the Hugging Face for Games series, join the Hugging Face Discord!
Original English article: https://hf.co/blog/unity-asr
Author: Dylan Ebert
Translator: SuSung-boy
Review/typesetting: zhongdongy (阿东)