HoloLens supports three forms of voice input:
- Voice Command
- Dictation
- Grammar Recognizer
The earlier post "HoloLens Development Notes - Speech Recognition (Voice Commands)" already covered Voice Command. This post covers dictation.
Dictation Recognition
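Dictation converts speech to text (Speech to Text). On HoloLens it is mostly used wherever text has to be typed, for example when searching in Edge. HoloLens normally has no conventional physical keyboard, so typing on the virtual keyboard means turning your head to place the Gaze cursor on each letter and then confirming that letter with the air-tap Gesture, which is very inconvenient.
Entering text by voice is therefore far more efficient.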
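The dictation feature converts the user's speech into text. It also reports in-progress hypotheses and exposes events you can subscribe to. Start() and Stop() enable and disable dictation. When you are finished with the recognizer, call Dispose() to release the resources it holds. The garbage collector would eventually reclaim them, but leaving the cleanup to the GC carries extra performance cost.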
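The Start(), Stop() and Dispose() calls described above can be sketched as a small component. This is a minimal sketch, not the author's code: the class name DictationLifecycle and its method names are hypothetical, and it assumes no KeywordRecognizer is running when StartDictation() is called.

```csharp
using UnityEngine;
using UnityEngine.Windows.Speech;

// Hypothetical helper component; the name DictationLifecycle is an assumption for illustration.
public class DictationLifecycle : MonoBehaviour
{
    private DictationRecognizer recognizer;

    void Awake()
    {
        recognizer = new DictationRecognizer();
        recognizer.DictationResult += OnResult;
    }

    // Begin listening. The status check makes repeated calls harmless.
    public void StartDictation()
    {
        if (recognizer.Status != SpeechSystemStatus.Running)
        {
            recognizer.Start();
        }
    }

    // Stop listening but keep the recognizer so it can be started again later.
    public void StopDictation()
    {
        if (recognizer.Status == SpeechSystemStatus.Running)
        {
            recognizer.Stop();
        }
    }

    private void OnResult(string text, ConfidenceLevel confidence)
    {
        Debug.Log("Heard: " + text + " (" + confidence + ")");
    }

    void OnDestroy()
    {
        if (recognizer == null)
        {
            return;
        }

        // Release the recognizer explicitly instead of waiting for the GC.
        StopDictation();
        recognizer.DictationResult -= OnResult;
        recognizer.Dispose();
        recognizer = null;
    }
}
```

Stop() only pauses listening, so the same instance can be started again. Dispose() is reserved for the point where the recognizer is no longer needed.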
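Keep the following in mind when using dictation:
- Your app must declare the Microphone capability. In Unity, go to Edit -> Project Settings -> Player -> Windows Store -> Publishing Settings -> Capabilities and make sure Microphone is checked.
- The HoloLens must be connected to Wi-Fi. Dictation does not work without a network connection.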
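Both requirements can be checked roughly at runtime before dictation is started. The sketch below is an assumption-laden convenience, not part of the original post: DictationPreflight and LooksReady are hypothetical names, and the checks are coarse. A reachable network does not guarantee that the speech service itself can be reached.

```csharp
using UnityEngine;

// Hypothetical helper; the class and method names are assumptions for illustration.
public static class DictationPreflight
{
    // Returns false and fills 'reason' when an obvious prerequisite is missing.
    public static bool LooksReady(out string reason)
    {
        // An empty device list suggests that no microphone is available to the app.
        if (Microphone.devices.Length == 0)
        {
            reason = "No microphone device was found.";
            return false;
        }

        // Dictation needs a network connection, so fail early when none is reachable.
        if (Application.internetReachability == NetworkReachability.NotReachable)
        {
            reason = "No network connection is available.";
            return false;
        }

        reason = string.Empty;
        return true;
    }
}
```

A caller could show the returned reason to the user instead of starting the recognizer and waiting for the error event.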
MicrophoneManager.cs
using System.Text;
using UnityEngine;
using UnityEngine.UI;
using UnityEngine.Windows.Speech;
public class MicrophoneManager : MonoBehaviour
{
    [Tooltip("A text area for the recognizer to display the recognized strings.")]
    public Text DictationDisplay;

    private DictationRecognizer dictationRecognizer;

    // Caches the text currently displayed in the text box.
    // Initialized here so the event handlers never see a null reference.
    private StringBuilder textSoFar = new StringBuilder();
    void Awake()
    {
        // Create the recognizer used for the lifetime of this component.
        dictationRecognizer = new DictationRecognizer();

        // Fired while the user is talking. As the recognizer listens, it provides the text it has heard so far.
        dictationRecognizer.DictationHypothesis += DictationRecognizer_DictationHypothesis;

        // Fired after the user pauses, typically at the end of a sentence. The full recognized string is returned here.
        dictationRecognizer.DictationResult += DictationRecognizer_DictationResult;

        // Fired when the recognizer stops, whether from Stop() being called, a timeout occurring, or some other error.
        dictationRecognizer.DictationComplete += DictationRecognizer_DictationComplete;

        // Fired when an error occurs, typically because there is no network connection
        // or the connection dropped during recognition.
        dictationRecognizer.DictationError += DictationRecognizer_DictationError;

        // PhraseRecognitionSystem controls the KeywordRecognizers.
        // Keyword recognition must be shut down before dictation can start.
        PhraseRecognitionSystem.Shutdown();

        // Start dictation.
        dictationRecognizer.Start();
    }
    /// <summary>
    /// This event is fired while the user is talking. As the recognizer listens, it provides text of what it's heard so far.
    /// </summary>
    /// <param name="text">The currently hypothesized recognition.</param>
    private void DictationRecognizer_DictationHypothesis(string text)
    {
        // Show the committed text plus the new hypothesis.
        // Do not append to textSoFar yet, because the hypothesis may change on the next event.
        DictationDisplay.text = textSoFar.ToString() + " " + text + "...";
    }

    /// <summary>
    /// This event is fired after the user pauses, typically at the end of a sentence. The full recognized string is returned here.
    /// </summary>
    /// <param name="text">The text that was heard by the recognizer.</param>
    /// <param name="confidence">A representation of how confident (rejected, low, medium, high) the recognizer is of this recognition.</param>
    private void DictationRecognizer_DictationResult(string text, ConfidenceLevel confidence)
    {
        // Commit the recognized sentence, followed by a separator so consecutive sentences do not run together.
        textSoFar.Append(text + ". ");

        // Display everything recognized so far.
        DictationDisplay.text = textSoFar.ToString();
    }
    /// <summary>
    /// This event is fired when the recognizer stops, whether from Stop() being called, a timeout occurring, or some other error.
    /// Typically, this will simply return "Complete". In this case, we check to see if the recognizer timed out.
    /// </summary>
    /// <param name="cause">An enumerated reason for the session completing.</param>
    private void DictationRecognizer_DictationComplete(DictationCompletionCause cause)
    {
        // A timeout means the user has been silent for too long.
        // The default timeout with initial silence is 5 seconds.
        // The default timeout after a recognition is 20 seconds.
        if (cause == DictationCompletionCause.TimeoutExceeded)
        {
            // Passing null selects the default microphone.
            Microphone.End(null);

            DictationDisplay.text = "Dictation has timed out. Please press the record button again.";

            // Notify any component on this GameObject that implements ResetAfterTimeout (not shown in this listing).
            SendMessage("ResetAfterTimeout", SendMessageOptions.DontRequireReceiver);
        }
    }
    /// <summary>
    /// This event is fired when an error occurs.
    /// </summary>
    /// <param name="error">The string representation of the error reason.</param>
    /// <param name="hresult">The int representation of the hresult.</param>
    private void DictationRecognizer_DictationError(string error, int hresult)
    {
        // Display the error string together with its HRESULT.
        DictationDisplay.text = error + "\nHRESULT: " + hresult;
    }

    void OnDestroy()
    {
        // Stop listening, unsubscribe the handlers, then release the recognizer's resources.
        dictationRecognizer.Stop();
        dictationRecognizer.DictationHypothesis -= DictationRecognizer_DictationHypothesis;
        dictationRecognizer.DictationResult -= DictationRecognizer_DictationResult;
        dictationRecognizer.DictationComplete -= DictationRecognizer_DictationComplete;
        dictationRecognizer.DictationError -= DictationRecognizer_DictationError;
        dictationRecognizer.Dispose();
    }
}
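The split between hypothesis and result handling in the listing above can be isolated from Unity and tried as plain C#. The sketch below is illustrative only: DictationTextBuffer and its members are hypothetical names, and the ". " separator between sentences is an assumption about the desired display format.

```csharp
using System.Text;

// Hypothetical helper; it follows the same idea as textSoFar in the listing above.
public class DictationTextBuffer
{
    // Holds only text that the recognizer has reported as a final result.
    private readonly StringBuilder committed = new StringBuilder();

    // Text to show while the user is still speaking. The committed text is not
    // modified, because the hypothesis may change on the next event.
    public string Preview(string hypothesis)
    {
        return committed.ToString() + hypothesis + "...";
    }

    // Appends a final result and returns everything committed so far.
    public string Commit(string result)
    {
        committed.Append(result).Append(". ");
        return committed.ToString();
    }

    // Discards all committed text, for example when a new recording starts.
    public void Clear()
    {
        committed.Length = 0;
    }
}
```

A hypothesis handler would assign Preview(text) to the display, and a result handler would assign Commit(text).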
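HoloLens can run only one kind of speech recognition at a time, so the KeywordRecognizer must be shut down before dictation can be used.
DictationRecognizer has two timeouts:
- If the recognizer is started and hears nothing within 5 seconds, it times out.
- If the recognizer has produced a result but then hears nothing for 20 seconds, it times out.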
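One way to move between the two recognizers is to shut down keyword recognition before dictation starts and restart it once dictation has completed. The sketch below assumes exactly that flow. RecognizerSwitcher and its method names are hypothetical, and restarting from inside the completion handler is a design assumption rather than something the original post prescribes.

```csharp
using UnityEngine;
using UnityEngine.Windows.Speech;

// Hypothetical component; the class and method names are assumptions for illustration.
public class RecognizerSwitcher : MonoBehaviour
{
    private DictationRecognizer dictation;

    public void BeginDictation()
    {
        // Keyword recognition has to be off before dictation can start.
        if (PhraseRecognitionSystem.Status == SpeechSystemStatus.Running)
        {
            PhraseRecognitionSystem.Shutdown();
        }

        if (dictation == null)
        {
            dictation = new DictationRecognizer();
            dictation.DictationComplete += OnDictationComplete;
        }

        dictation.Start();
    }

    public void EndDictation()
    {
        // Stopping the recognizer raises DictationComplete.
        if (dictation != null && dictation.Status == SpeechSystemStatus.Running)
        {
            dictation.Stop();
        }
    }

    private void OnDictationComplete(DictationCompletionCause cause)
    {
        // Dictation has ended, so hand speech input back to keyword recognition.
        PhraseRecognitionSystem.Restart();
    }

    void OnDestroy()
    {
        if (dictation != null)
        {
            dictation.DictationComplete -= OnDictationComplete;
            dictation.Dispose();
            dictation = null;
        }
    }
}
```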
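Both timeouts are exposed as properties on DictationRecognizer, so they can be adjusted before the recognizer is started. In the sketch below the component name DictationTimeouts and the values 10 and 30 seconds are arbitrary assumptions chosen for illustration, and it is assumed that no KeywordRecognizer is running.

```csharp
using UnityEngine;
using UnityEngine.Windows.Speech;

// Hypothetical component; the name and the timeout values are assumptions.
public class DictationTimeouts : MonoBehaviour
{
    private DictationRecognizer recognizer;

    void Start()
    {
        recognizer = new DictationRecognizer();

        // Time allowed before any speech is heard (default 5 seconds).
        recognizer.InitialSilenceTimeoutSeconds = 10f;

        // Time allowed in silence after a recognition (default 20 seconds).
        recognizer.AutoSilenceTimeoutSeconds = 30f;

        recognizer.DictationComplete += OnComplete;
        recognizer.Start();
    }

    private void OnComplete(DictationCompletionCause cause)
    {
        if (cause == DictationCompletionCause.TimeoutExceeded)
        {
            Debug.Log("Dictation timed out. Call Resume() to listen again.");
        }
    }

    // Restarts listening after a timeout, for example from a record button.
    public void Resume()
    {
        if (recognizer.Status != SpeechSystemStatus.Running)
        {
            recognizer.Start();
        }
    }

    void OnDestroy()
    {
        if (recognizer != null)
        {
            recognizer.DictationComplete -= OnComplete;
            recognizer.Dispose();
            recognizer = null;
        }
    }
}
```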