Introduction
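Machine learning requires large amounts of data. Audio annotation tools are used to label recorded audio with text and to clean it, producing the data needed for speech recognition and wake-word detection.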
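As a rough illustration of what text annotation and cleaning might produce, the sketch below models one labeled span of audio and filters out unusable entries. The `Segment` schema, its field names, and the JSON Lines output are assumptions made for this example; none of the tools listed here is known to use this exact format.

```python
import json
from dataclasses import dataclass, asdict
from typing import Iterable, List


@dataclass
class Segment:
    """One labeled span of an audio file (hypothetical schema)."""
    audio_path: str
    start_sec: float
    end_sec: float
    text: str


def clean_segments(segments: Iterable[Segment]) -> List[Segment]:
    """Drop segments with an empty transcript or an invalid time range."""
    cleaned = []
    for seg in segments:
        text = " ".join(seg.text.split())  # collapse runs of whitespace
        if not text:
            continue  # nothing was transcribed
        if seg.start_sec < 0 or seg.end_sec <= seg.start_sec:
            continue  # negative start, or zero/negative length
        cleaned.append(Segment(seg.audio_path, seg.start_sec, seg.end_sec, text))
    return cleaned


def to_jsonl(segments: Iterable[Segment]) -> str:
    """Serialize segments as one JSON object per line."""
    return "\n".join(json.dumps(asdict(s), ensure_ascii=False) for s in segments)
```

A training pipeline for speech recognition or wake-word detection could then read the resulting file line by line, although the exact manifest format depends on the toolkit in use.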
Open-source tools
ritazh/EchoML
https://github.com/ritazh/EchoML
A web app to play, visualize, and annotate your audio files for machine learning.
However, it supports only Azure Blob Storage as its storage backend.
Activation: 2018.08
Languages: JavaScript, TypeScript
wavesurfer.js
https://github.com/katspaugh/wavesurfer.js
Navigable waveform built on Web Audio and Canvas. Website: wavesurfer-js.org
A good choice as the web component for displaying and interacting with audio waveforms.
Activation: 2019.07
CrowdCurio/audio-annotator
https://github.com/CrowdCurio/audio-annotator
A JavaScript interface for annotating and labeling audio files.
Developed by Stefanie Mikloska, CrowdLab @ University of Waterloo and MARL @ New York University.
Activation: 2017.08
dynilib/dynitag
Dynitag is a web-based collaborative audio annotator tool, heavily based on CrowdCurio/audio-annotator.
A fork of that project, improved over a few dozen commits.
Activation: 2018.07
ShivarajMeti/cicada-audio-annotation-tool
https://github.com/ShivarajMeti/cicada-audio-annotation-tool
Cicada is an open source audio annotation tool that lets you annotate audio files in .wav format and also enables you to look into each audio spectrogram while annotating.
A pure Python program. The interface is not very user-friendly. Its requirement.txt lists many unrelated dependencies, such as Darkflow.
Activation: 2019.06
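When annotating .wav files, it can help to check that each labeled time span actually lies inside the recording. The sketch below uses only Python's standard `wave` module. The helper names `wav_duration_sec` and `annotation_fits` are hypothetical, and this is not how Cicada itself is implemented.

```python
import wave


def wav_duration_sec(path: str) -> float:
    """Return the duration of a PCM .wav file in seconds."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def annotation_fits(path: str, start_sec: float, end_sec: float) -> bool:
    """Check that a labeled span lies inside the recording."""
    return 0 <= start_sec < end_sec <= wav_duration_sec(path)
```

Note that the standard `wave` module reads uncompressed PCM files only, so other encodings would need a different reader.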
Annotation platforms
京东众智 (JD Zhongzhi)
百度众包 (Baidu Zhongbao)