语音识别领域三十年来重要论文合集及其下载地址

整理：公众号【深度学习每日摘要】

语音识别的研究历史已经有三十多年了，从上个世纪八十年代的隐马尔可夫模型，到二十一世纪初的帧级别的深度神经网络模型，到2006年的CTC模型，到2012年的深度循环神经网络模型，再到2014年的注意力机制运用到语音识别，2015年基于seq2seq模型的语音识别系统也被提出，再到2016年深度卷积神经网络被用于大规模的语音识别系统。语音识别系统从最初的手动提取特征到如今的端对端的神经网络模型，准确率已经接近97%。

本文列举了自从1982年至今语音识别领域的相关论文，涵盖了以上所有的模型，同时附上第一作者信息以及pdf文件下载链接。

论文清单已经按照发表年份以及首字母排序，完整论文清单以及下载链接请访问：

https://github.com/zzw922cn/awesome-speech-recognition-papers

An Introduction to the Application of the Theory of Probabilistic Functions of a Markov Process to Automatic Speech Recognition(1982), S. E. LEVINSON et al. [pdf]

A Maximum Likelihood Approach to Continuous Speech Recognition(1983), LALIT R. BAHL et al. [pdf]

Heterogeneous Acoustic Measurements and Multiple Classifiers for Speech Recognition(1986), Andrew K. Halberstadt. [pdf]

Maximum Mutual Information Estimation of Hidden Markov Model Parameters for Speech Recognition(1986), Lalit R. Bahi et al. [pdf]

Hidden Markov Models for Speech Recognition(1991), B. H. Juang et al. [pdf]

Framewise phoneme classification with bidirectional LSTM and other neural network architectures(2005), Alex Graves et al. [pdf]

Applying Convolutional Neural Networks concepts to hybrid NN-HMM model for speech recognition(2012), Ossama Abdel-Hamid et al. [pdf]

Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks(2006), Alex Graves et al. [pdf]

Context-Dependent Pre-Trained Deep Neural Networks for Large-Vocabulary Speech Recognition(2012), George E. Dahl et al. [pdf]

Deep Neural Networks for Acoustic Modeling in Speech Recognition(2012), Geoffrey Hinton et al. [pdf]

Sequence Transduction with Recurrent Neural Networks(2012), Alex Graves et al. [pdf]

Deep convolutional neural networks for LVCSR(2013), Tara N. Sainath et al. [pdf]

Improving deep neural networks for LVCSR using rectified linear units and dropout(2013), George E. Dahl et al. [pdf]

Improving low-resource CD-DNN-HMM using dropout and multilingual DNN training(2013), Yajie Miao et al. [pdf]

Improvements to deep convolutional neural networks for LVCSR(2013), Tara N. Sainath et al. [pdf]

Machine Learning Paradigms for Speech Recognition: An Overview(2013), Li Deng et al. [pdf]

Recent advances in deep learning for speech research at Microsoft(2013), Li Deng et al. [pdf]

Speech recognition with deep recurrent neural networks(2013), Alex Graves et al. [pdf]

Convolutional deep maxout networks for phone recognition(2014), László Tóth et al. [pdf]

Convolutional Neural Networks for Speech Recognition(2014), Ossama Abdel-Hamid et al. [pdf]

Combining time- and frequency-domain convolution in convolutional neural network-based phone recognition(2014), László Tóth. [pdf]

Deep Speech: Scaling up end-to-end speech recognition(2014), Awni Y. Hannun et al. [pdf]

End-to-end Continuous Speech Recognition using Attention-based Recurrent NN: First Results(2014), Jan Chorowski et al. [pdf]

First-Pass Large Vocabulary Continuous Speech Recognition using Bi-Directional Recurrent DNNs(2014), Andrew L. Maas et al. [pdf]

Long short-term memory recurrent neural network architectures for large scale acoustic modeling(2014), Hasim Sak et al. [pdf]

Robust CNN-based speech recognition with Gabor filter kernels(2014), Shuo-Yiin Chang et al. [pdf]

Stochastic pooling maxout networks for low-resource speech recognition(2014), Meng Cai et al. [pdf]

Towards End-to-End Speech Recognition with Recurrent Neural Networks(2014), Alex Graves et al. [pdf]

Attention-Based Models for Speech Recognition(2015), Jan Chorowski et al. [pdf]

Analysis of CNN-based speech recognition system using raw speech as input(2015), Dimitri Palaz et al. [pdf]

Convolutional, Long Short-Term Memory, fully connected Deep Neural Networks(2015), Tara N. Sainath et al. [pdf]

Deep convolutional neural networks for acoustic modeling in low resource languages(2015), William Chan et al. [pdf]

Deep Neural Networks for Single-Channel Multi-Talker Speech Recognition(2015), Chao Weng et al. [pdf]

Fast and Accurate Recurrent Neural Network Acoustic Models for Speech Recognition(2015), Hasim Sak et al. [pdf]

Listen, Attend and Spell(2015), William Chan et al. [pdf]

Online Sequence Training of Recurrent Neural Networks with Connectionist Temporal Classification(2015), Kyuyeon Hwang et al. [pdf]

Advances in All-Neural Speech Recognition(2016), Geoffrey Zweig et al. [pdf]

Advances in Very Deep Convolutional Neural Networks for LVCSR(2016), Tom Sercu et al. [pdf]

End-to-end attention-based large vocabulary speech recognition(2016), Dzmitry Bahdanau et al. [pdf]

Deep Convolutional Neural Networks with Layer-Wise Context Expansion and Attention(2016), Dong Yu et al. [pdf]

Deep Speech 2: End-to-End Speech Recognition in English and Mandarin(2016), Dario Amodei et al. [pdf]

End-to-end attention-based distant speech recognition with Highway LSTM(2016), Hassan Taherian. [pdf]

Joint CTC-Attention based End-to-End Speech Recognition using Multi-task Learning(2016), Suyoun Kim et al. [pdf]

Listen, attend and spell: A neural network for large vocabulary conversational speech recognition(2016), William Chan et al. [pdf]

Latent Sequence Decompositions(2016), William Chan et al. [pdf]

Segmental Recurrent Neural Networks for End-to-End Speech Recognition(2016), Liang Lu et al. [pdf]

Towards better decoding and language model integration in sequence to sequence models(2016), Jan Chorowski et al. [pdf]

Very Deep Convolutional Neural Networks for Noise Robust Speech Recognition(2016), Yanmin Qian et al. [pdf]

Very Deep Convolutional Networks for End-to-End Speech Recognition(2016), Yu Zhang et al. [pdf]

Very deep multilingual convolutional neural networks for LVCSR(2016), Tom Sercu et al. [pdf]

Wav2Letter: an End-to-End ConvNet-based Speech Recognition System(2016), Ronan Collobert et al. [pdf]

WaveNet: A Generative Model for Raw Audio(2016), Aäron van den Oord et al. [pdf]

An enhanced automatic speech recognition system for Arabic(2017), Mohamed Amine Menacer et al. [pdf]

A network of deep neural networks for distant speech recognition(2017), Mirco Ravanelli et al. [pdf]

An Unsupervised Speaker Clustering Technique based on SOM and I-vectors for Speech Recognition Systems(2017), Hany Ahmed et al. [pdf]

Building DNN acoustic models for large vocabulary speech recognition(2017), Andrew L. Maas et al. [pdf]

Direct Acoustics-to-Word Models for English Conversational Speech Recognition(2017), Kartik Audhkhasi et al. [pdf]

English Conversational Telephone Speech Recognition by Humans and Machines(2017), George Saon et al. [pdf]

ESE: Efficient Speech Recognition Engine with Sparse LSTM on FPGA(2017), Song Han et al. [pdf]

Deep LSTM for Large Vocabulary Continuous Speech Recognition(2017), Xu Tian et al. [pdf]

Multichannel End-to-end Speech Recognition(2017), Tsubasa Ochiai et al. [pdf]

Multi-task Learning with CTC and Segmental CRF for Speech Recognition(2017), Liang Lu et al. [pdf]

Multichannel Signal Processing With Deep Neural Networks for Automatic Speech Recognition(2017), Tara N. Sainath et al. [pdf]

Residual Convolutional CTC Networks for Automatic Speech Recognition(2017), Yisen Wang et al. [pdf]

最后编辑于：2017.12.07 03:24:37

人面猴
序言：七十年代末，一起剥皮案震惊了整个滨河市，随后出现的几起案子，更是在滨河造成了极大的恐慌，老刑警刘岩，带你破解...
沈念sama阅读 206,482评论 6赞 481
死咒
序言：滨河连续发生了三起死亡事件，死亡现场离奇诡异，居然都是意外死亡，警方通过查阅死者的电脑和手机，发现死者居然都...
沈念sama阅读 88,377评论 2赞 382
救了他两次的神仙让他今天三更去死
文/潘晓璐我一进店门，熙熙楼的掌柜王于贵愁眉苦脸地迎上来，“玉大人，你说我怎么就摊上这事。” “怎么了？”我有些...
开封第一讲书人阅读 152,762评论 0赞 342
道士缉凶录：失踪的卖姜人
文/不坏的土叔我叫张陵，是天一观的道长。经常有香客问我，道长，这世上最难降的妖魔是什么？我笑而不...
开封第一讲书人阅读 55,273评论 1赞 279
港岛之恋（遗憾婚礼）
正文为了忘掉前任，我火速办了婚礼，结果婚礼上，老公的妹妹穿的比我还像新娘。我一直安慰自己，他们只是感情好，可当我...
茶点故事阅读 64,289评论 5赞 373
恶毒庶女顶嫁案：这布局不是一般人想出来的
文/花漫我一把揭开白布。她就那样静静地躺着，像睡着了一般。火红的嫁衣衬着肌肤如雪。梳的纹丝不乱的头发上，一...
开封第一讲书人阅读 49,046评论 1赞 285
城市分裂传说
那天，我揣着相机与录音，去河边找鬼。笑死，一个胖子当着我的面吹牛，可吹牛的内容都是我干的。我是一名探鬼主播，决...
沈念sama阅读 38,351评论 3赞 400
双鸳鸯连环套：你想象不到人心有多黑
文/苍兰香墨我猛地睁开眼，长吁一口气：“原来是场噩梦啊……” “哼！你这毒妇竟也来了？” 一声冷哼从身侧响起，我...
开封第一讲书人阅读 36,988评论 0赞 259
万荣杀人案实录
序言：老挝万荣一对情侣失踪，失踪者是张志新（化名）和其女友刘颖，没想到半个月后，有当地人在树林里发现了一具尸体，经...
沈念sama阅读 43,476评论 1赞 300
护林员之死
正文独居荒郊野岭守林人离奇死亡，尸身上长有42处带血的脓包…… 初始之章·张勋以下内容为张勋视角年9月15日...
茶点故事阅读 35,948评论 2赞 324
白月光启示录
正文我和宋清朗相恋三年，在试婚纱的时候发现自己被绿了。大学时的朋友给我发了我未婚夫和他白月光在一起吃饭的照片。...
茶点故事阅读 38,064评论 1赞 333
活死人
序言：一个原本活蹦乱跳的男人离奇死亡，死状恐怖，灵堂内的尸体忽然破棺而出，到底是诈尸还是另有隐情，我是刑警宁泽，带...
沈念sama阅读 33,712评论 4赞 323
日本核电站爆炸内幕
正文年R本政府宣布，位于F岛的核电站，受9级特大地震影响，放射性物质发生泄漏。R本人自食恶果不足惜，却给世界环境...
茶点故事阅读 39,261评论 3赞 307
男人毒药：我在死后第九天来索命
文/蒙蒙一、第九天我趴在偏房一处隐蔽的房顶上张望。院中可真热闹，春花似锦、人声如沸。这庄子的主人今日做“春日...
开封第一讲书人阅读 30,264评论 0赞 19
一桩弑父案，背后竟有这般阴谋
文/苍兰香墨我抬头看了看天上的太阳。三九已至，却和暖如春，着一层夹袄步出监牢的瞬间，已是汗流浃背。一阵脚步声响...
开封第一讲书人阅读 31,486评论 1赞 262
情欲美人皮
我被黑心中介骗来泰国打工，没想到刚下飞机就差点儿被人妖公主榨干…… 1. 我叫王不留，地道东北人。一个月前我还...
沈念sama阅读 45,511评论 2赞 354
代替公主和亲
正文我出身青楼，却偏偏与公主长得像，于是被迫代替她去往敌国和亲。传闻我的和亲对象是个残疾皇子，可洞房花烛夜当晚...
茶点故事阅读 42,802评论 2赞 345

语音识别领域三十年来重要论文合集及其下载地址

推荐阅读更多精彩内容