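I've been hooked on voice wake-up (wake-word detection) lately, so I took the chance to survey what academia has published on it. I may release a survey report on the topic later.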

Entries with a link have source code available.

Organized chronologically.

Part 1: from arXiv

Papers from 2018 to 2020 returned by searching arXiv for the keyword "Small-footprint Keyword Spotting"

arXiv:2002.10851
Small-Footprint Open-Vocabulary Keyword Spotting with Quantized LSTM Networks

arXiv:1912.07575 cs.CL cs.LG
Predicting detection filters for small footprint open-vocabulary keyword spotting

arXiv:1912.05124 cs.SD cs.CL cs.LG eess.AS
Small-footprint Keyword Spotting with Graph Convolutional Network

arXiv:1911.02086 eess.AS cs.CL cs.SD
Small-Footprint Keyword Spotting on Raw Audio Data with Sinc-Convolutions

https://paperswithcode.com/paper/small-footprint-keyword-spotting-on-raw-audio

arXiv:1910.05171 cs.LG cs.CL eess.AS stat.ML
Query-by-example on-device keyword spotting

arXiv:1907.01448 eess.AS cs.SD
Sub-band Convolutional Neural Networks for Small-footprint Spoken Term Classification

arXiv:1906.09417 cs.SD cs.HC cs.LG eess.AS
Keyword Spotting for Hearing Assistive Devices Robust to External Speakers

arXiv:1906.08415 cs.SD cs.LG cs.MM eess.AS
A Monaural Speech Enhancement Method for Robust Small-Footprint Keyword Spotting

arXiv:1811.07684 cs.LG cs.CL cs.SD eess.AS stat.ML
Efficient keyword spotting using dilated convolutions and gating

https://paperswithcode.com/paper/efficient-keyword-spotting-using-dilated

arXiv:1811.00348 cs.SD eess.AS
Sequence-to-sequence Models for Small-Footprint Keyword Spotting

arXiv:1803.10916 cs.SD cs.CL eess.AS
Attention-based End-to-End Models for Small-Footprint Keyword Spotting

Part 2

Collected from Zhihu, papers, and Jianshu

2019

Temporal Convolution for Real-time Keyword Spotting on Mobile Devices
- https://paperswithcode.com/paper/temporal-convolution-for-real-time-keyword

2018

Shan, et al., “Attention-based end-to-end models for small-footprint keyword spotting”, Interspeech, 2018.
- Attention mechanism
Zhang H, Zhang J, Wang Y. Sequence-to-sequence models for small-footprint keyword spotting[J]. arXiv preprint arXiv:1811.00348, 2018.
- A sequence-to-sequence wake-word recognition model
Deep residual learning for small-footprint keyword spotting[C]. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, Calgary, AB, Canada, Apr. 15-20, 2018: 5484-5488.
- https://paperswithcode.com/paper/deep-residual-learning-for-small-footprint
- A wake-word recognition method using deep residual learning and dilated convolutions
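
The dilated convolutions noted for the deep residual learning entry above widen the receptive field without adding weights. Below is a minimal standard-library sketch of a 1-D dilated convolution and of how the receptive field grows when layers are stacked. The function names and the example dilation schedule are assumptions for illustration, not the paper's actual architecture, and residual connections are left out.

```python
def dilated_conv1d(signal, kernel, dilation):
    """Valid-mode 1-D dilated convolution.

    Implemented as cross-correlation (no kernel flip), which is the
    convention most deep learning frameworks use.
    """
    # Distance between the first and last input sample touched by one output.
    span = (len(kernel) - 1) * dilation
    return [
        sum(k * signal[t + i * dilation] for i, k in enumerate(kernel))
        for t in range(len(signal) - span)
    ]


def receptive_field(kernel_size, dilations):
    """Receptive field (in frames) of stacked dilated conv layers, stride 1."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)


if __name__ == "__main__":
    # With dilation 2, each output sums samples that are two steps apart.
    print(dilated_conv1d([1, 2, 3, 4, 5], [1, 1], dilation=2))
    # Hypothetical schedule: four layers, kernel size 3, dilations doubling.
    print(receptive_field(3, [1, 2, 4, 8]))
```

Doubling the dilation per layer makes the receptive field grow exponentially with depth while the parameter count grows only linearly, which is why the technique suits small-footprint models.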

2017

Audhkhasi, et al., “End-to-end ASR-free keyword search from speech”, ICASSP, 2017.
- Uses a CRNN language model to encode the wake word into an embedding vector.
Honk: A PyTorch Reimplementation of Convolutional Neural Networks for Keyword Spotting
- https://paperswithcode.com/paper/honk-a-pytorch-reimplementation-of
He, et al., “Streaming small-footprint keyword spotting using sequence-to-sequence models”, ASRU, 2017.
- An RNN-based sequence-to-sequence wake-word model trained end to end
Arık, et al., “Convolutional recurrent neural networks for small-footprint keyword spotting”, arXiv:1703.05390. Baidu.
- A CRNN-based wake-word recognition method
Hello Edge: Keyword Spotting on Microcontrollers
- https://paperswithcode.com/paper/hello-edge-keyword-spotting-on
F. Ge and Y. Yan, “Deep neural network based wake-up-word speech recognition with two-stage detection”, ICASSP, 2017.
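- Fixed-length embedding vectors, in sequence form
- A DNN-based wake-word recognition system with two-stage detection
Compressed time delay neural network for small-footprint keyword spotting, Interspeech, 2017.
- Aims to address the search latency and low-order characteristics that come with DNNs
- Improves the DNN with low-rank weight matrices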
Kumatani, et al., “Direct modeling of raw audio with DNNs for wake word detection”, ASRU, 2017.
- Extracts MFCC features and trains a DNN on them; similar to Guoguo Chen's 2014 work
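
The low-rank weight matrices noted for the compressed time delay neural network entry above can be illustrated in general terms: a dense layer with an `out_dim x in_dim` weight matrix is replaced by the product of two thinner matrices. This is a generic sketch of that idea using plain Python lists; the helper names and dimensions are hypothetical, and it is not claimed to be the paper's exact compression scheme.

```python
def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def low_rank_forward(a, b, x):
    """Compute (A @ B) @ x as A @ (B @ x), never forming the full A @ B.

    a has shape (out_dim, rank) and b has shape (rank, in_dim).
    """
    return matvec(a, matvec(b, x))


def parameter_counts(out_dim, in_dim, rank):
    """Return (full, factored) weight counts for one layer."""
    full = out_dim * in_dim
    factored = rank * (out_dim + in_dim)
    return full, factored


if __name__ == "__main__":
    a = [[1.0], [2.0]]        # shape (2, 1)
    b = [[1.0, 0.0, 1.0]]     # shape (1, 3)
    print(low_rank_forward(a, b, [3.0, 4.0, 5.0]))
    # Hypothetical layer sizes: 128 x 128 weights factored at rank 16.
    print(parameter_counts(128, 128, 16))
```

The factored layer saves parameters and multiply-adds whenever `rank * (out_dim + in_dim)` is smaller than `out_dim * in_dim`.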

2016

Sun M, Raju A, Tucker G, et al. Max-pooling loss training of long short-term memory networks for small-footprint keyword spotting[C]. IEEE Spoken Language Technology Workshop (SLT). IEEE, San Diego, CA, USA, Dec. 13-16, 2016: 474-480.
- Evaluates wake-word recognition performance using posterior smoothing
- Trains an LSTM network with a max-pooling loss function
“Investigating neural network based query-by-example keyword spotting approach for personalized wake-up word detection in Mandarin Chinese”, Int’l Symposium on Chinese Spoken Language Processing, 2016.
- Proposes template matching: an LSTM extracts features, producing fixed-length feature vectors
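
The posterior smoothing and max-pooling loss noted for the Sun et al. entry above can be sketched roughly as follows. This is a simplified, binary (keyword vs. background) illustration using only the standard library. It assumes the loss uses only the highest-scoring frame of a keyword segment and every frame of a background segment; the function names, the `window` parameter, and the `eps` clipping are assumptions, not the paper's code.

```python
import math


def smooth_posteriors(posteriors, window):
    """Trailing moving average over per-frame keyword posteriors."""
    smoothed = []
    for t in range(len(posteriors)):
        start = max(0, t - window + 1)
        frames = posteriors[start:t + 1]
        smoothed.append(sum(frames) / len(frames))
    return smoothed


def max_pooling_loss(keyword_posteriors, is_keyword, eps=1e-7):
    """Simplified max-pooling cross-entropy for one segment.

    Keyword segment: only the frame with the highest keyword posterior
    contributes. Background segment: every frame contributes.
    """
    # Clip to avoid log(0).
    clipped = [min(max(p, eps), 1.0 - eps) for p in keyword_posteriors]
    if is_keyword:
        return -math.log(max(clipped))
    return -sum(math.log(1.0 - p) for p in clipped)


if __name__ == "__main__":
    print(smooth_posteriors([0.0, 1.0, 0.0, 1.0], window=2))
    print(max_pooling_loss([0.1, 0.9, 0.5], is_keyword=True))
    print(max_pooling_loss([0.5, 0.5], is_keyword=False))
```

Smoothing suppresses single-frame spikes before thresholding, while the max-pooling loss lets the network choose which frame of a keyword segment should fire instead of forcing every frame to score high.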

2015

T. N. Sainath and C. Parada, “Convolutional neural networks for small-footprint keyword spotting”, Interspeech, 2015.
- A CNN-based wake-word recognition method
Chen, et al., “Query-by-example keyword spotting using long short-term memory networks”, ICASSP, 2015.
- First extracts features with a neural network, then uses dynamic time warping to decide whether the wake word is present

2014

G. Chen, et al., “Small-footprint keyword spotting using deep neural networks”, ICASSP, 2014.
- The classic DNN paper by Guoguo Chen; a must-read

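Other: anything earlier is traditional (pre-deep-learning) work, which I don't recommend reading for now.

2006: the concepts of the wake word and wake-word recognition are proposed

2009: research on prosodic features

HMMs train the acoustic model, and an SVM classifies whether the input is a wake word

Dynamic time warping algorithm

Template matching, distance measurement

Wake-word detection with microphone arrays

2014: development of wake-word recognition systems for embedded platforms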
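
The dynamic time warping and template-matching approaches listed above can be combined into a small sketch: a stored template of the wake word is aligned against an incoming feature sequence, and the alignment cost serves as the distance. The Euclidean frame distance, the length normalization, and the `threshold` parameter are assumptions chosen for illustration, not taken from any of the cited systems.

```python
import math


def dtw_distance(template, query):
    """Dynamic time warping cost between two sequences of feature vectors."""
    n, m = len(template), len(query)
    inf = float("inf")
    # cost[i][j]: best cumulative cost aligning the first i template frames
    # with the first j query frames.
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = math.dist(template[i - 1], query[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # template frame advances alone
                cost[i][j - 1],      # query frame advances alone
                cost[i - 1][j - 1],  # both advance together
            )
    return cost[n][m]


def is_wake_word(template, query, threshold):
    """Accept when the length-normalized DTW cost is within the threshold."""
    normalized = dtw_distance(template, query) / (len(template) + len(query))
    return normalized <= threshold


if __name__ == "__main__":
    template = [[0.0], [1.0], [2.0]]
    stretched = [[0.0], [0.0], [1.0], [2.0], [2.0]]  # same shape, spoken slower
    other = [[5.0], [5.0], [5.0]]
    print(dtw_distance(template, stretched))
    print(dtw_distance(template, other))
    print(is_wake_word(template, stretched, threshold=0.1))
```

Because the alignment may repeat frames, a slower or faster utterance of the same word still matches its template, which is what made DTW attractive before neural approaches took over.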