Facebook AI proposes a new method to improve cross-lingual transfer learning for end-to-end speech recognition.

Transfer learning from a high-resource language is an effective way to improve end-to-end automatic speech recognition (ASR) for low-resource languages. However, the pre-trained encoder/decoder model does not share a language model with the target language, which makes it a poor fit for a foreign target language. To further absorb knowledge of the target language and ease the transition to it, speech-to-text translation (ST) is introduced as an auxiliary task.

The method uses speech-to-text translation as an intermediate step to improve cross-lingual (high-resource to low-resource) transfer learning for end-to-end ASR. It turns the transfer into a two-step process, which improves model performance.

At present, the method is built on an attention-based encoder/decoder architecture, but the team intends to extend this transfer-learning approach to other end-to-end architectures such as CTC and the RNN Transducer.

[Image] https://static001.infoq.cn/resource/image/a0/cf/a05581b794875b1b36a1497c68d9cbcf.jpg

Unlike previous approaches that rely on translated text data, this method requires no modification to the ASR model architecture: the speech-to-text translation model and the target ASR model share the same attention-based encoder/decoder architecture and vocabulary. Transcripts from the high-resource ASR corpus are translated into the low-resource target language to train the ST model.

Instead of using text-to-text translation data for ST training, the method uses speech-to-text data, which avoids adapting the encoder across the text/speech modality gap. It relies only on machine-translation (MT) pseudo-labels to train ST and needs no high-resource MT training data.

This shows that training ST on human translations is not necessary, since training on MT pseudo-labels yields comparable results. Doing so overcomes the scarcity of real ST data and consistently benefits transfer learning.

Speech translation trained on pseudo-labels

Word-level or sequence-level knowledge distillation (KD) reduces noise and simplifies the data distribution of the training set, which helps in training MT and ST models. Training end-to-end ST models is challenging because they must learn acoustic modeling, language modeling, and alignment at the same time; moreover, ST labels are more expensive to obtain.

The limited size and language coverage of existing ST corpora make ST training harder still. The team therefore proposes pseudo-labeling ASR corpora with machine translation, which provides a larger and more diverse dataset for training ST. An ST model trained on MT pseudo-labels can be viewed as the student in a sequence-level knowledge-distillation process. Although pseudo-labels may cap the model's quality, real labels are harder to learn from, and pseudo-labels are an easier target.

Experiments also show that ST models trained on pseudo-labels perform better than those trained on real labels. MT pseudo-labels further simplify ST training and allow beam-searching multiple label candidates to reduce overfitting.

[Image] https://static001.infoq.cn/resource/image/bd/0a/bdc66fe611be673db353058718c1b10a.jpg

Pre-training ASR for speech translation

Rather than pre-training the target (low-resource) ASR directly on the (multilingual) source (high-resource) ASR, the target ASR is pre-trained on source-to-target speech-to-text translation. That ST model is itself pre-trained on the source ASR and trained with MT pseudo-labels on the source ASR data.

This two-step approach helps decouple the transfer of language modeling (the decoder) from acoustic modeling (the encoder), making transfer learning smoother and more effective. Pre-training ST with ASR warm-starts the acoustic model, so ST training can focus on learning language modeling and alignment. The ST model also exploits additional target-language data (the MT pseudo-labels) and therefore models the target language better.

The ASR and ST models use the same model architecture to ease the transfer: ASR_Source → ST_Source-Target and ST_Source-Target → ASR_Target.

Paper: https://arxiv.org/abs/2006.05474

About the author:

Tanushree Shenwai is a consulting intern at MarktechPost. She is currently pursuing her B.Tech at the Indian Institute of Technology, Bhubaneswar. A data-science enthusiast, she has a keen interest in the applications of artificial intelligence across domains and is passionate about exploring new technological advances and their real-life applications.

Original article:

https://www.marktechpost.com/2020/11/02/facebook-ai-proposes-new-method-to-improve-cross-lingual-transfer-learning-for-end-to-end-speech-recognition-with-speech-translation
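To make the two-step recipe concrete, here is a minimal, hypothetical Python sketch (not Facebook AI's actual code). Models are reduced to dicts of named weight blocks, the MT system is a stand-in function, and "training" is simulated, since the point is only which stage warm-starts which.

```python
# Hypothetical sketch of the two-step transfer:
# ASR_Source -> ST_Source-Target -> ASR_Target.
# Models are dicts of named weight blocks; training is simulated.

def new_model():
    """An attention-based encoder/decoder, reduced to two weight blocks."""
    return {"encoder": "random_init", "decoder": "random_init"}

def warm_start(parent):
    """Initialize a new model of the same architecture from a trained parent."""
    return dict(parent)

def pseudo_label(transcripts, translate):
    """Machine-translate source ASR transcripts into the target language,
    producing pseudo-labels for speech-to-text translation training."""
    return [translate(t) for t in transcripts]

# Step 0: source ASR is trained on high-resource speech (simulated here).
asr_source = {"encoder": "trained_on_source_speech",
              "decoder": "trained_on_source_text"}

# Step 1: pre-train ST from the source ASR. Labels come from an MT system
# (a stand-in lambda here), so no human speech-translation data is needed.
toy_mt = lambda text: f"target({text})"
st_labels = pseudo_label(["hello world"], toy_mt)
st_model = warm_start(asr_source)                 # acoustic encoder warm-started
st_model["decoder"] = "trained_on_target_pseudo_labels"

# Step 2: fine-tune the target ASR from the ST model, inheriting both the
# acoustic encoder and the target-language decoder.
asr_target = warm_start(st_model)

print(st_labels)
print(asr_target)
```

The sketch shows why the decoupling helps: the encoder only ever transfers between speech tasks, while the decoder picks up the target language from the pseudo-labeled ST stage before the target ASR is fine-tuned.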