Reproducing the paper END-TO-END CODE-SWITCHED TTS WITH MIX OF MONOLINGUAL RECORDINGS: understanding the paper, the code, and the experimental results.

Show us the samples please? By the way, you had better change the mel loss function into MAE and watch the alignment again.
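For reference, the suggested swap is just L2 -> L1 on the mel outputs. A minimal TF1-style sketch (using the same tf.nn.*-era API as the notes below; mel_pred and mel_target are placeholder names, 80 mel bins assumed):

import tensorflow as tf

mel_pred = tf.placeholder(tf.float32, [None, None, 80])    # [batch, frames, n_mels], placeholder
mel_target = tf.placeholder(tf.float32, [None, None, 80])

mse_loss = tf.reduce_mean(tf.squared_difference(mel_pred, mel_target))  # the usual L2 mel loss
mae_loss = tf.reduce_mean(tf.abs(mel_pred - mel_target))                # MAE (L1), as suggested above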

These plots show that BahdanauMonotonic Attention is better.

What are the advantages of Location Sensitive Attention?

Maybe it is better to let the network learn without any monotonic pressure. However, https://arxiv.org/abs/1803.09047 claims to use GMM attention on Tacotron and obtains better results, especially for longer sequences.

 

do you have a change related to guided attention?

I am thinking of using phone duration information to generate the guided attention for training; right, it should only provide a "reference value" and need not be trusted completely. Design the network around that.
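A minimal NumPy sketch of that idea, in the spirit of the guided-attention loss; the function name, the width g, and the centre-of-phone construction are my own choices, and durations is assumed to hold per-phone durations in frames:

import numpy as np

def duration_guided_mask(durations, n_frames=None, g=0.2):
    # Penalty mask [n_phones, n_frames]: cells far from the duration-implied
    # path get weight near 1, cells on the path get weight near 0, so the
    # durations act only as a soft reference, not a hard constraint.
    durations = np.asarray(durations, dtype=np.float64)
    if n_frames is None:
        n_frames = int(durations.sum())
    ends = np.cumsum(durations)
    centers = (ends - durations / 2.0) / ends[-1]           # expected position of each phone, in [0, 1]
    t = np.arange(n_frames, dtype=np.float64) / n_frames    # decoder-frame positions, in [0, 1]
    return 1.0 - np.exp(-((centers[:, None] - t[None, :]) ** 2) / (2.0 * g ** 2))

# Loss term: (mask * alignment).mean(), with alignment shaped [n_phones, n_frames]
# (transpose first if your alignments come out as [frames, phones]).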

 

can you provide the code for the GMM attention? I cannot find a working version that gives good alignments anywhere.

I don't have it anymore either. I totally ditched it. You can pick that out from the "voice loop" repo.
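Since a working reference is hard to find, here is a minimal NumPy sketch of one step of Graves-style GMM attention (the mechanism the paper above builds on); every name here is mine, and the voice-loop code will differ in details:

import numpy as np

def gmm_attention_step(query_params, mu_prev, memory):
    # query_params: [3*K] raw outputs of a linear layer on the decoder state.
    # mu_prev:      [K] previous component means; they only ever move forward.
    # memory:       [T, D] encoder outputs.
    omega_hat, delta_hat, sigma_hat = np.split(query_params, 3)
    omega = np.exp(omega_hat)                  # mixture weights (unnormalised)
    delta = np.exp(delta_hat)                  # positive step size => monotonic drift
    sigma = np.exp(sigma_hat)                  # component widths
    mu = mu_prev + delta                       # means advance along the input
    pos = np.arange(memory.shape[0])[None, :]  # [1, T] encoder positions
    phi = omega[:, None] * np.exp(-((pos - mu[:, None]) ** 2) / (2.0 * sigma[:, None] ** 2))
    alpha = phi.sum(axis=0)
    alpha = alpha / (alpha.sum() + 1e-8)       # some variants skip this normalisation
    context = alpha @ memory                   # [D] context vector
    return context, alpha, mu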

FORWARD ATTENTION IN SEQUENCE-TO-SEQUENCE ACOUSTIC MODELING FOR SPEECH SYNTHESIS

https://github.com/geneing/WaveRNN-Pytorch   Fast WaveRNN

https://github.com/mozilla/TTS/blob/master/notebooks/Benchmark.ipynb

“Online and Linear-Time Attention by Enforcing Monotonic Alignments”

In machine learning, is there any work on adding a prior to the attention mechanism, or any special initialization method for it?

As the title asks: in some problems the attention follows a fairly obvious pattern, e.g. in machine translation some language pairs have essentially the same word order. In such cases, can we give the attention read/write head a suitable prior so that the network converges faster?

Answering my own question, because today I came across a paper that has been accepted at ICML 2017:

Online and Linear-Time Attention by Enforcing Monotonic Alignments

Search for this title; you should be able to find the corresponding architecture. (1)

The gist: use a coin flip to decide whether to keep moving forward, and pick only one encoder state as the context at each step, so the attention makes a single front-to-back pass over the input.
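A test-time sketch of that coin-flip procedure (training instead uses the expected value of this process, computed in closed form; names here are mine):

import numpy as np

def hard_monotonic_attend(energies, start_idx, rng=np.random):
    # energies:  [T] monotonic-attention energies for the current decoder step.
    # start_idx: where the previous step stopped; attention never moves back.
    for i in range(start_idx, len(energies)):
        p = 1.0 / (1.0 + np.exp(-energies[i]))  # sigmoid turns the energy into a coin bias
        if rng.random() < p:                    # flip the coin: stop and attend here?
            return i                            # this single encoder state is the context
    return len(energies) - 1                    # fell off the end: attend the last state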

Use an approximation first:

Attention comes in content-based and location-based flavours; I think location-based is very close to the prior you are describing.

Reference: http://papers.nips.cc/paper/58
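For concreteness, a NumPy sketch of the energy computation in location-sensitive attention (Chorowski et al.), where convolved features of the previous alignment play exactly the role of such a prior; all weight names are placeholders:

import numpy as np

def location_sensitive_align(query, memory, prev_align, W, V, U, conv_filters, v):
    # e_j = v^T tanh(W s_t + V h_j + U f_j), with f = conv(prev_align).
    # query: [dq], memory: [T, dh], prev_align: [T], conv_filters: [k, n_filters],
    # W: [dq, da], V: [dh, da], U: [n_filters, da], v: [da].
    k = conv_filters.shape[0]
    padded = np.pad(prev_align, (k // 2, k // 2))
    f = np.stack([padded[j:j + k] @ conv_filters for j in range(len(prev_align))])  # [T, n_filters]
    e = np.tanh(query @ W + memory @ V + f @ U) @ v   # [T] energies
    alpha = np.exp(e - e.max())
    return alpha / alpha.sum()                        # new alignment alpha_t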

Start writing the code: LDE

determined by the language boundary information in the CS text. 

performing discriminative code lookup: for the speaker id, implement an approximation first. Is there a way to differentiate the initialization or the lookup?

 

This design enables the generated speech to stay in a single speaker's voice. The language embedding and discriminative embedding are jointly learned with the model by back-propagation. This is also a good entry point.

 

The discriminative embedding is obtained by performing discriminative code lookup, and is concatenated with the previous time-step decoder output and context information before being sent to the decoder RNN. On this point the original paper differs from the common understanding: this version of the code runs the original Tacotron-2, not Microsoft's reading of Tacotron-2.
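A TF1-style sketch of that sentence as I read it; the table size, the dimensions, and all variable names are my own placeholders:

import tensorflow as tf

n_codes, code_dim = 2, 16   # e.g. one discriminative code per monolingual corpus
code_table = tf.get_variable("discriminative_codes", [n_codes, code_dim])

code_ids = tf.placeholder(tf.int32, [None])           # [batch] which code to look up
prev_output = tf.placeholder(tf.float32, [None, 80])  # previous time-step decoder output
context = tf.placeholder(tf.float32, [None, 512])     # attention context

disc_emb = tf.nn.embedding_lookup(code_table, code_ids)  # the "discriminative code lookup"
decoder_rnn_input = tf.concat([prev_output, context, disc_emb], axis=-1)
# decoder_rnn_input then feeds the decoder RNN cell; the table trains jointly
# with the rest of the model by back-propagation, as the paper states.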

 

https://www.tensorflow.org/api_docs/python/tf/nn/bidirectional_dynamic_rnn  The paper does not spell this out; I concatenated things in according to my own understanding, and in fact there was also a bug at init time. (2)
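For the record, the reading I implemented (a sketch; whether the concatenation really belongs at the encoder input is exactly the ambiguity noted above):

import tensorflow as tf

inputs = tf.placeholder(tf.float32, [None, None, 512])   # [batch, T, d] token embeddings
lang_emb = tf.placeholder(tf.float32, [None, None, 16])  # [batch, T, d] per-token language embedding
seq_len = tf.placeholder(tf.int32, [None])

encoder_in = tf.concat([inputs, lang_emb], axis=-1)      # concatenated per my own understanding
cell_fw = tf.nn.rnn_cell.LSTMCell(256)
cell_bw = tf.nn.rnn_cell.LSTMCell(256)
(out_fw, out_bw), _ = tf.nn.bidirectional_dynamic_rnn(
    cell_fw, cell_bw, encoder_in, sequence_length=seq_len, dtype=tf.float32)
encoder_outputs = tf.concat([out_fw, out_bw], axis=-1)   # [batch, T, 512]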

The difference between np.zeros() and a Python list kept causing errors: return array(a, dtype, copy=False, order=order) ValueError: setting an array element with a sequence.
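That ValueError is what NumPy raises when a ragged (unequal-length) list is forced into a single array; a small repro and the usual padded-buffer fix:

import numpy as np

rows = [[1, 2, 3], [4, 5]]               # rows of different lengths
# np.array(rows, dtype=np.float32)       # -> ValueError: setting an array element with a sequence.

max_len = max(len(r) for r in rows)
padded = np.zeros((len(rows), max_len), dtype=np.float32)  # allocate a rectangular buffer
for i, r in enumerate(rows):
    padded[i, :len(r)] = r               # copy each row in; the tail stays zero
print(padded)                            # [[1. 2. 3.] [4. 5. 0.]]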


It feels like a decoder step is missing!!! Don't change it for now; wait for the results, then change it. I can't make sense of it; maybe it isn't actually wrong. (3)

In the file Architecture_wrappers.py:

 

https://github.com/begeekmyfriend?tab=repositories  Dig into other people's work.

https://github.com/fatchord?tab=repositories  And his as well.

https://github.com/r9y9/gantts  Another path besides VAE.


Tacotron: Advanced attention module (e.g. Monotonic attention) #13

https://github.com/mozilla/TTS/issues/13

https://github.com/mozilla/TTS


Guided Attention Loss #346

https://github.com/Rayhane-mamah/Tacotron-2/issues/346


http://itjcc.com/1172/html  Cracked UltraEdit 26. I will definitely pay for it once I have a salary.

https://blog.csdn.net/xiliuhu/article/details/5757305  Getting multiple windows in UltraEdit.

Collect statistics on how often the attention is monotonic vs. non-monotonic when left unconstrained, and only then add the monotonicity requirement. These are really two separate paths, and both are defensible; in the meantime, use si-monotonicity to guide it rather than change it. The counting step is sketched below.
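A minimal sketch of that statistic over a batch of alignment matrices; the argmax-path criterion and the tolerance are my own working definition of "monotonic":

import numpy as np

def is_monotonic(alignment, tolerance=0):
    # alignment: [decoder_steps, encoder_steps]. Monotonic if the argmax
    # position never moves backwards by more than `tolerance` encoder steps.
    path = alignment.argmax(axis=-1)
    return bool(np.all(np.diff(path) >= -tolerance))

def monotonicity_stats(alignments):
    flags = [is_monotonic(a) for a in alignments]
    return sum(flags), len(flags) - sum(flags)   # (#monotonic, #non-monotonic)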

 

Build the training datasets and scripts based on LJSpeech-1.1 and Biaobei (標貝):

1. grapheme
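A hypothetical merging script for the grapheme-level data; the paths, the column layout, and especially the Biaobei metadata format are assumptions:

import os

def collect(metadata_path, wav_dir, lang, sep='|', text_col=2):
    # LJSpeech's metadata.csv is 'id|raw text|normalized text'; Biaobei is
    # assumed to have been converted to the same layout beforehand.
    items = []
    with open(metadata_path, encoding='utf-8') as f:
        for line in f:
            parts = line.rstrip('\n').split(sep)
            wav = os.path.join(wav_dir, parts[0] + '.wav')
            items.append((wav, parts[text_col], lang))
    return items

rows = (collect('LJSpeech-1.1/metadata.csv', 'LJSpeech-1.1/wavs', 'en') +
        collect('BZNSYP/metadata.csv', 'BZNSYP/wavs', 'zh'))
with open('train.txt', 'w', encoding='utf-8') as f:
    for wav, text, lang in rows:
        f.write('%s|%s|%s\n' % (wav, text, lang))   # one line per utterance, with a language tag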


Whether to rescale audio prior to preprocessing. I can't figure this parameter out.

rescale = False, #Whether to rescale audio prior to preprocessing

 

#M-AILABS (and other datasets) trim params
    trim_fft_size = 512,
    trim_hop_size = 128,
    trim_top_db = 60,

I don't understand these either.
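As far as I can tell, these hparams map onto a peak-normalisation step and librosa.effects.trim; a sketch of that understanding (rescaling_max and the file name are placeholders):

import librosa
import numpy as np

wav, sr = librosa.load('sample.wav', sr=22050)

# rescale: normalise the peak amplitude before any other preprocessing.
rescaling_max = 0.999
wav = wav / np.abs(wav).max() * rescaling_max

# trim params: strip leading/trailing silence; anything more than
# trim_top_db below the peak counts as silence, analysed with an FFT
# window of trim_fft_size and hop of trim_hop_size.
wav_trimmed, _ = librosa.effects.trim(wav, top_db=60, frame_length=512, hop_length=128)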

How to use sox:

https://blog.csdn.net/centnetHY/article/details/88571352

Batch_Size = 32 => 16, because of insufficient memory.

watch -n 10 nvidia-smi

As for SPE, the code is easy to write:

All that's left is organizing the data and the experimental results, and putting up a web demo.
