The Evolution and Practical Application of Intelligent Speech Technology on ByteDance's Content Platforms

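{"type":"doc","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"AI is becoming a powerful tool for producing and distributing content. As information in different modalities such as speech, text, images, and video keeps multiplying, using AI as a “creative tool” will bring new changes to content production."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Take ByteDance as an example. ByteDance runs global content platforms whose content formats have moved through text-and-image, audio, and video stages. Along the way, internal demand for intelligent speech technology has kept growing, for example for audiobook production and, in short video, for content moderation, automatic subtitles, and dubbing."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Since ByteDance began investing heavily in intelligent speech technology in 2017, the technology has been widely applied across its internal business scenarios, including education, video, novels, customer service, hardware, music, office software, games, and advertising. Practice has shown that, as a new kind of production tool, intelligent speech technology can greatly raise productivity in AI content production and creation."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"At the "},{"type":"link","attrs":{"href":"https:\/\/aicon.infoq.cn\/2021\/beijing\/presentation\/3769","title":"xxx","type":null},"content":[{"type":"text","text":"AICon "}]},{"type":"text","text":"Global Artificial Intelligence and Machine Learning Conference (Beijing) 2021, to be held on November 5-6, ByteDance AI-Lab Intelligent Speech\/Speech Synthesis Leader 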
Dr. Yin Xiang will speak as a lecturer in the “Combining AI with the Industrial Internet” track."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"InfoQ had the opportunity to interview Dr. Yin Xiang in advance. He described in detail ByteDance's R&D progress and practical applications in intelligent speech technology, and shared his thinking on how intelligent speech empowers content production."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"The full conversation between InfoQ and Dr. Yin Xiang follows:"}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"ByteDance's Intelligent Speech Technology Strategy"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"blockquote","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"InfoQ: Hello, Dr. Yin, and thank you for the chance to interview you. To start, could you introduce yourself: when did you join ByteDance, and what are you mainly responsible for now?"}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"Yin Xiang:"},{"type":"text","text":" 
I joined ByteDance's "},{"type":"link","attrs":{"href":"https:\/\/mp.weixin.qq.com\/s?__biz=MzU1NDA4NjU2MA==&mid=2247493946&idx=1&sn=55301dee74e95aa8a8585aafcde5a10e&chksm=fbea50f5cc9dd9e322fede96c7e12f39d4fa5cbfe1d4bfb40ace91f3753232355c35be3b687e&scene=27#wechat_redirect","title":"xxx","type":null},"content":[{"type":"text","text":"AI Lab"}]},{"type":"text","text":" in 2018, where I lead the audio generation algorithm team. Our research covers speech synthesis, voice conversion, singing synthesis, and virtual avatars. The team's technology has shipped in Fanqie Novel, Dali Education, Jianying, customer service bots, Ting Toutiao, Game V, industry ToB offerings, and more."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"blockquote","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"InfoQ: Roughly when did ByteDance start investing in intelligent speech technology? Which scenarios drive the company's internal demand for it?"}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"Yin Xiang:"},{"type":"text","text":" ByteDance began investing heavily in intelligent speech technology at the end of 2017. Demand within the company comes mainly from content moderation, automatic subtitles, and dubbing in short video; meeting transcription in the office suite Feishu; the voice interaction pipeline for outbound customer service bots; spoken-language assessment in education; audio content generation for novels; speech enhancement on educational hardware; music deduplication and song recognition; and external ToB needs."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"blockquote","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"InfoQ: How does ByteDance position intelligent speech technology, and where does it sit in the company's overall AI strategy?"}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"Yin Xiang:"},{"type":"text","text":" 
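ByteDance runs global content platforms whose content formats have moved through text-and-image, audio, and video stages. How to understand, create, interact with, and distribute content efficiently presents both opportunities and challenges for AI technology. With continued advances in deep learning and computing power, intelligent speech technology has entered the end-to-end era and, drawing on massive data from rich scenarios, has markedly improved the accuracy of content understanding and the quality of content creation."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"One of the roles intelligent speech technology plays in the company's overall AI strategy is that of a content creation tool. For example, audiobook production built on natural language understanding, speech synthesis, and music generation can turn Fanqie Novel's vast catalog of web novels into audiobooks for users to listen to. In short video, it helps users create rich content through automatic subtitling, personalized dubbing, and filter effects."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"blockquote","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"InfoQ: Besides your team (AILab), which other teams inside ByteDance work on speech technology? What does each focus on, and how do they collaborate?"}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"Yin Xiang:"},{"type":"text","text":" ByteDance's product R&D and engineering architecture departments also do related research. AILab's intelligent speech group belongs to the AI middle platform, and its mission is to provide “broad and complete” technical support. For business units that need deeper cooperation, we assign team members as dedicated business partners (BP) to refine “focused and deep” solutions. Ultimately, our vision is to turn the AI middle platform's capabilities into customized solutions offered ToB. The speech teams in product R&D and engineering architecture need to concentrate on their own departments' business directions, grow with the business, and operate as part of the business unit (BU). On collaboration, shared capabilities are divided by business scenario; for differentiated capabilities, we assemble combined solutions according to each business unit's needs."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"blockquote","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"InfoQ: What important technical results has ByteDance achieved recently in intelligent speech?"}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"Yin Xiang:"},{"type":"text","text":" 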
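In speech recognition, using unsupervised pre-training plus a small amount of supervised data, we entered the international low-resource multilingual speech recognition challenge (MUCS21) and took second place in the multilingual speech recognition track. In music technology, we entered the MIREX2020 cover song identification competition and took first place, with mAP 8% ahead of the runner-up. In speech synthesis, we published ByteSing, the industry's first Chinese singing synthesis system built on a seq2seq pipeline, and built a seq2seq multi-task model for the Chinese text front end that is used in production."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"blockquote","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"InfoQ: The era of end-to-end speech recognition has arrived, and end-to-end recognition has become a hot research topic in academia and industry in recent years. How far along is ByteDance in researching and applying end-to-end recognition algorithms?"}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"Yin Xiang:"},{"type":"text","text":" We have done a good deal of original work on RNN-T, including speeding up RNN-T training and inference and several innovations that combine on-device and cloud processing, and the technology is now live in a variety of business scenarios. We are also building a next-generation end-to-end recognition framework and have made considerable progress."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"blockquote","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"InfoQ: What will ByteDance focus its speech research on next, and what are your team's plans?"}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"Yin Xiang:"},{"type":"text","text":" 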
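Take speech recognition and synthesis as examples. In speech recognition, our key research directions include using unsupervised pre-training to improve recognition accuracy for low-resource languages, scene classification and speech recognition that draw on multimodal information, and a "},{"type":"link","attrs":{"href":"https:\/\/www.infoq.cn\/article\/P1xdVuHOP7w81icp8rOD","title":"xxx","type":null},"content":[{"type":"text","text":"new-generation end-to-end recognition framework"}]},{"type":"text","text":". In speech synthesis, key directions include end-to-end joint modeling from text to waveform, cross-lingual voice cloning from small amounts of low-quality data, voice conversion for streaming live-broadcast scenarios, and multimodal perceptive virtual avatars. Our team's near-term priorities include multilingual video subtitles and dubbing, a multimodal voice interaction pipeline, and building an audio content production platform."}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"Intelligent Speech Widely Deployed Across ByteDance's Content Platforms"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"blockquote","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"InfoQ: In which scenarios inside ByteDance is your team's speech technology used today? And which scenarios outside the company?"}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"Yin Xiang:"},{"type":"text","text":" The team's technology is deployed in internal business scenarios including education, video, novels, customer service, hardware, music, office software, ToB, games, and advertising, consumed mainly as service calls or SDKs. For external scenarios, we offer services through the Volcano Engine console."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"blockquote","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"InfoQ: How do you evaluate how well intelligent speech technology performs in each scenario?"}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"Yin Xiang:"},{"type":"text","text":" 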
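We measure usage with purely technical metrics such as call volume and the duration of speech processed\/generated, and we measure impact with metrics broken out on the business side, such as DAU, retention, penetration duration, and efficiency gains."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"blockquote","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"InfoQ: As ByteDance's content platforms have evolved from text-and-image to audio to video, how do you see the importance of speech technology?"}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"Yin Xiang:"},{"type":"text","text":" As content formats keep evolving, speech technology can continue to help the platform screen out prohibited content by understanding speech and semantics. Combining machine and human review greatly improves moderation efficiency. At the same time, by understanding semantics and reconstructing speech\/image signals, it gives the platform rich content in different modalities for users to consume."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"blockquote","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"InfoQ: In audiobook synthesis, are there technical points that are hard to crack, and how did you address them? How does the final reading still fall short of a human narrator?"}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"Yin Xiang:"},{"type":"text","text":" 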
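In audiobook synthesis, the main difficulty is getting close to a human narrator, so that the synthesized audio conveys how different characters sound in different contexts. We use chapter-level understanding of the novel to convert the web novel into a script, labeling which character reads each line of dialogue and with what emotion, and then synthesize the audio with the matching voice and emotion. Compared with a human reader, the remaining gap is that the system cannot vary its style with the context and can only reproduce the single recording style of the voice corpus."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"blockquote","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"InfoQ: For audiobook synthesis, how many voices have you trained in total? Do you customize more personalized voices for the needs and interests of different (age) groups, or offer multi-character (role-based) expressive reading? Do you create different voices to suit different book genres?"}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"Yin Xiang:"},{"type":"text","text":" We have trained more than 30 voices for audiobook synthesis. At present, starting from the top novels that Fanqie Novel users like, we identify the characters in those books that readers care about most, and then map the books' characters to voices through a combination of machine and manual work, so that users can enjoy fitting multi-character, expressive readings."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"blockquote","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"InfoQ: Short video has been extremely popular in recent years. ByteDance's short video platforms alone have hundreds of millions of daily active users and produce an enormous number of short videos every day. For intelligent secondary creation on short videos, what exactly does speech technology do, and how have users responded?"}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"Yin Xiang:"},{"type":"text","text":" Speech technology handles subtitle insertion, text-to-speech dubbing, and template-based effects, making videos richer. This has done much to raise users' posting rate and has become an indispensable part of the video tools."}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"Intelligent Speech Empowers Content Production: 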
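Differentiation, Quality, Fast Iteration, and Low Cost Are the Future"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"blockquote","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"InfoQ: Compared with the earlier shift to video, we have now entered a “hyper-video” era in which more and more content is moving to video. What opportunities and challenges does this bring for applying intelligent speech technology in video scenarios?"}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"Yin Xiang:"},{"type":"text","text":" The opportunity is that we can offer a rich set of content creation tools to a broad user base. The challenge is that we need to understand users better and work out which features will spark their interest in creating."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"blockquote","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"InfoQ: In terms of research directions and real-world applications, what unique strengths does ByteDance's intelligent speech technology have?"}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"Yin Xiang:"},{"type":"text","text":" Our research directions have taken shape gradually from a combination of real deployment needs and frontier academic trends, all centered on how to land AI technology in real scenarios better, faster, and at lower cost. For deployment, we embed as business partners (BP) in different business units, align with their business metrics, and then break those down into technical metrics to track. As a result, the AI middle platform captures business gains directly, and business goals stay in sync with AI technology goals."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"blockquote","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"InfoQ: In your view, what trends will intelligent speech technology follow next in AI content production and creation?"}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"Yin Xiang:"},{"type":"text","text":" 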
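As a production tool, intelligent speech technology can greatly raise productivity in AI content production and creation. Going forward, the industry will keep developing around differentiation, quality, fast iteration, and low cost, using technical advances to keep pushing AI toward industrialization and scale."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"About the interviewee:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Dr. Yin Xiang is the Intelligent Speech\/Speech Synthesis Leader at ByteDance AI-Lab. He graduated in 2011 from the Department of Electronic Engineering and Information Science at the University of Science and Technology of China, and in 2016 received his Ph.D. from the university's National Engineering Laboratory for Speech and Language Information Processing, with research on neural network acoustic modeling for speech synthesis. He joined ByteDance's AI Lab in 2018, where he leads the audio generation algorithm team; its research directions include speech synthesis, voice conversion, singing synthesis, and virtual avatars. The team's technology has shipped in Fanqie Novel, Dali Education, Jianying, customer service bots, Ting Toutiao, Game V, industry ToB offerings, and more. He has published 13 papers at international speech conferences and in journals, and has more than 10 patents in China and abroad."}]}]}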