Amplitude normalization of speech data for speech processing tasks such as speech emotion recognition and speech recognition: how to normalize the amplitude of audio with Python

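I searched online and could not find a reliable method. One article mentions a tool called do_pcm, but I could not find anything else about it anywhere. So here is a way to normalize amplitude by calling pydub.effects.normalize, for the benefit of later readers.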

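Main idea
Take the sample with the largest amplitude in an utterance and scale it up to close to 1 (full scale), record the scaling ratio, then scale every other sample by that same ratio.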
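
The idea above can be sketched in a few lines of plain Python. This is only an illustration on a list of floating-point samples in the range [-1, 1]; the function name peak_normalize and the target_peak parameter are made up for this sketch and are not part of pydub.

```python
def peak_normalize(samples, target_peak=1.0):
    """Scale samples so the largest absolute value equals target_peak."""
    peak = max((abs(s) for s in samples), default=0.0)
    if peak == 0:
        # Silent (or empty) input: there is nothing to scale.
        return list(samples)
    scale = target_peak / peak  # the single ratio applied to every sample
    return [s * scale for s in samples]


quiet = [0.0, 0.1, -0.25, 0.2]
loud = peak_normalize(quiet, target_peak=0.99)
print(loud)
```

Because every sample is multiplied by the same ratio, the shape of the waveform is unchanged; only its overall level moves.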
pydub.effects.normalize source code

# Excerpt from pydub/effects.py. The helpers used here live in pydub.utils.
from pydub.utils import db_to_float, ratio_to_db, register_pydub_effect


@register_pydub_effect
def normalize(seg, headroom=0.1):  # seg is a pydub.audio_segment.AudioSegment
    """
    headroom is how close to the maximum volume to boost the signal up to (specified in dB)
    """
    peak_sample_val = seg.max  # largest absolute sample value in the segment

    # if the max is 0, this audio segment is silent, and can't be normalized
    if peak_sample_val == 0:
        return seg

    # target peak = largest possible amplitude for this sample width,
    # scaled by the headroom converted from dB to a linear ratio
    target_peak = seg.max_possible_amplitude * db_to_float(-headroom)

    # target_peak / peak_sample_val is the linear scaling ratio;
    # ratio_to_db converts that ratio to dB
    needed_boost = ratio_to_db(target_peak / peak_sample_val)
    return seg.apply_gain(needed_boost)  # apply that gain to the whole segment
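
To make the dB arithmetic in normalize concrete, here is a small worked example using only the standard library. The two helper functions are stand-ins written for this sketch (they use the usual amplitude convention of 20 * log10 and are not imported from pydub), and the numbers are assumptions: 16-bit audio with a full scale of 32768 and a hypothetical clip whose loudest sample is 8192.

```python
import math


def db_to_ratio(db):
    """Convert an amplitude change in dB to a linear ratio."""
    return 10 ** (db / 20)


def ratio_to_db_value(ratio):
    """Convert a linear amplitude ratio to dB."""
    return 20 * math.log10(ratio)


max_possible = 32768    # full scale assumed for 16-bit audio
peak_sample_val = 8192  # hypothetical loudest sample in the clip
headroom = 0.1          # dB, the default used by normalize()

# Leave `headroom` dB of space below full scale.
target_peak = max_possible * db_to_ratio(-headroom)

# Gain, in dB, that lifts the current peak to the target peak.
needed_boost = ratio_to_db_value(target_peak / peak_sample_val)

print(round(target_peak, 1))
print(round(needed_boost, 2))
```

With these numbers the peak sits at a quarter of full scale, so the boost works out to roughly 12.04 dB minus the 0.1 dB of headroom, or about 11.94 dB.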

Calling pydub.effects.normalize from Python to normalize amplitude

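from pydub import AudioSegment, effects

_sound = AudioSegment.from_file("./input.wav", "wav")  # load the original recording
sound = effects.normalize(_sound)  # boost so the peak sits just below full scale
sound.export("./output.wav", format="wav")  # write the normalized audio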

Results

[Figure: input.wav before normalization]

[Figure: output.wav after normalization]
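
The effect can also be checked numerically instead of by eye. The sketch below measures the peak level of a 16-bit PCM WAV file in dBFS using only the standard library's wave and array modules rather than pydub. The names peak_dbfs_of_wav and write_tone are hypothetical helpers for this sketch, and the generated tone is only a test signal so the example runs without any audio files.

```python
import array
import math
import os
import sys
import tempfile
import wave


def peak_dbfs_of_wav(path):
    """Peak level of a 16-bit PCM WAV file relative to full scale, in dBFS."""
    with wave.open(path, "rb") as wav:
        if wav.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM")
        samples = array.array("h", wav.readframes(wav.getnframes()))
    if sys.byteorder == "big":
        samples.byteswap()  # WAV sample data is little-endian
    peak = max((abs(s) for s in samples), default=0)
    return -math.inf if peak == 0 else 20 * math.log10(peak / 32768)


def write_tone(path, amplitude, rate=16000, seconds=0.1, freq=440.0):
    """Write a mono 16-bit sine tone to use as a test signal."""
    count = int(rate * seconds)
    samples = array.array(
        "h",
        (round(amplitude * math.sin(2 * math.pi * freq * i / rate))
         for i in range(count)),
    )
    if sys.byteorder == "big":
        samples.byteswap()
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(rate)
        wav.writeframes(samples.tobytes())


with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "tone.wav")
    write_tone(demo_path, amplitude=3277)  # about 10% of full scale
    demo_level = peak_dbfs_of_wav(demo_path)

print(round(demo_level, 1))
```

A tone at about 10% of full scale measures close to -20 dBFS. Running peak_dbfs_of_wav on ./input.wav and ./output.wav from the example above should show the output peak close to -0.1 dBFS, matching the default headroom, assuming both files are 16-bit PCM.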
