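The Digital Form of Sound

Digital sound
Sound pressure and decibels
Frequency: how fast a sound varies
Reading, writing, and playing sound (wave) files
Playing a sound backwards
Playing pure tones: $\sin(t), \cos(t), \ldots$
Square wave
Combining sounds
Adding noise
Octaves
Producing a guitar-like sound with Karplus-Strong
Downsampling a sound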

Digital Sound

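A digital sound is a sequence $\pmb x = \{x_i\}_{i=0}^{N-1}$ obtained by recording a sound at a fixed rate $f_s$:
$x_k = f(k/f_s), \quad k = 0, 1, 2, \ldots, N-1$
$f_s$ : sampling rate (the number of times the sound signal is recorded per second)
$x_k$ : a sample
$T_s$ : sampling period, $T_s = 1/f_s$
bit rate : the number of bits used per second to record the samples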
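The definition above can be sketched in a few lines of NumPy. The helper name `sample`, the 1000 Hz test tone, and the values chosen for `f_s` and `N` are illustrative assumptions, not part of the original text.

```python
import numpy as np

def sample(f, f_s, N):
    """Return the samples x_k = f(k / f_s) for k = 0, ..., N-1."""
    k = np.arange(N)
    return f(k / f_s)

f_s = 8000      # assumed sampling rate (samples per second)
T_s = 1 / f_s   # sampling period in seconds
N = 16          # assumed number of samples

# Hypothetical continuous signal: a 1000 Hz pure tone
x = sample(lambda t: np.sin(2 * np.pi * 1000 * t), f_s, N)
```

With these values one period of the tone spans exactly 8 samples, so `x[2]` sits at the peak of the sine.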

Telephone

Sampling rate: 8000 samples/s
Sample size: 8 bits
Bit rate: $8 \times 8000 = 64000$ bits/s, i.e. 64 kb/s

CD

Sampling rate: 44100 samples/s
Sample size: 16 bits
Bit rate: stereo (2 channels), $44100 \times 2 \times 16 = 1411200$ bits/s, i.e. 1411.2 kb/s
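The telephone and CD figures both follow from the same product of sampling rate, sample size, and channel count. A minimal sketch, where the function name `bit_rate` is a hypothetical helper introduced only for illustration:

```python
def bit_rate(f_s, bits_per_sample, channels=1):
    """Bits per second needed to store uncompressed audio."""
    return f_s * bits_per_sample * channels

telephone = bit_rate(8000, 8)          # mono telephone audio
cd = bit_rate(44100, 16, channels=2)   # stereo CD audio

print(telephone / 1000, 'kb/s')
print(cd / 1000, 'kb/s')
```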

Other Formats

DVD, DVD-Video, DVD-Audio, Super Audio CD
Sampling rate: up to 192000 samples/s
Sample size: up to 24 bits
Channels: up to 7

Sound Pressure and Decibel

$L_p = 10 \log_{10}\left(\frac{p^2}{p_{ref}^2}\right) = 20 \log_{10}\left(\frac{p}{p_{ref}}\right)$

$p$ : sound pressure
$p_{ref}$ : the sound pressure of a barely audible sound, usually taken as 0.00002 Pa
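As a quick worked example of the formula, the sketch below evaluates the level for two pressures. The function name `sound_pressure_level` and the sample pressures are assumptions made for illustration.

```python
import math

P_REF = 2e-5  # reference sound pressure in pascals

def sound_pressure_level(p, p_ref=P_REF):
    """Sound pressure level in decibels: 20 * log10(p / p_ref)."""
    return 20 * math.log10(p / p_ref)

level_at_threshold = sound_pressure_level(2e-5)  # p equal to the reference
level_times_ten = sound_pressure_level(2e-4)     # ten times the reference
```

A pressure equal to the reference gives 0 dB, and every tenfold increase in pressure adds 20 dB.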

Frequency (how fast a sound varies, not the sampling rate)

Periodic function
- $f(t+T) = f(t)$ , period: $T$
Frequency
- example : $f(t) = \sin(2\pi vt)$ (pure tone)
- $v$ : frequency (Hz); humans can hear frequencies roughly between 20 and 20000 Hz
- $T$ : $\frac{1}{v}$
- $\cos(2\pi vt) = \sin(2\pi vt + \pi / 2)$
- $e^{\pm2\pi ivt} = \cos(2\pi vt) \pm i \sin(2\pi vt)$ (pure tone)
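The relations in this list can be checked numerically. The following is only a sanity check under assumed values (a 440 Hz tone on an arbitrary time grid), not a proof.

```python
import numpy as np

v = 440.0                        # assumed frequency in Hz
T = 1 / v                        # period in seconds
t = np.linspace(0, 0.01, 1000)   # assumed time grid in seconds
phase = 2 * np.pi * v * t

# Periodicity: shifting t by one period T leaves the pure tone unchanged
periodic = np.allclose(np.sin(2 * np.pi * v * (t + T)), np.sin(phase))

# A cosine is a sine shifted by a quarter period
cos_is_shifted_sin = np.allclose(np.cos(phase), np.sin(phase + np.pi / 2))

# Euler's formula for the complex pure tone
euler_holds = np.allclose(np.exp(1j * phase),
                          np.cos(phase) + 1j * np.sin(phase))
```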

import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

[Figure: plots of sin(t) and cos(t)]

Reading, Writing, and Playing Sound (wave) Files

1. Reading a wave file

import wave

def audioread(filename):
    """
    Return the samples, scaled to [-1, 1], and the frame rate.
    Assumes 16-bit PCM samples.
    """
    max_amplitude = 2**15 - 1  # largest amplitude of a 16-bit sample
    with wave.open(filename, 'rb') as ifile:
        channels = ifile.getnchannels()
        fs = ifile.getframerate()
        frames = ifile.getnframes()
        x = ifile.readframes(frames)  # raw bytes of all frames
    # WAV stores 16-bit samples as little-endian signed integers
    x = np.frombuffer(x, dtype='<i2')
    soundx = x.astype(float) / max_amplitude
    if channels > 1:
        # one row per frame, one column per channel
        soundx = soundx.reshape((-1, channels))
    return soundx, fs

# read audio file
#frames_array , fs = audioread('../sounds/castanets.wav')

2. Writing a wave file

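import wave
import numpy as np

def audiowrite(filename, x, fs):
    """
    filename : where to save the file
    x : np.array with values in [-1, 1], shape (N,) or (N, channels)
    fs : frame rate
    """
    max_amplitude = 2**15 - 1  # largest amplitude of a 16-bit sample
    with wave.open(filename, 'wb') as ofile:
        ofile.setsampwidth(2)  # 2 bytes = 16 bits per sample
        ofile.setframerate(fs)
        if x.ndim == 1:
            ofile.setnchannels(1)
        else:
            m, n = x.shape
            ofile.setnchannels(n)
            x = x.flatten()  # interleave the channels frame by frame
        # scale to 16-bit little-endian signed integers
        x = (x * max_amplitude).astype('<i2')
        # tobytes() replaces tostring(), which was removed in NumPy 2.0
        ofile.writeframes(x.tobytes())

# write audio file
#audiowrite('../sounds/sin2.wav', np.sin(2*np.pi*2*np.arange(0, 16000, 0.1)), fs=8000)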
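To see that writing and reading are inverse operations up to 16-bit quantization, a self-contained round trip can be sketched as follows. It does not reuse the functions above; the temporary file name, the 440 Hz test tone, and the variable names are assumptions for illustration.

```python
import os
import tempfile
import wave

import numpy as np

fs = 8000
t = np.arange(fs) / fs                  # one second of sample times
x = 0.5 * np.sin(2 * np.pi * 440 * t)   # assumed test signal in [-1, 1]

# Quantize to 16-bit little-endian signed integers
pcm = np.round(x * (2**15 - 1)).astype('<i2')

path = os.path.join(tempfile.mkdtemp(), 'roundtrip.wav')
with wave.open(path, 'wb') as ofile:
    ofile.setnchannels(1)
    ofile.setsampwidth(2)
    ofile.setframerate(fs)
    ofile.writeframes(pcm.tobytes())

with wave.open(path, 'rb') as ifile:
    fs_read = ifile.getframerate()
    raw = ifile.readframes(ifile.getnframes())

y = np.frombuffer(raw, dtype='<i2').astype(float) / (2**15 - 1)

# The only difference is the rounding error of the 16-bit quantization
max_error = np.max(np.abs(x - y))
```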

3. Playing a wave file

import os
import shutil
import subprocess
import sys

def play_audio(filename):
    """Open the wave file at `filename` with the platform's default player."""
    if not os.path.exists(filename):
        print('Unable to find sound file.')
        return
    if sys.platform.startswith('win'):
        os.startfile(filename)  # available on Windows only
        return
    # Candidate launchers: Linux desktop openers first, then 'open' (macOS)
    open_commands = [['xdg-open'], ['gnome-open'], ['kfmclient', 'exec'],
                     ['exo-open'], ['open']]
    for cmd in open_commands:
        if shutil.which(cmd[0]) is None:
            continue  # launcher is not installed, try the next one
        if subprocess.call(cmd + [filename]) == 0:
            return
    print('Unable to open sound file.')

# play audio
#play_audio('../sounds/castanets.wav')

Playing a Sound Backwards

Forward: $\pmb x = (x_i)_{i=0}^{N-1}$
Reversed: $\pmb y = (x_{N-i-1})_{i=0}^{N-1}$
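In NumPy the reversal is a single slice along the time axis. A minimal sketch, where `reverse_sound` is a hypothetical helper name and the small arrays stand in for real audio:

```python
import numpy as np

def reverse_sound(x):
    """Return y with y[i] = x[N-1-i].

    Works for mono arrays of shape (N,) and for multichannel arrays
    of shape (N, channels), because only the first axis is reversed.
    """
    return x[::-1]

x = np.array([0.1, 0.2, 0.3, 0.4])
y = reverse_sound(x)

stereo = np.array([[0.1, -0.1], [0.2, -0.2], [0.3, -0.3]])
stereo_reversed = reverse_sound(stereo)
```

Reversing only the first axis keeps the left and right channels in their own columns.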

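Playing a Pure Tone $\sin(2\pi f t)$

$f$ : frequency of the sound
$f_s$ : sampling rate (frame rate) of the sound

f_440 = 440
f_1500 = 1500
fs = 8000   # frame rate
# 5 seconds of sample times k/fs (np.linspace would include the endpoint
# and make the sample spacing slightly different from 1/fs)
t = np.arange(fs*5) / fs
sin_440 = np.sin(2*np.pi*f_440*t)
sin_1500 = np.sin(2*np.pi*f_1500*t)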

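Square Wave

$f_s(t)=\begin{cases} 1, & 0 \leq t < \frac{T}{2} \\ -1, & \frac{T}{2} \leq t < T \end{cases}$

fs = 8000 # samples per second
f = 440   # frequency
# fs / f = number of samples in one period
samples_one_T = fs/f
# int() rounds the half period down to 9 samples, so the actual period
# is 18 samples (about 444 Hz instead of 440 Hz)
half = int(samples_one_T/2)
# repeat one period f*3 times, i.e. roughly 3 seconds of sound
square_wave = np.tile(np.concatenate([np.ones(half, dtype=float),
                                      -np.ones(half, dtype=float)]), f*3)

plt.plot(square_wave[:100])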

Triangle Wave

$f_s(t)= \begin{cases} \frac{4t}{T}-1, & \text{if $0\leq t < T/2$}; \\ 3-\frac{4t}{T}, & \text{if $T/2 \leq t < T$}. \end{cases}$

fs = 8000 # samples per second
f = 440   # frequency
# fs / f = number of samples in one period
samples_one_T = fs/f
half = int(samples_one_T/2)
# endpoint=False matches the half-open intervals in the formula and
# avoids repeating the peak values 1 and -1
triangle_wave = np.tile(np.concatenate([np.linspace(-1, 1, half, endpoint=False),
                                        np.linspace(1, -1, half, endpoint=False)]), f*3)

plt.plot(triangle_wave[:100])

Combining Two Sounds

$f(t) = a\sin(2\pi f_1 t) + b\sin(2\pi f_2 t)$
- $f_1$ : 440
- $f_2$ : 4400
$f(t) = \sin(t)+\cos(t)$
$f(t) = \text{square\_wave}(t) + \text{triangle\_wave}(t)$
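The first combination can be sketched as below. The weights `a` and `b`, the duration, and the final normalization step are assumptions added for illustration. The sampling rate is set to 44100 because a 4400 Hz tone lies above half of 8000 Hz and would alias at the lower rate.

```python
import numpy as np

f_s = 44100          # assumed sampling rate, high enough for a 4400 Hz tone
duration = 3         # assumed length in seconds
t = np.arange(duration * f_s) / f_s

a, b = 1.0, 0.5      # assumed weights of the two tones
f1, f2 = 440, 4400   # frequencies from the text

mixed = a * np.sin(2 * np.pi * f1 * t) + b * np.sin(2 * np.pi * f2 * t)

# The sum can exceed [-1, 1]; rescale so it can be stored without clipping
mixed = mixed / np.max(np.abs(mixed))
```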

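Adding Random Noise to a Sound

Removing noise from a sound is challenging, but adding noise is easy:

z = x + c*(2*np.random.random(np.shape(x))-1)

$c$ : a constant between 0 and 1 that scales the noise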

original_sounds , fs = audioread('../sounds/castanets.wav')

# uniform noise in [-0.03, 0.03), drawn independently for every sample
add_noise_sounds = original_sounds + 0.03*(2*np.random.random(original_sounds.shape) - 1)
add_noise_sounds /= abs(add_noise_sounds).max()  # rescale to [-1, 1]

plt.figure(figsize=(10, 4))
# plot the first channel of each signal
plt.plot(add_noise_sounds[:,0],alpha=0.7, label='add noise')
plt.plot(original_sounds[:,0],alpha=0.8, label='original')
plt.legend(loc=1)

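Octave

$\frac{f_{12}}{f_0} = 2$ : an octave doubles the frequency and is divided into 12 steps, with the same frequency ratio between every two adjacent notes: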

$\frac{f_1}{f_0}=\frac{f_2}{f_1}=\cdots=\frac{f_{12}}{f_{11}}=2^{1/12}.$

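f_s = 44100
num_sec = 6
k = 2**(1/12)   # frequency ratio between adjacent notes
f = 440
t = np.arange(f_s*num_sec) / f_s   # sample times spaced 1/f_s apart
sounds = {}
for s in range(13):
    frames = np.sin(2*np.pi*f*t)
    sounds['f%d'%s] = frames
    # write with the sampling rate f_s defined above
    audiowrite('../sounds/puretones_%d.wav'%s, frames, f_s)
    f *= k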

Producing a Guitar-like Sound with Karplus-Strong

Randomly initialize the input sequence: $x_0, \ldots, x_p \in (-1, 1)$

$x_{n+p+1}-\frac{1}{2}(x_{n+1}+x_n) = 0$

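def karplus_strong(x_init, f_s):
    """Extend the p+1 initial samples to 10 seconds using
    x[k] = (x[k-p] + x[k-p-1]) / 2."""
    p = np.shape(x_init)[0] - 1
    num_sec = 10
    num_samples = f_s*num_sec
    z = np.zeros(num_samples)
    z[0:(p+1)] = x_init
    for k in np.arange(p+1, num_samples):
        z[k] = 0.5*(z[k-p]+z[k-p-1])
    return z , f_s 
    
p = 100
f_s = 44100
# the pitch is roughly f_s/(p + 1/2), about 439 Hz here
x_init = 2*np.random.random(p + 1) - 1   # uniform random values in [-1, 1)

frames, fs = karplus_strong(x_init, f_s)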

Downsampling a Sound

Slide a window over the audio and keep, from each window, the sample with the largest absolute value.

def plot_frames(frames):
    """Plot the waveform and the histogram of the sample values side by side."""
    plt.figure(figsize=(14, 4))
    plt.subplot(1, 2, 1)
    plt.plot(frames)
    #plt.ylim(0, 1)
    plt.title('Waveform')
    plt.subplot(1, 2, 2)
    plt.hist(frames)
    plt.title('Histogram')
    
def sampling(frames, window_size=10, step=1):
    """From each window of window_size samples, keep the one with the
    largest absolute value."""
    new_frames = []
    si = np.arange(0, frames.shape[0]-window_size, step=step)  # window starts
    ei = si + window_size                                       # window ends
    for sindex, eindex in zip(si, ei):
        window = frames[sindex : eindex]
        new_frames.append(window[np.argmax(np.abs(window))])
    return np.array(new_frames)

s1, fs = audioread('../sounds/castanets.wav') 
s2 = sampling(s1[:,0], 3, 2)   # first channel, window of 3 samples, step of 2
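For long recordings the Python loop can be slow. The sketch below is one possible vectorized variant that selects the same windows as the loop; the name `sampling_vectorized` and the tiny test signal are assumptions, and `sliding_window_view` requires NumPy 1.20 or later.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def sampling_vectorized(frames, window_size=10, step=1):
    """Keep, from each window, the sample with the largest absolute value.

    Windows start at 0, step, 2*step, ... and stop before
    len(frames) - window_size, mirroring the loop-based version.
    """
    n_starts = frames.shape[0] - window_size
    windows = sliding_window_view(frames, window_size)[:n_starts:step]
    idx = np.argmax(np.abs(windows), axis=1)        # position of the peak in each window
    return windows[np.arange(windows.shape[0]), idx]

# Hypothetical 10-sample signal in place of a real recording
frames = np.array([0.1, -0.9, 0.3, 0.2, -0.1, 0.8, -0.4, 0.05, 0.6, -0.7])
downsampled = sampling_vectorized(frames, window_size=3, step=2)
```

With a window of 3 and a step of 2, the 10 input samples yield 4 output samples, so the result is roughly half as long as the input.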

Changes in Length and Distribution Before and After Downsampling

[Figure: two channels]
[Figure: single channel]
