Comparing Song Similarity with Python
2019/9/20
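I have been studying Signals and Systems lately and wanted a small project to keep the subject interesting, so I am recording it here.
This is roughly the preparatory work I could think of; I will learn the rest as the project moves along.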
2019/9/21
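Frequency-domain signal processing
Meaning of the complex numbers returned by the FFT (after dividing by the number of samples N):
- The real number at index 0 is the DC component of the time-domain signal.
- The complex number a+bj at index i describes the sine and cosine components whose period is N/i samples: a is the cosine component and b is the sine component (the sine term enters the signal with a negative sign).
- Twice the modulus of the complex number is the amplitude of the cosine wave at that frequency.
- The argument of the complex number is the phase of the cosine wave at that frequency.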
import numpy as np
from scipy.fftpack import fft

# One period of a test signal sampled at 128 points
x = np.arange(0, 2*np.pi, 2*np.pi/128)
y = 0.3*np.cos(x) + 0.5*np.cos(2*x+np.pi/4) + 0.8*np.cos(3*x-np.pi/3) + np.sin(4*x) + np.cos(x)
# Normalize by the number of samples so the coefficients map directly to amplitudes
yf = fft(y)/len(y)
print(np.array_str(yf[:5], suppress_small=True))
# Modulus and phase (in degrees) of the first five coefficients
for ii in range(0, 5):
    print(np.abs(yf[ii]), np.rad2deg(np.angle(yf[ii])))
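Running the program above confirms these statements.
Synthesizing the time-domain signal
What needs particular explanation is how several cosine signals are combined into an arbitrary time-domain signal:
The FFT produces an array $A$ of $N$ complex numbers, where $A_i$ ($0 \le i < N$) represents the $i$-th sub-signal. The sub-signal with $i = 0$ is the DC signal, and $\mathrm{Re}(A_0)$ is its amplitude.
Synthesis from the first $k$ sub-signals can be written as (with $A$ already divided by $N$):
$$x(t) \approx \mathrm{Re}(A_0) + 2\sum_{i=1}^{k-1}\left[\mathrm{Re}(A_i)\cos(i t) - \mathrm{Im}(A_i)\sin(i t)\right], \quad 0 \le t < 2\pi$$
The code is as follows: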
import numpy as np
from scipy.fftpack import fft
import matplotlib.pyplot as plt


def triangle_wave(size):
    # One period of a triangle wave sampled at `size` points on [0, 1)
    x = np.arange(0, 1, 1.0/size)
    y = np.where(x < 0.5, x, 0)
    y = np.where(x >= 0.5, 1-x, y)
    return x, y


def fft_combine(bins, n, loops):
    # Rebuild `loops` periods of the signal from the first n FFT coefficients
    length = len(bins)*loops
    data = np.zeros(length)
    index = loops*2*np.pi*np.arange(length)/length
    for k, p in enumerate(bins[:n]):
        if k != 0:
            p = 2*p  # a non-DC term shows up at both the positive and the negative frequency
        # Synthesis of the time-domain signal
        data += np.real(p)*np.cos(k*index)
        data -= np.imag(p)*np.sin(k*index)
    return index, data


fft_size = 256
# FFT of the triangle wave, normalized by the number of samples
x, y = triangle_wave(fft_size)
fy = fft(y)/fft_size
loops = 4
y = np.tile(y, loops)  # repeat the wave so that several periods are plotted
print(y.shape)

fig, axes = plt.subplots(2, 1, figsize=(8, 6))
# Linear magnitude of the first 20 frequency bins
axes[0].plot(np.abs(fy[:20]), "o")
axes[0].set_xlabel("frequency bin")
axes[0].set_ylabel("magnitude")
axes[1].plot(y, label="original triangle wave", linewidth=2)
for ii in [0, 1, 3, 5, 7, 9]:
    index, data = fft_combine(fy, ii+1, loops)
    axes[1].plot(data, label="N=%s" % ii, alpha=0.6)
print(index[:20])
axes[1].legend(loc="best")
plt.show()
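I will dig into the theory after studying Signals and Systems.
Update: I have now finished the Signals and Systems course, so here are answers to a few questions I ran into at the start.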
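- Why is twice the modulus of the complex number the amplitude of the cosine wave at that frequency?
ans: If the signal is real, its spectrum is conjugate symmetric, $X[N-k] = X^*[k]$, so every cosine component is split between the bins $k$ and $N-k$.
By Euler's formula,
$$A\cos(\omega_0 n + \varphi) = \frac{A}{2}e^{j\varphi}e^{j\omega_0 n} + \frac{A}{2}e^{-j\varphi}e^{-j\omega_0 n},$$
so the Fourier transform assigns the coefficient $\frac{A}{2}e^{j\varphi}$ to the frequency $\omega_0$: the derivation multiplies the amplitude of the cosine wave by 1/2 and leaves the phase unchanged.
- Why is the Fourier transform of a discrete signal with period N also periodic with period N?
ans: This is the periodicity property of the Fourier representation of discrete-time signals. It is listed in the Fourier property tables of Oppenheim's Signals and Systems.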
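Both answers can be checked numerically. The sketch below is my own illustration, not part of the original experiments: `dft_at` is a hypothetical helper that evaluates the DFT sum at any integer index, so the index can run past N.

```python
import numpy as np

def dft_at(x, k):
    """Evaluate the DFT sum of x at an arbitrary integer index k."""
    n = np.arange(len(x))
    return np.sum(x * np.exp(-2j * np.pi * k * n / len(x)))

N = 8
x = np.cos(2 * np.pi * np.arange(N) / N) + 0.5  # small real test signal

# Periodicity: index k and index k + N give the same coefficient.
periodic = all(np.isclose(dft_at(x, k), dft_at(x, k + N)) for k in range(N))

# Conjugate symmetry of a real signal: X[N - k] is the conjugate of X[k].
symmetric = all(np.isclose(dft_at(x, N - k), np.conj(dft_at(x, k)))
                for k in range(1, N))

print(periodic, symmetric)
```

Because the energy of each cosine is shared between bins k and N-k, doubling the modulus of one of them recovers the amplitude.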
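A side note on the Nyquist rate:
In the sampling theorem the sampling frequency must be greater than $2\omega_M$, where $\omega_M$ is the highest frequency in the spectrum of the original signal. This $2\omega_M$ is called the Nyquist rate; exceeding it keeps the shifted copies of the spectrum in the sampled signal from overlapping (aliasing).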
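A small experiment makes the overlap visible. In this sketch the 10 Hz sampling rate and the two test tones are values I picked for illustration, and `peak_frequency` is a hypothetical helper: a 3 Hz cosine is below half the sampling rate and is recovered correctly, while a 7 Hz cosine is above it and shows up at 3 Hz.

```python
import numpy as np

fs = 10.0                 # sampling rate in Hz (assumed for the demo)
t = np.arange(100) / fs   # 10 seconds of sample times

def peak_frequency(signal, fs):
    """Return the frequency (Hz) of the largest non-DC bin of a real signal."""
    spectrum = np.abs(np.fft.rfft(signal))
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fs)
    return freqs[1:][np.argmax(spectrum[1:])]

ok = peak_frequency(np.cos(2 * np.pi * 3.0 * t), fs)       # 3 Hz < fs / 2
aliased = peak_frequency(np.cos(2 * np.pi * 7.0 * t), fs)  # 7 Hz > fs / 2
print(ok, aliased)
```

After sampling, the two tones are indistinguishable, which is exactly what the sampling theorem rules out when the rate is high enough.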
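Processing audio with pydub and ffmpeg
A note up front:
RuntimeWarning: Couldn't find ffmpeg or avconv - defaulting to ffmpeg, but may not work. Fix: add ffmpeg's bin directory to the PATH variable. Note that it has to be appended to PATH itself, not just created as a separate system variable. Then restart.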
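Before importing pydub it is easy to check whether the converter is actually visible on PATH. This is a minimal sketch using only the standard library; `find_converter` is a hypothetical helper name, not part of pydub.

```python
import shutil

def find_converter(candidates=("ffmpeg", "avconv")):
    """Return the full path of the first converter found on PATH, or None."""
    for name in candidates:
        path = shutil.which(name)
        if path is not None:
            return path
    return None

converter = find_converter()
if converter is None:
    print("ffmpeg/avconv not found on PATH; mp3 decoding will probably fail")
else:
    print("converter found:", converter)
```

If the PATH cannot be changed, pydub also lets you point it at the binary directly by assigning the path to `AudioSegment.converter`.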
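I. Convert mp3 to wav and split the song into several parts
Notes first:
1. The song is split into several parts mainly so that the temporal order of the features is preserved.
2. wav: an uncompressed file format.
3. mp3: a compressed file format.
The code is as follows: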
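import os
from pydub import AudioSegment

# mp3_path: path of the source mp3 file
tail, track = os.path.split(mp3_path)
song_name = os.path.splitext(track)[0]  # file name without the extension
wav_dir = os.path.join(tail, 'w_session')
os.makedirs(wav_dir, exist_ok=True)     # export fails if the folder is missing
wav_path = os.path.join(wav_dir, song_name + '.wav')
sound = AudioSegment.from_file(mp3_path, format='mp3')
sound.export(wav_path, format='wav')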
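Reading the wav file information:
import wave
import numpy as np

w = wave.open(wav_path)
params = w.getparams()
print(params)
# number of channels, sample width (bytes), sampling rate, number of frames
nchannels, sampwidth, framerate, nframes = params[:4]
t = np.arange(0, nframes)*(1/framerate)  # time axis of the file in seconds
strData = w.readframes(nframes)  # raw audio as a bytes object
w.close()
waveData = np.frombuffer(strData, dtype=np.int16)  # bytes -> int16 (assumes 16-bit samples)
#waveData = waveData*1.0/(max(abs(waveData)))  # amplitude normalization
# samples are interleaved frame by frame, so rows are frames and columns are channels
waveData = np.reshape(waveData, [nframes, nchannels])
Splitting the song:
# chunk holds the durations (in seconds) of the four parts
for ii in range(nchannels):
    start_time = 0
    for jj in range(0, 4):
        end_time = start_time + chunk[jj]
        blockData = waveData[start_time*framerate:end_time*framerate, ii]
        start_time = end_time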
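The splitting step can be tried on synthetic data without any audio file. In this sketch `split_by_durations`, the toy frame rate and the durations are all assumptions made for illustration; the array layout is frames along the first axis and channels along the second.

```python
import numpy as np

def split_by_durations(samples, framerate, durations):
    """Split samples (frames along axis 0) into consecutive chunks.

    durations holds the chunk lengths in seconds; frames left over after
    the last chunk are dropped.
    """
    edges = np.cumsum([0] + [int(d * framerate) for d in durations])
    return [samples[a:b] for a, b in zip(edges[:-1], edges[1:])]

framerate = 8                                            # toy rate, frames per second
samples = np.arange(10 * framerate * 2).reshape(-1, 2)   # 10 s, 2 channels
chunks = split_by_durations(samples, framerate, [1, 2, 3, 4])
print([c.shape for c in chunks])
```

Keeping the chunks in order is what preserves the temporal structure of the song in the later feature comparison.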
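II. Audio feature extraction
Features grouped by the domain in which they are computed
- Features to extract:
Time-domain features: linear prediction coefficients, zero-crossing rate
Frequency-domain features: Mel coefficients, LPC cepstral coefficients, entropy features, spectral centroid
Time-frequency features: wavelet coefficients
- TOOLS: pyAudioAnalysis
Download and installation: see the toolkit's installation guide.
In my experience this toolkit is fairly new, and its GitHub repository has all kinds of open issues.
There is also a paper that describes the toolkit in detail.
An excerpt follows: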
Feature Extraction
Audio Features
- the audio signal is first divided into short-term windows (frames) and for each frame all 34 features are calculated. This results in a sequence of short-term feature vectors of 34 elements each. Widely accepted short-term window sizes are 20 to 100 ms.
- Typical values of the mid-term segment size can be 1 to 10 seconds.
- In cases of long recordings (e.g. music tracks) a long-term averaging of the mid-term features can be applied so that the whole signal is represented by an average vector of mid-term statistics.
- Extract mid-term features and long-term averages in order to produce one feature vector per audio signal.
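The pipeline in the excerpt (short-term frames, per-frame features, long-term averaging) can be sketched in plain NumPy. This is not pyAudioAnalysis code: it computes only two of the features mentioned above, skips the mid-term stage, and all function names and the 440 Hz test tone are my own choices.

```python
import numpy as np

def frame_signal(x, win, step):
    """Cut a 1-D signal into frames of win samples taken every step samples."""
    count = 1 + (len(x) - win) // step
    return np.stack([x[i * step:i * step + win] for i in range(count)])

def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    signs = np.sign(frame)
    return np.mean(signs[1:] != signs[:-1])

def spectral_centroid(frame, fs):
    """Magnitude-weighted mean frequency of the frame, in Hz."""
    mag = np.abs(np.fft.rfft(frame))
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / fs)
    return np.sum(freqs * mag) / (np.sum(mag) + 1e-12)

fs = 8000
t = np.arange(fs) / fs                       # one second of audio
x = np.sin(2 * np.pi * 440 * t)              # 440 Hz test tone
frames = frame_signal(x, win=400, step=400)  # 50 ms short-term windows
short_term = np.array([[zero_crossing_rate(f), spectral_centroid(f, fs)]
                       for f in frames])
long_term = short_term.mean(axis=0)          # one vector for the whole signal
print(short_term.shape, long_term)
```

For a pure tone the centroid sits at the tone frequency and the zero-crossing rate is about twice the frequency divided by the sampling rate, which makes the output easy to sanity-check.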
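III. Computing the similarity matrix
The paper states: A similarity matrix is computed based on the cosine distances of the individual feature vectors.
In practice, however, I found that the features have very different scales, so cosine similarity does not measure feature similarity accurately. For example (columns 7 to 9 of the feature matrix):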
7 | 8 | 9 |
---|---|---|
2.62281350727428e-10 | 0 | -50.5964425626941 |
2.29494356256208e-11 | 0 | -50.5964425626941 |
4.55467645371887e-11 | 0 | -50.5964425626941 |
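The effect is easy to reproduce. The two vectors below are hypothetical and only loosely modelled on the rows of the table (values rounded): the first feature differs by an order of magnitude, yet the cosine similarity is essentially 1 because the large third feature dominates.

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical feature vectors with very different scales per feature.
v1 = np.array([2.6e-10, 0.0, -50.6])
v2 = np.array([2.3e-11, 0.0, -50.6])

cos = cosine_similarity(v1, v2)

# Relative agreement of the small feature alone tells a different story.
small_feature_agreement = 1 - abs(v1[0] - v2[0]) / max(abs(v1[0]), abs(v2[0]))
print(cos, small_feature_agreement)
```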
So I decided to compute the relative ratio feature by feature and then take the mean.
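def similarity(v1, v2):
    # Mean relative similarity of each row (one row per segment)
    sim = []
    for ii in range(v1.shape[0]):
        temp = []
        for jj in range(v1.shape[1]):
            a, b = v1[ii, jj], v2[ii, jj]
            # skip features that are zero in both vectors (the ratio is undefined)
            if a != 0 or b != 0:
                temp.append(1 - abs(a - b)/max(abs(a), abs(b)))
        # collect the mean over this row only; nan if no feature was usable
        sim.append(np.mean(temp) if temp else np.nan)
    print(sim)
    return sim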
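The same idea (row-wise mean of the relative agreement, ignoring features that are zero in both vectors) can also be written with array operations, which avoids the nested Python loops. `relative_similarity` is a hypothetical name and the two small matrices are made-up test data.

```python
import numpy as np

def relative_similarity(v1, v2):
    """Row-wise mean of 1 - |a - b| / max(|a|, |b|), skipping all-zero pairs."""
    v1 = np.asarray(v1, dtype=float)
    v2 = np.asarray(v2, dtype=float)
    scale = np.maximum(np.abs(v1), np.abs(v2))
    valid = scale > 0                       # at least one of the pair is nonzero
    ratio = np.zeros_like(scale)
    ratio[valid] = 1 - np.abs(v1 - v2)[valid] / scale[valid]
    counts = valid.sum(axis=1)
    means = ratio.sum(axis=1) / np.maximum(counts, 1)
    return np.where(counts > 0, means, np.nan)

a = np.array([[1.0, 0.0, 4.0], [2.0, 2.0, 0.0]])
b = np.array([[2.0, 0.0, 4.0], [2.0, 1.0, 0.0]])
result = relative_similarity(a, b)
print(result)
```

Note that the measure can go negative when two values have opposite signs, so a threshold such as 0.5 should be read with that range in mind.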
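The Mahalanobis distance is another option worth trying (what follows is excerpted in part from a reference article):
- Use cases:
1. Measuring how much two random variables X and Y differ when both follow the same distribution with covariance matrix C.
2. Measuring how far X is from the mean vector of a class in order to decide which class a sample belongs to. In this case Y is the class mean vector.
- Pros and cons of the Mahalanobis distance:
Pros: independent of the units of measurement; removes the influence of correlations between variables.
Cons: different features cannot be weighted differently, and the influence of weak features may be exaggerated.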
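To make the unit-independence concrete, here is a minimal NumPy sketch. The data is random and purely illustrative, `mahalanobis` is my own helper, and the covariance is estimated from the samples themselves. Rescaling one feature by a factor of 1000 leaves the distance unchanged.

```python
import numpy as np

def mahalanobis(x, y, cov):
    """Mahalanobis distance between x and y for covariance matrix cov."""
    diff = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    inv_cov = np.linalg.pinv(cov)  # pseudo-inverse tolerates a singular cov
    return float(np.sqrt(diff @ inv_cov @ diff))

rng = np.random.default_rng(0)
# Hypothetical data: 200 samples of 3 features on different scales.
data = rng.normal(size=(200, 3)) * np.array([0.01, 1.0, 50.0])
d = mahalanobis(data[0], data.mean(axis=0), np.cov(data, rowvar=False))

# Express the first feature in different units (times 1000).
rescaled = data * np.array([1000.0, 1.0, 1.0])
d_rescaled = mahalanobis(rescaled[0], rescaled.mean(axis=0),
                         np.cov(rescaled, rowvar=False))
print(d, d_rescaled)
```

With only ten segments per song and 68 features the sample covariance would be singular, so in this project the covariance would have to be estimated from a larger collection of feature vectors.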
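IV. Reducing the running time
In the earlier demo the song was split into segments whose lengths form an arithmetic sequence, but computing the features of even one segment took a very long time. I could not wait that long, so I looked for ways to reduce the cost. I came up with two:
- Pick a random four-second excerpt inside each segment of the arithmetic sequence and compute the feature similarity on it. Only if it is at least 0.5 is the feature similarity of the full segment computed.
- Reduce the original 44.1 kHz sampling rate by a factor of four.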
k = 4
# Lower the audio resolution: average every k consecutive frames of each channel.
# w has shape (nframes, nchannels); a trailing block shorter than k frames is dropped.
n_blocks = w.shape[0] // k
w_decrease = w[:n_blocks*k].reshape(n_blocks, k, w.shape[1]).mean(axis=1)
w = w_decrease
# The effective sampling rate is now the original rate divided by k.
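The first idea, screening with a cheap excerpt before paying for the full segment, can be sketched independently of any audio library. Everything here is hypothetical scaffolding: `coarse_to_fine` and the two scoring callables stand in for the real feature comparisons.

```python
import random

def coarse_to_fine(segments, cheap_score, full_score,
                   threshold=0.5, excerpt=4, seed=None):
    """Score each (start, end) segment on a short random excerpt first.

    cheap_score and full_score are caller-supplied callables taking
    (start, end) in seconds; the full score is computed only for segments
    whose excerpt score reaches the threshold.
    """
    rng = random.Random(seed)
    results = {}
    for start, end in segments:
        latest = max(start, end - excerpt)   # keep the excerpt inside the segment
        s = rng.randint(start, latest)
        if cheap_score(s, min(s + excerpt, end)) >= threshold:
            results[(start, end)] = full_score(start, end)
    return results

# Toy scores standing in for real feature comparisons.
segments = [(0, 20), (20, 44), (44, 72)]
calls = []

def cheap(s, e):
    return 0.9 if s >= 20 else 0.1

def full(s, e):
    calls.append((s, e))
    return 0.8

screened = coarse_to_fine(segments, cheap, full, seed=1)
print(screened)
```

The expensive function runs only for the segments that pass the screen, which is where the time saving comes from; the price is that a dissimilar random excerpt can hide an otherwise similar segment.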
V. Complete code
# -*- coding: utf-8 -*-
"""
Created on Fri Jan 10 21:51:38 2020
@author: yoona
"""
import os
import sys
import wave
import numpy as np
#import struct
from pydub import AudioSegment
import matplotlib.pyplot as plt
from pyAudioAnalysis import audioFeatureExtraction as afe
import eyed3
import random
import math
def Features(path, mode):
    x = wave.open(path)
    params = x.getparams()
    print(params)
    if params[0] != 2:
        raise ValueError('The number of channels is not 2')
    strData = x.readframes(params[3])
    x.close()
    w = np.frombuffer(strData, dtype=np.int16)
    w = np.reshape(w, [params[3], params[0]])
    fs = params[2]
    if mode == 'second':
        # Lower the audio resolution: average every k consecutive frames
        # of each channel (a trailing block shorter than k frames is dropped)
        k = 4
        n_blocks = w.shape[0] // k
        w = w[:n_blocks*k].reshape(n_blocks, k, params[0]).mean(axis=1)
        fs = fs // k  # the sampling rate shrinks by the same factor
    eigen_vector_0 = afe.mtFeatureExtraction(
        w[:, 0], fs, 30.0, 30.0, 2, 2)
    eigen_vector_1 = afe.mtFeatureExtraction(
        w[:, 1], fs, 30.0, 30.0, 2, 2)
    return eigen_vector_0, eigen_vector_1
def read_wave(wav_path):
    w = wave.open(wav_path)
    params = w.getparams()
    # print(params)
    # number of channels, sample width (bytes), sampling rate, number of frames
    nchannels, sampwidth, framerate, nframes = params[:4]
    # time axis of the file in seconds
    t = np.arange(0, nframes)*(1/framerate)
    strData = w.readframes(nframes)  # raw audio as a bytes object
    w.close()
    waveData = np.frombuffer(strData, dtype=np.int16)  # bytes -> int16
    waveData = waveData*1.0/np.max(np.abs(waveData*1.0))  # normalize the amplitude
    waveData = np.reshape(waveData, [nframes, nchannels])  # one column per channel
    # plot the wave
    plt.figure()
    plt.subplot(4,1,1)
    plt.plot(t, waveData[:, 0])
    plt.xlabel("Time(s)")
    plt.ylabel("Amplitude")
    plt.title("Ch-1 wavedata")
    plt.grid(True)  # show grid lines
    plt.subplot(4,1,3)
    plt.plot(t, waveData[:, 1])
    plt.xlabel("Time(s)")
    plt.ylabel("Amplitude")
    plt.title("Ch-2 wavedata")
    plt.grid(True)  # show grid lines
    plt.show()
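def similarity(v1, v2):
    # Mean relative similarity of each row (one row per segment)
    sim = []
    for ii in range(v1.shape[0]):
        temp = []
        for jj in range(v1.shape[1]):
            a, b = v1[ii, jj], v2[ii, jj]
            # skip features that are zero in both vectors (the ratio is undefined)
            if a != 0 or b != 0:
                temp.append(1 - abs(a - b)/max(abs(a), abs(b)))
        # collect the mean over this row only; nan if no feature was usable
        sim.append(np.mean(temp) if temp else np.nan)
    print(sim)
    return sim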
def compute_chunk_features(mp3_path):
    # =============================================================================
    # Similarity computation, step 1
    # =============================================================================
    # Get the song duration in seconds
    mp3Info = eyed3.load(mp3_path)
    time = int(mp3Info.info.time_secs)
    print(time)
    tail, track = os.path.split(mp3_path)
    # Create the two working folders
    dirct_1 = os.path.join(tail, 'wavSession')
    dirct_2 = os.path.join(tail, 'wavBlock')
    if not os.path.exists(dirct_1):
        os.makedirs(dirct_1)
    if not os.path.exists(dirct_2):
        os.makedirs(dirct_2)
    # Get the song name
    song_name = track.split('.')
    # Convert the format
    wav_all_path = os.path.join(tail, song_name[0]+'.wav')
    sound = AudioSegment.from_file(mp3_path, format='mp3')
    sound.export(wav_all_path, format='wav')
    read_wave(wav_all_path)
    # Split the audio into 10 segments: the lengths grow by 4 s over the first
    # five segments and shrink by 4 s over the last five, summing to the duration
    gap = 4
    diff = time/10 - 8
    lengths = [diff + 4*n for n in range(5)]
    lengths = lengths + lengths[::-1]
    start_time = 0
    vector_0 = np.zeros((10, 68))
    vector_1 = np.zeros((10, 68))
    info = []  # (start, end) of every segment, in seconds
    for jj, length in enumerate(lengths):
        end_time = math.floor(start_time + length)
        info.append((start_time, end_time))
        wav_name = song_name[0]+str(jj)+'.wav'
        wav_path = os.path.join(tail, 'wavSession', wav_name)
        # Pick a random four-second excerpt that stays inside the segment
        rand_start = random.randint(start_time, max(start_time, end_time - gap))
        # pydub slices in milliseconds
        blockData = sound[rand_start*1000:(rand_start+gap)*1000]
        blockData.export(wav_path, format='wav')
        eigVector_0, eigVector_1 = Features(wav_path, None)
        print(jj)  # progress marker
        # Feature vector of this excerpt
        vector_0[jj, :] = np.mean(eigVector_0[0], 1)
        vector_1[jj, :] = np.mean(eigVector_1[0], 1)
        start_time = end_time
    return vector_0, vector_1, info  # feature vectors of the two channels
def Compute_Bolck_Features(info, mp3_path):
    # =============================================================================
    # Similarity computation, step 2
    # =============================================================================
    # Get the song duration in seconds
    mp3Info = eyed3.load(mp3_path)
    time = int(mp3Info.info.time_secs)
    print(time)
    # Get the song name
    tail, track = os.path.split(mp3_path)
    song_name = track.split('.')
    # Decode the mp3
    sound = AudioSegment.from_file(mp3_path, format='mp3')
    vector_0 = np.zeros((len(info), 68))
    vector_1 = np.zeros((len(info), 68))
    for kk in range(len(info)):
        # Features of the complete segment
        wav_name = song_name[0]+str(kk)+'.wav'
        wav_path = os.path.join(tail, 'wavBlock', wav_name)
        # Cut out the complete segment (pydub slices in milliseconds)
        blockData = sound[int(info[kk][0]*1000):int(info[kk][1]*1000)]
        blockData.export(wav_path, format='wav')
        eigVector_0, eigVector_1 = Features(wav_path, 'second')
        print(kk)  # progress marker
        # Feature vector of this segment
        vector_0[kk, :] = np.mean(eigVector_0[0], 1)
        vector_1[kk, :] = np.mean(eigVector_1[0], 1)
    return vector_0, vector_1
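def file_exists(file_path):
    # splitext returns (root, extension); compare only the extension
    if os.path.splitext(file_path)[1] == '.mp3':
        if os.path.isfile(file_path):
            return file_path
        else:
            raise FileNotFoundError('File does not exist')
    else:
        raise ValueError('Wrong file format: the extension is not .mp3')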
if __name__ == '__main__':
    # =============================================================================
    # sa_b: a is the index of the song, b is the index of the channel
    # =============================================================================
    # To take the two paths from the command line instead:
    # path_1 = file_exists(sys.argv[1])
    # path_2 = file_exists(sys.argv[2])
    path_1 = r'C:\Users\yoona\Desktop\music\薛之謙 - 別.mp3'
    path_2 = r'C:\Users\yoona\Desktop\music\薛之謙 - 最好.mp3'
    # Step 1: compare random four-second excerpts of every segment
    s1_1, s1_2, info1 = compute_chunk_features(path_1)
    s2_1, s2_2, info2 = compute_chunk_features(path_2)
    sim_1 = similarity(s1_1, s2_1)  # channel 1
    sim_2 = similarity(s1_2, s2_2)  # channel 2
    # Keep the segments whose excerpts are similar in either channel. The same
    # indices are used for both songs so that the rows stay aligned in step 2.
    selected = [i for i in range(len(sim_1))
                if sim_1[i] >= 0.5 or sim_2[i] >= 0.5]
    if not selected:
        selected = [int(np.argmax(sim_1))]
    info1_new = [info1[i] for i in selected]
    info2_new = [info2[i] for i in selected]
    # Step 2: compare the complete segments that passed step 1
    s1_1, s1_2 = Compute_Bolck_Features(info1_new, path_1)
    s2_1, s2_2 = Compute_Bolck_Features(info2_new, path_2)
    sim_1 = similarity(s1_1, s2_1)  # channel 1
    sim_2 = similarity(s1_2, s2_2)  # channel 2
VI. Analysis of results
Experiment group 1:
A: 薛之謙 - 最好.mp3
B: 薛之謙 - 別.mp3
Experiment group 2:
A: Karim Mika - Superficial Love.mp3
B: Burgess/JESSIA - Eclipse.mp3
Experiment group 3:
A: CARTA - Aranya (Jungle Festival Anthem).mp3
B: 薛之謙 - 別.mp3