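I. Introduction
This is still part of my graduation project on sign language detection, and there are so many images to process!
Today's task is video processing. I only had a brief look at it, since it is mainly the research area of my graduation project partner.
Here are my notes from a simple analysis of video processing.
Video recognition is an inseparable part of machine vision. It means working with frame data, and OpenCV is an excellent fit for that: extracting images from a video, drawing key features on frames, segmenting images, and saving images are all fairly important building blocks.
Most of the time we process every frame of a video, which becomes very time-consuming when the video is large. In some scenarios, however, there is no need to process frames that carry no useful information, and doing so is just wasted effort. So we need to extract the key frames, that is, the frames that are actually useful for recognition.
A video clip (sign language: "school"):
Every frame of the video saved as an image:
What sign language recognition requires:
(Image taken from: 《基於神經網絡的中小詞彙量中國手語識別研究》, 李曉旭)
In fact, the key frames are all we really need to recognize!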
II. Saving every frame of a video as an image
This part is optional.
The main function used is cv2.imwrite().
import cv2
import os


# Extract images from an .avi video
def splitFrames(sourceFileName):
    # Append the file extension here
    video_path = os.path.join('video/', sourceFileName + '.avi')
    outPutDirName = 'video/img_' + sourceFileName + '/'
    if not os.path.exists(outPutDirName):
        # Create the output directory if it does not exist
        os.makedirs(outPutDirName)
    cap = cv2.VideoCapture(video_path)  # Open the video file
    num = 1
    while True:
        # success: whether the read succeeded; data: image data of the current frame.
        # .read() reads one frame and then advances to the next one
        success, data = cap.read()
        if not success:
            break
        # im = Image.fromarray(data, mode='RGB')  # Rebuild the image
        # im.save('C:/Users/Taozi/Desktop/2019.04.30/' + str(num) + ".jpg")  # Save the current frame as a still image
        cv2.imwrite(outPutDirName + str(num) + ".jpg", data)
        num = num + 1
        # if num % 20 == 0:
        #     cv2.imwrite('./Video_dataset/figures/' + str(num) + ".jpg", data)
        print(num)
    cap.release()
# Extract images from an .mp4 video
def splitFrames_mp4(sourceFileName):
    # Append the file extension here
    video_path = os.path.join('video/', sourceFileName + '.mp4')
    times = 0
    # Sampling frequency: extract one frame out of every 25
    # frameFrequency = 25
    # Write the images to the video folder under the current directory
    outPutDirName = 'video/video_' + sourceFileName + '/'
    # Create the output directory if it does not exist
    if not os.path.exists(outPutDirName):
        os.makedirs(outPutDirName)
    camera = cv2.VideoCapture(video_path)
    while True:
        times += 1
        res, image = camera.read()
        if not res:
            # print('not res , not image')
            break
        # if times % frameFrequency == 0:
        #     cv2.imwrite(outPutDirName + str(times) + '.jpg', image)
        #     print(outPutDirName + str(times) + '.jpg')
        cv2.imwrite(outPutDirName + str(times) + '.jpg', image)
        print(times, end='\t')
    print('\nImage extraction finished')
    camera.release()
if __name__ == '__main__':
    im_file = 'video/'
    # for im_name in im_names:
    for im_name in os.listdir(im_file):
        suffix_file = os.path.splitext(im_name)[-1]
        if suffix_file == '.mp4':
            print('~~~~~~~~~~ Extracting images from .mp4 video ~~~~~~~~~~~~~~~')
            sourceFileName = os.path.splitext(im_name)[0]
            splitFrames_mp4(sourceFileName)
        elif suffix_file == '.avi':
            print('~~~~~~~~~~ Extracting images from .avi video ~~~~~~~~~~~~~~~')
            sourceFileName = os.path.splitext(im_name)[0]
            splitFrames(sourceFileName)
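The commented-out frameFrequency lines in the .mp4 function hint at saving only one frame out of every N instead of all of them. Here is a minimal sketch of that idea. The helper names read_frames and every_nth are my own (hypothetical), and cv2 is imported inside read_frames so the sampling logic can be tried without a video file.

```python
def read_frames(video_path):
    """Yield frames from a video one by one (requires OpenCV)."""
    import cv2  # imported here so every_nth() can be tried without OpenCV

    cap = cv2.VideoCapture(video_path)
    try:
        while True:
            success, frame = cap.read()
            if not success:
                break
            yield frame
    finally:
        cap.release()


def every_nth(frames, step):
    """Yield (index, frame) for every `step`-th frame, counting from 1."""
    if step < 1:
        raise ValueError("step must be at least 1")
    for index, frame in enumerate(frames, start=1):
        if index % step == 0:
            yield index, frame


# Demo on a stand-in sequence: the letters play the role of frames.
print(list(every_nth("abcdefg", 3)))  # [(3, 'c'), (6, 'f')]
```

With a real video, something like `for index, frame in every_nth(read_frames('video/school.mp4'), 25): cv2.imwrite(str(index) + '.jpg', frame)` would keep the same file naming as above while writing far fewer images.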
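III. Interframe difference method
1. Two-frame difference method
Steps:
- First, load the video and compute the difference between each pair of consecutive frames.
- Then, choose one of the following three methods to extract the key frames:
  - Use the difference order: the few frames with the largest average interframe difference are considered key frames.
  - Use a difference threshold: frames whose average interframe difference is greater than the threshold are considered key frames.
  - Use local maxima: frames whose average interframe difference is a local maximum are considered key frames.
Note that smoothing the average difference values before computing the local maxima effectively removes noise and avoids repeatedly extracting frames from similar scenes.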
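To make the steps above concrete, here is a small NumPy-only sketch of the difference computation and the three selection rules on synthetic numbers. It is a simplification under my own assumptions, not the script that follows: the function names are hypothetical, the difference is averaged over all pixels and channels, and smoothing is a plain moving average rather than a Hanning window.

```python
import numpy as np


def mean_abs_diffs(frames):
    """Mean absolute difference between each pair of consecutive frames."""
    # Cast to float first so uint8 subtraction cannot wrap around.
    frames = np.asarray(frames, dtype=np.float64)
    pixel_axes = tuple(range(1, frames.ndim))
    return np.abs(frames[1:] - frames[:-1]).mean(axis=pixel_axes)


def top_n(diffs, n):
    """Method 1: indices of the n largest differences, in ascending order."""
    return sorted(int(i) for i in np.argsort(diffs)[::-1][:n])


def above_threshold(diffs, thresh):
    """Method 2: indices whose difference exceeds a fixed threshold."""
    return [int(i) for i in np.flatnonzero(diffs > thresh)]


def local_maxima(diffs, window_len=3):
    """Method 3: strict local maxima after moving-average smoothing."""
    kernel = np.ones(window_len) / window_len
    smoothed = np.convolve(diffs, kernel, mode="same")
    inner = (smoothed[1:-1] > smoothed[:-2]) & (smoothed[1:-1] > smoothed[2:])
    return [int(i) + 1 for i in np.flatnonzero(inner)]


# Synthetic difference signal (made-up numbers, not from a real video).
diffs = np.array([1, 3, 2, 8, 9, 4, 2, 5, 1], dtype=float)
print(top_n(diffs, 2))               # [3, 4]
print(above_threshold(diffs, 4.5))   # [3, 4, 7]
print(local_maxima(diffs, 1))        # no smoothing: [1, 4, 7]
print(local_maxima(diffs, 3))        # smoothed:     [4]
```

The last two lines show why smoothing matters: without it, small bumps at indices 1 and 7 also count as local maxima, while a window of 3 leaves only the main peak.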
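(1) Processing a single video
The author of the script below uses the third method above, extracting the frames at the local maxima of the frame difference: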
# -*- coding: utf-8 -*-
"""
Created on Tue Dec 4 16:48:57 2018
keyframes extract tool
this key frame extract algorithm is based on interframe difference.
The principle is very simple
First, we load the video and compute the interframe difference between consecutive frames
Then, we can choose one of these three methods to extract keyframes, which are
all based on the difference method:
1. use the difference order
The first few frames with the largest average interframe difference
are considered to be key frames.
2. use the difference threshold
The frames whose average interframe difference is larger than the
threshold are considered to be key frames.
3. use local maximum
The frames whose average interframe difference is a local maximum are
considered to be key frames.
It should be noted that smoothing the average difference value before
calculating the local maximum can effectively remove noise to avoid
repeated extraction of frames of similar scenes.
After a few experiments, the third method gives the best key frame extraction results.
The original code comes from the link below, I optimized the code to reduce
unnecessary memory consumption.
https://blog.csdn.net/qq_21997625/article/details/81285096
@author: zyb_as
"""
import cv2
import operator  # Built-in operator function interface (used for sorting below)
import numpy as np
import matplotlib.pyplot as plt
import os
import sys
from scipy.signal import argrelextrema  # Local extrema


def smooth(x, window_len=13, window='hanning'):
    """Smooth the data using a window of the requested size.

    This method is based on the convolution of a scaled window with the signal.
    The signal is prepared by introducing reflected copies of the signal
    (with the window size) at both ends so that transient parts are minimized
    in the beginning and end parts of the output signal.

    input:
        x: the input signal
        window_len: the dimension of the smoothing window
        window: the type of window from 'flat', 'hanning', 'hamming', 'bartlett', 'blackman'
            flat window will produce a moving average smoothing.
    output:
        the smoothed signal
    example:
        import numpy as np
        t = np.arange(-2, 2, 0.1)
        x = np.sin(t) + np.random.randn(len(t)) * 0.1
        y = smooth(x)
    see also:
        numpy.hanning, numpy.hamming, numpy.bartlett, numpy.blackman, numpy.convolve
        scipy.signal.lfilter
    TODO: the window parameter could be the window itself if an array is passed instead of a string
    """
    print(len(x), window_len)
    # if x.ndim != 1:
    #     raise ValueError("smooth only accepts 1 dimension arrays.")
    # if x.size < window_len:
    #     raise ValueError("Input vector needs to be bigger than window size.")
    # if window_len < 3:
    #     return x
    #
    # if window not in ['flat', 'hanning', 'hamming', 'bartlett', 'blackman']:
    #     raise ValueError("Window is one of 'flat', 'hanning', 'hamming', 'bartlett', 'blackman'")
    s = np.r_[2 * x[0] - x[window_len:1:-1],
              x, 2 * x[-1] - x[-1:-window_len:-1]]
    # print(len(s))
    if window == 'flat':  # moving average
        w = np.ones(window_len, 'd')
    else:
        w = getattr(np, window)(window_len)
    y = np.convolve(w / w.sum(), s, mode='same')
    return y[window_len - 1:-window_len + 1]


class Frame:
    """Class to hold information about each frame."""

    def __init__(self, id, diff):
        self.id = id
        self.diff = diff

    def __lt__(self, other):
        return self.id < other.id

    def __gt__(self, other):
        return other.__lt__(self)

    def __eq__(self, other):
        return self.id == other.id

    def __ne__(self, other):
        return not self.__eq__(other)


def rel_change(a, b):
    # Relative change from a to b, normalized by the larger of the two
    x = (b - a) / max(a, b)
    print(x)
    return x


def getEffectiveFrame(videopath, dir):
    # Create the output directory if it does not exist
    if not os.path.exists(dir):
        os.makedirs(dir)
    (filepath, tempfilename) = os.path.split(videopath)  # Split the path from the file name
    (filename, extension) = os.path.splitext(tempfilename)  # Split the file name from its extension
    # Setting fixed threshold criteria
    USE_THRESH = False
    # Fixed threshold value
    THRESH = 0.6
    # Setting top-order criteria
    USE_TOP_ORDER = False
    # Setting local maxima criteria
    USE_LOCAL_MAXIMA = True
    # Number of top sorted frames
    NUM_TOP_FRAMES = 50
    # Smoothing window size
    len_window = int(50)
    print("target video :" + videopath)
    print("frame save directory: " + dir)
    # Load the video and compute the diff between frames
    cap = cv2.VideoCapture(str(videopath))
    curr_frame = None
    prev_frame = None
    frame_diffs = []
    frames = []
    success, frame = cap.read()
    i = 0
    while success:
        luv = cv2.cvtColor(frame, cv2.COLOR_BGR2LUV)
        curr_frame = luv
        if curr_frame is not None and prev_frame is not None:
            diff = cv2.absdiff(curr_frame, prev_frame)  # Difference image
            diff_sum = np.sum(diff)
            diff_sum_mean = diff_sum / (diff.shape[0] * diff.shape[1])  # Average difference per pixel
            frame_diffs.append(diff_sum_mean)
            frame = Frame(i, diff_sum_mean)
            frames.append(frame)
        prev_frame = curr_frame
        i = i + 1
        success, frame = cap.read()
    cap.release()
    # Compute keyframes
    keyframe_id_set = set()
    if USE_TOP_ORDER:
        # Sort the list in descending order of diff
        frames.sort(key=operator.attrgetter("diff"), reverse=True)
        for keyframe in frames[:NUM_TOP_FRAMES]:
            keyframe_id_set.add(keyframe.id)
    if USE_THRESH:
        print("Using Threshold")
        for i in range(1, len(frames)):
            # np.float was removed in NumPy 1.24; the built-in float is equivalent here
            if rel_change(float(frames[i - 1].diff), float(frames[i].diff)) >= THRESH:
                keyframe_id_set.add(frames[i].id)
    if USE_LOCAL_MAXIMA:
        print("Using Local Maxima")
        diff_array = np.array(frame_diffs)
        sm_diff_array = smooth(diff_array, len_window)  # Smooth
        frame_indexes = np.asarray(argrelextrema(sm_diff_array, np.greater))[0]  # Find the local maxima
        for i in frame_indexes:
            keyframe_id_set.add(frames[i - 1].id)  # Record the frame ids at the maxima
        plt.figure(figsize=(40, 20))
        plt.locator_params(axis="x", nbins=100)
        # stem draws a discrete series, while plot draws a continuous curve
        plt.stem(sm_diff_array, linefmt='-', markerfmt='o', basefmt='--', label='sm_diff_array')
        plt.savefig(dir + filename + '_plot.png')
    # Save all keyframes as images
    cap = cv2.VideoCapture(str(videopath))
    curr_frame = None
    keyframes = []
    success, frame = cap.read()
    idx = 0
    while success:
        if idx in keyframe_id_set:
            name = filename + '_' + str(idx) + ".jpg"
            cv2.imwrite(dir + name, frame)
            keyframe_id_set.remove(idx)
        idx = idx + 1
        success, frame = cap.read()
    cap.release()


if __name__ == "__main__":
    print(sys.executable)
    # Video path of the source file
    videopath = 'video/school.mp4'
    # Directory to store the processed frames
    dir = 'video/extract_result/'
    getEffectiveFrame(videopath, dir)
Result:
(2) Batch-processing videos
# -*- coding: utf-8 -*-
import cv2
import os
import time
import operator  # Built-in operator function interface (used for sorting below)
import numpy as np
import matplotlib.pyplot as plt
from scipy.signal import argrelextrema  # Local extrema


def smooth(x, window_len=13, window='hanning'):
    """Smooth the data using a window of the requested size."""
    print(len(x), window_len)
    s = np.r_[2 * x[0] - x[window_len:1:-1],
              x, 2 * x[-1] - x[-1:-window_len:-1]]
    # print(len(s))
    if window == 'flat':  # moving average
        w = np.ones(window_len, 'd')
    else:
        w = getattr(np, window)(window_len)
    y = np.convolve(w / w.sum(), s, mode='same')
    return y[window_len - 1:-window_len + 1]


class Frame:
    """Holds information about each frame."""

    def __init__(self, id, diff):
        self.id = id
        self.diff = diff

    def __lt__(self, other):
        return self.id < other.id

    def __gt__(self, other):
        return other.__lt__(self)

    def __eq__(self, other):
        return self.id == other.id

    def __ne__(self, other):
        return not self.__eq__(other)


def rel_change(a, b):
    # Relative change from a to b, normalized by the larger of the two
    x = (b - a) / max(a, b)
    print(x)
    return x


def getEffectiveFrame(videopath, dirfile):
    # Create the output directory if it does not exist
    if not os.path.exists(dirfile):
        os.makedirs(dirfile)
    (filepath, tempfilename) = os.path.split(videopath)  # Split the path from the file name
    (filename, extension) = os.path.splitext(tempfilename)  # Split the file name from its extension
    # Setting fixed threshold criteria
    USE_THRESH = False
    # Fixed threshold value
    THRESH = 0.6
    # Setting top-order criteria
    USE_TOP_ORDER = False
    # Setting local maxima criteria
    USE_LOCAL_MAXIMA = True
    # Number of top sorted frames
    NUM_TOP_FRAMES = 50
    # Smoothing window size
    len_window = int(50)
    print("target video :" + videopath)
    print("frame save directory: " + dirfile)
    # Load the video and compute the diff between frames
    cap = cv2.VideoCapture(str(videopath))
    curr_frame = None
    prev_frame = None
    frame_diffs = []
    frames = []
    success, frame = cap.read()
    i = 0
    while success:
        luv = cv2.cvtColor(frame, cv2.COLOR_BGR2LUV)
        curr_frame = luv
        if curr_frame is not None and prev_frame is not None:
            diff = cv2.absdiff(curr_frame, prev_frame)  # Difference image
            diff_sum = np.sum(diff)
            diff_sum_mean = diff_sum / (diff.shape[0] * diff.shape[1])  # Average difference per pixel
            frame_diffs.append(diff_sum_mean)
            frame = Frame(i, diff_sum_mean)
            frames.append(frame)
        prev_frame = curr_frame
        i = i + 1
        success, frame = cap.read()
    cap.release()
    # Compute keyframes
    keyframe_id_set = set()
    if USE_TOP_ORDER:
        # Sort the list in descending order of diff
        frames.sort(key=operator.attrgetter("diff"), reverse=True)
        for keyframe in frames[:NUM_TOP_FRAMES]:
            keyframe_id_set.add(keyframe.id)
    if USE_THRESH:
        print("Using Threshold")
        for i in range(1, len(frames)):
            # np.float was removed in NumPy 1.24; the built-in float is equivalent here
            if rel_change(float(frames[i - 1].diff), float(frames[i].diff)) >= THRESH:
                keyframe_id_set.add(frames[i].id)
    if USE_LOCAL_MAXIMA:
        print("Using Local Maxima")
        diff_array = np.array(frame_diffs)
        sm_diff_array = smooth(diff_array, len_window)  # Smooth
        frame_indexes = np.asarray(argrelextrema(sm_diff_array, np.greater))[0]  # Find the local maxima
        for i in frame_indexes:
            keyframe_id_set.add(frames[i - 1].id)  # Record the frame ids at the maxima
        plt.figure(figsize=(40, 20))
        plt.locator_params(axis="x", nbins=100)
        # stem draws a discrete series, while plot draws a continuous curve
        plt.stem(sm_diff_array, linefmt='-', markerfmt='o', basefmt='--', label='sm_diff_array')
        plt.savefig(dirfile + filename + '_plot.png')
        plt.close()  # Close the figure so figures do not pile up across videos
    # Save all keyframes as images
    cap = cv2.VideoCapture(str(videopath))
    curr_frame = None
    keyframes = []
    success, frame = cap.read()
    idx = 0
    while success:
        if idx in keyframe_id_set:
            name = filename + '_' + str(idx) + ".jpg"
            cv2.imwrite(dirfile + name, frame)
            keyframe_id_set.remove(idx)
        idx = idx + 1
        success, frame = cap.read()
    cap.release()


if __name__ == "__main__":
    print("[INFO]Effective Frame.")
    start = time.time()
    videos_path = 'dataset/vedio/onehand/'
    outfile = 'dataset/vedio/extract_result/'  # Where the extracted frames are stored
    video_files = [os.path.join(videos_path, video_file) for video_file in os.listdir(videos_path)]
    for video_file in video_files:
        getEffectiveFrame(video_file, outfile)
    print("[INFO]Extract Result time: ", time.time() - start)
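One thing to watch in the batch script: os.listdir returns every entry in the folder, so a stray text file or a saved plot would also be handed to getEffectiveFrame. A possible guard is sketched below. The helper list_videos and the extension set are my own assumptions (the two formats used earlier in this post), and the demo uses empty placeholder files in a temporary folder.

```python
import os
import tempfile

VIDEO_EXTENSIONS = {".mp4", ".avi"}  # assumption: the two formats used earlier


def list_videos(folder, extensions=VIDEO_EXTENSIONS):
    """Return sorted paths of files in `folder` whose extension looks like a video."""
    paths = []
    for name in sorted(os.listdir(folder)):
        path = os.path.join(folder, name)
        suffix = os.path.splitext(name)[1].lower()
        if os.path.isfile(path) and suffix in extensions:
            paths.append(path)
    return paths


# Demo in a throwaway directory with empty placeholder files.
with tempfile.TemporaryDirectory() as folder:
    for name in ["b.mp4", "a.AVI", "notes.txt", "school_plot.png"]:
        open(os.path.join(folder, name), "w").close()
    print([os.path.basename(p) for p in list_videos(folder)])  # ['a.AVI', 'b.mp4']
```

In the batch script, the video_files list could then be built with list_videos(videos_path) instead of the bare os.listdir call.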
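(3) Extensions
The script above uses the third method. Its suitability for sign language recognition is only moderate, and it has these drawbacks:
- Too few key frames are extracted, so the most distinctive gestures are not captured well.
  (Possible fix: add extra points besides the frame-difference peaks.)
- The interframe difference is easily affected by extreme frames.
Inspecting every frame with a video processing tool: (the first three frames of this video are black, which makes the average frame difference too large, so no key frames are extracted.)
Frame difference curve: (distorted)
(The selected videos need preprocessing: extreme frames must be removed.)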
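One possible way to remove such extreme frames is to drop near-black frames before computing the differences. The sketch below is only an assumption about how this could be done, not something verified on the sign language videos: the function names are hypothetical and the brightness cutoff of 10 is an arbitrary value that would need tuning.

```python
import numpy as np


def is_near_black(frame, cutoff=10.0):
    """True if the mean pixel intensity of the frame is below `cutoff`."""
    return float(np.mean(frame)) < cutoff


def drop_black_frames(frames, cutoff=10.0):
    """Return (kept_indices, kept_frames) with near-black frames removed."""
    kept = [(i, f) for i, f in enumerate(frames) if not is_near_black(f, cutoff)]
    indices = [i for i, _ in kept]
    return indices, [f for _, f in kept]


# Demo: three black frames followed by two ordinary ones (synthetic data).
black = np.zeros((4, 4, 3), dtype=np.uint8)
gray = np.full((4, 4, 3), 120, dtype=np.uint8)
frames = [black, black, black, gray, gray + 5]
indices, kept = drop_black_frames(frames)
print(indices)  # [3, 4]
```

Keeping the original indices alongside the frames means the saved key frames can still be named after their position in the source video.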
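Whether to use one of the other two methods instead:
- Use the difference order: the few frames with the largest average interframe difference are considered key frames.
- Use a difference threshold: frames whose average interframe difference is greater than the threshold are considered key frames.
This remains to be verified.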
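2. Three-frame difference method
Two-frame difference method:
Three-frame difference method:
Binocular camera:
This is getting a bit hard, so I will wait for my graduation project partner's results.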
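For reference in the meantime, three-frame differencing is commonly described as thresholding the two consecutive absolute differences and keeping only the pixels that changed in both. Below is a minimal NumPy sketch under that assumption. It is not my partner's implementation; the function names and the threshold of 25 are hypothetical.

```python
import numpy as np


def two_frame_mask(prev, curr, thresh=25):
    """Binary motion mask from the absolute difference of two grayscale frames."""
    # Cast to a signed type so uint8 subtraction cannot wrap around.
    diff = np.abs(curr.astype(np.int16) - prev.astype(np.int16))
    return diff > thresh


def three_frame_mask(prev, curr, nxt, thresh=25):
    """Pixels that changed both from prev to curr and from curr to nxt."""
    return two_frame_mask(prev, curr, thresh) & two_frame_mask(curr, nxt, thresh)


# Demo: a bright one-pixel "object" moving right across a 1x5 grayscale strip.
f1 = np.array([[200, 0, 0, 0, 0]], dtype=np.uint8)
f2 = np.array([[0, 200, 0, 0, 0]], dtype=np.uint8)
f3 = np.array([[0, 0, 200, 0, 0]], dtype=np.uint8)
print(two_frame_mask(f1, f2).astype(int).tolist())        # [[1, 1, 0, 0, 0]]
print(three_frame_mask(f1, f2, f3).astype(int).tolist())  # [[0, 1, 0, 0, 0]]
```

In this toy case the two-frame mask marks both the old and the new position of the object, while the three-frame mask keeps only its position in the middle frame.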