Camshift Notes

A distinctive feature of mean-shift is that it unifies the support space and the feature space within a single data-density framework. For an image, the support space consists of the pixel coordinates and the feature space consists of the corresponding pixel's gray level or RGB components. Once the two spaces are combined, each data point is a 5-dimensional vector: [x,y,r,g,b].

Conceptually this looks simple, but it is in fact a real leap, and it is the starting point of the mean-shift method.

A valuable property of mean-shift is that, within this iterative framework, the computed mean-shift vector is guaranteed to converge to a local maximum of the data density. See the paper [ComaniciuMeer2002] for details.

Here is a small program that performs simple mean-shift filtering on an image, for reference:

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
function [DRGB, DSD, MSSD] = MScut(sMode, RGB_raw, hs, hf, m)
% designed for segmenting a colour image using mean-shift [ComaniciuMeer 2002]
% image must be color
% procedure in mean-shift
% 1. combine support space and feature space to make a mean-shift space
%    based data description
% 2. for every mean-shift space data
% 3.   do mean-shift filtering
%      until convergence
% 4. end
% 5. find the converged mean-shift space data that you are interested in
%    and label it
% 6. repeat the above steps

% a     -- data in support space
% b     -- data in feature space
% x     -- data in mean-shift space
% f(.)  -- data density function
% k(.)  -- profile function (implicit)
% g(.)  -- profile function (explicit)
% m     -- mean shift vector
% hs    -- bandwidth in support space
% hf    -- bandwidth in feature space
% M     -- threshold to make a distinct cluster
%% enter $hs$, $hf$, $m$ if necessary
if ~exist('hs', 'var')
    hs = input('please enter spatial bandwidth (hs):\n');
end
if ~exist('hf', 'var')
    hf = input('please enter feature bandwidth (hf):\n');
end
if ~exist('m', 'var')
    m = input('please enter minimum cluster size (m):\n');
end
switch upper(sMode)
    case 'RGB'
        RGB = double( RGB_raw );
    otherwise
        error('MScut must use a colour image to do segmentation!')
end
sz = size(RGB);

%% project data into mean-shift space to make $MSSD$ (mean-shift space data)
mT = repmat([1:sz(1)]', 1, sz(2));
vX = mT(1:end)';             % row 
mT = repmat([1:sz(2)], sz(1), 1); 
vY = mT(1:end)';  % column
mT = RGB(:,:,1);
vR = mT(1:end)'; % red
mT = RGB(:,:,2);
vG = mT(1:end)'; % green
mT = RGB(:,:,3);
vB = mT(1:end)'; % blue
MSSD = [vX, vY, vR, vG, vB];
%% profile function $g$ (implicit)
% With the Epanechnikov kernel the profile derivative is constant, so the
% mean-shift update below reduces to a plain mean over a flat window.
disp('Using flat profile (Epanechnikov kernel)...')
%% main part $$
nIteration = 4;          % fixed number of mean-shift iterations
nData   = size(MSSD, 1); % total number of data points
DSD     = MSSD*0;        % 'DSD' for destination space data
for k = 1:nData 
    % 
    tMSSD = MSSD(k,:); % 't' for temp
    for l = 1:nIteration
        %
        mT = abs( MSSD - repmat(tMSSD, nData, 1));
        vT = logical( (mT(:,1)<=hs).*(mT(:,2)<=hs).*(mT(:,3)<=hf).*(mT(:,4)<=hf).*(mT(:,5)<=hf) );
        v  = MSSD(vT,:);
        % update $tMSSD$
        tMSSD = mean( v, 1 );
        if nIteration == l
            DSD(k,:) = tMSSD;
        end
    end
end
% show result
DRGB = RGB * 0;
DRGB(:,:,1) = reshape(DSD(:,3), sz(1), sz(2)); % red
DRGB(:,:,2) = reshape(DSD(:,4), sz(1), sz(2)); % green
DRGB(:,:,3) = reshape(DSD(:,5), sz(1), sz(2)); % blue

figure, imshow(uint8(DRGB))

Notes on OpenCV's mean shift filtering/segmentation implementation

Mean shift is an effective feature-space analysis method with wide applications in image filtering, image segmentation, and object tracking.
For a detailed introduction to the mean shift algorithm, see the PAMI 2002 paper:
Comaniciu, D. and P. Meer (2002). "Mean shift: A robust approach toward feature space analysis." Pattern Analysis and Machine Intelligence, IEEE Transactions on 24(5): 603-619.

OpenCV implements mean shift functions for tracking, segmentation, and filtering respectively.
The C++ prototype of the filtering function is:
void pyrMeanShiftFiltering(InputArray src, OutputArray dst, double sp, double sr, int maxLevel=1, TermCriteria termcrit=TermCriteria(TermCriteria::MAX_ITER+TermCriteria::EPS, 5, 1))
src and dst are the input and output images (8-bit, 3-channel); sp and sr are the spatial-domain and colour-domain radii; maxLevel is the maximum level of the pyramid used for segmentation; termcrit is the termination criterion of the iteration.
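A minimal C++ usage sketch (the file name "input.jpg" and the radii sp=20, sr=40 are placeholders for illustration, not recommended values):

#include <opencv2/opencv.hpp>

int main()
{
    cv::Mat src = cv::imread("input.jpg");   // any 8-bit, 3-channel image
    if (src.empty()) return 1;

    cv::Mat dst;
    // sp = 20 (spatial radius), sr = 40 (colour radius); illustrative values
    cv::pyrMeanShiftFiltering(src, dst, 20, 40, 1,
        cv::TermCriteria(cv::TermCriteria::MAX_ITER + cv::TermCriteria::EPS, 5, 1));

    cv::imshow("mean-shift filtered", dst);
    cv::waitKey(0);
    return 0;
}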

The prototype of the tracking function is:

int meanShift(InputArray probImage, Rect& window, TermCriteria criteria)

probImage is the probability map of the object's presence, window is the initial search window (it also returns the search result), and criteria is the termination criterion.
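A self-contained sketch of meanShift on a synthetic probability map (the blob position, blur width, and initial window are made up for the example):

#include <opencv2/opencv.hpp>
#include <iostream>

int main()
{
    // synthetic probability map: a bright blurred disc centred at (160, 120)
    cv::Mat prob(240, 320, CV_8UC1, cv::Scalar(0));
    cv::circle(prob, cv::Point(160, 120), 30, cv::Scalar(255), -1);
    cv::GaussianBlur(prob, prob, cv::Size(0, 0), 15);

    cv::Rect window(90, 60, 60, 60);   // offset from the blob, but overlapping its tail
    int iters = cv::meanShift(prob, window,
        cv::TermCriteria(cv::TermCriteria::MAX_ITER + cv::TermCriteria::EPS, 10, 1));
    std::cout << "converged in " << iters << " iterations, window at ("
              << window.x << ", " << window.y << ")" << std::endl;
    return 0;
}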

The prototype of the segmentation function is:

void gpu::meanShiftSegmentation(const GpuMat& src, Mat& dst, int sp, int sr, int minsize, TermCriteria criteria=TermCriteria(TermCriteria::MAX_ITER + TermCriteria::EPS, 5, 1))

Most parameters are the same as in pyrMeanShiftFiltering; minsize is the minimum size of a segmented region, and regions smaller than this are merged.

An OpenCV sample combines pyrMeanShiftFiltering and floodFill to implement a simple segmentation example (/samples/cpp/Meanshift_segmentation.cpp).

The Mean Shift algorithm generally refers to an iterative procedure: compute the mean shift of the current point, move the point to its mean shift, take that as the new starting point, and keep moving until a termination condition is met.

1. Meanshift derivation

Given n sample points x_i, i = 1, ..., n, in the d-dimensional space R^d, pick any point x in the space; the basic form of the Mean Shift vector is then defined as

M_{h}(x)=\frac{1}{k}\sum_{x_{i}\in S_{h}}(x_{i}-x)

S_h is a high-dimensional ball of radius h, namely the set of points y satisfying

S_{h}(x)=\{y:(y-x)^{T}(y-x)\leq h^{2}\}

and k is the number of the n sample points x_i that fall inside the region S_h.

That is the official, textbook definition. My understanding is: pick any point in the d-dimensional space, then draw a high-dimensional ball of radius h centred at that point (since d may be greater than 2, it is a high-dimensional ball). Every point falling inside the ball, together with the centre, defines a vector, starting at the centre and ending at the point inside the ball. Adding up all these vectors gives the Meanshift vector.

[Figure: the yellow arrow is M_h, the meanshift vector.]

Next, take the end point of the meanshift vector as the centre and draw another high-dimensional ball; repeating the steps above gives another meanshift vector. Iterating in this way, the meanshift algorithm converges to the place of maximum probability density, i.e. the densest region.

[Figure: the final converged result.]

Meanshift derivation:

Now add a kernel function to the basic meanshift vector; the properties of kernel functions are introduced in this post: http://www.cnblogs.com/liqizhou/archive/2012/05/11/2495788.html

With the kernel, the density estimate underlying meanshift becomes

{\displaystyle \hat{f}_{h,K}(x)=\frac{c_{k,d}}{nh^{d}}\sum_{i=1}^{n}k\left(\left\Vert \frac{x-x_{i}}{h}\right\Vert ^{2}\right)}                    (1)

To explain the notation: K() is the kernel with profile k, h is the radius (bandwidth), and c_{k,d}/(nh^{d}) is the normalising constant. To make the estimate f as large as possible, the most natural step is to differentiate it, and meanshift is indeed obtained by differentiating this expression.

{\displaystyle \nabla\hat{f}_{h,K}(x)=\frac{2c_{k,d}}{nh^{d+2}}\sum_{i=1}^{n}(x-x_{i})\,k'\left(\left\Vert \frac{x-x_{i}}{h}\right\Vert ^{2}\right)}                    (2)

Let:

g(x)=-k'(x)

K(x) is then called the shadow kernel of g(x). The name sounds deep, but it is simply the negative of the derivative. The gradient above can now be factored as

{\displaystyle \nabla\hat{f}_{h,K}(x)=\frac{2c_{k,d}}{nh^{d+2}}\left[\sum_{i=1}^{n}g\left(\left\Vert \frac{x-x_{i}}{h}\right\Vert ^{2}\right)\right]\left[\frac{\sum_{i=1}^{n}x_{i}\,g\left(\left\Vert \frac{x-x_{i}}{h}\right\Vert ^{2}\right)}{\sum_{i=1}^{n}g\left(\left\Vert \frac{x-x_{i}}{h}\right\Vert ^{2}\right)}-x\right]}

If a Gaussian kernel is adopted, the first factor is proportional to the density estimate \hat{f}_{h,G}(x), and the second factor is exactly a meanshift vector:

{\displaystyle m_{h,G}(x)=\frac{\sum_{i=1}^{n}x_{i}\,g\left(\left\Vert \frac{x-x_{i}}{h}\right\Vert ^{2}\right)}{\sum_{i=1}^{n}g\left(\left\Vert \frac{x-x_{i}}{h}\right\Vert ^{2}\right)}-x}

so (2) can be expressed as

{\displaystyle \nabla\hat{f}_{h,K}(x)=\hat{f}_{h,G}(x)\,\frac{2c_{k,d}}{h^{2}c_{g,d}}\,m_{h,G}(x)}

[Figure: the decomposition of \nabla\hat{f}_{h,K}(x) into these factors, shown graphically.]

To make \nabla\hat{f}_{h,K}(x)=0, it suffices that m_{h,G}(x)=0, which gives the new centre coordinate:

{\displaystyle y^{t+1}=\frac{\sum_{i=1}^{n}x_{i}\,g\left(\left\Vert \frac{y^{t}-x_{i}}{h}\right\Vert ^{2}\right)}{\sum_{i=1}^{n}g\left(\left\Vert \frac{y^{t}-x_{i}}{h}\right\Vert ^{2}\right)}}                    (3)

 

The description above covers the meanshift procedure, but somewhat loosely; the concrete algorithm flow is given below.

  1. Pick a point x in the space as the centre, draw a high-dimensional ball of radius h, and collect all points x_i that fall inside the ball.
  2. Compute m_{h,G}(x). If ||m_{h,G}(x)|| < ε (set manually), exit. Otherwise, update x using (3) and return to step 1. (A minimal sketch of this loop is given below.)
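As a rough illustration of the loop above (not any particular library's implementation), here is a minimal C++ sketch with a flat kernel in 2-D; the sample points and the bandwidth h = 2 are made up for the example:

#include <cmath>
#include <cstdio>
#include <vector>

struct Pt { double x, y; };

// One mean-shift iteration with a flat kernel of radius h:
// replace c by the mean of all samples inside the ball of radius h around c.
Pt meanShiftStep(const std::vector<Pt>& pts, Pt c, double h)
{
    double sx = 0, sy = 0; int k = 0;
    for (const Pt& p : pts) {
        double dx = p.x - c.x, dy = p.y - c.y;
        if (dx * dx + dy * dy <= h * h) { sx += p.x; sy += p.y; ++k; }
    }
    return k ? Pt{sx / k, sy / k} : c;
}

int main()
{
    std::vector<Pt> pts = {{1, 1}, {1.2, 0.9}, {0.8, 1.1}, {5, 5}, {5.1, 4.9}};
    Pt c{0, 0};                                   // arbitrary starting point
    for (int t = 0; t < 100; ++t) {
        Pt next = meanShiftStep(pts, c, 2.0);
        double dx = next.x - c.x, dy = next.y - c.y;
        c = next;
        if (std::sqrt(dx * dx + dy * dy) < 1e-6) break;   // ||m|| < eps -> done
    }
    std::printf("mode near (%.3f, %.3f)\n", c.x, c.y);    // ~ (1.000, 1.000)
    return 0;
}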

 

2. Meanshift for clustering on images:

The truly great researchers create algorithms like meanshift and EM; innovations like these push the whole discipline forward. Others apply the algorithms to practical problems, pushing industry, i.e. technology, forward. Below we describe how meanshift is applied to clustering and tracking on images.

An image is just a matrix, with pixels distributed uniformly over it, so there is no density of points per se. The key question is how to define a probability density over the pixels.

Suppose we compute the probability density at a pixel x as follows: take x as the centre and h as the radius, and for the pixels x_i falling inside the ball define two rules:

(1) the closer the colour of pixel x_i is to the colour of x, the higher the density;

(2) the closer the position of pixel x_i is to x, the higher the density.

The total probability density is defined as the product of the densities from the two rules, which can be written as (4):

{\displaystyle \hat{f}(x)=\frac{C}{n\,h_{s}^{2}h_{r}^{p}}\sum_{i=1}^{n}k\left(\left\Vert \frac{x^{s}-x_{i}^{s}}{h_{s}}\right\Vert ^{2}\right)k\left(\left\Vert \frac{x^{r}-x_{i}^{r}}{h_{r}}\right\Vert ^{2}\right)}                    (4)

The first factor carries the spatial information: the closer x_i is to the centre, the larger its value. The second factor carries the colour information: the more similar the colours, the larger its value. [Figure: the top-left image, its probability density computed by (4) at the top-right, and the meanshift clustering result at the bottom-left.]
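As a small sketch of how the two factors of (4) combine for one neighbouring pixel (the Gaussian profiles and the bandwidths hs = 8, hr = 16 are assumptions made for this example, not values from the text):

#include <cmath>
#include <cstdio>

// Joint weight of equation (4): a spatial kernel times a colour (range)
// kernel; dx,dy is the spatial offset, dr,dg,db the colour difference.
double jointWeight(double dx, double dy, double dr, double dg, double db,
                   double hs, double hr)
{
    double s = (dx * dx + dy * dy) / (hs * hs);
    double r = (dr * dr + dg * dg + db * db) / (hr * hr);
    return std::exp(-0.5 * s) * std::exp(-0.5 * r);
}

int main()
{
    // a pixel 3 px away with a small colour difference vs. a distant one
    std::printf("near: %.4f\n", jointWeight(3, 0, 10, 5, 0, 8, 16));
    std::printf("far : %.4f\n", jointWeight(40, 0, 10, 5, 0, 8, 16));
    return 0;
}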

 

 


Camshift: algorithm principle and its OpenCV implementation

Camshift principle
Camshift uses the colour histogram model of the target to convert the image into a colour probability distribution map. It initialises the size and position of a search window, then adaptively adjusts the window's position and size based on the result obtained from the previous frame, thereby locating the centre of the target in the current image.

It consists of three parts:
1 -- Colour probability map (back projection):
(1). The RGB colour space is rather sensitive to changes in illumination. To reduce the effect of such changes on tracking, the image is first converted from RGB to HSV. (2). A histogram of the H channel is then computed; the histogram records the probability (or pixel count) of each H value, i.e. we can look up the probability or pixel count of any H value h, giving a colour probability lookup table. (3). Each pixel value in the image is replaced by the probability of its colour, which yields the colour probability distribution map. This process is called back projection, and the colour probability map is a grayscale image.
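As a hedged sketch with OpenCV's C++ API (the file names and the 30-bin quantisation are assumptions for illustration), steps (1)-(3) map onto calcHist and calcBackProject like this:

#include <opencv2/opencv.hpp>

int main()
{
    cv::Mat target = cv::imread("target.jpg");   // hypothetical target image
    cv::Mat frame  = cv::imread("frame.jpg");    // hypothetical current frame
    if (target.empty() || frame.empty()) return 1;

    cv::Mat target_hsv, frame_hsv;
    cv::cvtColor(target, target_hsv, cv::COLOR_BGR2HSV);
    cv::cvtColor(frame,  frame_hsv,  cv::COLOR_BGR2HSV);

    // hue histogram of the target (OpenCV stores 8-bit hue as 0..179)
    int histSize = 30;
    int channels[] = {0};
    float hranges[] = {0, 180};
    const float* ranges[] = {hranges};
    cv::Mat hist;
    cv::calcHist(&target_hsv, 1, channels, cv::Mat(), hist, 1, &histSize, ranges);
    cv::normalize(hist, hist, 0, 255, cv::NORM_MINMAX);

    // back projection: each pixel becomes the probability of its hue
    cv::Mat backproj;
    cv::calcBackProject(&frame_hsv, 1, channels, hist, backproj, ranges);
    cv::imshow("backproj", backproj);
    cv::waitKey(0);
    return 0;
}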

2 -- meanshift
The meanshift algorithm is a non-parametric method of density gradient estimation; it locates the target by iteratively climbing to the mode of a probability distribution.
The procedure is:
(1). Choose a search window W in the colour probability distribution map.
(2). Compute the zeroth moment:

M_{00}=\sum_{x}\sum_{y}I(x,y)

compute the first moments:

M_{10}=\sum_{x}\sum_{y}x\,I(x,y),\qquad M_{01}=\sum_{x}\sum_{y}y\,I(x,y)

and compute the centroid of the search window:

x_{c}=M_{10}/M_{00},\qquad y_{c}=M_{01}/M_{00}

(3). Resize the search window: the width is s=2\sqrt{M_{00}/256} and the length is 1.2s.
(4). Move the centre of the search window to the centroid. If the displacement exceeds a preset fixed threshold, repeat (2), (3), (4) until the displacement between the window centre and the centroid falls below the threshold, or the number of iterations reaches a maximum, then stop. Convergence proofs for meanshift can be found in the literature.
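The same moments can be computed with OpenCV's cv::moments; the sketch below uses a synthetic probability map (a bright square standing in for a real back projection) just to show the centroid computation:

#include <opencv2/opencv.hpp>
#include <iostream>

int main()
{
    // synthetic probability map with a bright 40x40 square
    cv::Mat prob(240, 320, CV_8UC1, cv::Scalar(0));
    prob(cv::Rect(140, 100, 40, 40)).setTo(cv::Scalar(255));

    cv::Moments m = cv::moments(prob, false);          // M00, M10, M01, ...
    double xc = m.m10 / m.m00, yc = m.m01 / m.m00;
    std::cout << "centroid: (" << xc << ", " << yc << ")\n";  // ~ (159.5, 119.5)
    return 0;
}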

3 -- camshift
Extending the meanshift algorithm to a continuous image sequence gives the camshift algorithm. It runs meanshift on every frame of the video and takes the result of the previous frame, i.e. the size and centre of the search window, as the initial value of the search window for the next frame's meanshift. Iterating in this way tracks the target.
The procedure is:
(1). Initialise the search window.
(2). Compute the colour probability distribution (back projection) of the search window.
(3). Run the meanshift algorithm to obtain the new size and position of the search window.
(4). In the next video frame, re-initialise the size and position of the search window with the values from (3), then jump to (2) and continue. (A minimal end-to-end sketch follows below.)
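A minimal C++ sketch of this loop using OpenCV's cv::CamShift (the camera index, the initial window, and the 30-bin hue histogram are assumptions made for illustration):

#include <opencv2/opencv.hpp>

int main()
{
    cv::VideoCapture cap(0);                  // hypothetical camera source
    if (!cap.isOpened()) return 1;

    cv::Rect window(200, 150, 80, 80);        // hypothetical initial target window
    int histSize = 30;
    int channels[] = {0};
    float hranges[] = {0, 180};               // OpenCV stores 8-bit hue as 0..179
    const float* ranges[] = {hranges};

    // (1)-(2): build the target hue histogram once from the initial window
    cv::Mat frame, hsv, hist;
    cap >> frame;
    if (frame.empty()) return 1;
    cv::cvtColor(frame, hsv, cv::COLOR_BGR2HSV);
    cv::Mat roi = hsv(window);
    cv::calcHist(&roi, 1, channels, cv::Mat(), hist, 1, &histSize, ranges);
    cv::normalize(hist, hist, 0, 255, cv::NORM_MINMAX);

    for (;;) {
        cap >> frame;
        if (frame.empty()) break;
        cv::cvtColor(frame, hsv, cv::COLOR_BGR2HSV);

        // back projection of the current frame against the target histogram
        cv::Mat backproj;
        cv::calcBackProject(&hsv, 1, channels, hist, backproj, ranges);

        // (3)-(4): CamShift updates both the position and the size of the window
        cv::RotatedRect box = cv::CamShift(backproj, window,
            cv::TermCriteria(cv::TermCriteria::MAX_ITER + cv::TermCriteria::EPS, 10, 1));

        cv::ellipse(frame, box, cv::Scalar(0, 0, 255), 2);
        cv::imshow("tracking", frame);
        if (cv::waitKey(30) == 27) break;     // Esc quits
    }
    return 0;
}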

Camshift handles target deformation and occlusion reasonably well, has modest resource requirements and low time complexity, and tracks well against simple backgrounds. But when the background is complex, or many pixels have colours similar to the target, tracking fails. Because it considers only the colour histogram and ignores the spatial distribution of the target, a prediction algorithm for the tracked target should be added in such cases.



The OpenCV implementation of Camshift
Original: http://blog.csdn.net/houdy/archive/2004/11/10/175739.aspx

1 -- Back Projection
OpenCV code to compute the Back Projection.
(1). Prepare an image containing only the tracked target, convert it to the HSV colour space, and take the H channel:
IplImage* target=cvLoadImage("target.bmp",-1);  //load the image
IplImage* target_hsv=cvCreateImage( cvGetSize(target), IPL_DEPTH_8U, 3 );
IplImage* target_hue=cvCreateImage( cvGetSize(target), IPL_DEPTH_8U, 1 );  //single channel for H
cvCvtColor(target,target_hsv,CV_BGR2HSV);       //convert to HSV
cvSplit( target_hsv, target_hue, NULL, NULL, NULL );    //extract the H channel

(2). Compute the histogram of the H channel, i.e. a 1D histogram:
int hist_size[]={256};          //quantize the H values to [0,255]
float h_ranges[]={0,360};       //the H channel ranges over [0,360)
float* ranges[]={ h_ranges };
CvHistogram* hist=cvCreateHist(1, hist_size, CV_HIST_ARRAY, ranges, 1);
cvCalcHist(&target_hue, hist, 0, NULL);
Here the value range of the H channel needs some care: H takes values in [0,360), which does not fit in one byte. To represent it in a byte, H must be quantized appropriately; here we quantize the H range to [0,255].

(3). Compute the Back Projection:
IplImage* rawImage;
//the hue plane of the current video frame: unsigned byte, one channel
IplImage* result=cvCreateImage(cvGetSize(rawImage),IPL_DEPTH_8U,1);
cvCalcBackProject(&rawImage,result,hist);
(4). result is the back projection we need.

2 -- Mean Shift algorithm
The centroid can be computed with the following formulas:
(1). Compute the zeroth moment over the region:
for(int i=0;i<height;i++)
    for(int j=0;j<width;j++)
        M00+=I(i,j);

(2). Compute the first moments over the region:
for(int i=0;i<height;i++)
    for(int j=0;j<width;j++)
    {
        M10+=i*I(i,j);
        M01+=j*I(i,j);
    }

(3). The Mass Center is then:
Xc=M10/M00; Yc=M01/M00;

OpenCV provides a function implementing the Mean Shift algorithm, with the prototype:
int cvMeanShift(IplImage* imgprob, CvRect windowIn,
                CvTermCriteria criteria, CvConnectedComp* out);
The parameters are:
(1). IplImage* imgprob: the 2D probability distribution image (input);
(2). CvRect windowIn: the initial window (input);
(3). CvTermCriteria criteria: the criterion for stopping the iteration (input);
(4). CvConnectedComp* out: the search result (output).
Note: constructing a CvTermCriteria variable takes three arguments: the type, the maximum number of iterations, and a threshold. For example, criteria can be constructed as:
criteria=cvTermCriteria(CV_TERMCRIT_ITER|CV_TERMCRIT_EPS,10,0.1);

3 -- CamShift algorithm
The whole algorithm has 5 steps:
Step 1: Set the whole image as the search area.
Step 2: Initialise the size and position of the Search Window.
Step 3: Compute the colour probability distribution inside a region slightly larger than the Search Window.
Step 4: Run MeanShift to obtain the new position and size of the Search Window.
Step 5: In the next video frame, initialise the position and size of the Search Window with the values obtained in Step 4, then jump to Step 3 and continue.

OpenCV code:
OpenCV has a function implementing the CamShift algorithm, with the prototype:
int cvCamShift(IplImage* imgprob, CvRect windowIn,
               CvTermCriteria criteria,
               CvConnectedComp* out, CvBox2D* box=0);
where:
imgprob: the colour probability distribution image.
windowIn: the initial value of the Search Window.
criteria: the criterion used to decide when the search stops.
out: holds the result, including the position and area of the new Search Window.
box: the minimal rectangle enclosing the tracked object.

More references:

An annotated OpenCV implementation of the camshift algorithm:
http://download.csdn.net/source/1663015


Introduction To Mean Shift Algorithm

http://saravananthirumuruganathan.wordpress.com/2010/04/01/introduction-to-mean-shift-algorithm/

It's been quite some time since I wrote a Data Mining post. Today, I intend to post on Mean Shift – a really cool but not very well-known algorithm. The basic idea is quite simple but the results are amazing. It was invented back in 1975 but was not widely used till two papers applied the algorithm to Computer Vision.

I learned this algorithm in my Advanced Data Mining course and I wrote the lecture notes on it. So here I am trying to convert my lecture notes to a post. I have tried to simplify it – but this post is more involved than the other posts.

It is quite sad that there exists no good post on such a good algorithm. While writing my lecture notes, I struggled a lot for good resources :) . The 3 "classic" papers on Mean Shift are quite hard to understand. Most of the other resources are usually from Computer Vision courses where Mean Shift is taught lightly as yet another technique for vision tasks (like segmentation) and contain only the main intuition and the formulas.

As a disclaimer, there might be errors in my exposition – so if you find anything wrong please let me know and I will fix it. You can always check out the reference for more details. I have not included any graphics in it but you can check the ppt given in the references for an animation of Mean Shift.

Introduction

Mean Shift is a powerful and versatile non-parametric iterative algorithm that can be used for a lot of purposes like finding modes, clustering etc. Mean Shift was introduced in Fukunaga and Hostetler [1] and has been extended to be applicable in other fields like Computer Vision. This document will provide a discussion of Mean Shift, prove its convergence, and briefly discuss its important applications.

Intuitive Idea of Mean Shift

This section provides an intuitive idea of Mean shift, and the later sections will expand the idea. Mean shift considers feature space as an empirical probability density function. If the input is a set of points, then Mean shift considers them as sampled from the underlying probability density function. If dense regions (or clusters) are present in the feature space, then they correspond to the modes (or local maxima) of the probability density function. We can also identify clusters associated with a given mode using Mean Shift.

For each data point, Mean shift associates it with the nearby peak of the dataset's probability density function. For each data point, Mean shift defines a window around it and computes the mean of the data within the window. Then it shifts the center of the window to the mean and repeats the algorithm till it converges. After each iteration, we can consider that the window shifts to a denser region of the dataset.

At a high level, we can specify Mean Shift as follows:
1. Fix a window around each data point. 
2. Compute the mean of data within the window. 
3. Shift the window to the mean and repeat till convergence.

 

Preliminaries

Kernels :

A kernel is a function that satisfies the following requirements:

1. \int_{R^{d}}\phi(x)\,dx=1

2. \phi(x)\geq0

Some examples of kernels include:

1. Rectangular \phi(x)=\begin{cases} 1 & a\leq x\leq b\\ 0 & else\end{cases}

2. Gaussian \phi(x)=e^{-\frac{x^{2}}{2\sigma^{2}}}

3. Epanechnikov \phi(x)=\begin{cases} \frac{3}{4}(1-x^{2}) & if\;|x|\leq1\\ 0 & else\end{cases}

Kernel Density Estimation

Kernel density estimation is a non-parametric way to estimate the density function of a random variable. This is usually called the Parzen window technique. Given a kernel K and a bandwidth parameter h, the kernel density estimator for a given set of d-dimensional points is

{\displaystyle \hat{f}(x)=\frac{1}{nh^{d}}\sum_{i=1}^{n}K\left(\frac{x-x_{i}}{h}\right)}
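A small C++ sketch of this estimator in one dimension with a Gaussian kernel (the sample points and the bandwidth h = 0.5 are made up for the example):

#include <cmath>
#include <cstdio>
#include <vector>

// f_hat(x) = (1 / (n h)) * sum_i K((x - x_i) / h), with a Gaussian K
double kde(const std::vector<double>& xs, double x, double h)
{
    const double c = 1.0 / std::sqrt(2.0 * 3.14159265358979323846);
    double s = 0.0;
    for (double xi : xs) {
        double u = (x - xi) / h;
        s += c * std::exp(-0.5 * u * u);
    }
    return s / (xs.size() * h);
}

int main()
{
    std::vector<double> xs = {0.9, 1.0, 1.1, 4.0, 4.2};   // two clumps
    for (double x : {1.0, 2.5, 4.1})
        std::printf("f_hat(%.1f) = %.4f\n", x, kde(xs, x, 0.5));
    return 0;
}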

 

Gradient Ascent Nature of Mean Shift

Mean shift can be considered to be based on gradient ascent on the density contour. The generic formula for gradient ascent is:

x_{1}=x_{0}+\eta f'(x_{0})

Applying it to the kernel density estimator,

{\displaystyle \hat{f}(x)=\frac{1}{nh^{d}}\sum_{i=1}^{n}K\left(\frac{x-x_{i}}{h}\right)}

{\displaystyle \nabla\hat{f}(x)=\frac{1}{nh^{d}}\sum_{i=1}^{n}K'\left(\frac{x-x_{i}}{h}\right)}

Setting it to 0 we get,

{\displaystyle \sum_{i=1}^{n}K'\left(\frac{x-x_{i}}{h}\right)\overrightarrow{x}=\sum_{i=1}^{n}K'\left(\frac{x-x_{i}}{h}\right)\overrightarrow{x_{i}}}

Finally, we get

{\displaystyle \overrightarrow{x}=\frac{\sum_{i=1}^{n}K'\left(\frac{x-x_{i}}{h}\right)\overrightarrow{x_{i}}}{\sum_{i=1}^{n}K'\left(\frac{x-x_{i}}{h}\right)}}

 

Mean Shift

As explained above, Mean shift treats the points in the feature space as samples from an empirical probability density function. Dense regions in feature space correspond to local maxima, or modes. So for each data point, we perform gradient ascent on the locally estimated density until convergence. The stationary points obtained via gradient ascent represent the modes of the density function. All points associated with the same stationary point belong to the same cluster.

Assuming g(x)=-K'(x), we have

{\displaystyle m(x)=\frac{\sum_{i=1}^{n}g\left(\frac{x-x_{i}}{h}\right)x_{i}}{\sum_{i=1}^{n}g\left(\frac{x-x_{i}}{h}\right)}-x}

The quantity m(x) is called the mean shift. So the mean shift procedure can be summarized as: for each point x_{i},

1. Compute mean shift vector m(x_{i}^{t})

2. Move the density estimation window by m(x_{i}^{t})

3. Repeat till convergence

 

Using a Gaussian kernel as an example,

1. y_{i}^{0}=x_{i}
2. {\displaystyle y_{i}^{t+1}=\frac{\sum_{j=1}^{n}x_{j}e^{\frac{-|y_{i}^{t}-x_{j}|^{2}}{h^{2}}}}{\sum_{j=1}^{n}e^{\frac{-|y_{i}^{t}-x_{j}|^{2}}{h^{2}}}}}
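A one-dimensional C++ sketch of this update (the sample points, starting point, and h = 1 are made up for the example):

#include <cmath>
#include <cstdio>
#include <vector>

// One Gaussian-kernel mean-shift update in 1-D:
// y_{t+1} = sum_j x_j exp(-|y_t - x_j|^2 / h^2) / sum_j exp(-|y_t - x_j|^2 / h^2)
double gaussianStep(const std::vector<double>& xs, double y, double h)
{
    double num = 0, den = 0;
    for (double xj : xs) {
        double w = std::exp(-(y - xj) * (y - xj) / (h * h));
        num += w * xj;
        den += w;
    }
    return num / den;
}

int main()
{
    std::vector<double> xs = {0.9, 1.0, 1.1, 4.0, 4.2};
    double y = 0.0;                       // arbitrary starting point
    for (int t = 0; t < 50; ++t) {
        double next = gaussianStep(xs, y, 1.0);
        if (std::fabs(next - y) < 1e-9) { y = next; break; }
        y = next;
    }
    std::printf("converged near %.4f\n", y);   // close to the 1.0 clump
    return 0;
}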

 

Proof Of Convergence

Using the kernel profile,

{\displaystyle y^{t+1}=\frac{\sum_{i=1}^{n}x_{i}k(||\frac{y^{t}-x_{i}}{h}||^{2})}{\sum_{i=1}^{n}k(||\frac{y^{t}-x_{i}}{h}||^{2})}}

To prove convergence, we have to prove that f(y^{t+1})\geq f(y^{t})

f(y^{t+1})-f(y^{t})={\displaystyle \sum_{i=1}^{n}}k(||\frac{y^{t+1}-x_{i}}{h}||^{2})-{\displaystyle \sum_{i=1}^{n}}k(||\frac{y^{t}-x_{i}}{h}||^{2})

But since the kernel profile is a convex function, we have

k(y^{t+1})-k(y^{t})\geq k'(y^{t})(y^{t+1}-y^{t})

Using it,

f(y^{t+1})-f(y^{t})\geq{\displaystyle \sum_{i=1}^{n}}k'(||\frac{y^{t}-x_{i}}{h}||^{2})(||\frac{y^{t+1}-x_{i}}{h}||^{2}-||\frac{y^{t}-x_{i}}{h}||^{2})

=\frac{1}{h^{2}}{\displaystyle \sum_{i=1}^{n}}k'(||\frac{y^{t}-x_{i}}{h}||^{2})(y^{(t+1)^{2}}-2y^{t+1}x_{i}+x_{i}^{2}-(y^{t^{2}}-2y^{t}x_{i}+x_{i}^{2}))

=\frac{1}{h^{2}}{\displaystyle \sum_{i=1}^{n}}k'(||\frac{y^{t}-x_{i}}{h}||^{2})(y^{(t+1)^{2}}-y^{t^{2}}-2(y^{t+1}-y^{t})^{T}x_{i})

=\frac{1}{h^{2}}{\displaystyle \sum_{i=1}^{n}}k'(||\frac{y^{t}-x_{i}}{h}||^{2})(y^{(t+1)^{2}}-y^{t^{2}}-2(y^{t+1}-y^{t})^{T}y^{t+1})

=\frac{1}{h^{2}}{\displaystyle \sum_{i=1}^{n}}k'(||\frac{y^{t}-x_{i}}{h}||^{2})(y^{(t+1)^{2}}-y^{t^{2}}-2(y^{(t+1)^{2}}-y^{t}y^{t+1}))

=\frac{1}{h^{2}}{\displaystyle \sum_{i=1}^{n}}k'(||\frac{y^{t}-x_{i}}{h}||^{2})(y^{(t+1)^{2}}-y^{t^{2}}-2y^{(t+1)^{2}}+2y^{t}y^{t+1})

=\frac{1}{h^{2}}{\displaystyle \sum_{i=1}^{n}}k'(||\frac{y^{t}-x_{i}}{h}||^{2})(-y^{(t+1)^{2}}-y^{t^{2}}+2y^{t}y^{t+1})

=\frac{1}{h^{2}}{\displaystyle \sum_{i=1}^{n}}k'(||\frac{y^{t}-x_{i}}{h}||^{2})(-1)(y^{(t+1)^{2}}+y^{t^{2}}-2y^{t}y^{t+1})

=\frac{1}{h^{2}}{\displaystyle \sum_{i=1}^{n}-}k'(||\frac{y^{t}-x_{i}}{h}||^{2})(||y^{t+1}-y^{t}||^{2})

\geq0

Thus we have proven that the sequence \{f(j)\}_{j=1,2,...} is convergent. The second part of the proof in [2], which tries to prove that the sequence \{y_{j}\}_{j=1,2,...} is convergent, is wrong.

Improvements to Classic Mean Shift Algorithm

The classic mean shift algorithm is time intensive. Its time complexity is O(Tn^{2}), where T is the number of iterations and n is the number of data points in the data set. Many improvements have been made to the mean shift algorithm to make it converge faster.

One of them is adaptive Mean Shift, where you let the bandwidth parameter vary for each data point. Here, the h parameter is calculated using the kNN algorithm. If x_{i,k} is the k-nearest neighbor of x_{i}, then the bandwidth is calculated as

h_{i}=||x_{i}-x_{i,k}||

Here we use the L_{1} or L_{2} norm to find the bandwidth.
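A sketch of this bandwidth rule in one dimension (the points and k = 2 are made up; in 1-D the L1 and L2 norms coincide):

#include <algorithm>
#include <cmath>
#include <cstdio>
#include <vector>

// adaptive bandwidth: h_i = distance from x_i to its k-th nearest neighbour
double knnBandwidth(const std::vector<double>& xs, std::size_t i, std::size_t k)
{
    std::vector<double> d;
    for (std::size_t j = 0; j < xs.size(); ++j)
        if (j != i) d.push_back(std::fabs(xs[j] - xs[i]));
    std::nth_element(d.begin(), d.begin() + (k - 1), d.end());
    return d[k - 1];
}

int main()
{
    std::vector<double> xs = {0.9, 1.0, 1.1, 4.0, 4.2};
    for (std::size_t i = 0; i < xs.size(); ++i)
        std::printf("h_%zu = %.2f\n", i, knnBandwidth(xs, i, 2));
    return 0;
}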

 

An alternate way to speed up convergence is to alter the data points during the course of Mean Shift. Again using a Gaussian kernel as an example,

1. y_{i}^{0}=x_{i}
2. {\displaystyle y_{i}^{t+1}=\frac{\sum_{j=1}^{n}x_{j}e^{\frac{-|y_{i}^{t}-x_{j}|^{2}}{h^{2}}}}{\sum_{j=1}^{n}e^{\frac{-|y_{i}^{t}-x_{j}|^{2}}{h^{2}}}}}
3. x_{i}=y_{i}^{t+1}

Other Issues

1. Even though mean shift is a non-parametric algorithm, it does require the bandwidth parameter h to be tuned. We can use kNN to find the bandwidth. The choice of bandwidth influences the convergence rate and the number of clusters.
2. The choice of the bandwidth parameter h is critical. A large h might result in incorrect clustering and might merge distinct clusters. A very small h might result in too many clusters.
3. When using kNN to determine h, the choice of k influences the value of h. For good results, k has to increase when the dimension of the data increases.
4. Mean shift might not work well in higher dimensions. In higher dimensions, the number of local maxima is pretty high and it might converge to a local optimum too soon.
5. The Epanechnikov kernel has a clear cutoff and is optimal in the bias-variance tradeoff.

Applications of Mean Shift

Mean shift is a versatile algorithm that has found a lot of practical applications – especially in the computer vision field. In computer vision, the dimensions are usually low (e.g. the color profile of the image). Hence mean shift is used to perform a lot of common tasks in vision.

Clustering

The most important application is using Mean Shift for clustering. The fact that Mean Shift does not make assumptions about the number of clusters or the shape of the clusters makes it ideal for handling clusters of arbitrary shape and number.

Although Mean Shift is primarily a mode-finding algorithm, we can find clusters using it. The stationary points obtained via gradient ascent represent the modes of the density function. All points associated with the same stationary point belong to the same cluster.

An alternate way is to use the concept of a Basin of Attraction. Informally, the set of points that converge to the same mode forms the basin of attraction for that mode. All the points in the same basin of attraction are associated with the same cluster. The number of clusters is given by the number of modes. (A compact sketch follows below.)
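A compact C++ sketch of clustering by basin of attraction in one dimension (the Gaussian kernel, the bandwidth h = 0.5, and the mode-merging tolerance are choices made for this example):

#include <cmath>
#include <cstdio>
#include <vector>

// one Gaussian mean-shift update, as in the iteration above
double step(const std::vector<double>& xs, double y, double h)
{
    double num = 0, den = 0;
    for (double xj : xs) {
        double w = std::exp(-(y - xj) * (y - xj) / (h * h));
        num += w * xj; den += w;
    }
    return num / den;
}

int main()
{
    std::vector<double> xs = {0.9, 1.0, 1.1, 4.0, 4.2};
    std::vector<double> modes;                 // distinct modes found so far
    std::vector<int> label(xs.size());
    for (std::size_t i = 0; i < xs.size(); ++i) {
        double y = xs[i];                      // run mean shift from every sample
        for (int t = 0; t < 100; ++t) {
            double next = step(xs, y, 0.5);
            if (std::fabs(next - y) < 1e-7) { y = next; break; }
            y = next;
        }
        int id = -1;                           // merge modes closer than a tolerance
        for (std::size_t m = 0; m < modes.size(); ++m)
            if (std::fabs(modes[m] - y) < 0.25) id = (int)m;
        if (id < 0) { modes.push_back(y); id = (int)modes.size() - 1; }
        label[i] = id;
    }
    for (std::size_t i = 0; i < xs.size(); ++i)
        std::printf("x=%.1f -> cluster %d\n", xs[i], label[i]);
    return 0;
}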

Computer Vision Applications

Mean Shift is used in multiple tasks in Computer Vision like segmentation, tracking, and discontinuity-preserving smoothing. For more details see [2], [8].

Comparison with K-Means

Note: I have discussed K-Means at K-Means Clustering Algorithm. You can use it to brush up if you want.

K-Means is one of the most popular clustering algorithms. It is simple, fast and efficient. We can compare Mean Shift with K-Means on a number of parameters.

One of the most important differences is that K-means makes two broad assumptions – the number of clusters is already known and the clusters are shaped spherically (or elliptically). Mean shift, being a non-parametric algorithm, does not assume anything about the number of clusters. The number of modes gives the number of clusters. Also, since it is based on density estimation, it can handle arbitrarily shaped clusters.

K-means is very sensitive to initialization. A wrong initialization can delay convergence or sometimes even result in wrong clusters. Mean shift is fairly robust to initialization. Typically, mean shift is run for each point, or sometimes points are selected uniformly from the feature space [2]. Similarly, K-means is sensitive to outliers, but Mean Shift is not very sensitive.

K-means is fast and has a time complexity of O(knT), where k is the number of clusters, n is the number of points and T is the number of iterations. Classic mean shift is computationally expensive with a time complexity of O(Tn^{2}).

Mean shift is sensitive to the selection of the bandwidth h. A small h can slow down convergence. A large h can speed up convergence but might merge two modes. Still, there are many techniques to determine h reasonably well.

Update [30 Apr 2010]: I did not expect this reasonably technical post to become very popular, yet it did! Some of the people who read it asked for sample source code. I wrote one in Matlab which randomly generates some points according to several Gaussian distributions and clusters them using Mean Shift. It implements both the basic algorithm and the adaptive algorithm. You can download my Mean Shift code here. Comments are as always welcome!

References

1. K. Fukunaga and L. Hostetler, "The Estimation of the Gradient of a Density Function, with Applications in Pattern Recognition", IEEE Transactions on Information Theory, vol 21, pp 32-40, 1975.
2. Dorin Comaniciu and Peter Meer, "Mean Shift: A Robust Approach Toward Feature Space Analysis", IEEE Transactions on Pattern Analysis and Machine Intelligence, vol 24, no 5, May 2002.
3. Yizong Cheng, "Mean Shift, Mode Seeking, and Clustering", IEEE Transactions on Pattern Analysis and Machine Intelligence, vol 17, no 8, Aug 1995.
4. Mean Shift Clustering, by Konstantinos G. Derpanis.
5. Chris Ding, Lectures, CSE 6339, Spring 2010.
6. Dijun Luo's presentation slides.
7. cs.nyu.edu/~fergus/teaching/vision/12_segmentation.ppt
8. Dorin Comaniciu, Visvanathan Ramesh and Peter Meer, "Kernel-Based Object Tracking", IEEE Transactions on Pattern Analysis and Machine Intelligence, vol 25, no 5, May 2003.
9. Dorin Comaniciu, Visvanathan Ramesh and Peter Meer, "The Variable Bandwidth Mean Shift and Data-Driven Scale Selection", ICCV 2001.
