Viola-Jones Face Detection: AdaBoost Feature Selection

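The greatness of the Viola-Jones face detection algorithm lies not only in its real-time performance but, more importantly, in the general approach it offers for object detection problems. The algorithm has two highlights: the integral image technique and the cascade training model. Both drew great attention as soon as they were proposed, and traces of them appear in many excellent papers, for example the detector part of the TLD algorithm and the two-stage SVM model used to train BING objectness. It is hard to argue that these were not influenced by Viola-Jones. This post introduces one of the basic building blocks of the cascade model: AdaBoost (the code below names its class AdaptBoost).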

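AdaBoost is not an original contribution of Viola and Jones. It comes from machine learning and belongs to the boosting family of ensemble learning. The two main families of ensemble methods are bagging and boosting, and both build a strong classifier out of weak classifiers. The representative bagging algorithm is Random Forests and the representative boosting algorithm is AdaBoost. For the theory behind AdaBoost, I recommend the paper "A Brief Introduction to Boosting".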

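In the spirit of sharing, the rest of this post covers the theory of AdaBoost and gives an implementation in standard C++. For anyone who does not want to depend on a particular library, this standard C++ version is a reasonable choice. If you spot anything incorrect, please let me know.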

1. How AdaBoost works

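For an image window of a given size, the Haar feature vector is very high-dimensional. Training a classifier directly on all Haar features computed from the training samples is impractical, so we need to select a subset of the features and train on those. AdaBoost fits this idea well: it builds a strong classifier out of weak classifiers and uses their combined vote as the output of the strong classifier.

An AdaBoost weak classifier can be a stump, that is, a decision tree of depth 1 that makes a binary decision by comparing a single feature against a threshold. Feature selection among the many Haar features then works as follows: pick one feature together with a threshold for that feature, and keep the feature/threshold pair that gives the smallest weighted classification error on the training samples. That pair is one trained weak classifier. See bestStump() in the implementation section for the details.

After each weak classifier has been selected, the distribution over the training samples (the per-sample weights, usually initialized to the same value 1/N for N samples) is updated according to the results of the weak classifier just chosen: samples it misclassified gain weight and samples it classified correctly lose weight.

One difference between AdaBoost and Random Forest is that, when the output of the strong classifier is computed, the weak classifiers of AdaBoost carry different weights, whereas the weak classifiers of a Random Forest are weighted equally.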
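To make the selection and reweighting steps concrete, here is a minimal sketch of one boosting round on toy data. It uses a brute-force search instead of the sorted sweep of the real implementation, and the names ToyStump, stumpPredict, bestToyStump and reweight are illustrative assumptions, not part of the code in this post. The reweighting rule (multiply the weights of misclassified samples by (1 - error) / error, then renormalize) follows the implementation in section 2.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Illustrative toy type (an assumption of this sketch, not the post's StumpRule).
struct ToyStump {
    int feature;       // index of the selected feature
    double threshold;  // decision threshold on that feature
    int toggle;        // +1: positive when value > threshold; -1: the opposite
    double error;      // weighted training error
};

// Vote of a stump on one sample: +1 or -1.
int stumpPredict(const ToyStump& s, const std::vector<double>& x) {
    return (x[s.feature] > s.threshold ? 1 : -1) * s.toggle;
}

// Brute force: try every feature, every midpoint between consecutive distinct
// values (plus one threshold below the minimum) and both toggles.
ToyStump bestToyStump(const std::vector<std::vector<double>>& X,
                      const std::vector<int>& y,
                      const std::vector<double>& w) {
    assert(!X.empty() && X.size() == y.size() && X.size() == w.size());
    ToyStump best{0, 0.0, 1, 2.0};  // error 2.0 is worse than any real stump
    for (std::size_t f = 0; f < X[0].size(); ++f) {
        std::vector<double> v;
        for (const auto& row : X) v.push_back(row[f]);
        std::sort(v.begin(), v.end());
        std::vector<double> candidates{v.front() - 1.0};
        for (std::size_t i = 0; i + 1 < v.size(); ++i)
            if (v[i] != v[i + 1]) candidates.push_back((v[i] + v[i + 1]) / 2.0);
        for (double t : candidates) {
            for (int toggle : {1, -1}) {
                ToyStump s{static_cast<int>(f), t, toggle, 0.0};
                for (std::size_t i = 0; i < X.size(); ++i)
                    if (stumpPredict(s, X[i]) != y[i]) s.error += w[i];
                if (s.error < best.error) best = s;
            }
        }
    }
    return best;
}

// Reweight the samples after choosing stump s and return its voting weight.
// Assumes 0 < s.error < 0.5.
double reweight(const ToyStump& s,
                const std::vector<std::vector<double>>& X,
                const std::vector<int>& y,
                std::vector<double>& w) {
    assert(s.error > 0.0 && s.error < 0.5);
    const double ratio = (1.0 - s.error) / s.error;
    double sum = 0.0;
    for (std::size_t i = 0; i < X.size(); ++i) {
        if (stumpPredict(s, X[i]) != y[i]) w[i] *= ratio;  // harder sample, more weight
        sum += w[i];
    }
    for (double& wi : w) wi /= sum;  // renormalize to a distribution
    return std::log(ratio);
}
```

On the toy set X = {{1,1},{2,3},{3,2},{4,4},{5,5}} with labels {-1,-1,+1,+1,-1} and uniform weights 0.2, the search picks feature 0 with threshold 2.5 and toggle +1 (weighted error 0.2). After reweighting, the single misclassified sample holds weight 0.5 and the other four hold 0.125 each, so the next round has to pay attention to it.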

[Figure: pseudocode of the AdaBoost algorithm]
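As a stand-in for the pseudocode, the following is a reconstruction of discrete AdaBoost written to match the C++ code in section 2; it is not a copy of the original pseudocode. The symbols (w for sample weights, epsilon for the weighted error, alpha for the vote weight, h for a stump) are notation chosen here. The code uses alpha = ln((1 - epsilon) / epsilon) without the factor 1/2 found in many textbook presentations. Dropping that factor rescales all votes equally and leaves the sign of the final decision unchanged.

```latex
\begin{align*}
&\textbf{Input: } (x_1, y_1), \dots, (x_N, y_N),\quad y_i \in \{-1, +1\};\quad
  \text{initial weights } w_i^{(1)} > 0,\ \textstyle\sum_i w_i^{(1)} = 1 \\
&\textbf{For } t = 1, \dots, T: \\
&\quad h_t = \arg\min_{h} \varepsilon(h), \qquad
  \varepsilon(h) = \sum_{i=1}^{N} w_i^{(t)} \, \mathbf{1}[h(x_i) \neq y_i], \qquad
  \varepsilon_t = \varepsilon(h_t) \\
&\quad \alpha_t = \ln \frac{1 - \varepsilon_t}{\varepsilon_t} \\
&\quad w_i^{(t+1)} = \frac{1}{Z_t}\, w_i^{(t)}
  \left( \frac{1 - \varepsilon_t}{\varepsilon_t} \right)^{\mathbf{1}[h_t(x_i) \neq y_i]},
  \qquad Z_t = \sum_{j=1}^{N} w_j^{(t)}
  \left( \frac{1 - \varepsilon_t}{\varepsilon_t} \right)^{\mathbf{1}[h_t(x_j) \neq y_j]} \\
&\textbf{Output: } H(x) = \operatorname{sign}\!\left( \sum_{t=1}^{T} \alpha_t \, h_t(x) \right)
\end{align*}
```

Here each h is a decision stump on a single feature, and the minimum is taken over all features, thresholds and both toggles. When a nonzero tweak is passed to predictLabelOfTrainingExamples, the code adds it to every stump's vote before the weighted sum is taken.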

2. Standard C++ implementation

The interface below covers training only and has no test part. You can add a test interface on top of it.
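One possible shape for such a test interface is sketched here: a free function that classifies a single new sample with a trained committee. The function name predictSample and its signature are assumptions made for this sketch, not part of the original code. The voting rule mirrors predictLabelOfTrainingExamples in the implementation.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Copy of the StumpRule struct from the header so that this sketch
// compiles on its own; drop it when including the real header.
struct StumpRule {
    int featureIndex;
    long double weightedError;
    double threshold;
    float margin;
    int toggle;
};

// Hypothetical test-time helper (not part of the original interface).
// features[j] is the value of feature j for the sample to classify.
// Returns +1 (positive) or -1 (negative).
int predictSample(const std::vector<StumpRule>& committee,
                  const std::vector<float>& features,
                  float tweak) {
    assert(!committee.empty());
    double score = 0.0;
    for (std::size_t m = 0; m < committee.size(); ++m) {
        const StumpRule& rule = committee[m];
        assert(rule.featureIndex >= 0 &&
               static_cast<std::size_t>(rule.featureIndex) < features.size());
        // Same vote as in training: stump decision times toggle, plus the tweak.
        const double verdict =
            (features[rule.featureIndex] > rule.threshold ? 1 : -1) * rule.toggle + tweak;
        // A single member needs no weight (its weight is infinite if its error is 0).
        const double weight = committee.size() == 1
            ? 1.0
            : static_cast<double>(std::log(1.0L / rule.weightedError - 1.0L));
        score += weight * verdict;
    }
    return score > 0 ? 1 : -1;
}
```

A committee obtained from getCommittee() can be passed in directly, provided the feature vector uses the same feature order as the training data.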

//identifiers starting with an underscore followed by a capital letter are reserved
#ifndef ADAPTBOOST_H_
#define ADAPTBOOST_H_
#include <vector>
#include <utility>
#include <cmath>
using namespace std;


/**
 * @brief decision stump declaration
 *
 * @param featureIndex
 * @param weightedError achieved weighted error
 * @param threshold
 * @param margin achieved margin
 * @param toggle +1 or -1
 */
struct StumpRule{
	int featureIndex;
	long double weightedError;
	double threshold;
	float margin;
	int toggle;
};


/**
 * @brief what's inside AdaptBoost
 *
 * @param nPositives number of positive examples
 * @param nNegatives number of negative examples
 * @param initialPositiveWeight how much weight we give to positives at the outset
 * @param ascendingFeatures for each feature, we have (float feature value, int exampleIndex)
 *
 * @param sampleCount nPositives + nNegatives
 * @param exponentialRisk exponential risk for training set
 * @param positiveTotalWeight total weight received by positive examples currently
 * @param negativeTotalWeight total weight received by negative examples currently
 * @param minWeight minimum weight among all weights currently
 * @param maxWeight maximum weight among all weights currently
 * @param weights weight vector for all examples involved
 * @param labels are they positive or negative examples
 * @param featureCount how many features are there
 * @param committee what's the learned committee
 */
class AdaptBoost{
private:
	
	int nPositives;
	int nNegatives;
	long double initialPositiveWeight;
	vector< vector<pair<float, int>> > ascendingFeatures;

	int sampleCount;
	int featureCount;
	long double positiveTotalWeight;
	long double negativeTotalWeight;
	long double minWeight;
	long double maxWeight;
	long double exponentialRisk;
	vector<double> weights;
	vector<int> labels;
	vector<StumpRule> committee;

	/**
	 * @brief prevent copy and assignment 
	 */
	AdaptBoost(const AdaptBoost&);
	AdaptBoost& operator=(const AdaptBoost&);

protected:
	/**
	 * @brief return for an element pointed by iterator and featureIndex its exampleIndex
	 */
	int getTrainingExampleIndex(int featureIndex, int iterator);

	/**
	 * @brief return for an element pointed by iterator and featureIndex its example value
	 */
	float getTrainingExampleFeature(int featureIndex, int iterator);

	/**
	 * @brief sort the values of each feature across all samples in ascending order
	 */
	void sortFeatures(
		vector< vector<pair<float, int>> >& features
	);

	/**
	 * @brief best stump given a feature
	 */
	void decisionStump(
		int featureIndex
	, 	StumpRule & best
	);

	/**
	 * @brief best stump among all features
	 */
	StumpRule bestStump();

public:
	/**
	 * @brief constructor
	 * @param nPositives number of positives for training examples
	 * @param nNegatives number of negatives for training examples
	 * @param initialPositiveWeight initial weight of positives
	 * @param data for training examples, positives first, then negatives (data[i][j] is feature j of example i)
	 */
	AdaptBoost(
		int nPositives
	,	int nNegatives
	,	long double initialPositiveWeight
	,	const vector< vector<float> >& data
	);

	/**
	 * @brief destructor
	 */
	~AdaptBoost();

	/**
	 * @brief perform one round of adaboost
	 */
	void oneRoundOfAdaboostTraining();

	/**
	 * @brief get committee adaptboost trained
	 */
	vector<StumpRule> getCommittee() {
		return committee;
	}

	/**
	 * @brief get committee size
	 */
	int getCommitteeSize() {
		return static_cast<int>(committee.size());
	}
	
	/**
	 * @brief given the number of weak classifiers train for a committee
	 * @param numOfWeakClassifier for number of weak classifiers of adapt boost
	 */
	void adaptBoostTraining(int numOfWeakClassifier);
	
	/**
	 * @brief evaluate how the committee fares on the training dataset
	 *
	 * @param tweak passed on to predictLabelOfTrainingExamples
	 * @param falsePositive output: false positive rate
	 * @param detectionRate output: detection rate
	 * @return a blackList with one entry per example: 0 means the example
	 *  can be used again, 1 means it cannot
	 */
	vector<int> calcEmpiricalErrorInAdaBoostTraining(
		float tweak
	,	float & falsePositive
	,	float & detectionRate
	);
	
	/**
	 * @brief given a tweak and a committee, predict the labels of the training examples
	 *
	 * @param tweakThreshold tweak added to each member's verdict
	 * @param prediction output: predicted labels (+1 or -1), indexed by example
	 * @param onlyMostRecent use all the committee or its most recent member (a weak learner)
	 */
	void predictLabelOfTrainingExamples(
		float tweakThreshold
	, 	vector<int> & prediction
	, 	bool onlyMostRecent=false
	);

};


#endif

#include <cassert>
#include <cstdlib>   //exit, EXIT_FAILURE
#include <cmath>     //log, sqrt, isnan
#include <algorithm>
#include <iostream>
#include <iomanip>
#include "VJAdaptBoost.h"

using namespace std;
#define VERBOSE true

//fail and messaging
static void fail(const char* message){
	cerr << "Error:" <<  message << endl;
	exit(EXIT_FAILURE);
}
//order definition for this type of pairs
//compare only the feature values
static bool myPairOrder(
	const pair<float, int>& one
,	const pair<float, int>& other
){
	return one.first < other.first;
}
//why is one stump better than the other
static bool myStumpOrder(
	const StumpRule & one
,	const StumpRule & other
){
	if(one.weightedError < other.weightedError)
		return true;
	if(one.weightedError == other.weightedError && one.margin > other.margin)
		return true;
	return false;
}

int AdaptBoost::getTrainingExampleIndex(int featureIndex, int iterator){
	assert(ascendingFeatures.size() > 0 && ascendingFeatures[0].size() >0);
			
	return ascendingFeatures[featureIndex][iterator].second;
}

float AdaptBoost::getTrainingExampleFeature(int featureIndex, int iterator){
	assert(ascendingFeatures.size() > 0 && ascendingFeatures[0].size() >0);

	//std::isnan from <cmath> is portable; _isnan is specific to MSVC
	if(std::isnan(ascendingFeatures[featureIndex][iterator].first)){
		cerr<<"ERROR: nan feature "<<featureIndex<<" detected for example "<<getTrainingExampleIndex(featureIndex, iterator)<<endl;
		exit(EXIT_FAILURE);
	}
	return ascendingFeatures[featureIndex][iterator].first;
}

//constructor
AdaptBoost::AdaptBoost(
	int positives
,	int negatives
,	long double positiveWeight
,	const vector< vector<float> >& data) {
	assert(positives > 0 && negatives > 0);
	assert(positiveWeight > 0 && positiveWeight < 1);
	assert(data.size() > 0 && data[0].size() > 0 );
	assert(data.size() == (positives + negatives));

	//add number of data info to features
	vector< vector<pair<float, int>> > features(data.size(), vector<pair<float, int>>(data[0].size(), pair<float, int>(0,0)));
	for(int i=0; i<features.size(); i++) {
		for(int j=0; j<features[0].size(); j++) {
			features[i][j] = pair<float, int>(data[i][j], i);
		}
	}

	//initialize the class attributes for the training set
	nPositives = positives;
	nNegatives = negatives;
	initialPositiveWeight = positiveWeight;
	sortFeatures(features);//initialize ascendingFeatures

	sampleCount = positives + negatives;
	featureCount = ascendingFeatures.size();
	positiveTotalWeight = positiveWeight;
	negativeTotalWeight = 1 - positiveWeight;
	long double posAverageWeight = positiveTotalWeight/(long double)nPositives;
	long double negAverageWeight = negativeTotalWeight/(long double)nNegatives;
	maxWeight = max(posAverageWeight, negAverageWeight);
	minWeight = min(posAverageWeight, negAverageWeight);
	exponentialRisk = 1;

	//set weights for each example
	for(int exampleIndex = 0; exampleIndex < sampleCount; exampleIndex++){
		weights.push_back(exampleIndex < nPositives ? posAverageWeight : negAverageWeight);
		labels.push_back(exampleIndex < nPositives ? 1 : -1);
	}

}

//destructor
AdaptBoost::~AdaptBoost() {

}

//adaptBoost interface for training
void AdaptBoost::adaptBoostTraining(int numOfWeakClassifier) {
	assert(numOfWeakClassifier > 0);
	for(int i=0; i<numOfWeakClassifier; i++) {
		oneRoundOfAdaboostTraining();
	}
}

//validation procedure using training examples
vector<int> AdaptBoost::calcEmpiricalErrorInAdaBoostTraining(
	float tweak
,	float & falsePositive
,	float & detectionRate
){
	vector<int> blackList;
	blackList.resize(nPositives, 0);
	blackList.resize(nPositives+nNegatives, 1);

	int nFalsePositive = 0;
	int nFalseNegative = 0;
	
	//placeholder values, overwritten by predictLabelOfTrainingExamples
	vector<int> prediction;
	prediction.resize(sampleCount,0);
	predictLabelOfTrainingExamples(tweak, prediction, false);

	//evaluate prediction errors
	vector<int> agree(sampleCount);
	for(int i=0; i<sampleCount; i++) {
		agree[i] = labels[i]*prediction[i];
	}
	for(int exampleIndex = 0; exampleIndex < sampleCount; exampleIndex++){
		if(agree[exampleIndex] < 0)     {
			if(exampleIndex < nPositives){
				nFalseNegative += 1;
				blackList[exampleIndex] = 1;
			}else{
				nFalsePositive += 1;
				blackList[exampleIndex] = 0;
			}
		}
	}

	//set the returned values
	falsePositive = nFalsePositive/(float)nNegatives;
	detectionRate = 1 - nFalseNegative/(float)nPositives;

	return blackList;
}

//given a tweak and a committee, what prediction does it make as to the training examples
void AdaptBoost::predictLabelOfTrainingExamples(
	float tweakThreshold
,	vector<int> & prediction
,	bool onlyMostRecent
){
	int committeeSize = committee.size();
	//no need to weigh a single member's decision
	onlyMostRecent = committeeSize == 1 ? true : onlyMostRecent;
	int start = onlyMostRecent ? committeeSize - 1 : 0;
	//double to be more precise
	vector<vector<double>> memberVerdict;
	for(int i=0; i<committeeSize; i++) {//initialize memberVerdict
		vector<double> row(sampleCount);
		memberVerdict.push_back(row);
	}
	vector<double> memberWeight(committeeSize);
	//members, go ahead
	for(int member = start; member < committeeSize; member++){
		//sanity check
		if(committee[member].weightedError == 0 && member != 0)
			fail("Boosting Error Occurred!");
		//the usual factor 0.5 is omitted here: it does not change the sign of the weighted vote
		//if member's weightedError is zero, member weight is infinite, but it won't be used anyway
		memberWeight[member] = log(1./committee[member].weightedError -1);
		int feature = committee[member].featureIndex;
		#pragma omp parallel for schedule(static)
		for(int iterator = 0; iterator < sampleCount; iterator++){
			int exampleIndex = getTrainingExampleIndex(feature, iterator);
			memberVerdict[member][exampleIndex] = (getTrainingExampleFeature(feature, iterator) >
				committee[member].threshold ? 1 : -1)*committee[member].toggle + tweakThreshold;
		}
	}
	//joint session
	if(!onlyMostRecent){
		vector<double> finalVerdict(sampleCount);
		for(int i=0; i<sampleCount; i++) {
			double predict = 0;
			for(int j=0; j<committeeSize; j++) {
				predict += (memberWeight[j] * memberVerdict[j][i]);
			}
			finalVerdict[i] = predict;
		}
		for(int exampleIndex = 0; exampleIndex < sampleCount; exampleIndex++)
			prediction[exampleIndex] = finalVerdict[exampleIndex] > 0 ? 1 : -1;
	}else{
		for(int exampleIndex = 0; exampleIndex < sampleCount; exampleIndex++)
			prediction[exampleIndex] = memberVerdict[start][exampleIndex] > 0 ? 1 : -1;
	}
}

void AdaptBoost::oneRoundOfAdaboostTraining(){
	//try to be friendly here
	static int trainPhase = 0;
	if(VERBOSE && trainPhase == 0){
		cout << "\n#############################ADABOOST MESSAGE EXPLAINED####################################################\n\n";
		cout << "INFO: Adaboost starts. Exponential Risk is expected to go down steadily and strictly," << endl;
		cout << "INFO: and Exponential Risk should bound the (weighted) Empirical Error from above." << endl;
		cout << "INFO: Train Phase is the current boosting iteration." << endl;
		cout << "INFO: Best Feature is the most discriminative feature selected by decision stump at this iteration." << endl;
		cout << "INFO: Threshold and Toggle are two parameters that define a real valued decision stump.\n" << endl;
	}
	trainPhase++;

	//get and store the rule
	StumpRule rule = bestStump();
	committee.push_back(rule);

	//how it fares
	vector<int> prediction(sampleCount);
	predictLabelOfTrainingExamples(
		0
	,	prediction
	,	/*onlyMostRecent*/ true);
	vector<bool> agree(sampleCount);
	for(int i=0; i<sampleCount; i++) {
		if(prediction[i] == labels[i]) {
			agree[i] = true;
		}else {
			agree[i] = false;
		}
	}

	//update weights
	vector<double> weightUpdate;
	weightUpdate.resize(sampleCount,1);
	bool errorFlag = false;
	for(int exampleIndex = 0; exampleIndex < sampleCount; exampleIndex++){
		//more weight for a difficult example
		if(!agree[exampleIndex]){
			weightUpdate[exampleIndex] = 1/rule.weightedError - 1;
			errorFlag = true;
		}
	}

	//update weights only if there is an error
	if(errorFlag){
		double weightSum = 0;
		for(int i=0; i<sampleCount; i++) {
			weights[i] *= weightUpdate[i];
			weightSum += weights[i];
		}
		for(int i=0; i<sampleCount; i++) {
			weights[i] /= weightSum;
		}

		double posTotalWeight = 0;
		for(int i=0; i<nPositives; i++) {
			posTotalWeight += weights[i];
		}
		positiveTotalWeight = posTotalWeight;
		negativeTotalWeight = 1-positiveTotalWeight;

		double min,max;
		min = max = weights[0];
		for(int i=0; i<sampleCount; i++) {
			if(weights[i] < min) {
				min = weights[i];
			}else if(weights[i] > max) {
				max = weights[i];
			}
		}
		minWeight = min;
		maxWeight = max;
	}

	//exponentialRisk can be zero at the first boosting
	exponentialRisk *= 2*sqrt((1-rule.weightedError)*rule.weightedError);

	//print some statistics
	if(VERBOSE){
		float tweak = 0;
		float falsePositive = 0;
		float detectionRate = 0;
		calcEmpiricalErrorInAdaBoostTraining(tweak, falsePositive, detectionRate);
		float empError = static_cast<float>(falsePositive*(1-initialPositiveWeight)+initialPositiveWeight*(1-detectionRate));
		cout << "Training Performance Explanation (before threshold tweaking): falsePositive " << falsePositive 
			 << " detectionRate " << detectionRate << endl;
		cout <<"###########################################################################################################\n";
		cout << "\nTrain Phase " << trainPhase << endl << endl;
//		whatFeature(rule.featureIndex);
		cout << "\tExponential Risk " << setw(12) << exponentialRisk << setw(19) << "Weighted Error " 
			 << setw(11) << rule.weightedError << setw(14) << "Threshold " << setw(10) << rule.threshold 
			 << setw(13) <<"Toggle " << setw(12) << rule.toggle <<  endl;
		cout << "\tPositive Weight" << setw(14) << positiveTotalWeight << setw(14) << "MinWeight " 
			 << setw(16) << minWeight << setw(14) << "MaxWeight " << setw(10) << maxWeight << setw(22) 
			 << "Empirical Error " << setw(10) << empError << endl << endl;
	}
}

//get a feature from features and put them in ascending order
//and record at the same time the permuted example order
void AdaptBoost::sortFeatures(vector< vector<pair<float, int>> >& features) {
	assert(features.size()!=0 && features[0].size() !=0 );

	for(unsigned int i=0; i<features[0].size(); i++) {
		vector<pair<float, int>> temp = vector<pair<float, int>>();
		for(unsigned int j=0; j<features.size(); j++) {
			temp.push_back(features[j][i]);
		}
		//sort
		sort(temp.begin(), temp.end(), myPairOrder);
		ascendingFeatures.push_back(temp);
	}

}


//base learner is a stump, a decision tree of depth 1
//decisionStump has to look at feature and return rule
void AdaptBoost::decisionStump(
	int featureIndex
,	StumpRule & best
){
	//a stump is determined by threshold and toggle, the other two attributes measures its performance
	//initialize with some crazy values
	best.featureIndex = featureIndex;
	best.weightedError = 2;
	best.threshold = getTrainingExampleFeature(featureIndex, 0) - 1;
	best.margin = -1;
	best.toggle = 0;

	StumpRule current = best;

	//error_p and error_n allow to set the best toggle
	long double error_p, error_n;

	//initialize: r denotes right hand side and l left hand side
	//convention: in TrainExamples nPositives positive samples are followed by negatives samples
	long double rPositiveWeight = positiveTotalWeight;
	long double rNegativeWeight = negativeTotalWeight;
	//yes, nothing to the left of the sample with the smallest feature
	long double lPositiveWeight = 0;
	long double lNegativeWeight = 0;

	//go through all these observations one after another
	int iterator = -1;

	//to build a decision stump, you need a toggle and an admissible threshold
	//which doesn't coincide with any of the observations
	while(true){

		//We've got a threshold. So determine the best toggle based on two types of error
		//toggle = 1, positive prediction if and only if the observed feature > the threshold
		//toggle = -1, positive prediction if and only if the observed feature < the threshold

		//error_p denotes the error introduced by toggle = 1, error_n the error by toggle = -1
		error_p = rNegativeWeight + lPositiveWeight;
		error_n = rPositiveWeight + lNegativeWeight;
		current.toggle = error_p < error_n ? 1 : -1;

		//rounding can push the running sums slightly off, even below zero; the clamp below guards against that
		long double smallerError = min(error_p, error_n);
		//this prevents some spurious nonzero: for currentError must be at least equal to minWeight
		current.weightedError = smallerError < minWeight * 0.9 ? 0 : smallerError;

		//update if necessary
		if(myStumpOrder(current, best))
			best = current;

		//move on
		iterator++;

		//we don't actually need to look at the sample with the largest feature
		//because its rule is exactly equivalent to those produced
		//by the sample with the smallest feature on training observations
		//but it won't do any harm anyway
		if(iterator == sampleCount)
			break;

		//handle duplicates, update lr weights and find a new threshold
		while(true){

			//take this guy's attributes
			int exampleIndex = getTrainingExampleIndex(featureIndex, iterator);
			int label = labels[exampleIndex];
			long double weight = weights[exampleIndex];

			//update weights
			if(label < 0){
				lNegativeWeight += weight;
				rNegativeWeight -= weight;
			}else{
				lPositiveWeight += weight;
				rPositiveWeight -= weight;
			}

			//if a new threshold can be found, break
			//two cases are possible: either it is the last observation
			if(iterator == sampleCount - 1)
				break;
			//or no duplicate. If there is a duplicate, repeat
			if(getTrainingExampleFeature(featureIndex, iterator) != getTrainingExampleFeature(featureIndex, iterator + 1)){
				double test = ((double)getTrainingExampleFeature(featureIndex, iterator) 
					+ (double)getTrainingExampleFeature(featureIndex, iterator + 1))/2;
				//well that's a bit frustrating: I want to keep float because of memory constraint, but apparently
				//features are so close, sometimes, numerical precision arises as an unexpected problem, so I decide
				//to use a double threshold so as to separate float features
				if(getTrainingExampleFeature(featureIndex, iterator) < test && test < getTrainingExampleFeature(featureIndex, iterator + 1))
					break;
				else{
					#pragma omp critical
					{
						cout << "ERROR: numerical precision breached: problem feature values " 
							 << getTrainingExampleFeature(featureIndex, iterator) 
							 << " : " << getTrainingExampleFeature(featureIndex, iterator+1) 
							 << ". Problem feature " << featureIndex << " and problem example " 
							 << getTrainingExampleIndex(featureIndex, iterator) << " : " 
							 << getTrainingExampleIndex(featureIndex, iterator+1) << endl;
					}
					fail("fail to find a suitable threshold.");
				}
			}
			iterator++;
		}

		//update threshold
		if(iterator < sampleCount - 1){
			current.threshold = ((double)getTrainingExampleFeature(featureIndex, iterator) 
				+ (double)getTrainingExampleFeature(featureIndex, iterator + 1))/2;
			current.margin = getTrainingExampleFeature(featureIndex, iterator + 1) - getTrainingExampleFeature(featureIndex, iterator);
		}else{
			//slightly to the right of the biggest observation
			current.threshold = getTrainingExampleFeature(featureIndex, iterator) + 1;
			current.margin = 0;
		}
	}

}

//implement the feature selection's outer loop
//return the most discriminative feature and its rule
StumpRule AdaptBoost::bestStump(
){
	vector<StumpRule> candidates;
	candidates.resize(featureCount);
	#pragma omp parallel for schedule(static)
	for(int featureIndex = 0; featureIndex < featureCount; featureIndex++)
		decisionStump(featureIndex, candidates[featureIndex]);

	//loop over all the features
	//the best rule has the smallest weighted error and the largest margin
	StumpRule best = candidates[0];
	for(int featureIndex = 1; featureIndex < featureCount; featureIndex++){
		if(myStumpOrder(candidates[featureIndex], best))
			best = candidates[featureIndex];
	}

	//a weak learner must beat random guessing; abort otherwise
	if( best.weightedError >= 0.5 )
		fail("Decision Stump failed: base error >= 0.5");

	//return
	return best;
}

Reference:

Yi-Qing Wang, An Analysis of the Viola-Jones Face Detection Algorithm, IPOL.
