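The idea behind quicksort

Quicksort is a sorting algorithm based on the divide-and-conquer strategy. Given an input array a[low:high], it sorts in the following three steps.

(1) Divide: using a[p] as the pivot, partition a[low:high] into three parts a[low:p-1], a[p] and a[p+1:high], such that every element of a[low:p-1] is less than or equal to a[p] and every element of a[p+1:high] is greater than or equal to a[p].

(2) Conquer: sort a[low:p-1] and a[p+1:high] by calling quicksort recursively on each of them.

(3) Combine: because a[low:p-1] and a[p+1:high] are sorted in place, no further computation is needed once both are sorted; a[low:high] is then already sorted.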

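Choosing the pivot

The running time of quicksort depends on how balanced the partitions are. In the worst case every partition produces two regions of n-1 elements and 1 element, and the time complexity reaches O(n^2). In the best case every pivot is exactly the median, so every partition produces two regions of size n/2, and the time complexity is O(nlogn). The choice of pivot is therefore critical to quicksort. There are three common ways to choose it.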
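The two bounds above follow from the usual recurrences. As a rough sketch, assume one partition of n elements costs cn for some constant c, and absorb the cost of the one-element region into that term:

```latex
% Worst case: one region of n-1 elements at every level
T_{\text{worst}}(n) = T_{\text{worst}}(n-1) + cn
                    = T(1) + c\sum_{k=2}^{n} k
                    = \Theta(n^2)

% Best case: two regions of n/2 elements at every level (n a power of two)
T_{\text{best}}(n) = 2\,T_{\text{best}}(n/2) + cn
                   = cn\log_2 n + n\,T(1)
                   = \Theta(n \log n)
```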

(1) Fixed pivot

The following Partition() function partitions the array using the element x = a[low] as the pivot.

template <class T>
int Partition(T a[], int low, int high)
{
    int i = low, j = high+1;
    T x = a[low];
    while(true)
    {
        while(a[++i] < x && i < high);
        while(a[--j] > x);
        if(i >= j)break;
        Swap(a[i], a[j]);
    }
    a[low] = a[j];
    a[j] = x;
    return j;
}

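[Figure: one pass of quicksort]

[Animation: quicksort in action. The animation was found online; it chooses a different pivot, but the sorting process is the same.]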

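If the array elements are random and partitioning produces no extreme cases, the running time does not fluctuate much. If the array is already nearly sorted, partitioning easily hits the worst case: quicksort then degenerates to roughly the behaviour of bubble sort, with time complexity O(n^2).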

For example, sorting the sequence [1][2][3][5][4][6] with a fixed pivot:

Pass 1: [1][2][3][5][4][6]

Pass 2: [1][2][3][5][4][6]

Pass 3: [1][2][3][5][4][6]

Pass 4: [1][2][3][4][5][6]
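The cost of a fixed pivot on sorted input can be made visible by measuring how deep the recursion goes. The sketch below is only an illustration under stated assumptions: it works on std::vector<int>, uses a simple scan-based partition rather than the Partition() shown above, and the names PartitionFirst, SortAndMeasureDepth and DepthOnAscending are made up for this example.

```cpp
#include <cassert>
#include <algorithm>
#include <utility>
#include <vector>

// Partition v[lo..hi] around the fixed pivot v[lo]; returns the pivot's final index.
// (Simple left-to-right scan, chosen for brevity; not the Partition() from the text.)
int PartitionFirst(std::vector<int>& v, int lo, int hi) {
    int pivot = v[lo];
    int last_small = lo;                       // v[lo+1..last_small] < pivot
    for (int k = lo + 1; k <= hi; ++k) {
        if (v[k] < pivot) std::swap(v[++last_small], v[k]);
    }
    std::swap(v[lo], v[last_small]);
    return last_small;
}

// Sorts v[lo..hi] and returns the deepest level of nested calls that did real work.
int SortAndMeasureDepth(std::vector<int>& v, int lo, int hi) {
    if (lo >= hi) return 0;
    int q = PartitionFirst(v, lo, hi);
    int left = SortAndMeasureDepth(v, lo, q - 1);
    int right = SortAndMeasureDepth(v, q + 1, hi);
    return 1 + std::max(left, right);
}

// Hypothetical helper: depth reached when sorting the ascending array 1..n.
int DepthOnAscending(int n) {
    std::vector<int> v(n);
    for (int i = 0; i < n; ++i) v[i] = i + 1;
    return SortAndMeasureDepth(v, 0, n - 1);
}
```

On an ascending array every pivot is the minimum of its range, so each level removes only one element and the depth grows linearly with n. This is also why large ascending inputs can exhaust the call stack.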

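Functions used in the programs: (1) In C++, random numbers can be generated as follows. rand() is a pseudo-random generator, so without the srand(time(0)) seed, rand()%M produces the same sequence on every run.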

#define M 100
srand(time(0));
for(int i = 0; i < M; i++)
    a[i] = rand()%(M);
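As an alternative, the C++11 <random> header avoids one limitation of rand(): RAND_MAX is only guaranteed to be at least 32767, so on some platforms rand()%M cannot cover a range as large as 1,000,000. The following is a minimal sketch, not part of the original programs; MakeRandomArray and its seed parameter are names assumed for this example.

```cpp
#include <cassert>
#include <random>
#include <vector>

// Returns n integers drawn uniformly from [0, upper).
// The seed parameter is an assumption added here so that runs can be repeated.
std::vector<int> MakeRandomArray(int n, int upper, unsigned seed) {
    assert(n >= 0 && upper > 0);
    std::mt19937 engine(seed);
    std::uniform_int_distribution<int> dist(0, upper - 1);   // bounds are inclusive
    std::vector<int> v(n);
    for (int& x : v) x = dist(engine);
    return v;
}
```

A call such as MakeRandomArray(1000000, 1000000, std::random_device{}()) would give a differently seeded array on each run, while a fixed seed reproduces the same data for repeated measurements.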

(2) Method 1 for measuring the running time of a code fragment:

clock_t start_time = clock();
//code fragment to be timed
clock_t end_time = clock();
cout<<"Running time is:"<<static_cast<double>(end_time-start_time)/CLOCKS_PER_SEC*1000<<"ms"<<endl;

Method 2 for measuring the running time of a code fragment:

#include<sys/time.h>
int gettimeofday(struct timeval*tv,struct timezone *tz );
struct timeval
{ 
    long tv_sec;/*seconds*/ 
    long tv_usec;/*microseconds*/
};
struct timezone
{ 
    int tz_minuteswest;
    /*minutes west of Greenwich*/
    int tz_dsttime; 
};

For example:

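#include <iostream>
#include <sys/time.h>
using namespace std;

int main()
{
    struct timeval start;
    struct timeval end;

    gettimeofday(&start,NULL); //gettimeofday(&start,&tz); gives the same result
    //code fragment to be timed
    gettimeofday(&end,NULL);

    //elapsed microseconds; long long avoids the precision loss of float
    long long Time=(end.tv_sec-start.tv_sec)*1000000LL+(end.tv_usec-start.tv_usec);
    cout << Time <<endl;
    return 0;
}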
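Both methods can also be expressed portably with std::chrono from C++11. One difference worth keeping in mind: clock() reports processor time, which on many systems adds up the time of all threads, whereas gettimeofday and std::chrono::steady_clock measure elapsed wall-clock time. The helper below is a sketch; MeasureMs is a hypothetical name, not something from the original code.

```cpp
#include <cassert>
#include <chrono>

// Runs work() once and returns the elapsed wall-clock time in milliseconds.
// MeasureMs is a hypothetical helper name introduced for this example.
template <class F>
double MeasureMs(F work) {
    auto start = std::chrono::steady_clock::now();
    work();
    auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

It could be used as double ms = MeasureMs([&] { QuickSort(a, 0, M-1); }); with the array and sort function of the surrounding program.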

The complete code is as follows:

#include <iostream>
#include <time.h>
#include <stdlib.h>
#include <cstdlib>
#define M 1000000
using namespace std;

	template <class T>
void Print(T a[], int n, int m)
{
	for(int i = n; i < m; i++)
	{
		cout << "[" << a[i] << "]";
	}
	cout <<endl;
}

	template <class T>
void Swap(T &a, T &b)
{
	T asd;
	asd = a;
	a = b;
	b = asd;
}

	template <class T>
int Partition(T a[], int p, int r)
{
	int i = p, j = r+1;
	T x = a[p];
	while(true)
	{
		while(a[++i] < x && i < r);
		while(a[--j] > x);
		if(i >= j)break;
		Swap(a[i], a[j]);
	}
	a[p] = a[j];
	a[j] = x;
	return j;
}

	template <class T>
void QuickSort(T a[], int p, int r)
{
	if(p < r)
	{
		int q = Partition(a, p, r);
		QuickSort(a, p, q-1);
		QuickSort(a, q+1, r);
	}
}

int a[M] = {0};
int main()
{
	srand(time(0));
	for(int i = 0; i < M; i++)
	{
		//a[i] = i+1;            //ascending array
		a[i] = rand()%(M);       //random array
		//if(i < M/2-1)          //array of repeated values
		//	a[i] = 5;
		//else
		//	a[i] = 10;
	}
	//Print() can be used to inspect the data while testing
	//Print(a, 0, M);
	clock_t start_time = clock();
	QuickSort(a, 0, M-1);
	clock_t end_time = clock();
	cout<<"Running time is:"<<static_cast<double>(end_time - start_time)/CLOCKS_PER_SEC*1000<<"ms"<<endl;
	//Print(a, 0, M);
	return 0;
}

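P.S.: (1) When sorting an ascending array in Codeblocks, keep the number of elements small. With too many elements the program may stop with the following prompt:

[Figure: Codeblocks error prompt]

(2) The array of repeated values contains only two distinct values.

(3) The random array with many duplicates is generated with a[i] = rand()%(M/100);.

The measurements are as follows:

[Table: running-time measurements]

A fixed pivot partitions an ascending array extremely badly and sorting takes a very long time, so the ascending array was limited to 100,000 elements.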

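(2) Random pivot

When the array is sorted or nearly sorted, a fixed pivot hurts the efficiency of quicksort. Choosing the pivot at random is one way to deal with nearly sorted input. The algorithm is as follows: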

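//Pick a random element of a[low..high] as the pivot and move it to a[low].
//Call srand(time(0)) once in main(), not here: reseeding on every call
//within the same second would make rand() return the same value each time.
int Random(int a[], int low, int high)
{
    int pivot = rand()%(high - low + 1) + low; //index in [low, high]
    Swap(a[pivot],a[low]); //swap the randomly chosen element into position low
    return a[low];
}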

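With this, the line T x = a[low]; in the original Partition() function becomes T x = Random(a, low, high);

A random pivot handles nearly sorted input, but the randomness also affects other kinds of input (when the array elements are random, a fixed pivot often beats a random pivot). The randomized algorithm (a Sherwood algorithm) effectively reduces the time needed to sort an ascending array, and the more elements there are, the better it works. Consider the ascending array above with 100,000 distinct elements: the chance that the first partition picks the worst possible pivot is on the order of 1 in 100,000. The chance of picking the best pivot is just as small, of course. Because the pivot is chosen at random, the average performance is good and a long run of worst-case partitions is avoided. Many algorithm textbooks cover the randomized algorithm. Since its improvement is very close to that of the median-of-three method described below, I recorded the running times of only one of the two methods.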
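The random-pivot idea can also be sketched with the C++11 <random> facilities. This is an illustration under stated assumptions, not the author's measured code: it works on std::vector<int>, uses a simple scan-based partition, and the names g_engine, RandomizedPartition and RandomizedQuickSort are made up for the example. The engine is created once and shared, so it is never reseeded between calls.

```cpp
#include <cassert>
#include <algorithm>
#include <random>
#include <utility>
#include <vector>

// One engine shared by all calls; seeded once (fixed seed only for repeatability).
static std::mt19937 g_engine(12345);

// Swaps a uniformly chosen element of v[lo..hi] into v[lo], then partitions
// around it. Returns the pivot's final index.
int RandomizedPartition(std::vector<int>& v, int lo, int hi) {
    std::uniform_int_distribution<int> pick(lo, hi);   // both ends inclusive
    std::swap(v[lo], v[pick(g_engine)]);
    int pivot = v[lo];
    int last_small = lo;                               // v[lo+1..last_small] < pivot
    for (int k = lo + 1; k <= hi; ++k)
        if (v[k] < pivot) std::swap(v[++last_small], v[k]);
    std::swap(v[lo], v[last_small]);
    return last_small;
}

void RandomizedQuickSort(std::vector<int>& v, int lo, int hi) {
    if (lo >= hi) return;
    int q = RandomizedPartition(v, lo, hi);
    RandomizedQuickSort(v, lo, q - 1);
    RandomizedQuickSort(v, q + 1, hi);
}
```

With this pivot rule an ascending array is no longer a special case: the expected recursion depth is logarithmic regardless of the input order.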

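(3) Median of three

Because a random pivot is chosen at random, it does not suit every case well (even for the same array, the running time varies considerably from run to run). A better approach is to choose the pivot with median-of-three. The idea: take the first, middle and last elements of the array, compare them, and use the median of the three as the pivot. The number of samples can be increased (median of 5, median of 7, and so on). This method handles nearly sorted arrays well, and the chosen pivot involves no randomness.

For example, for the sequence [1][1][6][5][4][7][7] the three elements are [1], [5] and [7], so [5] is chosen as the pivot.

Pass 1: [1][1][4][5][6][7][7]

The median-of-three algorithm is as follows: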

int NumberOfThree(int arr[],int low,int high)
{
	int mid = low + ((high - low) >> 1);//shifting right by one bit divides by 2

	if (arr[mid] > arr[high])
	{
		Swap(arr[mid],arr[high]);
	}
	if (arr[low] > arr[high])
	{
		Swap(arr[low],arr[high]);
	}
	if (arr[mid] > arr[low]) 
	{
		Swap(arr[mid],arr[low]);
	}
	//now arr[mid] <= arr[low] <= arr[high], so arr[low] holds the median
	return arr[low];
}

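Likewise, the line T x = a[low]; in Partition() becomes T x = NumberOfThree(a, low, high);

The measurements are as follows:

[Table: running-time measurements]

Median-of-three (the randomized algorithm performs about the same) brings a qualitative leap on ascending arrays, and that is with 1,000,000 elements.

Quicksort optimizations

Optimization 1: switch to insertion sort once the subarray is small enough

Once quicksort has recursed deep enough that the ranges being partitioned are very small, continuing with quicksort is inefficient. When the length of the range drops below a cutoff, insertion sort can take over. According to Data Structures and Algorithm Analysis by Mark Allen Weiss, a cutoff anywhere between 5 and 20 works well, and using one also avoids some nasty degenerate cases.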

template <class T>
void QSort(T arr[],int low,int high)
{
    int pivotPos;
    if (high - low + 1 < 10)
    {
        InsertSort(arr,low,high);
        return;
    }
    if(low < high)
    {
        pivotPos = Partition(arr,low,high);
        QSort(arr,low,pivotPos-1);
        QSort(arr,pivotPos+1,high);
    }
}

The complete code is as follows:

/*
   This quicksort code uses median-of-three & insertion sort
 */
#include <iostream>
#include <time.h>
#include <stdlib.h>
using namespace std;

#define M 1000000
int NumberOfThree(int arr[],int low,int high);

	template <class T>
void Print(T a[], int n)
{
	for(int i = 0; i < n; i++)
	{
		cout << "[" << a[i] << "]";
	}
	cout <<endl;
}

	template <class T>
void Swap(T &a, T &b)
{
	T asd;
	asd = a;
	a = b;
	b = asd;
}

	template <class T>
int Partition(T a[], int p, int r)
{
	int i = p, j = r+1;
	T x = NumberOfThree(a, p, r);
	while(true)
	{
		while(a[++i] < x && i < r);
		while(a[--j] > x);
		if(i >= j)break;
		Swap(a[i], a[j]);
	}
	a[p] = a[j];
	a[j] = x;
	return j;
}


void InsertSort(int arr[], int m, int n)
{
	int i, j;
	int temp; // holds the element currently being inserted
	for(i = m+1; i <= n; i++)
	{
		temp = arr[i];
		for(j = i-1; (j >= m)&&(arr[j] > temp); j--)
		{
			arr[j + 1] = arr[j];
		}
		arr[j + 1] = temp;
	}
}

int NumberOfThree(int arr[],int low,int high)
{
	int mid = low + ((high - low) >> 1);

	if (arr[mid] > arr[high])
	{
		Swap(arr[mid],arr[high]);
	}
	if (arr[low] > arr[high])
	{
		Swap(arr[low],arr[high]);
	}
	if (arr[mid] > arr[low]) 
	{
		Swap(arr[mid],arr[low]);
	}
	//now arr[mid] <= arr[low] <= arr[high], so arr[low] holds the median
	return arr[low];
}

	template <class T>
void QSort(T arr[],int low,int high)
{
	int pivotPos;
	if (high - low + 1 < 10)
	{
		InsertSort(arr,low,high);
		return;
	}
	if(low < high)
	{
		pivotPos = Partition(arr,low,high);
		QSort(arr,low,pivotPos-1);
		QSort(arr,pivotPos+1,high);
	}
}

int a[M] = {0};
int main()
{
	srand(time(0));
	for(int i=0;i<M;i++)
	{
		//a[i] = i+1;              //ascending array
		a[i] = rand()%(M);         //random array
		//if(i < M/2-1)            //array of repeated values
		//	a[i] = 1;
		//else
		//	a[i] = 10;
	}
	//Print(a, M);
	clock_t start_time = clock();
	QSort(a, 0, M-1);
	clock_t end_time = clock();
	cout<<"Running time is:"<<static_cast<double>(end_time - start_time)/CLOCKS_PER_SEC*1000<<"ms"<<endl;
	//Print(a, M);
	return 0;
}

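The measurements are as follows:

[Table: running-time measurements]

As described above, once partitioning reaches very small ranges, the elements in them are already nearly in order and quicksort is no longer efficient there. Combining quicksort with insertion sort therefore improves the running time.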

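Optimization 2: tail-recursion optimization

Like most divide-and-conquer sorting algorithms, quicksort makes two recursive calls. Unlike merge sort, whose recursive calls come at the start of the function, quicksort makes its recursive calls at the end, so tail-recursion optimization can be applied to it. The optimization reduces the stack depth, from O(n) in the worst case to O(logn). The O(logn) bound holds when the remaining recursive call always takes the smaller part and the loop takes the larger one; if the recursive call always takes a fixed side, as in the quicksort code later in this section, the depth can still reach O(n) on unfavourable inputs.

The concept of tail recursion:

If all recursive calls in a function appear at the end of the function, and a recursive call is the last statement executed in the function body with its return value not being part of an expression, then that call is a tail call. A tail-recursive function does no work on the way back out of the recursion. This property matters because many modern compilers can use it to generate optimized code automatically.

How tail recursion works:

When the compiler detects that a call is tail-recursive, it overwrites the current activation record instead of creating a new one on the stack. It can do this because the recursive call is the last statement to execute in the current activation, so nothing remains to be done in the frame when the call returns, and there is no need to keep the frame. Overwriting the current frame rather than pushing another one on top of it greatly reduces the stack space used, which in practice makes the code run more efficiently.

The code is as follows: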

#include <iostream>
using namespace std;

int fact(int n)             //linear recursion
{
    if (n < 0)
        return 0;
    else if(n == 0 || n == 1)
        return 1;
    else
        return n * fact(n - 1);
}

int facttail(int n, int a)   //tail recursion
{
    if (n < 0)
        return 0;
    else if (n == 0)
        return 1;
    else if (n == 1)
        return a;
    else
        return facttail(n - 1, n * a);
}

int main()
{
    int a = fact(5);
    int b = facttail(5, 1);
    cout << "A:" << a <<endl;
    cout << "B:" << b <<endl;
}

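The function in the example is tail-recursive because the single recursive call to facttail is the last statement executed before the function returns. In facttail the last statement in the source also happens to be the call to facttail, but that is not required: other statements may follow the recursive call in the source, as long as they run only on paths where the recursive call is not made. Tail recursion matters because, without it, a stack frame has to be kept for every pending call and the stack usage grows with the recursion depth. For example, f(n, sum) = f(n-1) + value(n) + sum; keeps n frames on the stack, whereas the tail-recursive form f(n, sum) = f(n-1, sum+value(n)); only needs the most recent frame, and the earlier ones can be optimized away.

With n=5, the linear recursion proceeds as follows: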

fact(5)
{5*fact(4)}
{5*{4*fact(3)}}
{5*{4*{3*fact(2)}}}
{5*{4*{3*{2*fact(1)}}}}
{5*{4*{3*{2*1}}}}
{5*{4*{3*2}}}
{5*{4*6}}
{5*24}
120

whereas the tail recursion proceeds as follows:

facttail(5,1)
facttail(4,5)
facttail(3,20)
facttail(2,60)
facttail(1,120)
120
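What the optimization effectively does to facttail can be written out by hand as a loop: each recursive call becomes one iteration that updates n and the accumulator in place. The sketch below is an illustration of that rewrite, not compiler output; FactLoop is a name assumed for this example.

```cpp
#include <cassert>

// Iterative form of facttail(n, 1): each "recursive call" becomes one loop
// iteration that updates n and the accumulator in place.
int FactLoop(int n) {
    if (n < 0) return 0;        // same convention as fact() in the text
    int acc = 1;                // plays the role of parameter a
    while (n > 1) {
        acc = n * acc;          // corresponds to facttail(n - 1, n * a)
        n = n - 1;
    }
    return acc;
}
```

The loop uses a single frame no matter how large n is, which is exactly the saving that tail-call elimination aims for.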

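For more on tail recursion and the tail-recursion optimization of quicksort, see the blog post "Tail Recursion and Tail-Recursion Optimization of Quicksort", which covers the factorial example above, the quicksort optimization, and debugging with Gdb.

Running the quicksort code on an ascending array in Codeblocks, once with the tail-recursion optimization and once without: the version without it terminates abnormally at 40,000 elements because the stack depth is exceeded, while the version with it handles even 100,000 elements without a problem (combined with median-of-three it can handle 1,000,000 elements).

Addendum, 2 October 2018: my other post "The Four Memory Areas" gives a fuller picture of the problem above.

The tail-recursive quicksort code is as follows: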

template <class T>
void QSort(T arr[],int low,int high)
{
    int pivotPos;
    if (high - low + 1 < 10)
    {
        InsertSort(arr,low,high);
        return;
    }
    while(low < high)
    {
        pivotPos = Partition(arr,low,high);
        QSort(arr,low,pivotPos-1);
        low = pivotPos + 1;
    }
}

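After the first recursive call, the old value of low is no longer needed, so the second recursive call can be replaced by an iterative control structure. The process is shown below: recursion runs vertically, iteration horizontally.

[Figure: the tail-recursive quicksort process]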
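The loop in the code above always recurses into the left part. A common variant recurses into the smaller part and loops on the larger one, which bounds the nesting depth by roughly log2(n) whatever the pivot rule does. The following is a minimal sketch under stated assumptions: it works on std::vector<int>, uses a simple fixed-pivot partition, and returns the deepest nesting level only so that the bound can be observed. PartitionFirst and QSortSmallerFirst are hypothetical names, not part of the original code.

```cpp
#include <cassert>
#include <algorithm>
#include <utility>
#include <vector>

// Partition v[lo..hi] around the fixed pivot v[lo]; returns the pivot's final index.
int PartitionFirst(std::vector<int>& v, int lo, int hi) {
    int pivot = v[lo];
    int last_small = lo;                       // v[lo+1..last_small] < pivot
    for (int k = lo + 1; k <= hi; ++k)
        if (v[k] < pivot) std::swap(v[++last_small], v[k]);
    std::swap(v[lo], v[last_small]);
    return last_small;
}

// Recurse into the smaller part, loop on the larger part.
// Returns the number of nested frames used, counting this one.
int QSortSmallerFirst(std::vector<int>& v, int lo, int hi) {
    int deepest = 0;
    while (lo < hi) {
        int q = PartitionFirst(v, lo, hi);
        int child;
        if (q - lo < hi - q) {                 // left part is the smaller one
            child = QSortSmallerFirst(v, lo, q - 1);
            lo = q + 1;                        // continue with the right part
        } else {                               // right part is smaller (or equal)
            child = QSortSmallerFirst(v, q + 1, hi);
            hi = q - 1;                        // continue with the left part
        }
        deepest = std::max(deepest, child);
    }
    return 1 + deepest;
}
```

Every recursive call receives at most half of the current range, so the depth cannot grow faster than logarithmically, even on ascending or descending input with a fixed pivot. The running time on such input is still quadratic; only the stack usage is bounded.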

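The measurements are as follows:

[Table: running-time measurements]

The optimization of the recursion mainly serves to reduce the stack depth. On random arrays, the combination (median-of-three + insertion sort + tail recursion) is not necessarily faster than (median-of-three + insertion sort).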

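Optimization 3: gathering equal elements

The idea: after a partition finishes, the elements equal to the pivot are gathered together, and they are left out of all later partitions. There are two steps: (1) during partitioning, move the elements equal to the pivot to the two ends of the array; (2) after partitioning, move the elements from the two ends next to the pivot.

The ordinary process, for example: [7][2][7][1][7][4][7][6][3][8], where median-of-three gives the pivot [7]

Pass 1: [7] [2] [3] [1] [6] [4] [7] [7] [7] [8]

Pass 2: [1] [2] [3] [4] [6] [7] [7] [7] [7] [8]

Pass 3: [1] [2] [3] [4] [6] [7] [7] [7] [7] [8]

Pass 4: [1] [2] [3] [4] [6] [7] [7] [7] [7] [8]

Gathering equal elements:

Step 1: [7] [7] [7] [1] [2] [4] [3] [6] [7] [8]

Step 2: [6] [3] [4] [1] [2] [7] [7] [7] [7] [8]

Next, quicksort is applied to [6] [3] [4] [1] [2] and [8]. (The process can be traced by stepping through the code below.)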

template <class T>
void QSort(T arr[],int low,int high)
{
	int first = low;
	int last = high;

	int left = low;
	int right = high;

	int leftLen = 0;
	int rightLen = 0;

	if (high - low + 1 < 10)
	{
		InsertSort(arr,low,high);
		return;
	}

	//one partition pass
	int key =  NumberOfThree(arr,low,high);//choose the pivot with median-of-three

	while(low < high)
	{
		while(high > low && arr[high] >= key)
		{
			if (arr[high] == key)//handle elements equal to the pivot
			{
				Swap(arr[right],arr[high]);
				right--;
				rightLen++;
			}
			high--;
		}
		arr[low] = arr[high];
		while(high > low && arr[low] <= key)
		{
			if (arr[low] == key)
			{
				Swap(arr[left],arr[low]);
				left++;
				leftLen++;
			}
			low++;
		}
		arr[high] = arr[low];
	}
	arr[low] = key;

	//partition pass finished; move the elements equal to the pivot next to it
	int i = low - 1;
	int j = first;
	while(j < left && arr[i] != key)
	{
		Swap(arr[i],arr[j]);
		i--;
		j++;
	}
	i = low + 1;
	j = last;
	while(j > right && arr[i] != key)
	{
		Swap(arr[i],arr[j]);
		i++;
		j--;
	}
        QSort(arr,first,low - 1 - leftLen);
        QSort(arr,low + 1 + rightLen,last);
}

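[Figure: gathering equal elements, step 1]

[Figure: gathering equal elements, step 2]

The next round applies quicksort to [6] [3] [4] [1] [2]. Once a range is small enough for insertion sort, insertion sort finishes the work, so reaching the insertion-sort branch is what ends the recursion.

The measurements are as follows:

[Table: running-time measurements]

As the table above shows, gathering equal elements greatly improves the handling of arrays with repeated elements. For the ascending array, which is already sorted and has no repeated elements, the result is less efficient than (median-of-three + insertion sort).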
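A related technique with the same goal is three-way partitioning (often called the Dutch national flag partition). It is a different method from the two-step gathering used in this post: elements equal to the pivot are collected in the middle during a single scan instead of being parked at the ends and moved afterwards. The sketch below assumes std::vector<int> and takes the middle element as pivot; QuickSort3Way is a name chosen for this example.

```cpp
#include <cassert>
#include <algorithm>
#include <utility>
#include <vector>

// Three-way ("Dutch national flag") quicksort on v[lo..hi].
// Invariant inside the loop:
//   v[lo..lt-1] < pivot, v[lt..i-1] == pivot, v[gt+1..hi] > pivot.
void QuickSort3Way(std::vector<int>& v, int lo, int hi) {
    if (lo >= hi) return;
    int pivot = v[lo + (hi - lo) / 2];     // middle element; any pivot rule works
    int lt = lo, i = lo, gt = hi;
    while (i <= gt) {
        if (v[i] < pivot)      std::swap(v[lt++], v[i++]);
        else if (v[i] > pivot) std::swap(v[i], v[gt--]);
        else                   ++i;
    }
    QuickSort3Way(v, lo, lt - 1);          // strictly smaller elements
    QuickSort3Way(v, gt + 1, hi);          // strictly larger elements
}
```

On an array in which every element is equal, a single scan finishes the sort, because both recursive calls receive empty ranges.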

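Optimization 4: multithreaded quicksort

The basic idea of divide and conquer is to split a problem of size n into k smaller subproblems that are independent of each other and have the same form as the original problem. The subproblems are solved and their solutions are combined to give the solution of the original problem. Because the subproblems are independent, multithreading can be used to speed up quicksort.

Functions used:
(1)pthread_create
The function that creates a thread is pthread_create. It is declared as follows:

#include <pthread.h>//(link with -lpthread when compiling on Linux)
int pthread_create(pthread_t* thread, const pthread_attr_t* attr, void* (*start_routine)(void*), void *arg);

The first parameter points to a pthread_t variable that receives the identifier of the new thread. pthread_t is an opaque type (on Linux with glibc it is an unsigned long integer). The second parameter, attr, sets the attributes of the new thread; passing NULL selects the default attributes. The start_routine and arg parameters specify the function the new thread runs and its argument.
pthread_create() returns 0 on success and an error code on failure.

(2)pthread_barrier_init
In multithreaded programs a barrier can be used to wait for other threads. For example, the main thread creates some worker threads, the workers do their work, and the main thread has to wait until they are done (pthread_join can also act as a kind of barrier). A barrier can be understood as a mechanism that makes a thread wait in order to coordinate work between threads. Its prototype: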

int pthread_barrier_init(pthread_barrier_t *restrict barrier, const pthread_barrierattr_t *restrict attr, unsigned int count);

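The function returns 0 on success and an error number on failure; the error number identifies what went wrong.
First parameter: a pointer to a pthread_barrier_t. Note that pthread_barrier_init does not allocate memory for it, so the pointer must refer to an existing pthread_barrier_t variable.
Second parameter: the barrier attributes. They can be ignored here; passing NULL selects the defaults.
Third parameter: the number of threads that must call pthread_barrier_wait before any of them is allowed to continue.

(3)pthread_barrier_wait
A thread calls this function when it has to wait until the other participating threads have reached the barrier.
Prototype: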

int pthread_barrier_wait(pthread_barrier_t *barrier);

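On success the function returns PTHREAD_BARRIER_SERIAL_THREAD in one arbitrary thread and 0 in all the others; on failure it returns an error number that identifies what went wrong.
Parameter: a pointer to a pthread_barrier_t variable.
Note: a barrier cannot be used to obtain the exit status of a thread. To obtain the exit status, call pthread_join.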
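For comparison, waiting with pthread_join instead of a barrier might look like the sketch below. It is an illustration under stated assumptions rather than the program measured in this post: std::sort stands in for the quicksort variants discussed here, only two threads are used, and SortTask, SortRange and SortWithTwoThreads are hypothetical names. It needs the same pthread linking option as the other pthread code.

```cpp
#include <cassert>
#include <algorithm>
#include <vector>
#include <pthread.h>

// Hypothetical work item: one thread sorts data[begin, end).
struct SortTask {
    int* data;
    long begin;
    long end;
};

void* SortRange(void* arg) {
    SortTask* task = static_cast<SortTask*>(arg);
    std::sort(task->data + task->begin, task->data + task->end);
    return NULL;                       // exit status, retrievable via pthread_join
}

// Sorts the two halves of v in two threads, waits with pthread_join, then merges.
// Returns false if a thread could not be created or joined.
bool SortWithTwoThreads(std::vector<int>& v) {
    long n = static_cast<long>(v.size());
    SortTask tasks[2] = { {v.data(), 0, n / 2}, {v.data(), n / 2, n} };
    pthread_t ids[2];
    int created = 0;
    for (; created < 2; ++created)
        if (pthread_create(&ids[created], NULL, SortRange, &tasks[created]) != 0) break;
    bool ok = (created == 2);
    for (int i = 0; i < created; ++i)              // join every thread that started
        if (pthread_join(ids[i], NULL) != 0) ok = false;
    if (ok) std::inplace_merge(v.begin(), v.begin() + n / 2, v.end());
    return ok;
}
```

Joining each thread also releases its resources, which a barrier alone does not do.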

The code is as follows:

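#include <cstdio> /* median-of-three + insertion sort + gathering + multithreading  &&  median-of-three + insertion sort + tail recursion + multithreading */
#include <iostream>
#include <stdlib.h>
#include <time.h>
#include <sys/time.h>
#include <pthread.h>
using namespace std;

const long MAX = 1000000L;                          //upper bound (exclusive) of the random values
const long long MaxNumber = 1000000L;               //number of elements to sort
const int thread = 4;                               //number of threads
const long NumberOfSort = MaxNumber / thread;       //number of elements sorted by each thread

int array_a[MaxNumber];                         
int array_b[MaxNumber];                             //after merging, array b holds the final sequence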
 
pthread_barrier_t barrier;

void initial()   //array initialization
{
		srand(time(0));
		for(int i = 0; i < MaxNumber; ++i)
		{
				array_a[i] = rand()%(MAX);      //random array
				//if(i < MaxNumber/2)           //array of repeated values
				//    array_a[i] = 5;
				//else
				//   array_a[i] = 10;
				//array_a[i] = i+1;             //ascending array
		}
}


		template <class T>
void Print(T a[], int n)
{
		for(int i = 0; i < n; i++)
		{
				cout << "[" << a[i] << "]";
		}
		cout <<endl;
}

		template <class T>
void Swap(T &a, T &b)
{
		T asd;
		asd = a;
		a = b;
		b = asd;
}

void InsertSort(int arr[],int start,int end)
{
	int low,high,median,tmp;
	for(int i = start+1;i<= end;i++)
	{
		low = start;
		high = i-1;

		while(low <= high)
		{
			median = (low + high) /2;
			if(arr[i] < arr[median])
			{
				high = median -1;
			}else
			{
				low = median + 1;
			}
		}

		tmp = arr[i];

		for(int j = i-1;j>high;j--)
		{
			arr[j+1] = arr[j];
		}
		arr[high+1] = tmp;
	}
}

int NumberOfThree(int arr[],int low,int high)
{
		int mid = low + ((high - low) >> 1);
		if (arr[mid] > arr[high])
		{
				Swap(arr[mid],arr[high]);
		}
		if (arr[low] > arr[high])
		{
				Swap(arr[low],arr[high]);
		}
		if (arr[mid] > arr[low])
		{
				Swap(arr[mid],arr[low]);
		}
		return arr[low];
}

		template <class T>
int Partition(T a[], int p, int r)
{
		int i = p, j = r+1;
		T x = NumberOfThree(a, p, r);
		while(true)
		{
				while(a[++i] < x && i < r);
				while(a[--j] > x);
				if(i >= j)break;
				Swap(a[i], a[j]);
		}
		a[p] = a[j];
		a[j] = x;
		return j;
}

#if 1   //quicksort with gathering of equal elements
		template <class T>
void QSort(T arr[],int low,int high)
{
		int first = low;
		int last = high;
		int left = low;
		int right = high;
		int leftLen = 0;
		int rightLen = 0;

		if (high - low + 1 < 6)
		{
				InsertSort(arr,low,high);
				return;
		}

		//one partition pass
		int key = NumberOfThree(arr,low,high);//choose the pivot with median-of-three

		while(low < high)
		{
				while(high > low && arr[high] >= key)
				{
						if (arr[high] == key)//handle elements equal to the pivot
						{
								Swap(arr[right],arr[high]);
								right--;
								rightLen++;
						}
						high--;
				}
				arr[low] = arr[high];
				while(high > low && arr[low] <= key)
				{
						if (arr[low] == key)
						{
								Swap(arr[left],arr[low]);
								left++;
								leftLen++;
						}
						low++;
				}
				arr[high] = arr[low];
		}
		arr[low] = key;

		//partition pass finished
		//move the elements equal to the pivot key next to the pivot's final position
		int i = low - 1;
		int j = first;
		while(j < left && arr[i] != key)
		{
				Swap(arr[i],arr[j]);
				i--;
				j++;
		}
		i = low + 1;
		j = last;
		while(j > right && arr[i] != key)
		{
				Swap(arr[i],arr[j]);
				i++;
				j--;
		}
		QSort(arr,first,low - 1 - leftLen);
		QSort(arr,low + 1 + rightLen,last);
}
#endif

#if 0     //quicksort without gathering of equal elements
		template <class T>
void QSort(T arr[],int low,int high)
{
		int pivotPos;
		if (high - low + 1 < 10)
		{
				InsertSort(arr,low,high);
				return;
		}

		while(low < high)
		{
				pivotPos = Partition(arr,low,high);
				QSort(arr,low,pivotPos-1);
				low = pivotPos + 1;
		}
}
#endif // 0

void* work(void *arg)  //thread function: sorts one chunk of array_a
{
		long length = (long)arg;
		QSort(array_a, length, length + NumberOfSort - 1);
		pthread_barrier_wait(&barrier);
		pthread_exit(NULL);
}

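void meger()        //final merge: repeatedly take the smallest head of the sorted chunks
{
		long index[thread];
		for (int i = 0; i < thread; ++i)
		{
				index[i] = i * NumberOfSort;
		}

		for(long i = 0; i < MaxNumber; ++i)
		{
				int min_index = -1;     //-1 means no chunk has been chosen yet
				for(int j = 0; j < thread; ++j)
				{
				    if(index[j] >= (j + 1) * NumberOfSort) continue;   //chunk j is exhausted
				    //comparing against the current best head (instead of MAX) also
				    //handles values equal to MAX, e.g. in the ascending array
				    if(min_index < 0 || array_a[index[j]] < array_a[index[min_index]])
						{
								min_index = j;
						}
				}
				array_b[i] = array_a[index[min_index]];
				index[min_index]++;
		}
}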


int main(int argc, char const *argv[])
{
		initial();
		//Print(array_a,MaxNumber);
		struct timeval start, end;
		pthread_t ptid;
		gettimeofday(&start, NULL);

		pthread_barrier_init(&barrier, NULL, thread + 1);
		for(int i = 0; i < thread; ++i)
				pthread_create(&ptid, NULL, work, (void *)(i * NumberOfSort));

		pthread_barrier_wait(&barrier);
		meger();

		gettimeofday(&end, NULL);
		long long s_usec = (long long)start.tv_sec * 1000000 + start.tv_usec;   //cast avoids overflow where tv_sec is 32-bit
		long long e_usec = (long long)end.tv_sec * 1000000 + end.tv_usec;

		double Time = (double)(e_usec - s_usec) / 1000.0;
		printf("sort use %.4f ms\n", Time);
		//Print(array_b, MaxNumber);
		return 0;
}

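After I uploaded this code, a classmate told me that its running time differs between Linux and Codeblocks (all the data in this post were measured in Codeblocks). I tested it right away and found that there is indeed a discrepancy; my first guess is that it is caused by the compiler. I do not have a dual-boot setup and run Linux in a virtual machine, which may be one of the causes. (In my view the discrepancy can be ignored: the average running time of each data set differs between environments, but the overall direction of the optimizations does not change.)

The measurements are as follows:

[Table: running-time measurements]

As the table above shows, quicksort combined with multithreading (median-of-three + insertion sort + multithreading) improves noticeably on the first three kinds of arrays. The time for the array of repeated values increases because gathering equal elements already performs very well on that array, and in the multithreaded combination the chunks sorted by the individual threads still have to be merged, which adds to the sorting time of that combination. For lack of time, the data above are averages over 10 runs of the corresponding code. More accurate data would require many more runs (even if some of the data are unstable, the direction of the optimizations is not affected). P.S.: the running times also depend on the configuration of the computer used.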
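The merge step that follows the threads can also be written with a min-heap, which finds the next smallest head in O(log k) instead of scanning all k chunks. For k = 4 the difference is small; the sketch mainly shows the structure for larger thread counts. It is an illustration with assumed names (MergeSortedChunks, Head) and assumes equally sized, already sorted chunks laid out back to back, as in array_a.

```cpp
#include <cassert>
#include <functional>
#include <queue>
#include <utility>
#include <vector>

// Merges `chunks` sorted runs of equal length laid out back to back in `in`.
// Assumes in.size() is a multiple of chunks and each run is already sorted.
std::vector<int> MergeSortedChunks(const std::vector<int>& in, int chunks) {
    assert(chunks > 0 && in.size() % chunks == 0);
    const long len = static_cast<long>(in.size()) / chunks;
    typedef std::pair<int, long> Head;             // (value, position in `in`)
    std::priority_queue<Head, std::vector<Head>, std::greater<Head> > heads;
    for (int c = 0; c < chunks; ++c)
        if (len > 0) heads.push(Head(in[c * len], c * len));
    std::vector<int> out;
    out.reserve(in.size());
    while (!heads.empty()) {
        Head h = heads.top();                      // smallest remaining head
        heads.pop();
        out.push_back(h.first);
        long next = h.second + 1;
        if (next % len != 0) heads.push(Head(in[next], next));   // run not exhausted
    }
    return out;
}
```

For example, merging {1, 4, 9, 2, 3, 10, 0, 5, 6} as three chunks yields {0, 1, 2, 3, 4, 5, 6, 9, 10}.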

