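Basic algorithms

Most of the parallel algorithms provided by Intel TBB are generic. However, the types they are used with must implement the methods required by the relevant concept. Parallel algorithms can be nested; for example, the body of one parallel_for can invoke another parallel_for. The current version of TBB (4.0) provides the following basic algorithms: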

parallel_for

parallel_reduce

parallel_scan

parallel_do

Pipelines (pipeline, parallel_pipeline)

parallel_sort

parallel_invoke

parallel_for

• Summary

parallel_for is a template function that performs parallel iteration over a range of values.

• Syntax

template<typename Index, typename Func>
Func parallel_for( Index first, Index last, const Func& f
                   [, task_group_context& group] );

template<typename Index, typename Func>
Func parallel_for( Index first, Index last, Index step, const Func& f
                   [, task_group_context& group] );

template<typename Range, typename Body>
void parallel_for( const Range& range, const Body& body
                   [, partitioner [, task_group_context& group]] );

• Header

#include "tbb/parallel_for.h"

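• Description

parallel_for(first, last, step, f) represents parallel execution of the loop:

for(auto i = first; i < last; i += step) f(i);

Note the following:

1. The index type must be an integral type.

2. The loop must not wrap around.

3. The step must be positive; if omitted, it defaults to 1.

4. There is no guarantee that the iterations run in parallel.

5. Deadlock may occur if a lower-numbered iteration waits for a higher-numbered iteration.

6. The partitioning strategy is always auto_partitioner.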
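The constraints above (positive step, no wrap-around) are what make the iteration count computable up front, so each iteration can be addressed by its position k rather than by walking the loop. The sketch below illustrates that mapping in plain C++. It is a serial model only, and the names iteration_count, strided_for, and visited are hypothetical helpers, not part of TBB.

```cpp
#include <cassert>
#include <vector>

// Number of iterations of: for (i = first; i < last; i += step)
// Assumes step > 0 and no wrap-around, as the notes above require.
template <typename Index>
Index iteration_count(Index first, Index last, Index step) {
    assert(step > 0);
    if (first >= last) return 0;
    return (last - first - 1) / step + 1;
}

// Hypothetical helper: visits the same indices as the strided loop, but
// addresses each one by its position k, so blocks of k values could be
// handed to different workers independently.
template <typename Index, typename Func>
void strided_for(Index first, Index last, Index step, const Func& f) {
    const Index n = iteration_count(first, last, step);
    for (Index k = 0; k < n; ++k) {
        f(first + k * step);
    }
}

// Collects the visited indices so the mapping can be inspected.
inline std::vector<int> visited(int first, int last, int step) {
    std::vector<int> out;
    strided_for(first, last, step, [&out](int i) { out.push_back(i); });
    return out;
}
```

For example, visited(2, 11, 4) yields the indices 2, 6, 10. Because every index depends only on k, no iteration needs the result of an earlier one to find its own index.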

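parallel_for(range, body, partitioner) provides a more general form of parallel iteration. It represents parallel execution of body over each value in range. The optional partitioner specifies the partitioning strategy. The Range type must model the Range concept. The body must meet the requirements in the following table: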

Prototype	Semantics
Body::Body(const Body&)	Copy constructor
Body::~Body()	Destructor
void Body::operator()(Range& range) const	Apply the body to range
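To make the roles of the range and the body concrete, here is a minimal serial sketch of how a divisible range can be split recursively and a copied body applied to each leaf. It is an illustration under stated assumptions, not TBB's scheduler: IndexRange, split_tag, apply_recursively, DoubleInPlace, and doubled are hypothetical names, and the real library runs the leaves on worker threads.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical tag type standing in for a "split" marker.
struct split_tag {};

// A minimal half-open index range [begin, end) that can be cut in half.
class IndexRange {
public:
    IndexRange(std::size_t b, std::size_t e, std::size_t grain = 1)
        : begin_(b), end_(e), grain_(grain) { assert(grain > 0); }

    // Splitting constructor: *this takes the upper half, r keeps the lower.
    IndexRange(IndexRange& r, split_tag)
        : begin_(r.begin_ + (r.end_ - r.begin_) / 2), end_(r.end_), grain_(r.grain_) {
        r.end_ = begin_;
    }

    std::size_t begin() const { return begin_; }
    std::size_t end() const { return end_; }
    bool empty() const { return begin_ >= end_; }
    bool is_divisible() const { return end_ - begin_ > grain_; }

private:
    std::size_t begin_, end_, grain_;
};

// Serial stand-in for the scheduler: split until indivisible, then apply
// a copy of the body, mirroring the copy-constructor requirement.
template <typename Range, typename Body>
void apply_recursively(Range range, const Body& body) {
    if (range.is_divisible()) {
        Range upper(range, split_tag{});  // range now holds the lower half
        apply_recursively(range, body);
        apply_recursively(upper, body);
    } else if (!range.empty()) {
        Body local(body);                 // a worker would own this copy
        local(range);
    }
}

// Example body: operator() is const and only touches its own subrange.
struct DoubleInPlace {
    std::vector<int>* data;
    void operator()(IndexRange& r) const {
        for (std::size_t i = r.begin(); i != r.end(); ++i) (*data)[i] *= 2;
    }
};

inline std::vector<int> doubled(std::vector<int> v) {
    apply_recursively(IndexRange(0, v.size(), 2), DoubleInPlace{&v});
    return v;
}
```

Because each leaf writes only to its own indices, the leaves do not interfere with one another, which is the property that lets them run concurrently.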

The following rewrites Example 1-1 using the last template form and the STL vector container.

Example 1-2:

#include <iostream>
#include <vector>
#include <tbb/tbb.h>
#include <tbb/blocked_range.h>
#include <tbb/parallel_for.h>

using namespace std;
using namespace tbb;

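typedef vector<int>::iterator IntVecIt;

// Body applied to each subrange that parallel_for hands out.
struct body
{
   void operator()(const blocked_range<IntVecIt>& r) const
   {
      // Output from different subranges may interleave across threads.
      for(auto i = r.begin(); i != r.end(); ++i)
        cout << *i << ' ';
   }
};

int main()
{
   vector<int> vec;
   for(int i = 0; i < 10; i++)
      vec.push_back(i);

   // Iterate over [vec.begin(), vec.end()) with the default partitioner.
   parallel_for(blocked_range<IntVecIt>(vec.begin(), vec.end()), body());
   return 0;
}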

parallel_reduce

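• Summary

The parallel_reduce template iterates over a range and combines the partial results computed by each task into a final result.

parallel_reduce places the same requirements on the range type as parallel_for. The body type needs a splitting constructor and a join method. The splitting constructor copies the read-only data needed to run the loop body and initializes the reduction variable to the identity element of the reduction operation. The join method combines the results of the tasks in the reduction.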

• Syntax

template<typename Range, typename Value,
         typename Func, typename Reduction>
Value parallel_reduce( const Range& range, const Value& identity,
                       const Func& func, const Reduction& reduction
                       [, partitioner [, task_group_context& group]] );

template<typename Range, typename Body>
void parallel_reduce( const Range& range, const Body& body
                      [, partitioner [, task_group_context& group]] );

• Header

#include "tbb/parallel_reduce.h"

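• Description

The parallel_reduce template has two forms. The functional form is designed for convenient use with lambda expressions. The second form is designed to minimize copying of data.

The table below summarizes the type requirements on identity, func, and reduction in the first form: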

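Prototype	Summary
Value Identity	Left identity element for Func::operator()
Value Func::operator()(const Range& range, const Value& x)	Accumulate the result for a subrange, starting from the initial value x
Value Reduction::operator()(const Value& x, const Value& y);	Combine the results x and y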
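The division of labour between identity, func, and reduction can be sketched with a serial model: cut the index space into chunks, fold each chunk with func starting from identity, then merge the partial results with reduction. The helper names chunked_reduce and sum_of are hypothetical, and the fixed chunking below is an assumption made for illustration; the library chooses the subranges itself.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical serial model: cut [0, n) into chunks, fold each chunk with
// func starting from identity, then merge partial results with reduction.
template <typename Value, typename Func, typename Reduction>
Value chunked_reduce(std::size_t n, std::size_t chunk, const Value& identity,
                     const Func& func, const Reduction& reduction) {
    assert(chunk > 0);
    Value total = identity;
    for (std::size_t lo = 0; lo < n; lo += chunk) {
        const std::size_t hi = (lo + chunk < n) ? lo + chunk : n;
        Value partial = func(lo, hi, identity);  // one task's partial result
        total = reduction(total, partial);       // merge, left to right
    }
    return total;
}

// Sums a vector; the result must not depend on the chunk size.
inline int sum_of(const std::vector<int>& v, std::size_t chunk) {
    return chunked_reduce<int>(
        v.size(), chunk, 0,
        [&v](std::size_t lo, std::size_t hi, int init) {
            for (std::size_t i = lo; i < hi; ++i) init += v[i];
            return init;
        },
        [](int x, int y) { return x + y; });
}
```

Since every chunk starts from identity, the value passed must be a true identity of the operation. Passing 1 instead of 0 to a sum, for instance, would add 1 once per chunk, so the answer would change with the chunking.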

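The second form, parallel_reduce(range, body), performs a parallel reduction of body over each value in range.

The Range type must model the Range concept. The body must meet the requirements in the following table: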

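Prototype	Summary
Body::Body(Body&, split)	Splitting constructor. Must be able to run concurrently with operator() and join().
Body::~Body()	Destructor
void Body::operator()(const Range& )	Accumulate the result for a subrange
void Body::join(Body& rhs)	Join results. The result in rhs is merged into the result of this.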

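parallel_reduce uses the splitting constructor to make one or more copies of the body for each thread. It may copy a body while that body's operator() or join() is running concurrently, so you must make sure such concurrent execution is safe. In typical usage, this safety requirement takes no extra effort.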
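The interplay of the splitting constructor and join can be sketched as follows. This is a serial model under stated assumptions: SumBody, split_tag, reduce_recursively, and sum_with_body are hypothetical names, the body takes index bounds instead of a Range object to keep the sketch short, and the recursion stands in for what the library would distribute across threads.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct split_tag {};  // hypothetical stand-in for a "split" marker type

// Body in the second style: read-only input plus a running sum.
struct SumBody {
    const std::vector<int>* data;
    long sum;

    explicit SumBody(const std::vector<int>& v) : data(&v), sum(0) {}
    // Splitting constructor: share the input, restart the sum at the identity.
    SumBody(SumBody& other, split_tag) : data(other.data), sum(0) {}

    // Accumulate the subrange [lo, hi) into this body's running sum.
    void operator()(std::size_t lo, std::size_t hi) {
        for (std::size_t i = lo; i < hi; ++i) sum += (*data)[i];
    }
    // Merge the right-hand partial result into this one.
    void join(SumBody& rhs) { sum += rhs.sum; }
};

// Serial stand-in for the scheduler: halve the range, give the right half
// to a split-off body, then join it back into the left body.
inline void reduce_recursively(std::size_t lo, std::size_t hi, std::size_t grain,
                               SumBody& body) {
    if (hi - lo <= grain) {
        body(lo, hi);
        return;
    }
    const std::size_t mid = lo + (hi - lo) / 2;
    SumBody right(body, split_tag{});
    reduce_recursively(lo, mid, grain, body);
    reduce_recursively(mid, hi, grain, right);
    body.join(right);
}

inline long sum_with_body(const std::vector<int>& v, std::size_t grain) {
    assert(grain > 0);
    SumBody body(v);
    reduce_recursively(0, v.size(), grain, body);
    return body.sum;
}
```

Note that the same body may be applied to several subranges in turn, so operator() accumulates into sum rather than overwriting it. Only the read-only pointer is shared between bodies; each body owns its own sum, which is why splitting while another body is running is safe here.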

• Example

The following example sums the values in a container.

Example 1-3:

#include <iostream>
#include <tbb/parallel_reduce.h>
#include <tbb/blocked_range.h>
#include <vector> 

using namespace std;
using namespace tbb;

int main()
{
   vector<int> vec;
   for(int i = 0; i < 100; i++)
      vec.push_back(i);

   typedef blocked_range<vector<int>::iterator> IntRange;

   int result = parallel_reduce(IntRange(vec.begin(), vec.end()),
      0,   // identity element for addition
      // func: accumulate one subrange, starting from init
      [](const IntRange& r, int init)->int{
        for(auto a = r.begin(); a != r.end(); a++)
           init += *a;
        return init;
      },
      // reduction: combine two partial sums
      [](int x, int y)->int{
        return x + y;
      }
      );
   cout << "result:" << result << endl;
   return 0;
}

parallel_scan

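• Summary

A function template that computes a parallel prefix. Given an input array, it produces an output array in which each element is the accumulated result of applying some operator to the elements of the input that precede it. For example, with addition:

Input: [2, 8, 9, -4, 1, 3, -2, 7]

Output: [0, 2, 10, 19, 15, 16, 19, 17]

Note that this sample output is the exclusive form (each result omits its own input element), whereas the mathematical definition and the example code later in this section compute the inclusive form.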
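As a serial reference point, the two flavours of prefix sum differ only in whether the running total is stored before or after the current element is added. The sketch below uses the hypothetical helper names exclusive_sum and inclusive_sum; it is plain C++ and makes no claim about how the library computes the result.

```cpp
#include <cassert>
#include <vector>

// Exclusive prefix sum: out[i] is the sum of the elements before in[i].
inline std::vector<int> exclusive_sum(const std::vector<int>& in) {
    std::vector<int> out;
    out.reserve(in.size());
    int running = 0;
    for (int x : in) {
        out.push_back(running);  // store first, so in[i] is excluded
        running += x;
    }
    assert(out.size() == in.size());
    return out;
}

// Inclusive prefix sum: out[i] also includes in[i].
inline std::vector<int> inclusive_sum(const std::vector<int>& in) {
    std::vector<int> out;
    out.reserve(in.size());
    int running = 0;
    for (int x : in) {
        running += x;            // add first, so in[i] is included
        out.push_back(running);
    }
    assert(out.size() == in.size());
    return out;
}
```

Applied to the input above, exclusive_sum reproduces the output shown, while inclusive_sum gives [2, 10, 19, 15, 16, 19, 17, 24].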

• Syntax

template<typename Range, typename Body>
void parallel_scan( const Range& range, Body& body );

template<typename Range, typename Body>
void parallel_scan( const Range& range, Body& body, const auto_partitioner& );

template<typename Range, typename Body>
void parallel_scan( const Range& range, Body& body, const simple_partitioner& );

• Header

#include "tbb/parallel_scan.h"

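• Description

A mathematical definition of the parallel prefix is as follows:

Let $\oplus$ be an associative operation with left-identity element $\mathrm{id}_\oplus$. The parallel prefix of $\oplus$ over a sequence $x_0, x_1, \ldots, x_{n-1}$ is the sequence $y_0, y_1, y_2, \ldots, y_{n-1}$, where:

$y_0 = \mathrm{id}_\oplus \oplus x_0$

$y_i = y_{i-1} \oplus x_i$

parallel_scan<Range,Body> implements parallel prefix generically. Its requirements are as follows: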

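Pseudo-signature	Semantics
void Body::operator()(const Range& r, pre_scan_tag)	Accumulate a summary for range r
void Body::operator()(const Range& r, final_scan_tag)	Accumulate a summary for range r and compute the scan result
Body::Body(Body& b, split)	Split b so that this and b can accumulate summaries separately. Body *this is object a in the next row of this table.
void Body::reverse_join(Body& a)	Merge the summary accumulated by a into this, where this was created earlier from a by the splitting constructor. Body *this is object b in the previous row of this table.
void Body::assign(Body& b)	Assign the summary of b to this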
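The reason the body has both a pre-scan and a final-scan mode becomes clearer with a two-pass blocked scan. In the serial sketch below, pass 1 only summarises each block (the pre-scan role), the summaries are then chained so each block learns the total of everything to its left (the role reverse_join plays), and pass 2 rescans each block from that carry-in and writes the output (the final-scan role). The function name blocked_inclusive_sum and the fixed block size are assumptions for illustration; the library decides at run time which subranges get which treatment.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical serial model of a two-pass blocked inclusive prefix sum.
inline std::vector<int> blocked_inclusive_sum(const std::vector<int>& x,
                                              std::size_t block) {
    assert(block > 0);
    const std::size_t n = x.size();
    const std::size_t blocks = (n + block - 1) / block;

    // Pass 1 ("pre-scan"): summarise each block without writing output.
    // Blocks are independent here, so they could run concurrently.
    std::vector<int> block_sum(blocks, 0);
    for (std::size_t b = 0; b < blocks; ++b) {
        const std::size_t hi = (b + 1) * block < n ? (b + 1) * block : n;
        for (std::size_t i = b * block; i < hi; ++i) block_sum[b] += x[i];
    }

    // Chain the summaries: carry_in[b] is the total of all earlier blocks.
    std::vector<int> carry_in(blocks, 0);
    for (std::size_t b = 1; b < blocks; ++b) {
        carry_in[b] = carry_in[b - 1] + block_sum[b - 1];
    }

    // Pass 2 ("final scan"): rescan each block from its carry-in and write y.
    // Again the blocks are independent of one another.
    std::vector<int> y(n);
    for (std::size_t b = 0; b < blocks; ++b) {
        const std::size_t hi = (b + 1) * block < n ? (b + 1) * block : n;
        int running = carry_in[b];
        for (std::size_t i = b * block; i < hi; ++i) {
            running += x[i];
            y[i] = running;
        }
    }
    return y;
}
```

The price of the parallelism is that most elements are read twice, once per pass, which is why a scan of this shape does roughly twice the arithmetic of the serial loop.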

• Example

#include <tbb/parallel_scan.h>
#include <tbb/blocked_range.h>
#include <iostream>
using namespace tbb;
using namespace std; 

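// Computes the inclusive prefix sum of x into y.
template<typename T>
class Body
{
   T _sum;              // running summary for the subranges seen so far
   T* const _y;         // output array
   const T* const _x;   // input array
public:
   Body(T y[], const T x[]):_sum(0), _y(y), _x(x){}
   T get_sum() const
   {
      return _sum;
   }

   // Called with pre_scan_tag or final_scan_tag.
   template<typename Tag>
   void operator()(const blocked_range<int>& r, Tag)
   {
      T temp = _sum;
      for(int i = r.begin(); i < r.end(); i++)
      {
        temp += _x[i];
        // Only the final scan writes the output.
        if(Tag::is_final_scan())
           _y[i] = temp;
      }
      _sum = temp;
   }

   // Splitting constructor: share the arrays, restart the sum at 0.
   Body(Body& b, split):_sum(0), _y(b._y), _x(b._x){}
   // Fold the summary of the left-hand body a into this one.
   void reverse_join(Body& a)
   {
      _sum = a._sum + _sum;
   }
   void assign(Body& b)
   {
      _sum = b._sum;
   }
};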

int main()
{
   int x[10] = {0,1,2,3,4,5,6,7,8,9};
   int y[10];
   Body<int> body(y,x);
   parallel_scan(blocked_range<int>(0, 10), body);
   cout<<"sum:"<<body.get_sum()<<endl;
   return 0;
}

parallel_do

• Summary

A template function that processes work items in parallel.

• Syntax

template<typename InputIterator, typename Body>
void parallel_do( InputIterator first, InputIterator last,
                  Body body [, task_group_context& group] );

• Header

#include "tbb/parallel_do.h"

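• Description

parallel_do(first, last, body) applies the function object body to each element of the half-open interval [first, last), possibly in parallel. If the body's operator() is declared with a second parameter of type parallel_do_feeder, additional work items can be added through it. The function returns once body(x) has returned for every item x in the input sequence or added via parallel_do_feeder::add.

parallel_do_feeder lets the body of parallel_do add extra work items. Only parallel_do can create or destroy a parallel_do_feeder object; the only thing other code can do with a parallel_do_feeder is call its add method.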

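The requirements on the Body type are as follows:

Pseudo-signature	Semantics
Body::operator()(cv-qualifiers T& item, parallel_do_feeder<T>& feeder) const, or Body::operator()(cv-qualifiers T& item) const	Process item. The parallel_do template may invoke operator() concurrently on the same this, but never on the same item. If the second parameter is present, the function may add further work items.
T(const T&)	Copy a work item
T::~T()	Destroy a work item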
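The termination rule (return only when every input item and every added item has been processed) can be modelled with a simple work list. The sketch below is a serial illustration under stated assumptions: Feeder, do_all, and countdown_items are hypothetical names, and a real parallel_do would process pending items on several threads rather than one at a time.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <vector>

// Hypothetical stand-in for a feeder: the body can only add items to it.
template <typename T>
class Feeder {
public:
    explicit Feeder(std::deque<T>& pending) : pending_(pending) {}
    void add(const T& item) { pending_.push_back(item); }
private:
    std::deque<T>& pending_;
};

// Serial model: process the initial items plus everything added on the way.
// Returns the number of items processed, once body(x, feeder) has returned
// for every such item x.
template <typename T, typename Body>
std::size_t do_all(const std::vector<T>& initial, const Body& body) {
    std::deque<T> pending(initial.begin(), initial.end());
    Feeder<T> feeder(pending);
    std::size_t processed = 0;
    while (!pending.empty()) {
        T item = pending.front();  // copy the work item out of the list
        pending.pop_front();
        body(item, feeder);
        ++processed;
    }
    return processed;
}

// Example body: each positive n adds n - 1, so starting from 3 the items
// processed are 3, 2, 1, 0.
inline std::size_t countdown_items(int start) {
    assert(start >= 0);
    return do_all<int>({start}, [](int& n, Feeder<int>& feeder) {
        if (n > 0) feeder.add(n - 1);
    });
}
```

In this example each item adds at most one new item, so there is never more than one pending item and nothing to run in parallel. A body that often adds two or more items per call keeps the work list wide.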

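If the items in the input stream cannot be accessed randomly, the parallelism in parallel_do is not scalable. To achieve scalability, do one of the following:

• Use random access iterators to specify the input stream.

• Design your algorithm so that the body often adds more than one piece of work.

• Use parallel_for instead.

To achieve a speedup, the grain size of B::operator() needs to be at least on the order of 100,000 clock cycles. Otherwise, the internal overhead of parallel_do swamps the useful work. The algorithm can be passed a task_group_context object so that its tasks execute in that group. By default, the algorithm executes in a bound group of its own.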

• Example

#include <tbb/parallel_do.h>
#include <iostream>
#include <string>
#include <vector>
using namespace std;
using namespace tbb;

// A work item: prints its message when invoked.
struct t_test
{
       string msg;
       int ref;
       void operator()() const
       {
           cout << msg << endl;
       }
};

template <typename T>
struct body_test
{
       void operator()(T* t, parallel_do_feeder<T*>& feeder) const
       {
              (*t)();
              if(t->ref == 0)
              {
                   // Update the item before re-adding it: once added,
                   // another thread may start processing it immediately.
                   t->msg = "added msg";
                   t->ref++;
                   feeder.add(t);
              }
       }
};

int main()
{
       t_test *pt = new t_test;
       pt->ref = 0;
       pt->msg = "original msg";

       vector<t_test*> vec;
       vec.push_back(pt);
       parallel_do(vec.begin(), vec.end(), body_test<t_test>());
       delete pt;
       return 0;
}

pipeline

class pipeline
{
public:
    pipeline();
    ~pipeline();
    void add_filter( filter& f );
    void run( size_t max_number_of_live_tokens
              [, task_group_context& group] );
    void clear();
};

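To use the pipeline class, follow these steps:

1. Derive a class f from filter. The constructor of f passes a parameter to the base-class filter constructor to specify its mode.

2. Override the virtual method filter::operator() to perform the filter's action on an item and return a pointer to the item that the next filter will process. The first filter generates the stream and returns NULL when there are no more items to process. The return value of the last filter is ignored.

3. Create an instance of class pipeline.

4. Create instances of the filters f and add them to the pipeline in order, from first to last. A filter instance can be added to only one pipeline at a time; a filter must never be a member of more than one pipeline simultaneously.

5. Call pipeline::run. The parameter max_number_of_live_tokens puts an upper bound on the number of stages that can run concurrently. Higher values may increase concurrency at the cost of more memory consumption.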
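The five steps above can be sketched with a small serial model of a filter chain. This is not the TBB pipeline: Stage, SerialPipeline, CharSource, ToUpper, Collect, and shout are hypothetical names, and the model pushes one item at a time through every stage, so it ignores filter modes and the live-token limit. It only shows the contract between stages: the first stage generates items and returns a null pointer at end of stream, each later stage receives the previous stage's return value, and the last stage's return value is ignored.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for a filter base class: one virtual call per item.
class Stage {
public:
    virtual ~Stage() {}
    virtual void* operator()(void* item) = 0;
};

// Serial model of a pipeline: stages run in the order they were added.
class SerialPipeline {
public:
    void add_stage(Stage& s) { stages_.push_back(&s); }
    void run() {
        assert(!stages_.empty());
        for (;;) {
            void* item = (*stages_[0])(nullptr);  // first stage generates
            if (item == nullptr) return;          // null means end of stream
            for (std::size_t i = 1; i < stages_.size(); ++i) {
                item = (*stages_[i])(item);       // last result is ignored
            }
        }
    }
private:
    std::vector<Stage*> stages_;
};

// Steps 1 and 2: derive stages and override operator().
class CharSource : public Stage {
public:
    explicit CharSource(std::string& text) : text_(text), pos_(0) {}
    void* operator()(void*) override {
        return pos_ < text_.size() ? &text_[pos_++] : nullptr;
    }
private:
    std::string& text_;
    std::size_t pos_;
};

class ToUpper : public Stage {
public:
    void* operator()(void* item) override {
        char* c = static_cast<char*>(item);
        *c = static_cast<char>(std::toupper(static_cast<unsigned char>(*c)));
        return c;
    }
};

class Collect : public Stage {
public:
    explicit Collect(std::string& out) : out_(out) {}
    void* operator()(void* item) override {
        out_.push_back(*static_cast<char*>(item));
        return nullptr;
    }
private:
    std::string& out_;
};

inline std::string shout(std::string text) {
    std::string out;
    CharSource source(text);
    ToUpper upper;
    Collect sink(out);
    SerialPipeline p;      // step 3
    p.add_stage(source);   // step 4: add stages in order
    p.add_stage(upper);
    p.add_stage(sink);
    p.run();               // step 5
    return out;
}
```

Passing items as void* mirrors the untyped interface of the class shown above; the casts inside each stage are the price of that interface.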

The function parallel_pipeline provides a strongly typed, lambda-friendly way to build and run a pipeline.

Filter base class

filter

class filter
{
public:
    enum mode
    {
        parallel = implementation-defined,
        serial_in_order = implementation-defined,
        serial_out_of_order = implementation-defined
    };
    bool is_serial() const;
    bool is_ordered() const;
    virtual void* operator()( void* item ) = 0;
    virtual void finalize( void* item ) {}
    virtual ~filter();
protected:
    filter( mode );
};

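A filter has one of three modes: parallel, serial_in_order, or serial_out_of_order.

• A parallel filter can process multiple work items in parallel and in no particular order.

• A serial_out_of_order filter processes one work item at a time, in no particular order.

• A serial_in_order filter processes one work item at a time. All serial_in_order filters in a pipeline process work items in the same order.

Parallel filters are recommended because they permit parallel speedup. If a filter must be serial, serial_out_of_order is preferred because it places fewer constraints on processing order.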

Thread-bound filter

thread_bound_filter

class thread_bound_filter: public filter
{
protected:
    thread_bound_filter(mode filter_mode);
public:
    enum result_type
    {
        success,
        item_not_available,
        end_of_stream
    };
    result_type try_process_item();
    result_type process_item();
};

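An abstract base class for a pipeline filter that a thread must service explicitly. It is useful when a filter must be executed by a particular thread. The thread that services a thread_bound_filter must not be the thread that calls pipeline::run().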

Example:

#include <iostream>
#include <tbb/pipeline.h>
#include <tbb/compat/thread>
#include <tbb/task_scheduler_init.h>

using namespace std;
using namespace tbb;

char input[] = "abcdefg\n";

// First filter: generates the stream, one character per item.
class inputfilter : public filter
{
       char *_ptr;
public:
       void *operator()(void *)
       {
              if(*_ptr)
              {
                     cout << "input:" << *_ptr << endl;
                     return _ptr++;
              }
              else   return 0;   // end of stream
       }
       inputfilter():filter(serial_in_order), _ptr(input){}
};

// Last filter: serviced explicitly by the main thread.
class outputfilter : public thread_bound_filter
{
public:
       void *operator()(void *item)
       {
              cout << *(char*)item;
              return 0;
       }
       outputfilter():thread_bound_filter(serial_in_order){}
};

void run_pipeline(pipeline *p)
{
    p->run(8);
}

int main()
{
       inputfilter inf;
       outputfilter ouf;
       pipeline p;
       p.add_filter(inf);
       p.add_filter(ouf);
       // The main thread services outputfilter (derived from thread_bound_filter),
       // so the pipeline itself must run on a separate thread.
       thread t(run_pipeline, &p);
       while(ouf.process_item() != thread_bound_filter::end_of_stream)
              continue;
       t.join();
       return 0;
}
