MPI Collective Communication and an Implementation of Fox's Algorithm for Matrix Multiplication

Original

2020-07-02 11:26

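I used to think MPI could only run on Linux, but this time I discovered the Intel MPI Library, which turned out to be very handy. It needs almost no configuration: after installing it, you just register the account and password you use to log in to Windows. Even without a cluster of machines on a LAN, it drives my dual-core CPU to 100% (normally, running Matlab and the like only reaches 50%, so software optimization really does matter).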

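Fox's algorithm has some annoying requirements: the input matrices must be square, the number of processes must be a perfect square, and the matrix must divide evenly among the processes. The square-matrix condition is not strictly necessary, but dropping it makes the code more complicated.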
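These requirements can be checked before any work starts. The following is a minimal sketch, not part of the original program: fox_grid_dim is a hypothetical helper name, and it assumes an N x N matrix split into equal square blocks.

```cpp
#include <cmath>

// Hypothetical helper: returns the process-grid dimension q (q * q == procs)
// when the inputs meet the requirements listed above, or 0 otherwise.
int fox_grid_dim(int procs, int N) {
    if (procs <= 0 || N <= 0) return 0;
    const int q = static_cast<int>(std::lround(std::sqrt(static_cast<double>(procs))));
    if (q * q != procs) return 0;  // the process count must be a perfect square
    if (N % q != 0) return 0;      // the N x N matrix must split into q x q equal blocks
    return q;
}
```

For example, 4 processes with N = 8 give a 2 x 2 grid of 4 x 4 blocks, while 6 processes are rejected.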

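The essence of MPI is often said to be just six calls (commonly listed as MPI_Init, MPI_Finalize, MPI_Comm_size, MPI_Comm_rank, MPI_Send, and MPI_Recv), but getting by with only those six is rather cumbersome. Some of the higher-level calls can improve performance or simplify the code.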

MPI_Bcast has to be mentioned first. It replaces a construct like if (task_id == root) { send to child_id } else { receive from root } with a single call.

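MPI_Isend/MPI_Irecv are the nonblocking send and receive calls. Because they let communication overlap with computation, they are more likely to improve performance than the blocking MPI_Send/MPI_Recv. They can also avoid circular deadlock. Other ways to avoid circular deadlock include odd-even ordering and MPI_Sendrecv.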
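To make the circular-deadlock point concrete without needing an MPI runtime, here is a toy model in plain C++. It is only a sketch under one loud assumption: every send is synchronous, that is, it completes only once the matching receive has been posted (real MPI_Send may buffer small messages, so an actual program might not hang). Op, completes, and ring_shift are hypothetical names introduced for this illustration.

```cpp
#include <cstddef>
#include <vector>

// One step of a toy per-process program: 'S' sends to peer, 'R' receives from peer.
struct Op { char kind; int peer; };

// Runs the programs assuming synchronous sends (a send completes only when the
// peer is currently waiting in the matching receive).
// Returns true if every process finishes, false if they deadlock.
bool completes(const std::vector<std::vector<Op>>& prog) {
    const int p = static_cast<int>(prog.size());
    std::vector<std::size_t> pc(p, 0);  // next operation of each process
    bool progress = true;
    while (progress) {
        progress = false;
        for (int i = 0; i < p; ++i) {
            if (pc[i] >= prog[i].size()) continue;
            const Op& s = prog[i][pc[i]];
            if (s.kind != 'S') continue;
            const int j = s.peer;
            if (j == i || pc[j] >= prog[j].size()) continue;
            const Op& r = prog[j][pc[j]];
            if (r.kind == 'R' && r.peer == i) {  // matching pair: both advance
                ++pc[i];
                ++pc[j];
                progress = true;
            }
        }
    }
    for (int i = 0; i < p; ++i)
        if (pc[i] < prog[i].size()) return false;
    return true;
}

// Ring shift: every rank sends to its right neighbour and receives from its left.
// odd_even == false: all ranks send first.
// odd_even == true:  odd ranks receive first, even ranks send first.
std::vector<std::vector<Op>> ring_shift(int p, bool odd_even) {
    std::vector<std::vector<Op>> prog(p);
    for (int r = 0; r < p; ++r) {
        const Op send{'S', (r + 1) % p};
        const Op recv{'R', (r - 1 + p) % p};
        if (odd_even && r % 2 == 1) prog[r] = {recv, send};
        else prog[r] = {send, recv};
    }
    return prog;
}
```

In this model, completes(ring_shift(4, false)) reports a deadlock because every rank is stuck in its send, while the odd-even schedule finishes.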

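The Cartesian topology operations such as MPI_Cart_create take some effort to understand, but they are a very useful way to partition processes into logical groups. In Fox's algorithm, each row of the process grid is a communication subdomain, and so is each column. For a q x q process grid that gives q + q subdomains (which overlap, since every process belongs to one row and one column). Each process exchanges data only with the other processes in its subdomains, so a single bcast is enough to deliver the data within a row.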
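As a rough illustration of the row and column grouping described above, the sketch below computes which ranks share a row or a column in a q x q grid. It assumes row-major rank numbering, which is what MPI Cartesian topologies use. grid_coords, row_members, and col_members are hypothetical names, not MPI functions.

```cpp
#include <utility>
#include <vector>

// (row, col) of a rank in a q x q grid with row-major numbering.
std::pair<int, int> grid_coords(int rank, int q) {
    return {rank / q, rank % q};
}

// Ranks in the same grid row as `rank` (the members of its row subdomain).
std::vector<int> row_members(int rank, int q) {
    std::vector<int> members;
    const int row = rank / q;
    for (int c = 0; c < q; ++c) members.push_back(row * q + c);
    return members;
}

// Ranks in the same grid column as `rank` (the members of its column subdomain).
std::vector<int> col_members(int rank, int q) {
    std::vector<int> members;
    const int col = rank % q;
    for (int r = 0; r < q; ++r) members.push_back(r * q + col);
    return members;
}
```

With q = 3, rank 5 sits at (1, 2), shares its row with ranks 3, 4, 5 and its column with ranks 2, 5, 8.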

The code follows. It is fairly simple, so I won't walk through it.

// MatrixMulFox.cpp
// Jarod 2007.12.3

#include "mpi.h"
#include <algorithm>
#include <cmath>
#include <cstdio>   // printf, sscanf
#include <cstring>  // strstr
#include <fstream>

const int root_id = 0;
const int max_procs_size = 16;

int main(int argc, char *argv[])
{
    double start_time, end_time, time;
    int procs_id, procs_size;
    MPI_Status status;
    MPI_Request reqSend, reqRecv;

    MPI_Init(&argc, &argv);
    start_time = MPI_Wtime();
    MPI_Comm_size(MPI_COMM_WORLD, &procs_size);
    MPI_Comm_rank(MPI_COMM_WORLD, &procs_id);

    // Argument check: read the matrix size from -N=<size>
    int N = 0;
    for (int i = 1; i < argc; ++i) {
        char *pos = strstr(argv[i], "-N=");
        if (pos != NULL) {
            sscanf(pos, "-N=%d", &N);
            break;
        }
    }

    const int procs_size_sqrt = floor(sqrt(static_cast<double>(procs_size)));
    const int n = N / procs_size_sqrt;  // side length of each process's block
    const int n_sqr = n * n;

    if (procs_size < 4 || procs_size > max_procs_size) {
        printf("The fox algorithm requires at least 4 processors and at most %d processors.\n",
               max_procs_size);
        MPI_Finalize();
        return 0;
    }
    if (procs_size_sqrt * procs_size_sqrt != procs_size) {
        printf("The number of process must be a square.\n");
        MPI_Finalize();
        return 0;
    }
    // N must be given, positive, and divisible by the grid dimension
    if (N <= 0 || N % procs_size_sqrt != 0) {
        printf("N must be positive and N mod procs_size_sqrt must be 0\n");
        MPI_Finalize();
        return 0;
    }

    // Initialize the local blocks (T is a scratch buffer)
    int *A = new int[n_sqr];
    int *B = new int[n_sqr];
    int *C = new int[n_sqr];
    int *T = new int[n_sqr];
    for (int i = 0; i < n; ++i)
        for (int j = 0; j < n; ++j) {
            A[i*n+j] = (i + j) * procs_id;
            B[i*n+j] = (i - j) * procs_id;
            C[i*n+j] = 0;
        }

    // Print the local blocks
    printf("A on procs %d : \n", procs_id);
    for (int i = 0; i < n; ++i) {
        for (int j = 0; j < n; ++j) {
            printf("%5d", A[i*n+j]);
        }
        printf("\n");
    }
    printf("B on procs %d : \n", procs_id);
    for (int i = 0; i < n; ++i) {
        for (int j = 0; j < n; ++j) {
            printf("%5d", B[i*n+j]);
        }
        printf("\n");
    }

    // Partition the processes into groups and create the sub-communicators
    MPI_Comm cart_all, cart_row, cart_col;
    int dims[2], periods[2];
    int procs_cart_rank, procs_coords[2];
    dims[0] = dims[1] = procs_size_sqrt;
    periods[0] = periods[1] = true;
    MPI_Cart_create(MPI_COMM_WORLD, 2, dims, periods, false, &cart_all);
    MPI_Comm_rank(cart_all, &procs_cart_rank);
    MPI_Cart_coords(cart_all, procs_cart_rank, 2, procs_coords);
    // Same row -> same cart_row, ranked by column; same column -> same cart_col, ranked by row
    MPI_Comm_split(cart_all, procs_coords[0], procs_coords[1], &cart_row);
    MPI_Comm_split(cart_all, procs_coords[1], procs_coords[0], &cart_col);
    int rank_cart_row, rank_cart_col;
    MPI_Comm_rank(cart_row, &rank_cart_row);
    MPI_Comm_rank(cart_col, &rank_cart_col);

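    // Compute and shift
    for (int round = 0; round < procs_size_sqrt; ++round) {
        // Start sending the current B block to the process one row up (cyclic)
        MPI_Isend(B, n_sqr, MPI_INT, (procs_coords[0] - 1 + procs_size_sqrt) % procs_size_sqrt,
                  1, cart_col, &reqSend);

        // The process in column `broader` broadcasts its A block along the row
        int broader = (round + procs_coords[0]) % procs_size_sqrt;
        if (broader == procs_coords[1]) std::copy(A, A + n_sqr, T);
        MPI_Bcast(T, n_sqr, MPI_INT, broader, cart_row);

        // C += T * B on the local blocks
        for (int row = 0; row < n; ++row)
            for (int col = 0; col < n; ++col)
                for (int k = 0; k < n; ++k) {
                    C[row*n+col] += T[row*n+k] * B[k*n+col];
                }

        // Receive the next B block from the process one row down before waiting
        // on the send: if every process waited first, a send that needs a
        // matching receive to complete could deadlock
        MPI_Recv(T, n_sqr, MPI_INT, (procs_coords[0] + 1) % procs_size_sqrt,
                 1, cart_col, &status);
        MPI_Wait(&reqSend, &status);
        std::copy(T, T + n_sqr, B);
    }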

    // Print the result
    printf("C on procs %d : \n", procs_id);
    for (int i = 0; i < n; ++i) {
        for (int j = 0; j < n; ++j) {
            printf("%5d", C[i*n+j]);
        }
        printf("\n");
    }

    // Release resources
    MPI_Comm_free(&cart_col);
    MPI_Comm_free(&cart_row);
    MPI_Comm_free(&cart_all);
    delete[] A;
    delete[] B;
    delete[] C;
    delete[] T;

    end_time = MPI_Wtime();
    MPI_Finalize();
    printf("task %d consumed %lf seconds\n", procs_id, end_time - start_time);
    return 0;

}
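To sanity-check the round structure of the program above without an MPI runtime, the following serial sketch walks through the same steps on ordinary matrices: in each round, block-row bi uses the A block from block-column (bi + round) mod q, multiplies it with the B block currently held, and then B is shifted up by one block-row. This is an illustration under stated assumptions, not the original program: Matrix, multiply, and fox_serial are hypothetical names, and the matrix side is assumed to be divisible by q.

```cpp
#include <algorithm>
#include <vector>

using Matrix = std::vector<std::vector<long>>;

// Plain triple-loop product, used as the reference result.
Matrix multiply(const Matrix& A, const Matrix& B) {
    const int N = static_cast<int>(A.size());
    Matrix C(N, std::vector<long>(N, 0));
    for (int i = 0; i < N; ++i)
        for (int j = 0; j < N; ++j)
            for (int k = 0; k < N; ++k)
                C[i][j] += A[i][k] * B[k][j];
    return C;
}

// Serial walk-through of Fox's rounds on a q x q grid of n x n blocks.
// B is taken by value because the rounds shift its block-rows.
Matrix fox_serial(const Matrix& A, Matrix B, int q) {
    const int N = static_cast<int>(A.size());
    const int n = N / q;  // block side length; assumes N % q == 0
    Matrix C(N, std::vector<long>(N, 0));
    for (int round = 0; round < q; ++round) {
        for (int bi = 0; bi < q; ++bi) {
            const int bk = (bi + round) % q;  // A block "broadcast" along block-row bi
            for (int bj = 0; bj < q; ++bj)
                for (int i = 0; i < n; ++i)
                    for (int j = 0; j < n; ++j)
                        for (int k = 0; k < n; ++k)
                            // B[bi*n + k] is the block-row currently held by grid row bi
                            C[bi*n + i][bj*n + j] += A[bi*n + i][bk*n + k] * B[bi*n + k][bj*n + j];
        }
        // Shift every B block up by one block-row, wrapping around
        std::rotate(B.begin(), B.begin() + n, B.end());
    }
    return C;
}
```

After q rounds each block of C has accumulated the products for every block index, so fox_serial(A, B, q) should equal multiply(A, B) for any valid q.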
