MPI Communication Modes

MPI provides four communication modes:

| Mode        | Blocking send | Nonblocking send |
| ----------- | ------------- | ---------------- |
| Standard    | MPI_Send      | MPI_Isend        |
| Synchronous | MPI_Ssend     | MPI_Issend       |
| Buffered    | MPI_Bsend     | MPI_Ibsend       |
| Ready       | MPI_Rsend     | MPI_Irsend       |

“MPI defines four communication modes, which are selected via the specific SEND routine used. The communication mode instructs MPI on the algorithm that should be used for sending the message: synchronous, buffered, ready, or standard. Choice of the mode is separate from the selection of blocking/nonblocking; in other words, each communication mode can be applied to either a blocking or a nonblocking send. It should be noted that the RECV routine does not specify the communication mode; it is simply blocking or nonblocking. The table above provides the names of the MPI routines corresponding to each communication mode.”
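As a minimal sketch (assuming exactly two ranks; `x` is an arbitrary payload), the mode is selected simply by calling a different send routine, while the receiving side is identical in every case. The buffered and ready sends are omitted here because they carry extra requirements, shown in their own sections below:

```c
#include <mpi.h>

int main(int argc, char **argv)
{
    int rank, x = 42;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        /* Same arguments in every case; only the routine name differs. */
        MPI_Send (&x, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);  /* standard    */
        MPI_Ssend(&x, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);  /* synchronous */
    } else if (rank == 1) {
        /* MPI_Recv does not specify a communication mode. */
        MPI_Recv(&x, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        MPI_Recv(&x, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    }
    MPI_Finalize();
    return 0;
}
```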

The main consideration in choosing a communication mode is overhead, i.e., how much time a process spends waiting for a blocking send or receive to return. Each mode has a different overhead profile. Overhead generally has two sources:
1) System overhead: the time spent transferring the contents of the message buffer
2) Synchronization overhead: the time spent waiting for another process

System overhead is incurred from transferring the message data from the sender’s message buffer onto the network (directly or indirectly), and from transferring the message data from the network into the receiver’s message buffer. It’s worth noting that this overhead does not generally encompass waiting for the message to be received, just for the sender’s buffer to be clear.

Synchronization overhead is the time spent waiting for an event to occur on another task. In certain modes, the sender must wait for the receive to be executed and for the handshake to arrive before the message can be transferred. The receiver also incurs some synchronization overhead in waiting for the handshake to complete. Synchronization overhead can be significant, not surprisingly, in synchronous mode. As we shall see, the other modes try different strategies for reducing this overhead.
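To make the distinction concrete, here is a small sketch (assuming two ranks; the two-second sleep is an artificial delay) in which rank 0 times a blocking synchronous send while rank 1 deliberately posts its receive late, so almost all of the measured time is synchronization overhead rather than data transfer:

```c
#include <mpi.h>
#include <stdio.h>
#include <unistd.h>

int main(int argc, char **argv)
{
    int rank, x = 0;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        double t0 = MPI_Wtime();
        MPI_Ssend(&x, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
        /* Elapsed time is dominated by waiting for rank 1 to post its
         * receive, i.e., synchronization overhead. */
        printf("send blocked for %.3f s\n", MPI_Wtime() - t0);
    } else if (rank == 1) {
        sleep(2);  /* simulate a late receiver */
        MPI_Recv(&x, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    }
    MPI_Finalize();
    return 0;
}
```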

This means that the speed of an MPI program is dependent on good network connections (low system overhead) and intelligent programming (low synchronization overhead). Generally speaking, MPI communications operate in what is known as the rendezvous protocol, which involves a handshake procedure in order to establish communication: a pair of processes operates in concert to communicate with each other. Next, we’ll explore the four different communication modes that MPI provides, all of which perform slight modifications of this generalized procedure.
(1) Standard communication mode
In this mode, MPI itself decides whether to buffer outgoing messages. If MPI buffers the message, the send can complete before a matching receive has been posted; in other words, the send returns as soon as the message is buffered, whether or not a matching receive exists. Alternatively, buffer space may be unavailable, or MPI may choose not to buffer the message, in which case the send will not complete until a matching receive has been posted and the data has been moved to the receiver.
A standard-mode send can therefore start whether or not a matching receive has been posted, and it may complete before a matching receive is posted.
Buffering messages can be costly, since it involves allocating extra storage and copying data between memory areas. MPI therefore offers alternatives: the programmer can specify the communication mode explicitly.
“Standard send is actually the hardest to define. Its functionality is left open to the implementer in the MPI-1 and -2 specifications. The other MPI communication modes are all defined explicitly, so the developer knows exactly what behavior to expect, regardless of the implementation or the computational platform being used. Standard send, however, is intended to take advantage of specific optimizations that may be available via system enhancements.

In practice, one finds that standard-mode communications follow a similar pattern in most MPI implementations, including those available on Stampede. The strategy is to treat large and small messages differently. As we have seen, message buffering helps to reduce synchronization overhead, but it comes at a cost in memory. If messages are large, the penalty of putting extra copies in a buffer may be excessive and may even cause the program to crash. Still, buffering may turn out to be beneficial for small messages.

A typical standard send tries to strike a balance. For large messages (greater than some threshold), the standard send follows the rendezvous protocol, equivalent to synchronous mode. This protocol is safest and uses the least memory. For small messages, the standard send follows instead what is called the eager protocol. This protocol is like buffered mode, but the buffering is all done on the receiving side rather than the sending side.”
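This eager/rendezvous split is why correctness must never depend on standard-mode buffering. In the sketch below (assuming exactly two ranks; the message size N is arbitrary), both ranks send before receiving: small messages usually succeed thanks to eager buffering, but above the implementation-defined eager threshold both MPI_Send calls wait for a matching receive, and the program deadlocks.

```c
#include <mpi.h>

#define N (1 << 22)  /* 4M ints: likely above any eager threshold */

int main(int argc, char **argv)
{
    static int out[N], in[N];
    int rank, peer;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    peer = 1 - rank;  /* assumes exactly two ranks */

    /* Unsafe ordering: if MPI buffers nothing, both ranks block
     * in MPI_Send and neither ever reaches MPI_Recv. */
    MPI_Send(out, N, MPI_INT, peer, 0, MPI_COMM_WORLD);
    MPI_Recv(in,  N, MPI_INT, peer, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);

    MPI_Finalize();
    return 0;
}
```

A safe version orders the calls by rank so that one side receives first, or replaces the send/receive pair with MPI_Sendrecv.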
(2) Buffered mode
In this mode, a send can start whether or not a matching receive has been posted, and it may complete before a matching receive is posted. The buffer space should be allocated explicitly with MPI_Buffer_attach; if the buffer overflows, an error occurs.
“Buffered mode incurs extra system overhead, because of the additional copy from the message buffer to the user-supplied buffer. On the other hand, synchronization overhead is eliminated on the sending task: the timing of the receive is now irrelevant to the sender. Synchronization overhead can still be incurred by the receiving task, though, because it must block until the send has been completed.

In buffered mode, the programmer is responsible for allocating and managing the data buffer (one per process), by using calls to MPI_Buffer_attach and MPI_Buffer_detach. This has the advantage of providing increased control over the system, but also requires the programmer to safely manage this space. If a buffered-mode send requires more buffer space than is available, an error will be generated, and (by default) the program will exit.”
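A minimal sketch of the attach/send/detach sequence (assuming two ranks; the 100-integer message is illustrative). MPI_Pack_size reports how much space the packed message needs, and MPI_BSEND_OVERHEAD covers the library's per-message bookkeeping:

```c
#include <mpi.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    int rank, msg[100] = {0}, bufsize;
    void *buf;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        /* Ask MPI how much space the packed message needs, then attach. */
        MPI_Pack_size(100, MPI_INT, MPI_COMM_WORLD, &bufsize);
        bufsize += MPI_BSEND_OVERHEAD;
        buf = malloc(bufsize);
        MPI_Buffer_attach(buf, bufsize);

        /* Returns as soon as msg is copied into the attached buffer. */
        MPI_Bsend(msg, 100, MPI_INT, 1, 0, MPI_COMM_WORLD);

        /* Detach blocks until all buffered messages have been delivered,
         * then hands the buffer back so it can be freed. */
        MPI_Buffer_detach(&buf, &bufsize);
        free(buf);
    } else if (rank == 1) {
        MPI_Recv(msg, 100, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    }
    MPI_Finalize();
    return 0;
}
```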
(3) Synchronous mode
In this mode, a send can start whether or not a matching receive has been posted. However, the send completes successfully only once a matching receive has been posted and the receiver has begun receiving the data. Completion of a synchronous send therefore implies not only that the send buffer can be reused, but also that the matching receive has started.
Synchronous mode is the safest point-to-point communication mode, but not generally the fastest: the sender requires a ready-to-receive signal from the receiver before the data transfer can begin.
“Message transfer must be preceded by a sender-receiver “handshake”. When the blocking synchronous send MPI_Ssend is executed, the sending task sends the receiving task a “ready to send” message. When the receiver executes the receive call, it sends a “ready to receive” message. Once both processes have successfully received the other’s “ready” message, the actual data transfer can begin.

In this handshake, the sender may wait for the receiver to become available, but the delay can also occur in the other direction: if the receiver posts a blocking receive first, then the synchronization delay will occur on the receiving side. Given large numbers of independent processes on disparate systems, keeping them in sync is a challenging task, and separate processes can rapidly get out of sync, causing fairly major synchronization overhead. MPI_Barrier() can be used to try to keep nodes in sync, but probably doesn’t reduce actual overhead. MPI_Barrier() blocks processing until all tasks have checked in (i.e., synced up), so repeatedly calling Barrier tends to just shift synchronization overhead out of MPI_Send/Recv and into the Barrier calls.”
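Since each mode also comes in a nonblocking form, one common way to hide this handshake cost is a nonblocking synchronous send. A sketch, assuming two ranks: the sender starts MPI_Issend, does useful work while the handshake proceeds in the background, and only then waits for completion:

```c
#include <mpi.h>

int main(int argc, char **argv)
{
    int rank, x = 7, y = 0;
    MPI_Request req;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        MPI_Issend(&x, 1, MPI_INT, 1, 0, MPI_COMM_WORLD, &req);
        /* ... do useful work here instead of blocking in the handshake ... */
        MPI_Wait(&req, MPI_STATUS_IGNORE);
        /* Completion guarantees the matching receive has started. */
    } else if (rank == 1) {
        MPI_Recv(&y, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    }
    MPI_Finalize();
    return 0;
}
```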
(4) Ready mode
In this mode, a send may start only if a matching receive has already been posted; otherwise the operation is erroneous and its outcome is undefined.
* The receive must be initiated before the send starts
* Lowest sender overhead (no sender/receiver handshake, no extra copy to a buffer)

“The ready send attempts to reduce system and synchronization overhead by assuming that a ready-to-receive message has already arrived. Conceptually, this reduces to the simplest picture of point-to-point communication: the data is transferred directly, with no handshake. The idea is to have a blocking send that only blocks long enough to send the data to the network. However, if the matching receive has not already been posted when the send begins, an error will be generated.

How does this mode compare with buffered-mode communication? For the sender, it obviously reduces system overhead by eliminating the extra data copy. For the receiver, however, it may increase synchronization overhead, since in general the receive must be posted earlier than in buffered mode. Moreover, if the receive is not posted soon enough, an error will be triggered that is detected only at the receiver and not by the sender, which can make successful handling of the error difficult. Due to the risk of generating such an error, it only makes sense to use ready mode when the logic of the program dictates that the receive must be posted first, e.g., if the message is the expected response to a query that was sent previously.”
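A sketch of that query-response pattern (assuming two ranks; tags 0 and 1 distinguish query from answer): rank 0 posts the receive for the answer before sending the query, so by the time rank 1 sees the query, the matching receive is guaranteed to exist and a ready-mode send is legal:

```c
#include <mpi.h>

int main(int argc, char **argv)
{
    int rank, query = 1, answer = 0;
    MPI_Request req;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        /* Post the receive for the answer first, then issue the query. */
        MPI_Irecv(&answer, 1, MPI_INT, 1, 1, MPI_COMM_WORLD, &req);
        MPI_Send(&query, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
        MPI_Wait(&req, MPI_STATUS_IGNORE);
    } else if (rank == 1) {
        MPI_Recv(&query, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        /* The query's arrival proves rank 0 already posted its receive,
         * so a ready-mode send is safe here. */
        answer = 42;
        MPI_Rsend(&answer, 1, MPI_INT, 0, 1, MPI_COMM_WORLD);
    }
    MPI_Finalize();
    return 0;
}
```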
