Block Storage: Annotated Walkthrough of the AIO Direct Write Path

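For annotations of the functions that run before aio_write in the I/O submission path, see "Block Storage: Annotated Walkthrough of the AIO Direct Read Path".

For annotations of the function that sets up the iter iterator, see the same article.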

blkdev_write_iter calls __generic_file_write_iter to begin the direct write.

For a direct write, __generic_file_write_iter first calls generic_file_direct_write.
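The "direct path first" ordering can be pictured with a small user-space model. This is only a sketch and not kernel code: `model_write_iter`, `struct write_split`, and the `direct_limit` parameter are hypothetical names introduced for illustration. It assumes, as is my understanding of kernels of this era, that any bytes the direct path leaves unwritten are handed to the buffered path; error returns are not modeled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical result type: how many bytes each path handled. */
struct write_split {
    size_t direct_bytes;   /* written by the direct path (page cache bypassed) */
    size_t buffered_bytes; /* written by the buffered fallback */
};

/*
 * Toy model of the dispatch, not the kernel implementation.
 * direct_limit is an assumed knob: the number of bytes the direct path
 * manages to write before it stops short.
 */
struct write_split model_write_iter(bool is_direct, size_t len, size_t direct_limit)
{
    struct write_split s = { 0, 0 };
    size_t total = len;

    if (is_direct) {
        /* The direct path is tried first. */
        s.direct_bytes = len < direct_limit ? len : direct_limit;
        len -= s.direct_bytes;
    }

    /* Whatever remains goes through the buffered path. */
    s.buffered_bytes = len;

    /* Every byte is accounted for by exactly one path. */
    assert(s.direct_bytes + s.buffered_bytes == total);
    return s;
}
```

Calling `model_write_iter(true, 8192, 4096)` in this model reports 4096 bytes on the direct path and 4096 bytes on the buffered path, while a non-direct write is handled entirely by the buffered path.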

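For annotations of the block-device direct I/O function blkdev_direct_IO and the call chain beneath it, see "Block Storage: Annotated Walkthrough of the AIO Direct Read Path".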

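Finally, blkdev_write_iter calls generic_write_sync to make sure the data safely reaches the disk. In practice this invokes the block device's fsync function, blkdev_fsync, which sends a FLUSH command to the device so that the device's own cache is written out to the media.

For a specific filesystem, fsync() behaves however that filesystem implements it. In most cases it also relies on REQ_PREFLUSH to push the data to the storage media.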

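blkdev_fsync first calls file_write_and_wait_range to write out the data held in the page cache. The OS, however, does not know whether the disk has a write cache. If it does, the writeback triggered by file_write_and_wait_range may only reach the disk's cache and not the non-volatile media, which is why the FLUSH command has to be issued afterwards.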
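The two-step ordering described above can be pictured with a small user-space model. It is a sketch under stated assumptions, not kernel code: `struct storage_model` and the `model_*` functions are hypothetical names, and each layer is reduced to a simple block counter.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model of the three places where written data can sit. */
struct storage_model {
    size_t page_cache_dirty;   /* dirty blocks still in the OS page cache */
    size_t device_cache_dirty; /* blocks in the device's volatile write cache */
    size_t on_media;           /* blocks on non-volatile media */
    bool   has_write_cache;    /* whether the device has a volatile cache */
};

/* Stands in for file_write_and_wait_range(): page cache -> device. */
void model_writeback(struct storage_model *m)
{
    if (m->has_write_cache)
        m->device_cache_dirty += m->page_cache_dirty; /* may stop in the cache */
    else
        m->on_media += m->page_cache_dirty;
    m->page_cache_dirty = 0;
}

/* Stands in for the FLUSH command: device cache -> media. */
void model_flush(struct storage_model *m)
{
    m->on_media += m->device_cache_dirty;
    m->device_cache_dirty = 0;
}

/* Number of blocks that would survive a sudden power loss. */
size_t model_durable_blocks(const struct storage_model *m)
{
    return m->on_media;
}

/* The fsync ordering from the text: writeback first, then flush. */
void model_fsync(struct storage_model *m)
{
    model_writeback(m);
    model_flush(m);
    assert(m->page_cache_dirty == 0 && m->device_cache_dirty == 0);
}
```

With 8 dirty blocks and `has_write_cache` set, `model_writeback` alone leaves `model_durable_blocks` at 0; only after `model_flush` does it report 8. That gap is the reason the FLUSH step exists.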

Figure: effect of the FLUSH command.

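About REQ_PREFLUSH
REQ_PREFLUSH is a request flag on a bio. It means that, before this I/O starts, every I/O that completed before it must already be on non-volatile storage.
REQ_PREFLUSH can also be set on an empty bio, in which case it only flushes the data held in the disk's volatile write cache.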

Explicit cache flushes (Documentation/block/writeback_cache_control.txt)
The REQ_PREFLUSH flag can be ORed into the r/w flags of a bio submitted from the filesystem and will make sure the volatile cache of the storage device has been flushed before the actual I/O operation is started. This explicitly guarantees that previously completed write requests are on non-volatile storage before the flagged bio starts. In addition the REQ_PREFLUSH flag can be set on an otherwise empty bio structure, which causes only an explicit cache flush without any dependent I/O. It is recommended to use the blkdev_issue_flush() helper for a pure cache flush.

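REQ_FLUSH (the older name for REQ_PREFLUSH): flushes the data in the disk cache to the disk media so that it is not lost on power failure. REQ_FUA (force unit access): bypasses the disk cache and writes the data directly to the disk media.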
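The difference between the two flags can be sketched with a toy disk model. This is an illustration only: `MODEL_PREFLUSH`, `MODEL_FUA`, `struct disk_model`, and `model_submit_write` are hypothetical names, and the flag values are not the kernel's. The model assumes a device with a volatile write cache.

```c
#include <assert.h>

/* Hypothetical flag bits for the model; not the kernel's values. */
#define MODEL_PREFLUSH (1u << 0)
#define MODEL_FUA      (1u << 1)

struct disk_model {
    unsigned cached;  /* blocks sitting in the volatile write cache */
    unsigned durable; /* blocks on non-volatile media */
};

/* Apply one write of nblocks (0 for an empty bio) with the given flags. */
void model_submit_write(struct disk_model *d, unsigned nblocks, unsigned flags)
{
    unsigned before = d->cached + d->durable;

    if (flags & MODEL_PREFLUSH) {
        /* Earlier completed writes are forced to media before this I/O. */
        d->durable += d->cached;
        d->cached = 0;
    }

    if (flags & MODEL_FUA)
        d->durable += nblocks; /* this write completes only once on media */
    else
        d->cached += nblocks;  /* this write may stay in the cache */

    /* No block is lost or duplicated by the model. */
    assert(d->cached + d->durable == before + nblocks);
}
```

Note what the model makes visible: a write flagged only with `MODEL_PREFLUSH` makes earlier data durable but may itself remain in the cache, while a write flagged only with `MODEL_FUA` is durable itself but leaves earlier cached data untouched. Setting both flags covers both cases.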

 

Documentation/block/writeback_cache_control.txt:
==========================================
Explicit volatile write back cache control
==========================================

Introduction
------------

Many storage devices, especially in the consumer market, come with volatile
write back caches.  That means the devices signal I/O completion to the
operating system before data actually has hit the non-volatile storage.  This
behavior obviously speeds up various workloads, but it means the operating
system needs to force data out to the non-volatile storage when it performs
a data integrity operation like fsync, sync or an unmount.

The Linux block layer provides two simple mechanisms that let filesystems
control the caching behavior of the storage device.  These mechanisms are
a forced cache flush, and the Force Unit Access (FUA) flag for requests.


Explicit cache flushes
----------------------

The REQ_PREFLUSH flag can be ORed into the r/w flags of a bio submitted from
the filesystem and will make sure the volatile cache of the storage device
has been flushed before the actual I/O operation is started.  This explicitly
guarantees that previously completed write requests are on non-volatile
storage before the flagged bio starts. In addition the REQ_PREFLUSH flag can be
set on an otherwise empty bio structure, which causes only an explicit cache
flush without any dependent I/O.  It is recommended to use
the blkdev_issue_flush() helper for a pure cache flush.


Forced Unit Access
------------------

The REQ_FUA flag can be ORed into the r/w flags of a bio submitted from the
filesystem and will make sure that I/O completion for this request is only
signaled after the data has been committed to non-volatile storage.


Implementation details for filesystems
--------------------------------------

Filesystems can simply set the REQ_PREFLUSH and REQ_FUA bits and do not have to
worry if the underlying devices need any explicit cache flushing and how
the Forced Unit Access is implemented.  The REQ_PREFLUSH and REQ_FUA flags
may both be set on a single bio.


Implementation details for make_request_fn based block drivers
--------------------------------------------------------------

These drivers will always see the REQ_PREFLUSH and REQ_FUA bits as they sit
directly below the submit_bio interface.  For remapping drivers the REQ_FUA
bits need to be propagated to underlying devices, and a global flush needs
to be implemented for bios with the REQ_PREFLUSH bit set.  For real device
drivers that do not have a volatile cache the REQ_PREFLUSH and REQ_FUA bits
on non-empty bios can simply be ignored, and REQ_PREFLUSH requests without
data can be completed successfully without doing any work.  Drivers for
devices with volatile caches need to implement the support for these
flags themselves without any help from the block layer.


Implementation details for request_fn based block drivers
---------------------------------------------------------

For devices that do not support volatile write caches there is no driver
support required, the block layer completes empty REQ_PREFLUSH requests before
entering the driver and strips off the REQ_PREFLUSH and REQ_FUA bits from
requests that have a payload.  For devices with volatile write caches the
driver needs to tell the block layer that it supports flushing caches by
doing::

	blk_queue_write_cache(sdkp->disk->queue, true, false);

and handle empty REQ_OP_FLUSH requests in its prep_fn/request_fn.  Note that
REQ_PREFLUSH requests with a payload are automatically turned into a sequence
of an empty REQ_OP_FLUSH request followed by the actual write by the block
layer.  For devices that also support the FUA bit the block layer needs
to be told to pass through the REQ_FUA bit using::

	blk_queue_write_cache(sdkp->disk->queue, true, true);

and the driver must handle write requests that have the REQ_FUA bit set
in prep_fn/request_fn.  If the FUA bit is not natively supported the block
layer turns it into an empty REQ_OP_FLUSH request after the actual write.
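
The decomposition rules described in this section can be summarized in a small user-space sketch. It is modeled on the documented behavior only and is not the block layer's code: `enum model_step`, `struct model_queue`, and `model_flush_steps` are hypothetical names. The two queue fields mirror the two boolean arguments passed to blk_queue_write_cache() in the examples above.

```c
#include <assert.h>
#include <stdbool.h>

/* Steps a request may be broken into (bit mask). */
enum model_step {
    STEP_PREFLUSH  = 1 << 0, /* empty flush before the data */
    STEP_DATA      = 1 << 1, /* the actual write */
    STEP_POSTFLUSH = 1 << 2, /* empty flush after the data */
};

/* What the driver has declared about its device. */
struct model_queue {
    bool write_cache; /* device has a volatile write cache */
    bool fua;         /* device supports FUA natively */
};

/* Decide which steps are needed for one request. */
unsigned model_flush_steps(struct model_queue q, bool req_preflush,
                           bool req_fua, unsigned sectors)
{
    unsigned steps = 0;

    if (sectors)
        steps |= STEP_DATA;

    if (q.write_cache) {
        if (req_preflush)
            steps |= STEP_PREFLUSH;
        /* Without native FUA, emulate it with a flush after the write. */
        if (req_fua && !q.fua)
            steps |= STEP_POSTFLUSH;
    }
    /* With no write cache, both flags are dropped and nothing is added. */
    return steps;
}
```

In this model an empty REQ_PREFLUSH request on a device without a write cache yields no steps at all, a REQ_PREFLUSH write on a cached device yields a flush followed by the data, and a REQ_FUA write on a cached device without native FUA yields the data followed by a flush.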