Java NIO: A Brief Look at How Memory-Mapped Files Work, and DirectMemory

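Compared with the IO package, the NIO package in the Java class library adds a new feature: memory-mapped files. They are not used very often in day-to-day programming, but they are a good way to improve efficiency when processing large files. In this article I mainly want to explain how they work, drawing on the relevant operating system (OS) background.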

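In traditional file I/O we call the low-level standard I/O system calls read() and write() provided by the operating system. The calling process (in Java, the java process) switches from user mode to kernel mode, the kernel reads the requested file data into a kernel I/O buffer, and then copies the data from that kernel buffer into the process's private address space. That completes one I/O operation. Why go to the trouble of a kernel I/O buffer, turning what could be a single copy into two? Anyone who has studied operating systems or computer architecture knows the answer: it reduces disk I/O and so improves performance. Program accesses usually exhibit locality (the principle of locality), here mainly spatial locality: if we access one segment of a file, we will probably access the following segment next. Because disk I/O is several orders of magnitude slower than memory access, the OS relies on locality to read ahead during a read() system call and caches the extra file data in the kernel I/O buffer. When the data requested next is already in that buffer, it is copied straight into the process's private space, avoiding another slow disk operation. In Java, when we use the file streams from the IO package, for example: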

    FileInputStream in = new FileInputStream("D:\\java.txt");
    in.read();

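the JVM internally invokes the OS's read() system call to do the work. As described above, the second call to in.read() may well be served directly from the kernel buffer. (There may also be an intermediate hop through the native heap: these methods are declared native, that is, platform dependent, so the C code may stage the data once; on win32, for instance, C code reads the data from the OS and then hands it to JVM memory.) In that case, why does Java's IO package also provide a BufferedInputStream class as a buffer? The key is two words: "system call"! Every read of data from the OS kernel buffer issues a system call (through a native C function), and a system call is relatively expensive, since it involves switching the process between user mode and kernel mode, among other things. That is why we often use the following wrapping: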

    FileInputStream in = new FileInputStream("D:\\java.txt");
    BufferedInputStream buf_in = new BufferedInputStream(in);
    buf_in.read();

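With this, each time we call buf_in.read(), BufferedInputStream automatically reads ahead more bytes, as needed, into an internal byte array buffer that it maintains itself, which reduces the number of system calls; that is the purpose of its buffer. So one point should be clear: BufferedInputStream does not reduce the number of disk I/O operations (the OS already does that for us); it improves performance by reducing the number of system calls. The same goes for BufferedOutputStream and BufferedReader/BufferedWriter. The C library has a similar facility: fread() is C's buffered I/O and plays the same role as BufferedInputStream.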
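
The effect described above can be made visible without touching a real file. The following is a minimal sketch, not a benchmark: the class ReadCallCounter and its CountingStream helper are hypothetical names introduced here, and an in-memory ByteArrayInputStream stands in for the file, so each "underlying call" represents what would be a system call on a FileInputStream.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

class ReadCallCounter {
    /** Hypothetical helper: counts how many read calls reach the wrapped stream. */
    static final class CountingStream extends FilterInputStream {
        int calls = 0;

        CountingStream(InputStream in) {
            super(in);
        }

        @Override
        public int read() throws IOException {
            calls++;
            return super.read();
        }

        @Override
        public int read(byte[] b, int off, int len) throws IOException {
            calls++;
            return super.read(b, off, len);
        }
    }

    /** Reads `size` bytes one at a time and returns the number of calls that hit the source. */
    static int underlyingCalls(int size, boolean buffered) {
        CountingStream source = new CountingStream(new ByteArrayInputStream(new byte[size]));
        InputStream in = buffered ? new BufferedInputStream(source) : source;
        try {
            while (in.read() != -1) {
                // discard the byte; only the call count matters here
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return source.calls;
    }

    public static void main(String[] args) {
        System.out.println("unbuffered: " + underlyingCalls(20_000, false));
        System.out.println("buffered:   " + underlyingCalls(20_000, true));
    }
}
```

Without the wrapper, every byte costs one call on the source (plus one final call that returns -1); with the wrapper, the source is hit only once per buffer refill, so with the default 8192-byte buffer it sees just a handful of calls.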

Here is a brief excerpt from the JDK 6 source of BufferedInputStream to verify this:

public
class BufferedInputStream extends FilterInputStream {

    private static int defaultBufferSize = 8192;

    /**
     * The internal buffer array where the data is stored. When necessary,
     * it may be replaced by another array of
     * a different size.
     */
    protected volatile byte buf[];
    /**
     * The index one greater than the index of the last valid byte in 
     * the buffer. 
     * This value is always
     * in the range <code>0</code> through <code>buf.length</code>;
     * elements <code>buf[0]</code>  through <code>buf[count-1]
     * </code>contain buffered input data obtained
     * from the underlying  input stream.
     */
    protected int count;

    /**
     * The current position in the buffer. This is the index of the next 
     * character to be read from the <code>buf</code> array. 
     * <p>
     * This value is always in the range <code>0</code>
     * through <code>count</code>. If it is less
     * than <code>count</code>, then  <code>buf[pos]</code>
     * is the next byte to be supplied as input;
     * if it is equal to <code>count</code>, then
     * the  next <code>read</code> or <code>skip</code>
     * operation will require more bytes to be
     * read from the contained  input stream.
     *
     * @see     java.io.BufferedInputStream#buf
     */
    protected int pos;

    /* ... a large amount of code omitted here ... */

    /**
     * See
     * the general contract of the <code>read</code>
     * method of <code>InputStream</code>.
     *
     * @return     the next byte of data, or <code>-1</code> if the end of the
     *             stream is reached.
     * @exception  IOException  if this input stream has been closed by
     *				invoking its {@link #close()} method,
     *				or an I/O error occurs. 
     * @see        java.io.FilterInputStream#in
     */
    public synchronized int read() throws IOException {
        if (pos >= count) {
            fill();
            if (pos >= count)
                return -1;
        }
        return getBufIfOpen()[pos++] & 0xff;
    }

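We can see that BufferedInputStream maintains a byte array, byte[] buf, to implement its buffer. Before returning data, the buf_in.read() method we call performs an if check: if the current index into buf is outside the range of valid data (pos >= count), the condition holds and the buffered data has been used up. The internal fill() method is then called to read more data ahead into the buf array, after which the current byte is returned. If the condition does not hold, the byte is returned directly from the buf array. getBufIfOpen() returns the reference held in the buf field. Incidentally, the source declares the field as protected volatile byte buf[]; the volatile keyword is there mainly to guarantee memory visibility of the buf reference in a multithreaded environment.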

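I have spent a lot of space on material unrelated to Java NIO memory mapping, mainly to lay the groundwork: with that background in place, the advantages of memory-mapped files are easier to understand.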

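The biggest difference between a memory-mapped file and the standard I/O operations described above is this: although the data ultimately still has to be read from disk, it does not need to be read into an OS kernel buffer and then copied again. Instead, a region of the process's private user address space is mapped directly onto the file object, so reading and writing the file feels like reading and writing memory, and is naturally faster. To make this clear, take Linux as an example and consider the following figure: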

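The figure shows a process's virtual memory, that is, its virtual address space, in Linux 2.x. On a 32-bit machine there are 2^32 bytes = 4 GB of virtual address space. The figure contains a region labelled "Memory mapped region for shared libraries". When a file is memory-mapped, a range of virtual addresses in this region is associated with part of the file object. No data is copied into memory at that point. Only when the process first references a virtual address inside the mapped range is a page fault triggered, and the OS then uses the mapping to bring the relevant part of the file directly into the process's user address space. Accessing the N-th page of data repeats the same OS paging procedure. Note that an important reason memory-mapped files are more efficient than standard I/O is that the step of copying data through the OS kernel buffer is skipped (and possibly the hop through the native heap as well).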

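Java provides three mapping modes: read-only (READ_ONLY), read/write (READ_WRITE), and private (PRIVATE). In read-only mode, an attempt to write throws ReadOnlyBufferException. In read/write mode, content written or modified through the mapping is propagated to the file on disk, and other processes that share the same mapped file can see the change as well; per the FileChannel.map documentation the propagation is eventual, and MappedByteBuffer.force() can be called to flush it. This differs from stream I/O, where each process buffers on its own: in Java code, for example, data sitting in a buffered output stream does not reach the file until flush() or close() is called. Private mode uses the OS's copy-on-write principle: as long as no write occurs, multiple processes share the same physical memory for the file (their virtual addresses point to the same physical pages); once a process writes, the affected file data is copied into a private area for that process, and the change is not reflected in the physical file.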
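
The three modes can be tried side by side on a small scratch file. This is a minimal sketch under some assumptions: it uses the Java 7+ java.nio.file API rather than the FileInputStream route used in this article, and the class name MapModesDemo and the four-byte temp file are made up for illustration.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.MappedByteBuffer;
import java.nio.ReadOnlyBufferException;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

class MapModesDemo {
    /** Maps one small file in all three modes and reports what each write did. */
    static String run() {
        try {
            Path path = Files.createTempFile("map-modes", ".bin");
            path.toFile().deleteOnExit();
            Files.write(path, "abcd".getBytes(StandardCharsets.US_ASCII));

            StringBuilder out = new StringBuilder();
            try (FileChannel ch = FileChannel.open(path,
                    StandardOpenOption.READ, StandardOpenOption.WRITE)) {
                // READ_ONLY: any write attempt is rejected.
                MappedByteBuffer ro = ch.map(FileChannel.MapMode.READ_ONLY, 0, 4);
                try {
                    ro.put(0, (byte) 'X');
                } catch (ReadOnlyBufferException e) {
                    out.append("READ_ONLY rejected the write; ");
                }

                // PRIVATE: copy-on-write, the change stays in this mapping only.
                MappedByteBuffer priv = ch.map(FileChannel.MapMode.PRIVATE, 0, 4);
                priv.put(0, (byte) 'P');
                out.append("PRIVATE buffer sees ").append((char) priv.get(0)).append("; ");

                // READ_WRITE: the change is propagated to the file; force() flushes it.
                MappedByteBuffer rw = ch.map(FileChannel.MapMode.READ_WRITE, 0, 4);
                rw.put(1, (byte) 'W');
                rw.force();
            }
            out.append("file now contains ")
               .append(new String(Files.readAllBytes(path), StandardCharsets.US_ASCII));
            return out.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

Only the READ_WRITE change ends up in the file: the private mapping shows its own 'P' in memory, while the file keeps its original first byte.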

In Java NIO it is easy to create a memory-mapped region:

    File file = new File("E:\\download\\office2007pro.chs.ISO");
    FileInputStream in = new FileInputStream(file);
    FileChannel channel = in.getChannel();
    MappedByteBuffer buff = channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size());

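This creates a read-only memory-mapped file region. Next I will test its performance advantage over ordinary NIO channel operations. First, the following code: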

import java.io.File;
import java.io.FileInputStream;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;

public class IOTest {
    static final int BUFFER_SIZE = 1024;

    public static void main(String[] args) throws Exception {
        File file = new File("F:\\aa.pdf");
        FileInputStream in = new FileInputStream(file);
        FileChannel channel = in.getChannel();
        MappedByteBuffer buff = channel.map(FileChannel.MapMode.READ_ONLY, 0,
                channel.size());

        byte[] b = new byte[BUFFER_SIZE];
        int len = (int) file.length();

        long begin = System.currentTimeMillis();

        // Read the mapped region in BUFFER_SIZE chunks; the last chunk may be shorter.
        for (int offset = 0; offset < len; offset += BUFFER_SIZE) {
            if (len - offset > BUFFER_SIZE) {
                buff.get(b);
            } else {
                buff.get(new byte[len - offset]);
            }
        }

        long end = System.currentTimeMillis();
        System.out.println("time is:" + (end - begin));
        in.close();
    }
}

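The output is 63; in other words, reading a file of a little over 86 MB through a memory mapping takes only 63 milliseconds. Now I switch to ordinary NIO channel operations: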

    File file = new File("F:\\liq.pdf");
    FileInputStream in = new FileInputStream(file);
    FileChannel channel = in.getChannel();
    ByteBuffer buff = ByteBuffer.allocate(1024);

    long begin = System.currentTimeMillis();
    while (channel.read(buff) != -1) {
        buff.flip();
        // the data would be consumed here
        buff.clear();
    }
    long end = System.currentTimeMillis();
    System.out.println("time is:" + (end - begin));

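The output is 468 milliseconds, more than 7 times slower (468 ms versus 63 ms), and the larger the file, the larger the gap. Memory-mapped files are therefore especially well suited to operations on large files. Java's limit is that a single mapping cannot exceed Integer.MAX_VALUE bytes, about 2 GB, but we can work on an entire file by mapping different parts of it in turn (repeated calls to channel.map).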
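
Mapping a file piece by piece can be sketched as follows. This is an illustration only: the class ChunkedMapping, the sumBytes helper and the tiny 4096-byte window are assumptions chosen so the example runs quickly; for a real multi-gigabyte file the window would be much larger, up to Integer.MAX_VALUE bytes.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.Arrays;

class ChunkedMapping {
    /** Sums every byte of the file while mapping at most chunkSize bytes at a time. */
    static long sumBytes(Path path, int chunkSize) throws IOException {
        long sum = 0;
        try (FileChannel ch = FileChannel.open(path, StandardOpenOption.READ)) {
            long size = ch.size();
            for (long pos = 0; pos < size; pos += chunkSize) {
                // The last window may be shorter than chunkSize.
                long len = Math.min(chunkSize, size - pos);
                MappedByteBuffer window = ch.map(FileChannel.MapMode.READ_ONLY, pos, len);
                while (window.hasRemaining()) {
                    sum += window.get() & 0xff;
                }
            }
        }
        return sum;
    }

    /** Builds a 10,000-byte scratch file filled with 1s and sums it in 4096-byte windows. */
    static long demo() {
        try {
            Path path = Files.createTempFile("chunked", ".bin");
            path.toFile().deleteOnExit();
            byte[] data = new byte[10_000];
            Arrays.fill(data, (byte) 1);
            Files.write(path, data);
            return sumBytes(path, 4096);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("sum = " + demo());
    }
}
```

The position and length passed to map are long values, so the loop can walk a file of any size even though each individual window is limited to an int-sized length.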

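According to the official JDK documentation, a memory-mapped file is a direct buffer in the JVM. A direct buffer can also be created with ByteBuffer.allocateDirect(), the DirectMemory approach. Compared with basic I/O operations, both avoid the cost of copying data through an intermediate buffer. Both also live outside the JVM heap and are not limited by the JVM heap size.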
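
ByteBuffer.isDirect() gives a quick way to check which kind of buffer one is holding. A minimal sketch, assuming Java 7+ and a throwaway temp file; the class name BufferKinds is invented for this example.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

class BufferKinds {
    /** Returns isDirect() for a heap buffer, a direct buffer and a mapped buffer, in that order. */
    static boolean[] directFlags() {
        try {
            Path path = Files.createTempFile("kinds", ".bin");
            path.toFile().deleteOnExit();
            Files.write(path, new byte[16]);
            try (FileChannel ch = FileChannel.open(path, StandardOpenOption.READ)) {
                ByteBuffer heap = ByteBuffer.allocate(16);         // backed by a byte[] on the JVM heap
                ByteBuffer direct = ByteBuffer.allocateDirect(16); // off-heap memory
                ByteBuffer mapped = ch.map(FileChannel.MapMode.READ_ONLY, 0, 16);
                return new boolean[] { heap.isDirect(), direct.isDirect(), mapped.isDirect() };
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        boolean[] flags = directFlags();
        System.out.println("heap:   " + flags[0]);
        System.out.println("direct: " + flags[1]);
        System.out.println("mapped: " + flags[2]);
    }
}
```

Only the buffer from ByteBuffer.allocate reports false; the other two are direct buffers, which matches the documentation's description of MappedByteBuffer as a direct byte buffer.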

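By default the size of DirectMemory equals the JVM maximum heap size. In theory it is bounded by the size of the process's virtual address space: on 32-bit Windows, for example, each process has 4 GB of virtual space, of which 2 GB is reserved for the OS kernel; subtract the maximum JVM heap, and what remains is available to DirectMemory. Set the JVM option -Xmx64M, i.e. a maximum heap of 64 MB, and run the following program to show that DirectMemory is not governed by the JVM heap size itself: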

    public static void main(String[] args) {
        ByteBuffer.allocateDirect(1024 * 1024 * 100); // 100 MB
    }

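With the JVM heap limited to 64 MB, we allocated 100 MB of direct memory, and the program failed immediately: Exception in thread "main" java.lang.OutOfMemoryError: Direct buffer memory. Next I set -Xmx200M and the program finished normally. Then I changed the configuration to -Xmx64M -XX:MaxDirectMemorySize=200M and the program again finished normally. The conclusion: the size of DirectMemory defaults to the maximum JVM heap size given by -Xmx, but it is not bound by the heap; it is controlled separately by the JVM option MaxDirectMemorySize. Next, let us show that direct memory is not allocated in the JVM heap. Run the following program with the JVM option -XX:+PrintGC: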

    public static void main(String[] args) {
        for (int i = 0; i < 20000; i++) {
            ByteBuffer.allocateDirect(1024 * 100); // 100 KB
        }
    }

The output is:

    [GC 1371K->1328K(61312K), 0.0070033 secs]
    [Full GC 1328K->1297K(61312K), 0.0329592 secs]
    [GC 3029K->2481K(61312K), 0.0037401 secs]
    [Full GC 2481K->2435K(61312K), 0.0102255 secs]

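We see that few GCs were performed here, yet two Full GCs were triggered. The reason is that direct memory itself is not in the heap, so allocating it does not fill the young generation and does not drive Minor GCs. Direct memory is referenced through DirectByteBuffer objects stored in the JVM heap, and a block is released only after its DirectByteBuffer object has been collected. In HotSpot, as far as I can tell, allocateDirect requests a System.gc() when the direct memory limit is reached, which would explain why Full GCs appear even though the heap is nearly empty.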

Now look at what happens when the memory is allocated directly on the JVM heap:

    public static void main(String[] args) {
        for (int i = 0; i < 10000; i++) {
            ByteBuffer.allocate(1024 * 100); // 100 KB
        }
    }

ByteBuffer.allocate allocates memory directly on the JVM heap, so it is subject to Minor GCs of the young generation. The output is:

        [GC 16023K->224K(61312K), 0.0012432 secs]
        [GC 16211K->192K(77376K), 0.0006917 secs]
        [GC 32242K->176K(77376K), 0.0010613 secs]
        [GC 32225K->224K(109504K), 0.0005539 secs]
        [GC 64423K->192K(109504K), 0.0006151 secs]
        [GC 64376K->192K(171392K), 0.0004968 secs]
        [GC 128646K->204K(171392K), 0.0007423 secs]
        [GC 128646K->204K(299968K), 0.0002067 secs]
        [GC 257190K->204K(299968K), 0.0003862 secs]
        [GC 257193K->204K(287680K), 0.0001718 secs]
        [GC 245103K->204K(276480K), 0.0001994 secs]
        [GC 233662K->204K(265344K), 0.0001828 secs]
        [GC 222782K->172K(255232K), 0.0001998 secs]
        [GC 212374K->172K(245120K), 0.0002217 secs]

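As we can see, because the memory is allocated on the JVM heap, many GCs are triggered, and none of them is a Full GC, because the objects never get a chance to enter the old generation.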

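I would like to raise a question. DirectMemory in NIO and memory-mapped files are both direct buffers, yet the former is tied to -Xmx and -XX:MaxDirectMemorySize while the latter has no JVM option that influences or controls it at all. This makes me doubt whether the two kinds of direct buffer are really the same. The former refers to the native heap of the Java process, the part that involves the underlying platform (such as the DLLs on win32): memory allocated by malloc() in C belongs to the native heap, not the JVM heap. This is also why DirectMemory can improve performance noticeably in some scenarios: it avoids copying data back and forth between the native heap and the JVM heap. The latter does not go through the native heap; the Java process directly establishes a mapping between a range of virtual addresses and a file object (see the "Memory mapped region for shared libraries" region in the Linux virtual memory figure). The memory-mapped region is therefore outside the scope of JVM GC, since it is not part of the heap at all. On Linux, such a region can only be unmapped with the munmap() system call, yet the Java API only offers FileChannel.map to create a mapping and no corresponding unmap(), which is puzzling and makes releasing the region rather awkward.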
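
One way to look into this question from inside the JVM is the java.lang.management.BufferPoolMXBean interface (Java 7+), which reports buffer pools separately. The sketch below is a rough probe rather than an authoritative answer: the pool names "direct" and "mapped" are what HotSpot-based JDKs are assumed to report, and the class BufferPools with its helpers is hypothetical.

```java
import java.lang.management.BufferPoolMXBean;
import java.lang.management.ManagementFactory;
import java.nio.ByteBuffer;

class BufferPools {
    /** Returns the total capacity of the named pool, or -1 if no such pool is reported. */
    static long capacityOf(String poolName) {
        for (BufferPoolMXBean pool : ManagementFactory.getPlatformMXBeans(BufferPoolMXBean.class)) {
            if (pool.getName().equals(poolName)) {
                return pool.getTotalCapacity();
            }
        }
        return -1;
    }

    /** Allocates a direct buffer and returns how much the "direct" pool grew. */
    static long directGrowth(int bytes) {
        long before = capacityOf("direct");
        ByteBuffer keep = ByteBuffer.allocateDirect(bytes);
        long after = capacityOf("direct");
        // Reading keep.capacity() keeps the buffer reachable until after the measurement.
        return keep.capacity() == bytes ? after - before : -1;
    }

    public static void main(String[] args) {
        System.out.println("direct pool grew by " + directGrowth(1 << 20) + " bytes");
        for (BufferPoolMXBean pool : ManagementFactory.getPlatformMXBeans(BufferPoolMXBean.class)) {
            System.out.println(pool.getName() + ": count=" + pool.getCount()
                    + ", capacity=" + pool.getTotalCapacity());
        }
    }
}
```

Under those assumptions the JVM accounts for allocateDirect buffers and mapped buffers in two distinct pools, which is consistent with the suspicion that they are different kinds of direct buffer.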

Finally, let us rerun the earlier memory-mapping and basic channel examples using DirectMemory, to see how the program performs with direct memory:

    File file = new File("F:\\liq.pdf");
    FileInputStream in = new FileInputStream(file);
    FileChannel channel = in.getChannel();
    ByteBuffer buff = ByteBuffer.allocateDirect(1024);

    long begin = System.currentTimeMillis();
    while (channel.read(buff) != -1) {
        buff.flip();
        // the data would be consumed here
        buff.clear();
    }
    long end = System.currentTimeMillis();
    System.out.println("time is:" + (end - begin));

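The program prints 312 milliseconds, which is faster than the ordinary NIO channel operation (468 ms) but still far behind the 63 milliseconds of the mmap memory mapping. That seemed like too large a gap, so I changed ByteBuffer buff = ByteBuffer.allocateDirect(1024); to ByteBuffer buff = ByteBuffer.allocateDirect((int)file.length()), allocating off-heap memory for the whole file in one go. The output was then 78 milliseconds. Two conclusions follow: 1. Allocating off-heap memory is relatively expensive. 2. It is still slower than the mmap memory mapping, even though reading through mmap also involves page faults and OS paging. Memory-mapped files really are impressive, and this was only an 86 MB file; what about files of a gigabyte or more?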
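
One factor the timings above do not separate out is how many times channel.read is called: a 1024-byte buffer needs one call per kilobyte, while a buffer as large as the file needs very few. The following sketch only counts calls and makes no timing claims; the class ChannelReadCalls, its helpers and the 100 KB scratch file are assumptions made for illustration.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

class ChannelReadCalls {
    /** Drains the file through the given buffer and returns the number of read calls issued. */
    static int countReads(Path path, ByteBuffer buff) throws IOException {
        int calls = 0;
        try (FileChannel channel = FileChannel.open(path, StandardOpenOption.READ)) {
            int n;
            do {
                buff.clear();            // make the whole buffer writable again
                n = channel.read(buff);  // each call is a trip into the OS
                calls++;
            } while (n != -1);
        }
        return calls;
    }

    /** Returns {calls with a 1024-byte buffer, calls with a file-sized buffer}. */
    static int[] demo() {
        try {
            Path path = Files.createTempFile("reads", ".bin");
            path.toFile().deleteOnExit();
            int size = 100 * 1024;
            Files.write(path, new byte[size]);
            int small = countReads(path, ByteBuffer.allocateDirect(1024));
            int whole = countReads(path, ByteBuffer.allocateDirect(size));
            return new int[] { small, whole };
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        int[] calls = demo();
        System.out.println("1024-byte buffer:  " + calls[0] + " read calls");
        System.out.println("file-sized buffer: " + calls[1] + " read calls");
    }
}
```

If the call count dominates, part of the drop from 312 ms to 78 ms may come from issuing far fewer reads rather than from the allocation itself; separating the two would need a more careful benchmark.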

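One last point: DirectMemory is reclaimed only after the JVM garbage-collects the DirectByteBuffer objects that own it (typically at a Full GC), so allocating too much of it can also produce an OutOfMemoryError, even when much of the JVM heap is free.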

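I originally meant to write only about memory mapping, but more and more related material crept in as I wrote; the scope is hard to keep in check...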

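Damn, it is already almost 2 a.m. on March 8. Still, it beats staying up late playing The King of Fighters like I used to. Done writing; off to bed...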

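I want to add one extra point. The JVM heap size setting is limited not by physical memory but by virtual memory: in theory by the size of the process's virtual address space, but in practice by the system's virtual memory (page file), which is itself limited; on Windows it lives on drive C by default and is about twice the size of physical memory. I ran an experiment. My machine runs 64-bit Windows 7, so in theory the process virtual address space is almost unlimited, and it has 4 GB of physical memory. With -Xms5000M, requesting 5000 MB, more than physical memory, in one go at startup, the Java program started normally. When I raised it to -Xms8000M, it failed with an OOM error. After I increased the Windows 7 virtual memory setting, the program started normally again, which shows that -Xms is limited by the size of virtual memory. I then set -Xms5000M, again above the 4 GB of physical memory, and kept creating objects in an infinite loop while making sure the GC could not reclaim them. After a while the whole computer nearly froze and responded very, very slowly. My guess is that the system was thrashing, i.e. constantly paging, which shows that -Xms and -Xmx are bounded not by physical memory alone but by physical plus virtual memory; the JVM is constrained by the computer's virtual memory settings.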

成者之劍
