Android Large-File Upload and Instant Upload: Computing MD5

Preface

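More and more apps need to upload large files, along with instant upload and resumable upload. I have recently been studying chunked upload of large files and how instant upload is implemented. In the spirit of sharing, I want to summarize what I learned and the problems I ran into, in the hope that it helps anyone with similar requirements.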

Analysis

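When large-file upload comes up, cloud-drive apps are probably the first thing that comes to mind. Besides uploading large files, the good ones also support instant upload and resumable upload. Resumable upload is where file chunking comes in: on an unreliable network, a 100 MB upload may well fail halfway, at 50 MB, or even at 99 MB. If the next attempt has to start again from zero, few users will put up with it, and an app that behaves this way will lose them. So we need to split the file into chunks, or in other words resume from the point of failure.

Anyone who uses cloud drives a lot knows the impressive instant-upload feature: a file of several hundred megabytes, or even several gigabytes, is "uploaded" within seconds. The principle is simple. Every file has a characteristic value that is, for practical purposes, unique to its content. Before uploading, the client sends this value so the server can check whether it already stores a file with the same value. If it does, no bandwidth is spent on the transfer and the existing copy is simply added to your drive. This article lays the groundwork for instant upload: obtaining the characteristic value of a large file, its MD5.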
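
The check-before-upload idea can be sketched with a tiny in-memory stand-in for the server. Everything here is hypothetical (the class name InstantUploadSketch, the upload method, and the use of a HashMap as the server's index); a real service would expose this through its own network API, which this article does not cover.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical in-memory stand-in for a server that supports instant upload. */
final class InstantUploadSketch {
    // Maps a content digest (for example an MD5 hex string) to the stored bytes.
    private final Map<String, byte[]> storedByDigest = new HashMap<>();
    // Counts the bytes that actually had to be transferred.
    private long bytesTransferred = 0;

    /** Returns true when the content was already known and no transfer was needed. */
    boolean upload(String digestHex, byte[] content) {
        if (storedByDigest.containsKey(digestHex)) {
            // Instant upload: the server already holds a file with this digest.
            return true;
        }
        storedByDigest.put(digestHex, content);
        bytesTransferred += content.length;
        return false;
    }

    long bytesTransferred() {
        return bytesTransferred;
    }
}
```

The second upload of the same digest returns immediately without adding to bytesTransferred, which is the whole trick behind the feature.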

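The MD5 message-digest algorithm is a widely used cryptographic hash function that produces a 128-bit (16-byte) hash value and is used to verify that data was transmitted intact. MD5 was designed by Ronald Rivest and published in 1992 to replace MD4.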
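
As a quick check of the 128-bit claim, the following minimal sketch hashes strings of different lengths and looks at the size of the result. The class name Md5Length is made up for illustration, and StandardCharsets is assumed to be available (on Android that means API 19 or later).

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

final class Md5Length {
    // Returns the raw MD5 digest of the UTF-8 encoding of the given text.
    static byte[] md5(String text) throws NoSuchAlgorithmException {
        MessageDigest md = MessageDigest.getInstance("MD5");
        return md.digest(text.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) throws NoSuchAlgorithmException {
        // The digest is 16 bytes (128 bits) regardless of the input length.
        System.out.println(md5("").length);
        System.out.println(md5("a much longer input than the empty string").length);
    }
}
```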

MessageDigest

The java.security package contains a class named MessageDigest. As the name suggests, it computes message digests, and the rest of this article is built around it.

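// Method 1: returns a MessageDigest instance; algorithm is the algorithm name
public static MessageDigest getInstance(String algorithm)
            throws NoSuchAlgorithmException {}
// Method 2: feeds more data into the digest computation
public void update(byte[] input) {}

// Method 3: completes the digest computation, returns the result, and resets the digest
public byte[] digest() {}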

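To compute a file's MD5 we mainly use the three methods above. Method 1 performs initialization and requires the algorithm name. Method 2 feeds data into the digest. Method 3 is the most important step: it computes the digest value and returns it.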
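
To see how the three methods fit together, here is a small sketch that hashes the same bytes once in a single update call and once in several small pieces. The class and method names (IncrementalDigest, oneShot, chunked, hex) are illustrative only.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

final class IncrementalDigest {
    // Formats a digest as lowercase hex, two characters per byte.
    static String hex(byte[] digest) {
        StringBuilder sb = new StringBuilder(digest.length * 2);
        for (byte b : digest) {
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }

    // getInstance, one update with all the data, then digest.
    static String oneShot(byte[] data) throws NoSuchAlgorithmException {
        MessageDigest md = MessageDigest.getInstance("MD5");
        md.update(data);
        return hex(md.digest());
    }

    // Same data, fed to update in pieces of at most chunkSize bytes.
    static String chunked(byte[] data, int chunkSize) throws NoSuchAlgorithmException {
        MessageDigest md = MessageDigest.getInstance("MD5");
        for (int off = 0; off < data.length; off += chunkSize) {
            int len = Math.min(chunkSize, data.length - off);
            md.update(data, off, len);
        }
        return hex(md.digest());
    }
}
```

Both methods return the same string for the same input, which is what makes it possible to hash a file segment by segment instead of loading it whole.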

Reading the File

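There are many ways to read a file: read bytes through a FileInputStream, wrap it in a BufferedInputStream for buffered reads, use RandomAccessFile, or use FileChannel from the nio package together with memory mapping. (InputStreamReader, by contrast, decodes bytes into characters, so it is not suitable for hashing raw file content.) These approaches differ in performance, as the comparison later in this article shows; readers unfamiliar with streams may want to review them first.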
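
One further option from the standard library, not used in the listings of this article, is java.security.DigestInputStream, which updates a MessageDigest as a side effect of reading. The sketch below wraps it around a BufferedInputStream; the class name StreamDigest and the 8 KB buffer size are arbitrary choices, and try-with-resources is assumed to be available (on Android, API 19 or later).

```java
import java.io.BufferedInputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.security.DigestInputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

final class StreamDigest {
    static String md5Hex(File file) throws IOException, NoSuchAlgorithmException {
        MessageDigest md = MessageDigest.getInstance("MD5");
        try (InputStream in = new DigestInputStream(
                new BufferedInputStream(new FileInputStream(file)), md)) {
            byte[] buffer = new byte[8192];
            while (in.read(buffer) != -1) {
                // Nothing to do here: every read already updated md.
            }
        }
        StringBuilder sb = new StringBuilder(32);
        for (byte b : md.digest()) {
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }
}
```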

Implementation

FileInputStream (byte stream) approach

import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.math.BigInteger;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

    /**
     * Computes the MD5 of a file by reading it through a FileInputStream.
     *
     * @param file the file to hash
     * @return the MD5 as 32 lowercase hex characters, or "" on failure
     */
    public static String getFileMd5(File file) {
        if (file == null || !file.exists()) {
            return "";
        }
        FileInputStream fis = null;
        try {
            MessageDigest messageDigest = MessageDigest.getInstance("MD5");
            fis = new FileInputStream(file);
            // Plain stream read: fill the buffer, then feed what was read to the digest
            byte[] buffer = new byte[1024 * 1024 * 10];
            int len;
            while ((len = fis.read(buffer)) != -1) {
                messageDigest.update(buffer, 0, len);
            }
            BigInteger bigInt = new BigInteger(1, messageDigest.digest());
            String md5 = bigInt.toString(16);
            // toString(16) drops leading zeros, so pad back to 32 characters
            while (md5.length() < 32) {
                md5 = "0" + md5;
            }
            return md5;
        } catch (NoSuchAlgorithmException e) {
            e.printStackTrace();
        } catch (IOException e) {
            // Also covers FileNotFoundException
            e.printStackTrace();
        } finally {
            try {
                if (fis != null) {
                    fis.close();
                }
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
        return "";
    }

FileChannel + MappedByteBuffer approach

import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.math.BigInteger;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

    /**
     * Computes the MD5 of a file through FileChannel and MappedByteBuffer.
     *
     * @param file the file to hash
     * @return the MD5 as 32 lowercase hex characters, or "" on failure
     */
    public static String getFileMd52(File file) {
        if (file == null || !file.exists()) {
            return "";
        }
        FileInputStream fis = null;
        FileChannel ch = null;
        try {
            MessageDigest messageDigest = MessageDigest.getInstance("MD5");
            fis = new FileInputStream(file);
            ch = fis.getChannel();
            long size = 1024 * 1024 * 10;
            long length = file.length();
            long part = length / size + (length % size > 0 ? 1 : 0);
            System.err.println("Number of file segments: " + part);
            for (long j = 0; j < part; j++) {
                long position = j * size;
                // The third argument of map() is the size of the region, not its end offset
                long regionSize = Math.min(size, length - position);
                MappedByteBuffer byteBuffer = ch.map(FileChannel.MapMode.READ_ONLY, position, regionSize);
                messageDigest.update(byteBuffer);
                // clear() only resets position and limit; it does not release the mapping
                byteBuffer.clear();
            }
            BigInteger bigInt = new BigInteger(1, messageDigest.digest());
            String md5 = bigInt.toString(16);
            // toString(16) drops leading zeros, so pad back to 32 characters
            while (md5.length() < 32) {
                md5 = "0" + md5;
            }
            return md5;
        } catch (NoSuchAlgorithmException e) {
            e.printStackTrace();
        } catch (IOException e) {
            // Also covers FileNotFoundException
            e.printStackTrace();
        } finally {
            try {
                if (ch != null) {
                    ch.close();
                }
                if (fis != null) {
                    fis.close();
                }
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
        return "";
    }
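
For comparison, a FileChannel can also be read without memory mapping, by filling one reusable heap ByteBuffer in a loop. This is a sketch of an alternative and not the code measured in this article; the class name ChannelDigest and the 64 KB buffer size are arbitrary, and try-with-resources is assumed to be available.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

final class ChannelDigest {
    static String md5Hex(File file) throws IOException, NoSuchAlgorithmException {
        MessageDigest md = MessageDigest.getInstance("MD5");
        // One fixed-size heap buffer is reused for the whole file.
        ByteBuffer buffer = ByteBuffer.allocate(64 * 1024);
        try (FileInputStream fis = new FileInputStream(file);
             FileChannel ch = fis.getChannel()) {
            while (ch.read(buffer) != -1) {
                buffer.flip();      // switch from filling to draining
                md.update(buffer);  // consumes everything up to the limit
                buffer.clear();     // make the buffer ready for the next read
            }
        }
        StringBuilder sb = new StringBuilder(32);
        for (byte b : md.digest()) {
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }
}
```

Memory use stays bounded by the buffer size no matter how large the file is, and nothing depends on when the garbage collector runs.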

RandomAccessFile approach

import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.math.BigInteger;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

    /**
     * Computes the MD5 of a file by reading it through a RandomAccessFile.
     *
     * @param file the file to hash
     * @return the MD5 as 32 lowercase hex characters, or "" on failure
     */
    public static String getFileMd53(File file) {
        if (file == null || !file.exists()) {
            return "";
        }
        RandomAccessFile randomAccessFile = null;
        try {
            MessageDigest messageDigest = MessageDigest.getInstance("MD5");
            randomAccessFile = new RandomAccessFile(file, "r");
            byte[] bytes = new byte[1024 * 1024 * 10];
            int len;
            // read() returns -1 at end of file
            while ((len = randomAccessFile.read(bytes)) != -1) {
                messageDigest.update(bytes, 0, len);
            }
            BigInteger bigInt = new BigInteger(1, messageDigest.digest());
            String md5 = bigInt.toString(16);
            // toString(16) drops leading zeros, so pad back to 32 characters
            while (md5.length() < 32) {
                md5 = "0" + md5;
            }
            return md5;
        } catch (NoSuchAlgorithmException e) {
            e.printStackTrace();
        } catch (IOException e) {
            // Also covers FileNotFoundException
            e.printStackTrace();
        } finally {
            try {
                if (randomAccessFile != null) {
                    randomAccessFile.close();
                }
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
        return "";
    }

Performance Comparison

We picked a small file of about 1 MB and measured the execution time:

11-09 11:49:20.210 12678-12678/com.example.xh I/System.out: FileInputStream execution time: 179
11-09 11:49:20.266 12678-12678/com.example.xh I/System.out: FileChannel execution time: 55
11-09 11:49:20.322 12678-12678/com.example.xh I/System.out: RandomAccessFile execution time: 58
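
The article does not show how these times were taken. One plausible way to measure them, given here purely as an assumption, is a small helper around System.nanoTime; the class name Timing and the method elapsedMillis are hypothetical.

```java
import java.util.concurrent.Callable;

final class Timing {
    // Runs the task once and returns the elapsed wall-clock time in milliseconds.
    static long elapsedMillis(Callable<?> task) throws Exception {
        long start = System.nanoTime();
        task.call();
        return (System.nanoTime() - start) / 1_000_000;
    }
}
```

A call would look like Timing.elapsedMillis(() -> getFileMd5(file)). When comparing several approaches this way, keep in mind that the first measured call may include one-time costs such as class loading, so running each approach more than once gives a fairer picture.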

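With a file of about 10 MB, however, FileChannel + MappedByteBuffer showed no clear advantage. After some reading I learned that MappedByteBuffer is tricky: when the mapping is released is not deterministic, and in my tests on a phone FileChannel was not the fastest option. With a file of several hundred megabytes, FileChannel + MappedByteBuffer also runs out of memory easily. The reason is that the memory held by a MappedByteBuffer, and the file it keeps open, are only released when the buffer is garbage-collected, and that moment is not predictable. Once the file reached 100 MB, the following OOM occurred: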

FATAL EXCEPTION: main
java.lang.OutOfMemoryError
at java.security.MessageDigestSpi.engineUpdate(MessageDigestSpi.java:85)
at java.security.MessageDigest.update(MessageDigest.java:369)

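So on Android devices, avoid nio memory mapping where possible. The official documentation says: "A mapped byte buffer and the file mapping that it represents remain valid until the buffer itself is garbage-collected."
Now let us compute the MD5 of a large file; the file I tested is a little over 300 MB.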

11-09 16:06:49.930 3101-3101/com.example.xh I/System.out: FileInputStream execution time: 4219
11-09 16:06:54.490 3101-3101/com.example.xh I/System.out: RandomAccessFile execution time: 2162

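The logs show that RandomAccessFile is clearly faster here. FileChannel + MappedByteBuffer ran out of memory at this size, even though the file was mapped in segments and clear() was called on each MappedByteBuffer; note that clear() only resets the buffer's position and limit and does not unmap anything. The logs also make it obvious that computing a file's MD5 is time-consuming, so do not do it on the main thread.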
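
One way to keep the hashing off the main thread is to hand it to an ExecutorService, as in the sketch below. The names BackgroundMd5, md5Async and md5-worker are made up for illustration, the hashing helper is a compact stand-in for any of the methods above, and lambdas plus try-with-resources are assumed to be available in the build.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class BackgroundMd5 {
    // Single worker thread, marked as daemon so it never keeps the process alive.
    private static final ExecutorService WORKER = Executors.newSingleThreadExecutor(r -> {
        Thread t = new Thread(r, "md5-worker");
        t.setDaemon(true);
        return t;
    });

    /** Schedules the hash on the worker thread and returns immediately. */
    static Future<String> md5Async(File file) {
        Callable<String> task = () -> md5Hex(file);
        return WORKER.submit(task);
    }

    // Reads the file in 64 KB segments and returns the MD5 as 32 hex characters.
    private static String md5Hex(File file) throws IOException, NoSuchAlgorithmException {
        MessageDigest md = MessageDigest.getInstance("MD5");
        try (InputStream in = new FileInputStream(file)) {
            byte[] buffer = new byte[64 * 1024];
            int len;
            while ((len = in.read(buffer)) != -1) {
                md.update(buffer, 0, len);
            }
        }
        StringBuilder sb = new StringBuilder(32);
        for (byte b : md.digest()) {
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }
}
```

Calling get() on the returned Future blocks until the result is ready, so on Android that call should itself not happen on the main thread; delivering the result through a callback or a Handler is the usual pattern.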

Computing the MD5

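For large files, do not read the whole file into memory and then call update once; otherwise the call to update will run into an OOM. Instead, read the file in segments and call update repeatedly, as follows: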

            while ((len = fis.read(buffer)) > 0) {
                // Feed the bytes just read into the digest with update()
                messageDigest.update(buffer, 0, len);
            }

Keep in mind that update() only feeds data into the digest; it does not return the MD5. The final value is produced by digest(), which returns a byte array:

         byte[] bytes = messageDigest.digest();

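MD5 values are usually written in hexadecimal, that is, as 32 hex characters, so we convert the byte array to hexadecimal. The BigInteger class can do this: one of its constructors accepts a byte array, as shown below, where 1 means the sign is positive.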

         BigInteger bigInt = new BigInteger(1, bytes );

BigInteger also provides a toString method whose parameter specifies the radix. Since we want hexadecimal, we pass 16:

        String md5 = bigInt.toString(16);

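That gives us the MD5 value. You may wonder, though: in hexadecimal the MD5 should be 32 characters long, so why is the result sometimes 31 characters, or even fewer? The reason is that the high-order bits of the value returned by digest() can be zero, and BigInteger.toString does not print leading zeros, so characters go missing. Hence the code below: if the string is shorter than 32 characters, pad it with zeros on the left.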

            while (md5.length() < 32) {
                md5 = "0" + md5;
            }
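
The missing-zero effect is easy to reproduce with a hand-made 16-byte array whose first byte is zero. The sketch below also shows two alternative ways of producing the padded string that are not part of the original listings: String.format with a zero-padded width, and a per-byte conversion. The class name HexPadding and its method names are illustrative.

```java
import java.math.BigInteger;

final class HexPadding {
    // The article's conversion before padding: leading zeros are lost.
    static String unpadded(byte[] digest) {
        return new BigInteger(1, digest).toString(16);
    }

    // Alternative: let the formatter left-pad with zeros to 32 characters.
    static String padded(byte[] digest) {
        return String.format("%032x", new BigInteger(1, digest));
    }

    // Alternative: format each byte as exactly two hex characters.
    static String perByte(byte[] digest) {
        StringBuilder sb = new StringBuilder(digest.length * 2);
        for (byte b : digest) {
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }

    // Builds a 16-byte array that starts with 0x00 0x0a followed by 0xff bytes.
    static byte[] sampleWithLeadingZero() {
        byte[] d = new byte[16];
        d[1] = 0x0a;
        for (int i = 2; i < d.length; i++) {
            d[i] = (byte) 0xff;
        }
        return d;
    }
}
```

For the sample array, unpadded returns only 29 characters, while padded and perByte both return the full 32 characters starting with 000a.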

That concludes this article. If you spot anything lacking, corrections are welcome. Thank you.

Code4Android
