Inserting a Million Rows into a Database

This is a deep dive into a database course assignment.

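First, the requirements of the assignment:

Create a table named test with four columns.

The first column is id: primary key, auto-increment.

The second column is col1: randomly one of Mike, Bob, Jack, Alice, Cathy, Ann, Betty, Cindy, Mary, Jane.

The third column is col2: a random 5-letter string, with letters restricted to a-e.

The fourth column is col3: a random integer between 1 and 20.

Following the table requirements above, insert 1,000,000 records and record the execution time.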
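The table described above could be created with a statement along the following lines. This is only a sketch: the assignment does not give column types or lengths, so `VARCHAR(10)`, `CHAR(5)`, `INT`, and the class name `TestTableDdl` are assumptions.

```java
// Hypothetical holder for the DDL; the column types and lengths are assumptions,
// since the assignment only names the columns and their value ranges.
class TestTableDdl {
    static final String CREATE_TEST_TABLE =
        "CREATE TABLE test ("
        + "id INT NOT NULL AUTO_INCREMENT PRIMARY KEY, " // primary key, auto-increment
        + "col1 VARCHAR(10) NOT NULL, "                  // one of ten first names
        + "col2 CHAR(5) NOT NULL, "                      // five letters from a-e
        + "col3 INT NOT NULL"                            // integer in 1..20
        + ")";

    public static void main(String[] args) {
        // Print the statement so it can be pasted into a MySQL client.
        System.out.println(CREATE_TEST_TABLE);
    }
}
```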

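Preprocessing the value ranges of the data to insert

(1) For col1, create an array holding the possible values:

private static String[] col1Values = { "Mike", "Bob", "Jack", "Alice", "Cathy",
        "Ann", "Betty", "Cindy", "Mary", "Jane" };

To pick a random value, simply use col1Values[(int)(Math.random()*10)].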

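(2) For col2, build the array of possible values recursively (5^5 = 3125 combinations):

private static String[] col2Values = new String[3125];
private static int point;

static {
    point = 0;
    initCol2Value(5, new StringBuffer(""));
}

// Appends one letter per recursion level; stores the string once 5 letters are chosen.
private static void initCol2Value(int n, StringBuffer str) {
    if (n == 0) {
        col2Values[point++] = new String(str);
        return;
    }
    for (int i = 0; i < 5; i++) {
        StringBuffer strTemp = new StringBuffer(str);
        initCol2Value(n - 1, strTemp.append((char) ('a' + i)));
    }
}

To pick a random value, simply use col2Values[(int)(Math.random()*3125)].

(3) For col3, a random value is simply (int)(Math.random()*20)+1.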
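The three generators above can also be written against a single `java.util.Random`, which makes the value ranges easy to check. The sketch below is an alternative to the recursive table, not the code used for the measurements in this post: the class name `RandomRow` and its method names are made up for illustration, and col2 is decoded from an index as five base-5 digits instead of being looked up in a precomputed array.

```java
import java.util.Random;

// Hypothetical helper class; names are illustrative, not from the original code.
class RandomRow {
    static final String[] COL1 = { "Mike", "Bob", "Jack", "Alice", "Cathy",
            "Ann", "Betty", "Cindy", "Mary", "Jane" };

    // Decode an index in [0, 3125) as five base-5 digits mapped to 'a'..'e'.
    // Index 0 gives "aaaaa", index 1 gives "aaaab", index 3124 gives "eeeee".
    static String col2FromIndex(int index) {
        char[] letters = new char[5];
        for (int pos = 4; pos >= 0; pos--) {
            letters[pos] = (char) ('a' + index % 5);
            index /= 5;
        }
        return new String(letters);
    }

    static String randomCol1(Random rnd) {
        return COL1[rnd.nextInt(COL1.length)];
    }

    static String randomCol2(Random rnd) {
        return col2FromIndex(rnd.nextInt(3125));
    }

    static int randomCol3(Random rnd) {
        return rnd.nextInt(20) + 1; // nextInt(20) is 0..19, so this is 1..20
    }

    public static void main(String[] args) {
        Random rnd = new Random();
        System.out.println(randomCol1(rnd) + " " + randomCol2(rnd) + " " + randomCol3(rnd));
    }
}
```

Decoding from an index produces the strings in the same order as the recursive version, so either can back the col2 lookup.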

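Inserting a large volume of data

(1) The first method that comes to mind is of course the traditional row-by-row insert: obtain a Statement from the Connection, call its execute function to run an SQL statement that inserts one row, and loop 1,000,000 times. The per-statement overhead is too high, though; I estimated it would take on the order of an hour.

(2) Then I thought of precompiling the SQL statement with a PreparedStatement, which improved efficiency considerably. Note that this version also turns off auto-commit, so all rows are committed in a single transaction. Here is the core of that code.

public static void insertData() {
    try {
        System.out.println("start insert data");
        long beginTime = System.currentTimeMillis();
        // One transaction for all rows instead of one per statement.
        conn.setAutoCommit(false);
        PreparedStatement pst = conn
                .prepareStatement("INSERT INTO test(col1,col2,col3)values(?,?,?)");
        for (int i = 1; i <= 1000000; i++) {
            pst.setString(1, col1Values[(int) (Math.random() * 10)]);
            pst.setString(2, col2Values[(int) (Math.random() * 3125)]);
            pst.setInt(3, (int) (Math.random() * 20) + 1);
            pst.execute();
        }
        conn.commit();
        pst.close();
        long endTime = System.currentTimeMillis();
        System.out.println("end insert data");
        System.out.println("insert time: " + (double) (endTime - beginTime) / 1000 + " s");
        System.out.println();
    } catch (SQLException ce) {
        System.out.println(ce);
    }
}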

The test result:

start insert data
end insert data
insert time: 110.215 s
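To compare runs, it can help to turn an elapsed time into rows per second. The helper below is a minimal sketch with made-up names (`InsertTiming`, `rowsPerSecond`, `elapsedSeconds`); it uses `System.nanoTime()`, which is monotonic, whereas the code in this post uses `System.currentTimeMillis()`.

```java
// Hypothetical timing helper; not part of the original program.
class InsertTiming {
    // Throughput for a run that inserted `rows` rows in `seconds` seconds.
    static double rowsPerSecond(long rows, double seconds) {
        return rows / seconds;
    }

    // Convert two System.nanoTime() readings into elapsed seconds.
    static double elapsedSeconds(long startNanos, long endNanos) {
        return (endNanos - startNanos) / 1_000_000_000.0;
    }

    public static void main(String[] args) {
        long start = System.nanoTime();
        // ... the insert loop would run here ...
        long end = System.nanoTime();
        System.out.println("elapsed: " + elapsedSeconds(start, end) + " s");

        // Throughput for the 110.215 s figure reported above.
        System.out.println(rowsPerSecond(1_000_000, 110.215) + " rows/s");
    }
}
```

For the 110.215 s run this comes to roughly 9,000 rows per second.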

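(3) I was still not satisfied with that result, so I started exploring.

(a) I found a method online that uses the addBatch() and executeBatch() methods of PreparedStatement: through batching, 1,000 or even 10,000 SQL inserts can be submitted at once as a single transaction and optimized as a batch. The author had tested this on an Oracle database and measured under 10 s. I tried it as well, but the time was still around 107 s, which left me puzzled.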
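For reference, the batching approach described in (a) looks roughly like the sketch below. The post does not show the batch code that was actually tried, so this is a reconstruction under assumptions: the class name `BatchInsertSketch`, the helper names, and the flush condition are all illustrative, and the caller is assumed to supply an open `Connection`.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.Random;

// Hypothetical sketch of addBatch()/executeBatch(); not the author's actual code.
class BatchInsertSketch {
    static final String[] COL1 = { "Mike", "Bob", "Jack", "Alice", "Cathy",
            "Ann", "Betty", "Cindy", "Mary", "Jane" };

    // Number of executeBatch() calls needed, rounding up for a partial last batch.
    static int flushCount(int totalRows, int batchSize) {
        return (totalRows + batchSize - 1) / batchSize;
    }

    // Five independent letters from a-e, uniform over all 3125 strings.
    static String randomLetters(Random rnd) {
        char[] letters = new char[5];
        for (int k = 0; k < 5; k++) {
            letters[k] = (char) ('a' + rnd.nextInt(5));
        }
        return new String(letters);
    }

    static void insertBatched(Connection conn, int totalRows, int batchSize)
            throws SQLException {
        Random rnd = new Random();
        conn.setAutoCommit(false); // commit once at the end
        try (PreparedStatement pst = conn.prepareStatement(
                "INSERT INTO test(col1,col2,col3) VALUES (?,?,?)")) {
            for (int i = 1; i <= totalRows; i++) {
                pst.setString(1, COL1[rnd.nextInt(COL1.length)]);
                pst.setString(2, randomLetters(rnd));
                pst.setInt(3, rnd.nextInt(20) + 1);
                pst.addBatch(); // queue the row instead of executing it
                if (i % batchSize == 0 || i == totalRows) {
                    pst.executeBatch(); // send the queued rows to the driver
                }
            }
            conn.commit();
        }
    }
}
```

With 1,000,000 rows and a batch size of 10,000, `executeBatch()` is called 100 times; whether that is any faster than row-by-row execution depends on how the driver handles the batch.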

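(b) Around then I came across another article that explains why batching brings no gain with the MySQL JDBC driver: according to it, MySQL does not optimize batches submitted through addBatch() and executeBatch(), whereas Oracle does, and the article reports inserting 1,000,000 records into Oracle in about 360 ms.

URL: http://elf8848.iteye.com/blog/770032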
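One thing worth knowing here: MySQL Connector/J has a connection property, `rewriteBatchedStatements`, that asks the driver to rewrite batched inserts into multi-row statements. It was not tried in this post, so its effect on these timings is unknown, and whether it is available depends on the driver version. A minimal sketch of adding it to a JDBC URL follows; the class and method names are made up for illustration.

```java
// Hypothetical helper that appends the Connector/J property to a JDBC URL.
class BatchRewriteUrl {
    static String withBatchRewrite(String jdbcUrl) {
        // Use '&' if the URL already carries properties, '?' otherwise.
        String separator = jdbcUrl.contains("?") ? "&" : "?";
        return jdbcUrl + separator + "rewriteBatchedStatements=true";
    }

    public static void main(String[] args) {
        System.out.println(withBatchRewrite("jdbc:mysql://localhost:3306/db_test"));
    }
}
```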

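(c) Later I read a blog post by our class monitor Ge, who inserted 1,000,000 rows into SQLite from Python in only 4 seconds. The reason is that all 1,000,000 insert statements were placed in a single transaction, which greatly reduces the time spent starting and ending transactions, and it is exactly that part that consumes most of the time.

URL: http://aegiryy.net/?p=380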

(d) That gave me an idea, and I also learned that in MySQL a single INSERT statement can insert multiple rows. So I tried using a StringBuffer to build one large SQL statement that inserts 10,000 rows (with 100,000 or 1,000,000 rows per statement the heap memory limit is exceeded), so that 100 iterations complete the insert. Here is the core code for this method:

public static void insertData() {
    try {
        System.out.println("start insert data");
        long beginTime = System.currentTimeMillis();
        Statement st = conn.createStatement();
        for (int i = 0; i < 100; i++) {
            // Build one INSERT statement carrying 10,000 value tuples.
            StringBuffer sqlBuffer = new StringBuffer(
                    "insert into test (col1,col2,col3) values");
            sqlBuffer.append(" (\""
                    + col1Values[(int) (Math.random() * 10)] + "\",\""
                    + col2Values[(int) (Math.random() * 3125)] + "\","
                    + ((int) (Math.random() * 20) + 1) + ")");
            for (int j = 2; j <= 10000; j++) {
                sqlBuffer.append(" ,(\""
                        + col1Values[(int) (Math.random() * 10)] + "\",\""
                        + col2Values[(int) (Math.random() * 3125)] + "\","
                        + ((int) (Math.random() * 20) + 1) + ")");
            }
            sqlBuffer.append(";");
            String sql = new String(sqlBuffer);
            st.execute(sql);
        }
        st.close();
        long endTime = System.currentTimeMillis();
        System.out.println("end insert data");
        System.out.println("insert time: " + (double) (endTime - beginTime) / 1000 + " s");
        System.out.println();
    } catch (SQLException ce) {
        System.out.println(ce);
    }
}

The test result:

start insert data
end insert data
insert time: 15.083 s

(e) Finally, I thought of refining this method further by using a prepared statement, which improves both readability and efficiency, although the efficiency gain is small. Here is the core code for this method:

public static void insertData() {
    try {
        conn.setAutoCommit(false);
        // Build "insert ... values(?,?,?),(?,?,?),..." with 10,000 tuples.
        StringBuffer sqlBuffer = new StringBuffer(
                "insert into test (col1,col2,col3) values");
        sqlBuffer.append("(?,?,?)");
        for (int j = 2; j <= 10000; j++) {
            sqlBuffer.append(",(?,?,?)");
        }
        sqlBuffer.append(";");
        String sql = new String(sqlBuffer);
        PreparedStatement pst = conn.prepareStatement(sql);
        System.out.println("start insert data");
        long beginTime = System.currentTimeMillis();
        for (int i = 0; i < 100; i++) {
            // Row j occupies parameters 3*j+1 .. 3*j+3 (JDBC indices are 1-based).
            for (int j = 0; j < 10000; j++) {
                pst.setString(3 * j + 1, col1Values[(int) (Math.random() * 10)]);
                pst.setString(3 * j + 2, col2Values[(int) (Math.random() * 3125)]);
                pst.setInt(3 * j + 3, (int) (Math.random() * 20) + 1);
            }
            pst.execute();
        }
        conn.commit();
        pst.close();
        long endTime = System.currentTimeMillis();
        System.out.println("end insert data");
        System.out.println("insert time: " + (double) (endTime - beginTime) / 1000 + " s");
        System.out.println();
    } catch (SQLException ce) {
        System.out.println(ce);
    }
}

The test result:

start insert data
end insert data
insert time: 14.47 s
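The placeholder arithmetic in method (e) can be isolated into small functions, which also makes it easier to handle a row count that is not a multiple of the chunk size (the original always inserts exactly 100 chunks of 10,000 rows). This is a sketch only; `MultiRowInsertSql` and its method names are hypothetical.

```java
// Hypothetical helpers for building and indexing a multi-row prepared INSERT.
class MultiRowInsertSql {
    // SQL text with `rows` groups of (?,?,?) placeholders.
    static String build(int rows) {
        if (rows < 1) {
            throw new IllegalArgumentException("rows must be >= 1");
        }
        StringBuilder sql = new StringBuilder("insert into test (col1,col2,col3) values ");
        for (int j = 0; j < rows; j++) {
            if (j > 0) {
                sql.append(',');
            }
            sql.append("(?,?,?)");
        }
        return sql.toString();
    }

    // 1-based JDBC parameter index for column `col` (0..2) of 0-based row `row`.
    static int paramIndex(int row, int col) {
        return 3 * row + col + 1;
    }

    // Size of the final chunk when totalRows is split into chunks of chunkRows.
    static int lastChunkRows(int totalRows, int chunkRows) {
        int remainder = totalRows % chunkRows;
        return remainder == 0 ? chunkRows : remainder;
    }

    public static void main(String[] args) {
        System.out.println(build(2));
        System.out.println(paramIndex(9999, 2)); // last parameter of a 10,000-row chunk
    }
}
```

If the last chunk is smaller than the others, it needs its own statement text from `build(lastChunkRows(...))`, because every placeholder in a prepared statement must be bound.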

Finally, here is the complete code of the final solution:

package godfrey.nju;

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class TestDB2 {
    private static String dbClassName = "com.mysql.jdbc.Driver";
    private static String dbUrl = "jdbc:mysql://localhost:3306/db_test";
    private static String dbUser = "root";
    private static String dbPwd = "123";
    private static Connection conn = null;
    private static String[] col1Values = { "Mike", "Bob", "Jack", "Alice",
            "Cathy", "Ann", "Betty", "Cindy", "Mary", "Jane" };
    private static String[] col2Values = new String[3125];
    private static int point;

    public static void main(String args[]) {
        insertData();
        // query1();
        // clearData();
    }

    // Inserts 1,000,000 rows as 100 prepared multi-row INSERTs of 10,000 rows each.
    public static void insertData() {
        try {
            conn.setAutoCommit(false);
            StringBuffer sqlBuffer = new StringBuffer(
                    "insert into test (col1,col2,col3) values");
            sqlBuffer.append("(?,?,?)");
            for (int j = 2; j <= 10000; j++) {
                sqlBuffer.append(",(?,?,?)");
            }
            sqlBuffer.append(";");
            String sql = new String(sqlBuffer);
            PreparedStatement pst = conn.prepareStatement(sql);
            System.out.println("start insert data");
            long beginTime = System.currentTimeMillis();
            for (int i = 0; i < 100; i++) {
                for (int j = 0; j < 10000; j++) {
                    pst.setString(3 * j + 1, col1Values[(int) (Math.random() * 10)]);
                    pst.setString(3 * j + 2, col2Values[(int) (Math.random() * 3125)]);
                    pst.setInt(3 * j + 3, (int) (Math.random() * 20) + 1);
                }
                pst.execute();
            }
            conn.commit();
            pst.close();
            long endTime = System.currentTimeMillis();
            System.out.println("end insert data");
            System.out.println("insert time: " + (double) (endTime - beginTime) / 1000 + " s");
            System.out.println();
        } catch (SQLException ce) {
            System.out.println(ce);
        }
    }

    // Counts rows per col1 value and reports how long the query took.
    public static void query1() {
        try {
            System.out.println("start query1: 'select count(*) from test group by col1 order by count(*);'");
            long beginTime = System.currentTimeMillis();
            Statement st = conn.createStatement();
            String sql = "select count(*) from test group by col1 order by count(*);";
            ResultSet rs = st.executeQuery(sql);
            long endTime = System.currentTimeMillis();
            System.out.println("result:");
            while (rs.next()) {
                System.out.println(rs.getInt(1));
            }
            System.out.println("query1 time: " + (double) (endTime - beginTime) / 1000 + " s");
            st.close();
            conn.close();
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    // Deletes every row from the test table.
    public static void clearData() {
        try {
            System.out.println("start delete all data");
            long beginTime = System.currentTimeMillis();
            Statement st = conn.createStatement();
            String sql = "delete from test";
            st.execute(sql);
            st.close();
            conn.close();
            long endTime = System.currentTimeMillis();
            System.out.println("end delete all data");
            System.out.println("delete time: " + (double) (endTime - beginTime) / 1000 + " s");
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    // Opens the connection and fills col2Values once when the class is loaded.
    static {
        try {
            Class.forName(dbClassName).newInstance();
            conn = DriverManager.getConnection(dbUrl, dbUser, dbPwd);
        } catch (Exception e) {
            e.printStackTrace();
        }
        point = 0;
        initCol2Value(5, new StringBuffer(""));
    }

    private static void initCol2Value(int n, StringBuffer str) {
        if (n == 0) {
            col2Values[point++] = new String(str);
            return;
        }
        for (int i = 0; i < 5; i++) {
            StringBuffer strTemp = new StringBuffer(str);
            initCol2Value(n - 1, strTemp.append((char) ('a' + i)));
        }
    }
}
