Background
Recently, in a Spring Cloud project, inserting roughly 880,000 rows into a MySQL database through MyBatis-Plus took as long as 2 hours. In principle, when the MP framework executes batch operations such as:
- XXService.insertBatch()
- XXService.updateBatchById()
- xxService.deleteBatchIds()
- xxService.selectBatchIds()
the JDBC layer underneath uses stmt.addBatch() and stmt.executeBatch(). Past experience with native JDBC batching says this executes SQL very fast, with very short processing times. Even allowing for MP's extra layers over raw JDBC, execution this slow meant something fishy was going on.
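For comparison, this is roughly what native JDBC batching looks like. A minimal sketch, not production code: the table and column names match the example table used later in this article, buildInsertSql is a hypothetical helper, and the Connection would come from your pool:

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.Collections;
import java.util.List;

public class NativeJdbcBatchDemo {

    // Build a single-row INSERT template with one "?" placeholder per column.
    static String buildInsertSql(String table, String[] cols) {
        String placeholders = String.join(", ", Collections.nCopies(cols.length, "?"));
        return "INSERT INTO " + table + " (" + String.join(", ", cols) + ") VALUES (" + placeholders + ")";
    }

    // Queue every row with addBatch(), then send them in chunks with executeBatch().
    static void insertBatch(Connection conn, List<Object[]> rows, int batchSize) throws SQLException {
        String sql = buildInsertSql("test_table",
                new String[] { "`day`", "day_time", "plat_type", "uv", "created", "updated" });
        try (PreparedStatement ps = conn.prepareStatement(sql)) {
            int queued = 0;
            for (Object[] row : rows) {
                for (int i = 0; i < row.length; i++) {
                    ps.setObject(i + 1, row[i]);
                }
                ps.addBatch();              // queue the row, do not execute yet
                if (++queued % batchSize == 0) {
                    ps.executeBatch();      // flush one chunk to the server
                }
            }
            ps.executeBatch();              // flush the remainder (< batchSize rows)
        }
    }
}
```

Note that even this native version is only truly fast when the driver actually coalesces the queued statements, which is exactly what the investigation below turns up.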
Troubleshooting process:
1. Enable full SQL logging to the console
Option ①:
Add the log-impl entry in application-[dev|prod|mybatisplus].yml (the last item in the example below):
configuration:
  # Map snake_case column names to camelCase Java fields automatically, so no "as" alias is needed
  # (without this, SQL must be written as: select user_id as userId)
  map-underscore-to-camel-case: true
  cache-enabled: false
  jdbc-type-for-null: null
  log-impl: org.apache.ibatis.logging.stdout.StdOutImpl
Option ②:
Configure logback to print verbose logs:
logging:
  config: classpath:logback-spring.xml
  level:
    root: debug # set to debug when you want SQL statements printed
Option ③:
mybatis:
  configuration:
    log-impl: org.apache.ibatis.logging.stdout.StdOutImpl
Same principle as ①.
2. Start the project and inspect the SQL log
Before starting, shrink the second parameter, batchSize, to 3:
status = insertBatch(list, 3);
After startup, the log repeats in cycles of four lines:
The first line is the INSERT statement template;
the next three lines are the data rows — exactly 3 of them, matching the batchSize configured above.
==> Preparing: INSERT INTO test_table ( `day`, day_time, plat_type, uv,created,updated ) VALUES ( ?, ?, ?, ?,?,? )
==> Parameters: 2020-05-31 00:00:00.0(Timestamp), 2020-05-31 14:00:00.0(Timestamp), Android(String), 11(Integer), 2020-06-05 12:23:24.437(Timestamp), 2020-06-05 12:23:24.437(Timestamp)
==> Parameters: 2020-05-31 00:00:00.0(Timestamp), 2020-05-31 14:00:00.0(Timestamp), IOS(String), 22(Integer), 2020-06-05 12:23:24.437(Timestamp), 2020-06-05 12:23:24.437(Timestamp)
==> Parameters: 2020-05-31 00:00:00.0(Timestamp), 2020-05-31 14:00:00.0(Timestamp), MP(String), 33(Integer), 2020-06-05 12:23:24.437(Timestamp), 2020-06-05 12:23:24.437(Timestamp)
==> Preparing: INSERT INTO test_table ( `day`, day_time, plat_type, uv,created,updated ) VALUES ( ?, ?, ?, ?,?,? )
==> Parameters: 2020-05-31 00:00:00.0(Timestamp), 2020-05-31 14:00:00.0(Timestamp), H5(String), 44(Integer), 2020-06-05 12:23:24.437(Timestamp), 2020-06-05 12:23:24.437(Timestamp)
==> Parameters: 2020-05-31 00:00:00.0(Timestamp), 2020-05-31 14:00:00.0(Timestamp), PC(String), 55(Integer), 2020-06-05 12:23:24.437(Timestamp), 2020-06-05 12:23:24.437(Timestamp)
==> Parameters: 2020-05-31 00:00:00.0(Timestamp), 2020-05-31 14:00:00.0(Timestamp), QuickApp(String), 66(Integer), 2020-06-05 12:23:24.437(Timestamp), 2020-06-05 12:23:24.437(Timestamp)
....
The console output is not very informative: it does not show how many statements actually reached the server, i.e. whether single INSERTs or a real batch insert was executed. So check the MySQL server's own log (location: /tmp/mysql.log):
Command: tail -100f /tmp/mysql.log, then trigger the batch-insert business logic.
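If nothing shows up in that file, the general query log may be disabled. On a server you control it can usually be switched on at runtime (this requires appropriate privileges, and the file path here is just an example that must match your setup):

```sql
-- Enable the MySQL general query log and point it at a file (example path)
SET GLOBAL general_log_file = '/tmp/mysql.log';
SET GLOBAL general_log = 'ON';
```

The general log records every statement the server receives and is expensive; turn it off again with SET GLOBAL general_log = 'OFF' once the experiment is done.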
As shown below, MP's insertBatch was indeed executed as individual single-row INSERTs:
111 Query SELECT @@session.tx_read_only
111 Query SELECT @@session.tx_isolation
111 Query SELECT @@session.tx_read_only
111 Query INSERT INTO test_table
( `day`,
day_time,
plat_type,
uv,created,updated ) VALUES
( '2020-05-31 00:00:00.0',
'2020-05-31 00:00:00.0',
'Android',
11,'2020-06-05 12:23:24.437','2020-06-05 12:23:24.437' )
111 Query INSERT INTO test_table
( `day`,
day_time,
plat_type,
uv,created,updated ) VALUES
( '2020-05-31 00:00:00.0',
'2020-05-31 00:00:00.0',
'IOS',
22,'2020-06-05 12:23:24.437','2020-06-05 12:23:24.437' )
111 Query INSERT INTO test_table
( `day`,
day_time,
plat_type,
uv,created,updated ) VALUES
( '2020-05-31 00:00:00.0',
'2020-05-31 00:00:00.0',
'MP',
33,'2020-06-05 12:23:24.437','2020-06-05 12:23:24.437' )
3. Tracing the MP insertBatch source
My own understanding is added as comments in the source below.
/**
 * Batch insert
 *
 * @param entityList entities to insert
 * @param batchSize  number of statements per flush
 * @return true on success
 */
@Transactional(rollbackFor = Exception.class)
@Override
public boolean insertBatch(List<T> entityList, int batchSize) {
    if (CollectionUtils.isEmpty(entityList)) {
        throw new IllegalArgumentException("Error: entityList must not be empty");
    }
    try (SqlSession batchSqlSession = sqlSessionBatch()) {
        // total number of rows
        int size = entityList.size();
        // Build the INSERT template from the SqlMethod enum, producing the statement seen in the log above:
        // Preparing: INSERT INTO test_table ( `day`, day_time, plat_type, uv,created,updated ) VALUES ( ?, ?, ?, ?,?,? )
        String sqlStatement = sqlStatement(SqlMethod.INSERT_ONE);
        // Core batching loop: mimics the native addBatch/executeBatch pattern
        for (int i = 0; i < size; i++) {
            // Analogous to native JDBC addBatch: queue one INSERT built from the current row
            batchSqlSession.insert(sqlStatement, entityList.get(i));
            // Flush the queued statements to the DBMS whenever the 0-based index
            // is a positive multiple of batchSize
            if (i >= 1 && i % batchSize == 0) {
                batchSqlSession.flushStatements();
            }
        }
        // Flush the remaining rows (fewer than batchSize) one last time
        batchSqlSession.flushStatements();
    } catch (Throwable e) {
        throw new MybatisPlusException("Error: Cannot execute insertBatch Method. Cause", e);
    }
    return true;
}
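A quick way to see how that loop groups statements: the standalone sketch below (my own illustration, not MP code) replays the same `i >= 1 && i % batchSize == 0` condition and records how many queued statements each flushStatements() call would send. Note that the first flush carries batchSize + 1 statements, because the check runs on the 0-based index after the insert:

```java
import java.util.ArrayList;
import java.util.List;

public class FlushCadence {
    // Replays MP's loop for `size` rows and returns the number of queued
    // statements sent by each flushStatements() call (including the final one).
    static List<Integer> flushSizes(int size, int batchSize) {
        List<Integer> flushes = new ArrayList<>();
        int queued = 0;
        for (int i = 0; i < size; i++) {
            queued++;                            // batchSqlSession.insert(...)
            if (i >= 1 && i % batchSize == 0) {
                flushes.add(queued);             // batchSqlSession.flushStatements()
                queued = 0;
            }
        }
        flushes.add(queued);                     // trailing flushStatements()
        return flushes;
    }

    public static void main(String[] args) {
        // 10 rows, batchSize 3 -> flushes of 4, 3, 3 and a final empty flush
        System.out.println(flushSizes(10, 3));   // prints [4, 3, 3, 0]
    }
}
```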
Source of the class defining the SqlMethod.INSERT_ONE enum value:
/**
 * <p>
 * SQL methods supported by MybatisPlus
 * </p>
 *
 * @author hubin
 * @Date 2016-01-23
 */
public enum SqlMethod {
    /**
     * Insert
     */
    INSERT_ONE("insert", "Insert one row (selected columns)", "<script>INSERT INTO %s %s VALUES %s</script>"),
    INSERT_ONE_ALL_COLUMN("insertAllColumn", "Insert one row (all columns)", "<script>INSERT INTO %s %s VALUES %s</script>"),

    /**
     * Delete
     */
    DELETE_BY_ID("deleteById", "Delete one row by ID", "<script>DELETE FROM %s WHERE %s=#{%s}</script>"),
    DELETE_BY_MAP("deleteByMap", "Delete rows matching a columnMap condition", "<script>DELETE FROM %s %s</script>"),
    DELETE("delete", "Delete rows matching an entity condition", "<script>DELETE FROM %s %s</script>"),
    DELETE_BATCH_BY_IDS("deleteBatchIds", "Batch-delete rows by a collection of IDs", "<script>DELETE FROM %s WHERE %s IN (%s)</script>"),

    /**
     * Logic delete
     */
    LOGIC_DELETE_BY_ID("deleteById", "Logically delete one row by ID", "<script>UPDATE %s %s WHERE %s=#{%s}</script>"),
    LOGIC_DELETE_BY_MAP("deleteByMap", "Logically delete rows matching a columnMap condition", "<script>UPDATE %s %s %s</script>"),
    LOGIC_DELETE("delete", "Logically delete rows matching an entity condition", "<script>UPDATE %s %s %s</script>"),
    LOGIC_DELETE_BATCH_BY_IDS("deleteBatchIds", "Batch logically delete rows by a collection of IDs", "<script>UPDATE %s %s WHERE %s IN (%s)</script>"),

    /**
     * Update
     */
    UPDATE_BY_ID("updateById", "Update selected columns by ID", "<script>UPDATE %s %s WHERE %s=#{%s} %s</script>"),
    UPDATE_ALL_COLUMN_BY_ID("updateAllColumnById", "Update all columns by ID", "<script>UPDATE %s %s WHERE %s=#{%s} %s</script>"),
    UPDATE("update", "Update rows matching a whereEntity condition", "<script>UPDATE %s %s %s</script>"),
    UPDATE_FOR_SET("updateForSet", "Update rows with a custom SET clause, matching a whereEntity condition", "<script>UPDATE %s %s %s</script>"),

    /**
     * Logic delete -> update
     */
    LOGIC_UPDATE_BY_ID("updateById", "Update one row by ID", "<script>UPDATE %s %s WHERE %s=#{%s} %s</script>"),
    LOGIC_UPDATE_ALL_COLUMN_BY_ID("updateAllColumnById", "Update selected columns by ID", "<script>UPDATE %s %s WHERE %s=#{%s} %s</script>"),

    /**
     * Select
     */
    SELECT_BY_ID("selectById", "Select one row by ID", "SELECT %s FROM %s WHERE %s=#{%s}"),
    SELECT_BY_MAP("selectByMap", "Select one row by a columnMap condition", "<script>SELECT %s FROM %s %s</script>"),
    SELECT_BATCH_BY_IDS("selectBatchIds", "Batch-select rows by a collection of IDs", "<script>SELECT %s FROM %s WHERE %s IN (%s)</script>"),
    SELECT_ONE("selectOne", "Select the single row matching a condition", "<script>SELECT %s FROM %s %s</script>"),
    SELECT_COUNT("selectCount", "Count the rows matching a condition", "<script>SELECT COUNT(1) FROM %s %s</script>"),
    SELECT_LIST("selectList", "Select all rows matching a condition", "<script>SELECT %s FROM %s %s</script>"),
    SELECT_PAGE("selectPage", "Select all rows matching a condition (paged)", "<script>SELECT %s FROM %s %s</script>"),
    SELECT_MAPS("selectMaps", "Select all rows matching a condition", "<script>SELECT %s FROM %s %s</script>"),
    SELECT_MAPS_PAGE("selectMapsPage", "Select all rows matching a condition (paged)", "<script>SELECT %s FROM %s %s</script>"),
    SELECT_OBJS("selectObjs", "Select all rows matching a condition", "<script>SELECT %s FROM %s %s</script>"),

    /**
     * Logic delete -> select
     */
    LOGIC_SELECT_BY_ID("selectById", "Select one row by ID", "SELECT %s FROM %s WHERE %s=#{%s} %s"),
    LOGIC_SELECT_BATCH_BY_IDS("selectBatchIds", "Batch-select rows by a collection of IDs", "<script>SELECT %s FROM %s WHERE %s IN (%s) %s</script>");

    private final String method;
    private final String desc;
    private final String sql;

    SqlMethod(final String method, final String desc, final String sql) {
        this.method = method;
        this.desc = desc;
        this.sql = sql;
    }

    public String getMethod() {
        return this.method;
    }

    public String getDesc() {
        return this.desc;
    }

    public String getSql() {
        return this.sql;
    }
}
After working through MP's layers of wrapping and delegation, the underlying execution logic of insertBatch turns out to be doFlushStatements(boolean isRollback), located in org.apache.ibatis.executor.BatchExecutor.java:
@Override
public List<BatchResult> doFlushStatements(boolean isRollback) throws SQLException {
    try {
        // BatchResult is not the result set of the batch execution; it is MyBatis'
        // wrapper around the MappedStatement (analogous to JDBC's PreparedStatement),
        // the SQL string, and the input parameterObjects.
        List<BatchResult> results = new ArrayList<BatchResult>();
        // If the batch was rolled back, return an empty list.
        if (isRollback) {
            return Collections.emptyList();
        }
        // Iterate over the queued Statements
        for (int i = 0, n = statementList.size(); i < n; i++) {
            Statement stmt = statementList.get(i);
            applyTransactionTimeout(stmt);
            // The BatchResult wrapping this Statement, its SQL, and its parameters
            BatchResult batchResult = batchResultList.get(i);
            try {
                // Store the update counts returned by executeBatch in the updateCounts field.
                batchResult.setUpdateCounts(stmt.executeBatch());
                MappedStatement ms = batchResult.getMappedStatement();
                List<Object> parameterObjects = batchResult.getParameterObjects();
                // Fetch the KeyGenerator
                KeyGenerator keyGenerator = ms.getKeyGenerator();
                // For the JDBC Jdbc3KeyGenerator, processBatch copies the keys generated
                // by the executeBatch call above back into the parameter objects.
                if (Jdbc3KeyGenerator.class.equals(keyGenerator.getClass())) {
                    Jdbc3KeyGenerator jdbc3KeyGenerator = (Jdbc3KeyGenerator) keyGenerator;
                    jdbc3KeyGenerator.processBatch(ms, stmt, parameterObjects);
                } else if (!NoKeyGenerator.class.equals(keyGenerator.getClass())) { // issue #141
                    for (Object parameter : parameterObjects) {
                        keyGenerator.processAfter(this, ms, stmt, parameter);
                    }
                }
                // Close statement to close cursor #1109
                closeStatement(stmt);
            } catch (BatchUpdateException e) { // wrap the exception message
                StringBuilder message = new StringBuilder();
                message.append(batchResult.getMappedStatement().getId())
                        .append(" (batch index #")
                        .append(i + 1)
                        .append(")")
                        .append(" failed.");
                if (i > 0) {
                    message.append(" ")
                            .append(i)
                            .append(" prior sub executor(s) completed successfully, but will be rolled back.");
                }
                throw new BatchExecutorException(message.toString(), e, results, batchResult);
            }
            results.add(batchResult);
        }
        return results;
    } finally { // release the connection-related resources
        for (Statement stmt : statementList) {
            closeStatement(stmt);
        }
        currentSql = null;
        statementList.clear();
        batchResultList.clear();
    }
}
The code above shows that, at the bottom, it does call JDBC's stmt.executeBatch() method.
4. Tracing the JDBC executeBatch() source
Open the source of the MySQL driver used by this project (mysql-connector-java-8.0.16-sources.jar). In StatementImpl.java, the batch execution method executeBatch() looks like this:
@Override
public int[] executeBatch() throws SQLException {
    return Util.truncateAndConvertToInt(executeBatchInternal());
}
It in turn calls executeBatchInternal(), whose source is:
protected long[] executeBatchInternal() throws SQLException {
    JdbcConnection locallyScopedConn = checkClosed();

    synchronized (locallyScopedConn.getConnectionMutex()) {
        if (locallyScopedConn.isReadOnly()) {
            throw SQLError.createSQLException(Messages.getString("Statement.34") + Messages.getString("Statement.35"),
                    MysqlErrorNumbers.SQL_STATE_ILLEGAL_ARGUMENT, getExceptionInterceptor());
        }

        implicitlyCloseAllOpenResults();

        List<Object> batchedArgs = this.query.getBatchedArgs();

        if (batchedArgs == null || batchedArgs.size() == 0) {
            return new long[0];
        }

        // we timeout the entire batch, not individual statements
        int individualStatementTimeout = getTimeoutInMillis();
        setTimeoutInMillis(0);

        CancelQueryTask timeoutTask = null;

        try {
            resetCancelledState();

            statementBegins();

            try {
                this.retrieveGeneratedKeys = true; // The JDBC spec doesn't forbid this, but doesn't provide for it either...we do..

                long[] updateCounts = null;

                if (batchedArgs != null) {
                    int nbrCommands = batchedArgs.size();

                    this.batchedGeneratedKeys = new ArrayList<>(batchedArgs.size());

                    boolean multiQueriesEnabled = locallyScopedConn.getPropertySet().getBooleanProperty(PropertyKey.allowMultiQueries).getValue();

                    if (multiQueriesEnabled || (locallyScopedConn.getPropertySet().getBooleanProperty(PropertyKey.rewriteBatchedStatements).getValue()
                            && nbrCommands > 4)) {
                        return executeBatchUsingMultiQueries(multiQueriesEnabled, nbrCommands, individualStatementTimeout);
                    }

                    timeoutTask = startQueryTimer(this, individualStatementTimeout);

                    updateCounts = new long[nbrCommands];

                    for (int i = 0; i < nbrCommands; i++) {
                        updateCounts[i] = -3;
                    }

                    SQLException sqlEx = null;

                    int commandIndex = 0;

                    for (commandIndex = 0; commandIndex < nbrCommands; commandIndex++) {
                        try {
                            String sql = (String) batchedArgs.get(commandIndex);
                            updateCounts[commandIndex] = executeUpdateInternal(sql, true, true);

                            if (timeoutTask != null) {
                                // we need to check the cancel state on each iteration to generate timeout exception if needed
                                checkCancelTimeout();
                            }

                            // limit one generated key per OnDuplicateKey statement
                            getBatchedGeneratedKeys(this.results.getFirstCharOfQuery() == 'I' && containsOnDuplicateKeyInString(sql) ? 1 : 0);
                        } catch (SQLException ex) {
                            updateCounts[commandIndex] = EXECUTE_FAILED;

                            if (this.continueBatchOnError && !(ex instanceof MySQLTimeoutException) && !(ex instanceof MySQLStatementCancelledException)
                                    && !hasDeadlockOrTimeoutRolledBackTx(ex)) {
                                sqlEx = ex;
                            } else {
                                long[] newUpdateCounts = new long[commandIndex];

                                if (hasDeadlockOrTimeoutRolledBackTx(ex)) {
                                    for (int i = 0; i < newUpdateCounts.length; i++) {
                                        newUpdateCounts[i] = java.sql.Statement.EXECUTE_FAILED;
                                    }
                                } else {
                                    System.arraycopy(updateCounts, 0, newUpdateCounts, 0, commandIndex);
                                }

                                sqlEx = ex;
                                break;
                                // throw SQLError.createBatchUpdateException(ex, newUpdateCounts, getExceptionInterceptor());
                            }
                        }
                    }

                    if (sqlEx != null) {
                        throw SQLError.createBatchUpdateException(sqlEx, updateCounts, getExceptionInterceptor());
                    }
                }

                if (timeoutTask != null) {
                    stopQueryTimer(timeoutTask, true, true);
                    timeoutTask = null;
                }

                return (updateCounts != null) ? updateCounts : new long[0];
            } finally {
                this.query.getStatementExecuting().set(false);
            }
        } finally {
            stopQueryTimer(timeoutTask, false, false);
            resetCancelledState();
            setTimeoutInMillis(individualStatementTimeout);
            clearBatch();
        }
    }
}
The key code is:
if (multiQueriesEnabled || (locallyScopedConn.getPropertySet().getBooleanProperty(PropertyKey.rewriteBatchedStatements).getValue()
        && nbrCommands > 4)) {
    return executeBatchUsingMultiQueries(multiQueriesEnabled, nbrCommands, individualStatementTimeout);
}
The conditions for entering the if block and executing the batch method executeBatchUsingMultiQueries are:
①. multiQueriesEnabled = true
The corresponding key is defined as an enum value in PropertyKey.java:
allowMultiQueries("allowMultiQueries", true);
②. The rewriteBatchedStatements property of the connection (set via the database URL parameters) is true, AND the total number of batched commands is greater than 4.
It follows that setting either of the following on the JDBC connection URL:
A). &allowMultiQueries=true
B). &rewriteBatchedStatements=true
enables a genuinely batched insertBatch (option B additionally requires more than 4 batched statements); batch update and batch delete work the same way.
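Conceptually, what rewriteBatchedStatements does for inserts is rewrite the N queued single-row statements into one multi-row statement before sending it, so the whole batch costs one round trip. The helper below is not driver code, just my illustration of that transformation (class and method names are made up; the driver's actual output formatting may differ):

```java
import java.util.Collections;

public class RewriteIllustration {
    // Turns a single-row INSERT template plus a batch count into one
    // multi-row INSERT, mimicking the effect of rewriteBatchedStatements.
    static String rewrite(String singleRowInsert, int batchCount) {
        int valuesIdx = singleRowInsert.toUpperCase().lastIndexOf("VALUES");
        String head = singleRowInsert.substring(0, valuesIdx + "VALUES".length());
        String oneRow = singleRowInsert.substring(valuesIdx + "VALUES".length()).trim(); // e.g. "(?, ?)"
        return head + " " + String.join(", ", Collections.nCopies(batchCount, oneRow));
    }

    public static void main(String[] args) {
        String template = "INSERT INTO test_table (`day`, uv) VALUES (?, ?)";
        System.out.println(rewrite(template, 3));
        // prints: INSERT INTO test_table (`day`, uv) VALUES (?, ?), (?, ?), (?, ?)
    }
}
```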
A complete Spring Cloud + Druid connection-pool configuration with multiple data sources:
spring:
  profiles:
    name: test_multi_datasource
  aop:
    proxy-target-class: true
    auto: true
  datasource:
    druid:
      # DBMS 1
      db1:
        url: jdbc:mysql://IP:PORT/database_1?serverTimezone=Asia/Shanghai&useUnicode=true&characterEncoding=utf8&useSSL=false&rewriteBatchedStatements=true
        username: xxxx
        password: xxxx
        driver-class-name: com.mysql.cj.jdbc.Driver
        initialSize: 5
        minIdle: 5
        maxActive: 20
      # DBMS 2
      db2:
        url: jdbc:mysql://IP:PORT/database_2?serverTimezone=Asia/Shanghai&useUnicode=true&characterEncoding=utf8&useSSL=false&rewriteBatchedStatements=true
        username: yyyy
        password: yyyy
        driver-class-name: com.mysql.cj.jdbc.Driver
        initialSize: 5
        minIdle: 5
        maxActive: 20