A walkthrough of the Kafka JDBC Sink Connector source code

JdbcSinkConnector

public class JdbcSinkConnector extends SinkConnector {
  @Override
  public Class<? extends Task> taskClass() {
    // body elided; returns JdbcSinkTask.class
  }

  @Override
  public List<Map<String, String>> taskConfigs(int maxTasks) {
    // body elided; hands each task a copy of the startup config
  }

  @Override
  public void start(Map<String, String> props) {
    // body elided; receives the connector config shown below
  }

  @Override
  public void stop() {
    // body elided
  }

  @Override
  public ConfigDef config() {
    // body elided
  }

  @Override
  public Config validate(Map<String, String> connectorConfigs) {
    // body elided
  }

  @Override
  public String version() {
    // body elided
  }
}

start: a lifecycle method invoked when the connector starts; props holds the config parameters supplied when the connector instance was created, for example:

{
  "name": "jdbc-sink-debezium", 
  "config": {
    "connector.class": "io.confluent.connect.jdbc.JdbcSinkConnector", 
    "tasks.max": "1",
    "topics": "mysql02.debezium_test_db.person2",
    "table.name.format": "test.person",
    "connection.url": "jdbc:db2://192.168.84.136:50000/TEST", 
    "connection.user": "db2inst1", 
    "connection.password": "root1234", 
    "transforms": "unwrap",
    "transforms.unwrap.type": "io.debezium.transforms.ExtractNewRecordState",
    "transforms.unwrap.drop.tombstones": "false",
    "auto.create": "true",
    "auto.evolve": "true",
    "insert.mode": "insert",
    "delete.enabled": "true",
    "pk.fields": "id",
    "pk.mode": "record_key"
  }
}

taskClass: specifies the class that actually executes the work; it must be a subclass of SinkTask
taskConfigs: the configuration handed to each task object
The main job of this class is to designate JdbcSinkTask as the task class and to pass the startup parameters through to the task objects unchanged, as the sketch below illustrates.
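A minimal sketch of that pass-through, assuming taskConfigs simply replicates the config map saved by start (this mirrors the common Connect pattern rather than quoting the Confluent source verbatim):

public List<Map<String, String>> taskConfigs(int maxTasks) {
  // Every task receives the same map of startup parameters;
  // configProps is assumed to have been stashed by start(props).
  final List<Map<String, String>> taskConfigs = new ArrayList<>(maxTasks);
  for (int i = 0; i < maxTasks; ++i) {
    taskConfigs.add(configProps);
  }
  return taskConfigs;
}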

JdbcSinkTask

public class JdbcSinkTask extends SinkTask {

  @Override
  public void start(final Map<String, String> props) {
    // body elided; expanded below
  }

  @Override
  public void put(Collection<SinkRecord> records) {
    // body elided; expanded below
  }

  @Override
  public void stop() {
    // body elided
  }
}

start: props holds the startup parameters it receives, supplied by JdbcSinkConnector's taskConfigs
put: after Connect fetches data from Kafka, it wraps it in SinkRecord objects and invokes this put method
stop: a lifecycle method executed when the task shuts down

start

Let's look at the start method first:

@Override
public void start(final Map<String, String> props) {
  log.info("Starting JDBC Sink task");
  config = new JdbcSinkConfig(props);   //1
  initWriter();                         //2
  remainingRetries = config.maxRetries; //3
}

Code 1 wraps the incoming configuration in a JdbcSinkConfig object
Code 2 initializes the writer field
Code 3 reads maxRetries from the config and assigns it to remainingRetries; the settings behind this retry budget are shown below
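Both the retry budget at code 3 and the backoff used later in put come from standard JDBC sink settings that can be added to the connector config shown earlier; the values below are the connector's documented defaults:

  "max.retries": "10",
  "retry.backoff.ms": "3000"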

Stepping into the initWriter method:

void initWriter() {
  if (config.dialectName != null && !config.dialectName.trim().isEmpty()) { //1
    dialect = DatabaseDialects.create(config.dialectName, config);          //2
  } else {
    dialect = DatabaseDialects.findBestFor(config.connectionUrl, config);   //3
  }
  final DbStructure dbStructure = new DbStructure(dialect);                 //4
  log.info("Initializing writer using SQL dialect: {}", dialect.getClass().getSimpleName());
  writer = new JdbcDbWriter(config, dialect, dbStructure);                  //5
}

Code 1 checks whether a dialect (a database dialect) was specified explicitly, which determines whether code 2 or code 3 creates the concrete DatabaseDialect object; here we will assume it is Db2DatabaseDialect (see the config snippet after this list)
Code 4 creates a DbStructure object, which bundles methods concerned with database structure, such as creating missing tables and adding missing columns
Code 5 creates a JdbcDbWriter object, whose core write method is responsible for writing data to the DB
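The explicit branch at code 2 is driven by the sink's dialect.name setting; when it is absent, code 3 lets DatabaseDialects.findBestFor match the JDBC connection URL against the registered dialects instead. Pinning the dialect for the Db2 example would mean adding one entry to the connector config:

  "dialect.name": "Db2DatabaseDialect",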

put

Now for the put method:

@Override
  public void put(Collection<SinkRecord> records) {
    if (records.isEmpty()) {
      return;
    }
    final SinkRecord first = records.iterator().next();
    final int recordsCount = records.size();
    log.debug(
        "Received {} records. First record kafka coordinates:({}-{}-{}). Writing them to the "
        + "database...",
        recordsCount, first.topic(), first.kafkaPartition(), first.kafkaOffset()
    );
    try {
      writer.write(records); //1
    } catch (SQLException sqle) {
      log.warn(
          "Write of {} records failed, remainingRetries={}",
          records.size(),
          remainingRetries,
          sqle
      );
      String sqleAllMessages = "";
      for (Throwable e : sqle) {
        sqleAllMessages += e + System.lineSeparator();
      }
      if (remainingRetries == 0) {
        throw new ConnectException(new SQLException(sqleAllMessages));
      } else {
        writer.closeQuietly();
        initWriter();
        remainingRetries--;
        context.timeout(config.retryBackoffMs);
        throw new RetriableException(new SQLException(sqleAllMessages));
      }
    }
    remainingRetries = config.maxRetries;
  }

Focusing on the happy path first and setting the validation and error handling aside, only code 1 remains: it hands the collection of SinkRecord objects to JdbcDbWriter's write method to be written to the database. The catch block is worth a glance too: on an SQLException, if retries remain, the task rebuilds the writer, decrements remainingRetries, requests a pause via context.timeout(config.retryBackoffMs), and throws a RetriableException so that Connect redelivers the batch; once the retries are exhausted it gives up with a ConnectException.

Stepping into JdbcDbWriter's write method:

void write(final Collection<SinkRecord> records) throws SQLException {
    final Connection connection = cachedConnectionProvider.getConnection(); //1

    final Map<TableId, BufferedRecords> bufferByTable = new HashMap<>();
    for (SinkRecord record : records) { 
      final TableId tableId = destinationTable(record.topic()); //2
      BufferedRecords buffer = bufferByTable.get(tableId); //3
      if (buffer == null) {
        buffer = new BufferedRecords(config, tableId, dbDialect, dbStructure, connection);//4
        bufferByTable.put(tableId, buffer);//5
      }
      buffer.add(record);//6
    }
    for (Map.Entry<TableId, BufferedRecords> entry : bufferByTable.entrySet()) {
      TableId tableId = entry.getKey();
      BufferedRecords buffer = entry.getValue();
      log.debug("Flushing records in JDBC Writer for table ID: {}", tableId);
      buffer.flush();//7
      buffer.close();
    }
    connection.commit();
  }

Code 1 obtains a database connection through the CachedConnectionProvider object, which is initialized in JdbcDbWriter's constructor
Code 2 resolves the name of the destination table for the current record from its topic (see the sketch after this list)
Codes 3, 4, 5 and 6 together group the incoming SinkRecord collection so that all records targeting the same table are gathered into a single BufferedRecords object
Code 7 is where the SinkRecords for a given table are parsed and written to the database
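A minimal sketch of the topic-to-table mapping at code 2, assuming destinationTable applies the table.name.format setting as documented: any ${topic} placeholder in the format string is replaced with the record's topic, so the example config at the top (table.name.format = test.person, with no placeholder) pins every topic to the same test.person table:

TableId destinationTable(String topic) {
  // With "test.person" the replace below is a no-op; a format such as
  // "kafka_${topic}" would instead yield one table per topic.
  final String tableName = config.tableNameFormat.replace("${topic}", topic);
  if (tableName.isEmpty()) {
    throw new ConnectException("Destination table name for topic '" + topic + "' is empty");
  }
  return dbDialect.parseTableIdentifier(tableName);
}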

First, let's see what the add method at code 6 does:

  public List<SinkRecord> add(SinkRecord record) throws SQLException {
    final List<SinkRecord> flushed = new ArrayList<>();

    boolean schemaChanged = false;
    if (!Objects.equals(keySchema, record.keySchema())) {//1
      keySchema = record.keySchema();//2
      schemaChanged = true;//3
    }
    if (isNull(record.valueSchema())) {//4
      // For deletes, both the value and value schema come in as null.
      // We don't want to treat this as a schema change if key schemas is the same
      // otherwise we flush unnecessarily.
      if (config.deleteEnabled) {//5
        deletesInBatch = true;//6
      }
    } else if (Objects.equals(valueSchema, record.valueSchema())) {//7
      if (config.deleteEnabled && deletesInBatch) {//8
        // flush so an insert after a delete of same record isn't lost
        flushed.addAll(flush());//9
      }
    } else {
      // value schema is not null and has changed. This is a real schema change.
      valueSchema = record.valueSchema();//10
      schemaChanged = true;//11
    }

    if (schemaChanged) {//12
      // Each batch needs to have the same schemas, so get the buffered records out
      flushed.addAll(flush());

      // re-initialize everything that depends on the record schema
      final SchemaPair schemaPair = new SchemaPair(
          record.keySchema(),
          record.valueSchema()
      );
      fieldsMetadata = FieldsMetadata.extract(
          tableId.tableName(),
          config.pkMode,
          config.pkFields,
          config.fieldsWhitelist,
          schemaPair
      );
      dbStructure.createOrAmendIfNecessary(
          config,
          connection,
          tableId,
          fieldsMetadata
      );
      final String insertSql = getInsertSql();
      final String deleteSql = getDeleteSql();
      log.debug(
          "{} sql: {} deleteSql: {} meta: {}",
          config.insertMode,
          insertSql,
          deleteSql,
          fieldsMetadata
      );
      close();
      updatePreparedStatement = dbDialect.createPreparedStatement(connection, insertSql);
      updateStatementBinder = dbDialect.statementBinder(
          updatePreparedStatement,
          config.pkMode,
          schemaPair,
          fieldsMetadata,
          config.insertMode
      );
      if (config.deleteEnabled && nonNull(deleteSql)) {
        deletePreparedStatement = dbDialect.createPreparedStatement(connection, deleteSql);
        deleteStatementBinder = dbDialect.statementBinder(
            deletePreparedStatement,
            config.pkMode,
            schemaPair,
            fieldsMetadata,
            config.insertMode
        );
      }
    }
    records.add(record);//13

    if (records.size() >= config.batchSize) {
      flushed.addAll(flush());
    }
    return flushed;
  }

Codes 1, 2 and 3 compare this buffer's keySchema with the incoming record's keySchema; the keySchema can roughly be read as the table's primary-key definition, so this part means: when the primary-key definition changes, schemaChanged is set to true
Codes 4, 5 and 6: when the incoming record's valueSchema is null and deleteEnabled is true, the deletesInBatch flag is set to true, meaning a row will later be deleted based on this flag
Codes 7, 8 and 9: the valueSchema is unchanged (read: the table structure has not changed) but there are still unflushed delete records, so a flush is performed first; otherwise an insert following a delete of the same row within one batch could be lost
Codes 10 and 11: reaching this else branch means the valueSchema is non-null yet differs from the previous one, i.e. the table structure has changed, for example a column was added
Code 12: schemaChanged is only true when the schema changed, and the body of this if branch roughly does the following: flush the buffer, create the missing table or columns, and rebuild updateStatementBinder and deleteStatementBinder from freshly generated SQL (an illustration of that SQL follows this list)
Code 13 appends the record to this BufferedRecords object's SinkRecord list, which is consumed by the flush method
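To make the statements regenerated at code 12 concrete: with the example configuration from the top (insert.mode = insert, pk.mode = record_key, pk.fields = id, table.name.format = test.person), getInsertSql() and getDeleteSql() would yield statements roughly of the shape below. The exact quoting and column order are dialect-specific, and the name column is a purely hypothetical value field for illustration:

INSERT INTO "test"."person" ("id", "name") VALUES (?, ?)
DELETE FROM "test"."person" WHERE "id" = ?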

Next, BufferedRecords' flush method:

  public List<SinkRecord> flush() throws SQLException {
    if (records.isEmpty()) {
      log.debug("Records is empty");
      return new ArrayList<>();
    }
    log.debug("Flushing {} buffered records", records.size());
    for (SinkRecord record : records) {//1
      if (isNull(record.value()) && nonNull(deleteStatementBinder)) {//2
        deleteStatementBinder.bindRecord(record);//3
      } else {
        updateStatementBinder.bindRecord(record);//4
      }
    }
    Optional<Long> totalUpdateCount = executeUpdates();//5
    long totalDeleteCount = executeDeletes();//6

    final long expectedCount = updateRecordCount();
    log.trace("{} records:{} resulting in totalUpdateCount:{} totalDeleteCount:{}",
        config.insertMode, records.size(), totalUpdateCount, totalDeleteCount
    );
    if (totalUpdateCount.filter(total -> total != expectedCount).isPresent()
        && config.insertMode == INSERT) {
      throw new ConnectException(String.format(
          "Update count (%d) did not sum up to total number of records inserted (%d)",
          totalUpdateCount.get(),
          expectedCount
      ));
    }
    if (!totalUpdateCount.isPresent()) {
      log.info(
          "{} records:{} , but no count of the number of rows it affected is available",
          config.insertMode,
          records.size()
      );
    }

    final List<SinkRecord> flushedRecords = records;
    records = new ArrayList<>();
    deletesInBatch = false;
    return flushedRecords;
  }

Codes 1, 2, 3 and 4 iterate over the SinkRecord list and bind each record either to the delete statement via deleteStatementBinder (when its value is null) or to the update statement via updateStatementBinder; internally the binder parses the record's key and value and fills them into the statement's placeholders
Codes 5 and 6 call executeBatch on the respective prepared statements; this is where the SQL is actually executed, as sketched below
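A minimal sketch of what code 5 boils down to, assuming executeUpdates follows the usual JDBC batching pattern: sum the per-record counts returned by executeBatch, and report no count at all when the driver answers Statement.SUCCESS_NO_INFO:

private Optional<Long> executeUpdates() throws SQLException {
  Optional<Long> count = Optional.empty();
  // executeBatch returns one update count per record bound above
  for (int updateCount : updatePreparedStatement.executeBatch()) {
    if (updateCount != Statement.SUCCESS_NO_INFO) {
      count = count.isPresent()
          ? count.map(total -> total + updateCount)
          : Optional.of((long) updateCount);
    }
  }
  return count;
}

An absent count is legitimate, which is why flush merely logs in that case instead of failing; the strict count check is applied only in INSERT mode.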
