[Big Data Platform] Exploring the Confluent-based Kafka REST API (Part 4)

Kafka REST API: Specifying a Partition
Kafka Message Partitioning Rules

Let's start by stepping into the send method of KafkaProducer:

@Override
    public Future<RecordMetadata> send(ProducerRecord<K, V> record, Callback callback) {
        // intercept the record, which can be potentially modified; this method does not throw exceptions
        ProducerRecord<K, V> interceptedRecord = this.interceptors.onSend(record);
        return doSend(interceptedRecord, callback);
    }

Next, step into the doSend method:

/**
     * Implementation of asynchronously send a record to a topic.
     */
    private Future<RecordMetadata> doSend(ProducerRecord<K, V> record, Callback callback) {
        TopicPartition tp = null;
        try {
            ... ....
            byte[] serializedKey;
            try {
                serializedKey = keySerializer.serialize(record.topic(), record.headers(), record.key());
            } catch (ClassCastException cce) {
                throw new SerializationException("Can't convert key of class " + record.key().getClass().getName() +
                        " to class " + producerConfig.getClass(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG).getName() +
                        " specified in key.serializer", cce);
            }
            ... ...
            int partition = partition(record, serializedKey, serializedValue, cluster);
            ... ...
    }

The partition method it calls:

private int partition(ProducerRecord<K, V> record, byte[] serializedKey, byte[] serializedValue, Cluster cluster) {
        Integer partition = record.partition();
        return partition != null ?
                partition :
                partitioner.partition(
                        record.topic(), record.key(), serializedKey, record.value(), serializedValue, cluster);
    }

The partitioner used here is an instance of the following class:

org.apache.kafka.clients.producer.internals.DefaultPartitioner

public int partition(String topic, Object key, byte[] keyBytes, Object value, byte[] valueBytes, Cluster cluster) {
        List<PartitionInfo> partitions = cluster.partitionsForTopic(topic);
        int numPartitions = partitions.size();
        if (keyBytes == null) {
            int nextValue = nextValue(topic);
            List<PartitionInfo> availablePartitions = cluster.availablePartitionsForTopic(topic);
            if (availablePartitions.size() > 0) {
                int part = Utils.toPositive(nextValue) % availablePartitions.size();
                return availablePartitions.get(part).partition();
            } else {
                // no partitions are available, give a non-available partition
                return Utils.toPositive(nextValue) % numPartitions;
            }
        } else {
            // hash the keyBytes to choose a partition
            return Utils.toPositive(Utils.murmur2(keyBytes)) % numPartitions;
        }
    }

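As the code shows, when no key is specified the partition is derived from nextValue(topic) modulo the number of available partitions. nextValue is not shown here; in this version of the client it is a per-topic counter that starts at a random value and is incremented on every call, so keyless records are effectively spread round-robin from a random starting point rather than picked purely at random. When a key is specified, the key is run through MurmurHash2 and the result is taken modulo the total number of partitions. Note that the key being hashed is not the key object you passed when sending the message. As doSend shows, what gets hashed is the return value of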

keySerializer.serialize(record.topic(), record.headers(), record.key());

In the Serializer interface, this overload is in fact just:

default byte[] serialize(String topic, Headers headers, T data) {
        return serialize(topic, data);
    }

Here are a few of its implementations:

public class ShortSerializer implements Serializer<Short> {
    public byte[] serialize(String topic, Short data) {
        if (data == null)
            return null;

        return new byte[] {
            (byte) (data >>> 8),
            data.byteValue()
        };
    }
}

public class LongSerializer implements Serializer<Long> {
    public byte[] serialize(String topic, Long data) {
        if (data == null)
            return null;

        return new byte[] {
            (byte) (data >>> 56),
            (byte) (data >>> 48),
            (byte) (data >>> 40),
            (byte) (data >>> 32),
            (byte) (data >>> 24),
            (byte) (data >>> 16),
            (byte) (data >>> 8),
            data.byteValue()
        };
    }
}

public class StringSerializer implements Serializer<String> {
    private String encoding = "UTF8";

    ... ...

    @Override
    public byte[] serialize(String topic, String data) {
        try {
            if (data == null)
                return null;
            else
                return data.getBytes(encoding);
        } catch (UnsupportedEncodingException e) {
            throw new SerializationException("Error when serializing string to byte[] due to unsupported encoding " + encoding);
        }
    }
}

public class ByteArraySerializer implements Serializer<byte[]> {
    @Override
    public byte[] serialize(String topic, byte[] data) {
        return data;
    }
}

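So the partition is chosen by hashing the serialized bytes of the key and taking the result modulo the number of partitions. The same logical key can therefore land in different partitions depending on which serializer produced the bytes.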
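To make the two branches concrete, here is a minimal, self-contained sketch of the same selection logic. It is not the Kafka source: PartitionSketch and its method names are made up for illustration, murmur2 is re-implemented here on the assumption that it matches Utils.murmur2 (seed 0x9747b28c), and the keyless branch uses a plain counter starting at 0 and ignores partition availability.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.atomic.AtomicInteger;

/** Illustrative sketch only; class and method names are hypothetical. */
class PartitionSketch {

    // Assumption: a simple counter starting at 0 (the real client is assumed to start at a random value).
    private static final AtomicInteger COUNTER = new AtomicInteger(0);

    /** 32-bit MurmurHash2, written to mirror what Utils.murmur2 is assumed to do. */
    static int murmur2(byte[] data) {
        final int length = data.length;
        final int seed = 0x9747b28c;
        final int m = 0x5bd1e995;
        final int r = 24;
        int h = seed ^ length;
        // Mix the input four bytes at a time (little-endian).
        for (int i = 0; i < length / 4; i++) {
            final int i4 = i * 4;
            int k = (data[i4] & 0xff)
                    + ((data[i4 + 1] & 0xff) << 8)
                    + ((data[i4 + 2] & 0xff) << 16)
                    + ((data[i4 + 3] & 0xff) << 24);
            k *= m;
            k ^= k >>> r;
            k *= m;
            h *= m;
            h ^= k;
        }
        // Mix in the remaining 1 to 3 bytes; the cases fall through on purpose.
        final int tail = length & ~3;
        switch (length % 4) {
            case 3:
                h ^= (data[tail + 2] & 0xff) << 16;
            case 2:
                h ^= (data[tail + 1] & 0xff) << 8;
            case 1:
                h ^= data[tail] & 0xff;
                h *= m;
        }
        h ^= h >>> 13;
        h *= m;
        h ^= h >>> 15;
        return h;
    }

    /** Clears the sign bit so the modulo below is never negative. */
    static int toPositive(int number) {
        return number & 0x7fffffff;
    }

    /** Keyed records: hash of the serialized key modulo the partition count. */
    static int partitionForKey(byte[] keyBytes, int numPartitions) {
        return toPositive(murmur2(keyBytes)) % numPartitions;
    }

    /** Keyless records: step a counter through the partitions. */
    static int partitionForNullKey(int numPartitions) {
        return toPositive(COUNTER.getAndIncrement()) % numPartitions;
    }

    public static void main(String[] args) {
        // The same number serialized two ways gives different bytes.
        byte[] asString = "33669988".getBytes(StandardCharsets.UTF_8); // like StringSerializer
        byte[] asLong = ByteBuffer.allocate(8).putLong(33669988L).array(); // like LongSerializer
        System.out.println("string key -> partition " + partitionForKey(asString, 3));
        System.out.println("long key   -> partition " + partitionForKey(asLong, 3));
    }
}
```

Running main prints the partition for the same number serialized as a string and as a long. Because the byte arrays differ, the two results need not agree, which is exactly why the serializer matters for key-based routing.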

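Specifying a Partition in Kafka REST Proxy

Take a look at the Kafka REST Proxy source:

https://github.com/confluentinc/kafka-rest

In the kafka-rest project you will find the two classes introduced earlier in this series, ProducerPool and ProduceTask. The produce method of ProducerPool is what sends the messages.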

public <K, V> void produce(
      String topic,
      Integer partition,
      EmbeddedFormat recordFormat,
      SchemaHolder schemaHolder,
      Collection<? extends ProduceRecord<K, V>> records,
      ProduceRequestCallback callback
  ) {
    ProduceTask task = new ProduceTask(schemaHolder, records.size(), callback);
    log.trace("Starting produce task " + task.toString());
    RestProducer restProducer = producers.get(recordFormat);
    restProducer.produce(task, topic, partition, records);
  }

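restProducer.produce is a method defined by the RestProducer interface. Taking the simplest case (messages without a schema) as an example, the implementation of RestProducer is NoSchemaRestProducer, whose produce method looks like this: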

@Override
  public void produce(
      ProduceTask task,
      String topic,
      Integer partition,
      Collection<? extends ProduceRecord<K, V>> produceRecords
  ) {
    for (ProduceRecord<K, V> record : produceRecords) {
      Integer recordPartition = partition;
      if (recordPartition == null) {
        recordPartition = record.partition();
      }
      producer.send(
          new ProducerRecord(topic, recordPartition, record.getKey(), record.getValue()),
          task.createCallback()
      );
    }
  }

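The producer here is a KafkaProducer, so from this point on the call follows the send path traced earlier and needs no further explanation.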
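The precedence in that loop can be summarised in a few lines. The sketch below is only an illustration: PartitionPrecedenceSketch and resolve are hypothetical names, and where the request-level partition comes from (presumably the partition given for the request as a whole) is an assumption rather than something shown in the code above.

```java
/** Illustrative sketch only; class and method names are hypothetical. */
class PartitionPrecedenceSketch {

    /**
     * Mirrors the loop body above: the partition passed to produce() wins,
     * otherwise the record's own partition is used. A null result means
     * no partition was given, so the producer's partitioner decides.
     */
    static Integer resolve(Integer requestPartition, Integer recordPartition) {
        return requestPartition != null ? requestPartition : recordPartition;
    }

    public static void main(String[] args) {
        System.out.println(resolve(2, 1));       // request-level partition takes precedence
        System.out.println(resolve(null, 1));    // falls back to the record's partition
        System.out.println(resolve(null, null)); // left to the partitioner
    }
}
```

When the result is null, the ProducerRecord carries no partition and KafkaProducer falls back to the partitioner described in the previous section.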

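That said, since the Kafka REST API supports sending a message to a specific partition, I can define my own partitioning rules much as a Java client would by implementing the Partitioner interface: decide the partition up front according to the business scenario, and write it into the JSON of the request.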
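As a rough sketch of that idea, the snippet below picks a partition with a made-up business rule and writes it into a request body. Everything named here is an assumption for illustration: the region-based rule, the class PartitionedRequestSketch, and its methods are hypothetical, and the body layout assumes the REST Proxy v2 binary format, in which each record may carry a partition field and keys and values are Base64-encoded.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

/** Illustrative sketch only; the routing rule and all names are hypothetical. */
class PartitionedRequestSketch {

    /** Hypothetical business rule for a 3-partition topic: route by region. */
    static int choosePartition(String region) {
        switch (region) {
            case "north":
                return 0;
            case "south":
                return 1;
            default:
                return 2;
        }
    }

    /** Base64-encodes a string, as the binary embedded format is assumed to require. */
    static String b64(String s) {
        return Base64.getEncoder().encodeToString(s.getBytes(StandardCharsets.UTF_8));
    }

    /** Builds a single-record request body that pins the record to a partition. */
    static String buildBody(String key, String value, int partition) {
        return "{\"records\":[{"
                + "\"key\":\"" + b64(key) + "\","
                + "\"value\":\"" + b64(value) + "\","
                + "\"partition\":" + partition
                + "}]}";
    }

    public static void main(String[] args) {
        int partition = choosePartition("south");
        System.out.println(buildBody("order-1", "payload", partition));
    }
}
```

The resulting string would be sent as the POST body to the topic's produce endpoint with the binary v2 content type. With this approach the key no longer influences placement, so the rule in choosePartition has to be kept consistent across every client that writes to the topic.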

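Test

The test uses the topic rest_test2, which has 3 partitions.

I first used Postman to check which partition each of the following keys is assigned to:

KEY = 33669988: assigned to Partition 0 in the Postman test

KEY = 15935725: assigned to Partition 1 in the Postman test

KEY = 13572468: assigned to Partition 2 in the Postman test

Then, for each of these three keys, I sent 1000 records. After every send, the program checks whether the partition in the response matches the one observed in Postman, counts the matches, and prints the totals at the end.

Code: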

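import java.util.Random;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// sendMessage, MD5Util and HttpClientPoolTool are the author's own helpers; their source is not shown in this post.
public class TestPartition {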

    public static int DATA_SIZE = 1000;
    public static String REST_HOST = "xxx";
    public static int REST_PORT = 8085;
    public static String TOPIC = "rest_test2";
    public static String CONTENT_TYPE = "application/vnd.kafka.binary.v2+json";
    public static String ENCODE = "utf-8";
    public static String KEY_IN_PARTITION_0 = "33669988";
    public static String KEY_IN_PARTITION_1 = "15935725";
    public static String KEY_IN_PARTITION_2 = "13572468";

    public static void main(String[] args) {
        Random random = new Random();

        int count0 = 0;
        int count1 = 0;
        int count2 = 0;

        for(int i=0; i<DATA_SIZE; i++){
            // Send one record
            String response = sendMessage(
                    REST_HOST,
                    REST_PORT,
                    TOPIC,
                    KEY_IN_PARTITION_0,
                    MD5Util.encrypt(
                    String.valueOf(
                            random.nextInt(999999)
                    )),
                    CONTENT_TYPE,
                    ENCODE);
            // Check that it landed in Partition 0
            if(0==getPartition(response)){
                count0++;
            }else {
                System.out.println("Need 0 but "+getPartition(response));
            }
        }

        for(int i=0; i<DATA_SIZE/2; i++){
            String response = sendMessage(
                    REST_HOST,
                    REST_PORT,
                    TOPIC,
                    KEY_IN_PARTITION_1,
                    MD5Util.encrypt(
                            String.valueOf(
                                    random.nextInt(999999)
                            )),
                    CONTENT_TYPE,
                    ENCODE);
            if(1==getPartition(response)){
                count1++;
            }else {
                System.out.println("Need 1 but "+getPartition(response));
            }
        }

        for(int i=0; i<DATA_SIZE/5; i++){
            String response = sendMessage(
                    REST_HOST,
                    REST_PORT,
                    TOPIC,
                    KEY_IN_PARTITION_2,
                    MD5Util.encrypt(
                            String.valueOf(
                                    random.nextInt(999999)
                            )),
                    CONTENT_TYPE,
                    ENCODE);
            if(2==getPartition(response)){
                count2++;
            }else {
                System.out.println("Need 2 but "+getPartition(response));
            }
        }

        System.out.println(count0);
        System.out.println(count1);
        System.out.println(count2);

        HttpClientPoolTool.closeConnectionPool();
    }

    public static int getPartition(String response){
        try {
            Pattern pattern = Pattern.compile("(?<=(\"partition\":)).*(?=(,\"offset\"))");
            Matcher matcher = pattern.matcher(response);
            if (matcher.find()) {
                return Integer.parseInt(matcher.group(0).trim());
            }
            return -1;
        }catch (Exception e){
            System.out.println(response);
            e.printStackTrace();
        }
        return -1;
    }
}

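With 1000 records per key, all three counts came out as 1000. I then repeated the test with unequal volumes for the three keys, sending 1000, 500, and 200 records respectively (this is the variant shown in the code above), and the outcome was the same: records with the same key always end up in the same partition.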
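The getPartition helper in the test code depends on "partition" being immediately followed by ,"offset" in the response. A slightly more tolerant variant is sketched below. It is only an illustration: ResponsePartitionSketch and firstPartition are hypothetical names, and the sample response in main is an assumed shape based on the fields the original regex looks for, not a captured response.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Illustrative sketch only; class and method names are hypothetical. */
class ResponsePartitionSketch {

    // Matches "partition": followed by an integer, allowing optional whitespace.
    private static final Pattern PARTITION =
            Pattern.compile("\"partition\"\\s*:\\s*(-?\\d+)");

    /** Returns the first partition number found in the response, or -1 if there is none. */
    static int firstPartition(String response) {
        if (response == null) {
            return -1;
        }
        Matcher matcher = PARTITION.matcher(response);
        return matcher.find() ? Integer.parseInt(matcher.group(1)) : -1;
    }

    public static void main(String[] args) {
        // Assumed response shape, for illustration only.
        String sample = "{\"offsets\":[{\"partition\":2,\"offset\":100}]}";
        System.out.println(firstPartition(sample));
    }
}
```

For anything beyond a quick test, parsing the response with a JSON library would be the more robust choice, since a regex only ever sees the first record and cannot distinguish fields by their position in the document.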
