Containerizing common middleware on Kubernetes: Kafka

kafka

Kafka is an open-source stream-processing platform developed by the Apache Software Foundation, written in Scala and Java. The goal of the project is to provide a unified, high-throughput, low-latency platform for handling real-time data feeds. Its persistence layer is essentially "a massively scalable pub/sub message queue architected as a distributed transaction log",[3] which makes it very valuable as enterprise-grade infrastructure for processing streaming data. In addition, Kafka can connect to external systems (for data import/export) via Kafka Connect, and ships with Kafka Streams, a Java stream-processing library.

Containerization steps

Building the Kafka image

The first step is to build a Kafka base image. You could of course use one of the ready-made Kafka images on Docker Hub, but then you first have to learn how that image is configured, and the Kafka version you want may not even have a corresponding image. So here we start from scratch and build an image for whichever Kafka version we need.
Two ways to build the Kafka image

Method 1

Go to the official Kafka site and download the release you want, for example kafka_2.11-1.1.1.tgz. Once the download is complete, start writing the Dockerfile. Before doing so, check which JDK version the release targets (on the Kafka site, click Documentation and look for the Java version section), then use a matching JDK image as the base image.

FROM mcr.microsoft.com/java/jdk:8-zulu-alpine

LABEL MAINTAINER="[email protected]"

ENV KAFKA_VERSION="1.1.1" SCALA_VERSION="2.11"

ADD kafka_2.11-1.1.1.tgz /opt
# Adjust the JVM heap size used by the Kafka startup script
RUN sed -i 's/-Xmx1G -Xms1G/-Xmx4G -Xms4G/g' /opt/kafka_2.11-1.1.1/bin/kafka-server-start.sh

VOLUME ["/kafka"]

ENV KAFKA_HOME /opt/kafka_${SCALA_VERSION}-${KAFKA_VERSION}

ENV PATH=${PATH}:${KAFKA_HOME}/bin

# 9092 is the port the broker listens on, 5555 is the JMX port
EXPOSE 9092 5555
# No CMD or ENTRYPOINT is given on purpose: the start command is supplied at deployment time. Add your own if you prefer.
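
With the tgz and the Dockerfile in the same directory, the image can be built locally. The tag below is only an example, but it matches the image name referenced later in the StatefulSet:

docker build -t kafka:2.11-1.1.1 .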

Method 2

Download and build directly inside the Dockerfile.

FROM mcr.microsoft.com/java/jdk:8-zulu-alpine

LABEL MAINTAINER="[email protected]"

RUN apk add --update unzip wget curl

ENV KAFKA_VERSION="1.1.1" SCALA_VERSION="2.11"

RUN wget -q https://archive.apache.org/dist/kafka/${KAFKA_VERSION}/kafka_${SCALA_VERSION}-${KAFKA_VERSION}.tgz  -O /tmp/kafka_${SCALA_VERSION}-${KAFKA_VERSION}.tgz \
    && tar xfz /tmp/kafka_${SCALA_VERSION}-${KAFKA_VERSION}.tgz -C /opt && rm /tmp/kafka_${SCALA_VERSION}-${KAFKA_VERSION}.tgz \
    && sed -i 's/-Xmx1G -Xms1G/-Xmx4G -Xms4G/g' /opt/kafka_${SCALA_VERSION}-${KAFKA_VERSION}/bin/kafka-server-start.sh

VOLUME ["/kafka"]

ENV KAFKA_HOME /opt/kafka_${SCALA_VERSION}-${KAFKA_VERSION}

ENV PATH=${PATH}:${KAFKA_HOME}/bin

# 9092 is the port the broker listens on, 5555 is the JMX port
EXPOSE 9092 5555
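
This variant needs no local tgz, since the archive is fetched during the build. A quick way to check the result is to list the Kafka scripts baked into the image (the tag is illustrative):

docker build -t kafka:2.11-1.1.1 .
docker run --rm kafka:2.11-1.1.1 ls /opt/kafka_2.11-1.1.1/bin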

With that, the Kafka base image is ready. The next step is to deploy a Kafka cluster on Kubernetes.

Deploying the Kafka cluster

Before deploying, we need to understand how a Kafka cluster is assembled. Setting one up by hand is fairly simple (see the official documentation), but here we try to keep things flexible so that many Kafka parameters can be changed dynamically: the Kafka configuration file is mounted from a ConfigMap as a template with placeholders, and an init container renders the final configuration at startup.

headless service

apiVersion: v1
kind: Service
metadata:
  labels:
    app.kubernetes.io/name: kafka
    app.kubernetes.io/instance: kafka-demo
    app.kubernetes.io/version: 1.1.1
  name: kafka-demo
  namespace: default
spec:
  clusterIP: None
  ports:
  - name: broker
    port: 9092
    protocol: TCP
    targetPort: 9092
  selector:
    app.kubernetes.io/instance: kafka-demo
    app.kubernetes.io/version: 1.1.1
  type: ClusterIP
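
The headless Service (clusterIP: None) is what gives each broker pod a stable DNS record of the form kafka-demo-<ordinal>.kafka-demo.default.svc.cluster.local. Once the StatefulSet below is running, you can confirm that the brokers are registered as its endpoints (the file name is illustrative):

kubectl apply -f kafka-headless-service.yaml
kubectl get endpoints kafka-demo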

configmap

apiVersion: v1
kind: ConfigMap
metadata:
  labels:
    app.kubernetes.io/name: kafka
    app.kubernetes.io/instance: kafka-demo
    app.kubernetes.io/version: 1.1.1
  name: kafka-config
  namespace: default
data:
  init.sh: |-
    #!/bin/bash
    set -x

    cp /etc/kafka-configmap/log4j.properties /etc/kafka/

    # Derive the broker id from the StatefulSet pod ordinal (e.g. kafka-demo-0 -> 0)
    KAFKA_BROKER_ID=${HOSTNAME##*-}

    # ZooKeeper connection string, supplied through the init container's environment
    ZOOKEEPER=${ZOOKEEPER}

    # Render the broker id into the templated server.properties
    sed "s/#init#broker.id=#init#/broker.id=$KAFKA_BROKER_ID/" /etc/kafka-configmap/server.properties > /etc/kafka/server.properties.tmp

    # Render the ZooKeeper connection info ("|" as delimiter so a chroot path in $ZOOKEEPER cannot break the expression)
    sed -i "s|#init#zookeeper.connect=#init#|zookeeper.connect=$ZOOKEEPER|" /etc/kafka/server.properties.tmp

    [ $? -eq 0 ] && mv /etc/kafka/server.properties.tmp /etc/kafka/server.properties
  log4j.properties: |-
    log4j.rootLogger=INFO, stdout

    log4j.appender.stdout=org.apache.log4j.ConsoleAppender
    log4j.appender.stdout.layout=org.apache.log4j.PatternLayout
    log4j.appender.stdout.layout.ConversionPattern=[%d] %p %m (%c)%n

    log4j.appender.kafkaAppender=org.apache.log4j.DailyRollingFileAppender
    log4j.appender.kafkaAppender.DatePattern='.'yyyy-MM-dd-HH
    log4j.appender.kafkaAppender.File=${kafka.logs.dir}/server.log
    log4j.appender.kafkaAppender.layout=org.apache.log4j.PatternLayout
    log4j.appender.kafkaAppender.layout.ConversionPattern=[%d] %p %m (%c)%n

    log4j.appender.stateChangeAppender=org.apache.log4j.DailyRollingFileAppender
    log4j.appender.stateChangeAppender.DatePattern='.'yyyy-MM-dd-HH
    log4j.appender.stateChangeAppender.File=${kafka.logs.dir}/state-change.log
    log4j.appender.stateChangeAppender.layout=org.apache.log4j.PatternLayout
    log4j.appender.stateChangeAppender.layout.ConversionPattern=[%d] %p %m (%c)%n

    log4j.appender.requestAppender=org.apache.log4j.DailyRollingFileAppender
    log4j.appender.requestAppender.DatePattern='.'yyyy-MM-dd-HH
    log4j.appender.requestAppender.File=${kafka.logs.dir}/kafka-request.log
    log4j.appender.requestAppender.layout=org.apache.log4j.PatternLayout
    log4j.appender.requestAppender.layout.ConversionPattern=[%d] %p %m (%c)%n

    log4j.appender.cleanerAppender=org.apache.log4j.DailyRollingFileAppender
    log4j.appender.cleanerAppender.DatePattern='.'yyyy-MM-dd-HH
    log4j.appender.cleanerAppender.File=${kafka.logs.dir}/log-cleaner.log
    log4j.appender.cleanerAppender.layout=org.apache.log4j.PatternLayout
    log4j.appender.cleanerAppender.layout.ConversionPattern=[%d] %p %m (%c)%n

    log4j.appender.controllerAppender=org.apache.log4j.DailyRollingFileAppender
    log4j.appender.controllerAppender.DatePattern='.'yyyy-MM-dd-HH
    log4j.appender.controllerAppender.File=${kafka.logs.dir}/controller.log
    log4j.appender.controllerAppender.layout=org.apache.log4j.PatternLayout
    log4j.appender.controllerAppender.layout.ConversionPattern=[%d] %p %m (%c)%n

    log4j.appender.authorizerAppender=org.apache.log4j.DailyRollingFileAppender
    log4j.appender.authorizerAppender.DatePattern='.'yyyy-MM-dd-HH
    log4j.appender.authorizerAppender.File=${kafka.logs.dir}/kafka-authorizer.log
    log4j.appender.authorizerAppender.layout=org.apache.log4j.PatternLayout
    log4j.appender.authorizerAppender.layout.ConversionPattern=[%d] %p %m (%c)%n

    # Turn on all our debugging info
    #log4j.logger.kafka.producer.async.DefaultEventHandler=DEBUG, kafkaAppender
    #log4j.logger.kafka.client.ClientUtils=DEBUG, kafkaAppender
    #log4j.logger.kafka.perf=DEBUG, kafkaAppender
    #log4j.logger.kafka.perf.ProducerPerformance$ProducerThread=DEBUG, kafkaAppender
    #log4j.logger.org.I0Itec.zkclient.ZkClient=DEBUG
    log4j.logger.kafka=INFO, kafkaAppender

    log4j.logger.kafka.network.RequestChannel$=WARN, requestAppender
    log4j.additivity.kafka.network.RequestChannel$=false

    #log4j.logger.kafka.network.Processor=INFO, requestAppender
    #log4j.logger.kafka.server.KafkaApis=INFO, requestAppender
    #log4j.additivity.kafka.server.KafkaApis=false
    log4j.logger.kafka.request.logger=WARN, requestAppender
    log4j.additivity.kafka.request.logger=false

    log4j.logger.kafka.controller=INFO, controllerAppender
    log4j.additivity.kafka.controller=false

    log4j.logger.kafka.log.LogCleaner=INFO, cleanerAppender
    log4j.additivity.kafka.log.LogCleaner=false

    log4j.logger.state.change.logger=INFO, stateChangeAppender
    log4j.additivity.state.change.logger=false

    #Change this to debug to get the actual audit log for authorizer.
    log4j.logger.kafka.authorizer.logger=WARN, authorizerAppender
    log4j.additivity.kafka.authorizer.logger=false
  server.properties: |-
    ############################# Socket Server Settings #############################

    # The id of the broker. This must be set to a unique integer for each broker.
    #init#broker.id=#init#

    #init#broker.rack=#init#

    listeners=PLAINTEXT://:9092

    # The number of threads handling network requests
    num.network.threads=3

    # The number of threads doing disk I/O
    num.io.threads=8

    # The send buffer (SO_SNDBUF) used by the socket server
    socket.send.buffer.bytes=102400

    # The receive buffer (SO_RCVBUF) used by the socket server
    socket.receive.buffer.bytes=102400

    # The maximum size of a request that the socket server will accept (protection against OOM)
    socket.request.max.bytes=104857600

    ############################# Log Basics #############################

    # A comma separated list of directories under which to store log files
    log.dirs=/var/lib/kafka/data/topics

    # The default number of log partitions per topic. More partitions allow greater
    # parallelism for consumption, but this will also result in more files across
    # the brokers.
    num.partitions=1

    default.replication.factor=3

    min.insync.replicas=2

    auto.create.topics.enable=true

    # The number of threads per data directory to be used for log recovery at startup and flushing at shutdown.
    # This value is recommended to be increased for installations with data dirs located in RAID array.
    num.recovery.threads.per.data.dir=1

    ############################# Log Flush Policy #############################

    # Messages are immediately written to the filesystem but by default we only fsync() to sync
    # the OS cache lazily. The following configurations control the flush of data to disk.
    # There are a few important trade-offs here:
    #    1. Durability: Unflushed data may be lost if you are not using replication.
    #    2. Latency: Very large flush intervals may lead to latency spikes when the flush does occur as there will be a lot of data to flush.
    #    3. Throughput: The flush is generally the most expensive operation, and a small flush interval may lead to excessive seeks.
    # The settings below allow one to configure the flush policy to flush data after a period of time or
    # every N messages (or both). This can be done globally and overridden on a per-topic basis.

    # The number of messages to accept before forcing a flush of data to disk
    log.flush.interval.messages=10000

    # The maximum amount of time a message can sit in a log before we force a flush
    log.flush.interval.ms=1000

    ############################# Log Retention Policy #############################

    # The following configurations control the disposal of log segments. The policy can
    # be set to delete segments after a period of time, or after a given size has accumulated.
    # A segment will be deleted whenever *either* of these criteria are met. Deletion always happens
    # from the end of the log.

    # The minimum age of a log file to be eligible for deletion
    log.retention.hours=168

    # A size-based retention policy for logs. Segments are pruned from the log unless the remaining
    # segments drop below log.retention.bytes. Functions independently of log.retention.hours.
    log.retention.bytes=1073741824

    # The maximum size of a log segment file. When this size is reached a new log segment will be created.
    log.segment.bytes=1073741824

    # The interval at which log segments are checked to see if they can be deleted according
    # to the retention policies
    log.retention.check.interval.ms=300000

    ############################# Zookeeper #############################

    # Zookeeper connection string (see zookeeper docs for details).
    # This is a comma separated host:port pairs, each corresponding to a zk
    # server. e.g. "127.0.0.1:3000,127.0.0.1:3001,127.0.0.1:3002".
    # You can also append an optional chroot string to the urls to specify the
    # root directory for all kafka znodes.
    #init#zookeeper.connect=#init#

    # To enable ZooKeeper ACLs on the root path, set this value to true.
    # zookeeper.set.acl=true

    # Timeout in ms for connecting to zookeeper
    #zookeeper.connection.timeout.ms=6000


    ############################# Group Coordinator Settings #############################

    # The following configuration specifies the time, in milliseconds, that the GroupCoordinator will delay the initial consumer rebalance.
    # The rebalance will be further delayed by the value of group.initial.rebalance.delay.ms as new members join the group, up to a maximum of max.poll.interval.ms.
    # The default value for this is 3 seconds.
    # We override this to 0 here as it makes for a better out-of-the-box experience for development and testing.
    # However, in production environments the default value of 3 seconds is more suitable as this will help to avoid unnecessary, and potentially expensive, rebalances during application startup.
    #group.initial.rebalance.delay.ms=0
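
The #init# markers above are plain placeholders; init.sh rewrites them with sed before the broker starts. The substitution can be checked locally, assuming the server.properties template above is saved in the current directory (the hostname and ZooKeeper address below are made up):

HOSTNAME=kafka-demo-0
KAFKA_BROKER_ID=${HOSTNAME##*-}   # ordinal suffix of the pod name -> 0
ZOOKEEPER=zk-0.zk:2181
sed "s/#init#broker.id=#init#/broker.id=$KAFKA_BROKER_ID/" server.properties \
  | sed "s|#init#zookeeper.connect=#init#|zookeeper.connect=$ZOOKEEPER|" \
  | grep -E '^(broker.id|zookeeper.connect)='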

statefulset

apiVersion: apps/v1
kind: StatefulSet
metadata:
  labels:
    app.kubernetes.io/name: kafka
    app.kubernetes.io/instance: kafka-demo
    app.kubernetes.io/version: 1.1.1
  name: kafka-demo
  namespace: default
spec:
  replicas: 3
  selector:
    matchLabels:
      app.kubernetes.io/instance: kafka-demo
      app.kubernetes.io/version: 1.1.1
  serviceName: kafka-demo
  template:
    metadata:
      labels:
        app.kubernetes.io/instance: kafka-demo
        app.kubernetes.io/version: 1.1.1
    spec:
      containers:
      - command:
        - kafka-server-start.sh
        - /etc/kafka/server.properties
        env:
        - name: KAFKA_LOG4J_OPTS
          value: -Dlog4j.configuration=file:/etc/kafka/log4j.properties
        - name: JMX_PORT
          value: "5555"
        - name: KAFKA_HEAP_OPTS
        # Adjust the heap size to your needs; it must stay below the container memory limit
          value: -Xmx1G -Xms1G
        # Replace with the image you built earlier
        image: kafka:2.11-1.1.1
        imagePullPolicy: IfNotPresent
        name: kafka
        ports:
        - containerPort: 5555
          name: jmx
          protocol: TCP
        - containerPort: 9092
          name: broker
          protocol: TCP
        readinessProbe:
          failureThreshold: 3
          periodSeconds: 10
          successThreshold: 1
          tcpSocket:
            port: 9092
          timeoutSeconds: 1
        resources:
          limits:
            # Adjust the CPU limit to your needs
            cpu: 512m
            # Adjust the memory limit to your needs; it must exceed the JVM heap (-Xmx) set in KAFKA_HEAP_OPTS
            memory: 2Gi
          requests:
            cpu: 512m
            memory: 2Gi
        securityContext:
          privileged: true
        volumeMounts:
        - mountPath: /etc/kafka
          name: config
        - mountPath: /var/lib/kafka/data
          name: data
      initContainers:
      - command:
        - /bin/bash
        - /etc/kafka-configmap/init.sh
        env:
        # ZooKeeper address; replace with your own
        - name: ZOOKEEPER
          value: localhost:2182
        # Same image as the main container; replace with the one you built
        image: kafka:2.11-1.1.1
        imagePullPolicy: IfNotPresent
        name: init-config
        resources:
          limits:
            cpu: 512m
            memory: 512M
          requests:
            cpu: 512m
            memory: 512M
        securityContext:
          privileged: true
        volumeMounts:
        - mountPath: /etc/kafka-configmap
          name: configmap
        - mountPath: /etc/kafka
          name: config
      restartPolicy: Always
      volumes:
      - configMap:
          defaultMode: 420
          name: kafka-config
        name: configmap
      - emptyDir: {}
        name: config
      # Replace with a PersistentVolumeClaim for durable storage
      - emptyDir: {}
        name: data
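
Assuming the manifests above are saved locally (the file names are illustrative), the cluster can be rolled out and smoke-tested with the Kafka command line tools already present in the image. Replace zk-0.zk:2181 with your ZooKeeper address (the same value passed to the init container):

kubectl apply -f kafka-service.yaml -f kafka-configmap.yaml -f kafka-statefulset.yaml
kubectl get pods -l app.kubernetes.io/instance=kafka-demo

# Create a test topic and list it from inside one of the brokers
kubectl exec kafka-demo-0 -- kafka-topics.sh --zookeeper zk-0.zk:2181 \
  --create --topic smoke-test --partitions 3 --replication-factor 3
kubectl exec kafka-demo-0 -- kafka-topics.sh --zookeeper zk-0.zk:2181 --list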

After the Kafka cluster is deployed, you can create an additional Service for client applications to use, for example the sketch below.
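
A minimal sketch of such a client-facing Service (the name kafka-demo-client is made up): unlike the headless Service it gets a ClusterIP, so in-cluster applications can simply use kafka-demo-client:9092 as their bootstrap address.

kubectl apply -f - <<EOF
apiVersion: v1
kind: Service
metadata:
  name: kafka-demo-client
  namespace: default
spec:
  ports:
  - name: broker
    port: 9092
    protocol: TCP
    targetPort: 9092
  selector:
    app.kubernetes.io/instance: kafka-demo
    app.kubernetes.io/version: 1.1.1
EOF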

Pitfalls

Once Kafka is containerized it will, of course, be consumed by real applications. For clients inside the cluster, exposing the brokers through a Service with extra ports is enough, and for clients outside the cluster you can expose them through an Ingress, but external access usually runs into a problem. The reason: when an external producer connects through a single entry address, the client first uses that bootstrap address to fetch metadata about which broker holds the leader for each partition of the topic, and then opens a new connection directly to that broker's host and port. The host returned in the metadata is the DNS record of the StatefulSet pod, which cannot be resolved from outside the cluster, so the client cannot use Kafka at all.

One way around this is to run the StatefulSet with hostNetwork, but then you have to manage the host ports yourself to avoid conflicts; readers who need external access can experiment with that approach. When using hostNetwork, note that hostPort and containerPort of the container must be identical.
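
The symptom is easy to reproduce: ask any broker for cluster metadata and look at the host names it advertises. With the manifests above and the default listener configuration, they should roughly be the pod DNS records behind the headless Service, which only resolve inside the cluster:

kubectl exec kafka-demo-0 -- kafka-broker-api-versions.sh \
  --bootstrap-server kafka-demo:9092 | grep 9092
# expected to look roughly like:
#   kafka-demo-0.kafka-demo.default.svc.cluster.local:9092 (id: 0 rack: null) -> ...
#   kafka-demo-1.kafka-demo.default.svc.cluster.local:9092 (id: 1 rack: null) -> ...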

Afterword

I have not gone through most of Kafka's configurable parameters in detail; consult the official documentation and adjust them to your needs. Making a parameter dynamic is straightforward: add a placeholder for it in the ConfigMap template, render it in the init.sh script, and feed the value in through the init container's environment variables. In follow-up posts I plan to cover containerizing RabbitMQ, ZooKeeper, etcd, MySQL, MariaDB and MongoDB in the same way.
