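This post summarizes some basic commands I used to test Kafka installed through CDH.
1. Background
Each host in a Kafka cluster runs a server called a broker, which stores the messages sent to topics and serves consumer requests.
First, look at the Kafka instances installed on the servers: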
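Note:
A standard Apache Kafka command looks like: ./bin/kafka-topics.sh --zookeeper cluster2-4:2181 .......
With Kafka installed through CDH, however, you do not write out the ./bin/kafka-topics.sh part; just type kafka-topics. This is an important difference to keep in mind when using CDH's Kafka.
The available commands can be found under this path: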
/opt/cloudera/parcels/KAFKA-4.1.0-1.4.1.0.p0.4/bin
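If the same notes are used on a CDH host and on a plain Apache Kafka install, a small helper can pick the right command name. A minimal POSIX sh sketch, assuming the CDH wrapper is on PATH as described above; the function name kafka_topics_cmd and the KAFKA_HOME variable (the directory an Apache Kafka tarball was unpacked into) are hypothetical:

```sh
# Hypothetical helper: print the kafka-topics command to use on this host.
# On CDH the kafka-topics wrapper is on PATH (the wrappers live under the
# parcel's bin directory); a plain Apache Kafka install only ships
# bin/kafka-topics.sh. KAFKA_HOME is an assumption, defaulting to ".".
kafka_topics_cmd() {
  if command -v kafka-topics >/dev/null 2>&1; then
    echo "kafka-topics"
  else
    echo "${KAFKA_HOME:-.}/bin/kafka-topics.sh"
  fi
}
```

Usage: "$(kafka_topics_cmd)" --zookeeper cluster2-4:2181/kafka --list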
2. Working with topics
Before testing the topic commands, check the ZooKeeper Root configured for the Kafka service in CDH; here it is "/kafka".
So every topic command takes a form like:
kafka-topics --zookeeper cluster2-4:2181/kafka ......
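The ZooKeeper Root is appended once, after the whole host list, not after each host, so with two ZooKeeper hosts the string is cluster2-3:2181,cluster2-4:2181/kafka. A POSIX sh sketch that builds the string; zk_connect is a hypothetical helper name:

```sh
# Hypothetical helper: build a ZooKeeper connection string for kafka-topics.
# Usage: zk_connect CHROOT HOST:PORT [HOST:PORT ...]
# The chroot (CDH's "ZooKeeper Root") goes once at the very end;
# a root of "/" (or an empty one) adds nothing.
zk_connect() {
  local chroot="$1" hosts="" h
  shift
  for h in "$@"; do
    hosts="${hosts:+$hosts,}$h"   # join hosts with commas
  done
  case "$chroot" in
    ""|/) echo "$hosts" ;;
    *) echo "$hosts$chroot" ;;
  esac
}
```

Usage: kafka-topics --zookeeper "$(zk_connect /kafka cluster2-3:2181 cluster2-4:2181)" --list. Listing several hosts lets the client fail over between them, as the --zookeeper description in the option list notes.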
A. Create a topic named test:
kafka-topics --zookeeper cluster2-4:2181/kafka --create --replication-factor 1 --partitions 3 --topic test
Alternatively, if the ZooKeeper Root of the Kafka service is "/", the create command becomes:
kafka-topics --zookeeper cluster2-4:2181 --create --replication-factor 1 --partitions 3 --topic test
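Wrapping the create command keeps the ZooKeeper address and defaults in one place. A POSIX sh sketch; create_topic and the KAFKA_TOPICS variable (an assumed override for the command name, e.g. ./bin/kafka-topics.sh on a non-CDH install) are hypothetical. --if-not-exists is taken from the option list below and makes re-running the script harmless:

```sh
# Hypothetical wrapper around "kafka-topics --create".
# Usage: create_topic ZK_CONNECT TOPIC [PARTITIONS] [REPLICATION_FACTOR]
# KAFKA_TOPICS is an assumed override for the command name.
create_topic() {
  local zk="$1" topic="$2" partitions="${3:-3}" rf="${4:-1}"
  "${KAFKA_TOPICS:-kafka-topics}" --zookeeper "$zk" --create \
    --replication-factor "$rf" --partitions "$partitions" \
    --topic "$topic" --if-not-exists
}
```

Usage: create_topic cluster2-4:2181/kafka test. The replication factor defaults to 1 because it cannot exceed the number of live brokers; on a small test cluster a higher value makes the create fail.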
B. List the existing topics:
kafka-topics --zookeeper localhost:2181/kafka --list
C. Delete a topic:
kafka-topics --zookeeper localhost:2181/kafka --delete --topic test2
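Before deleting, a script can confirm that the name appears in the --list output, so a typo fails loudly instead of silently. A POSIX sh sketch; topic_exists, delete_topic and the KAFKA_TOPICS override are hypothetical. The match is on whole lines (grep -x) because a plain grep for test would also match test2:

```sh
# Succeeds if the topic name appears as a whole line on stdin
# (stdin is expected to be the output of "kafka-topics ... --list").
topic_exists() {
  grep -qxF -- "$1"
}

# Hypothetical wrapper: delete TOPIC only if --list shows it.
# KAFKA_TOPICS is an assumed override for the command name.
delete_topic() {
  local zk="$1" topic="$2"
  if "${KAFKA_TOPICS:-kafka-topics}" --zookeeper "$zk" --list | topic_exists "$topic"; then
    "${KAFKA_TOPICS:-kafka-topics}" --zookeeper "$zk" --delete --topic "$topic"
  else
    echo "topic $topic not found" >&2
    return 1
  fi
}
```

Usage: delete_topic localhost:2181/kafka test2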
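Further notes:
Deleting a topic this way only prints Topic *** is marked for deletion (see the screenshot above). If too many messages have piled up in a topic, or the disk Kafka writes to is full, you may need to remove the topic completely.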
Method 1: edit the Kafka configuration file server.properties, add delete.topic.enable=true and restart Kafka; after that, topics can be deleted directly from the command line.
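Method 1 boils down to making sure a single key=value line is present in server.properties. A sketch of an idempotent edit, assuming GNU sed (for sed -i); set_property is a hypothetical helper, and the file path is whatever your broker actually reads. On a CDH cluster the broker configuration is normally generated by Cloudera Manager, so the setting is usually changed there rather than in the file:

```sh
# Hypothetical helper: set KEY=VALUE in a .properties file, replacing an
# existing KEY= line or appending one if it is missing.
# Assumes GNU sed (-i without a suffix). KEY is used as a regex, which is
# fine for dotted Kafka keys such as delete.topic.enable.
set_property() {
  local file="$1" key="$2" value="$3"
  if grep -q "^${key}=" "$file"; then
    sed -i "s|^${key}=.*|${key}=${value}|" "$file"
  else
    printf '%s=%s\n' "$key" "$value" >> "$file"
  fi
}
```

Usage: set_property /path/to/server.properties delete.topic.enable true, then restart the broker.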
Method 2: delete the topic from the command line: ./bin/kafka-topics.sh --delete --zookeeper {zookeeper server} --topic {topic name}
Because delete.topic.enable=true is not set in server.properties, this does not really delete the topic; it only marks it as "marked for deletion". You can list all topics with ./bin/kafka-topics.sh --zookeeper {zookeeper server} --list
Method 3: to really delete it, log in to the ZooKeeper client:
zookeeper-client
Find the directory that holds the topics:
ls /kafka/brokers/topics
Then run the following command, which removes the topic from ZooKeeper:
rmr /kafka/brokers/topics/{topic name}
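The clean-up can also be scripted, assuming zookeeper-client passes its arguments to zkCli.sh, which runs a single command given after -server HOST:PORT. A sketch; purge_topic_znodes and the ZK_CLIENT override are hypothetical. Besides the path above it also removes the deletion marker under /kafka/admin/delete_topics and the topic's config node under /kafka/config/topics; check those paths with ls on your own cluster first. Only ZooKeeper metadata is touched: the partition directories in the brokers' log directories stay on disk.

```sh
# Hypothetical helper: remove a topic's znodes under the Kafka chroot.
# Usage: purge_topic_znodes ZK_HOST:PORT TOPIC [CHROOT]
# ZK_CLIENT is an assumed override for the zookeeper-client command.
# Log segments on the brokers are NOT removed by this.
purge_topic_znodes() {
  local server="$1" topic="$2" root="${3:-/kafka}" path
  for path in brokers/topics admin/delete_topics config/topics; do
    "${ZK_CLIENT:-zookeeper-client}" -server "$server" rmr "$root/$path/$topic"
  done
}
```

Usage: purge_topic_znodes cluster2-4:2181 test2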
D. Change the number of partitions of a topic (Kafka only allows increasing it):
kafka-topics --zookeeper localhost:2181/kafka --alter --topic test --partitions 5
E. Show the details of a topic:
kafka-topics --describe --zookeeper localhost:2181/kafka --topic test
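Since Kafka only lets the partition count grow, a wrapper can reject obviously bad input and then describe the topic to confirm the change. A POSIX sh sketch; increase_partitions and the KAFKA_TOPICS variable (an assumed override for the command name) are hypothetical:

```sh
# Hypothetical helper: raise TOPIC to COUNT partitions, then describe it.
# Accepts only a plain positive integer such as 5 (no sign, no leading
# zero); the "not smaller than the current count" check is left to Kafka.
increase_partitions() {
  local zk="$1" topic="$2" count="$3"
  case "$count" in
    ""|*[!0-9]*|0*)
      echo "partition count must be a positive integer: $count" >&2
      return 1 ;;
  esac
  "${KAFKA_TOPICS:-kafka-topics}" --zookeeper "$zk" --alter --topic "$topic" --partitions "$count" &&
    "${KAFKA_TOPICS:-kafka-topics}" --describe --zookeeper "$zk" --topic "$topic"
}
```

Usage: increase_partitions localhost:2181/kafka test 5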
F. We can also use this to check that the distributed setup is connected properly:
kafka-topics --zookeeper cluster2-4:2181/kafka --list
kafka-topics --zookeeper cluster2-3:2181/kafka --list
On the cluster2-4 server, passing either cluster2-4:2181/kafka or cluster2-3:2181/kafka returns the same information.
kafka-topics options:
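With more ZooKeeper hosts, this check can be automated by listing the topics through each host and comparing the results. A POSIX sh sketch; check_zk_hosts and the KAFKA_TOPICS override are hypothetical:

```sh
# Hypothetical helper: list topics through each ZooKeeper host and compare.
# Usage: check_zk_hosts CHROOT HOST:PORT [HOST:PORT ...]
check_zk_hosts() {
  local root="$1" first="" seen=0 host out
  shift
  for host in "$@"; do
    if ! out="$("${KAFKA_TOPICS:-kafka-topics}" --zookeeper "$host$root" --list)"; then
      echo "cannot list topics via $host" >&2
      return 1
    fi
    out="$(printf '%s\n' "$out" | sort)"   # compare independent of order
    if [ "$seen" -eq 0 ]; then
      first="$out"
      seen=1
    elif [ "$out" != "$first" ]; then
      echo "topic list via $host differs" >&2
      return 1
    fi
  done
  echo "all ZooKeeper hosts return the same topics"
}
```

Usage: check_zk_hosts /kafka cluster2-3:2181 cluster2-4:2181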
Option                                  Description
------                                  -----------
--alter                                 Alter the number of partitions, replica assignment, and/or configuration for the topic.
--bootstrap-server <String: server to connect to>
                                        REQUIRED: The Kafka server to connect to. In case of providing this, a direct Zookeeper connection won't be required.
--command-config <String: command config property file>
                                        Property file containing configs to be passed to Admin Client. This is used only with --bootstrap-server option for describing and altering broker configs.
--config <String: name=value>           A topic configuration override for the topic being created or altered. The following is a list of valid configurations:
                                          cleanup.policy
                                          compression.type
                                          delete.retention.ms
                                          file.delete.delay.ms
                                          flush.messages
                                          flush.ms
                                          follower.replication.throttled.replicas
                                          index.interval.bytes
                                          leader.replication.throttled.replicas
                                          max.message.bytes
                                          message.downconversion.enable
                                          message.format.version
                                          message.timestamp.difference.max.ms
                                          message.timestamp.type
                                          min.cleanable.dirty.ratio
                                          min.compaction.lag.ms
                                          min.insync.replicas
                                          preallocate
                                          retention.bytes
                                          retention.ms
                                          segment.bytes
                                          segment.index.bytes
                                          segment.jitter.ms
                                          segment.ms
                                          unclean.leader.election.enable
                                        See the Kafka documentation for full details on the topic configs. It is supported only in combination with --create if --bootstrap-server option is used.
--create                                Create a new topic.
--delete                                Delete a topic
--delete-config <String: name>          A topic configuration override to be removed for an existing topic (see the list of configurations under the --config option). Not supported with the --bootstrap-server option.
--describe                              List details for the given topics.
--disable-rack-aware                    Disable rack aware replica assignment
--exclude-internal                      exclude internal topics when running list or describe command. The internal topics will be listed by default
--force                                 Suppress console prompts
--help                                  Print usage information.
--if-exists                             if set when altering or deleting or describing topics, the action will only execute if the topic exists. Not supported with the --bootstrap-server option.
--if-not-exists                         if set when creating topics, the action will only execute if the topic does not already exist. Not supported with the --bootstrap-server option.
--list                                  List all available topics.
--partitions <Integer: # of partitions> The number of partitions for the topic being created or altered (WARNING: If partitions are increased for a topic that has a key, the partition logic or ordering of the messages will be affected)
--replica-assignment <String: broker_id_for_part1_replica1 : broker_id_for_part1_replica2 , broker_id_for_part2_replica1 : broker_id_for_part2_replica2 , ...>
                                        A list of manual partition-to-broker assignments for the topic being created or altered.
--replication-factor <Integer: replication factor>
                                        The replication factor for each partition in the topic being created.
--topic <String: topic>                 The topic to create, alter, describe or delete. It also accepts a regular expression, except for --create option. Put topic name in double quotes and use the '\' prefix to escape regular expression symbols; e.g. "test\.topic".
--topics-with-overrides                 if set when describing topics, only show topics that have overridden configs
--unavailable-partitions                if set when describing topics, only show partitions whose leader is not available
--under-replicated-partitions           if set when describing topics, only show under replicated partitions
--zookeeper <String: hosts>             DEPRECATED, The connection string for the zookeeper connection in the form host:port. Multiple hosts can be given to allow fail-over.
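Two of the --describe filters above make a quick health check: with --under-replicated-partitions or --unavailable-partitions, kafka-topics is expected to print only the matching partitions, so empty output means none were found. A POSIX sh sketch under that assumption; check_partitions and the KAFKA_TOPICS override are hypothetical:

```sh
# Hypothetical helper: report under-replicated or unavailable partitions.
# Relies on --describe printing nothing when no partition matches a filter.
# Returns 1 if either filter printed anything.
check_partitions() {
  local zk="$1" filter out status=0
  for filter in --under-replicated-partitions --unavailable-partitions; do
    out="$("${KAFKA_TOPICS:-kafka-topics}" --zookeeper "$zk" --describe "$filter")" || return 1
    if [ -n "$out" ]; then
      printf '%s:\n%s\n' "$filter" "$out"
      status=1
    fi
  done
  return "$status"
}
```

Usage: check_partitions cluster2-4:2181/kafka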
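3. Testing: producing data with a producer and consuming it with a consumer
Now that the topic exists, let's test how to produce data with Kafka's kafka-console-producer and consume it on the other side with kafka-console-consumer.
First, it helps to understand the broker structure of this publish-subscribe system:
A producer writes data to a topic, and a consumer reads data from the topics it subscribes to.
- First start the producer: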
kafka-console-producer --broker-list cluster2-4:9092 --topic test
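- Type messages here; each line is sent as a message to the test topic on the brokers. (The messages themselves are stored in the brokers' log directories; ZooKeeper only keeps the topic's metadata, under /kafka/brokers/topics/test.)
kafka-console-producer options: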
Option                                  Description
------                                  -----------
--batch-size <Integer: size>            Number of messages to send in a single batch if they are not being sent synchronously. (default: 200)
--broker-list <String: broker-list>     REQUIRED: The broker list string in the form HOST1:PORT1,HOST2:PORT2.
--compression-codec [String: compression-codec]
                                        The compression codec: either 'none', 'gzip', 'snappy', 'lz4', or 'zstd'. If specified without value, then it defaults to 'gzip'
--help                                  Print usage information.
--line-reader <String: reader_class>    The class name of the class to use for reading lines from standard in. By default each line is read as a separate message. (default: kafka.tools.ConsoleProducer$LineMessageReader)
--max-block-ms <Long: max block on send>
                                        The max time that the producer will block for during a send request (default: 60000)
--max-memory-bytes <Long: total memory in bytes>
                                        The total memory used by the producer to buffer records waiting to be sent to the server. (default: 33554432)
--max-partition-memory-bytes <Long: memory in bytes per partition>
                                        The buffer size allocated for a partition. When records are received which are smaller than this size the producer will attempt to optimistically group them together until this size is reached. (default: 16384)
--message-send-max-retries <Integer>    Brokers can fail receiving the message for multiple reasons, and being unavailable transiently is just one of them. This property specifies the number of retries before the producer gives up and drops this message. (default: 3)
--metadata-expiry-ms <Long: metadata expiration interval>
                                        The period of time in milliseconds after which we force a refresh of metadata even if we haven't seen any leadership changes. (default: 300000)
--producer-property <String: producer_prop>
                                        A mechanism to pass user-defined properties in the form key=value to the producer.
--producer.config <String: config file> Producer config properties file. Note that [producer-property] takes precedence over this config.
--property <String: prop>               A mechanism to pass user-defined properties in the form key=value to the message reader. This allows custom configuration for a user-defined message reader.
--request-required-acks <String: request required acks>
                                        The required acks of the producer requests (default: 1)
--request-timeout-ms <Integer: request timeout ms>
                                        The ack timeout of the producer requests. Value must be non-negative and non-zero (default: 1500)
--retry-backoff-ms <Integer>            Before each retry, the producer refreshes the metadata of relevant topics. Since leader election takes a bit of time, this property specifies the amount of time that the producer waits before refreshing the metadata. (default: 100)
--socket-buffer-size <Integer: size>    The size of the tcp RECV size. (default: 102400)
--sync                                  If set message send requests to the brokers are synchronously, one at a time as they arrive.
--timeout <Integer: timeout_ms>         If set and the producer is running in asynchronous mode, this gives the maximum amount of time a message will queue awaiting sufficient batch size. The value is given in ms. (default: 1000)
--topic <String: topic>                 REQUIRED: The topic id to produce messages to.
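Because kafka-console-producer reads standard input line by line (the default LineMessageReader listed above), test data can be piped in instead of typed. A POSIX sh sketch; produce_lines and the KAFKA_PRODUCER override are hypothetical, while parse.key and key.separator are properties that LineMessageReader accepts through --property for sending key/value lines:

```sh
# Hypothetical helper: send each stdin line as one message to TOPIC.
# Usage: produce_lines BROKER_LIST TOPIC [KEY_SEPARATOR]
# With a separator, a line such as "k1:hello" is sent with key k1, value hello.
# KAFKA_PRODUCER is an assumed override for the command name.
produce_lines() {
  local brokers="$1" topic="$2" sep="$3"
  if [ -n "$sep" ]; then
    "${KAFKA_PRODUCER:-kafka-console-producer}" --broker-list "$brokers" --topic "$topic" \
      --property parse.key=true --property "key.separator=$sep"
  else
    "${KAFKA_PRODUCER:-kafka-console-producer}" --broker-list "$brokers" --topic "$topic"
  fi
}
```

Usage: printf 'k1:hello\nk2:world\n' | produce_lines cluster2-4:9092 test :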
- Then start the consumer:
kafka-console-consumer --bootstrap-server cluster2-3:9092 --topic test --from-beginning
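The trailing --from-beginning makes the consumer start from the earliest valid offset and consume the messages of all partitions of the topic (when its consumer group has no committed offset yet).
- The consumer receives the data from the topic:
kafka-console-consumer options: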
Option                                  Description
------                                  -----------
--bootstrap-server <String: server to connect to>
                                        REQUIRED: The server(s) to connect to.
--consumer-property <String: consumer_prop>
                                        A mechanism to pass user-defined properties in the form key=value to the consumer.
--consumer.config <String: config file> Consumer config properties file. Note that [consumer-property] takes precedence over this config.
--enable-systest-events                 Log lifecycle events of the consumer in addition to logging consumed messages. (This is specific for system tests.)
--formatter <String: class>             The name of a class to use for formatting kafka messages for display. (default: kafka.tools.DefaultMessageFormatter)
--from-beginning                        If the consumer does not already have an established offset to consume from, start with the earliest message present in the log rather than the latest message.
--group <String: consumer group id>     The consumer group id of the consumer.
--help                                  Print usage information.
--isolation-level <String>              Set to read_committed in order to filter out transactional messages which are not committed. Set to read_uncommitted to read all messages. (default: read_uncommitted)
--key-deserializer <String: deserializer for key>
--max-messages <Integer: num_messages>  The maximum number of messages to consume before exiting. If not set, consumption is continual.
--offset <String: consume offset>       The offset id to consume from (a non-negative number), or 'earliest' which means from beginning, or 'latest' which means from end (default: latest)
--partition <Integer: partition>        The partition to consume from. Consumption starts from the end of the partition unless '--offset' is specified.
--property <String: prop>               The properties to initialize the message formatter. Default properties include:
                                          print.timestamp=true|false
                                          print.key=true|false
                                          print.value=true|false
                                          key.separator=<key.separator>
                                          line.separator=<line.separator>
                                          key.deserializer=<key.deserializer>
                                          value.deserializer=<value.deserializer>
                                        Users can also pass in customized properties for their formatter; more specifically, users can pass in properties keyed with 'key.deserializer.' and 'value.deserializer.' prefixes to configure their deserializers.
--skip-message-on-error                 If there is an error when processing a message, skip it instead of halt.
--timeout-ms <Integer: timeout_ms>      If specified, exit if no message is available for consumption for the specified interval.
--topic <String: topic>                 The topic id to consume on.
--value-deserializer <String: deserializer for values>
--whitelist <String: whitelist>         Regular expression specifying whitelist of topics to include for consumption.
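The two console tools combine into a round-trip check: send one uniquely tagged line, then read the topic from the beginning with --timeout-ms and look for it. A POSIX sh sketch; smoke_test and the KAFKA_PRODUCER / KAFKA_CONSUMER overrides are hypothetical, and the 10-second timeout is an arbitrary choice. Reading from the beginning of a large topic can take a while, so a dedicated test topic is assumed:

```sh
# Hypothetical round-trip check: produce one tagged line, then consume it.
# Usage: smoke_test BROKER TOPIC
# KAFKA_PRODUCER / KAFKA_CONSUMER are assumed overrides for the commands.
smoke_test() {
  local broker="$1" topic="$2" marker
  marker="smoke-$(date +%s)-$$"
  printf '%s\n' "$marker" |
    "${KAFKA_PRODUCER:-kafka-console-producer}" --broker-list "$broker" --topic "$topic" || return 1
  # The consumer exits on its own after 10 s without new messages.
  if "${KAFKA_CONSUMER:-kafka-console-consumer}" --bootstrap-server "$broker" --topic "$topic" \
      --from-beginning --timeout-ms 10000 2>/dev/null | grep -qxF -- "$marker"; then
    echo "ok: $marker"
  else
    echo "marker $marker not consumed" >&2
    return 1
  fi
}
```

Usage: smoke_test cluster2-4:9092 test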