Kafka
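This chapter is a sample of the documentation for the platform services that Cloud Insight supports.
It is currently divided into the following parts:
- Supported performance metrics
- How to configure Kafka monitoring
- Common problems
Performance metrics
Cloud Insight collects the following Kafka performance metrics: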
- messages_in
- net.bytes_in
- net.bytes_out
- net.bytes_rejected
- replication.isr_expands
- replication.isr_shrinks
- replication.leader_elections
- replication.unclean_leader_elections
- request.fetch.failed
- request.fetch.time.99percentile
- request.fetch.time.avg
- request.handler.avg.idle.pct
- request.metadata.time.99percentile
- request.metadata.time.avg
- request.offsets.time.99percentile
- request.offsets.time.avg
- request.produce.failed
- request.produce.time.99percentile
- request.produce.time.avg
- request.update_metadata.time.99percentile
- request.update_metadata.time.avg
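Configuring Kafka monitoring
JMX
The OneAPM Cloud Insight Agent collects Kafka performance metrics through JMX.
Each instance can monitor at most 350 performance metrics, so you need to edit the configuration file as described below to select the metrics you actually need.
For details on how JMX collection works, see JMX Remote Monitoring.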
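As a rough illustration of selecting metrics, the sketch below keeps only two `include` entries in the `conf` section of conf.d/kafka.yaml. Both entries are copied from the sample configuration in this chapter; the choice of which metrics to keep is only an assumption for the example, and the indentation shown follows ordinary YAML nesting.

```yaml
init_config:
  is_jmx: true

  # Each alias below counts as one collected metric.
  # Remove the include entries you do not need to stay under the limit.
  conf:
    - include:
        domain: 'kafka.server'
        bean: 'kafka.server:type=BrokerTopicMetrics,name=AllTopicsBytesInPerSec'
        attribute:
          MeanRate:
            metric_type: counter
            alias: kafka.net.bytes_in
    - include:
        domain: 'kafka.server'
        bean: 'kafka.server:type=BrokerTopicMetrics,name=AllTopicsBytesOutPerSec'
        attribute:
          MeanRate:
            metric_type: counter
            alias: kafka.net.bytes_out
```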
Edit the configuration file
Edit the configuration file conf.d/kafka.yaml so that the Cloud Insight Agent can communicate with Kafka.
##########
# WARNING
# This sample works only for Kafka >= 0.8.2.

instances:
#   - host: localhost
#     port: 9999
#     name: jmx_instance
#     user: username
#     password: password
#     #java_bin_path: /path/to/java # Optional, should be set if the agent cannot find your java executable
#     #trust_store_path: /path/to/trustStore.jks # Optional, should be set if ssl is enabled
#     #trust_store_password: password

init_config:
  is_jmx: true

  # Metrics collected by this check. You should not have to modify this.
  conf:
    #
    # Aggregate cluster stats
    #
    - include:
        domain: 'kafka.server'
        bean: 'kafka.server:type=BrokerTopicMetrics,name=AllTopicsBytesOutPerSec'
        attribute:
          MeanRate:
            metric_type: counter
            alias: kafka.net.bytes_out
    - include:
        domain: 'kafka.server'
        bean: 'kafka.server:type=BrokerTopicMetrics,name=AllTopicsBytesInPerSec'
        attribute:
          MeanRate:
            metric_type: counter
            alias: kafka.net.bytes_in
    - include:
        domain: 'kafka.server'
        bean: 'kafka.server:type=BrokerTopicMetrics,name=AllTopicsMessagesInPerSec'
        attribute:
          MeanRate:
            metric_type: gauge
            alias: kafka.messages_in

    #
    # Request timings
    #
    - include:
        domain: 'kafka.server'
        bean: 'kafka.server:type=BrokerTopicMetrics,name=AllTopicsFailedFetchRequestsPerSec'
        attribute:
          MeanRate:
            metric_type: gauge
            alias: kafka.request.fetch.failed
    - include:
        domain: 'kafka.server'
        bean: 'kafka.server:type=BrokerTopicMetrics,name=AllTopicsFailedProduceRequestsPerSec'
        attribute:
          MeanRate:
            metric_type: gauge
            alias: kafka.request.produce.failed
    - include:
        domain: 'kafka.network'
        bean: 'kafka.network:type=RequestMetrics,name=Produce-TotalTimeMs'
        attribute:
          Mean:
            metric_type: counter
            alias: kafka.request.produce.time.avg
          99thPercentile:
            metric_type: counter
            alias: kafka.request.produce.time.99percentile
    - include:
        domain: 'kafka.network'
        bean: 'kafka.network:type=RequestMetrics,name=Fetch-TotalTimeMs'
        attribute:
          Mean:
            metric_type: counter
            alias: kafka.request.fetch.time.avg
          99thPercentile:
            metric_type: counter
            alias: kafka.request.fetch.time.99percentile
    - include:
        domain: 'kafka.network'
        bean: 'kafka.network:type=RequestMetrics,name=UpdateMetadata-TotalTimeMs'
        attribute:
          Mean:
            metric_type: counter
            alias: kafka.request.update_metadata.time.avg
          99thPercentile:
            metric_type: counter
            alias: kafka.request.update_metadata.time.99percentile
    - include:
        domain: 'kafka.network'
        bean: 'kafka.network:type=RequestMetrics,name=Metadata-TotalTimeMs'
        attribute:
          Mean:
            metric_type: counter
            alias: kafka.request.metadata.time.avg
          99thPercentile:
            metric_type: counter
            alias: kafka.request.metadata.time.99percentile
    - include:
        domain: 'kafka.network'
        bean: 'kafka.network:type=RequestMetrics,name=Offsets-TotalTimeMs'
        attribute:
          Mean:
            metric_type: counter
            alias: kafka.request.offsets.time.avg
          99thPercentile:
            metric_type: counter
            alias: kafka.request.offsets.time.99percentile

    #
    # Replication stats
    #
    - include:
        domain: 'kafka.server'
        bean: 'kafka.server:type=ReplicaManager,name=ISRShrinksPerSec'
        attribute:
          MeanRate:
            metric_type: counter
            alias: kafka.replication.isr_shrinks
    - include:
        domain: 'kafka.server'
        bean: 'kafka.server:type=ReplicaManager,name=ISRExpandsPerSec'
        attribute:
          MeanRate:
            metric_type: counter
            alias: kafka.replication.isr_expands
    - include:
        domain: 'kafka.server'
        bean: 'kafka.server:type=ControllerStats,name=LeaderElectionRateAndTimeMs'
        attribute:
          MeanRate:
            metric_type: counter
            alias: kafka.replication.leader_elections
    - include:
        domain: 'kafka.server'
        bean: 'kafka.server:type=ControllerStats,name=UncleanLeaderElectionsPerSec'
        attribute:
          MeanRate:
            metric_type: counter
            alias: kafka.replication.unclean_leader_elections

    #
    # Log flush stats
    #
    - include:
        domain: 'kafka.log'
        bean: 'kafka.log:type=LogFlushStats,name=LogFlushRateAndTimeMs'
        attribute:
          MeanRate:
            metric_type: counter
            alias: kafka.log.flush_rate
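In the sample above every instance entry is commented out, so at least one has to be enabled before the check has a broker to talk to. A minimal sketch of an enabled entry follows, assuming the broker exposes JMX on localhost:9999 without authentication (the values from the commented sample); adjust them for your environment, and add `user` and `password` if your JMX endpoint requires them.

```yaml
instances:
  - host: localhost      # assumption: the broker runs on the same machine as the Agent
    port: 9999           # assumption: the JMX port the broker was started with
    name: jmx_instance   # label taken from the commented sample
```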
Edit the Consumer configuration file
Edit the Consumer configuration file conf.d/kafka_consumer.yaml.
init_config:

instances:
#   - kafka_connect_str: localhost:19092
#     zk_connect_str: localhost:2181
#     zk_prefix: /0.8
#     consumer_groups:
#       my_consumer:
#         my_topic: [0, 1, 4, 12]
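As with the broker check, the consumer instance above is commented out. A sketch of an enabled entry might look like the following; the addresses, the consumer group `my_consumer`, the topic `my_topic`, and the partition numbers are the placeholders from the sample, not values to copy as-is.

```yaml
init_config:

instances:
  - kafka_connect_str: localhost:19092   # Kafka broker address (placeholder)
    zk_connect_str: localhost:2181       # ZooKeeper address (placeholder)
    zk_prefix: /0.8                      # ZooKeeper path prefix from the sample
    consumer_groups:
      my_consumer:                       # consumer group name (placeholder)
        my_topic: [0, 1, 4, 12]          # topic name mapped to the partitions to monitor
```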
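Restart the Agent
Restart the OneAPM Cloud Insight Agent for the configuration to take effect.
You can also check the Agent Info output to verify the configuration. If it contains output like the following, the check was set up successfully.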
Checks
======

  [...]

  kafka-localhost-9999
  --------------------
    - instance #0 [OK]
    - Collected 8 metrics & 0 events