Kafka
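This section is a sample of the documentation for a platform service supported by Cloud Insight.
The sections planned so far are:
- Supported performance metrics
- How to configure Kafka monitoring
- Common problems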

Performance metrics
Cloud Insight collects the following Kafka performance metrics:
- messages_in
- net.bytes_in
- net.bytes_out
- net.bytes_rejected
- replication.isr_expands
- replication.isr_shrinks
- replication.leader_elections
- replication.unclean_leader_elections
- request.fetch.failed
- request.fetch.time.99percentile
- request.fetch.time.avg
- request.handler.avg.idle.pct
- request.metadata.time.99percentile
- request.metadata.time.avg
- request.offsets.time.99percentile
- request.offsets.time.avg
- request.produce.failed
- request.produce.time.99percentile
- request.produce.time.avg
- request.update_metadata.time.99percentile
- request.update_metadata.time.avg
Configuring Kafka monitoring
JMX
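The OneAPM Cloud Insight Agent collects Kafka performance metrics over JMX.
Each monitored instance can report at most 350 performance metrics, so follow the configuration steps below and edit the configuration file to select the metrics you need.
For details on JMX collection, see JMX Remote Monitoring.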
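As a rough illustration of staying under the 350-metric limit, the sketch below counts how many attributes a metric selection lists. It assumes the `conf` section of conf.d/kafka.yaml has already been parsed into Python lists and dicts, and that each bean name matches exactly one MBean. The helper name `count_metrics` and the sample data are hypothetical and not part of the Agent.

```python
# Hypothetical helper, not part of the Agent: count the attributes
# listed in a parsed JMX `conf` section.
MAX_METRICS_PER_INSTANCE = 350  # limit stated in this section


def count_metrics(conf):
    """Return the number of attributes listed across all include entries."""
    total = 0
    for entry in conf:
        attributes = entry.get("include", {}).get("attribute", {})
        total += len(attributes)
    return total


# Two entries mirroring the shape of conf.d/kafka.yaml (assumed already parsed).
sample_conf = [
    {"include": {
        "domain": "kafka.server",
        "bean": "kafka.server:type=BrokerTopicMetrics,name=AllTopicsBytesInPerSec",
        "attribute": {
            "MeanRate": {"metric_type": "counter", "alias": "kafka.net.bytes_in"},
        },
    }},
    {"include": {
        "domain": "kafka.network",
        "bean": "kafka.network:type=RequestMetrics,name=Fetch-TotalTimeMs",
        "attribute": {
            "Mean": {"metric_type": "counter",
                     "alias": "kafka.request.fetch.time.avg"},
            "99thPercentile": {"metric_type": "counter",
                               "alias": "kafka.request.fetch.time.99percentile"},
        },
    }},
]

selected = count_metrics(sample_conf)
within_limit = selected <= MAX_METRICS_PER_INSTANCE
print(selected, within_limit)
```

The count is only a lower bound when a bean pattern can match several MBeans, so treat it as a quick sanity check and not as the Agent's own accounting.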
Edit the configuration file
Edit the configuration file conf.d/kafka.yaml so that the Cloud Insight Agent can communicate with Kafka.
##########
# WARNING
# This sample works only for Kafka >= 0.8.2.
instances:
#  - host: localhost
#    port: 9999
#    name: jmx_instance
#    user: username
#    password: password
#    #java_bin_path: /path/to/java #Optional, should be set if the agent cannot find your java executable
#    #trust_store_path: /path/to/trustStore.jks # Optional, should be set if ssl is enabled
#    #trust_store_password: password
init_config:
  is_jmx: true
  # Metrics collected by this check. You should not have to modify this.
  conf:
      #
      # Aggregate cluster stats
      #
      - include:
          domain: 'kafka.server'
          bean: 'kafka.server:type=BrokerTopicMetrics,name=AllTopicsBytesOutPerSec'
          attribute:
            MeanRate:
              metric_type: counter
              alias: kafka.net.bytes_out
      - include:
          domain: 'kafka.server'
          bean: 'kafka.server:type=BrokerTopicMetrics,name=AllTopicsBytesInPerSec'
          attribute:
            MeanRate:
              metric_type: counter
              alias: kafka.net.bytes_in
      - include:
          domain: 'kafka.server'
          bean: 'kafka.server:type=BrokerTopicMetrics,name=AllTopicsMessagesInPerSec'
          attribute:
            MeanRate:
              metric_type: gauge
              alias: kafka.messages_in
      #
      # Request timings
      #
      - include:
          domain: 'kafka.server'
          bean: 'kafka.server:type=BrokerTopicMetrics,name=AllTopicsFailedFetchRequestsPerSec'
          attribute:
            MeanRate:
              metric_type: gauge
              alias: kafka.request.fetch.failed
      - include:
          domain: 'kafka.server'
          bean: 'kafka.server:type=BrokerTopicMetrics,name=AllTopicsFailedProduceRequestsPerSec'
          attribute:
            MeanRate:
              metric_type: gauge
              alias: kafka.request.produce.failed
      - include:
          domain: 'kafka.network'
          bean: 'kafka.network:type=RequestMetrics,name=Produce-TotalTimeMs'
          attribute:
            Mean:
              metric_type: counter
              alias: kafka.request.produce.time.avg
            99thPercentile:
              metric_type: counter
              alias: kafka.request.produce.time.99percentile
      - include:
          domain: 'kafka.network'
          bean: 'kafka.network:type=RequestMetrics,name=Fetch-TotalTimeMs'
          attribute:
            Mean:
              metric_type: counter
              alias: kafka.request.fetch.time.avg
            99thPercentile:
              metric_type: counter
              alias: kafka.request.fetch.time.99percentile
      - include:
          domain: 'kafka.network'
          bean: 'kafka.network:type=RequestMetrics,name=UpdateMetadata-TotalTimeMs'
          attribute:
            Mean:
              metric_type: counter
              alias: kafka.request.update_metadata.time.avg
            99thPercentile:
              metric_type: counter
              alias: kafka.request.update_metadata.time.99percentile
      - include:
          domain: 'kafka.network'
          bean: 'kafka.network:type=RequestMetrics,name=Metadata-TotalTimeMs'
          attribute:
            Mean:
              metric_type: counter
              alias: kafka.request.metadata.time.avg
            99thPercentile:
              metric_type: counter
              alias: kafka.request.metadata.time.99percentile
      - include:
          domain: 'kafka.network'
          bean: 'kafka.network:type=RequestMetrics,name=Offsets-TotalTimeMs'
          attribute:
            Mean:
              metric_type: counter
              alias: kafka.request.offsets.time.avg
            99thPercentile:
              metric_type: counter
              alias: kafka.request.offsets.time.99percentile
      #
      # Replication stats
      #
      - include:
          domain: 'kafka.server'
          bean: 'kafka.server:type=ReplicaManager,name=ISRShrinksPerSec'
          attribute:
            MeanRate:
              metric_type: counter
              alias: kafka.replication.isr_shrinks
      - include:
          domain: 'kafka.server'
          bean: 'kafka.server:type=ReplicaManager,name=ISRExpandsPerSec'
          attribute:
            MeanRate:
              metric_type: counter
              alias: kafka.replication.isr_expands
      - include:
          domain: 'kafka.server'
          bean: 'kafka.server:type=ControllerStats,name=LeaderElectionRateAndTimeMs'
          attribute:
            MeanRate:
              metric_type: counter
              alias: kafka.replication.leader_elections
      - include:
          domain: 'kafka.server'
          bean: 'kafka.server:type=ControllerStats,name=UncleanLeaderElectionsPerSec'
          attribute:
            MeanRate:
              metric_type: counter
              alias: kafka.replication.unclean_leader_elections
      #
      # Log flush stats
      #
      - include:
          domain: 'kafka.log'
          bean: 'kafka.log:type=LogFlushStats,name=LogFlushRateAndTimeMs'
          attribute:
            MeanRate:
              metric_type: counter
              alias: kafka.log.flush_rate
Edit the Consumer configuration file
Edit the Consumer configuration file conf.d/kafka_consumer.yaml.
init_config:
instances:
#  - kafka_connect_str: localhost:19092
#    zk_connect_str: localhost:2181
#    zk_prefix: /0.8
#    consumer_groups:
#      my_consumer:
#        my_topic: [0, 1, 4, 12]
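To make the nesting of `consumer_groups` concrete, here is a minimal sketch that expands the commented example (consumer group, then topic, then partition list) into individual (group, topic, partition) triples. It assumes the YAML has already been parsed into a Python dict. The function name `expand_consumer_groups` is hypothetical and only illustrates the structure, not how the Agent reads it.

```python
def expand_consumer_groups(consumer_groups):
    """Yield (group, topic, partition) for every partition listed in the config."""
    for group, topics in consumer_groups.items():
        for topic, partitions in topics.items():
            for partition in partitions:
                yield (group, topic, partition)


# Same values as the commented example in conf.d/kafka_consumer.yaml.
consumer_groups = {"my_consumer": {"my_topic": [0, 1, 4, 12]}}

triples = list(expand_consumer_groups(consumer_groups))
print(triples)
```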
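Restart the Agent
Restart the OneAPM Cloud Insight Agent for the configuration to take effect.
You can also check the Agent info output to verify the configuration. If output like the following appears, the setup succeeded.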
Checks
======
[...]
kafka-localhost-9999
--------------------
  - instance #0 [OK]
  - Collected 8 metrics & 0 events
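If you want to check this output in a script, the sketch below parses text shaped like the sample above and reports each check's status and metric count. It assumes the info text has already been captured into a string; how to capture it depends on your installation and is not shown. The function name `parse_check_status` is hypothetical, and for a check with several instances only the last instance's status is kept.

```python
import re


def parse_check_status(info_text):
    """Map each check name to its status and collected metric count."""
    results = {}
    current = None
    lines = info_text.splitlines()
    for i, line in enumerate(lines):
        # A check name is the non-empty line directly above a dashed underline.
        has_underline = (
            i + 1 < len(lines)
            and re.fullmatch(r"-{3,}", lines[i + 1].strip()) is not None
        )
        if has_underline and line.strip():
            current = line.strip()
            results[current] = {"status": None, "metrics": None}
            continue
        if current is None:
            continue
        status_match = re.search(r"instance #\d+ \[(\w+)\]", line)
        if status_match:
            results[current]["status"] = status_match.group(1)
        metrics_match = re.search(r"Collected (\d+) metrics", line)
        if metrics_match:
            results[current]["metrics"] = int(metrics_match.group(1))
    return results


# Sample text copied from the expected output shown above.
sample = """Checks
======
[...]
kafka-localhost-9999
--------------------
  - instance #0 [OK]
  - Collected 8 metrics & 0 events
"""

status = parse_check_status(sample)
print(status)
```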