Kafka

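This section is a sample of the documentation for the platform services that Cloud Insight supports.

For now it is organized into the following parts:

  • Supported performance metrics
  • How to configure Kafka monitoring
  • Frequently asked questions

Performance metrics

Cloud Insight collects the following Kafka performance metrics: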

  • messages_in
  • net.bytes_in
  • net.bytes_out
  • net.bytes_rejected
  • replication.isr_expands
  • replication.isr_shrinks
  • replication.leader_elections
  • replication.unclean_leader_elections
  • request.fetch.failed
  • request.fetch.time.99percentile
  • request.fetch.time.avg
  • request.handler.avg.idle.pct
  • request.metadata.time.99percentile
  • request.metadata.time.avg
  • request.offsets.time.99percentile
  • request.offsets.time.avg
  • request.produce.failed
  • request.produce.time.99percentile
  • request.produce.time.avg
  • request.update_metadata.time.99percentile
  • request.update_metadata.time.avg

Configuring Kafka monitoring

JMX

The OneAPM Cloud Insight Agent collects Kafka performance metrics over JMX.

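Each monitored instance can report at most 350 performance metrics, so follow the configuration steps below and edit the configuration file to select the metrics you need.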
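
As a rough way to check a configuration against the 350-metric limit, the sketch below counts the attributes listed under each include rule. It assumes the conf list has already been loaded into Python lists and dicts (for example with a YAML parser), and the function name count_configured_attributes is hypothetical. The count matches the number of collected metrics only when every bean entry names exactly one MBean.

```python
METRIC_LIMIT = 350  # per-instance limit stated in this document


def count_configured_attributes(conf):
    """Count attribute entries across all `include` rules of a JMX `conf` list."""
    total = 0
    for rule in conf:
        include = rule.get("include", {})
        # `attribute` maps attribute names (MeanRate, Mean, ...) to their options.
        total += len(include.get("attribute", {}))
    return total


# Two rules in the same shape as the kafka.yaml sample (already parsed).
sample_conf = [
    {"include": {
        "domain": "kafka.server",
        "bean": "kafka.server:type=BrokerTopicMetrics,name=AllTopicsBytesOutPerSec",
        "attribute": {
            "MeanRate": {"metric_type": "counter", "alias": "kafka.net.bytes_out"},
        },
    }},
    {"include": {
        "domain": "kafka.network",
        "bean": "kafka.network:type=RequestMetrics,name=Produce-TotalTimeMs",
        "attribute": {
            "Mean": {"metric_type": "counter",
                     "alias": "kafka.request.produce.time.avg"},
            "99thPercentile": {"metric_type": "counter",
                               "alias": "kafka.request.produce.time.99percentile"},
        },
    }},
]

configured = count_configured_attributes(sample_conf)
print("%d attributes configured, limit is %d" % (configured, METRIC_LIMIT))
```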

For details on collecting metrics over JMX, see JMX Remote Monitoring.
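
Before editing the Agent configuration, it can help to confirm that the broker's JMX port accepts connections from the Agent host. The sketch below only opens a TCP connection. It does not speak the JMX protocol or check credentials, so a True result means the port is open, not that collection will succeed. The helper name jmx_port_reachable is hypothetical, and localhost:9999 is taken from the sample configuration.

```python
import socket


def jmx_port_reachable(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


# Host and port as used in the sample instance definition.
print(jmx_port_reachable("localhost", 9999))
```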

Edit the configuration file

Edit the configuration file conf.d/kafka.yaml so that the Cloud Insight Agent can communicate with Kafka.

##########
# WARNING
# This sample works only for Kafka >= 0.8.2.

instances:
#  - host: localhost
#    port: 9999
#    name: jmx_instance
#    user: username
#    password: password
#    #java_bin_path: /path/to/java #Optional, should be set if the agent cannot find your java executable
#    #trust_store_path: /path/to/trustStore.jks # Optional, should be set if ssl is enabled
#    #trust_store_password: password

init_config:
  is_jmx: true

  # Metrics collected by this check. You should not have to modify this.
  conf:
      #
      # Aggregate cluster stats
      #
      - include:
          domain: 'kafka.server'
          bean: 'kafka.server:type=BrokerTopicMetrics,name=AllTopicsBytesOutPerSec'
          attribute:
            MeanRate:
              metric_type: counter
              alias: kafka.net.bytes_out
      - include:
          domain: 'kafka.server'
          bean: 'kafka.server:type=BrokerTopicMetrics,name=AllTopicsBytesInPerSec'
          attribute:
            MeanRate:
              metric_type: counter
              alias: kafka.net.bytes_in
      - include:
          domain: 'kafka.server'
          bean: 'kafka.server:type=BrokerTopicMetrics,name=AllTopicsMessagesInPerSec'
          attribute:
            MeanRate:
              metric_type: gauge
              alias: kafka.messages_in

      #
      # Request timings
      #
      - include:
          domain: 'kafka.server'
          bean: 'kafka.server:type=BrokerTopicMetrics,name=AllTopicsFailedFetchRequestsPerSec'
          attribute:
            MeanRate:
              metric_type: gauge
              alias: kafka.request.fetch.failed
      - include:
          domain: 'kafka.server'
          bean: 'kafka.server:type=BrokerTopicMetrics,name=AllTopicsFailedProduceRequestsPerSec'
          attribute:
            MeanRate:
              metric_type: gauge
              alias: kafka.request.produce.failed
      - include:
          domain: 'kafka.network'
          bean: 'kafka.network:type=RequestMetrics,name=Produce-TotalTimeMs'
          attribute:
            Mean:
              metric_type: counter
              alias: kafka.request.produce.time.avg
            99thPercentile:
              metric_type: counter
              alias: kafka.request.produce.time.99percentile
      - include:
          domain: 'kafka.network'
          bean: 'kafka.network:type=RequestMetrics,name=Fetch-TotalTimeMs'
          attribute:
            Mean:
              metric_type: counter
              alias: kafka.request.fetch.time.avg
            99thPercentile:
              metric_type: counter
              alias: kafka.request.fetch.time.99percentile
      - include:
          domain: 'kafka.network'
          bean: 'kafka.network:type=RequestMetrics,name=UpdateMetadata-TotalTimeMs'
          attribute:
            Mean:
              metric_type: counter
              alias: kafka.request.update_metadata.time.avg
            99thPercentile:
              metric_type: counter
              alias: kafka.request.update_metadata.time.99percentile
      - include:
          domain: 'kafka.network'
          bean: 'kafka.network:type=RequestMetrics,name=Metadata-TotalTimeMs'
          attribute:
            Mean:
              metric_type: counter
              alias: kafka.request.metadata.time.avg
            99thPercentile:
              metric_type: counter
              alias: kafka.request.metadata.time.99percentile
      - include:
          domain: 'kafka.network'
          bean: 'kafka.network:type=RequestMetrics,name=Offsets-TotalTimeMs'
          attribute:
            Mean:
              metric_type: counter
              alias: kafka.request.offsets.time.avg
            99thPercentile:
              metric_type: counter
              alias: kafka.request.offsets.time.99percentile

      #
      # Replication stats
      #
      - include:
          domain: 'kafka.server'
          bean: 'kafka.server:type=ReplicaManager,name=ISRShrinksPerSec'
          attribute:
            MeanRate:
              metric_type: counter
              alias: kafka.replication.isr_shrinks
      - include:
          domain: 'kafka.server'
          bean: 'kafka.server:type=ReplicaManager,name=ISRExpandsPerSec'
          attribute:
            MeanRate:
              metric_type: counter
              alias: kafka.replication.isr_expands
      - include:
          domain: 'kafka.server'
          bean: 'kafka.server:type=ControllerStats,name=LeaderElectionRateAndTimeMs'
          attribute:
            MeanRate:
              metric_type: counter
              alias: kafka.replication.leader_elections
      - include:
          domain: 'kafka.server'
          bean: 'kafka.server:type=ControllerStats,name=UncleanLeaderElectionsPerSec'
          attribute:
            MeanRate:
              metric_type: counter
              alias: kafka.replication.unclean_leader_elections

      #
      # Log flush stats
      #
      - include:
          domain: 'kafka.log'
          bean: 'kafka.log:type=LogFlushStats,name=LogFlushRateAndTimeMs'
          attribute:
            MeanRate:
              metric_type: counter
              alias: kafka.log.flush_rate

Edit the Consumer configuration file

Edit the Consumer configuration file conf.d/kafka_consumer.yaml:

init_config:

instances:
#  - kafka_connect_str: localhost:19092
#    zk_connect_str: localhost:2181
#    zk_prefix: /0.8
#    consumer_groups:
#      my_consumer:
#        my_topic: [0, 1, 4, 12]
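
The consumer_groups entry is read here as a nested mapping: consumer group name, then topic name, then a list of partition numbers. Under that assumption, a minimal sketch for checking one parsed instance could look like the following. The function name validate_consumer_instance is hypothetical, and treating both connection strings as required is also an assumption.

```python
def validate_consumer_instance(instance):
    """Return a list of problems found in one kafka_consumer instance mapping."""
    problems = []
    for key in ("kafka_connect_str", "zk_connect_str"):
        if not instance.get(key):
            problems.append("missing " + key)

    groups = instance.get("consumer_groups")
    if not isinstance(groups, dict) or not groups:
        problems.append("consumer_groups must map group names to topics")
        return problems

    for group, topics in groups.items():
        if not isinstance(topics, dict):
            problems.append("%s: must map topic names to partition lists" % group)
            continue
        for topic, partitions in topics.items():
            valid = isinstance(partitions, list) and all(
                isinstance(p, int) and not isinstance(p, bool) and p >= 0
                for p in partitions
            )
            if not valid:
                problems.append(
                    "%s/%s: partitions must be a list of non-negative integers"
                    % (group, topic)
                )
    return problems


good = {
    "kafka_connect_str": "localhost:19092",
    "zk_connect_str": "localhost:2181",
    "zk_prefix": "/0.8",
    "consumer_groups": {"my_consumer": {"my_topic": [0, 1, 4, 12]}},
}
# Same keys, but my_consumer sits beside consumer_groups instead of under it.
flattened = {
    "kafka_connect_str": "localhost:19092",
    "zk_connect_str": "localhost:2181",
    "consumer_groups": None,
    "my_consumer": {"my_topic": [0, 1, 4, 12]},
}

print(validate_consumer_instance(good))
print(validate_consumer_instance(flattened))
```

The second call reports a problem because a group that is not indented under consumer_groups leaves that key empty, which is easy to do by accident when uncommenting the sample.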

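Restart the Agent

Restart the OneAPM Cloud Insight Agent for the configuration to take effect.

You can also check the Agent info output to verify the configuration. If output like the following appears, the setup succeeded.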

Checks
======

[...]

kafka-localhost-9999
--------------------
  - instance #0 [OK]
  - Collected 8 metrics & 0 events
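
If this verification needs to be scripted, the status can be read out of the info text. The sketch below assumes the layout shown above (a check name, a row of dashes, then one "instance #N [STATUS]" line per instance) and that the text has already been captured as a string. How the info output is obtained is not covered here, and the function name check_statuses is hypothetical.

```python
import re


def check_statuses(info_text):
    """Map each check name in the Agent info output to its instance statuses."""
    statuses = {}
    current = None
    lines = info_text.splitlines()
    for i, line in enumerate(lines):
        # A check name is a non-empty line followed by a row of dashes.
        next_line = lines[i + 1].strip() if i + 1 < len(lines) else ""
        if line.strip() and re.fullmatch(r"-{3,}", next_line):
            current = line.strip()
            statuses[current] = []
            continue
        match = re.search(r"instance #\d+ \[(\w+)\]", line)
        if match and current is not None:
            statuses[current].append(match.group(1))
    return statuses


sample = """Checks
======

[...]

kafka-localhost-9999
--------------------
  - instance #0 [OK]
  - Collected 8 metrics & 0 events
"""

result = check_statuses(sample)
print(result)
```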

Frequently asked questions