Building a Graphite Monitoring System with grafana and Diamond

Preface

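Among Douban's open-source projects there is graph-index,
which provides a directory index for monitoring server status and is based on graph-explorer.
There are many similar derivatives, including the projects used in this article. First, a few screenshots from my test environment.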

Key terms

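  1. graphite-web # One of the graphite components; a highly extensible, django-based real-time graphing system
  2. Whisper # One of the graphite components; it implements the database storage. It is slower than rrdtool because whisper is written in python while rrdtool is written in C, although the difference in speed is small
  3. Carbon # Collected data is sent to it; it parses the data and makes it available for real-time graphing. By default it accepts more than one data format and listens on ports 2003 (plaintext) and 2004 (pickle)
  4. Diamond # A bundle of collectors that covers most common metrics, such as cpu, load, memory, as well as mongodb, rabbitmq, nginx and so on. This saves me from writing lots of collectors myself, because it already ships them, and it also supports custom collector types (at the end I show one I defined myself)
  5. grafana # This dashboard is built on node and kibana and can be edited online. Because it derives from kibana, it also uses the open-source search framework elasticsearch
    PS: For other tools, see Tools That Work With Graphite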
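
To make Carbon's two listeners more concrete, here is a minimal sketch (Python 3) of how a client could encode metrics for each of them. It assumes the standard Carbon wire formats: one "path value timestamp" line per metric on the plaintext port, and a length-prefixed pickled list of (path, (timestamp, value)) tuples on the pickle port. The function names and the metric name servers.demo_host.loadavg.01 are made up for illustration.

```python
import pickle
import struct
import sys
import time


def plaintext_line(path, value, timestamp=None):
    """Format one metric for Carbon's plaintext listener (port 2003)."""
    if timestamp is None:
        timestamp = int(time.time())
    return "%s %s %d\n" % (path, value, timestamp)


def pickle_message(metrics):
    """Frame a batch of metrics for Carbon's pickle listener (port 2004).

    metrics is a list of (path, (timestamp, value)) tuples. The pickled
    payload is prefixed with its length as a 4-byte big-endian integer.
    """
    payload = pickle.dumps(metrics, protocol=2)
    return struct.pack("!L", len(payload)) + payload


if __name__ == "__main__":
    # Hypothetical metric name, only for illustration.
    sys.stdout.write(plaintext_line("servers.demo_host.loadavg.01", 0.42, 1400000000))
```

Either result can then be written to a TCP socket connected to the Carbon host; Diamond's GraphiteHandler and GraphitePickleHandler take care of this for you.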

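How it works

I have not read all of the actual code; the rough flow is:
  1. Start carbon-cache and wait to receive data (carbon is built on twisted)
  2. Start graphite-web, which provides grafana with an API for real-time graphing data
  3. Start grafana, which calls the graphite-web API to fetch the data and display it
  4. Diamond periodically collects each type of monitored data and sends it to carbon (the default interval is 5 minutes, and by default the configuration is reloaded automatically once an hour)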
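
As a rough illustration of the graphite-web step, the sketch below builds a URL for graphite-web's render API, which is the kind of request a dashboard issues to fetch series data as JSON. The helper name render_url and the target servers.*.loadavg.01 are assumptions for the example; port 8020 is the one graphite-web is started on later in this article.

```python
from urllib.parse import urlencode


def render_url(base_url, target, time_from="-1h"):
    """Build a graphite-web render API URL that returns JSON data points."""
    query = urlencode({"target": target, "from": time_from, "format": "json"})
    return "%s/render?%s" % (base_url.rstrip("/"), query)


if __name__ == "__main__":
    # Hypothetical target; use any metric path that exists in your carbon store.
    print(render_url("http://localhost:8020", "servers.*.loadavg.01"))
```

Opening the printed URL in a browser is a quick way to check that graphite-web is serving data before wiring up grafana.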

What needs to be done to build this system

Install the graphite components (I am using centos)

yum --enablerepo=epel install graphite-web python-carbon -y
Install the components grafana needs

# Add the elasticsearch repo:
sudo rpm --import http://packages.elasticsearch.org/GPG-KEY-elasticsearch
$ cat /etc/yum.repos.d/elasticsearch.repo
[elasticsearch-1.0]
name=Elasticsearch repository for 1.0.x packages
baseurl=http://packages.elasticsearch.org/elasticsearch/1.0/centos
gpgcheck=1
gpgkey=http://packages.elasticsearch.org/GPG-KEY-elasticsearch
enabled=1
sudo yum install nginx nodejs npm java-1.7.0-openjdk elasticsearch -y
Download Diamond and grafana

git clone https://github.com/torkelo/grafana
cd grafana
sudo npm install
sudo pip install django-cors-headers configobj # my environment already had some modules installed, so install whatever else turns out to be missing
git clone https://github.com/BrightcoveOS/Diamond
cd Diamond
Edit the configuration
  1. Add CORS support
    In /usr/lib/python2.6/site-packages/graphite/app_settings.py:
    add 'corsheaders' to INSTALLED_APPS,
    add 'corsheaders.middleware.CorsMiddleware' to MIDDLEWARE_CLASSES
  2. Serve grafana through nginx
    Add a block similar to the following to nginx.conf

server {
    listen *:80 ;

    server_name monitor.dongwm.com; # I use a virtual host
    access_log /var/log/nginx/kibana.myhost.org.access.log;

    location / {
        add_header 'Access-Control-Allow-Origin' "$http_origin";
        add_header 'Access-Control-Allow-Credentials' 'true';
        root /home/operation/dongwm/grafana/src;
        index index.html index.htm;
    }

    location ~ ^/_aliases$ {
        proxy_pass http://127.0.0.1:9200;
        proxy_read_timeout 90;
    }
    location ~ ^/_nodes$ {
        proxy_pass http://127.0.0.1:9200;
        proxy_read_timeout 90;
    }
    location ~ ^/.*/_search$ {
        proxy_pass http://127.0.0.1:9200;
        proxy_read_timeout 90;
    }
    location ~ ^/.*/_mapping$ {
        proxy_pass http://127.0.0.1:9200;
        proxy_read_timeout 90;
    }

    # Password protected end points
    location ~ ^/kibana-int/dashboard/.*$ {
        proxy_pass http://127.0.0.1:9200;
        proxy_read_timeout 90;
        limit_except GET {
            proxy_pass http://127.0.0.1:9200;
            auth_basic "Restricted";
            auth_basic_user_file /etc/nginx/conf.d/dongwm.htpasswd;
        }
    }
    location ~ ^/kibana-int/temp.*$ {
        proxy_pass http://127.0.0.1:9200;
        proxy_read_timeout 90;
        limit_except GET {
            proxy_pass http://127.0.0.1:9200;
            auth_basic "Restricted";
            auth_basic_user_file /etc/nginx/conf.d/dongwm.htpasswd;
        }
    }
}
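
To see what these location rules amount to, the following sketch mirrors them in Python: regex locations are tried in the order they appear, the first match wins, and anything else falls through to the static files under location /. Inside limit_except GET, nginx also lets HEAD through, so only the other methods require basic auth. The classify function and its return labels are my own naming and only approximate nginx's matching.

```python
import re

# (pattern, protected) pairs in the same order as the nginx regex locations.
LOCATIONS = [
    (r"^/_aliases$", False),
    (r"^/_nodes$", False),
    (r"^/.*/_search$", False),
    (r"^/.*/_mapping$", False),
    (r"^/kibana-int/dashboard/.*$", True),
    (r"^/kibana-int/temp.*$", True),
]


def classify(method, path):
    """Return 'static', 'proxy', or 'proxy+auth' for a request."""
    for pattern, protected in LOCATIONS:
        if re.search(pattern, path):
            # limit_except GET: GET (and implicitly HEAD) skip basic auth.
            if protected and method not in ("GET", "HEAD"):
                return "proxy+auth"
            return "proxy"
    return "static"


if __name__ == "__main__":
    for method, path in [("GET", "/index.html"),
                         ("GET", "/kibana-int/dashboard/demo"),
                         ("PUT", "/kibana-int/dashboard/demo")]:
        print(method, path, classify(method, path))
```

In other words, anyone can read dashboards, but saving or deleting them goes through the htpasswd file.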
  3. Edit grafana's src/config.js:
    graphiteUrl: "http://"+window.location.hostname+":8020", // graphite-web will be started on port 8020 below
  4. Edit Diamond's configuration, conf/diamond.conf

cp conf/diamond.conf.example conf/diamond.conf

The main changes are the carbon server and port to send to and which types of data to monitor. Here is my complete configuration


################################################################################
# Diamond Configuration File
################################################################################

################################################################################
### Options for the server
[server]

# Handlers for published metrics.
handlers = diamond.handler.graphite.GraphiteHandler, diamond.handler.archive.ArchiveHandler

# User diamond will run as
# Leave empty to use the current user
user =

# Group diamond will run as
# Leave empty to use the current group
group =

# Pid file
pid_file = /home/dongwm/logs/diamond.pid # moved the pid file, because none of my services are started as root

# Directory to load collector modules from
collectors_path = /home/dongwm/Diamond/src/collectors # the collectors directory; /home/dongwm/Diamond is where the code was cloned

# Directory to load collector configs from
collectors_config_path = /home/dongwm/Diamond/src/collectors

# Directory to load handler configs from
handlers_config_path = /home/dongwm/Diamond/src/diamond/handler

handlers_path = /home/dongwm/Diamond/src/diamond/handler

# Interval to reload collectors
collectors_reload_interval = 3600 # collectors are reloaded periodically to pick up configuration changes

################################################################################
### Options for handlers
[handlers]

# daemon logging handler(s)
keys = rotated_file

### Defaults options for all Handlers
[[default]]

[[ArchiveHandler]]

# File to write archive log files
log_file = /home/dongwm/logs/diamond_archive.log

# Number of days to keep archive log files
days = 7

[[GraphiteHandler]]
### Options for GraphiteHandler

# Graphite server host
host = 123.126.1.11

# Port to send metrics to
port = 2003

# Socket timeout (seconds)
timeout = 15

# Batch size for metrics
batch = 1

[[GraphitePickleHandler]]
### Options for GraphitePickleHandler

# Graphite server host
host = 123.126.1.11

# Port to send metrics to
port = 2004

# Socket timeout (seconds)
timeout = 15

# Batch size for pickled metrics
batch = 256

[[MySQLHandler]]
### Options for MySQLHandler

# MySQL Connection Info (yours will probably differ)
hostname = 127.0.0.1
port = 3306
username = root
password =
database = diamond
table = metrics
# INT UNSIGNED NOT NULL
col_time = timestamp
# VARCHAR(255) NOT NULL
col_metric = metric
# VARCHAR(255) NOT NULL
col_value = value

[[StatsdHandler]]
host = 127.0.0.1
port = 8125

[[TSDBHandler]]
host = 127.0.0.1
port = 4242
timeout = 15

[[LibratoHandler]]
user = user@example.com
apikey = abcdefghijklmnopqrstuvwxyz0123456789abcdefghijklmnopqrstuvwxyz01

[[HostedGraphiteHandler]]
apikey = abcdefghijklmnopqrstuvwxyz0123456789abcdefghijklmnopqrstuvwxyz01
timeout = 15
batch = 1

# And any other config settings from GraphiteHandler are valid here

[[HttpPostHandler]]

### URL to post the metrics
url = http://localhost:8888/
### Metrics batch size
batch = 100


################################################################################
### Options for collectors
[collectors]
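[[TencentCollector]] # [collectors] was originally empty; this is a collector type I wrote myself
ttype = server
[[MongoDBCollector]] # some collectors default to enabled = True, but most are disabled by default and must be set to True explicitly
enabled = True
host = 127.0.0.1 # each collector type takes different parameters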
[[TCPCollector]]
enabled = True
[[NetworkCollector]]
enabled = True
[[NginxCollector]]
enabled = False # nginx_status is not turned on, so enabling this collector would be pointless
[[SockstatCollector]]
enabled = True
[[default]]
### Defaults options for all Collectors

# Uncomment and set to hardcode a hostname for the collector path
# Keep in mind, periods are separators in graphite
# hostname = my_custom_hostname

# If you prefer to just use a different way of calculating the hostname
# Uncomment and set this to one of these values:

# smart = Default. Tries fqdn_short. If that's localhost, uses hostname_short

# fqdn_short = Default. Similar to hostname -s
# fqdn = hostname output
# fqdn_rev = hostname in reverse (com.example.www)

# uname_short = Similar to uname -n, but only the first part
# uname_rev = uname -r in reverse (com.example.www)

# hostname_short = `hostname -s`
# hostname = `hostname`
# hostname_rev = `hostname` in reverse (com.example.www)

# hostname_method = smart

# Path Prefix and Suffix
# you can use one or both to craft the path where you want to put metrics
# such as: %(path_prefix)s.$(hostname)s.$(path_suffix)s.$(metric)s
# path_prefix = servers
# path_suffix =

# Path Prefix for Virtual Machines
# If the host supports virtual machines, collectors may report per
# VM metrics. Following OpenStack nomenclature, the prefix for
# reporting per VM metrics is "instances", and metric foo for VM
# bar will be reported as: instances.bar.foo...
# instance_prefix = instances

# Default Poll Interval (seconds)
# interval = 300

################################################################################
### Options for logging
# for more information on file format syntax:
# http://docs.python.org/library/logging.config.html#configuration-file-format

[loggers]

keys = root

# handlers are higher in this config file, in:
# [handlers]
# keys = ...

[formatters]

keys = default

[logger_root]

# to increase verbosity, set DEBUG
level = INFO
handlers = rotated_file
propagate = 1

[handler_rotated_file]

class = handlers.TimedRotatingFileHandler
level = DEBUG
formatter = default
# rotate at midnight, each day and keep 7 days
args = ('/home/dongwm/logs/diamond.log', 'midnight', 1, 7)

[formatter_default]

format = [%(asctime)s] [%(threadName)s] %(message)s
datefmt =
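
The hostname and path options in the [[default]] collector section decide what each metric is called in graphite. The sketch below is a simplified reading of the comments in that config, not Diamond's actual code: the helper names are mine, and the ordering prefix, hostname, optional suffix, collector path, metric name is an assumption to verify against your Diamond version.

```python
def hostname_short(hostname):
    """Keep only the first label, similar to `hostname -s`."""
    return hostname.split(".")[0]


def hostname_rev(hostname):
    """Reverse the labels, e.g. www.example.com becomes com.example.www."""
    return ".".join(reversed(hostname.split(".")))


def metric_path(metric, collector_path, hostname,
                path_prefix="servers", path_suffix=None):
    """Join the pieces with periods, which graphite treats as separators."""
    parts = [path_prefix, hostname]
    if path_suffix:
        parts.append(path_suffix)
    parts.extend([collector_path, metric])
    return ".".join(parts)


if __name__ == "__main__":
    host = hostname_short("www.example.com")
    # 'tencent' and 'count' come from the custom collector shown later.
    print(metric_path("count", "tencent", host))
```

Because periods split the name into tree levels, a full hostname such as www.example.com would create three nested folders in graphite, which is why the short and reversed variants exist.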
Start the services

sudo /etc/init.d/nginx reload
sudo /sbin/chkconfig --add elasticsearch
sudo service elasticsearch start
sudo service carbon-cache restart
sudo python /usr/lib/python2.6/site-packages/graphite/manage.py runserver 0.0.0.0:8020 # start graphite-web on port 8020
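
After starting everything, a quick way to confirm the services are listening is to try a TCP connection to each port. This is a small Python 3 sketch; the port numbers are the ones used in this article (carbon on 2003 and 2004, graphite-web on 8020, elasticsearch on 9200), and the SERVICES table and port_open helper are names I made up.

```python
import socket

# Ports taken from the setup described in this article.
SERVICES = {
    "carbon plaintext": 2003,
    "carbon pickle": 2004,
    "graphite-web": 8020,
    "elasticsearch": 9200,
}


def port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    for name, port in sorted(SERVICES.items()):
        state = "up" if port_open("127.0.0.1", port) else "down"
        print("%-18s %5d %s" % (name, port, state))
```

A port that shows as down usually means the corresponding service failed to start, so its log is the next place to look.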
Install Diamond on every agent you want to collect data from, and start it:

cd /home/dongwm/Diamond
python ./bin/diamond --configfile=conf/diamond.conf

# PS: you can also add -l -f to run it in the foreground and see its output
A custom collector type, namely the TencentCollector mentioned above

# coding=utf-8

"""
Collect business metrics for the Tencent Weibo crawler
"""

import diamond.collector
import pymongo
from pymongo.errors import ConnectionFailure


class TencentCollector(diamond.collector.Collector):  # must inherit from diamond.collector.Collector
    PATH = '/home/dongwm/tencent_data'

    def get_default_config(self):
        config = super(TencentCollector, self).get_default_config()
        config.update({
            'enabled': 'True',
            'path': 'tencent',
            'method': 'Threaded',
            'ttype': 'agent'  # service type, either agent or server
        })
        return config

    def collect(self):
        ttype = self.config['ttype']
        if ttype == 'server':
            try:
                db = pymongo.MongoClient()['tmp']
            except ConnectionFailure:
                return
            now_count = db.data.count()
            try:
                # Store the new count and read back the previous one
                last_count = db.diamond.find_and_modify(
                    {}, {'$set': {'last': now_count}}, upsert=True)['last']
            except TypeError:
                # First run: no previous document, so find_and_modify returned None
                last_count = 0
            self.publish('count', now_count)
            self.publish('update', abs(last_count - now_count))
        if ttype == 'agent':
            # somethings..........
            pass
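
The bookkeeping in the server branch can be tried out without MongoDB or Diamond. The sketch below replaces the db.diamond document with an in-memory stand-in and returns the two values the collector would publish; FakeCounterStore and server_metrics are hypothetical helpers written only for this illustration.

```python
class FakeCounterStore(object):
    """In-memory stand-in for the single document kept in db.diamond."""

    def __init__(self):
        self.last = None

    def swap(self, now_count):
        """Store now_count and return the previous value (None the first time)."""
        previous, self.last = self.last, now_count
        return previous


def server_metrics(store, now_count):
    """Compute the 'count' and 'update' values published by the server branch."""
    previous = store.swap(now_count)
    # Mirrors the TypeError fallback: no previous record means last_count = 0.
    last_count = 0 if previous is None else previous
    return {"count": now_count, "update": abs(last_count - now_count)}


if __name__ == "__main__":
    store = FakeCounterStore()
    for total in (100, 130, 130):
        print(server_metrics(store, total))
```

Note that on the very first run the update value equals the full count, because there is no earlier value to compare against.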
Add the types you want to graph. That is, open grafana, add rows, add panels to each row, and pick the metric type for each panel
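
When picking metrics for a panel, graphite targets may contain wildcards, and a * only matches within one period-separated level of the name. The sketch below approximates that rule with Python's fnmatch applied level by level. It does not cover graphite's {a,b} alternation, and the metric names are made up for the example.

```python
import fnmatch


def matches(pattern, metric):
    """Check a graphite-style wildcard pattern against a metric name.

    Each period-separated level is matched on its own, so '*' never
    spans more than one level.
    """
    pattern_parts = pattern.split(".")
    metric_parts = metric.split(".")
    if len(pattern_parts) != len(metric_parts):
        return False
    return all(fnmatch.fnmatchcase(m, p)
               for p, m in zip(pattern_parts, metric_parts))


if __name__ == "__main__":
    # Hypothetical metric names.
    print(matches("servers.*.tencent.count", "servers.web1.tencent.count"))
    print(matches("servers.*.count", "servers.web1.tencent.count"))
```

A target such as servers.*.tencent.count would therefore draw one line per host on the same panel.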

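Copyright notice: this article is an original work by 董伟明. Without the author's authorization, reposting by any WeChat official account or to Juejin (juejin.im) is prohibited. Reposting on technical blogs is permitted under the Attribution-NonCommercial-NoDerivatives 4.0 International license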