提起自动化运维、系统监控这档子事,shell很好,python也不错。人生苦短,shell难学。要想偷懒,果断python。

##(UNIX)free
反向证明为什么要用python,如果坚持纯正的unix运维方式,首先需要记住系统命令,以及命令参数,然后可能还要用各种管道各种正则匹配从log中择取出想要的信息。

1
2
3
4
Total Memory:
$ free -m | grep Mem | awk '{print $2}'
Used Memory:
$ free -m | grep Mem | awk '{print $3}'

##psutil
Python系统性能监控模块,相当于top、lsof、ps、ifconfig…的结合体。同时也可以用于分析和限制系统资源以及进程管理。

#####(1)CPU监控
Linux中CPU的利用率有以下几个部分:

  • User Time,执行用户进程的时间百分比
  • System Time,执行内核进程和中断时间百分比
  • Wait IO, 由于IO等待所导致的CPU idle状态的时间百分比 (比如监控Mongo Server的IO性能)
  • idle
1
2
3
4
5
6
7
8
9
10
获取CPU状态
>>> import psutil
>>> psutil.cpu_times()
scputimes(user=85601.59, nice=1.09, system=11940.79, idle=635655.96, iowait=97120.9, irq=10.46, softirq=711.03, steal=965.51, guest=0.0, guest_nice=0.0)
获取逻辑CPU个数
>>> psutil.cpu_count()
4
获取CPU的物理个数
>>> psutil.cpu_count(logical=False)
4

#####(2)内存信息

1
2
3
4
5
6
7
8
9
10
>>> import psutil
>>> mem = psutil.virtual_memory()
>>> mem.total
7934197760L
>>> mem.used
7770337280L
>>> mem.free
155074560L
>>> psutil.swap_memory()
sswap(total=4294963200L, used=181596160L, free=4113367040L, percent=4.2, sin=8687616, sout=184086528)

#####(3)磁盘信息

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
获取磁盘完整信息
>>> psutil.disk_partitions()
[sdiskpart(device='/dev/xvda1', mountpoint='/', fstype='ext4', opts='rw,relatime,nobarrier,data=ordered'),
sdiskpart(device='/dev/xvdb1', mountpoint='/Backup', fstype='ext4', opts='rw,relatime,data=ordered'),
sdiskpart(device='/dev/xvdc1', mountpoint='/Volume', fstype='ext4', opts='rw,relatime,data=ordered')]
分区使用情况
>>> psutil.disk_usage('/')
sdiskusage(total=21002579968, used=17712050176, free=2200064000, percent=84.3)
>>> psutil.disk_usage('/dev/xvda1')
sdiskusage(total=4092887040, used=0, free=4092887040, percent=0.0)
系统IO信息, 这个有用,比如mongo server读写情况可以从这里看出硬盘的总的IO个数
>>> psutil.disk_io_counters()
sdiskio(read_count=66199319, write_count=26786580, read_bytes=2762422266880, write_bytes=829198069760, read_time=432435439, write_time=1245140957)
系统某一分区的读写情况
>>> psutil.disk_io_counters(perdisk=True)
{'xvdb1': sdiskio(read_count=8541383, write_count=4234422, read_bytes=369368880128, write_bytes=188905959424, read_time=61153482, write_time=712140206),
'xvda1': sdiskio(read_count=377984, write_count=2748355, read_bytes=4093592576, write_bytes=117483319296, read_time=4094666, write_time=458812306),
'xvdc1': sdiskio(read_count=57336794, write_count=19803933, read_bytes=2389246342144, write_bytes=522809917440, read_time=367414619, write_time=74215286)}

#####(4)网络信息

监控网络流量

1
2
3
4
5
6
7
8
>>> psutil.net_io_counters()
snetio(bytes_sent=210131746810, bytes_recv=505363579247, packets_sent=98638807, packets_recv=224213459, errin=0, errout=0, dropin=0, dropout=0)
精确到监控每个网卡的流量
>>> psutil.net_io_counters(pernic=True)
{'lo': snetio(bytes_sent=196418048009, bytes_recv=196418048009, packets_sent=6117091, packets_recv=6117091, errin=0, errout=0, dropin=0, dropout=0), 'tun0': snetio(bytes_sent=0, bytes_recv=0, packets_sent=0, packets_recv=0, errin=0, errout=0, dropin=0, dropout=0),
'eth1': snetio(bytes_sent=7725158588, bytes_recv=525035597, packets_sent=2945363, packets_recv=10702078, errin=0, errout=0, dropin=0, dropout=0),
'eth0': snetio(bytes_sent=5988555657, bytes_recv=308420566916, packets_sent=89576476, packets_recv=207395744, errin=0, errout=0, dropin=0, dropout=0)}

#####(5)其他系统信息

1
2
3
4
5
6
7
8
9
10
11
12
13
查看当前系统登录用户的信息
>>> psutil.users()
[suser(name='master', terminal='pts/0', host='220.1X.77.XX', started=1447292416.0),
suser(name='master', terminal='pts/1', host='220.1X.77.XX', started=1447292928.0),
suser(name='master', terminal='pts/3', host='220.1X.77.XX', started=1447300480.0)]
获取开机时间,当然这样看不懂,需要格式化。
>>> psutil.boot_time()
1447092458.0
格式化方法:
>>> import datetime
>>>datetime.datetime.fromtimestamp(psutil.boot_time()).strftime("%Y-%m-%d %H-%M-%S")
'2015-11-10 02-07-38'

综合以上的监控脚本(gist)