Monitoring HP Server Hardware with Zabbix

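As a Linux systems engineer, maintaining servers means more than looking after the operating system; above all, you also need to monitor the server hardware. For example: is the RAID status normal (a faulty RAID controller degrades read and write speed), are the disks healthy (a failed disk can, in serious cases, lose data), and has a power supply failed? Beyond that, you should monitor the temperature of key components such as the CPU and memory, and raise an alert when a reading exceeds the server's critical temperature.

HP provides its own hardware management tool for its servers, hpacucli, which lets you view an HP server's RAID information, disk information, and so on.

1) Install the hpacucli tool (download: HP hpacucli management tool)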

[root@monitor ~]# rpm -ivh hpacucli-9.40-12.0.x86_64.rpm

2) Check the server's RAID information and whether the disks are healthy.

[root@monitor ~]# hpacucli ctrl all show config

Smart Array P410i in Slot 0 (Embedded)    (sn: 5001438018042FF0)

   array A (SAS, Unused Space: 0 MB)

      logicaldrive 1 (279.4 GB, RAID 1, OK)

      physicaldrive 1I:1:1 (port 1I:box 1:bay 1, SAS, 300 GB, OK)
      physicaldrive 1I:1:2 (port 1I:box 1:bay 2, SAS, 300 GB, OK)
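To turn this output into a number that a monitoring item can carry, one option is to count the drives whose trailing status is not OK. The following is only a sketch: the function name count_bad_drives is my own, and it assumes the status is always the last comma-separated field before the closing parenthesis, as in the output above. It runs on an embedded sample, so no HP hardware is needed to try it.

```shell
#!/bin/sh
# Count logical/physical drives whose trailing status is not "OK".
# Reads `hpacucli ctrl all show config` output on stdin.
# count_bad_drives is a hypothetical helper name, not part of hpacucli.
count_bad_drives() {
    awk '
        /^[ \t]*(logicaldrive|physicaldrive) .*\)[ \t]*$/ {
            status = $NF              # last field, e.g. "OK)"
            sub(/\)$/, "", status)    # drop the closing parenthesis
            if (status != "OK") bad++
        }
        END { print bad + 0 }         # print 0 when nothing was flagged
    '
}

# Sample copied from the output above.
sample='Smart Array P410i in Slot 0 (Embedded)    (sn: 5001438018042FF0)
   array A (SAS, Unused Space: 0 MB)
      logicaldrive 1 (279.4 GB, RAID 1, OK)
      physicaldrive 1I:1:1 (port 1I:box 1:bay 1, SAS, 300 GB, OK)
      physicaldrive 1I:1:2 (port 1I:box 1:bay 2, SAS, 300 GB, OK)'

printf '%s\n' "$sample" | count_bad_drives
```

On a real server you would pipe the live command into the function instead of the sample, for example: hpacucli ctrl all show config | count_bad_drives.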

3) The hpacucli ctrl all show config detail command shows the RAID and disk information in detail.

[root@monitor ~]# hpacucli ctrl all show config detail

Smart Array P410i in Slot 0 (Embedded)
   Bus Interface: PCI
   Slot: 0
   Serial Number: 5001438018042FF0
   Cache Serial Number: PBCDH0CRH1FH62
   RAID 6 (ADG) Status: Disabled
   Controller Status: OK
   Chassis Slot:
   Hardware Revision: Rev C
   Firmware Version: 5.14
   Rebuild Priority: Medium
   Expand Priority: Medium
   Surface Scan Delay: 15 secs
   Monitor and Performance Delay: 60 min
   Elevator Sort: Enabled
   Degraded Performance Optimization: Disabled
   Inconsistency Repair Policy: Disabled
   Post Prompt Timeout: 0 secs
   Cache Board Present: True
   Cache Status: OK
   Accelerator Ratio: 25% Read / 75% Write
   Drive Write Cache: Disabled
   Total Cache Size: 512 MB
   No-Battery Write Cache: Disabled
   Cache Backup Power Source: Capacitors
   Battery/Capacitor Count: 1
   Battery/Capacitor Status: OK
   SATA NCQ Supported: True

   Array: A
      Interface Type: SAS
      Unused Space: 0 MB
      Status: OK

      Logical Drive: 1
         Size: 279.4 GB
         Fault Tolerance: RAID 1
         Heads: 255
         Sectors Per Track: 32
         Cylinders: 65535
         Stripe Size: 128 KB
         Status: OK
         Array Accelerator: Enabled
         Unique Identifier: 600508B1001034373220202020200002
         Disk Name: /dev/cciss/c0d0
         Mount Points: /boot 99 MB
         Logical Drive Label: A00ADBD9PR7AMU1472898D
         Mirror Group 0:
            physicaldrive 1I:1:1 (port 1I:box 1:bay 1, SAS, 300 GB, OK)
         Mirror Group 1:
            physicaldrive 1I:1:2 (port 1I:box 1:bay 2, SAS, 300 GB, OK)

      physicaldrive 1I:1:1
         Port: 1I
         Box: 1
         Bay: 1
         Status: OK
         Drive Type: Data Drive
         Interface Type: SAS
         Size: 300 GB
         Rotational Speed: 10000
         Firmware Revision: HPD4
         Serial Number: ECA1PC80GTS31234
         Model: HPEG0300FBDSP
         PHY Count: 2
         PHY Transfer Rate: 6.0GBPS, Unknown

      physicaldrive 1I:1:2
         Port: 1I
         Box: 1
         Bay: 2
         Status: OK
         Drive Type: Data Drive
         Interface Type: SAS
         Size: 300 GB
         Rotational Speed: 10000
         Firmware Revision: HPD7
         Serial Number: PMX6902D
         Model: HPEG0300FBDBR
         PHY Count: 2
         PHY Transfer Rate: 6.0GBPS, Unknown
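The detail output is a list of "Name: value" lines, so the controller-level fields worth watching (Controller Status, Cache Status, Battery/Capacitor Status) can be pulled out by name. Below is a rough sketch under that assumption; get_field is a name I made up, and it returns only the first line whose label matches exactly, which is enough for the controller block at the top of the output.

```shell
#!/bin/sh
# Print the value of the first "Name: value" line whose label equals $1.
# Reads `hpacucli ctrl all show config detail` output on stdin.
# get_field is a hypothetical helper name, not part of hpacucli.
get_field() {
    awk -v name="$1" '
        {
            line = $0
            sub(/^[ \t]+/, "", line)          # strip leading indentation
            prefix = name ": "
            if (index(line, prefix) == 1) {   # label must start the line
                print substr(line, length(prefix) + 1)
                exit
            }
        }
    '
}

# A few lines copied from the controller block of the output above.
sample_detail='Smart Array P410i in Slot 0 (Embedded)
   Controller Status: OK
   Cache Status: OK
   Battery/Capacitor Status: OK'

for name in "Controller Status" "Cache Status" "Battery/Capacitor Status"; do
    value=$(printf '%s\n' "$sample_detail" | get_field "$name")
    printf '%s: %s\n' "$name" "$value"
done
```

Because the label has to start the line, asking for "Status" does not accidentally pick up "Controller Status".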

1) Install the hp-health package, which provides the hpasmcli tool.

[root@monitor ~]# rpm -ivh hp-health-9.40-1602.44.rhel6.x86_64.rpm

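2) The hpasmcli tool shows the temperature of each server component. Temp is the component's current temperature and Threshold is its critical temperature; when the current temperature exceeds the threshold, it needs attention.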

[root@monitor ~]# hpasmcli -s 'show temp'

Sensor   Location              Temp       Threshold
------   --------              ----       ---------
#1        AMBIENT              23C/73F    42C/107F
#2        CPU#1                40C/104F   82C/179F
#3        CPU#2                40C/104F   82C/179F
#4        MEMORY_BD            33C/91F    87C/188F
#5        MEMORY_BD            33C/91F    78C/172F
#6        MEMORY_BD             -         87C/188F
#7        MEMORY_BD            32C/89F    78C/172F
#8        MEMORY_BD            32C/89F    87C/188F
#9        MEMORY_BD            32C/89F    78C/172F
#10       MEMORY_BD             -         87C/188F
#11       MEMORY_BD            32C/89F    78C/172F
#12       POWER_SUPPLY_BAY     33C/91F    59C/138F
#13       POWER_SUPPLY_BAY     47C/116F   73C/163F
#14       MEMORY_BD            29C/84F    72C/161F
#15       PROCESSOR_ZONE       32C/89F    73C/163F
#16       PROCESSOR_ZONE       30C/86F    64C/147F
#17       MEMORY_BD            28C/82F    63C/145F
#18       PROCESSOR_ZONE       39C/102F   69C/156F
#19       SYSTEM_BD            35C/95F    69C/156F
#20       SYSTEM_BD            38C/100F   71C/159F
#21       SYSTEM_BD            44C/111F   65C/149F
#22       SYSTEM_BD            45C/113F   71C/159F
#23       SYSTEM_BD            39C/102F   69C/156F
#24       SYSTEM_BD            47C/116F   69C/156F
#25       SYSTEM_BD            35C/95F    63C/145F
#26       SYSTEM_BD            45C/113F   66C/150F
#27       SCSI_BACKPLANE_ZONE  35C/95F    60C/140F
#28       SYSTEM_BD            73C/163F   110C/230F
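The comparison between Temp and Threshold can be scripted. Here is a minimal sketch, assuming hpasmcli prints four whitespace-separated columns (sensor, location, temperature, threshold) with Celsius first, and a lone "-" for sensors with no reading. The function name near_threshold and its margin argument are my own invention: it lists sensors whose Celsius reading is within the given number of degrees of the threshold.

```shell
#!/bin/sh
# List sensors whose Celsius reading is within $1 degrees of the threshold
# (default margin 0, i.e. at or above the threshold).
# Reads `hpasmcli -s 'show temp'` output on stdin.
# near_threshold is a hypothetical helper name, not part of hpasmcli.
near_threshold() {
    awk -v margin="${1:-0}" '
        $1 ~ /^#[0-9]+$/ && $3 != "-" {
            temp  = $3 + 0    # "23C/73F" converts to 23 (leading number)
            limit = $4 + 0    # "42C/107F" converts to 42
            if (temp + margin >= limit) print $1, $2, temp, limit
        }
    '
}

# A few rows copied from the table above.
sample_temps='Sensor   Location              Temp       Threshold
------   --------              ----       ---------
#1        AMBIENT              23C/73F    42C/107F
#2        CPU#1                40C/104F   82C/179F
#6        MEMORY_BD             -         87C/188F
#13       POWER_SUPPLY_BAY     47C/116F   73C/163F
#28       SYSTEM_BD            73C/163F   110C/230F'

# Report anything within 20 degrees C of its threshold.
printf '%s\n' "$sample_temps" | near_threshold 20
```

With a margin of 20, only the AMBIENT sensor from the sample rows is reported (23 against a threshold of 42); with the default margin of 0, nothing is printed because no sensor has reached its threshold.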

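3) Running hpasmcli -s 'show' prints a help-like list of subcommands. When monitoring, pay particular attention to DIMM (memory), FANS (fans), POWERSUPPLY (power supply modules), SERVER (system), CPU, and TEMP (temperature).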

[root@monitor ~]# hpasmcli -s 'show'

Invalid Arguments
        SHOW ASR
        SHOW BOOT
        SHOW DIMM [ SPD ]
        SHOW F1
        SHOW FANS
        SHOW HT
        SHOW IML
        SHOW IPL
        SHOW NAME
        SHOW PORTMAP
        SHOW POWERMETER
        SHOW POWERSUPPLY
        SHOW PXE
        SHOW SERIAL [ BIOS | EMBEDDED | VIRTUAL ]
        SHOW SERVER
        SHOW TEMP
        SHOW TPM
        SHOW UID
        SHOW WOL

This produces the error shown above.

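First, look at my monitoring script. Because the monitoring follows the trapper approach, each line of the log_file file defines, in order, the hostname of the monitored server, the item key, and the item value.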
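The file layout described here, one "hostname key value" triple per line, is the input format that zabbix_sender accepts through its -i option. The sketch below only illustrates that layout and is not the original script: the helper name trapper_line, the item keys hp.raid.bad_drives and hp.temp.ambient, and the server name zabbix.example.com are all placeholders I made up. The host name monitor is taken from the shell prompt in the examples.

```shell
#!/bin/sh
# Emit one line in the "<hostname> <key> <value>" layout described above.
# trapper_line and the item keys below are hypothetical placeholders.
trapper_line() {
    printf '%s %s %s\n' "$1" "$2" "$3"
}

# Build a small log_file with two example items.
log_file=$(mktemp)
trapper_line monitor hp.raid.bad_drives 0 >  "$log_file"
trapper_line monitor hp.temp.ambient 23   >> "$log_file"
cat "$log_file"

# Sending step, left commented out because it needs a reachable Zabbix
# server and matching trapper items (server name is a placeholder):
# zabbix_sender -z zabbix.example.com -i "$log_file"

rm -f "$log_file"
```

For the values to be accepted, the Zabbix server needs items of type "Zabbix trapper" on the host with the same keys as those written to the file.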
