Linux system monitoring and diagnostic tools: top explained in detail


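　　Anyone who has worked with Linux is probably familiar with the top command. The name may differ on other systems; on IBM AIX, for example, it is called topas. Its main job is to monitor, in real time, the system load, the resource usage of each process, and whether the system's other status attributes look normal.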


　　(1) System and task statistics:

　　The first 8 lines show overall system statistics. Line 1 holds the task queue information and matches the output of the uptime command. Its fields are:

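　　01:06:48 - current time
　　up 1:22 - how long the system has been running, in hours:minutes
　　1 user - number of users currently logged in
　　load average: 0.06, 0.60, 0.48 - system load, i.e. the average length of the task (run) queue. On Linux this counts tasks that are runnable or in uninterruptible sleep (typically waiting on disk I/O). The three values are the averages over the last 1, 5 and 15 minutes respectively.

　　Note: these three values can be used to judge whether the system is overloaded. If they stay above the number of CPUs in the system for a sustained period, it is time to optimize your programs or your architecture.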
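　　To make the first line and the rule of thumb above concrete, here is a minimal Python sketch. The helper names `format_uptime` and `is_overloaded` are hypothetical, and reading "persistently above the CPU count" as "both the 5- and 15-minute averages exceed it" is our own assumption, not a fixed rule.

```python
def format_uptime(seconds: float) -> str:
    """Format an uptime in seconds as top's "up H:MM" field.

    Simplified: real top also prints forms such as "up 3 days, 4:05" or
    "up 12 min"; this sketch only produces the hours:minutes form.
    """
    total_minutes = int(seconds) // 60
    hours, minutes = divmod(total_minutes, 60)
    return f"up {hours}:{minutes:02d}"


def is_overloaded(load_averages: tuple, cpu_count: int) -> bool:
    """Apply the rule of thumb: load persistently above the CPU count.

    Assumption: "persistently" is taken to mean that both the 5-minute and
    the 15-minute averages exceed the number of CPUs; a short spike that
    only shows up in the 1-minute value is ignored.
    """
    _one_min, five_min, fifteen_min = load_averages
    return five_min > cpu_count and fifteen_min > cpu_count
```

　　On Linux, `is_overloaded(os.getloadavg(), os.cpu_count())` checks the current machine, and the first number in /proc/uptime can be passed to `format_uptime`.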

　　(2) Process and CPU statistics:

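　　Lines 2 to 6 show process and CPU information. On a machine with several CPUs this part can take more than two lines (in top, pressing 1 toggles between a single combined Cpu(s) line and one line per CPU). The fields are:

　　Tasks: 29 total - total number of processes
　　1 running - number of running processes
　　28 sleeping - number of sleeping processes
　　0 stopped - number of stopped processes
　　0 zombie - number of zombie processes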
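　　The Tasks line is a tally of the state letter of every process under /proc. The Python sketch below is a simplified version of that tally, assuming Linux's `/proc/<pid>/stat` format. The state mapping (R running; S, D and I sleeping; T and t stopped; Z zombie) is our approximation, real top versions may differ in details, and the function names are hypothetical.

```python
import os


def task_state(stat_line: str) -> str:
    """Return the one-letter state field from a /proc/<pid>/stat line.

    The command name (field 2) is wrapped in parentheses and may itself
    contain spaces or ')', so we split after the *last* ')'.
    """
    after_comm = stat_line[stat_line.rindex(")") + 2:]
    return after_comm.split()[0]


def summarize(states) -> dict:
    """Tally state letters into the categories of top's Tasks line (simplified)."""
    summary = {"total": 0, "running": 0, "sleeping": 0, "stopped": 0, "zombie": 0}
    for s in states:
        summary["total"] += 1
        if s == "R":
            summary["running"] += 1
        elif s in ("S", "D", "I"):  # assumption: idle kernel threads count as sleeping
            summary["sleeping"] += 1
        elif s in ("T", "t"):       # stopped by a signal or by a tracer
            summary["stopped"] += 1
        elif s == "Z":
            summary["zombie"] += 1
    return summary


def read_states(proc: str = "/proc") -> list:
    """Collect the state letter of every process currently listed in /proc."""
    states = []
    for entry in os.listdir(proc):
        if not entry.isdigit():
            continue  # skip non-process entries such as /proc/meminfo
        try:
            with open(os.path.join(proc, entry, "stat")) as f:
                states.append(task_state(f.read()))
        except (FileNotFoundError, ProcessLookupError):
            continue  # the process exited while we were scanning
    return states
```

　　On Linux, `summarize(read_states())` yields counts comparable to the Tasks line.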

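　　Cpu(s): 0.3% us - percentage of CPU time spent in user space
　　1.0% sy - percentage of CPU time spent in kernel space
　　0.0% ni - percentage of CPU time spent running user processes whose priority (nice value) has been changed
　　98.7% id - percentage of idle CPU time
　　0.0% wa - percentage of CPU time spent idle while waiting for I/O to complete
　　0.0% hi - percentage of CPU time spent servicing hardware interrupts (Hardware IRQ)
　　0.0% si - percentage of CPU time spent servicing software interrupts (Software IRQ)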

　　Notes:

　　(1) IRQ stands for Interrupt Request.

　　(2)st(Steal Time)：Steal time is the percentage of time a virtual CPU waits for a real CPU while the hypervisor is servicing another virtual processor. It’s only relevant in virtualized environments. It represents time when the real CPU was not available to the current virtual machine – it was “stolen” from that VM by the hypervisor (either to run another VM, or for its own needs).

　　So, relatively speaking, what does this mean? A high steal percentage may mean that you are outgrowing your virtual machine at your hosting company. Other virtual machines may have a larger slice of the CPU’s time, and you may need to ask for an upgrade in order to compete. A high steal percentage may also mean that your hosting company is overselling virtual machines on your particular server. If you upgrade your virtual machine and your steal percentage doesn’t drop, you may want to seek another provider. A low steal percentage can mean that your applications are working well with your current virtual machine. Since your VM is not constantly wrestling with other VMs for CPU time, it will be more responsive. This may also suggest that your hosting provider is underselling their servers, which is definitely a good thing.
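　　The percentages on the Cpu(s) line, steal included, come from the differences between two readings of cumulative tick counters in /proc/stat. A minimal Python sketch, assuming a Linux kernel whose aggregate `cpu` line has at least the eight columns user, nice, system, idle, iowait, irq, softirq and steal; the function names are hypothetical.

```python
# Field names as top labels them, in /proc/stat column order:
# user nice system idle iowait irq softirq steal
FIELDS = ("us", "ni", "sy", "id", "wa", "hi", "si", "st")


def parse_cpu_line(line: str) -> dict:
    """Parse the aggregate 'cpu' line of /proc/stat into top's field names.

    Only the first eight counters are kept; the guest columns that follow
    are already included in user and nice by the kernel.
    """
    values = [int(v) for v in line.split()[1:9]]
    return dict(zip(FIELDS, values))


def cpu_percentages(before: dict, after: dict) -> dict:
    """Convert two counter snapshots into percentages of the elapsed ticks."""
    deltas = {k: after[k] - before[k] for k in FIELDS}
    total = sum(deltas.values()) or 1  # avoid dividing by zero
    return {k: round(100.0 * v / total, 1) for k, v in deltas.items()}
```

　　Reading the `cpu` line of /proc/stat twice, about a second apart, and passing both parsed results to `cpu_percentages` gives figures comparable to top's line.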

　　(3) The last two lines show memory information:

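　　Mem: 191272k total - total physical memory
　　173656k used - physical memory in use
　　17616k free - free physical memory
　　22052k buffers - memory used for kernel buffers
　　Swap: 192772k total - total swap space
　　0k used - swap space in use
　　192772k free - free swap space
　　123988k cached - memory used by the page cache (file contents cached in RAM). Although top prints it on the Swap line, this figure refers to physical memory, not swap: in this sample no swap is in use at all, yet 123988k is cached. Like buffers, this memory can be reclaimed when programs need it.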
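　　These figures come from /proc/meminfo. The Python sketch below maps meminfo keys to the fields above the way older procps top versions compute them (used = total - free). That mapping is our assumption about those versions; newer top releases use a different layout, such as a combined buff/cache column. The function names are hypothetical.

```python
def parse_meminfo(text: str) -> dict:
    """Parse /proc/meminfo lines such as 'MemTotal:  191272 kB' into ints (KiB)."""
    info = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        parts = rest.split()
        if parts:
            info[key.strip()] = int(parts[0])  # the unit suffix, if any, is dropped
    return info


def top_memory_fields(info: dict) -> dict:
    """Derive the Mem:/Swap: fields as older top versions display them."""
    return {
        "mem_total": info["MemTotal"],
        "mem_used": info["MemTotal"] - info["MemFree"],
        "mem_free": info["MemFree"],
        "buffers": info["Buffers"],
        "swap_total": info["SwapTotal"],
        "swap_used": info["SwapTotal"] - info["SwapFree"],
        "swap_free": info["SwapFree"],
        "cached": info["Cached"],  # page cache, even though top shows it on the Swap line
    }
```

　　On Linux, `top_memory_fields(parse_meminfo(open("/proc/meminfo").read()))` reproduces the two lines.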

　　PS: How do you calculate available memory and used memory?

　　Besides free -m, you can also read the numbers from top:

　　Mem: 255592k total, 167568k used, 88024k free, 25068k buffers

　　Swap: 524280k total, 0k used, 524280k free, 85724k cached

　　3.1 How much memory is actually available to programs?

　　The answer is: free + (buffers + cached)

　　88024k + (25068k + 85724k) = 198816k

　　3.2 And how much memory are programs actually using?

　　The answer is: used - (buffers + cached)

　　167568k - (25068k + 85724k) = 56776k
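　　The two formulas can be checked with a few lines of Python using the sample figures. This only restates the article's arithmetic (the function names are hypothetical); on kernels 3.14 and later, /proc/meminfo also provides a MemAvailable estimate, which is generally more accurate than free + buffers + cached because not all cache can be reclaimed.

```python
def available_kb(free: int, buffers: int, cached: int) -> int:
    """Memory programs can still use: free + (buffers + cached)."""
    return free + (buffers + cached)


def used_by_programs_kb(used: int, buffers: int, cached: int) -> int:
    """Memory actually held by programs: used - (buffers + cached)."""
    return used - (buffers + cached)


# Sample values from the top output above (in KiB)
print(available_kb(88024, 25068, 85724))           # 198816
print(used_by_programs_kb(167568, 25068, 85724))   # 56776
```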

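　　3.3 How can you tell whether the system is short of memory?

　　If swap used is above 0, the system has most likely run into a memory bottleneck at some point; if the value keeps growing, either optimize your code or add more memory.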

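　　(4) Process information area:

　　Below the statistics area, top shows details for each process. Let's first go over what each column means.

　　PID - process ID
　　PPID - parent process ID
　　RUSER - real user name of the process owner
　　UID - user ID of the process owner
　　USER - user name of the process owner
　　GROUP - group name of the process owner
　　TTY - name of the terminal the process was started from; processes not started from a terminal show ?
　　PR - priority
　　NI - nice value; a negative value means higher priority, a positive value means lower priority
　　P - the CPU the process last ran on; only meaningful on multi-CPU systems
　　%CPU - percentage of CPU time used since the last screen update
　　TIME - total CPU time used by the process, in seconds
　　TIME+ - total CPU time used by the process, shown with a precision of 1/100 second
　　%MEM - percentage of physical memory used by the process
　　VIRT - total virtual memory used by the process, in KB. VIRT=SWAP+RES
　　SWAP - the swapped-out portion of the process's virtual memory, in KB
　　RES - physical memory used by the process that has not been swapped out, in KB. RES=CODE+DATA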

　　CODE
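　　Two of these columns are easy to reproduce from /proc/<pid>/stat. The Python sketch below assumes Linux and, by default, a clock-tick rate of 100 per second (check `os.sysconf("SC_CLK_TCK")` on your system). It shows how %CPU can be derived from the growth of utime + stime over an interval, and how a tick count can be formatted in the minutes:seconds.hundredths style of TIME+. The function names are hypothetical, and top's own implementation differs in details.

```python
def utime_stime(stat_line: str) -> int:
    """Return utime + stime (fields 14 and 15 of /proc/<pid>/stat) in clock ticks."""
    # Skip past the command name, which is in parentheses and may contain spaces.
    fields = stat_line[stat_line.rindex(")") + 2:].split()
    # fields[0] is field 3 (state), so fields 14 and 15 sit at indices 11 and 12.
    return int(fields[11]) + int(fields[12])


def cpu_percent(ticks_before: int, ticks_after: int, elapsed_s: float, hz: int = 100) -> float:
    """%CPU over an interval: ticks consumed divided by ticks elapsed, times 100.

    A multithreaded process on a multi-CPU machine can exceed 100%,
    as it can in top's default display mode.
    """
    return 100.0 * (ticks_after - ticks_before) / (elapsed_s * hz)


def format_time_plus(ticks: int, hz: int = 100) -> str:
    """Format cumulative CPU time like the TIME+ column: minutes:seconds.hundredths."""
    hundredths = ticks * 100 // hz
    minutes, rest = divmod(hundredths, 6000)
    seconds, cents = divmod(rest, 100)
    return f"{minutes}:{seconds:02d}.{cents:02d}"
```

　　For example, a process whose utime + stime grew by 50 ticks during a one-second interval at 100 ticks per second used about 50% of one CPU.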
