High System Load

General Information

Troubleshooting high system load.

Checklist


Understanding System Load

The load average appears in the output of both “uptime” and “top”. It is reported as three numbers: the average load over the last 1, 5, and 15 minutes.

Traffic/Bridge analogy

The takeaway is reposted here in case the original goes away. Source: ScoutBlog Load Average

On a Single Core CPU System

Multi-CPU/Core Systems

Calculate Overall CPU Load
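
A minimal sketch of this calculation, assuming the formula used in the troubleshooting steps (load average divided by the number of processors). The function name load_pct and the variable names are made up for illustration; the sketch also assumes a Linux system where /proc/loadavg and /proc/cpuinfo are readable.

```shell
#!/bin/sh
# load_pct LOAD NPROC
# Print LOAD / NPROC as a whole-number percentage.
# The name load_pct is hypothetical, chosen for this sketch.
load_pct() {
    awk -v load="$1" -v n="$2" 'BEGIN { printf "%.0f\n", (load / n) * 100 }'
}

# Only touch /proc when it is available (Linux).
if [ -r /proc/loadavg ] && [ -r /proc/cpuinfo ]; then
    # Count the "processor" entries in /proc/cpuinfo.
    cpus=$(grep -c '^processor' /proc/cpuinfo || true)
    # The first field of /proc/loadavg is the 1-minute load average.
    read -r load1 _rest < /proc/loadavg
    if [ "$cpus" -gt 0 ]; then
        echo "1-minute load ${load1} on ${cpus} processors = $(load_pct "$load1" "$cpus")%"
    fi
fi
```

With a load average of 1.50 on 2 processors, load_pct 1.50 2 prints 75. A result above 100 means more work is queued than the processors can run at once.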


Troubleshooting Tools

The following tools are useful when troubleshooting system load.

Typically built in

Need to install (if using a minimal install base)

Base Repo

yum -y install iotop lsof sysstat

Troubleshooting Steps

  1. Know how many processors you have. This is essential to determine if load is high. See “Understanding System Load” above for more details.
    1. grep -c ^processor /proc/cpuinfo
    2. %Load (decimal) = (Load Average / Number of Processors)
    3. Example: Number of processors = 2, load average seen = 1.50
    4. 1.50 / 2 = 0.75 or 75% load on the processors
  2. Check load averages
    1. uptime shows the load average for the last 1, 5, and 15 minutes. If it is too high or trending up, it is time to investigate further.
    2. uptime
  3. What kind of load
    1. Use vmstat to determine what kind of load the system is under. “vmstat 1” prints stats every second.
      1. vmstat 1
    2. Important columns to take note of:
      1. CPU: “wa” ⇒ Time spent waiting for I/O. If high, something is probably heavily utilizing disk.
      2. CPU: “id” ⇒ CPU time spent idle. If close to 0, CPU is used heavily.
      3. CPU: “sy” ⇒ CPU time spent running kernel code (system time). Mail and firewalls are common causes of high system use.
      4. CPU: “us” ⇒ CPU time spent running user processes. If high, investigate with top.
      5. IO: “bi” ⇒ Blocks received from a block device each second. If high, something is heavily reading from disk.
      6. IO: “bo” ⇒ Blocks sent to a block device each second. If high, something is heavily writing to disk.
      7. SWAP: “si” ⇒ Memory swapped in from disk each second.
      8. SWAP: “so” ⇒ Memory swapped out to disk each second.
        1. If either is high, the system is most likely low on memory.
      9. MEMORY: “free” ⇒ memory free. If this is low, there is probably swapping going on as well.
    3. Investigate further under High CPU, High Memory Use, or Disk I/O, depending on which columns stand out.
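
The column checks above can be sketched as a small filter over vmstat output. The function name classify_vmstat and the thresholds (wa above 20, id below 10, any swap activity) are assumptions for illustration, not recommended values. Columns are looked up by header name rather than by position, since the number of columns differs between vmstat versions.

```shell
#!/bin/sh
# classify_vmstat: read `vmstat N` output on stdin and flag each sample.
# Name and thresholds are hypothetical; tune them for your system.
classify_vmstat() {
    awk '
    # The column-name row starts with "r" and "b"; remember each position.
    $1 == "r" && $2 == "b" {
        for (i = 1; i <= NF; i++) col[$i] = i
        next
    }
    # Skip the group header row and anything seen before the column
    # names are known.
    !("wa" in col) || $1 !~ /^[0-9]+$/ { next }
    {
        msg = ""
        if ($col["wa"] > 20)                  msg = msg " io-wait"
        if ($col["id"] < 10)                  msg = msg " cpu-busy"
        if ($col["si"] > 0 || $col["so"] > 0) msg = msg " swapping"
        if (msg == "") msg = " ok"
        print "us=" $col["us"], "sy=" $col["sy"], "id=" $col["id"], "wa=" $col["wa"] ":" msg
    }'
}

# Usage on a live system: five samples, one second apart.
# vmstat 1 5 | classify_vmstat
```

Note that the first sample vmstat prints reports averages since boot, so it is usually worth ignoring when judging current load.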

High CPU

Clues that you should investigate high CPU usage:

top

While in top:
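
Outside of the interactive top session, a one-shot snapshot can be easier to paste into a ticket or log. The sketch below assumes the procps version of ps (for the --sort option); the function name top_cpu_procs is made up for illustration.

```shell
#!/bin/sh
# top_cpu_procs [N]
# Print a header plus the N processes with the highest %CPU (default 10).
# Assumes procps ps, which supports --sort. The name is hypothetical.
# Note: ps reports CPU time divided by how long the process has been
# running, so it is a lifetime average, not the per-interval figure that
# top displays.
top_cpu_procs() {
    ps -eo pid,user,pcpu,pmem,comm --sort=-pcpu | head -n "$(( ${1:-10} + 1 ))"
}

# Usage: show the five busiest processes.
# top_cpu_procs 5
```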


High Memory Use

Notes on Linux memory management
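
Linux uses otherwise idle memory for buffers and page cache, so a low “free” value on its own does not prove memory pressure. A rough sketch for putting the numbers side by side, reading /proc/meminfo; the function name mem_summary and the output field names are assumptions made for this example, and treating Buffers + Cached as reclaimable is only an approximation.

```shell
#!/bin/sh
# mem_summary [FILE]
# Summarise a /proc/meminfo-style file (default: /proc/meminfo) in MiB.
# "reclaimable" here is Buffers + Cached, a rough approximation: the
# kernel can usually drop most of it under pressure, but not always all.
# The function name and field names are hypothetical.
mem_summary() {
    awk '
    { v[$1] = $2 }    # values are in kB; keys keep the trailing colon
    END {
        printf "total_mib=%d\n",       v["MemTotal:"] / 1024
        printf "free_mib=%d\n",        v["MemFree:"] / 1024
        printf "reclaimable_mib=%d\n", (v["Buffers:"] + v["Cached:"]) / 1024
        printf "swap_used_mib=%d\n",   (v["SwapTotal:"] - v["SwapFree:"]) / 1024
    }' "${1:-/proc/meminfo}"
}

# Usage on a live system:
# mem_summary
```

A small free_mib with a large reclaimable_mib and no swap use is normal; a small free_mib together with growing swap_used_mib is the pattern worth investigating.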

Clues that you should investigate high memory usage:

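Start top and sort by %MEM usage. The -a switch comes from older procps releases and may not be accepted by newer procps-ng versions; there, “top -o %MEM” or pressing shift+M inside top gives the same ordering.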

top -a

While in top:

Other memory columns (press '<' or '>' to move the sort column left or right)


Disk I/O

Clues that you should investigate high Disk I/O:


iostat - View I/O stats with extended statistics, every 3 seconds

iostat -x 3
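
To pick the busy devices out of a long iostat report, the output can be filtered on the %util column. This is a sketch only: the function name busy_devices and the default threshold of 80 are assumptions, and the %util column is located by name because the column layout differs between sysstat versions.

```shell
#!/bin/sh
# busy_devices [THRESHOLD]
# Read `iostat -x` output on stdin and print devices whose %util is at
# or above THRESHOLD (default 80, an arbitrary value for this sketch).
# The function name is hypothetical.
busy_devices() {
    awk -v limit="${1:-80}" '
    # Each device table has a header row starting with "Device" (some
    # sysstat versions print "Device:"); find the %util column by name.
    $1 ~ /^Device/ {
        util = 0
        for (i = 1; i <= NF; i++) if ($i == "%util") util = i
        next
    }
    # A blank line or the avg-cpu block ends the device table.
    NF == 0 || $1 ~ /^avg-cpu/ { util = 0; next }
    util && NF >= util && $util + 0 >= limit + 0 { print $1, $util }
    '
}

# Usage on a live system:
# iostat -x 3 | busy_devices 80
```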


iotop - Live disk I/O similar to top

iotop


lsof - Once a particular device has been identified as busy, another option for further details is to list the open files on its mount point.
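
A sketch of how this might look, relying on lsof listing every open file on a file system when it is given that file system's mount point. The function name open_files_by_command is made up for illustration, and /var in the usage line is only an example mount point.

```shell
#!/bin/sh
# open_files_by_command MOUNTPOINT
# Count open files per command name on the file system mounted at
# MOUNTPOINT, busiest first. Usually needs root to see every process.
# The function name is hypothetical.
open_files_by_command() {
    lsof "$1" 2>/dev/null |
        awk 'NR > 1 { count[$1]++ } END { for (c in count) print count[c], c }' |
        sort -rn
}

# Usage (replace /var with the mount point of the busy device):
# open_files_by_command /var
```

The command with the most open files is not necessarily the one generating the I/O, so treat the result as a list of candidates to cross-check against iotop.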