linux_wiki:high_system_load

linux_wiki:high_system_load [2016/03/14 11:28]
billdozor [Traffic/Bridge analogy]
linux_wiki:high_system_load [2019/05/25 23:50]
Line 1: Line 1:
-====== High System Load ====== 
- 
-**General Information** 
- 
-Troubleshooting high system load. 
- 
-**Checklist** 
-  * Distro: Enterprise Linux 6.x 
- 
----- 
-====== Understanding System Load ====== 
- 
-Load average is reported by both "uptime" and "top". Each shows three values: the load average over the last 1, 5, and 15 minutes.
- 
-===== Traffic/Bridge analogy ===== 
- 
-Reposting the takeaways here in case the source goes away.
-Source: [[http://blog.scoutapp.com/articles/2009/07/31/understanding-load-averages|ScoutBlog Load Average]] 
- 
-**On a Single Core CPU System** 
-  * The server is a bridge operator. 
-  * Cars are processes. 
-  * Cars on the bridge are using CPU time. 
-  * Cars waiting to go on the bridge are waiting for CPU time (because the bridge is backed up and they cannot get CPU time immediately).
-  * Load of 0.00 means there is no traffic on the bridge.
-  * Load of 1.00 means the bridge is at capacity. No more cars (processes) can get CPU time at this very second without waiting.
-  * Load over 1.00 means there is a backup.
-    * 2.00 => there are "two lanes" worth of cars (processes). One lane is being processed, another lane is waiting for CPU time.
- 
-**Multi-CPU/Core Systems** 
-  * Load is relative to how many CPUs are on the system. 
-    * 1 CPU/Core = 100% is load 1.00 
-    * 2 CPU/Cores = 100% is load 2.00 
-    * 4 CPU/Cores = 100% is load 4.00 
-  * Example: From the analogy above, each CPU Core can actively process 1 bridge lane. 
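As a rough illustration of the analogy, the sketch below classifies a load average against a CPU count. The function name bridge_status and its three labels are made up for this example; awk is used only because plain shell arithmetic cannot compare decimals.

```shell
# bridge_status LOAD CPUS (hypothetical helper)
# Compare a load average with the number of CPUs ("bridge lanes").
bridge_status() {
    awk -v load="$1" -v cpus="$2" 'BEGIN {
        load += 0; cpus += 0              # force numeric comparison
        if (load < cpus)       print "under capacity"
        else if (load == cpus) print "at capacity"
        else                   print "backed up"
    }'
}

bridge_status 1.00 1    # one lane, one lane of traffic
bridge_status 2.00 4    # four lanes, two lanes of traffic
```

A load of 2.00 is a backup on a single core but leaves half the lanes free on four cores, which is why the raw number means little without the CPU count.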
- 
-===== Calculate Overall CPU Load ===== 
-  * Get number of CPUs<code bash># Count the "processor" entries in /proc/cpuinfo
-grep -c ^processor /proc/cpuinfo
-# or
-nproc</code>
-  * Load Average / Number of Processors = load as a decimal fraction
-    * Example: LoadAvg(1.5) / 2 Processors = 0.75 or 75% system load on a dual core system. 
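The division above can be scripted. This is a minimal sketch, assuming a Linux system where /proc/loadavg exists and nproc is installed; load_pct is a hypothetical helper name, not a standard tool.

```shell
# load_pct LOAD CPUS (hypothetical helper)
# Print the load as a whole-number percentage of total CPU capacity.
load_pct() {
    awk -v load="$1" -v cpus="$2" 'BEGIN { printf "%.0f\n", load / cpus * 100 }'
}

# Worked example from the text: load 1.5 on 2 processors.
load_pct 1.5 2

# Live value: the first field of /proc/loadavg is the 1-minute load average.
if [ -r /proc/loadavg ] && command -v nproc >/dev/null 2>&1; then
    read -r one five fifteen rest < /proc/loadavg
    echo "1-minute load: $(load_pct "$one" "$(nproc)")% of $(nproc) CPU(s)"
fi
```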
- 
----- 
- 
-====== Troubleshooting Tools ====== 
- 
-The following tools are useful when troubleshooting system load. 
- 
-Typically built in 
-  * top => live system process view 
-  * uptime => system uptime and load averages 
-  * vmstat => virtual memory stats (memory, swap, i/o, cpu) 
- 
-Need to install (if using a minimal install base) 
-  * iostat (sysstat package) => print i/o statistics 
-  * iotop => live disk i/o 
-  * lsof => list open files 
- 
-Base Repo 
-<code bash> 
-yum -y install iotop lsof sysstat 
-</code> 
- 
----- 
- 
-====== Troubleshooting Steps ====== 
- 
-  - **Know how many processors you have**. This is essential to determine if load is high. See "Understanding System Load" above for more details.
-    - <code bash>grep -c ^processor /proc/cpuinfo</code>
-    - %Load (decimal) = (Load Average / Number Processors) 
-    - Example: Number of processors = 2, load average seen = 1.50 
-    - 1.50 / 2 = 0.75 or 75% load on the processors 
-  - **Check load averages** 
-    - "uptime" shows the load average for the last 1, 5, and 15 minutes. If it is too high or trending up, it is time to investigate further.
-    - <code bash>uptime</code> 
-  - **What kind of load** 
-    - Use vmstat to determine what kind of system load. "vmstat 1" prints stats every 1 second. 
-      - <code bash>vmstat 1</code> 
-    - Important columns to take note of: 
-      - CPU: "wa" => Time spent waiting for I/O. If high, something is probably heavily utilizing disk. 
-      - CPU: "id" => CPU time spent idle. If close to 0, CPU is used heavily. 
-      - CPU: "sy" => CPU time spent running kernel code (system time). Mail and firewalls are common causes of high system use.
-      - CPU: "us" => CPU time spent running user processes. If high, investigate with top. 
-      - IO: "bi" => blocks received from block device each second. (If high, something is heavily reading a disk) 
-      - IO: "bo" => blocks sent to a block device each second. (If high, something is heavily writing to disk) 
-      - SWAP: "si" => Memory swapped in from disk each second. 
-      - SWAP: "so" => Memory swapped to disk each second. 
-        - If either is high, free memory is most likely very low.
-      - MEMORY: "free" => memory free. If this is low, there is probably swapping going on as well. 
-    - **Further investigate either high CPU/Memory use or Disk I/O** 
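To watch only the columns listed above, the vmstat output can be filtered by header name. This is a sketch that assumes the usual two header lines followed by data rows; vmstat_summary is a hypothetical helper name. Note that the first data row vmstat prints reports averages since boot, so the later rows are the ones that reflect current activity.

```shell
# vmstat_summary (hypothetical helper)
# Read vmstat output on stdin and print selected columns, located by name
# from the second header line rather than by fixed position.
# Usage: vmstat 1 5 | vmstat_summary
vmstat_summary() {
    awk 'NR == 2 { for (i = 1; i <= NF; i++) col[$i] = i; next }
         NR > 2  { printf "us=%s sy=%s id=%s wa=%s si=%s so=%s\n",
                   $col["us"], $col["sy"], $col["id"], $col["wa"],
                   $col["si"], $col["so"] }'
}
```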
- 
----- 
- 
-==== High CPU ==== 
- 
-Clues that you should investigate high CPU usage: 
-  * Low CPU "id" (idle) 
-  * High CPU "sy" (system processes) 
-  * High CPU "us" (user processes). 
- 
-<code bash> 
-top 
-</code> 
- 
-While in top: 
-  * Turn on highlighting: 'z' 
-  * Highlight sort column: 'x' 
- 
----- 
- 
-==== High Memory Use ==== 
- 
-Notes on Linux memory management 
-  * Linux uses free memory in RAM as a buffer cache to speed up application performance. 
-  * When memory is needed, the buffer cache shrinks to allow other applications to use it. 
-  * **Actual free memory = Memory Free + Buffers + Cached** 
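The formula above can be evaluated from /proc/meminfo, which lists MemFree, Buffers, and Cached in kB. This is a minimal sketch; actual_free_kb is a hypothetical helper name, and the optional file argument exists only so the function can be tried against a saved copy of the file.

```shell
# actual_free_kb [FILE] (hypothetical helper)
# Sum MemFree + Buffers + Cached, in kB, from a meminfo-style file.
actual_free_kb() {
    awk '$1 == "MemFree:" || $1 == "Buffers:" || $1 == "Cached:" { sum += $2 }
         END { printf "%.0f\n", sum }' "${1:-/proc/meminfo}"
}

if [ -r /proc/meminfo ]; then
    echo "Actual free memory: $(actual_free_kb) kB"
fi
```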
- 
-Clues that you should investigate high memory usage: 
-  * Memory free: Very low 
-  * Swap si: High swapping in from disk 
-  * Swap so: High swapping out to disk 
- 
-Start top and sort by %mem usage 
-<code bash> 
-top -a 
-</code> 
- 
-While in top: 
-  * Turn on highlighting: 'z' 
-  * Highlight sort column: 'x' 
- 
-Other memory columns (press '<' or '>' to move the sort column left or right)
-  * VIRT = virtual memory size used: code, data, and shared libraries plus swapped pages. 
-  * RES = resident size: non-swapped physical memory a process is using 
- 
----- 
- 
-==== Disk I/O ==== 
- 
-  * I/O wait (wa) is the percentage of time a CPU is waiting on disk. 
-    * If I/O wait % is greater than (100 / number of CPU cores), for example above 25% on a 4-core system, then the CPUs are spending a lot of time waiting on disk.
-  * Easiest ways to improve disk I/O 
-    * Give the system more memory 
-    * Tune the application to rely more on in-memory caches and less on disk
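The rule of thumb above compares I/O wait with 1 / (number of CPU cores); expressed as a percentage, that is 100 / cores. The sketch below computes that threshold and checks a "wa" value against it. Both function names are hypothetical, and the threshold is only the heuristic from this page, not a hard limit.

```shell
# iowait_threshold CPUS (hypothetical helper)
# Print the I/O wait percentage above which disk wait deserves a look.
iowait_threshold() {
    awk -v cpus="$1" 'BEGIN { printf "%.1f\n", 100 / cpus }'
}

# iowait_is_high WA CPUS (hypothetical helper)
# Succeed (exit 0) when the "wa" percentage exceeds the threshold.
iowait_is_high() {
    awk -v wa="$1" -v cpus="$2" 'BEGIN { exit !(wa + 0 > 100 / cpus) }'
}

iowait_threshold 4
if iowait_is_high 30 4; then
    echo "wa=30 on 4 cores: investigate disk I/O"
fi
```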
- 
-Clues that you should investigate high Disk I/O: 
-  * High CPU "id" (idle) 
-  * High CPU "wa" (wait) 
- 
-\\ 
-**iostat** - View I/O stats with extended statistics, every 3 seconds 
-<code bash> 
-iostat -x 3 
-</code> 
-  * "%util" => If this is close to 100%, the listed "Device" is the one to investigate. 
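To pick out busy devices automatically, the iostat output can be filtered on "%util". This sketch assumes the extended report has a header line starting with "Device" that includes a "%util" column; the exact column set differs between sysstat versions, so the column is located by name. busy_devices and its default threshold of 90 are made up for this example.

```shell
# busy_devices [MIN_UTIL] (hypothetical helper)
# Read "iostat -x" output on stdin and print devices whose %util is at
# least MIN_UTIL (default 90).
# Usage: iostat -x 3 2 | busy_devices 90
busy_devices() {
    awk -v min="${1:-90}" '
        $1 ~ /^Device/ { for (i = 1; i <= NF; i++) if ($i == "%util") u = i; next }
        u && NF >= u && $u + 0 >= min + 0 { print $1, $u }'
}
```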
- 
-\\ 
-**iotop** - Live disk I/O similar to top 
-<code bash> 
-iotop 
-</code> 
- 
-\\ 
-**lsof** - If a particular device is discovered, another option for further details is to list open files for that mount point. 
-  * Identify the busy device with iostat
-  * Find the mount point for that device
-    * If 'dm' device:<code bash>ls -l /dev/mapper</code> 
-  * Then search lsof for that mount point:<code bash>lsof | grep /var/</code> 
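The device-to-mount-point step can be sketched by reading /proc/mounts, where each line starts with the device followed by its mount point. mount_points_for is a hypothetical helper name, and /dev/mapper/vg-var in the usage comment is only a placeholder device; the optional second argument lets the function read a saved copy of the mounts table.

```shell
# mount_points_for DEVICE [MOUNTS_FILE] (hypothetical helper)
# Print every mount point of DEVICE as listed in a /proc/mounts-style file.
# Usage: mount_points_for /dev/mapper/vg-var
mount_points_for() {
    awk -v dev="$1" '$1 == dev { print $2 }' "${2:-/proc/mounts}"
}
```

The mount point it prints is the path to search for in the lsof output.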
- 
----- 
  
  • linux_wiki/high_system_load.txt
  • Last modified: 2019/05/25 23:50
  • (external edit)