Differences
This shows you the differences between two versions of the page.
linux_wiki:high_system_load [2015/10/06 23:06] billdozor [Troubleshooting Steps] |
linux_wiki:high_system_load [2019/05/25 23:50] (current) |
||
---|---|---|---|
Line 6: | Line 6: | ||
**Checklist** | **Checklist** | ||
- | * Distro: Enterprise Linux 6.x | + | * Distro(s): Enterprise Linux 6 |
+ | |||
+ | ---- | ||
+ | |||
+ | ====== Understanding System Load ====== | ||
+ | |||
+ | Load average can be seen in both " | ||
+ | |||
+ | ===== Traffic/ | ||
+ | |||
+ | Reposting the takeaway here in case the source goes away. | ||
+ | Source: [[http:// | ||
+ | |||
+ | **On a Single Core CPU System** | ||
+ | * The server is a bridge operator. | ||
+ | * Cars are processes. | ||
+ | * Cars on the bridge are using CPU time. | ||
+ | * Cars waiting to go on the bridge are waiting for CPU time (because the bridge is backed up and they cannot get CPU time immediately). | ||
+ | * Load of 0.00 means there is no traffic on the bridge. | ||
+ | * Load of 1.00 means the bridge is at capacity. No more cars (processes) can get CPU time at this very second without waiting. | ||
+ | * Load over 1.00 means there is a backup. | ||
+ | * 2.00 => there are "two lanes" worth of cars (processes). One lane is being processed, another lane is waiting for CPU time. | ||
+ | |||
+ | **Multi-CPU/ | ||
+ | * Load is relative to how many CPUs are on the system. | ||
+ | * 1 CPU/Core: 100% utilization is load 1.00 | ||
+ | * 2 CPUs/Cores: 100% utilization is load 2.00 | ||
+ | * 4 CPUs/Cores: 100% utilization is load 4.00 | ||
+ | * Example: From the analogy above, each CPU Core can actively process 1 bridge lane. | ||
+ | |||
+ | ===== Calculate Overall CPU Load ===== | ||
+ | * Get number of CPUs< | ||
+ | OR | ||
+ | nproc</ | ||
+ | * Load Average / NumProcessors = load as a decimal (multiply by 100 for %) | ||
+ | * Example: LoadAvg(1.5) / 2 Processors = 0.75 or 75% system load on a dual core system. | ||
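
The calculation above can be sketched as a small shell helper. This is only an illustration: ''load_pct'' is a made-up function name, and the live example assumes a Linux host where /proc/loadavg is readable and nproc is installed.

```shell
#!/bin/sh
# load_pct LOAD_AVG NUM_CPUS
# Prints the load as a whole-number percentage of total CPU capacity.
# (load_pct is a hypothetical helper name, not part of any package.)
load_pct() {
    awk -v load="$1" -v cpus="$2" 'BEGIN { printf "%.0f\n", (load / cpus) * 100 }'
}

# Live usage (assumes Linux): take the 1-minute load average from
# /proc/loadavg and the processor count from nproc.
if [ -r /proc/loadavg ] && command -v nproc >/dev/null 2>&1; then
    read -r one_min rest < /proc/loadavg
    echo "1-minute load: $(load_pct "$one_min" "$(nproc)")% of capacity"
fi
```

With the example figures from the list, ''load_pct 1.5 2'' prints 75.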
---- | ---- | ||
Line 15: | Line 50: | ||
Typically built in | Typically built in | ||
- | * uptime | + | * top => live system process view |
- | * top | + | * uptime => system uptime and load averages |
- | * vmstat | + | * vmstat |
Need to install (if using a minimal install base) | Need to install (if using a minimal install base) | ||
- | | + | * iostat (sysstat package) |
- | | + | * iotop => live disk i/o |
- | * sar (sysstat package) | + | * lsof => list open files |
Base Repo | Base Repo | ||
<code bash> | <code bash> | ||
- | yum -y install iotop sysstat | + | yum -y install iotop lsof sysstat |
</ | </ | ||
Line 33: | Line 68: | ||
====== Troubleshooting Steps ====== | ====== Troubleshooting Steps ====== | ||
- | - Know how many processors you have. This is essential to determine if load is high. | + | - **Know how many processors you have**. This is essential to determine if load is high. See " |
- | - <code bash> | + | - <code bash> |
- %Load (decimal) = (Load Average / Number Processors) | - %Load (decimal) = (Load Average / Number Processors) | ||
- Example: Number of processors = 2, load average seen = 1.50 | - Example: Number of processors = 2, load average seen = 1.50 | ||
- 1.50 / 2 = 0.75 or 75% load on the processors | - 1.50 / 2 = 0.75 or 75% load on the processors | ||
- | - Check load averages | + | - **Check load averages** |
- uptime shows the load average for the last 1, 5, and 15 minutes. If it is too high or trending up, it is time to investigate further. | - uptime shows the load average for the last 1, 5, and 15 minutes. If it is too high or trending up, it is time to investigate further. | ||
- <code bash> | - <code bash> | ||
- | - What kind of load | + | - **What kind of load** |
- Use vmstat to determine what kind of system load. " | - Use vmstat to determine what kind of system load. " | ||
- <code bash> | - <code bash> | ||
Line 49: | Line 84: | ||
- CPU: " | - CPU: " | ||
- CPU: " | - CPU: " | ||
+ | - IO: " | ||
+ | - IO: " | ||
- SWAP: " | - SWAP: " | ||
- SWAP: " | - SWAP: " | ||
- If either is high, free memory is most likely also very low. | - If either is high, free memory is most likely also very low. | ||
- | | + | |
+ | - **Further investigate either high CPU/ | ||
- | ==== High CPU ==== | + | ---- |
+ | |||
+ | ===== High CPU ===== | ||
+ | |||
+ | Clues that you should investigate high CPU usage: | ||
+ | * Low CPU " | ||
+ | * High CPU " | ||
+ | * High CPU " | ||
<code bash> | <code bash> | ||
Line 60: | Line 105: | ||
</ | </ | ||
- | ==== Disk I/O ==== | + | While in top: |
+ | * Turn on highlighting: | ||
+ | * Highlight sort column: ' | ||
+ | ---- | ||
+ | |||
+ | ===== High Memory Use ===== | ||
+ | |||
+ | Notes on Linux memory management | ||
+ | * Linux uses free memory in RAM as a buffer cache to speed up application performance. | ||
+ | * When memory is needed, the buffer cache shrinks to allow other applications to use it. | ||
+ | * **Actual free memory = Memory Free + Buffers + Cached** | ||
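
As a rough sketch of the formula above, the three values can be summed from /proc/meminfo. The function name ''actual_free_kb'' is made up for this example, and it assumes the usual meminfo layout where the MemFree, Buffers, and Cached lines report values in kB.

```shell
#!/bin/sh
# actual_free_kb [FILE]
# Sums MemFree + Buffers + Cached (in kB) from meminfo-style input.
# FILE defaults to /proc/meminfo; actual_free_kb is a hypothetical helper name.
actual_free_kb() {
    awk '$1 == "MemFree:" || $1 == "Buffers:" || $1 == "Cached:" { sum += $2 }
         END { printf "%.0f\n", sum }' "${1:-/proc/meminfo}"
}

# Live usage (assumes Linux):
if [ -r /proc/meminfo ]; then
    echo "Actual free memory: $(actual_free_kb) kB"
fi
```

The exact field match means lines such as SwapCached are not counted.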
+ | |||
+ | Clues that you should investigate high memory usage: | ||
+ | * Memory free: Very low | ||
+ | * Swap si: High swapping in from disk | ||
+ | * Swap so: High swapping out to disk | ||
+ | |||
+ | Start top and sort by %mem usage | ||
+ | <code bash> | ||
+ | top -a | ||
+ | </ | ||
+ | |||
+ | While in top: | ||
+ | * Turn on highlighting: | ||
+ | * Highlight sort column: ' | ||
+ | |||
+ | Other memory columns (shift+ '<' | ||
+ | * VIRT = virtual memory size used: code, data, and shared libraries plus swapped pages. | ||
+ | * RES = resident size: non-swapped physical memory a process is using | ||
+ | |||
+ | ---- | ||
+ | |||
+ | ===== Disk I/O ===== | ||
+ | |||
+ | * I/O wait (wa) is the percentage of time a CPU is waiting on disk. | ||
+ | * If I/O wait % is greater than (1 / number of CPU cores), expressed as a percentage, then the CPUs are spending a lot of time waiting on disk. | ||
+ | * Easiest ways to improve disk I/O | ||
+ | * Give the system more memory | ||
+ | * Tune the application to use in-memory caches rather than disk | ||
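
The I/O wait rule of thumb above can be written as a quick check. This is a sketch only: ''iowait_is_high'' is a made-up name, and it reads "1 / number of CPU cores" as a percentage, i.e. 100 / cores (25% on a 4-core system).

```shell
#!/bin/sh
# iowait_is_high WA_PERCENT NUM_CORES
# Succeeds (exit status 0) when WA_PERCENT exceeds 100 / NUM_CORES.
# (iowait_is_high is a hypothetical helper name.)
iowait_is_high() {
    awk -v wa="$1" -v cores="$2" 'BEGIN { exit !(wa > 100 / cores) }'
}

# Example: 30% I/O wait on a 4-core system is above the 25% threshold.
if iowait_is_high 30 4; then
    echo "I/O wait is above threshold, investigate disk I/O"
fi
```

The "wa" value itself would come from the cpu columns of vmstat or the summary line in top.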
+ | |||
+ | Clues that you should investigate high Disk I/O: | ||
+ | * High CPU " | ||
+ | * High CPU " | ||
+ | |||
+ | \\ | ||
+ | **iostat** - View I/O stats with extended statistics, every 3 seconds | ||
+ | <code bash> | ||
+ | iostat -x 3 | ||
+ | </ | ||
+ | * " | ||
+ | |||
+ | \\ | ||
+ | **iotop** - Live disk I/O similar to top | ||
<code bash> | <code bash> | ||
- | iostat | ||
iotop | iotop | ||
</ | </ | ||
+ | |||
+ | \\ | ||
+ | **lsof** - If a particular device is discovered, another option for further details is to list open files for that mount point. | ||
+ | * Device discovered via iostat | ||
+ | * Mount point discovered | ||
+ | * If ' | ||
+ | * Then search lsof for that mount point:< | ||
---- | ---- | ||