Differences
This shows you the differences between two versions of the page.
linux_wiki:high_system_load [2016/03/14 11:32] billdozor |
linux_wiki:high_system_load [2019/05/25 23:50] |
||
---|---|---|---|
Line 1: | Line 1: | ||
- | ====== High System Load ====== | ||
- | |||
- | **General Information** | ||
- | |||
- | Troubleshooting high system load. | ||
- | |||
- | **Checklist** | ||
- | * Distro: Enterprise Linux 6.x | ||
- | |||
- | ---- | ||
- | ====== Understanding System Load ====== | ||
- | |||
- | Load average can be seen in both " | ||
- | |||
- | ===== Traffic/ | ||
- | |||
- | Reposting takeaway here in case it goes away. | ||
- | Source: [[http:// | ||
- | |||
- | **On a Single Core CPU System** | ||
- | * The server is a bridge operator. | ||
- | * Cars are processes. | ||
- | * Cars on the bridge are using CPU time. | ||
- | * Cars waiting to go on the bridge are waiting for CPU time (because the bridge is backed up and they cannot get CPU time immediately). | ||
- | * Load of 0.00 means there is no traffic on the bridge. | ||
- | * Load of 1.00 means the bridge is at capacity. No additional cars (processes) can get CPU time at this moment without waiting. | ||
- | * Load over 1.00 means there is a backup. | ||
- | * 2.00 => there are "two lanes" worth of cars (processes). One lane is being processed, the other lane is waiting for CPU time. | ||
- | |||
- | **Multi-CPU/ | ||
- | * Load is relative to how many CPUs are on the system. | ||
- | * 1 CPU/Core = 100% is load 1.00 | ||
- | * 2 CPU/Cores = 100% is load 2.00 | ||
- | * 4 CPU/Cores = 100% is load 4.00 | ||
- | * Example: In the analogy above, each CPU core can actively process one bridge lane. | ||
- | |||
- | ===== Calculate Overall CPU Load ===== | ||
- | * Get number of CPUs< | ||
- | OR | ||
- | nproc</ | ||
- | * Load Average / NumProcessors = decimal % load | ||
- | * Example: LoadAvg(1.5) / 2 Processors = 0.75 or 75% system load on a dual-core system. | ||
- | |||
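The calculation above can be sketched in shell. This is a minimal illustration, not part of the original page: `load_percent` is a hypothetical helper name, and the live-value part assumes a Linux system where `/proc/loadavg` and `nproc` are available.

```shell
# Hypothetical helper: convert a load average and a CPU count into percent load.
load_percent() {
    # $1 = load average (e.g. 1.50), $2 = number of processors
    awk -v load="$1" -v cpus="$2" 'BEGIN { printf "%.0f\n", (load / cpus) * 100 }'
}

# Example from the text: load average 1.5 on 2 processors
load_percent 1.5 2    # prints 75

# Live values (assumption: Linux with /proc/loadavg and nproc present)
if [ -r /proc/loadavg ] && command -v nproc >/dev/null 2>&1; then
    # The first field of /proc/loadavg is the 1-minute load average
    read -r one_min _ < /proc/loadavg
    echo "Current load: $(load_percent "$one_min" "$(nproc)")%"
fi
```

A result well above 100 suggests more runnable processes than the processors can serve at once.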
- | ---- | ||
- | |||
- | ====== Troubleshooting Tools ====== | ||
- | |||
- | The following tools are useful when troubleshooting system load. | ||
- | |||
- | Typically built in | ||
- | * top => live system process view | ||
- | * uptime => system uptime and load averages | ||
- | * vmstat => virtual memory stats (memory, swap, i/o, cpu) | ||
- | |||
- | Need to install (if using a minimal install base) | ||
- | * iostat (sysstat package) => print i/o statistics | ||
- | * iotop => live disk i/o | ||
- | * lsof => list open files | ||
- | |||
- | Base Repo | ||
- | <code bash> | ||
- | yum -y install iotop lsof sysstat | ||
- | </ | ||
- | |||
- | ---- | ||
- | |||
- | ====== Troubleshooting Steps ====== | ||
- | |||
- | - **Know how many processors you have**. This is essential to determine if load is high. See " | ||
- | - <code bash> | ||
- | - %Load (decimal) = (Load Average / Number Processors) | ||
- | - Example: Number of processors = 2, load average seen = 1.50 | ||
- | - 1.50 / 2 = 0.75 or 75% load on the processors | ||
- | - **Check load averages** | ||
- | - uptime shows the load average for the last 1, 5, and 15 minutes. If it is too high or trending up, it is time to investigate further. | ||
- | - <code bash> | ||
- | - **What kind of load** | ||
- | - Use vmstat to determine what kind of system load. " | ||
- | - <code bash> | ||
- | - Important columns to take note of: | ||
- | - CPU: " | ||
- | - CPU: " | ||
- | - CPU: " | ||
- | - CPU: " | ||
- | - IO: " | ||
- | - IO: " | ||
- | - SWAP: " | ||
- | - SWAP: " | ||
- | - If either is high, free memory is most likely also very low. | ||
- | - MEMORY: " | ||
- | - **Further investigate either high CPU/Memory use or Disk I/O** | ||
- | |||
- | ---- | ||
- | |||
- | ===== High CPU ===== | ||
- | |||
- | Clues that you should investigate high CPU usage: | ||
- | * Low CPU " | ||
- | * High CPU " | ||
- | * High CPU " | ||
- | |||
- | <code bash> | ||
- | top | ||
- | </ | ||
- | |||
- | While in top: | ||
- | * Turn on highlighting: | ||
- | * Highlight sort column: ' | ||
- | |||
- | ---- | ||
- | |||
- | ===== High Memory Use ===== | ||
- | |||
- | Notes on Linux memory management | ||
- | * Linux uses free memory in RAM as a buffer cache to speed up application performance. | ||
- | * When applications need memory, the buffer cache shrinks to free that memory for them. | ||
- | * **Actual free memory = Memory Free + Buffers + Cached** | ||
- | |||
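The "actual free memory" formula above can be checked with a short shell sketch. It is an illustration only: `actual_free_kb` is a hypothetical helper name, and it assumes the `MemFree`, `Buffers`, and `Cached` fields of `/proc/meminfo`, which report values in kB.

```shell
# Hypothetical helper: sum MemFree + Buffers + Cached (in kB) from a meminfo-style file.
# Defaults to /proc/meminfo when no file is given.
actual_free_kb() {
    awk '$1 == "MemFree:" || $1 == "Buffers:" || $1 == "Cached:" { sum += $2 }
         END { printf "%.0f\n", sum }' "${1:-/proc/meminfo}"
}

# Live value (assumption: Linux with a readable /proc/meminfo)
if [ -r /proc/meminfo ]; then
    echo "Actual free memory: $(actual_free_kb) kB"
fi
```

The exact field match means `SwapCached` is not counted, since only the three fields named in the formula are summed.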
- | Clues that you should investigate high memory usage: | ||
- | * Memory free: Very low | ||
- | * Swap si: High swapping in from disk | ||
- | * Swap so: High swapping out to disk | ||
- | |||
- | Start top and sort by %mem usage | ||
- | <code bash> | ||
- | top -a | ||
- | </ | ||
- | |||
- | While in top: | ||
- | * Turn on highlighting: | ||
- | * Highlight sort column: ' | ||
- | |||
- | Other memory columns (shift+ '<' | ||
- | * VIRT = virtual memory size used: code, data, and shared libraries plus swapped pages. | ||
- | * RES = resident size: non-swapped physical memory a process is using | ||
- | |||
- | ---- | ||
- | |||
- | ===== Disk I/O ===== | ||
- | |||
- | * I/O wait (wa) is the percentage of time a CPU is waiting on disk. | ||
- | * If I/O wait % is > (1/# CPU cores), then the CPUs are spending a lot of time waiting on disk. | ||
- | * Easiest ways to improve disk I/O | ||
- | * Give the system more memory | ||
- | * Tune the application to use in-memory caches rather than disk where possible | ||
- | |||
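The I/O wait rule of thumb above can be sketched as a small shell check. This is an illustration under a stated assumption: the threshold "1/# CPU cores" is read here as a fraction of 100%, so a 4-core system has a threshold of 25%. `iowait_is_high` is a hypothetical helper name, and the "wa" percentage must be supplied by the caller.

```shell
# Hypothetical helper: succeed (exit 0) when the iowait percentage exceeds 100 / cores.
iowait_is_high() {
    # $1 = iowait percentage (the "wa" value), $2 = number of CPU cores
    awk -v wa="$1" -v cores="$2" 'BEGIN { exit !(wa > 100 / cores) }'
}

# Example: 30% iowait on a 4-core system (threshold 25%)
if iowait_is_high 30 4; then
    echo "I/O wait is high for a 4-core system"
fi
```

Returning an exit status rather than printing a value lets the check be used directly in an `if` condition or a monitoring script.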
- | Clues that you should investigate high Disk I/O: | ||
- | * High CPU " | ||
- | * High CPU " | ||
- | |||
- | \\ | ||
- | **iostat** - View I/O stats with extended statistics, every 3 seconds | ||
- | <code bash> | ||
- | iostat -x 3 | ||
- | </ | ||
- | * " | ||
- | |||
- | \\ | ||
- | **iotop** - Live disk I/O similar to top | ||
- | <code bash> | ||
- | iotop | ||
- | </ | ||
- | |||
- | \\ | ||
- | **lsof** - If a particular device is discovered, another option for further details is to list open files for that mount point. | ||
- | * Device discovered via iostat | ||
- | * Mount point discovered | ||
- | * If ' | ||
- | * Then search lsof for that mount point:< | ||
- | |||
- | ---- | ||