Performance diagnostics on Linux within 60 seconds
When you log in to a Linux server to troubleshoot a performance issue, what should you check in the first minute? At Netflix, we manage a massive EC2 Linux cloud and have access to various performance analysis tools like Atlas for cloud monitoring and Vector for on-demand instance analysis. Although these tools help solve most issues, there are times when logging into an instance and running standard Linux commands is necessary.
In this article, the Netflix Performance Engineering team will guide you through the best practices of performance analysis on the command line in the first 60 seconds using common Linux tools that should be available on your system.
First 60 Seconds: Overview
Running the following ten commands gives you a high-level view of the system's running processes and resource usage within 60 seconds. Look for errors and saturation metrics first, since both are easy to interpret, and then at resource utilization. Saturation means a resource has more load than it can handle; it shows up as a growing request queue or as time spent waiting.
Commands to run:
- uptime
- dmesg | tail
- vmstat 1
- mpstat -P ALL 1
- pidstat 1
- iostat -xz 1
- free -m
- sar -n DEV 1
- sar -n TCP,ETCP 1
- top
Some of these commands require the sysstat package to be installed. The metrics they expose help you apply the USE method (Utilization, Saturation, Errors): for every resource, such as CPUs, memory, and disks, check its utilization, saturation, and error metrics. Note also which resources you have checked and ruled out, since eliminating possible causes narrows the scope of the investigation and guides further checks.
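Before starting the clock, it can help to confirm that the tools are actually installed. The sketch below is one way to do that; the `CHECKLIST` table and the `missing_tools` helper are names invented for this illustration, and the only library call involved is the standard `shutil.which`.

```python
import shutil

# The ten checklist commands, paired with the binary each one needs.
# mpstat, pidstat, iostat and sar are the ones shipped in sysstat.
CHECKLIST = [
    ("uptime", "uptime"),
    ("dmesg | tail", "dmesg"),
    ("vmstat 1", "vmstat"),
    ("mpstat -P ALL 1", "mpstat"),
    ("pidstat 1", "pidstat"),
    ("iostat -xz 1", "iostat"),
    ("free -m", "free"),
    ("sar -n DEV 1", "sar"),
    ("sar -n TCP,ETCP 1", "sar"),
    ("top", "top"),
]


def missing_tools(checklist=CHECKLIST, which=shutil.which):
    """Return the sorted names of required binaries that are not on PATH."""
    needed = {binary for _, binary in checklist}
    return sorted(binary for binary in needed if which(binary) is None)


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("not found on PATH:", ", ".join(missing))
    else:
        print("all checklist tools are available")
```

The `which` parameter exists only so the lookup can be swapped out in tests; in normal use the default is enough.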
The following sections provide a brief explanation of these commands, using an example from a production environment. For more detailed information on how to use these tools, refer to their respective man pages.
uptime
This command gives a quick view of the load averages, which indicate the number of tasks (processes) wanting to run. On Linux, these numbers include processes wanting to run on a CPU as well as processes blocked in uninterruptible I/O (usually disk I/O). This gives a rough idea of demand on the system, but it cannot be properly understood without other tools.
The three numbers are the load averages over the last one, five, and fifteen minutes, so together they show how the load has changed over time. For example, if the one-minute value is much lower than the fifteen-minute value, the issue may already have passed and you may have logged in too late to observe it.
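The comparison between the one-minute and fifteen-minute values can be sketched as a small function. This is only an illustration: `load_trend` is a hypothetical helper, and the `ratio` of 2.0 is an arbitrary threshold chosen for the example, not a recommended value.

```python
def load_trend(one, five, fifteen, ratio=2.0):
    """Classify the load trend from the three load averages.

    `five` is accepted so the call mirrors uptime's output, but only the
    one-minute and fifteen-minute values are compared. `ratio` is an
    assumed threshold for "much lower" or "much higher".
    """
    if fifteen > 0 and one * ratio < fifteen:
        return "falling"  # load was higher earlier; the issue may have passed
    if one > fifteen * ratio:
        return "rising"   # load is building up right now
    return "steady"


if __name__ == "__main__":
    # Made-up sample values; on a live system os.getloadavg() returns
    # the same three numbers that uptime prints.
    print(load_trend(0.5, 3.0, 8.0))
```

On Linux and other Unix systems the arguments can come straight from `os.getloadavg()`, for example `load_trend(*os.getloadavg())`.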
dmesg | tail
This command shows the last 10 kernel messages, if there are any. Look for errors that could explain performance problems; typical examples are the OOM killer terminating a process and TCP dropping requests.
Never skip this step; it is always worth checking.
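If you want to scan the messages rather than read them by eye, a filter along these lines is one option. The patterns are assumptions picked for this sketch and are far from exhaustive, `flag_messages` is a hypothetical helper, and the sample lines are made up to resemble typical kernel messages.

```python
import re

# Assumed patterns for the two kinds of events mentioned above.
PATTERNS = {
    "oom": re.compile(r"out of memory|oom-killer", re.IGNORECASE),
    "tcp_drop": re.compile(r"tcp.*drop", re.IGNORECASE),
}


def flag_messages(lines):
    """Return (label, line) pairs for lines matching a known pattern."""
    hits = []
    for line in lines:
        for label, pattern in PATTERNS.items():
            if pattern.search(line):
                hits.append((label, line.strip()))
                break  # one label per line is enough for triage
    return hits


if __name__ == "__main__":
    # Made-up sample input, not output captured from a real host.
    sample = [
        "perl invoked oom-killer: gfp_mask=0x0",
        "TCP: Possible SYN flooding on port 7001. Dropping request.",
        "eth0: link is up",
    ]
    for label, line in flag_messages(sample):
        print(label, "|", line)
```

To use it on a live host, pass it the lines read from the output of `dmesg | tail` instead of the sample list.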
vmstat 1
Vmstat is a tool that provides statistics on virtual memory. It prints a summary of key server metrics every second. The first row shows averages since boot, not the previous second. Focus on the columns:
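- r: Number of processes running on a CPU or waiting for a turn. Unlike the load averages, this does not include processes blocked on I/O.
- free: Free memory in KB.
- si, so: Swap-in and swap-out rates.
- us, sy, id, wa, st: CPU time breakdown, averaged across all CPUs (user, system, idle, I/O wait, steal time).
If 'r' stays above the number of CPUs, the CPUs are saturated. Non-zero 'si' or 'so' means the system is short of memory and is swapping. A consistently high 'wa' points to a disk bottleneck, since the CPUs sit idle while waiting for I/O. A high 'sy' is worth exploring further; it can mean the kernel is processing I/O inefficiently.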
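The checks above can be written down as code, which makes the rules of thumb explicit. In this sketch `parse_vmstat` and `vmstat_findings` are hypothetical helpers, the 10% and 20% limits for 'wa' and 'sy' are assumptions made for illustration, and the header and row are made-up samples in vmstat's column layout.

```python
def parse_vmstat(header, row):
    """Pair vmstat column names with the integer values of one row."""
    return dict(zip(header.split(), (int(v) for v in row.split())))


def vmstat_findings(sample, ncpu, wa_limit=10, sy_limit=20):
    """Apply the rules of thumb to one parsed vmstat row.

    wa_limit and sy_limit are assumed thresholds, not official ones.
    """
    findings = []
    if sample["r"] > ncpu:
        findings.append("cpu-saturated")
    if sample["si"] or sample["so"]:
        findings.append("swapping")
    if sample["wa"] > wa_limit:
        findings.append("io-wait")
    if sample["sy"] > sy_limit:
        findings.append("high-sys")
    return findings


if __name__ == "__main__":
    # Made-up sample: column names as printed by vmstat, then one data row.
    header = "r b swpd free buff cache si so bi bo in cs us sy id wa st"
    row = "34 0 0 200889792 73708 591828 0 0 0 5 6 10 96 1 3 0 0"
    print(vmstat_findings(parse_vmstat(header, row), ncpu=32))
```

Parsing by column name rather than by position keeps the sketch tolerant of vmstat versions that print extra columns. A single row is only a snapshot, so treat a finding as meaningful when it persists across several seconds of output.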
mpstat -P ALL 1
This command shows CPU usage per core, helping detect unbalanced loads. A single core running at high utilization may indicate a single-threaded application.
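Per-CPU figures of this kind can be derived from the cumulative counters in /proc/stat by taking two snapshots and comparing them. The sketch below shows the arithmetic under that assumption; `cpu_times` and `busy_percent` are names invented here, and the snapshots in the demo are made-up text in the /proc/stat layout rather than real readings.

```python
def cpu_times(stat_text):
    """Map 'cpuN' to (busy, total) tick counts from /proc/stat text."""
    times = {}
    for line in stat_text.splitlines():
        fields = line.split()
        # Skip non-CPU lines and the aggregate "cpu" line.
        if not fields or not fields[0].startswith("cpu") or fields[0] == "cpu":
            continue
        # user, nice, system, idle, iowait, irq, softirq, steal
        values = [int(v) for v in fields[1:9]]
        not_busy = values[3] + values[4]  # idle + iowait
        total = sum(values)
        times[fields[0]] = (total - not_busy, total)
    return times


def busy_percent(before, after):
    """Percent of time each CPU was busy between two snapshots."""
    result = {}
    for cpu, (busy1, total1) in after.items():
        busy0, total0 = before.get(cpu, (0, 0))
        elapsed = total1 - total0
        result[cpu] = 100.0 * (busy1 - busy0) / elapsed if elapsed > 0 else 0.0
    return result


if __name__ == "__main__":
    # Made-up snapshots: cpu0 does all the work, cpu1 stays idle.
    first = "cpu0 50 0 50 400 0 0 0 0 0 0\ncpu1 50 0 50 400 0 0 0 0 0 0"
    second = "cpu0 150 0 50 400 0 0 0 0 0 0\ncpu1 50 0 50 500 0 0 0 0 0 0"
    print(busy_percent(cpu_times(first), cpu_times(second)))
```

On a Linux host the two snapshots would be the contents of /proc/stat read about a second apart. One CPU near 100% while the others are idle is the pattern to look for.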
pidstat 1
Pidstat is similar to top's per-process summary, but it prints a rolling summary instead of clearing the screen. That makes it useful for watching patterns over time and for recording what you saw in your investigation notes.
iostat -xz 1
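Iostat shows block device (disk) activity: both the workload applied and the resulting performance. Key metrics include read/write rates, the average I/O time in milliseconds ('await'), and device utilization ('%util'). High 'await' or '%util' values may indicate saturation or a device problem.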
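A check on 'await' and '%util' might look like the sketch below. Everything here is illustrative: `parse_iostat_row` and `looks_saturated` are hypothetical helpers, the limits of 10 ms and 60% are assumptions rather than fixed rules, and the header and rows in the demo are made up. Column names differ between sysstat versions (some print 'r_await' and 'w_await' instead of 'await'), so the code matches any column ending in "await".

```python
def parse_iostat_row(header, row):
    """Split one device row into (device, {column: value})."""
    names = header.split()
    values = row.split()
    metrics = {name: float(v) for name, v in zip(names[1:], values[1:])}
    return values[0], metrics


def looks_saturated(metrics, await_ms=10.0, util_pct=60.0):
    """True if any await column or %util exceeds the assumed limits."""
    waits = [v for name, v in metrics.items() if name.endswith("await")]
    slow = max(waits, default=0.0) > await_ms
    busy = metrics.get("%util", 0.0) > util_pct
    return slow or busy


if __name__ == "__main__":
    # Made-up sample in a simplified iostat -x layout.
    header = "Device r/s w/s await %util"
    for row in ("xvda 10.0 5.0 3.2 12.5", "xvdb 900.0 50.0 45.0 99.1"):
        device, metrics = parse_iostat_row(header, row)
        print(device, looks_saturated(metrics))
```

What counts as a high 'await' depends on the device type, so the limits should be adjusted for the hardware being examined.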
free -m
Free shows memory usage in MB, including buffers and cache. Linux uses unused memory as cache, which can be reclaimed when needed. So, high cache usage isn't necessarily a problem.
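To make the point about cache concrete, the sketch below adds free memory and cache together from /proc/meminfo-style text. `parse_meminfo` and `memory_summary` are hypothetical helpers, and the input in the demo is a made-up sample. Treating all of buffers and cache as reclaimable is a simplification assumed for this example.

```python
def parse_meminfo(text):
    """Map each /proc/meminfo key to its numeric value (kB for most keys)."""
    info = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        parts = rest.split()
        if parts:
            info[key.strip()] = int(parts[0])
    return info


def memory_summary(info):
    """Report free memory, cache, and their sum in MB."""
    free_kb = info["MemFree"]
    cache_kb = info.get("Buffers", 0) + info.get("Cached", 0)
    return {
        "free_mb": free_kb // 1024,
        "cache_mb": cache_kb // 1024,
        "free_plus_cache_mb": (free_kb + cache_kb) // 1024,
    }


if __name__ == "__main__":
    # Made-up sample in the /proc/meminfo layout.
    sample = (
        "MemTotal:        8192000 kB\n"
        "MemFree:         1024000 kB\n"
        "Buffers:          204800 kB\n"
        "Cached:          3072000 kB\n"
    )
    print(memory_summary(parse_meminfo(sample)))
```

A low "free" figure alongside a large cache is usually fine. Recent kernels also expose a MemAvailable field in /proc/meminfo, which is the kernel's own estimate and is generally a better number to rely on than this simple sum.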
sar -n DEV 1
Sar monitors network interface throughput. It shows received and transmitted data rates, helping identify if the network is a bottleneck.
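Throughput figures like these can be computed from the byte counters in /proc/net/dev by sampling twice and dividing the difference by the interval. The sketch below shows that calculation; `interface_bytes` and `throughput_kb_per_s` are names invented for the example, and the snapshots in the demo are made up.

```python
def interface_bytes(net_dev_text):
    """Map interface name to (rx_bytes, tx_bytes) from /proc/net/dev text."""
    counters = {}
    for line in net_dev_text.splitlines():
        if ":" not in line:
            continue  # the two header lines contain no colon
        name, _, rest = line.partition(":")
        fields = rest.split()
        # Field 0 is received bytes, field 8 is transmitted bytes.
        counters[name.strip()] = (int(fields[0]), int(fields[8]))
    return counters


def throughput_kb_per_s(before, after, seconds):
    """Return {interface: (rx_kB_per_s, tx_kB_per_s)} between snapshots."""
    rates = {}
    for name, (rx1, tx1) in after.items():
        rx0, tx0 = before.get(name, (rx1, tx1))
        rates[name] = ((rx1 - rx0) / 1024 / seconds,
                       (tx1 - tx0) / 1024 / seconds)
    return rates


if __name__ == "__main__":
    # Made-up snapshots taken one second apart.
    first = "  eth0: 1024 1 0 0 0 0 0 0 2048 1 0 0 0 0 0 0"
    second = "  eth0: 3072 3 0 0 0 0 0 0 2048 1 0 0 0 0 0 0"
    print(throughput_kb_per_s(interface_bytes(first),
                              interface_bytes(second), seconds=1))
```

Comparing the resulting rates with the interface's line speed shows how close the network is to its limit.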
sar -n TCP,ETCP 1
Sar provides insights into TCP metrics, such as active and passive connections, and retransmissions. High retransmission rates may indicate network or server issues.
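The same per-second rates can be derived from the cumulative TCP counters in /proc/net/snmp. The sketch below assumes that source and uses the counter names ActiveOpens, PassiveOpens, and RetransSegs from that file; `tcp_counters` and `tcp_rates` are hypothetical helpers, and the snapshots in the demo are made up and shortened to three columns.

```python
def tcp_counters(snmp_text):
    """Pair the Tcp: header names with the Tcp: values line."""
    tcp_lines = [line.split()[1:] for line in snmp_text.splitlines()
                 if line.startswith("Tcp:")]
    names, values = tcp_lines[0], tcp_lines[1]
    return dict(zip(names, (int(v) for v in values)))


def tcp_rates(before, after, seconds):
    """Per-second rates for new connections and retransmitted segments."""
    keys = ("ActiveOpens", "PassiveOpens", "RetransSegs")
    return {key: (after[key] - before[key]) / seconds for key in keys}


if __name__ == "__main__":
    # Made-up, shortened snapshots taken two seconds apart.
    first = "Tcp: ActiveOpens PassiveOpens RetransSegs\nTcp: 10 20 5\n"
    second = "Tcp: ActiveOpens PassiveOpens RetransSegs\nTcp: 12 30 5\n"
    print(tcp_rates(tcp_counters(first), tcp_counters(second), seconds=2))
```

Active opens are connections this host initiated and passive opens are connections it accepted, so the two rates give a rough picture of outbound and inbound load. A retransmit rate that stays above zero is the figure worth following up.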
top
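Top covers many of the metrics checked earlier, so it is a convenient way to see whether anything looks very different from the previous commands. Its drawback is that it redraws the screen, which makes patterns over time harder to see than in the rolling output of tools like vmstat and pidstat.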
Subsequent Analysis
For deeper analysis, there are many more tools and methodologies to explore. Brendan Gregg's Linux Performance Tools tutorial from Velocity 2015 covers over 40 commands for observability, benchmarking, tuning, and performance analysis.