1. CPU Performance
- Current Usage: `top`, `htop`, or `mpstat`.
- Load Average: `uptime`, or check the output of `top` (the three numbers at the top-right).
- Compare the load average to the number of CPU cores (`nproc`). A load that stays above the core count suggests processes are queuing for CPU or I/O.
- Processes: Monitor high-CPU-consuming processes using `top` or `ps -eo pid,ppid,cmd,%mem,%cpu --sort=-%cpu`.
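
The load-versus-cores comparison can be reduced to a single number. The following is a minimal sketch, not a standard tool: `load_per_core` is a hypothetical helper name, and it assumes `awk` is available.

```shell
# load_per_core (hypothetical helper): divide a load average by a core count.
#   $1 = load average (e.g. the first field of /proc/loadavg)
#   $2 = number of CPU cores (e.g. the output of nproc)
load_per_core() {
    awk -v load="$1" -v cores="$2" 'BEGIN { printf "%.2f\n", load / cores }'
}
```

On a live Linux system it could be called as `load_per_core "$(cut -d ' ' -f1 /proc/loadavg)" "$(nproc)"`. A result that stays above 1.00 means the 1-minute load exceeds the number of cores.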
2. Memory Usage
- Total and Free Memory: `free -h` or `vmstat -s`.
- Swap Usage: Check if swap space is being heavily used (`free -h` or `swapon -s`).
- Processes Using Most Memory: `top` or `ps -eo pid,ppid,cmd,%mem --sort=-%mem`.
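
For scripted checks, the same memory figures can be read from `/proc/meminfo`. The sketch below assumes a kernel that exposes the `MemAvailable` field; `mem_available_pct` is a hypothetical helper name.

```shell
# mem_available_pct (hypothetical helper): read /proc/meminfo-style text on
# stdin and print MemAvailable as a percentage of MemTotal.
mem_available_pct() {
    awk '/^MemTotal:/     { total = $2 }
         /^MemAvailable:/ { avail = $2 }
         END { if (total > 0) printf "%.1f\n", avail * 100 / total }'
}
```

A typical call would be `mem_available_pct < /proc/meminfo`. Reading from stdin keeps the function easy to check against saved samples.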
3. Disk Usage
- Available Space: `df -h` to check disk usage across filesystems.
- Inode Usage: `df -i` to check inode utilization.
- Disk I/O: `iostat`, `iotop`, or `dstat`.
- Error Messages: Review logs in `/var/log/` for any disk-related errors.
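
A threshold check on disk usage might look like the sketch below. It assumes the POSIX `df -P` column layout (capacity in the fifth column, mount point in the sixth) and mount points without spaces; `fs_over_threshold` is a hypothetical helper name.

```shell
# fs_over_threshold (hypothetical helper): read `df -P` output on stdin and
# print each mount point whose usage is at or above the given percentage.
#   $1 = threshold in percent (e.g. 90)
fs_over_threshold() {
    awk -v limit="$1" 'NR > 1 {
        use = $5
        sub(/%/, "", use)               # strip the trailing percent sign
        if (use + 0 >= limit + 0) print $6 " " use "%"
    }'
}
```

Usage: `df -P | fs_over_threshold 90`. With GNU `df`, piping `df -Pi` instead should apply the same check to inode usage, since that output also has the percentage in the fifth column.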
4. Network Performance
- Network Usage: `iftop`, `ip -s link`, or `netstat`.
- Connections: `ss` or `netstat` to check open connections and ports.
- Packet Loss/Latency: `ping`, `traceroute`, or `mtr`.
- Bandwidth Monitoring: `vnstat`, `iftop`, or `nload`.
5. System Logs
- General System Logs: `journalctl` or `/var/log/syslog` (for system-wide events).
- Kernel Logs: `dmesg` or `journalctl -k` to check for hardware errors or warnings.
6. Uptime and System Load
- Uptime: The `uptime` command provides server uptime and load averages.
- Load Analysis: Investigate load spikes with `sar` or `atop`.
7. Running Services and Processes
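- Service Status: `systemctl status <service>` or `service <service> status`.
- Zombie/Unnecessary Processes: `ps aux | awk '$8 ~ /^Z/'` to list zombie processes (a plain `ps aux | grep Z` also matches any line that merely contains a capital Z).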
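
To find the parent of each zombie, the process state column can be filtered directly. This is a small sketch assuming the column order produced by `ps -eo pid,ppid,stat,comm`; `list_zombies` is a hypothetical helper name.

```shell
# list_zombies (hypothetical helper): read `ps -eo pid,ppid,stat,comm` output
# on stdin and print "PID PPID COMMAND" for every process whose state starts
# with Z (zombie).
list_zombies() {
    awk 'NR > 1 && $3 ~ /^Z/ { print $1, $2, $4 }'
}
```

Usage: `ps -eo pid,ppid,stat,comm | list_zombies`. The second column is the parent PID, which is the process that has not yet reaped the zombie.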
8. Security
- Users Logged In: `who`, `w`, or `last`.
- Unauthorized Logins: Review `/var/log/secure` (RHEL-based systems) or `/var/log/auth.log` (Debian-based systems).
- Firewall Rules: `iptables -L` or `ufw status`.
- Listening Ports: `ss -tuln` or `netstat -tuln`.
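
To compare listening ports between audits, the output can be condensed to a sorted protocol/port list. The sketch below assumes the `ss -tuln` layout in which the protocol is the first column and the local address and port form the fifth; `listening_ports` is a hypothetical helper name.

```shell
# listening_ports (hypothetical helper): read `ss -tuln` output on stdin and
# print a unique, sorted list of protocol/port pairs.
listening_ports() {
    awk 'NR > 1 {
        n = split($5, parts, ":")       # the port is the text after the last colon
        print $1 "/" parts[n]
    }' | sort -u
}
```

Usage: `ss -tuln | listening_ports`. Saving the result to a file and comparing it with a later run using `diff` is one way to spot newly opened ports.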
9. Hardware Health
- Temperature and Fan Speed: `sensors` (part of the lm-sensors package).
- RAID Status: Check using `mdadm` or vendor tools if RAID is configured.
10. Scheduled Jobs
- Cron Jobs: `crontab -l` or check `/etc/crontab`.
- Failures: Examine `/var/log/syslog` for cron-related logs (or `/var/log/cron` on RHEL-based systems).
11. Backup Status
- Backup Logs: Ensure regular backups are occurring as scheduled.
- Verify Integrity: Test restore procedures periodically.
Automation Tools for Server Health Monitoring
- Nagios, Zabbix, Prometheus, or Datadog for continuous monitoring.
- Custom scripts combining commands like `top`, `df`, `iostat`, and log parsing can provide quick insights.
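
As an illustration of such a custom script, the sketch below runs a few of the commands from this checklist under labeled headers and skips any that are not installed. `run_check` and `health_report` are hypothetical names, and the selection of commands is only an example.

```shell
# run_check (hypothetical helper): print a header, then run the command if it
# is installed, or note that it is missing.
#   $1 = section title, remaining arguments = command and its options
run_check() {
    title=$1
    shift
    printf '== %s ==\n' "$title"
    if command -v "$1" >/dev/null 2>&1; then
        "$@"
    else
        printf '(%s not installed)\n' "$1"
    fi
}

# health_report (hypothetical helper): one pass over a few basic checks.
health_report() {
    run_check "Load" uptime
    run_check "Memory" free -h
    run_check "Disk space" df -h
    run_check "Disk I/O" iostat
}
```

Calling `health_report` prints each section in turn. Redirecting the output to a dated file gives a simple history to compare against.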
By periodically reviewing these parameters, you can ensure the Linux server’s health and address potential issues proactively.