Ahsan Habib

What to check, and how, to assess the health of any Linux server or system

Posted on November 15, 2024 by Ahsan Habib

1. CPU Performance

Current Usage: top, htop, or mpstat
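Load Average: uptime or check the output of top (the three numbers at the top-right are the 1-, 5-, and 15-minute averages).
Compare the load average to the number of CPU cores (nproc); a load that stays above the core count means tasks are queuing for CPU or, on Linux, waiting on disk I/O.
Processes: Monitor high-CPU-consuming processes using top or ps -eo pid,ppid,cmd,%mem,%cpu --sort=-%cpu | head.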
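
To make the load-versus-cores comparison concrete, here is a minimal sketch. The function names load_per_core and cpu_saturated are made up for this example, and treating "load above core count" as saturated is a rule of thumb, not a hard limit.

```shell
#!/bin/sh
# load_per_core LOAD CORES
# Prints the load average divided by the core count, to two decimals.
# (Hypothetical helper name, used only for this illustration.)
load_per_core() {
    awk -v load="$1" -v cores="$2" 'BEGIN { printf "%.2f\n", load / cores }'
}

# cpu_saturated LOAD CORES
# Exits 0 when the load is higher than the number of cores, 1 otherwise.
cpu_saturated() {
    awk -v load="$1" -v cores="$2" 'BEGIN { exit !(load > cores) }'
}
```

On a live system you could feed it real values, for example: load_per_core "$(cut -d ' ' -f1 /proc/loadavg)" "$(nproc)". The first field of /proc/loadavg is the 1-minute load average.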

2. Memory Usage

Total and Free Memory: free -h or vmstat -s.
Swap Usage: Check if swap space is being heavily used (free -h or swapon --show; swapon -s still works but is the older, deprecated form).
Processes Using Most Memory: top or ps -eo pid,ppid,cmd,%mem --sort=-%mem.
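
If you want a single number instead of reading the free output by eye, something like the following could work. It is a sketch that assumes a kernel recent enough to report MemAvailable in /proc/meminfo; the function name mem_used_pct is invented for this example.

```shell
#!/bin/sh
# mem_used_pct
# Reads /proc/meminfo-style text on stdin and prints the share of RAM that
# is NOT available, as a whole-number percentage.
# Assumption: the input contains MemTotal and MemAvailable lines.
mem_used_pct() {
    awk '
        $1 == "MemTotal:"     { total = $2 }
        $1 == "MemAvailable:" { avail = $2 }
        END {
            if (total <= 0) exit 1      # no MemTotal line found
            printf "%.0f\n", (total - avail) * 100 / total
        }
    '
}
```

Typical use would be mem_used_pct < /proc/meminfo. Using MemAvailable rather than MemFree avoids counting reclaimable page cache as "used".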

3. Disk Usage

Available Space: df -h to check disk usage across filesystems.
Inode Usage: df -i to check inode utilization.
Disk I/O: iostat, iotop, or dstat.
Error Messages: Review dmesg, journalctl -k, and the logs in /var/log/ for disk-related errors such as I/O errors or a filesystem being remounted read-only.
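
The df check is easy to turn into an alert. Below is a small sketch that flags filesystems above a chosen limit. The name fs_over_threshold and the 90% figure in the usage note are assumptions for illustration, and the parsing will misread mount points that contain spaces.

```shell
#!/bin/sh
# fs_over_threshold LIMIT
# Reads `df -P` output on stdin and prints "mountpoint percent" for every
# filesystem whose usage is at or above LIMIT percent.
# Limitation: assumes mount points contain no whitespace.
fs_over_threshold() {
    awk -v limit="$1" '
        NR > 1 {                      # skip the header line
            pct = $5
            sub(/%$/, "", pct)        # "95%" -> "95"
            if (pct + 0 >= limit + 0) print $6, pct
        }
    '
}
```

For example: df -P | fs_over_threshold 90. With GNU df the same function should also work for inodes via df -Pi, since the percentage stays in the fifth column.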

4. Network Performance

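Network Usage: iftop, ip -s link, or netstat -i.
Connections: ss or netstat to check open connections and ports (netstat comes from the older net-tools package and may not be installed by default; ss is its replacement).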
Packet Loss/Latency: ping, traceroute, or mtr.
Bandwidth Monitoring: vnstat, iftop, or nload.
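
For the packet loss check, you can pull the loss figure out of the ping summary line instead of reading it manually. This is a rough sketch that assumes the usual Linux ping summary wording ("N% packet loss"); the function name ping_loss_pct is made up here.

```shell
#!/bin/sh
# ping_loss_pct
# Reads ping output on stdin and prints the packet loss percentage taken
# from the summary line (for example "25" for "25% packet loss").
# Prints nothing if no summary line is present.
ping_loss_pct() {
    sed -n 's/.* \([0-9.]*\)% packet loss.*/\1/p'
}
```

Usage could look like ping -c 4 example.com | ping_loss_pct, where example.com stands in for whatever host you want to test.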

5. System Logs

General System Logs: journalctl, or /var/log/syslog on Debian/Ubuntu and /var/log/messages on RHEL/CentOS (for system-wide events).
Kernel Logs: dmesg or journalctl -k to check for hardware errors or warnings.
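
Kernel logs are noisy, so a simple filter helps as a first pass. The sketch below keeps only lines that match a few well-known trouble phrases. The pattern list is my own short selection and far from complete, and kernel_trouble is a hypothetical name.

```shell
#!/bin/sh
# kernel_trouble
# Reads kernel log text on stdin and prints only the lines that mention
# a few common problem phrases (case-insensitive).
# The phrase list is illustrative, not exhaustive.
kernel_trouble() {
    grep -E -i 'i/o error|out of memory|hardware error'
}
```

You could run it as dmesg | kernel_trouble or journalctl -k -b --no-pager | kernel_trouble. An empty result does not prove the system is healthy; it only means none of these phrases appeared.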

6. Uptime and System Load

Uptime: uptime command provides server uptime and load averages.
Load Analysis: Investigate load spikes with sar or atop.

7. Running Services and Processes

Service Status: systemctl status <service> or service <service> status.
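Zombie/Unnecessary Processes: ps -eo pid,ppid,stat,cmd | awk '$3 ~ /^Z/' to list zombie processes (a plain ps aux | grep Z also matches any command line containing a capital Z, including the grep itself).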
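
Here is the zombie check wrapped in a reusable function. It is a sketch that expects ps output with the columns pid, ppid, stat, comm in that order; the name list_zombies is made up for this example.

```shell
#!/bin/sh
# list_zombies
# Reads `ps -eo pid,ppid,stat,comm` output on stdin and prints
# "pid ppid command" for every process whose state starts with Z.
# Matching on the STAT column avoids false hits on names containing "Z".
list_zombies() {
    awk 'NR > 1 && $3 ~ /^Z/ { print $1, $2, $4 }'
}
```

Run it as ps -eo pid,ppid,stat,comm | list_zombies. The parent PID is included because a zombie can only be cleared by its parent reaping it (or by the parent exiting).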

8. Security

Users Logged In: who, w, or last.
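Unauthorized Logins: Review /var/log/secure (RHEL/CentOS) or /var/log/auth.log (Debian/Ubuntu); lastb lists failed login attempts.
Firewall Rules: iptables -L -n -v, nft list ruleset, or ufw status, depending on which firewall tooling the system uses.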
Listening Ports: ss -tuln or netstat -tuln.
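
When reviewing the auth log, it helps to see which addresses produce the most failed logins. The sketch below assumes the usual OpenSSH "Failed password for ... from ADDRESS port ..." message format; failed_logins_by_ip is a hypothetical name.

```shell
#!/bin/sh
# failed_logins_by_ip
# Reads auth log text on stdin and prints "count address" for each source
# address seen in "Failed password" lines, highest count first.
# Assumption: the address is the word right after "from".
failed_logins_by_ip() {
    awk '
        /Failed password/ {
            for (i = 1; i < NF; i++) {
                if ($i == "from") {
                    count[$(i + 1)]++
                    break
                }
            }
        }
        END {
            for (ip in count) print count[ip], ip
        }
    ' | sort -rn
}
```

Example: failed_logins_by_ip < /var/log/auth.log (or /var/log/secure on RHEL-style systems). Reading these files normally requires root.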

9. Hardware Health

Temperature and Fan Speed: sensors (part of lm-sensors package).
RAID Status: For Linux software RAID, check cat /proc/mdstat or mdadm --detail /dev/md0 (substitute your own array device); use the vendor's tools for hardware RAID.

10. Scheduled Jobs

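Cron Jobs: crontab -l (lists only the current user's jobs), plus /etc/crontab and /etc/cron.d/ for system-wide jobs.
Failures: Examine cron-related entries in /var/log/syslog (Debian/Ubuntu) or /var/log/cron (RHEL/CentOS), for example with grep CRON /var/log/syslog.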
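
Crontab files are often mostly comments, so stripping them makes the active entries easier to review. A minimal sketch, with active_cron_entries as an invented name:

```shell
#!/bin/sh
# active_cron_entries
# Reads crontab text on stdin and prints only the lines that are neither
# blank nor comments, that is, the job and variable lines cron acts on.
active_cron_entries() {
    grep -v -E '^[[:space:]]*(#|$)'
}
```

Use it as crontab -l | active_cron_entries or active_cron_entries < /etc/crontab.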

11. Backup Status

Backup Logs: Review the backup logs to confirm that backups are running on schedule and completing without errors.
Verify Integrity: Test restore procedures periodically.
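
One cheap way to notice that backups have silently stopped is to check how old the newest backup file is. The sketch below assumes your backups land as regular files under one directory, which may not match your setup, and that find supports -mmin (GNU and BusyBox builds usually do). The name backup_is_fresh is hypothetical.

```shell
#!/bin/sh
# backup_is_fresh DIR MINUTES
# Exits 0 if DIR contains at least one regular file modified less than
# MINUTES minutes ago, 1 otherwise.
# Assumption: backups are written as files somewhere below DIR.
backup_is_fresh() {
    [ -n "$(find "$1" -type f -mmin "-$2" 2>/dev/null | head -n 1)" ]
}
```

For a nightly backup you might call backup_is_fresh /var/backups 1500 || echo "no recent backup found", where both the path and the 25-hour window are placeholders. This only shows that a file was written recently; it says nothing about whether that file can be restored.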

Automation Tools for Server Health Monitoring

Nagios, Zabbix, Prometheus, or Datadog for continuous monitoring.
Custom scripts combining commands like top, df, iostat, and log parsing can provide quick insights.
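
As an idea of what such a custom script might look like, here is a small sketch that prints one labelled section per check and skips tools that are not installed. The function names section and health_report, and the particular set of checks, are my own choices for illustration.

```shell
#!/bin/sh
# section TITLE COMMAND [ARGS...]
# Prints a header, then runs COMMAND if it exists on this system;
# otherwise prints a note that it was skipped.
section() {
    title=$1
    shift
    printf '== %s ==\n' "$title"
    if command -v "$1" >/dev/null 2>&1; then
        "$@" 2>&1 || printf '(exit status %s)\n' "$?"
    else
        printf '(%s not installed, skipped)\n' "$1"
    fi
    printf '\n'
}

# health_report
# Runs a handful of the checks from this post, one section each.
health_report() {
    section "Uptime and load" uptime
    section "Memory" free -h
    section "Disk space" df -h
    section "Inodes" df -i
    section "Listening ports" ss -tuln
}
```

Calling health_report prints the whole report; redirect it to a file or mail it from cron if you want a daily snapshot. It is a quick overview only and does not replace the monitoring tools listed above.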

Reviewing these parameters periodically helps you catch problems early and keep your Linux server healthy.