高效运维：10个功能强悍的Shell监控脚本

2025-02-28 12:01:05 RAIZ

在当今复杂多变的运维环境中，Shell脚本因其灵活性和高效性，成为运维工程师不可或缺的工具之一。本文将详细介绍10个功能强悍的Shell监控脚本，旨在帮助运维人员更好地应对各种挑战，确保系统稳定运行。每个脚本均附有详细代码及应用场景，助力运维工作更加智能化、自动化。

1. CPU使用率监控脚本

代码示例：

#!/bin/bash

LOG_FILE="/var/log/cpu_usage.log"
THRESHOLD=80

get_cpu_usage() {
    mpstat 1 1 | awk '/Average/ {print 100 - $NF"%"}' | sed 's/%//g'
}

current_usage=$(get_cpu_usage)

echo"$(date '+%Y-%m-%d %H:%M:%S') CPU Usage: $current_usage" >> $LOG_FILE

if [ $current_usage -ge $THRESHOLD ]; then
    echo"Warning: CPU usage is above $THRESHOLD%!" | mail -s "CPU Usage Alert" admin@example.com
fi

应用场景：

该脚本每秒钟监控一次CPU使用率，并将结果记录到日志文件中。当CPU使用率超过设定的阈值（如80%）时，发送邮件警报给管理员。适用于需要实时监控CPU性能的场景，如高并发Web服务器或数据库服务器。

2. 内存使用情况监控脚本

代码示例：

#!/bin/bash

LOG_FILE="/var/log/memory_usage.log"
THRESHOLD=80

get_memory_usage() {
    free | awk '/^Mem:/{print $3/$2 * 100.0}' | sed 's/%//g'
}

current_usage=$(get_memory_usage)

echo"$(date '+%Y-%m-%d %H:%M:%S') Memory Usage: $current_usage" >> $LOG_FILE

if [ $current_usage -ge $THRESHOLD ]; then
    echo"Warning: Memory usage is above $THRESHOLD%!" | mail -s "Memory Usage Alert" admin@example.com
fi

应用场景：

类似于CPU监控脚本，该脚本监控内存使用情况，并在内存使用率超过阈值时发送警报。适用于内存资源紧张的环境，如大数据处理或内存密集型应用服务器。

3. 磁盘空间监控脚本

代码示例：

#!/bin/bash

LOG_FILE="/var/log/disk_usage.log"
THRESHOLD=90

check_disk_usage() {
    df -h | awk '$NF=="/"{print $5}' | sed 's/%//g'
}

current_usage=$(check_disk_usage)

echo"$(date '+%Y-%m-%d %H:%M:%S') Disk Usage: $current_usage" >> $LOG_FILE

if [ $current_usage -ge $THRESHOLD ]; then
    echo"Warning: Disk usage is above $THRESHOLD%!" | mail -s "Disk Usage Alert" admin@example.com
fi

应用场景：

监控根分区的磁盘使用情况，当磁盘使用率超过阈值时发送警报。适用于需要确保磁盘空间充足以避免数据丢失或系统崩溃的场景。

4. 网络接口流量监控脚本

代码示例：

#!/bin/bash

LOG_FILE="/var/log/network_traffic.log"
THRESHOLD_IN=100000  # 100Mbps in bytes/sec
THRESHOLD_OUT=100000 # 100Mbps in bytes/sec

get_network_traffic() {
    iface="eth0"
    rx=$(cat /sys/class/net/$iface/statistics/rx_bytes)
    tx=$(cat /sys/class/net/$iface/statistics/tx_bytes)
    sleep 1
    rx_new=$(cat /sys/class/net/$iface/statistics/rx_bytes)
    tx_new=$(cat /sys/class/net/$iface/statistics/tx_bytes)
    rx_rate=$(( (rx_new - rx) / 1024 / 1024 ))
    tx_rate=$(( (tx_new - tx) / 1024 / 1024 ))
    echo"$rx_rate $tx_rate"
}

current_rx_rate $(echo $(get_network_traffic) | awk '{print $1}')
current_tx_rate $(echo $(get_network_traffic) | awk '{print $2}')

echo"$(date '+%Y-%m-%d %H:%M:%S') Network Traffic: RX $current_rx_rate MB/s, TX $current_tx_rate MB/s" >> $LOG_FILE

if [ $current_rx_rate -ge $THRESHOLD_IN ] || [ $current_tx_rate -ge $THRESHOLD_OUT ]; then
    echo"Warning: Network traffic exceeds thresholds!" | mail -s "Network Traffic Alert" admin@example.com
fi

应用场景：

监控特定网络接口（如eth0）的入站和出站流量，并在流量超过设定阈值时发送警报。适用于需要确保网络带宽充足以避免性能瓶颈的场景，如Web服务器或视频流媒体服务器。

5. 进程监控脚本

代码示例：

#!/bin/bash

LOG_FILE="/var/log/process_monitor.log"
PROCESS_NAME="nginx"

check_process() {
    pgrep -x $PROCESS_NAME > /dev/null
    if [ $? -eq 0 ]; then
        echo"Process $PROCESS_NAME is running."
        return 0
    else
        echo"Process $PROCESS_NAME is NOT running."
        return 1
    fi
}

status=$(check_process)

echo"$(date '+%Y-%m-%d %H:%M:%S') $PROCESS_NAME Status: $status" >> $LOG_FILE

if [ $status -eq 1 ]; then
    echo"Warning: $PROCESS_NAME is not running!" | mail -s "$PROCESS_NAME Down Alert" admin@example.com
    # Optionally, you can add a command to start the process here
    # systemctl start nginx
fi

应用场景：

监控特定进程（如nginx）的运行状态，并在进程异常终止时发送警报。适用于需要确保关键服务持续运行的场景，如Web服务器或数据库服务。

6. 日志轮转监控脚本

代码示例：

#!/bin/bash

LOG_DIR="/var/log"
DAYS_THRESHOLD=7

rotate_logs() {
    find $LOG_DIR -type f -name "*.log" -mtime +$DAYS_THRESHOLD -exec gzip {} \; -exec rm {} \;
}

rotate_logs

echo "$(date '+%Y-%m-%d %H:%M:%S') Logs rotated for files older than $DAYS_THRESHOLD days." >> /var/log/log_rotation.log

应用场景：

定期轮转日志文件，压缩并删除超过指定天数（如7天）的旧日志文件。适用于需要管理大量日志文件以节省磁盘空间的场景，如大型Web应用或系统日志服务器。

7. 系统负载监控脚本

代码示例：

#!/bin/bash

LOG_FILE="/var/log/system_load.log"
THRESHOLD=5.0

get_system_load() {
    uptime | awk -F'load average:''{ print $2 }' | awk '{ print $1,$2,$3 }'
}

current_load=$(get_system_load)

echo"$(date '+%Y-%m-%d %H:%M:%S') System Load: $current_load" >> $LOG_FILE

load_avg=$(echo$current_load | awk '{print $1}')
if [ $(echo"$load_avg > $THRESHOLD" | bc -l) -eq 1 ]; then
    echo"Warning: System load is above $THRESHOLD!" | mail -s "System Load Alert" admin@example.com
fi

应用场景：

监控系统负载，并在负载超过设定阈值时发送警报。适用于需要确保系统性能稳定的场景，如高并发应用服务器或数据库服务器。

8. 服务状态监控脚本

代码示例：

#!/bin/bash

LOG_FILE="/var/log/service_status.log"
SERVICES=("nginx""mysql""redis")

check_services() {
    for service in"${SERVICES[@]}"; do
        systemctl is-active --quiet $service
        if [ $? -ne 0 ]; then
            echo"$(date '+%Y-%m-%d %H:%M:%S') Service $service is NOT active." >> $LOG_FILE
            echo"Warning: Service $service is down!" | mail -s "$service Down Alert" admin@example.com
        else
            echo"$(date '+%Y-%m-%d %H:%M:%S') Service $service is active." >> $LOG_FILE
        fi
    done
}

check_services

应用场景：

监控一组关键服务（如nginx、mysql、redis）的状态，并在服务异常时发送警报。适用于需要确保多个服务协同运行的环境，如微服务架构或复杂应用部署。

9. I/O性能监控脚本

代码示例：

#!/bin/bash

LOG_FILE="/var/log/io_performance.log"
THRESHOLD_READ=5000  # IOPS threshold for read operations
THRESHOLD_WRITE=5000 # IOPS threshold for write operations

get_io_performance() {
    iostat -dx 1 1 | awk '/^Device/ { if ($1 != "Device") print $1, $12, $13 }' | sed 's/ //g'
}

current_io=$(get_io_performance)
device=$(echo$current_io | awk '{print $1}')
read_iops=$(echo$current_io | awk '{print $2}')
write_iops=$(echo$current_io | awk '{print $3}')

echo"$(date '+%Y-%m-%d %H:%M:%S') $device IO Performance: Read $read_iops IOPS, Write $write_iops IOPS" >> $LOG_FILE

if [ $read_iops -ge $THRESHOLD_READ ] || [ $write_iops -ge $THRESHOLD_WRITE ]; then
    echo"Warning: I/O performance exceeds thresholds for $device!" | mail -s "I/O Performance Alert" admin@example.com
fi

应用场景：

监控特定磁盘设备的I/O性能，包括读和写操作的IOPS（每秒输入输出操作数），并在性能超过设定阈值时发送警报。适用于需要确保存储系统性能的场景，如数据库服务器或大数据处理平台。

10. 系统安全监控脚本

代码示例：

#!/bin/bash

LOG_FILE="/var/log/security_monitor.log"
UNAUTHORIZED_USERS=("user1""user2")

check_unauthorized_users() {
    for user in"${UNAUTHORIZED_USERS[@]}"; do
        ifid"$user" &>/dev/null; then
            echo"$(date '+%Y-%m-%d %H:%M:%S') Unauthorized user $user found on the system." >> $LOG_FILE
            echo"Warning: Unauthorized user $user detected!" | mail -s "Security Alert" admin@example.com
        fi
    done
}

check_recent_logins() {
    lastb | awk '{print $1, $3}' | sort | uniq -c | sort -nr | head -n 10
}

check_unauthorized_users
recent_logins=$(check_recent_logins)
echo"$(date '+%Y-%m-%d %H:%M:%S') Recent Failed Logins:\n$recent_logins" >> $LOG_FILE

# Optionally, you can set up an alert based on the number of failed logins or specific patterns

应用场景：

监控系统中是否存在未经授权的用户账户，并检查最近的失败登录尝试。适用于需要加强系统安全性的场景，如敏感数据处理环境或高安全性要求的服务器。

本文介绍了10个功能强悍的Shell监控脚本，涵盖了CPU、内存、磁盘、网络、进程、日志、系统负载、服务状态、I/O性能以及系统安全等多个方面的监控需求。这些脚本不仅能够帮助运维人员实时监控系统的运行状态，还能在异常情况发生时及时发送警报，从而有效保障系统的稳定性和安全性。希望这些脚本能够成为你运维工具箱中的得力助手，助力你的运维工作更加高效、智能。

将竭诚为客户提供更专业的个性化信息技术服务

将竭诚为客户提供更专业的个性化信息技术服务

互联网 + 餐饮服务、工业企业、医疗教育

互联网 + 餐饮服务、工业企业、医疗教育

高效运维：10个功能强悍的Shell监控脚本

1. CPU使用率监控脚本

2. 内存使用情况监控脚本

3. 磁盘空间监控脚本

4. 网络接口流量监控脚本

5. 进程监控脚本

6. 日志轮转监控脚本

7. 系统负载监控脚本

8. 服务状态监控脚本

9. I/O性能监控脚本

10. 系统安全监控脚本

您想了解哪方面的产品解决方案？

关于我们

产品&服务

帮助与支持

招贤纳士