Overview
You may have noticed alarms reporting that an STP node is down. The problem occurs when the /var file system reaches 100% disk usage. In syslog, you may find these two errors:
- Audit daemon is low on disk space for logging
- The audit daemon is now halting the system
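To check whether a node was affected, you can search its syslog for these two messages. The following is a minimal sketch: it assumes syslog is written to /var/log/messages (the usual location on RHEL-type systems), and the function name find_audit_halt_messages is hypothetical.

```shell
# Hypothetical helper: print the syslog lines reporting the auditd
# low-space warning or the system halt.
#   $1 = log file to search (default: /var/log/messages)
find_audit_halt_messages() {
  log_file="${1:-/var/log/messages}"
  grep -E 'Audit daemon is low on disk space for logging|The audit daemon is now halting the system' "$log_file"
}
```

Run it without arguments to search the current log, or pass the path of a rotated log file as the first argument.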
Solution
The audit daemon reports that it is low on disk space for logging because the /var file system does not have enough free space, and it halts the server as a result. Start by removing unwanted files from this partition to free up space.
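To find out what is filling the partition before deleting anything, you can list its largest entries. This is a sketch assuming GNU coreutils (for sort -h); the function name largest_in_fs is hypothetical.

```shell
# Hypothetical helper: show how full a file system is, then list its
# largest files and directories, biggest first.
#   $1 = mount point (default: /var)
#   $2 = number of entries to show (default: 20)
# du -x stays on that file system, -a includes files as well as
# directories, and sort -rh orders the human-readable sizes descending.
largest_in_fs() {
  mount_point="${1:-/var}"
  count="${2:-20}"
  df -h "$mount_point"
  du -xah "$mount_point" 2>/dev/null | sort -rh | head -n "$count"
}
```

For example, largest_in_fs /var 30 shows the 30 largest entries on /var; log files under /var/log are the usual suspects in this scenario.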
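The audit daemon is probably flooding the /var/log/messages file, consuming space on the /var partition and causing the issue. If that is the case, temporarily stop the audit service on all servers to prevent it from shutting down other nodes. To do so, run the following commands as root:
systemctl stop auditd
systemctl disable auditd
On RHEL/CentOS 7 and later, systemd refuses to stop auditd manually, and the first command fails with "Operation refused". In that case, stop the service with the following command instead:
service auditd stop
Because the disable setting persists across reboots, re-enable and start the service once /var has enough free space again:
systemctl enable auditd
systemctl start auditd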
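Whether auditd halts the server is controlled by the disk-space actions in /etc/audit/auditd.conf (space_left_action, admin_space_left_action, disk_full_action, disk_error_action; see auditd.conf(5)). When one of them is set to HALT, auditd shuts the system down once that condition is reached. The sketch below reviews these settings on a server; the helper names are hypothetical.

```shell
# Hypothetical helper: print the settings that decide when auditd warns,
# suspends logging, or halts the system.
#   $1 = auditd configuration file (default: /etc/audit/auditd.conf)
audit_space_settings() {
  conf_file="${1:-/etc/audit/auditd.conf}"
  grep -E '^[[:space:]]*(space_left|space_left_action|admin_space_left|admin_space_left_action|disk_full_action|disk_error_action)[[:space:]]*=' "$conf_file"
}

# Hypothetical helper: succeed (exit status 0) if any disk-space action
# is HALT, i.e. auditd is allowed to shut the server down.
audit_can_halt() {
  conf_file="${1:-/etc/audit/auditd.conf}"
  grep -qiE '^[[:space:]]*(space_left_action|admin_space_left_action|disk_full_action|disk_error_action)[[:space:]]*=[[:space:]]*halt[[:space:]]*$' "$conf_file"
}
```

For example, if audit_can_halt; then echo "auditd may halt this server"; fi flags the servers that are exposed to this issue.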
Note that the application does not raise an alarm when disk usage reaches a critical threshold: this must be configured at the OS level, because the application does not manage OS traps. To create the OS trap manually, follow these steps:
- Update /etc/sysconfig/snmpd as follows:
OPTIONS="-Ln -I -smux -p /var/run/snmpd.pid -c /etc/snmp/snmpd.conf udp:11114 udp:161"
- Edit /etc/sysconfig/snmptrapd as follows:
OPTIONS=-Lsd -m ALL -F "%02.2h:%02.2j:%02.2k TRAP%w.%q %v from %B" 11173 162
- Make sure that disableAuthorization is set to yes in /etc/snmp/snmptrapd.conf:
disableAuthorization yes
- Edit the /etc/snmp/snmpd.conf file as follows, so that the traps are not only logged to /var/log/messages on the local host but also sent to a remote node (replace [remote_IP_address] with the IP address of the node that should receive the trap, and set the threshold to the desired value):
###########################################################################
# snmpd.conf
###########################################################################
rocommunity public
rwcommunity private
syslocation test
syscontact mbalance
sysservices 78
trapsink 127.0.0.1 public
trapcommunity public
authtrapenable 1
createUser internal MD5 "snmp1234"
iquerySecName internal
rouser internal
defaultMonitors yes
monitor -r 1 -u internal -o dskErrorMsg "Disk Usage Error" dskErrorFlag != 0
trap2sink [remote_IP_address] public 162
# Set the threshold to the value that you want. includeAllDisks takes the
# minimum percentage of free space: 70% flags any disk with less than 70% free.
includeAllDisks 70%
- Restart the snmpd service:
systemctl restart snmpd
- Restart the snmptrapd service:
systemctl restart snmptrapd
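Because includeAllDisks expects the minimum percentage of free space rather than a usage percentage, it can help to derive the value from the usage level you want to be alerted at. The sketch below prints the disk-monitoring lines of the snmpd.conf example for a given trap receiver and usage threshold. The function name is hypothetical, and it assumes the same internal user, public community, and port 162 as the example file.

```shell
# Hypothetical helper: print the disk-trap lines for snmpd.conf.
#   $1 = IP address of the node that receives the traps
#   $2 = disk usage (in %) at which a trap should be sent
# includeAllDisks takes the minimum FREE space percentage, so the usage
# threshold is converted with 100 - usage.
snmpd_disk_trap_lines() {
  receiver="$1"
  usage_pct="$2"
  case "$usage_pct" in
    ''|*[!0-9]*)
      echo "usage: snmpd_disk_trap_lines RECEIVER_IP USAGE_PERCENT" >&2
      return 1 ;;
  esac
  if [ -z "$receiver" ] || [ "$usage_pct" -gt 100 ]; then
    echo "usage: snmpd_disk_trap_lines RECEIVER_IP USAGE_PERCENT" >&2
    return 1
  fi
  echo 'monitor -r 1 -u internal -o dskErrorMsg "Disk Usage Error" dskErrorFlag != 0'
  echo "trap2sink $receiver public 162"
  echo "includeAllDisks $((100 - usage_pct))%"
}
```

For example, snmpd_disk_trap_lines 192.0.2.10 70 ends with includeAllDisks 30%, which flags a disk once it is roughly 70% full. Compare the output with the existing file rather than appending it blindly, to avoid duplicate directives.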
Make sure the OS MIBs listed below are installed on the supervision system:
- HOST-RESOURCES-MIB.txt
- HOST-RESOURCES-TYPES.txt
- UCD-DISKIO-MIB.txt
- RFC1213-MIB
- UCD-SNMP-MIB
- DISMAN-EVENT-MIB
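Once the MIB files have been copied, you can check that the directory holds all of them before loading them into the supervision system. This is a minimal sketch assuming the files are named after the MIB modules with a .txt extension, as in the Net-SNMP distribution; the function name missing_mibs is hypothetical, and its default directory is the /usr/local/share/snmp/mibs path given in this article.

```shell
# Hypothetical helper: report required MIB files missing from a directory.
#   $1 = directory containing the MIB files
# Names without an extension are looked up with ".txt" appended.
# Returns 1 if at least one file is missing.
missing_mibs() {
  mib_dir="${1:-/usr/local/share/snmp/mibs}"
  status=0
  for mib in HOST-RESOURCES-MIB.txt HOST-RESOURCES-TYPES.txt UCD-DISKIO-MIB.txt \
             RFC1213-MIB UCD-SNMP-MIB DISMAN-EVENT-MIB; do
    case "$mib" in
      *.txt) file="$mib" ;;
      *) file="$mib.txt" ;;
    esac
    if [ ! -f "$mib_dir/$file" ]; then
      echo "missing: $file"
      status=1
    fi
  done
  return "$status"
}
```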
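The MIB files can be extracted directly from the servers under /usr/local/share/snmp/mibs. With this configuration, snmpd sends an SNMP trap when the free space on a disk falls below the includeAllDisks percentage. Note that this value is a minimum free-space percentage, not a usage percentage: includeAllDisks 70% triggers as soon as disk usage exceeds about 30%, so to be alerted when usage exceeds 70%, set includeAllDisks 30%.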
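To see which file systems would currently raise the trap for a given includeAllDisks value, you can approximate the dskErrorFlag check from df output: a disk is flagged when its free space is below the configured minimum percentage. This is a rough sketch: it derives free space as 100 minus df's Capacity column (which df rounds up), so results close to the threshold may differ slightly from what snmpd reports, and mount points containing spaces are not handled. The function name is hypothetical.

```shell
# Hypothetical helper: read "df -P" output on stdin and print the mount
# points whose free space is below MIN_FREE_PERCENT (the includeAllDisks
# value). Free space is approximated as 100 - Capacity.
#   $1 = minimum free-space percentage
disks_below_min_free() {
  min_free="$1"
  awk -v min="$min_free" 'NR > 1 {
    used = $5
    sub(/%/, "", used)
    if (100 - used < min + 0) print $6
  }'
}
```

For example, df -P | disks_below_min_free 70 lists every file system that is more than 30% full, which is what includeAllDisks 70% reports.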