Overview
This article explains how to recover a PACache 6.X or 7.X system whose status is Disabled because a traffic interface has been disconnected. When this happens, the PeerApp server stops handling traffic, the System Status dashboard shows "Disabled", and a 'cluster has been disabled' alert appears in the System Events alerts, as shown below:
Solution
A 'Disabled' System Status has several possible root causes, and the resolution depends on which one applies.
You can check which PeerApp system version you have from UBView:
When the problem is caused by a disconnected traffic interface, first verify at the router end that the ports are enabled, then check physical connectivity.
If both are confirmed to be OK, further troubleshooting must be done by the PeerApp Support team, so submit a support ticket for further assistance.
<supportagent>
Check if the traffic interface is down by doing the following steps:
- Log in to the customer's GUI
- Click on Hardware View
- Look for eth4 and onwards (there can be more than 5, depending on the customer's setup):
If eth4 and the ones after it are red, it means data interfaces are down.
If the traffic interface is down, the system can go into a Disabled state.
According to the Routing Table, Ticket Triage and Escalation Paths, L2 agents should use the steps below to check if the traffic interface has been disconnected:
- SSH into the server.
- Change to the root user.
- Search for traffic interface error messages in the event log file for the day the system became Disabled. The log file name format is messages-YYYYMMDD.
# grep -P 'eth\d' /var/log/messages-20191124
- If the traffic interface has become disconnected and timed out, the output of the previous command will not be empty. You should see output similar to the following:
Nov 24 00:02:29 ce-1 kernel: dpvi: dpvi_rcv_port_status() eth4 NIC Link is Down
Nov 24 00:02:30 ce-1 kernel: bpmod: eth4 is in the Disconnect mode now; WDT has expired.
Nov 24 00:02:27 ce-1 NetworkManager[1101]: <info> (eth4): link disconnected
The "WDT has expired" error is caused by a timeout: when the link is interrupted, the bypass control drivers time out on the affected interfaces. If the link interruption lasts longer than the timeout value on the affected eth interface, PeerApp drops the interface and logs this error.
- Verify at the router end that the ports are enabled, then check physical connectivity. This can be done by executing the ethtool console command as shown below:
Ensure the affected ports are up and enabled and physical connectivity is OK, then run the ethtool command once more to confirm that the link is successfully detected.
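The log search and link check in the steps above can be sketched as a few small shell helpers. This is only an illustrative sketch, not a PeerApp tool: the function names find_iface_events, affected_ifaces, and link_detected are hypothetical, the matched message strings are taken from the sample log lines above, and it assumes ethtool prints its usual "Link detected: yes" line when the link is up.

```shell
# Hypothetical helpers; the names are illustrative and not part of PeerApp.

# Print link-down and "WDT has expired" events for ethN interfaces.
# $1: path to a log file such as /var/log/messages-YYYYMMDD
find_iface_events() {
    grep -E 'eth[0-9]+' "$1" |
        grep -Ei 'link is down|link disconnected|WDT has expired'
}

# List each affected interface name once, one per line.
affected_ifaces() {
    find_iface_events "$1" | grep -oE 'eth[0-9]+' | sort -u
}

# Read `ethtool ethN` output on stdin; succeed only if the link is reported up.
link_detected() {
    grep -q 'Link detected: yes'
}
```

As root, the helpers might be used as `affected_ifaces /var/log/messages-20191124`, followed by `ethtool eth4 | link_detected && echo "eth4 link is up"` for each interface that is reported.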
If the system is integrated in bounce mode but the data NIC is a bypass NIC, the bypass NIC should be changed to standard mode. Check whether the bypass function is disabled as follows:
# bpctl_util all get_bypass_info
Expected output when the bypass function is disabled:
Can't open device file: bpctl
If the bypass function is active, the command prints output similar to the following:
# bpctl_util all get_bypass_info
00:0b.0 eth4
Name PE210G2BPI9LRD
Firmware version 0xaf
00:0c.0 eth5 slave
To change the bypass NIC to standard mode, see the article "Setting the Silicom-bypass NIC to Standard Mode".
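The bypass check above can be wrapped in a small shell helper that classifies the command output. This is a rough sketch under stated assumptions: the function name bypass_state is hypothetical, and the only markers it relies on are the two outputs shown above (the "Can't open device file" message and an interface listing such as eth4). Anything else is reported as unknown rather than guessed.

```shell
# Hypothetical helper; not part of bpctl_util or PeerApp.
# Reads the output of `bpctl_util all get_bypass_info` on stdin and prints:
#   standard       - the bypass device file could not be opened
#   bypass-active  - bypass-capable interfaces (ethN) are listed
#   unknown        - neither marker was found
bypass_state() {
    out=$(cat)
    case "$out" in
        *"Can't open device file"*) echo "standard" ;;
        *eth[0-9]*)                 echo "bypass-active" ;;
        *)                          echo "unknown" ;;
    esac
}
```

A possible invocation is `bpctl_util all get_bypass_info 2>&1 | bypass_state`, where `2>&1` is included in case the tool writes its error message to standard error.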
To fix this problem you can restart the VM (Virtual Machine). This is the equivalent of an interface reset. There are two ways to restart the VM:
- If you have access to the guest machine, run the command below:
# reboot
- If you do not have access to the guest machine, run the command below from the host machine:
# virsh reboot ub
Verification
Confirm that the System Status dashboard shows the system as Enabled again.
</supportagent>