Overview
An N+1 cluster shows as degraded, and more than one CE (Cache Engine) is in standby state when there should be only one. This article explains a workaround to bring the system back to an enabled state.
Solution
To solve this issue, first restart each standby node except the leader, and verify whether that fixes it. If the issue persists, restart the leader, which triggers the leader election process and eventually re-establishes contact with all CEs.
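The restart order described above can be sketched as a small planning helper. This is only an illustration: the function name plan_restarts and its inputs are hypothetical, and the CE numbers are assumed to be read manually from Status > Logical View rather than queried from the system.

```python
from typing import Iterable, List, Tuple


def plan_restarts(standby_ces: Iterable[int], leader: int) -> Tuple[List[int], int]:
    """Split the work into a first pass and a last resort.

    standby_ces: CE numbers currently shown in standby (entered by hand).
    leader:      CE number of the current leader (entered by hand).

    Returns the standby CEs to restart first (leader excluded, sorted,
    duplicates removed) and the leader, which is restarted only if the
    first pass does not fix the issue.
    """
    first_pass = sorted(set(standby_ces) - {leader})
    return first_pass, leader


if __name__ == "__main__":
    first_pass, last_resort = plan_restarts([3, 5, 2], leader=2)
    print("Restart first:", first_pass)
    print("Restart only if the issue persists:", last_resort)
```

Keeping the leader out of the first pass mirrors the procedure: a leader restart triggers an election, so it is held back until the cheaper restarts have been tried.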
Follow the steps below for each CE in standby, except for the leader:
-
On the UltraBand console (run su admin at the command prompt), capture the output of the following commands for later comparison:
console> show status
console> show volumes
console> show eth-status
console> show uptime
-
Get the standby CE number from the Cache Engines section in Status > Logical View. In this example, the CE number is 3.
-
Stop the CE:
console> oper server 3
console> stop
console> exit
-
Check the CE status repeatedly until it shows as stopped:
console> show status
-
Start the CE:
console> oper server 3
console> start
console> exit
-
Check the CE status repeatedly until it shows as started. This can take up to 3 to 4 hours, depending on the CMDB size.
console> show status
-
Once the CE has started, capture the output again and compare it with the output from step 1:
console> show status
console> show volumes
console> show eth-status
console> show uptime
-
Repeat steps 1 to 7 for every other CE in standby, leaving the leader out on this first pass.
-
Verify whether the system is enabled, as described in the Testing section.
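Two parts of the procedure above lend themselves to small helpers: waiting for a CE to stop or start, and comparing the output captured before and after the restart. The following is a minimal sketch under stated assumptions. The names wait_until and changed_lines are hypothetical, the status check is supplied by the caller (this article does not describe how to read the console output programmatically), and the sample strings are placeholders, not real UltraBand output.

```python
import difflib
import time
from typing import Callable, List


def wait_until(check: Callable[[], bool], timeout_s: float,
               interval_s: float = 60.0,
               sleep: Callable[[float], None] = time.sleep) -> bool:
    """Call check() every interval_s seconds until it returns True.

    Returns True as soon as check() succeeds, or False once timeout_s
    seconds have passed without success. check() is an assumption: it
    stands in for however you confirm that the CE has stopped or started.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        if check():
            return True
        if time.monotonic() >= deadline:
            return False
        sleep(interval_s)


def changed_lines(before: str, after: str) -> List[str]:
    """Return only the lines that differ between two saved captures.

    Lines prefixed with "- " appear only in the earlier capture,
    lines prefixed with "+ " only in the later one.
    """
    diff = difflib.ndiff(before.splitlines(), after.splitlines())
    return [line for line in diff if line.startswith(("- ", "+ "))]


if __name__ == "__main__":
    # Placeholder text standing in for saved console captures.
    before = "line one\nvalue 2\n"
    after = "line one\nvalue 3\n"
    for line in changed_lines(before, after):
        print(line)
```

Since a start can take hours, a timeout of around four hours with a polling interval of a minute or more seems reasonable for the start step. Some differences between captures are expected after a restart, the uptime output being the obvious one, so the diff is a starting point for review rather than a pass or fail result.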
If more than one CE is still in standby after restarting every standby CE except the leader, restart the leader by following the same steps described above. You can get the leader CE number from the Cache Engines section in Status > Logical View, or by running show leader in the console.
Note: If the issue persists, please create a support ticket and one of our agents will help you.
Testing
To verify that the solution has worked, go to Status > Logical View and check that only one CE is in standby.
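The check above reduces to counting the CEs in standby. As a rough sketch, assuming the CE states are transcribed by hand from Status > Logical View into a mapping of CE number to state text, it could look like this. The function names are hypothetical, and "active" is only a placeholder for any state other than standby.

```python
from typing import Dict, List


def standby_ces(states: Dict[int, str]) -> List[int]:
    """Return the CE numbers whose state reads "standby" (case-insensitive)."""
    return sorted(ce for ce, state in states.items()
                  if state.strip().lower() == "standby")


def only_one_standby(states: Dict[int, str]) -> bool:
    """True when exactly one CE is in standby, as expected for N+1."""
    return len(standby_ces(states)) == 1


if __name__ == "__main__":
    # Hand-entered example values, not real system output.
    observed = {1: "active", 2: "standby", 3: "standby"}
    print("Standby CEs:", standby_ces(observed))
    print("Only one standby:", only_one_standby(observed))
```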