You notice that nodes in your PeerApp system are degraded because two or more caches are on standby, and the issue recurs even after you restart the whole grid.
This can happen when there is a communication problem in your environment: a cache engine stops responding, so the leader moves another available cache engine to standby, and you end up with two cache engines on standby.
A workaround is to restart your system while changing the management/iSCSI switch configuration. Raise a support ticket explaining the issue and provide a maintenance window (MW) to the PeerApp support team for the system restart.
During the restart process, you'll have to perform the following steps:
- Audit the switch configuration and make sure all ports on the switch that connect to servers or storage are defined as stp_edged-port.
- Change MTU size for storage port to 900.
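The port audit in the first step can be partly scripted. The following is only a rough sketch in Python, not a PeerApp or switch-vendor tool. It assumes the switch configuration has been exported as text in which each port section starts with a line beginning with "interface " followed by the port name; that layout, the function name ports_missing_setting, and the port names in the usage note are all hypothetical and will need adapting to your switch.

```python
def ports_missing_setting(config_text, ports, setting):
    """Return the ports whose configuration section does not mention `setting`.

    Assumption: each port section begins with a line such as
    "interface <port-name>" and runs until the next "interface" line.
    """
    sections = {}
    current = None
    for raw_line in config_text.splitlines():
        line = raw_line.strip()
        if line.lower().startswith("interface "):
            # Start of a new port section; remember its name.
            current = line.split(None, 1)[1]
            sections[current] = []
        elif current is not None and line:
            sections[current].append(line)
    # A port that never appears in the export is also reported as missing.
    return [
        port for port in ports
        if not any(setting in entry for entry in sections.get(port, []))
    ]
```

For example, calling ports_missing_setting(exported_config, ["eth1", "eth2"], "stp_edged-port") returns the listed ports that still need the setting applied, where exported_config is the text of your switch export.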
At the start of the restart, the support agent checks disk space on the Management server and the cache engines, cleans up wherever storage is insufficient, then reboots the server and starts the service.
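The disk space check can be illustrated with a short Python sketch that uses only the standard library. It is a generic example, not the procedure the support team runs: the function name low_disk_paths, the 10% free-space threshold, and the paths you pass in are assumptions to adjust for your servers.

```python
import shutil


def low_disk_paths(paths, min_free_fraction=0.10):
    """Return (path, free_fraction) for each path below the free-space threshold.

    Assumption: `paths` are mount points or directories that exist on the
    machine where the script runs; the 10% default is an arbitrary example.
    """
    low = []
    for path in paths:
        usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
        free_fraction = usage.free / usage.total
        if free_fraction < min_free_fraction:
            low.append((path, round(free_fraction, 3)))
    return low
```

Running low_disk_paths(["/", "/var"]) on the Management server or on a cache engine lists the file systems that need cleanup before the reboot; an empty list means every checked path is above the threshold.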
Refer to this JIRA for the detailed root cause analysis (RCA) and for help troubleshooting similar cases.