Overview
This article describes how to reboot LION server 1 or LION server 2. It is intended for customers to use when troubleshooting.
Note: This is a delicate operation. In several cases, restarting the database froze the cluster on the other server, resulting in the loss of both servers. In this case, SLadmin -dumpall should run again on the other server.
Step-By-Step Guide
1. Bring the server back up to level 9.
- If you ran shutdown -hr now, the server should return to level 9. Verify with who -r
- If you ran shutdown now only, you will have to start the server manually. Press the “ON” button, then verify with who -r
- Check the “LED”: the numbers change while the server is rebooting. If a number stays stuck on the LED (for example 888) and you do not get the prompt, this indicates a major problem. Refer it to Bell staff.
- The server comes up level by level. Once level 9 has been reached and has finished (about 10 minutes), you will be able to log on to the server.
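The run-level check in step 1 can be sketched as a small shell helper. This is a minimal sketch, assuming the who -r output contains the token "run-level" followed by the level number; the helper names parse_run_level and at_run_level are hypothetical, and the sample line is illustrative rather than captured from a LION server.

```shell
#!/bin/sh
# Hypothetical helper: print the run level found in "who -r" style text on stdin.
# Assumption: the text contains the token "run-level" followed by the level.
parse_run_level() {
    awk '{ for (i = 1; i < NF; i++) if ($i == "run-level") { print $(i + 1); exit } }'
}

# Hypothetical helper: succeed only when the reported level equals the expected one.
at_run_level() {
    expected=$1
    level=$(parse_run_level)
    [ "$level" = "$expected" ]
}

# Illustrative sample line (an assumption, not real server output).
sample='   .        run-level 9 Jan 01 10:00       9    0    S'

if printf '%s\n' "$sample" | at_run_level 9; then
    echo "level 9 reached"
else
    echo "level 9 not reached yet"
fi
```

On the server itself, the input would come from who -r instead of the sample variable, for example: who -r | at_run_level 9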
2. Perform these verifications before restarting the cluster.
2.1 Verify server CONFIG and JOB CONTROL.
2.1.1 In ntdb_main, for Server Config:
Admin, System maintenance, Server config, Host name = (lion1 or lion2)
Do a search.
The daemons should be disabled. If any are still enabled, see the procedure for bringing this server down.
2.1.2 In ntdb_main, for Job Control:
Admin, System maintenance, Job Control, Host name = (lion1 or lion2)
Do a search with the status Running.
If any processes are running, highlight the jobs and select CANCEL for NTDB Job Manager, NTDB Sync Server Config, NTDB DQ_Server_Alarm, NTDB Archive Log Monitor and the 3 Lion Intercept Search jobs.
2.2 Verify inittab.
On the server that was rebooted, check the processes in the inittab file.
They should have been stopped by the actions taken in Server Config and Job Control.
Run grep lion /etc/inittab. You should see the following (Annex A):
lion_in_mms:23:once:/usr/ntdb/run/scripts/lion_set_mms.sh INT OFF
If there are processes still running, such as:
NTDB_Archiv_1:????:respawn....
NTDB_DQ_Ser_1:????:respawn....
NTDB_MMS_Mo_1:????:respawn.... etc. (Annex B)
Note 1: You will need Bell support before continuing. Bell support will have to kill the processes.
Note 2: Wait for confirmation from Bell before you continue to bring the Server back up.
The command to remove a process from inittab is as follows:
rmitab process_name (example: rmitab NTDB_Archiv_1)
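The inittab check in step 2.2 can be sketched as a filter that only reports leftover entries and removes nothing. This is a minimal sketch, assuming the usual identifier:runlevels:action:command layout of inittab lines; the helper name list_ntdb_respawn is hypothetical, and the run levels and command paths in the sample are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: from inittab text on stdin, print the identifier of every
# entry whose identifier starts with NTDB_ and whose action field is "respawn".
list_ntdb_respawn() {
    awk -F: '$1 ~ /^NTDB_/ && $3 == "respawn" { print $1 }'
}

# Illustrative sample. The first line is the one quoted in this article; the
# run levels and paths on the NTDB_ lines are assumptions.
sample='lion_in_mms:23:once:/usr/ntdb/run/scripts/lion_set_mms.sh INT OFF
NTDB_Archiv_1:2:respawn:/hypothetical/path/archive_monitor
NTDB_DQ_Ser_1:2:respawn:/hypothetical/path/dq_server'

leftover=$(printf '%s\n' "$sample" | list_ntdb_respawn)
if [ -n "$leftover" ]; then
    echo "Entries still set to respawn (contact Bell support):"
    printf '%s\n' "$leftover"
else
    echo "No NTDB respawn entries found."
fi
```

On the server, the same filter could read the real file with list_ntdb_respawn < /etc/inittab. Removing entries with rmitab remains a step for Bell support, as stated in the notes above.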
3. Bring up the database
Run clust_start
Wait for the 'Completed' message, then monitor the initialization of the cluster with tailha. Watch the messages that appear on the screen closely, paying particular attention to the varyon of the volume groups. Sometimes dbmsvg is not varied on by the clust_start command. In that case the volume group has to be varied on manually with the command:
varyonvg -c ???? (???? = volume group, e.g. dbmsvg)
For dbmsvg, the command is varyonvg -c dbmsvg
(This should be performed by Bell support.)
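Whether dbmsvg was varied on can be checked before calling Bell support. The sketch below assumes the list of active volume groups comes from the AIX command lsvg -o, which prints one active volume group name per line; the helper name vg_is_active is hypothetical and the sample list is illustrative.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the volume group named in $1 appears as a
# whole line in the list of active volume groups read from stdin.
vg_is_active() {
    grep -qxF -- "$1"
}

# Illustrative sample of an active volume group list (an assumption).
sample='rootvg
dbmsvg'

if printf '%s\n' "$sample" | vg_is_active dbmsvg; then
    echo "dbmsvg is varied on"
else
    echo "dbmsvg is NOT varied on"
fi
```

On the server, the input would be the output of lsvg -o, for example: lsvg -o | vg_is_active dbmsvg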
At the end of the tailha output you should see:
SQLDBA>connect
ORACLE instance started
Database Mounted
Startup completed on ………. followed by the initialization of the TNS
Press Ctrl-C to exit.
Note: As per Flash Bulletin 36 from Nortel, the database may freeze. If this problem arises, notify the person in charge.
DATABASE FREEZE
Check whether the FULL QUEUE is accumulating calls (pegs) while the PROVIDER SELECTED is not accumulating calls.
Another way to verify:
Enter: oralog (to change to the log directory) or cd $ORACLE_HOME/rdbms/log
Enter: tail -100 alert_LINXX.log (XX = 01 or 02)
If the last line you see is ‘after database mount’, this indicates the process is hung. Wait one minute, then verify whether the server is taking calls. If the full queue is accumulating calls and provider selected is not, the server is hung.
If this condition arises:
- Enter: db_abort (on the server that is rebooting)
- The other server should take the calls. Verify with SLadmin
- Notify support
They may take an ORACLE system state dump (see the procedure in Flash Bulletin 36).
If the other server loses its instance (while this server is rebooting):
Enter db_abort. This could take about five minutes.
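The alert log check described above can be sketched as a filter on the tail of the log. This is a minimal sketch, assuming the symptom is simply that the last non-empty line contains the text "after database mount"; the helper name last_line_is_mount is hypothetical, and the first sample line is a placeholder rather than real log text.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the last non-empty line on stdin contains
# the text "after database mount".
last_line_is_mount() {
    awk 'NF { last = $0 }
         END { exit (index(last, "after database mount") > 0 ? 0 : 1) }'
}

# Illustrative sample; the first line is a made-up placeholder.
sample='placeholder earlier log line
after database mount'

if printf '%s\n' "$sample" | last_line_is_mount; then
    echo "possible hang: wait one minute, then check the queues"
else
    echo "last line is not 'after database mount'"
fi
```

On the server, the input would be the tail of the alert log from the step above, piped into last_line_is_mount. The result is only a hint; the queue check and the decision to run db_abort stay with the operator.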
4. Check Oracle and the processes
look clu     3 cluster processes should be running
look ora     several ora processes should be running
clstat -a    both nodes should be UP
lion look    no processes should be running
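The "both nodes should be UP" check can be sketched as a count over saved clstat text. This is a minimal sketch, assuming each node appears on a line of the form "Node: name State: UP"; that layout, the helper names count_nodes_up and both_nodes_up, and the sample text are assumptions and should be compared with the real clstat -a display before use.

```shell
#!/bin/sh
# Hypothetical helper: count lines on stdin that report a node in state UP.
# Assumption: node lines look like "Node: <name> ... State: UP".
count_nodes_up() {
    grep -c 'Node:.*State:[[:space:]]*UP'
}

# Hypothetical helper: succeed only when exactly two nodes are reported UP.
both_nodes_up() {
    [ "$(count_nodes_up)" -eq 2 ]
}

# Illustrative sample (an assumption, not captured output).
sample='Node: lion1             State: UP
Node: lion2             State: DOWN'

if printf '%s\n' "$sample" | both_nodes_up; then
    echo "both nodes are UP"
else
    echo "at least one node is not UP"
fi
```

The helpers read text from stdin, so they can be fed a saved copy of the clstat -a screen.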
5. Re-enable the server daemons
5.1. Access the NTDB main menu:
- Go to Admin > System maintenance > Server config > Host name = server
5.2. Start with NTDB JOB MANAGER:
- Choose the NTDB JOB MANAGER process
- Click on modify
- Enter "1" and click enable
- Do a "commit" to complete the changes
5.3. Choose NTDB Sync Server Config:
- Choose the Sync Server Config process
- Click on modify
- Enter "1" for the instance and click enable
- Do a "commit" to complete the changes
5.4. Continue with the rest of the NTDB .... processes
- Choose the NTDB processes (log mo archive, DQ-Server-a, MMS mo, RQ-serv)
- Click on modify
- Enter "1" for the instance and click enable
- Do a "commit" to complete the changes
5.5. Bring back Intercept search (traffic)
- Choose the Intercept search process
- Click on modify
- Enter "3" and click enable
- Do a "commit" to complete the changes
- Calls may have resumed; check with SLadmin -dumpall
- Check the full queue; it could increase a bit BUT will then stop
5.6. Process to start on LION1 and LION2
- Starting the Intercept call detail process
- Choose the Intercept call detail process
- Click on modify
- Enter "1" and click on enable
- Do a "commit" to complete the changes
5.7. Process to start only on LION1
- Starting the Intercept Apply Pending process
- Choose the Intercept Apply Pending process
- Click on modify
- Enter "1" and click on enable
- Do a "commit" to complete the changes
Note: On both servers, keep LionX.25 disabled.
On server 2, keep Intercept Apply Pending disabled.
DO NOT START THE LPS PROCESS
5.8. Restart the workstation processes
- If the server crashed, you will need to disable and re-enable the instances
- In the server config
- in the hostname put bc_lws ??
- in the Job Name take NTDB Job Manager
- do search
- Choose the NTDB Job Manager process
- click on modify
- enter "1" and click on enable
- Do a "commit" to complete the changes
- Then enable the NTDB Sync Server and then NTDB DQ Client Alarm
Note: Make sure the processes that were down are still down.
6. Check weight and traffic on LION servers
6.1. SLadmin -dump
6.2. SLadmin -dumpall
6.3. You can also use traffic.sh (3 servers)
7. Check the LION processes
lion look    there should be approx. 11 job_exec, 1 sync_d and 3 incpt_d
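The expected process counts in step 7 can be checked with a small counting filter. This is a minimal sketch, assuming a list of process names with one per line on stdin, such as ps -e -o comm= would give; the output format of lion look is not documented here, and the helper name count_procs is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: count the lines on stdin whose last field, with any
# leading directory path removed, equals the process name given in $1.
count_procs() {
    awk -v name="$1" '{ n = split($NF, parts, "/"); if (parts[n] == name) c++ }
                      END { print c + 0 }'
}

# Illustrative sample built to match the counts quoted in this step.
sample=$(
    i=0
    while [ "$i" -lt 11 ]; do echo job_exec; i=$((i + 1)); done
    echo sync_d
    echo incpt_d; echo incpt_d; echo incpt_d
)

for name in job_exec sync_d incpt_d; do
    printf '%s: %s\n' "$name" "$(printf '%s\n' "$sample" | count_procs "$name")"
done
```

Because the article gives the job_exec count as approximate, the printed numbers are meant for comparison by the operator rather than as a strict pass or fail test.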
8. Check the DB, cluster and processes on both servers
8.1. look clu          3 cluster processes should be running
8.2. pps -s cluster    the 3 processes should have an ACTIVE status
8.3. ora look          several ora processes should be running
8.4. pps -s mms        the 4 MMS processes should have an ACTIVE status
9. Disable autostart on LEGATO (Flash Bulletin 2003004221)
On server LION1, as user root:
9.1. Export the display: export DISPLAY=cams1:0 (or cams2:0)
9.2. Open Legato: nwadmin &
9.3. Choose Customize > Groups > View > Details
9.4. On the autostart line, click DISABLE, then apply
9.5. Close the application
10. Check the SNAPSHOT
For the LION1 server:
10.1. Check if SNAPSHOTS.sql works with LION3.
10.2. See procedure 53 of the alarms-problems book, section LION
11. Stop the messages ... zero subnet not allowed ... RIP
For the LION1 server, run:
swcons /dev/null
12. Check the server
ckserver or CHECKlion and CHECKall
Confirmation
If the process was successful, the rebooted LION Server should power back on normally.