Overview
This article explains the information required by the Skyvera support team in order to analyze a core dump generated by a process restarting spontaneously.
Solution
You might see a process restarting spontaneously in the syslog file (/var/log/messages) as in the example below.
Aug 21 10:59:02 rtr01 systemd-logind: New session 346158 of user textpass.
Aug 21 10:59:02 rtr02 systemd: Started Session 346158 of user textpass.
Such a restart may generate a core file. To check whether there is a core file related to the restart, look for files created around the date and time shown in the syslog entry (e.g. Aug 21 10:59:02). The core directory differs between RHEL and Solaris operating systems:
- RHEL
ls -ltrh /var/TKLC/core
- Solaris
ls -ltrh /var/core
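On RHEL, the listing can also be narrowed to a time window around the restart. The following is a minimal sketch, assuming GNU find (the -newermt test is not available in the stock Solaris find). The function name list_cores_between is hypothetical, and the year in the sample timestamps is an assumption, since syslog lines do not include it.

```shell
# List core files whose modification time falls inside a time window.
# Arguments: <core_dir> <window_start> <window_end>
# Timestamps use a format GNU find accepts, e.g. "2023-08-21 10:00:00".
# Requires GNU find (RHEL); the function name is hypothetical.
list_cores_between() {
    core_dir=$1
    window_start=$2
    window_end=$3
    # -newermt START     : modified after START
    # ! -newermt END     : and not modified after END
    find "$core_dir" -maxdepth 1 -type f -name 'core.*' \
        -newermt "$window_start" ! -newermt "$window_end" \
        -exec ls -ltrh {} +
}

# Example call (year assumed):
# list_cores_between /var/TKLC/core "2023-08-21 10:00:00" "2023-08-21 11:30:00"
```

If the function prints nothing, no core file was written in that window, and a wider window or the plain ls listing above is the next thing to try.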
Then you will need to provide the following information to the Skyvera support team so that the R&D team can analyze the issue.
Core files are named using the following standard syntax:
core.<process_name>.<process_id>.gz
For example: core.textpass.13714.gz
IMPORTANT: Before raising a ticket with us, make sure that the process name shown in the core file name belongs to a Lithium process and not to other software running on the machine, such as an antivirus or a server monitoring tool.
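To check which process a core file belongs to, the name can be split according to the core.<process_name>.<process_id>.gz syntax. The sketch below uses only POSIX parameter expansion. The function names core_process_name and core_process_id are hypothetical helpers, not part of the product.

```shell
# Extract the process name from core.<process_name>.<process_id>[.gz]
core_process_name() {
    name=$(basename "$1")
    name=${name%.gz}            # drop the .gz suffix if present
    name=${name#core.}          # drop the leading "core." prefix
    printf '%s\n' "${name%.*}"  # keep everything before the last dot
}

# Extract the process ID from core.<process_name>.<process_id>[.gz]
core_process_id() {
    name=$(basename "$1")
    name=${name%.gz}             # drop the .gz suffix if present
    printf '%s\n' "${name##*.}"  # keep everything after the last dot
}

core_process_name /var/TKLC/core/core.textpass.13714.gz   # prints: textpass
core_process_id   /var/TKLC/core/core.textpass.13714.gz   # prints: 13714
```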
Core files
For example, on a RHEL server:
$ ls -ltrh /var/TKLC/core/
total 1.3G
-rw-rw-rw- 1 root root 278M Aug 16 18:51 core.tp_ams.11388.gz
-rw-rw-rw- 1 root root 464M Aug 21 10:42 core.textpass.13714.gz
-rw-rw-rw- 1 root root 516M Sep 10 01:33 core.tp_hub.11345.gz
You can see a core file created around the time the textpass process restarted in the syslog example.
gdb or pstack/pflags output
You will need to provide the backtrace for all the relevant core files. The backtrace must be collected on the system where the core file was generated, or on a system with exactly the same component version. The procedure differs between RHEL and Solaris operating systems.
RHEL
- Change to root user.
- Debug the component, which is the process the core file belongs to. You will be taken to the gdb prompt.
gdb /usr/TextPass/bin/<component>
In the example above, the component is textpass:
gdb /usr/TextPass/bin/textpass
- Redirect the gdb output to a file. Set the log file name before enabling logging; otherwise gdb keeps writing to its default log file (gdb.txt).
(gdb) set logging file <log_file_name>.log
(gdb) set logging on
- To get the backtrace, run the following commands at the gdb prompt. Make sure the core file is uncompressed (not .gz).
(gdb) core /var/TKLC/core/<core file name>
(gdb) bt
(gdb) bt full
(gdb) quit
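The RHEL steps above can also be run without an interactive session. This is a minimal sketch, assuming gdb and gunzip are installed and that there is enough disk space for the uncompressed core. The function name collect_backtrace and the GDB override variable are hypothetical conveniences. The -batch and -ex options are standard gdb command-line options.

```shell
# Collect "bt" and "bt full" for a core file in one non-interactive gdb run.
# Arguments: <component_binary> <core_file, optionally .gz> <log_file>
# GDB can be set to an alternative gdb executable (hypothetical override).
collect_backtrace() {
    binary=$1
    core=$2
    log=$3
    case $core in
        *.gz)
            # gdb cannot read a compressed core: write an uncompressed copy
            # next to it and leave the original .gz file untouched.
            gunzip -c "$core" > "${core%.gz}" || return 1
            core=${core%.gz}
            ;;
    esac
    # -batch exits after the commands given with -ex have run.
    "${GDB:-gdb}" -batch -ex 'bt' -ex 'bt full' "$binary" "$core" > "$log" 2>&1
}

# Example call, run as root:
# collect_backtrace /usr/TextPass/bin/textpass \
#     /var/TKLC/core/core.textpass.13714.gz /tmp/textpass_bt.log
```

The resulting log file holds the same output as the interactive bt and bt full commands. The uncompressed copy of the core can be deleted once the log has been written.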
Solaris
- Change to root user.
- Print the stack trace, redirecting the output to a file.
pstack -F <core filename> > <pstack_file_name>.log
- Print the tracing flags, signals, and other status information, redirecting the output to a file.
pflags -r <core filename> > <pflags_file_name>.log
syslog
Provide the full syslog file and specify the date and time at which you saw the textpass process restarting.
tp_walkall
Provide the tp_walkall output from the node where the core files were generated.