You can use the SDA COPY command or the DCL COPY command in your site-specific startup procedure. Digital recommends using the SDA COPY command because it marks the dump file as copied. This is particularly important if the dump was written into the paging file, SYS$SYSTEM:PAGEFILE.SYS, because the SDA COPY command releases to the pager the pages that were occupied by the dump.
Using /IGNORE=NOBACKUP
Because system dump files are set to NOBACKUP, the Backup utility (BACKUP) does not copy dump files to tape unless you use the qualifier /IGNORE=NOBACKUP when invoking BACKUP. When you use the SDA COPY command to copy the system dump file to another file, the new file is not set to NOBACKUP.
As included in the distribution kit, SYS$SYSTEM:SYSDUMP.DMP is protected against world access. Because a dump file can contain privileged information, Digital recommends that you continue to protect dump files from universal read access.
2.3 Invoking SDA in the Site-Specific Startup Command Procedure
Because a listing of the SDA output is an important source of information in determining the cause of a system failure, it is a good idea to have SDA produce such a listing after every failure. The system manager can ensure the creation of a listing by modifying the site-specific startup command procedure SYS$MANAGER:SYSTARTUP_VMS.COM so that it invokes SDA when the system is booted.
When invoked in the site-specific startup procedure, SDA executes the specified commands only if the system is booting immediately after a system failure. SDA examines a flag in the dump file's header that indicates whether it has already processed the file. If the flag is set, SDA merely exits. If the flag is clear, SDA executes the specified commands and sets the flag. This flag is clear when the operating system initially writes a crash dump, except for those resulting from an operator-requested shutdown (for instance, SYS$SYSTEM:SHUTDOWN.COM).
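The flag check described above can be modeled with a short Python sketch. This is only an illustration of the decision logic; the names `DumpHeader`, `already_analyzed`, and `run_startup_commands` are hypothetical and do not correspond to the actual layout of the dump file header or to SDA internals.

```python
from dataclasses import dataclass


@dataclass
class DumpHeader:
    # Hypothetical stand-in for the "already processed" flag in the dump header.
    already_analyzed: bool = False


def run_startup_commands(header, commands, execute):
    """Run the commands once per dump; return True if they were executed."""
    if header.already_analyzed:
        return False              # flag set: SDA merely exits
    for command in commands:
        execute(command)          # flag clear: execute the specified commands
    header.already_analyzed = True  # then set the flag
    return True
```

Calling the function a second time against the same header does nothing, which mirrors why a reboot that does not follow a failure produces no new listing.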
Using SYSDUMP.DMP
The following example shows typical commands that you might add to your site-specific startup command procedure to produce an SDA listing after each failure.
$ !
$ !      Print dump listing if system just failed
$ !
$ ANALYZE/CRASH_DUMP SYS$SYSTEM:SYSDUMP.DMP
    COPY SYS$SYSTEM:SAVEDUMP.DMP      ! Save dump file
    SET OUTPUT DISK1:SYSDUMP.LIS      ! Create listing file
    READ/EXEC                         ! Read symbols into the SDA symbol table
    SHOW CRASH                        ! Display crash information
    SHOW STACK                        ! Show current stack
    SHOW SUMMARY                      ! List all active processes
    SHOW PROCESS/PCB/PHD/REG          ! Display current process
    SHOW SYMBOL/ALL                   ! Print system symbol table
    EXIT
$ PRINT DISK1:SYSDUMP.LIS
The COPY command in the preceding example saves the contents of the file SYS$SYSTEM:SYSDUMP.DMP. If your system's startup command file does not save a copy of the contents of this file, this crash dump information is lost in the next system failure, when the system saves the information about the new failure, overwriting the contents of SYS$SYSTEM:SYSDUMP.DMP.
Using PAGEFILE.SYS
If you are using SYS$SYSTEM:PAGEFILE.SYS as the crash dump file, you must include SDA commands in SYS$MANAGER:SYSTARTUP_VMS.COM that free the space occupied by the dump so that the pager can use it. For instance:
$ ANALYZE/CRASH_DUMP SYS$SYSTEM:PAGEFILE.SYS
    .
    .
    .
    COPY dump_filespec
    EXIT
SDA performs certain tasks prior to bringing a dump into memory, presenting its initial displays, and accepting command input. This section describes those tasks, which include the following:
For detailed information about the investigation of a system failure, see Section 9.
Requirements
To be able to analyze a dump file, your process must have the following:
If your process satisfies these conditions, you can issue the DCL command ANALYZE/CRASH_DUMP to invoke SDA. If you do not specify the name of a dump file in the command, SDA prompts you for the name of the file, as follows:
$ ANALYZE/CRASH_DUMP
_Dump File:
The default file specification is as follows:
disk:[default-dir]SYSDUMP.DMP
In this specification, disk and [default-dir] represent the disk and directory specified in your last SET DEFAULT command.
3.2 Mapping the Contents of the Dump File
SDA first attempts to map the contents of physical memory as stored in the specified dump file. To do this, it must first locate the system page table (SPT) among its contents. The SPT contains one entry for each page of system virtual address space.
The SPT appears at the largest physical addresses in a typical configuration. As a result, if a dump file is too small, the SPT cannot be written to it in the event of system failure.
If SDA cannot find the SPT in the dump file, it displays either of the following messages:
%SDA-E-SPTNOTFND, system page table not found in dump file
%SDA-E-SHORTDUMP, the dump only contains m out of n pages of physical memory
If SDA displays either of these error messages, you cannot analyze the crash dump, but must take steps to ensure that any subsequent dump can be preserved. To do this, you must increase the size of the dump file, as indicated in Section 2.1, or adjust the system DUMPSTYLE parameter, as discussed in Section 2.1.2.
Under certain conditions, the system might not save some memory locations in the system dump file. For instance, during halt/restart bugchecks, the system does not preserve the contents of general registers. If such a bugcheck occurs, SDA indicates in the SHOW CRASH display that the contents of the registers were destroyed. Additionally, if a bugcheck occurs during system initialization, the contents of the register display might be unreliable. The symptom of such a bugcheck is a SHOW SUMMARY display that shows no processes or only the swapper process.
Also, if you use an SDA command to access a virtual address that has no corresponding physical address, SDA displays the following error message:
%SDA-E-NOTINPHYS, 'location' not in physical memory
When you analyze a subset dump file, if you use an SDA command to access a virtual address that has a corresponding physical address but was not saved in the dump file, SDA displays the following error message:
%SDA-E-MEMNOTSVD, memory not saved in the dump file
After locating and reading the system dump file, SDA attempts to read the system symbol table file into the SDA symbol table. This file, named SYS$SYSTEM:SYS.STB by default, contains most of the global symbols used by the operating system. SDA also reads into its symbol table a subset of SYS$SYSTEM:SYSDEF.STB, called SYS$SYSTEM:REQSYSDEF.STB, that it requires to identify locations in memory.
If SDA cannot find the system symbol table file, or if the file named in the /SYMBOL qualifier to the ANALYZE command is not a system symbol table, it halts with a fatal error.
When SDA finishes building its symbol table, it displays a message identifying itself and the immediate cause of the crash. In the following example, the cause of the crash was an illegal exception occurring at an IPL above IPL$_ASTDEL or while using the interrupt stack.
Dump taken on 28-Jan-1993 18:10:09.79
INVEXCEPTN, Exception while above ASTDEL or on interrupt stack
After displaying the crash summary, SDA executes the commands in the SDA initialization file, if you have established one. SDA refers to its initialization file by using the logical name SDA$INIT. If SDA cannot find the file defined as SDA$INIT, it searches for the file SYS$LOGIN:SDA.INIT.
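The search order for the initialization file can be sketched in Python as follows. The function name `locate_init_file` and its two arguments (a table of logical names and a set of files that exist) are assumptions made for illustration; only the logical name SDA$INIT and the fallback file SYS$LOGIN:SDA.INIT come from the description above.

```python
FALLBACK_INIT_FILE = "SYS$LOGIN:SDA.INIT"


def locate_init_file(logical_names, existing_files):
    """Return the initialization file SDA would use, or None if there is none."""
    # First choice: the file that the logical name SDA$INIT translates to.
    target = logical_names.get("SDA$INIT")
    if target is not None and target in existing_files:
        return target
    # Otherwise: fall back to SYS$LOGIN:SDA.INIT, if it exists.
    if FALLBACK_INIT_FILE in existing_files:
        return FALLBACK_INIT_FILE
    return None
```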
The initialization file can contain SDA commands that read symbols into SDA's symbol table, define keys, establish a log of SDA commands and output, or perform other tasks. For instance, you might want to use an SDA initialization file to augment SDA's symbol table with definitions helpful in locating system code.
If you issue the following command, SDA includes those symbols that define many of the system's data structures, including those in the I/O database:
READ SYS$SYSTEM:SYSDEF.STB
You might also find it very helpful to define those symbols that identify the modules in the images that make up the executive. You can do this by issuing the following command:
READ/EXECUTIVE SYS$LOADABLE_IMAGES
After SDA executes the commands in the initialization file, it displays its prompt, as follows:
SDA>
The SDA> prompt indicates that you can use SDA interactively and enter SDA commands.
An SDA initialization file can invoke a command procedure with the @ command. However, such command procedures cannot themselves invoke a command procedure (that is, you cannot have nested command procedures).
4 Analyzing a Running System
Occasionally, an internal problem hinders system performance but does not cause a system failure. By allowing you to examine the running system, SDA provides the means to search for the solution to the problem without disturbing the operating system. For example, you can use SDA to examine the stack and memory of a process that is stalled in a scheduler state, such as a miscellaneous wait (MWAIT) or a suspended (SUSP) state (see the Guide to OpenVMS Performance Management).
If your process has change-mode-to-kernel (CMKRNL) privilege, you can invoke SDA to examine the system. Use the following DCL command:
$ ANALYZE/SYSTEM
OpenVMS System analyzer
SDA>
The SDA> prompt indicates that you can use SDA interactively and enter SDA commands. When analyzing a running system, SDA sets its process context to that of the process running SDA.
If you are undertaking an analysis of a running system, take the following considerations into account:
Caution
When using SDA to analyze a running system, use caution in interpreting its displays. Because system states change frequently, it is possible that the information SDA displays might be inconsistent with the actual, volatile state of the system at any given moment.
%SDA-E-CMDNOTVLD, command not valid on the running system
When invoked to analyze either a crash dump or a running system, SDA establishes a default context from which it interprets certain commands.
When the subject of analysis is a uniprocessor system, SDA's context is solely process context. That is, SDA can interpret its process-specific commands in the context of either the process current on the uniprocessor or some other process in some other scheduling state.
When you initially invoke SDA to analyze a crash dump, its process context defaults to that of the process that was current at the time of the crash. When you invoke SDA to analyze a running system, its process context defaults to that of the current process; that is, the one executing SDA.
You can change SDA's process context by issuing any of the following commands:
In a uniprocessor system only one CPU exists, and the concept of SDA CPU context is not an issue. However, for a multiprocessor system with more than one active CPU, SDA must maintain an idea of CPU context to provide a way of displaying information bound to a specific CPU, such as the reason for the bugcheck exception, the currently executing process, the current IPL, the contents of CPU registers, and any owned spin locks. When you first invoke SDA to analyze a crash dump, the SDA current CPU is the CPU that induced the system failure.
Changing the CPU Context
You can use several SDA commands to change the CPU context. When you change the CPU context, the "SDA current process" is changed to the current process on the "SDA current CPU" to synchronize CPU context and process context. If no current process is on the "SDA current CPU," the "SDA current process" is undefined; no process context information will be available until you set SDA process context to a specific process.
Type HELP PROCESS_CONTEXT for specific information about the "SDA current process."
The following SDA commands change the "SDA current CPU":
Command | Description |
---|---|
SET CPU cpu_id | Changes the "SDA current CPU" to CPU cpu_id |
SHOW CPU cpu_id | Changes the "SDA current CPU" to CPU cpu_id |
SHOW CRASH | Changes the "SDA current CPU" to the CPU that induced the system failure |
If you select a process that is the current process on a CPU, the following commands change the "SDA current CPU" to that CPU:
No other SDA commands affect the "SDA current CPU."
Note
When you analyze the running system, you cannot use the SET CPU and SHOW CPU commands because SDA does not have access to all the CPU-specific information about the running system.
In a uniprocessor system, process context might be the process that is current on the CPU or the process in whose context process-specific SDA commands are interpreted. For a multiprocessor system with more than one active CPU, however, the meaning of SDA process context changes so that it includes a way to display information relevant to a specific process both when the process is current on a processor and when the process is not.
You can use several SDA commands to change SDA process context. Following is a list of the results of some of these changes:
Type HELP CPU_CONTEXT for specific information about the "SDA current CPU."
The following SDA commands change the "SDA current process":
Command | Description |
---|---|
SET PROCESS name | Changes the "SDA current process" to the named process |
SET PROCESS /INDEX=n | Changes the "SDA current process" to the process with index n |
SHOW PROCESS name | Changes the "SDA current process" to the named process |
SHOW PROCESS /INDEX=n | Changes the "SDA current process" to the process with index n |
The following commands change the SDA process context if the "SDA current process" is not the current process on the selected CPU:
Command | Description |
---|---|
SET CPU cpu_id | Changes the "SDA current process" to the current process on CPU cpu_id |
SHOW CPU cpu_id | Changes the "SDA current process" to the current process on CPU cpu_id |
SHOW CRASH | Changes the "SDA current process" to the current process on the CPU that induced the system failure |
No other SDA commands affect the "SDA current process."
Note
When you analyze the running system, CPU context is not used because all the CPU-specific information might not be available.
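The way CPU context and process context stay synchronized during crash dump analysis can be modeled with a small Python class. This is a sketch of the rules stated above, not of SDA's implementation; the class `SdaContext`, its method names, and the process names in the usage are hypothetical.

```python
class SdaContext:
    """Toy model of the "SDA current CPU" and "SDA current process"."""

    def __init__(self, current_on_cpu, crash_cpu):
        # Maps cpu_id -> name of the process current on that CPU (or None).
        self.current_on_cpu = dict(current_on_cpu)
        self.crash_cpu = crash_cpu
        # Defaults: the CPU that induced the failure and its current process.
        self.cpu = crash_cpu
        self.process = self.current_on_cpu.get(crash_cpu)

    def set_cpu(self, cpu_id):
        """Model SET CPU / SHOW CPU: process context follows the CPU."""
        self.cpu = cpu_id
        # None means the "SDA current process" is undefined.
        self.process = self.current_on_cpu.get(cpu_id)

    def set_process(self, name):
        """Model SET PROCESS / SHOW PROCESS for a named process."""
        self.process = name
        for cpu_id, current in self.current_on_cpu.items():
            if current == name:
                self.cpu = cpu_id   # process is current on a CPU: switch to it
                break

    def show_crash(self):
        """Model SHOW CRASH: return to the CPU that induced the failure."""
        self.set_cpu(self.crash_cpu)
```

Selecting a process that is not current on any CPU changes only the process context, while selecting a CPU with no current process leaves the process context undefined until a process is chosen explicitly.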
Changing the SDA CPU Context
When you invoke SDA to analyze a crash dump from a multiprocessing system with more than one active CPU, SDA maintains a second dimension of context (its CPU context) that allows it to display certain processor-specific information, such as the reason for the bugcheck exception, the currently executing process, the current IPL, the contents of processor-specific registers, the interrupt stack pointer (ISP), and the spin locks owned by the processor. When you invoke SDA to analyze a multiprocessor's crash dump, its CPU context defaults to that of the processor that induced the system failure.³
You can change the SDA CPU context by using any of the following commands:
Changing CPU context involves an implicit change in process context in either of the following ways:
Likewise, changing process context can involve a switch of CPU context as well. For instance, if you issue a SET PROCESS command for a process that is current on another CPU, SDA automatically changes its CPU context to that of the CPU on which that process is current. The following commands can have this effect if the name or index number (nn) refers to a current process:
The following sections describe the format of SDA commands and the expressions you can use with SDA commands.
8.1 General Command Format
SDA uses a command format similar to that used by the DCL interpreter. You issue commands in this general format:
command-name[/qualifier...] [parameter][/qualifier...] [!comment]
where:
command-name | Is an SDA command. Each command tells the utility to perform a function. Commands can consist of one or more words, and can be abbreviated to the number of characters that make the command unique. For example, SH stands for SHOW and SE stands for SET. |
/qualifier | Modifies the action of an SDA command. A qualifier is always preceded by a slash (/). Several qualifiers can follow a single parameter or command name, but a slash must precede each. You can abbreviate qualifiers to the shortest string of characters that uniquely identifies the qualifier. |
parameter | Is the target of the command. For example, SHOW PROCESS RUSKIN tells SDA to display the context of the process RUSKIN. The command EXAMINE 80104CD0;40 displays the contents of 40 bytes of memory, beginning with location 80104CD0. When you supply part of a file specification as a parameter, SDA assumes default values for the omitted portions of the specification. The default device SYS$DISK and default directory are those specified in your most recent SET DEFAULT command. See the OpenVMS DCL Dictionary for a description of the DCL command SET DEFAULT. |
!comment | Consists of text that describes the command, but this text is not actually part of the command. Comments are useful for documenting SDA command procedures. When executing a command, SDA ignores the exclamation point (!) and all characters that follow it on the same line. |
You can use expressions as parameters for some SDA commands, such as SEARCH and EXAMINE. To create expressions, you can use any of the following elements:
The following sections describe elements other than numerals.
8.2.1 Radix Operators
Radix operators determine which numeric base SDA uses to evaluate expressions. You can use one of three radix operators to specify the radix of the numeric expression that follows the operator:
The default radix is hexadecimal. SDA displays hexadecimal numbers with leading zeros and decimal numbers with leading spaces.
8.2.2 Arithmetic and Logical Operators
There are two types of arithmetic and logical operators, both of which are listed in Table SDA-8.
In evaluating expressions containing binary operators, SDA performs logical AND, OR, and XOR operations, and multiplication, division, and arithmetic shifting before addition and subtraction. Note that the SDA arithmetic operators perform integer arithmetic on 32-bit operands.
Operator | Action |
---|---|
Unary Operators | |
# | Performs a logical NOT of the expression |
+ | Makes the value of the expression positive |
- | Makes the value of the expression negative |
@ | Evaluates the following expression as a virtual address, then uses the contents of that address as value |
G | Adds 80000000₁₆ to the value of the expression¹ |
H | Adds 7FFE0000₁₆ to the value of the expression² |
Binary Operators | |
+ | Addition |
- | Subtraction |
* | Multiplication |
& | Logical AND |
\| | Logical OR |
\ | Logical XOR |
/ | Division³ |
@ | Arithmetic shifting |
8.2.3 Precedence Operators
SDA uses parentheses as precedence operators. Expressions enclosed in parentheses are evaluated first. SDA evaluates nested parenthetical expressions from the innermost to the outermost pairs of parentheses.
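The evaluation order described in this section can be illustrated with a small Python evaluator. It is a sketch under several assumptions, not a reimplementation of SDA: it accepts hexadecimal numerals only (no symbols or radix operators), it evaluates operators of equal precedence from left to right, and it omits division, arithmetic shifting, and the unary #, +, and @ operators, whose edge cases are not spelled out here. The function name `evaluate` is hypothetical.

```python
import re

MASK32 = 0xFFFFFFFF                 # arithmetic is done on 32-bit operands
HIGH = {"*", "&", "|", "\\"}        # performed before addition and subtraction
LOW = {"+", "-"}


def evaluate(text):
    """Evaluate a hexadecimal expression; return an unsigned 32-bit value."""
    tokens = re.findall(r"[0-9A-F]+|[-+*&|\\()GH]", text.upper())
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        token = tokens[pos]
        pos += 1
        return token

    def operand():
        token = take()
        if token == "(":            # parenthesized expressions come first
            value = sum_level()
            take()                  # consume the closing ")"
            return value
        if token == "-":            # unary minus
            return -operand() & MASK32
        if token == "G":            # adds 80000000 (hexadecimal)
            return (operand() + 0x80000000) & MASK32
        if token == "H":            # adds 7FFE0000 (hexadecimal)
            return (operand() + 0x7FFE0000) & MASK32
        return int(token, 16)       # default radix is hexadecimal

    def product_level():
        value = operand()
        while peek() in HIGH:
            op = take()
            rhs = operand()
            if op == "*":
                value *= rhs
            elif op == "&":
                value &= rhs
            elif op == "|":
                value |= rhs
            else:                   # backslash: logical XOR
                value ^= rhs
            value &= MASK32
        return value

    def sum_level():
        value = product_level()
        while peek() in LOW:
            op = take()
            rhs = product_level()
            value = (value + rhs if op == "+" else value - rhs) & MASK32
        return value

    return sum_level()
```

In this model `evaluate("2+3*4")` performs the multiplication first, whereas `evaluate("(2+3)*4")` performs the addition first because of the parentheses.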
8.2.4 Symbols
Names of symbols can contain from 1 to 31 alphanumeric characters and can include the dollar sign ($) and underscore (_) characters. Symbols can take values from -7FFFFFFF₁₆ to 7FFFFFFF₁₆.
By default, SDA copies symbols into its symbol table from the files SYS$SYSTEM:SYS.STB and SYS$SYSTEM:REQSYSDEF.STB. To add more symbols to the symbol table, you can use the following SDA commands:
In addition, SDA provides the symbols described in Table SDA-9.
Symbol | Meaning |
---|---|
. (period) | Current location |
2P_CDDB | Address of alternate CDDB for MSCP-served device¹ |
2P_UCB | Address of alternate UCB for dual-pathed device¹ |
AMB | Associated mailbox UCB pointer¹ |
AP | Argument pointer² |
CDDB | Address of class driver descriptor block for MSCP-served device¹ |
CLUSTRLOA | Base address of loadable VAXcluster code |
CRB | Address of channel request block¹ |
DDB | Address of device data block¹ |
DDT | Address of driver dispatch table¹ |
nnDRIVER | Base address of a driver prologue table (DPT); such a symbol exists for each loaded device driver in the system³ |
ESP | Executive stack pointer² |
FP | Frame pointer² |
FPEMUL | Base address of the code that emulates floating-point instructions |
G | 80000000₁₆, the base address of system space |
H | 7FFE0000₁₆ |
IRP | Address of I/O request packet¹ |
JIB | Job information block |
KSP | Kernel stack pointer² |
LNM | Address of logical name block for mailbox¹ |
MCHK | Address within loadable CPU-specific routines |
MSCP | Address of loadable MSCP server code |
ORB | Address of object rights block¹ |
P0BR | Base register for the program region (P0)² |
P0LR | Length register for the program region (P0)² |
P1BR | Base register for the control region (P1)² |
P1LR | Length register for the control region (P1)² |
PC | Program counter² |
PCB | Process control block |
PDT | Address of port descriptor table¹ |
PHD | Process header |
PSL | Processor status longword² |
R0 through R11 | General registers² |
RMS | Base address of the RMS image |
RWAITCNT | Resource wait count for MSCP-served device¹ |
SB | Address of system block¹ |
SCSLOA | Base address of loadable common SCS services |
SP | Current stack pointer of a process² |
SSP | Supervisor stack pointer² |
SYSLOA | Base address of loadable processor-specific system code |
TMSCP | Address of loadable TMSCP server code |
UCB | Address of unit control block¹ |
USP | User stack pointer² |
VCB | Address of volume control block for mounted device¹ |
When SDA displays an address, it displays that address both in hexadecimal and as a symbol, if possible. If the address is within FFF₁₆ of the value of a symbol, SDA displays the symbol plus the offset from the value of that symbol to the address. If more than one symbol's value is within FFF₁₆ of the address, SDA displays the symbol whose value is the closest. If no symbols have values within FFF₁₆ of the address, SDA displays no symbol. (For an example, see the description of the SHOW STACK command.)
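The rule for choosing a symbol can be sketched in Python as follows. The function name `symbolize`, the sample symbol names used to exercise it, and the returned "NAME+offset" string are illustrative assumptions, and the exact format of SDA's own display may differ. The sketch also assumes that only symbols at or below the address are candidates, because the offset is measured from the symbol's value to the address.

```python
MAX_OFFSET = 0xFFF


def symbolize(address, symbols):
    """Return 'NAME' or 'NAME+offset' for the closest symbol, else None."""
    best_name = None
    best_offset = None
    for name, value in symbols.items():
        offset = address - value
        if 0 <= offset <= MAX_OFFSET:
            # Keep the symbol whose value is closest to the address.
            if best_offset is None or offset < best_offset:
                best_name, best_offset = name, offset
    if best_name is None:
        return None                 # no symbol within FFF (hexadecimal)
    if best_offset == 0:
        return best_name
    return f"{best_name}+{best_offset:X}"
```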
9 Investigating System Failures
This section discusses how the operating system handles internal errors and suggests procedures that can aid you in determining the causes of these errors. To conclude, it illustrates, through detailed analysis of a sample system failure, how SDA helps you find the causes of operating system problems.
For a complete description of the commands discussed in the sections that follow, refer to the SDA Commands section.
9.1 General Procedure for Analyzing System Failures
When the operating system detects an internal error so severe that normal operation cannot continue, it signals a condition known as a fatal bugcheck and shuts itself down. A specific bugcheck code describes each such error.
To resolve the problem, you must find the reason for the bugcheck. Most failures are caused by errors in user-written device drivers or other privileged code not supplied by Digital. To identify and correct these errors, you need a listing of the code in question.
4556P001.HTM OSSG Documentation 22-NOV-1996 14:13:03.33
Copyright © Digital Equipment Corporation 1996. All Rights Reserved.