Error log reports are primarily intended for use by Digital support representatives to identify hardware problems. System managers often find error log reports useful in identifying recurrent system failures that require outside attention.
Parts of the Error Logging Facility
The error logging facility consists of the parts shown in Table 18-2.
Part | Description |
---|---|
Executive routines | Detect errors and events, and write relevant information into error log buffers in memory. |
Error Formatter (ERRFMT) | The ERRFMT process, which starts when the system is booted, periodically empties error log buffers, transforms the descriptions of errors into standard formats, and stores formatted information in an error log file on the system disk. (See Section 18.3.2.) The Error Formatter allows you to send mail to the SYSTEM account or another user if the ERRFMT process encounters a fatal error and deletes itself. (See Section 18.3.3.) |
Error Log utility (ERROR LOG) | Invokes the Error Log Report Formatter (ERF), which selectively reports the contents of an error log file. You invoke ERROR LOG by entering the DCL command ANALYZE/ERROR_LOG. (See Section 18.4.2.) |
DECevent | Selectively reports the contents of an event log file; you invoke DECevent by entering the DCL command DIAGNOSE. (See Section 18.5.) |
The executive routines and the Error Formatter (ERRFMT) process operate continuously without user intervention. The routines fill the error log buffers in memory with raw data on every detected error and event. When one of the available buffers becomes full, or when a time allotment expires, ERRFMT automatically writes the buffers to SYS$ERRORLOG:ERRLOG.SYS.
Sometimes a burst of errors can cause the buffers to fill before ERRFMT can empty them. You can detect this condition by noting a skip in the error sequence numbers of the records in the error log reports. As soon as ERRFMT frees buffer space, the executive routines resume preserving error information in the buffers.
The ERRFMT process displays an error message on the system console terminal and stops itself if it encounters excessive errors while writing the error log file. Section 18.3.1 explains how to restart the ERRFMT process.
The Error Formatter (ERRFMT) process is started automatically at boot time. The following sections explain how to perform these tasks:
Task | Section |
---|---|
Restart the ERRFMT process, if necessary | Section 18.3.1 |
Maintain error log files | Section 18.3.2 |
Send mail if the ERRFMT process is deleted | Section 18.3.3 |
To restart the ERRFMT process, execute the startup command procedure, specifying ERRFMT as the parameter:
$ @SYS$SYSTEM:STARTUP ERRFMT
Note
If disk quotas are enabled on the system disk, ERRFMT starts only if UIC [1,4] has sufficient quotas.
Because the error log file, SYS$ERRORLOG:ERRLOG.SYS, is a shared file, ERRFMT can write new error log entries while the Error Log utility reads and reports on other entries in the same file.
ERRLOG.SYS increases in size and remains on the system disk until you explicitly rename or delete it. Therefore, devise a plan for regular maintenance of the error log file. One method is to rename ERRLOG.SYS on a daily basis. If you do this, the system creates a new error log file. You might, for example, rename the current copy of ERRLOG.SYS to ERRLOG.OLD every morning at 9:00. To free space on the system disk, you can then back up the renamed version of the error log file on a different volume and delete the file from the system disk.
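The rename, back up, and delete routine described above could be sketched as the following DCL commands. This is a minimal sketch under stated assumptions, not a supported procedure: the archive location DUA2:[ARCHIVE] and the save-set name ERRLOG.BCK are hypothetical names chosen for illustration.

```dcl
$ ! Sketch of daily error log maintenance (run from a suitably privileged account).
$ ! DUA2:[ARCHIVE] and ERRLOG.BCK are hypothetical names; substitute your own.
$ SET DEFAULT SYS$ERRORLOG
$ RENAME ERRLOG.SYS ERRLOG.OLD          ! the system then creates a new ERRLOG.SYS
$ BACKUP ERRLOG.OLD;* DUA2:[ARCHIVE]ERRLOG.BCK/SAVE_SET
$ DELETE ERRLOG.OLD;*                   ! delete only after the backup succeeds
```

Confirm that the BACKUP command completed without errors before you enter the DELETE command.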
Another method is to keep the error log file on a disk other than the system disk by defining the logical name SYS$ERRORLOG to be the device and directory where you want to keep error log files; for example:
$ DEFINE/SYSTEM/EXECUTIVE SYS$ERRORLOG DUA2:[ERRORLOG]
To define this logical name each time you start up the system, add the logical name definition to your SYLOGICALS.COM procedure. See Section 5.2.5 for details.
Be careful not to delete error log files inadvertently. You might also want to adopt a file-naming convention that includes a beginning or ending date for the data in the file name.
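A date-based naming convention might look like the following sketch. The pattern ERRLOG_yyyy-mm-dd.OLD is an assumption made for illustration; the system does not require any particular name. The F$CVTIME lexical function returns the current date in comparison format (yyyy-mm-dd), which here marks the ending date of the data in the renamed file.

```dcl
$ ! Sketch: rename the current log to a date-stamped name such as ERRLOG_1996-04-18.OLD.
$ ! The ERRLOG_yyyy-mm-dd.OLD pattern is an assumed convention, not a system requirement.
$ TODAY = F$CVTIME("","COMPARISON","DATE")     ! current date as yyyy-mm-dd
$ RENAME SYS$ERRORLOG:ERRLOG.SYS SYS$ERRORLOG:ERRLOG_'TODAY'.OLD
```

Because comparison-format dates sort in chronological order, a directory listing of files named this way shows the logs oldest first.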
The Error Formatter (ERRFMT) allows users to send mail to the system manager or to another designated user if the ERRFMT process encounters a fatal error and deletes itself.
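Two system logical names control this feature: ERRFMT$_SEND_MAIL, which enables (TRUE) or disables (FALSE) sending mail, and ERRFMT$_SEND_TO, which names the user who receives the mail (the SYSTEM account by default).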
You can define these logical names in one of two ways:
If ERRFMT$_SEND_MAIL is defined to be TRUE, you receive a mail message with a subject line saying that ERRFMT is about to delete itself. The operator log file and the output displayed at the system console, OPA0:, contain more detailed information about the failure encountered and instructions on how to restart ERRFMT; however, you are often not at the console to see this information.
If sending mail is enabled and you want to disable it, use the system manager's account to edit SYS$STARTUP:SYLOGICALS.COM, adding the following command:
$ DEFINE/SYSTEM ERRFMT$_SEND_MAIL FALSE
To reenable sending mail, use the system manager's account to edit SYS$STARTUP:SYLOGICALS.COM, adding the following command:
$ DEFINE/SYSTEM ERRFMT$_SEND_MAIL TRUE
Sending mail to the SYSTEM account is enabled by default. However, you can define ERRFMT$_SEND_TO to send mail to another user if ERRFMT is about to delete itself.
To change the user name that receives mail, use the system manager's account to edit SYS$STARTUP:SYLOGICALS.COM, adding an appropriate DEFINE command. For example:
$ DEFINE/SYSTEM ERRFMT$_SEND_TO R_SMITH
Digital recommends that you do not use distribution lists or multiple user names.
Use the Error Log utility (ERROR LOG) to report selectively on the contents of an error log file. You must have the SYSPRV privilege to run ERROR LOG.
ERROR LOG supports most OpenVMS-supported hardware, such as adapters, disks, tapes, CPUs, and memories, but not all communications devices. Some synchronous communications devices are supported.
The operating system automatically writes messages to the latest version of an error log file, SYS$ERRORLOG:ERRLOG.SYS, as the events shown in Table 18-3 occur.
Event | Description |
---|---|
Errors | Device errors, device timeouts, machine checks, bus errors, memory errors (hard or soft error correcting code [ECC] errors), asynchronous write errors, and undefined interrupts |
Volume changes | Volume mounts and dismounts |
System events | System startups, messages from the Send Message to Error Logger ($SNDERR) system service, and time stamps |
You can use ERROR LOG to process error log entries for the following forms of optional output:
Section 18.4.2 explains how to produce error log reports. See the OpenVMS System Management Utilities Reference Manual for examples of error log reports.
The error reports that ERROR LOG produces are useful in two ways:
The detailed contents of the reports are most meaningful to Digital support representatives. However, you can use the reports as an important indicator of the system's reliability. For example, using the DCL command SHOW ERROR, you might see that a particular device is producing a relatively high number of errors. You can then use ERROR LOG to obtain a more detailed report and decide whether to consult your support representative.
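The sequence described above, checking error counts and then requesting a detailed report for one device, might be sketched as follows. The device name DUA2 and the output file name are placeholders for illustration; use whichever device SHOW ERROR reports with a high count.

```dcl
$ SHOW ERROR                                   ! display per-device error counts
$ ! DUA2 stands in for the device with the high error count (hypothetical).
$ ANALYZE/ERROR_LOG/INCLUDE=DUA2/OUTPUT=DUA2_ERRORS.LIS
```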
If a system component fails, a Digital support representative can study the error reports of the system activity leading up to and including the failure. If a device fails, you can generate error reports immediately after the failure; for example:
Your support representative can then run the appropriate diagnostic program for a thorough analysis of the failed device. Using the combined error logging and diagnostic information, your support representative can proceed to correct the device.
Error reports allow you to anticipate potential failures. Effective use of the Error Log utility in conjunction with diagnostic programs can significantly reduce the amount of system downtime.
Enter the DCL command ANALYZE/ERROR_LOG in the following format:
ANALYZE/ERROR_LOG [/qualifier(s)][filespec[,...]]
where:
Item | Description |
---|---|
qualifier | Specifies the function the ANALYZE/ERROR_LOG command is to perform. |
filespec | Specifies one or more files that contain information to be interpreted for the error log report. |
See the OpenVMS System Management Utilities Reference Manual for details about the command and its parameters and for examples of error log reports.
ERROR LOG issues error messages for inconsistent error log entries. Use the Help Message facility to look up explanations and suggested user actions for these messages.
The following steps show how to produce an error log report for all entries in the error log file and how to print the report:
$ SET PROCESS/PRIVILEGE=SYSPRV
$ SET DEFAULT SYS$ERRORLOG
$ DIRECTORY
$ ANALYZE/ERROR_LOG/OUTPUT=ERRORS.LIS
$ PRINT ERRORS.LIS
Example
$ SET PROCESS/PRIVILEGE=SYSPRV
$ SET DEFAULT SYS$ERRORLOG
$ DIRECTORY   (1)

Directory SYS$SYSROOT:[SYSERR]

ERRLOG.OLD;2     ERRLOG.OLD;1     ERRLOG.SYS;1

Total of 3 files.
$ ANALYZE/ERROR_LOG/OUTPUT=ERRORS.LIS ERRLOG.OLD   (2)
$ PRINT ERRORS.LIS   (3)
Following are explanations of the commands in the example.
This section briefly explains how to specify report formats and produce a report of selected entries.
Table 18-4 contains error log report options. For more details about options and examples of error log reports using options, see the OpenVMS System Management Utilities Reference Manual.
In Order To... | You Can... |
---|---|
Specify report formats | Change report formats by using qualifiers, including the following: |
Specify a display device for reports | Use the /OUTPUT qualifier to send reports to a terminal for display or to a disk or magnetic tape file. By default, the system sends the report to the SYS$OUTPUT device. Because error log reports are 72 columns wide, you can display them on the terminal screen. |
Produce a report of selected entries | Use qualifiers to produce error log reports for specific types of events and for a specified time interval. For example, you can process error log entries by selecting a time interval using the /SINCE, /BEFORE, or /ENTRY qualifiers. You can specify error log entries for specific events by using the qualifiers /INCLUDE and /EXCLUDE. These qualifiers form a filter to determine which error log entries are selected or rejected. In addition, you can generate error log reports for one or more OpenVMS Cluster members by using the /NODE qualifier. |
Exclude unknown error log entries | By default, when ANALYZE/ERROR_LOG encounters an unknown device, CPU, or error log entry, the utility produces the entry in hexadecimal longword format. Exclude these entries from the report by specifying /EXCLUDE=UNKNOWN_ENTRIES in the command line. |
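The selection options in Table 18-4 can be combined on one command line. The following commands are a sketch only: the dates, the node name COGENT, and the output file name are placeholders, and you should check keyword spellings against the OpenVMS System Management Utilities Reference Manual.

```dcl
$ ! Disk entries logged since midnight, displayed on SYS$OUTPUT
$ ANALYZE/ERROR_LOG/INCLUDE=DISKS/SINCE=TODAY
$ ! Entries in a one-hour window from a renamed log, omitting unknown entries
$ ANALYZE/ERROR_LOG/SINCE=18-APR-1996:09:00/BEFORE=18-APR-1996:10:00/EXCLUDE=UNKNOWN_ENTRIES ERRLOG.OLD
$ ! Entries for one cluster member, written to a file (node name is hypothetical)
$ ANALYZE/ERROR_LOG/NODE=COGENT/OUTPUT=COGENT_ERRORS.LIS
```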
The DECevent Event Management utility (DECevent) provides the interface between a system user and the operating system's event log files.
DECevent allows system users to produce ASCII reports derived from system event entries. The format of the ASCII reports depends on the command entered at the command language interpreter (CLI); a command can contain a maximum of 255 characters.
DECevent uses the error log file, SYS$ERRORLOG:ERRLOG.SYS, as the default input file, unless you specify another input file.
Event reports are useful for determining preventive maintenance by helping to identify areas within the system showing potential failure. Event reports also aid in the diagnosis of a failure by documenting events that led to the failure.
The contents of the event reports are most meaningful to your Digital support representative. However, you can use the event reports as an indicator of system reliability. For example, while using the DCL command SHOW ERROR, you might see that a particular device is producing a higher than normal number of events. You can use DECevent to obtain various detailed reports and determine if you need to contact your Digital support representative.
If a system component fails, your Digital support representative can use the event reports to create a history of events leading up to and including the failure.
Used in conjunction with diagnostic programs, event reports can significantly reduce system downtime.
DECevent produces five types of reports:
Report Type | Description |
---|---|
Full (default) | Provides a translation of all available information for each entry in the event log. |
Brief | Provides a translation of key information for each entry in the event log. |
Terse | Provides binary event information and displays register values and other ASCII messages in a condensed format. |
Summary | Provides a statistical summary of the event entries in the event log. |
Fast Error (FSTERR) | Provides a quick, one-line-per-entry report of your event log for a variety of disk devices. |
These report types are mutually exclusive; in other words, you can select only one report type in a command.
Section 18.5.5 contains examples of types of reports. The OpenVMS System Management Utilities Reference Manual contains additional examples of the types of reports produced by DECevent.
The following sections explain how to use DECevent:
Task | Section |
---|---|
Invoking and exiting DECevent | Section 18.5.2 |
Using DECevent qualifiers | Section 18.5.3 |
Using additional DECevent commands | Section 18.5.4 |
Producing DECevent reports | Section 18.5.5 |
In addition, restrictions are listed in Section 18.5.6.
To invoke DECevent, enter the following command:
$ DIAGNOSE/TRANSLATE [/qualifier(s)] [file-spec][,...]
Note
The /TRANSLATE qualifier is the default qualifier; typing it on the command line is not necessary.
DECevent does not prompt you. To exit from DECevent, press Ctrl/C and Return (otherwise, no prompt is returned).
The DECevent qualifiers shown and described in Table 18-5 allow you to change the format of the reports that DECevent produces.
Qualifier | Description |
---|---|
/BEFORE | Specifies that only those entries dated earlier than the stated date and time are to be selected for the event report |
/BINARY | Controls whether the binary error log records are converted to ASCII text or copied to the specified output file |
/BRIEF | Generates a brief report |
/CONTINUOUS | Specifies events are formatted in real time, as they are logged by the operating system event logger |
/DUMP | Specifies the output to be a brief report followed by a dump of information from the input event log file |
/ENTRY | Generates a report that includes the specified entry range or starts at the specified entry number |
/EXCLUDE | Excludes events generated by the specified device class, device name, or error log entry type from the report |
/FSTERR | Generates a quick, one-line-per-entry report for an event log entry for disks |
/FULL | Generates a full report (default), which provides all available information for an event log entry |
/INCLUDE | Includes events generated by the specified device class, device name, or error log entry type in the report |
/INTERACTIVE | Allows users to exit from the command line interface and enter the DECevent interactive command shell |
/LOG | Controls whether informational messages that specify the number of entries selected and rejected for each input file are sent to SYS$OUTPUT |
/NODE | Generates a report consisting of event entries for specific nodes in a cluster |
/OUTPUT | Specifies the output file for the report |
/REJECTED | Allows you to specify the name of a file that will contain binary records for rejected entries |
/SINCE | Specifies that only those entries dated later than the stated date and time are to be selected for the report |
/SUMMARY | Generates an event report that consists of a statistical summary |
/TERSE | Generates an event report consisting of binary event information, register values and ASCII messages in a condensed format |
/TRANSLATE | Is the default qualifier for the DIAGNOSE command verb |
Do not use the /BINARY qualifier with any report type qualifier (/FULL, /BRIEF, /TERSE, /SUMMARY, and /FSTERR) or with the /OUTPUT qualifier.
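As a sketch of how the qualifiers in Table 18-5 combine, the following commands each select one report type and add selection or output qualifiers. The date, the node name, and the file names are placeholders, and the value syntax shown for /SINCE and /NODE is an assumption based on usual DCL conventions; verify it against the DECevent documentation before relying on it.

```dcl
$ ! Brief report of entries logged after a given time, written to a file
$ DIAGNOSE/BRIEF/SINCE=18-APR-1996:09:00/OUTPUT=EVENTS.LIS
$ ! Statistical summary of a renamed log file
$ DIAGNOSE/SUMMARY ERRLOG.OLD
$ ! Terse report restricted to one cluster node (node name is hypothetical)
$ DIAGNOSE/TERSE/NODE=COGENT
```

Each command names only one report type, because the report type qualifiers are mutually exclusive.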
In addition to the qualifiers listed in Table 18-5, DECevent contains a set of DIRECTORY commands and a set of SHOW commands:
This section contains examples of DECevent commands and reports.
To produce a full report, use the /FULL qualifier. The full report format provides a translation of all available information for each entry in the event log. The full report is the default report type if a report type is not specified in the command line.
Both of the following commands produce a full report:
$ DIAGNOSE/TRANSLATE/FULL
$ DIAGNOSE
(/TRANSLATE and /FULL are defaults.)
Example 18-1 shows the format of a full report.
Example 18-1 Full Report Format
******************************** ENTRY 1 ********************************

Logging OS                        1. OpenVMS
System Architecture               2. Alpha
OS version                           V7.1
Event sequence number          1583.
Timestamp of occurrence              18-APR-1996 09:21:18
System uptime in seconds      58004.
Error mask                     x00000000
Flags                              x0001 Dynamic Device Recognition present
Host name                            COGENT

Alpha HW model                       DEC 3000 Model 400
System type register           x00000004 DEC 3000
Unique CPU ID                  x00000002
mpnum                          x000000FF
mperr                          x000000FF

Event validity                   -1. Unknown validity code
Event severity                   -1. Unknown severity code
Entry type                      100.
Major Event class                 3. IO Subsystem

IO Minor Class                    1. MSCP
IO Minor Sub Class                5. Logged Message

---- Device Profile ----
Vendor
Product Name                         RAID 0 - Host Based
Unit Name                            COGENT$DPA
Unit Number                      10.
Device Class                       x0001 Disk

---- IO SW Profile ----
VMS DC$_CLASS                     1.
VMS DT$_TYPE                    175.

---- MSCP Logged Msg ----
Logged Message Type Code         22. RAID Message
RAID Event Type                   8. Remove Member
Distinguished Member              0.
Member Index                      1.
RAID Urgency                      4. Global Disk Error
RAID Status                    x00180009 Bit 00 - Reduced
                                         Bit 03 - Striped
                                         Bit 19 - FE Dis FE
                                         Bit 20 - BC Buff Copy Off
RAIDset Name                         KGB
****************************************************************************
To produce a brief report, use the /BRIEF qualifier. The brief report format provides translation of key information for each entry in the event log. For example:
$ DIAGNOSE/TRANSLATE/BRIEF
Copyright © Digital Equipment Corporation 1996. All Rights Reserved.