Guidelines for OpenVMS Cluster Configurations

You can run the CLUSTER_CONFIG.COM procedure to set up an additional node in a SCSI cluster, as shown in Example A-2.

Example A-2 Adding a Node to a SCSI Cluster

$ @SYS$MANAGER:CLUSTER_CONFIG 
 
           Cluster Configuration Procedure 
 
 
    Use CLUSTER_CONFIG.COM to set up or change an OpenVMS Cluster configuration. 
    To ensure that you have the required privileges, invoke this procedure 
    from the system manager's account. 
 
    Enter ? for help at any prompt. 
 
            1. ADD a node to a cluster. 
            2. REMOVE a node from the cluster. 
            3. CHANGE a cluster member's characteristics. 
            4. CREATE a duplicate system disk for CLU21. 
            5. EXIT from this procedure. 
 
    Enter choice [1]: 
 
    The ADD function adds a new node to a cluster. 
 
    If the node being added is a voting member, EXPECTED_VOTES in 
    every cluster member's MODPARAMS.DAT must be adjusted, and the 
    cluster must be rebooted. 
 
 
    WARNING - If this cluster is running with multiple system disks and 
              if common system files will be used, please, do not 
              proceed unless you have defined appropriate logical 
              names for cluster common files in SYLOGICALS.COM. 
              For instructions, refer to the OpenVMS Cluster Systems 
              manual. 
 
 
              Do you want to continue [N]? y 
 
    If the new node is a satellite, the network databases on CLU21 are 
    updated. The network databases on all other cluster members must be 
    updated. 
 
    For instructions, refer to the OpenVMS Cluster Systems manual. 
 
What is the node's DECnet node name? SATURN 
What is the node's DECnet node address? 7.77 
Is SATURN to be a clustered node with a shared SCSI bus (Y/N)? y 
Will SATURN be a satellite [Y]? N 
Will SATURN be a boot server [Y]? 
 
    This procedure will now ask you for the device name of SATURN's system root. 
    The default device name (DISK$BIG_X5T5:) is the logical volume name of 
    SYS$SYSDEVICE:. 
 
What is the device name for SATURN's system root [DISK$BIG_X5T5:]? 
What is the name of SATURN's system root [SYS10]? SYS2 
    Creating directory tree SYS2 ... 
    System root SYS2 created 
 
    NOTE: 
        All nodes on the same SCSI bus must be members of the same cluster 
        and must all have the same non-zero disk allocation class or each 
        will have a different name for the same disk and data corruption 
        will result. 
 
Enter a value for SATURN's ALLOCLASS parameter [7]: 
Does this cluster contain a quorum disk [N]? 
Updating network database... 
Size of pagefile for SATURN [10000 blocks]? 
   .
   .
   .

A.7.2 Error Reports and OPCOM Messages in Multihost SCSI Environments

Certain common operations, such as booting or shutting down a host on a multihost SCSI bus, can cause other hosts on the SCSI bus to experience errors. In addition, certain errors that are unusual in a single-host SCSI configuration may occur more frequently on a multihost SCSI bus.

These errors are transient errors that OpenVMS detects, reports, and recovers from without losing data or affecting applications that are running. This section describes the conditions that generate these errors and the messages that are displayed on the operator console and entered into the error log.

A.7.2.1 SCSI Bus Resets

When a host connected to a SCSI bus first starts, either by being turned on or by rebooting, it does not know the state of the SCSI bus and the devices on it. The ANSI SCSI-2 standard provides a method called BUS RESET to force the bus and its devices into a known state. A host typically asserts a RESET signal one or more times on each of its SCSI buses when it first starts up and when it shuts down. While this is a normal action on the part of the host asserting RESET, other hosts consider this RESET signal an error because RESET requires that the hosts abort and restart all I/O operations that are in progress.

A host may also reset the bus in the midst of normal operation if it detects a problem that it cannot correct in any other way. These kinds of resets are uncommon, but they occur most frequently when something on the bus is disturbed. For example, an attempt to hot plug a SCSI device while the device is still active (see Section A.7.6) or halting one of the hosts with Ctrl/P can cause a condition that forces one or more hosts to issue a bus reset.

A.7.2.2 SCSI Timeouts

When a host exchanges data with a device on the SCSI bus, there are several different points where the host must wait for the device or the SCSI adapter to react. In an OpenVMS system, the host is allowed to do other work while it is waiting, but a timer is started to make sure that it does not wait too long. If the timer expires without a response from the SCSI device or adapter, this is called a timeout.

There are three kinds of timeouts:

- Disconnect timeout: The device accepted a command from the host and disconnected from the bus while it processed the command but never reconnected to the bus to finish the transaction. This error happens most frequently when the bus is very busy. See Section A.7.5 for more information. The disconnect timeout period varies with the device, but for most disks, it is about 20 seconds.
- Selection timeout: The host tried to send a command to a device on the SCSI bus, but the device did not respond. This condition might happen if the device did not exist or if it were removed from the bus or powered down. (This failure is not more likely with a multi-initiator system; it is mentioned here for completeness.) The selection timeout period is about 0.25 seconds.
- Interrupt timeout: The host expected the adapter to respond for any other reason, but it did not respond. This error is usually an indication of a busy SCSI bus. It is more common if you have initiator unit numbers set low (0 or 1) rather than high (6 or 7). The interrupt timeout period is about 4 seconds.

Timeout errors are not inevitable on SCSI OpenVMS Cluster systems. However, they are more frequent on SCSI buses with heavy traffic and those with two initiators. They do not necessarily indicate a hardware or software problem. If they are logged frequently, you should consider ways to reduce the load on the SCSI bus (for example, adding an additional bus).
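
The timer behavior described in this section can be modeled with a short Python sketch. It is an illustration only, not OpenVMS driver code: the names `TIMEOUT_PERIODS` and `wait_for_device` and the `scale` parameter are hypothetical, and the periods are the approximate values quoted above.

```python
import threading

# Approximate timeout periods from this section, in seconds.
TIMEOUT_PERIODS = {
    "disconnect": 20.0,  # varies with the device; about 20 s for most disks
    "selection": 0.25,
    "interrupt": 4.0,
}

def wait_for_device(response: threading.Event, kind: str, scale: float = 1.0) -> bool:
    """Wait for a (simulated) device response.

    Returns True if the response arrived before the timer expired and
    False if the wait ended in a timeout. `scale` is a hypothetical knob
    that shortens the periods for experiments.
    """
    return response.wait(TIMEOUT_PERIODS[kind] * scale)

if __name__ == "__main__":
    silent_device = threading.Event()      # never signals: selection timeout
    print(wait_for_device(silent_device, "selection"))
    ready_device = threading.Event()
    ready_device.set()                     # already responded: no timeout
    print(wait_for_device(ready_device, "disconnect"))
```

In this model the caller is free to do other work on another thread while the wait is pending, which mirrors the description above only loosely.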

A.7.2.3 Mount Verify

Mount verify is a condition declared by a host about a device. The host declares this condition in response to a number of possible transient errors, including bus resets and timeouts. When a device is in the mount verify state, the host suspends normal I/O to it until the host can determine that the correct device is there, and that the device is accessible. Mount verify processing then retries outstanding I/Os in a way that ensures that the correct data is written or read. Application programs are unaware that a mount verify condition has occurred as long as the mount verify completes.

If the host cannot access the correct device within a certain amount of time, it declares a mount verify timeout, and application programs are notified that the device is unavailable. Manual intervention is required to restore a device to service after the host has declared a mount verify timeout. A mount verify timeout usually means that the error is not transient. The system manager can choose the timeout period for mount verify; the default is one hour.

A.7.2.4 Shadow Volume Processing

Shadow volume processing is a process similar to mount verify, but it is for shadow set members. An error on one member of a shadow set places the set into the volume processing state, which blocks I/O while OpenVMS attempts to regain access to the member. If access is regained before shadow volume processing times out, then the outstanding I/Os are reissued and the shadow set returns to normal operation. If a timeout occurs, then the failed member is removed from the set. The system manager can select one timeout value for the system disk shadow set, and one for application shadow sets. The default value for both timeouts is 20 seconds.

Note
The SCSI disconnect timeout and the default shadow volume processing timeout are the same. If the SCSI bus is heavily utilized so that disconnect timeouts may occur, it may be desirable to increase the value of the shadow volume processing timeout. (A recommended value is 60 seconds.) This may prevent shadow set members from being expelled when they experience disconnect timeout errors.

A.7.2.5 Expected OPCOM Messages in Multihost SCSI Environments

When a bus reset occurs, an OPCOM message is displayed as each mounted disk enters and exits mount verification or shadow volume processing.

When an I/O to a drive experiences a timeout error, an OPCOM message is displayed as that drive enters and exits mount verification or shadow volume processing.

If a quorum disk on the shared SCSI bus experiences either of these errors, then additional OPCOM messages may appear, indicating that the connection to the quorum disk has been lost and regained.

A.7.2.6 Error Log Basics

In the OpenVMS system, the Error Log utility allows device drivers to save information about unusual conditions that they encounter. In the past, most of these unusual conditions have happened as a result of errors such as hardware failures, software failures, or transient conditions (for example, loose cables).

If you type the DCL command SHOW ERROR, the system displays a summary of the errors that have been logged since the last time the system booted. For example:

$ SHOW ERROR
Device                           Error Count 
SALT$PKB0:                               6 
$1$DKB500:                              10 
PEA0:                                    1 
SALT$PKA0:                               9 
$1$DKA0:                                 0

In this case, 6 errors have been logged against host SALT's SCSI port B (PKB0), 10 have been logged against disk $1$DKB500, and so forth.
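
If you capture the SHOW ERROR display in a file and want to compare counts between hosts or over time, a small script can turn it into a table. The following Python sketch assumes the two-column layout shown above; the function name `parse_show_error` is hypothetical, and the sample text is copied from the example.

```python
SAMPLE = """\
$ SHOW ERROR
Device                           Error Count
SALT$PKB0:                               6
$1$DKB500:                              10
PEA0:                                    1
SALT$PKA0:                               9
$1$DKA0:                                 0
"""

def parse_show_error(output: str) -> dict:
    """Map each device name (without the trailing colon) to its error count."""
    counts = {}
    for line in output.splitlines():
        parts = line.split()
        # Data lines have exactly two fields: "DEVICE:" and a decimal count.
        # The command line and the column header have three fields each.
        if len(parts) == 2 and parts[0].endswith(":") and parts[1].isdigit():
            counts[parts[0][:-1]] = int(parts[1])
    return counts

if __name__ == "__main__":
    for device, count in parse_show_error(SAMPLE).items():
        print(device, count)
```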

To see the details of these errors, you can use the command ANALYZE/ERROR/SINCE=dd-mmm-yyyy:hh:mm:ss at the DCL prompt. The output from this command displays a list of error log entries with information similar to the following:

******************************* ENTRY    2337. ******************************* 
 ERROR SEQUENCE 6.                               LOGGED ON:  CPU_TYPE 00000002 
 DATE/TIME 29-MAY-1995 16:31:19.79                            SYS_TYPE 0000000D 
 
<identification information> 
 
       ERROR TYPE            03 
                                       COMMAND TRANSMISSION FAILURE 
       SCSI ID               01 
                                       SCSI ID = 1. 
       SCSI LUN              00 
                                       SCSI LUN = 0. 
       SCSI SUBLUN           00 
                                       SCSI SUBLUN = 0. 
       PORT STATUS     00000E32 
                                       %SYSTEM-E-RETRY, RETRY OPERATION 
 
<additional information>

For this discussion, the key elements are the ERROR TYPE and, in some instances, the PORT STATUS fields. In this example, the error type is 03, COMMAND TRANSMISSION FAILURE, and the port status is 00000E32, SYSTEM-E-RETRY.
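
As an illustration, the two key fields can be pulled out of a saved entry with a regular expression. This Python sketch assumes the field labels and spacing shown in the sample entry above; `key_fields` is a hypothetical helper, and the values are kept as text because this section does not state their radix.

```python
import re

SAMPLE_ENTRY = """\
 ERROR SEQUENCE 6.                               LOGGED ON:  CPU_TYPE 00000002
       ERROR TYPE            03
                                       COMMAND TRANSMISSION FAILURE
       PORT STATUS     00000E32
                                       %SYSTEM-E-RETRY, RETRY OPERATION
"""

def key_fields(entry: str):
    """Return (error_type, port_status) as strings, or None where absent."""
    def field(label):
        match = re.search(label + r"\s+([0-9A-Fa-f]+)\b", entry)
        return match.group(1) if match else None
    return field("ERROR TYPE"), field("PORT STATUS")

if __name__ == "__main__":
    print(key_fields(SAMPLE_ENTRY))
```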

A.7.2.7 Error Log Entries in Multihost SCSI Environments

The error log entries listed in this section are likely to be logged in a multihost SCSI configuration, and you usually do not need to be concerned about them. You should, however, examine any error log entries for messages other than those listed in this section.

ERROR TYPE 0007, BUS RESET DETECTED
Occurs when the other system asserts the SCSI bus reset signal. This happens when:
- A system's power-up self-test runs.
- A console INIT command is executed.
- The EISA Configuration Utility (ECU) is run.
- The console BOOT command is executed (in this case, several resets occur).
- System shutdown completes.
- The system detects a problem with an adapter or a SCSI bus (for example, an interrupt timeout).
This error causes all mounted disks to enter mount verification.
ERROR TYPE 05, EXTENDED SENSE DATA RECEIVED
When a SCSI bus is reset, an initiator must get "sense data" from each device. When the initiator gets this data, an EXTENDED SENSE DATA RECEIVED error is logged. This is expected behavior.
ERROR TYPE 03, COMMAND TRANSMISSION FAILURE
PORT STATUS E32, SYSTEM-E-RETRY
Occasionally, one host may send a command to a disk while the disk is exchanging error information with the other host. Many disks respond with a SCSI "BUSY" code. The OpenVMS system responds to a SCSI BUSY code by logging this error and retrying the operation. You are most likely to see this error when the bus has been reset recently. This error does not always happen near resets, but when it does, the error is expected and unavoidable.
ERROR TYPE 204, TIMEOUT
An interrupt timeout has occurred (see Section A.7.2.2). The disk is put into mount verify when this error occurs.
ERROR TYPE 104, TIMEOUT
A selection timeout has occurred (see Section A.7.2.2). The disk is put into mount verify when this error occurs.
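
The list above can be expressed as a simple filter that separates expected entries from those that deserve a closer look. The Python sketch below is only a model of the list: `EXPECTED_TYPES`, `_normalize`, and `is_expected` are hypothetical names, and stripping leading zeros before comparing is an assumption made so that "0007" and "07" are treated alike.

```python
# Error types that this section describes as expected on a multihost SCSI bus.
EXPECTED_TYPES = {
    "7": "BUS RESET DETECTED",
    "5": "EXTENDED SENSE DATA RECEIVED",
    "204": "TIMEOUT (interrupt)",
    "104": "TIMEOUT (selection)",
}

def _normalize(field: str) -> str:
    """Strip blanks and leading zeros so '0007' and '07' compare equal."""
    return field.strip().lstrip("0").upper() or "0"

def is_expected(error_type: str, port_status: str = "") -> bool:
    """True if the entry matches one of the entries listed in this section."""
    etype = _normalize(error_type)
    if etype == "3":
        # COMMAND TRANSMISSION FAILURE is listed only with SYSTEM-E-RETRY (E32).
        return _normalize(port_status) == "E32"
    return etype in EXPECTED_TYPES

if __name__ == "__main__":
    print(is_expected("03", "00000E32"))   # listed: type 03 with port status E32
    print(is_expected("03", "00000001"))   # not listed: examine this entry
```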

A.7.3 Restrictions and Known Problems

The current release of OpenVMS Cluster software has the following restrictions when multiple hosts are configured on the same SCSI bus:

A node's access to a disk will not fail over from a direct SCSI path to an MSCP served path. This is not expected to be a significant limitation, because most of the failures that cause a SCSI disk to become inaccessible to one node on the SCSI bus affect all the nodes on the SCSI bus. Thus, when a failure occurs, the served path to the disk tends to fail at the same time that the direct path fails.
Conversely, a node's access to a disk will not fail over from an MSCP served path to a direct SCSI path. Normally, this type of failover is not a consideration, because when OpenVMS discovers both a direct and a served path, it chooses the direct path permanently. However, you must avoid situations in which the MSCP served path becomes available first and is selected by OpenVMS before the direct path becomes available. To avoid this situation, observe the following rules:
- A node that has a direct path to a SCSI system disk must boot the disk directly from the SCSI port, not over the LAN.
- If a node is running the MSCP server, then a SCSI disk must not be added to the multihost SCSI bus after a second node boots (either by physically inserting it or by reconfiguring an HSZ40).
  If you add a device after two nodes boot and then configure the device using SYSMAN, the device might become visible to one of the systems through the served path before the direct path is visible. Depending upon the timing of various events, this problem can sometimes be avoided by using the following procedure:
```
$ MCR SYSMAN 
SYSMAN> SET ENVIRONMENT/CLUSTER 
SYSMAN> IO AUTOCONFIGURE 
```
  To ensure that the direct path to a new device is used (including HSZ40 virtual devices), reboot each node after a device is added.
  If there are two paths to a device, the $DEVICE_SCAN system service and the F$DEVICE lexical function list each device on a shared bus twice. Devices on the shared bus are also listed twice in the output from the DCL command SHOW DEVICE if you boot a non-SCSI system disk. These double listings are errors in the display programs. They do not indicate a problem or imply that the MSCP served path is being used instead of the direct SCSI path. These display errors are expected to be corrected in a future release of the operating system.
When a system powers up, boots, or shuts down, it resets the SCSI bus. These resets cause other hosts on the SCSI bus to experience I/O errors. For Files-11 volumes, the Mount Verification facility automatically recovers from these errors and completes the I/O. As a result, the user's process continues to run without error.
This level of error recovery is not possible for volumes that are mounted with the /FOREIGN qualifier. Instead, the user's process receives an I/O error notification if it has I/O outstanding when a bus reset occurs.
If possible, avoid mounting foreign devices on multihost SCSI buses. If foreign devices are mounted on the shared bus, make sure that systems on that bus do not assert a SCSI bus reset while I/O is being done to foreign devices.
When the ARC console is enabled on a multihost SCSI bus, it sets the SCSI target ID for all local host adapters to 7. This setting causes a SCSI ID conflict if there is already a host or device on a bus at ID 7. A conflict of this type typically causes the bus, and possibly all the systems on the bus, to hang.
The ARC console is used to access certain programs, such as the KZPSA configuration utilities. If you must run the ARC console, first disconnect the system from multihost SCSI buses and from buses that have a device at SCSI ID 7.
Any SCSI bus resets that occur when a system powers up, boots, or shuts down cause other systems on the SCSI bus to log errors and display OPCOM messages. This is expected behavior and does not indicate a problem.
Abruptly halting a system on a multihost SCSI bus (for example, by typing Ctrl/P on the console) may leave the SCSI adapter in a state that can interfere with the operation of the other host on the bus. You should initialize, boot, or continue an abruptly halted system as soon as possible after it has been halted.
All I/O to a disk drive must be stopped while its microcode is updated. This typically requires more precautions in a multihost environment than are needed in a single-host environment. Refer to Section A.7.6.3 for the necessary procedures.
The EISA Configuration Utility (ECU) causes a large number of SCSI bus resets. These resets cause the other system on the SCSI bus to pause while its I/O subsystem recovers. It is suggested (though not required) that both systems on a shared SCSI bus be shut down when the ECU is run.

The current release of OpenVMS Cluster systems also places one new restriction on the SCSI quorum disk, whether the disk is located on a single-host SCSI bus or a multihost SCSI bus: the SCSI quorum disk must support tagged command queuing (TCQ). This is required because of the special handling that quorum I/O receives in the OpenVMS SCSI drivers.

This restriction is not expected to be significant, because all disks on a multihost SCSI bus must support tagged command queuing (see Section A.7.7), and because quorum disks are normally not used on single-host buses.

A.7.4 Troubleshooting

The following sections describe troubleshooting tips for solving common problems in an OpenVMS Cluster system that uses a SCSI interconnect.

A.7.4.1 Termination Problems

Verify that two terminators are on every SCSI interconnect (one at each end of the interconnect). The BA350 enclosure, the DWZZA, and the KZxxx adapters have internal terminators that are not visible externally (see Section A.4.4).

A.7.4.2 Booting or Mounting Failures Caused by Incorrect Configurations

OpenVMS automatically detects configuration errors described in this section and prevents the possibility of data loss that could result from such configuration errors, either by bugchecking or by refusing to mount a disk.

A.7.4.2.1 Bugchecks During the Bootstrap Process

Three types of configuration error can cause a bugcheck during booting. They are described in this section. In each case, the bugcheck code is: VAXCLUSTER, Error detected by OpenVMS Cluster software.

When OpenVMS boots, it determines which devices are present on the SCSI bus by sending an inquiry command to every SCSI ID. When a device receives the inquiry, it indicates its presence by returning data that indicates whether it is a disk, tape, or processor.

Some processor devices (host adapters) answer the inquiry without assistance from the operating system; others require that the operating system be running. The adapters supported in OpenVMS Cluster systems require the operating system to be running. These adapters, with the aid of OpenVMS, pass information in their response to the inquiry that allows the recipient to detect the following configuration errors:

Different controller device names on the same SCSI bus
Unless a port allocation class is being used, the OpenVMS device name of each adapter on the SCSI bus must be identical (for example, all named PKC0). Otherwise, the OpenVMS Cluster software cannot coordinate the host's accesses to storage (see Section A.6.2 and Section A.6.3).
OpenVMS can check this automatically because it sends the controller letter in the inquiry response. A booting system receives this response, and it compares the remote controller letter with the local controller letter. If a mismatch is detected, then an OPCOM message is printed, and the system stops with a VAXCLUSTER bugcheck to prevent the possibility of data loss. See the description of the NOMATCH error in the Help Message utility. (To use the Help Message utility for NOMATCH, enter HELP/MESSAGE NOMATCH at the DCL prompt.)
Different or zero allocation class values
Each host on the SCSI bus must have the same nonzero disk allocation class value, or matching port allocation class values. Otherwise, the OpenVMS Cluster software cannot coordinate the host's accesses to storage (see Section A.6.2 and Section A.6.3).
OpenVMS is able to automatically check this, because it sends the needed information in the inquiry response. A booting system receives this response, and compares the remote value with the local value. If a mismatch or a zero value is detected, then an OPCOM message is printed, and the system stops with a VAXCLUSTER bugcheck to prevent the possibility of data loss. See the description of the ALLODIFF and ALLOZERO errors in the Help Message utility.
Unsupported processors
There may be processors on the SCSI bus that are not running OpenVMS or that do not return the controller name or allocation class information needed to validate the configuration. If a booting system receives an inquiry response and the response does not contain the special OpenVMS configuration information, then an OPCOM message is printed and a VAXCLUSTER bugcheck occurs. See the description of the CPUNOTSUP error in either the Help Message utility or in the OpenVMS Version 6.2 New Features Manual.
(If your system requires the presence of an OpenVMS Cluster processor device on a SCSI bus, then refer to the CPUNOTSUP message description in either the Help Message utility or in the OpenVMS Version 6.2 New Features Manual for instructions on the use of a special SYSGEN parameter for this case.)
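
The first two checks can be summarized in a short Python model. This is a sketch of the comparison logic only, not the OpenVMS implementation: the `Host` class and `check_bus` function are hypothetical, port allocation classes are ignored for simplicity, and the pairing of ALLOZERO with a zero value and ALLODIFF with a mismatch is inferred from the message names.

```python
from dataclasses import dataclass

@dataclass
class Host:
    name: str
    controller: str   # OpenVMS device name of the adapter, for example "PKC0"
    alloclass: int    # disk allocation class (ALLOCLASS)

def check_bus(local: Host, remote: Host) -> list:
    """Return the names of the configuration errors a booting host would detect."""
    errors = []
    if local.controller != remote.controller:
        errors.append("NOMATCH")       # different controller device names
    if local.alloclass == 0 or remote.alloclass == 0:
        errors.append("ALLOZERO")      # assumed meaning: zero allocation class
    elif local.alloclass != remote.alloclass:
        errors.append("ALLODIFF")      # assumed meaning: classes differ
    return errors

if __name__ == "__main__":
    booting = Host("SATURN", "PKB0", 7)
    running = Host("CLU21", "PKC0", 7)
    print(check_bus(booting, running))
```

An empty list corresponds to a configuration that passes both checks; any other result corresponds to a case in which the booting system would stop with a bugcheck.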

A.7.4.2.2 Mount Failures

There are two types of configuration error that can cause a disk to fail to mount.

First, when a system boots from a disk on the shared SCSI bus, it may fail to mount the system disk. This happens if there is another system on the SCSI bus that is already booted, and the other system is using a different device name for the system disk. (Two systems will disagree about the name of a device on the shared bus if their controller names or allocation classes are misconfigured, as described in the previous section.) If the system does not first execute one of the bugchecks described in the previous section, then the following error message is displayed on the console:

%SYSINIT-E- error when mounting system device, retrying..., status = 007280B4

The decoded representation of this status is:

VOLALRMNT,  another volume of same label already mounted

This error indicates that the system disk is already mounted in what appears to be another drive in the OpenVMS Cluster system, so it is not mounted again. To solve this problem, check the controller letters and allocation class values for each node on the shared SCSI bus.

Second, SCSI disks on a shared SCSI bus will fail to mount on both systems unless the disk supports tagged command queuing (TCQ). This is because TCQ provides a command-ordering guarantee that is required during OpenVMS Cluster state transitions.

OpenVMS determines that another processor is present on the SCSI bus during autoconfiguration, using the mechanism described in Section A.7.4.2.1. The existence of another host on a SCSI bus is recorded and preserved until the system reboots.

This information is used whenever an attempt is made to mount a non-TCQ device. If the device is on a multihost bus, the mount attempt fails and returns the following message:

%MOUNT-F-DRVERR, fatal drive error.

If the drive is intended to be mounted by multiple hosts on the same SCSI bus, then it must be replaced with one that supports TCQ.

Note that the first processor to boot on a multihost SCSI bus does not receive an inquiry response from the other hosts because the other hosts are not yet running OpenVMS. Thus, the first system to boot is unaware that the bus has multiple hosts, and it allows non-TCQ drives to be mounted. The other hosts on the SCSI bus detect the first host, however, and they are prevented from mounting the device. If two processors boot simultaneously, it is possible that they will detect each other, in which case neither is allowed to mount non-TCQ drives on the shared bus.
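
The boot-order effect described above reduces to a single rule, shown here as a Python sketch. The function `may_mount` and its parameter names are hypothetical; the rule itself is taken from the preceding paragraphs.

```python
def may_mount(disk_supports_tcq: bool, other_host_detected: bool) -> bool:
    """A non-TCQ disk is refused only by a host that has detected
    another host on the bus during autoconfiguration."""
    return disk_supports_tcq or not other_host_detected

if __name__ == "__main__":
    # The first host to boot sees no other host, so a non-TCQ disk mounts.
    print(may_mount(disk_supports_tcq=False, other_host_detected=False))
    # The second host detects the first, so the same disk is refused.
    print(may_mount(disk_supports_tcq=False, other_host_detected=True))
```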

A.7.4.3 Grounding

Having excessive ground offset voltages or exceeding the maximum SCSI interconnect length can cause system failures or degradation in performance. See Section A.7.8 for more information about SCSI grounding requirements.

A.7.4.4 Interconnect Lengths

Adequate signal integrity depends on strict adherence to SCSI bus lengths. Failure to follow the bus length recommendations can result in problems (for example, intermittent errors) that are difficult to diagnose. See Section A.4.3 for information on SCSI bus lengths.

A.7.5 SCSI Arbitration Considerations

Only one initiator (typically, a host system) or target (typically, a peripheral device) can control the SCSI bus at any one time. In a computing environment where multiple targets frequently contend for access to the SCSI bus, you could experience throughput issues for some of these targets. This section discusses control of the SCSI bus, how that control can affect your computing environment, and what you can do to achieve the most desirable results.

Control of the SCSI bus changes continually. When an initiator gives a command (such as READ) to a SCSI target, the target typically disconnects from the SCSI bus while it acts on the command, allowing other targets or initiators to use the bus. When the target is ready to respond to the command, it must regain control of the SCSI bus. Similarly, when an initiator wishes to send a command to a target, it must gain control of the SCSI bus.

If multiple targets and initiators want control of the bus simultaneously, bus ownership is determined by a process called arbitration, defined by the SCSI standard. The default arbitration rule is simple: control of the bus is given to the requesting initiator or target that has the highest unit number.

The following sections discuss some of the implications of arbitration and how you can respond to arbitration situations that affect your environment.

A.7.5.1 Arbitration Issues in Multiple-Disk Environments

When the bus is not very busy, and bus contention is uncommon, the simple arbitration scheme is adequate to perform I/O requests for all devices on the system. However, as initiators make more and more frequent I/O requests, contention for the bus becomes more and more common. Consequently, targets with lower ID numbers begin to perform poorly, because they are frequently blocked from completing their I/O requests by other users of the bus (in particular, targets with the highest ID numbers). If the bus is sufficiently busy, low-numbered targets may never complete their requests. This situation is most likely to occur on systems with more than one initiator because more commands can be outstanding at the same time.
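
The starvation effect can be reproduced with a toy simulation of the highest-ID-wins rule. The Python sketch below is a deliberately simplified model, not a representation of real SCSI timing: `simulate` is a hypothetical function, each bus grant takes one tick, and every device asks for the bus again `idle_ticks` ticks after it was last served.

```python
def simulate(ids, ticks, idle_ticks):
    """Count how often each SCSI ID wins arbitration over `ticks` ticks."""
    ready_at = {i: 0 for i in ids}   # tick at which each ID next wants the bus
    served = {i: 0 for i in ids}
    for tick in range(ticks):
        requesters = [i for i in ids if ready_at[i] <= tick]
        if not requesters:
            continue                  # bus is idle for this tick
        winner = max(requesters)      # arbitration: highest ID wins
        served[winner] += 1
        ready_at[winner] = tick + 1 + idle_ticks
    return served

if __name__ == "__main__":
    print("light load:", simulate([0, 1, 2, 3], 100, idle_ticks=10))
    print("heavy load:", simulate([0, 1, 2, 3], 100, idle_ticks=2))
```

Under the light load every ID is served regularly. Under the heavy load the three highest IDs take turns and ID 0 is never served, which corresponds to the case in which low-numbered targets never complete their requests.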

  OSSG Documentation
  26-NOV-1996 11:20:29.32
