  6017P037.HTM
  OSSG Documentation
  22-NOV-1996 14:22:10.21

Copyright © Digital Equipment Corporation 1996. All Rights Reserved.

OpenVMS System Manager's Manual
OpenVMS Cluster transitions do not change the state of the queue manager. Newly available nodes do not attempt to start the queue manager (unless you enter START/QUEUE/MANAGER).

Use the /CLUSTER qualifier to stop a clusterwide queue manager. If you enter the obsolete command STOP/QUEUE/MANAGER (without the /CLUSTER qualifier), the command performs the same function as STOP/QUEUES/ON_NODE. (Use STOP/QUEUES/ON_NODE to stop all queues on a single node without stopping the queue manager.)

12.7.2 Restarting the Queue Manager

The queue manager restarts automatically whenever you reboot the system. However, you might need to enter START/QUEUE/MANAGER for one of the following reasons:

How to Perform This Task

To restart the queue manager, use a command in the following format:

START/QUEUE/MANAGER[/ON=(node,...)] [dirspec] 

Specify the /ON=(node,...) qualifier and dirspec parameter only if you want to change the value you are currently using for the qualifier or parameter. The command you enter to start the queue manager is stored in the queue database, with any qualifier or parameter you specify. If you do not specify a qualifier or parameter, the queue manager is started using the node list and location (if any) stored in the queue database.
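The following sequence is a minimal sketch of how the stored command behaves. The node names JADE and RUBY are illustrative only. The first command records a node list in the queue database; the later command, entered without qualifiers, reuses that stored list, which you can confirm with SHOW QUEUE/MANAGERS/FULL.

```dcl
$ ! Record a node list in the queue database (illustrative node names).
$ START/QUEUE/MANAGER/ON=(JADE,RUBY,*)
$ !
$ ! Later, for example after the queue manager has been stopped:
$ ! no qualifier is given, so the stored node list and location are used.
$ START/QUEUE/MANAGER
$ !
$ ! Display the node list and database location currently in effect.
$ SHOW QUEUE/MANAGERS/FULL
```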

If the queue manager does not start, see Section 12.11.1 for a troubleshooting checklist.

12.8 Using Multiple Queue Managers

You can use multiple queue managers to distribute the batch and print work load among nodes and disk volumes. You need to understand what multiple queue managers are and how to create additional queue managers.

12.8.1 Understanding Multiple Queue Managers

Explanations of items related to the operation of multiple queue managers follow.

Restrictions on Using Multiple Queue Managers

Multiple queue managers have the following restrictions:

Names of Multiple Queue Managers

The process name for a queue manager is the first twelve characters of the queue manager name. The default queue manager name is SYS$QUEUE_MANAGER; the default queue manager process name is QUEUE_MANAGE. If you create an additional queue manager named PRINT_MANAGER, the process name is PRINT_MANAGE.

Know the process names of all your queue managers so that you can troubleshoot queue manager problems, as explained in Section 12.11.
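As an illustration of the naming rule, the following sketch derives the process name of an additional queue manager by taking the first twelve characters of its name with the F$EXTRACT lexical function. The symbol names are hypothetical. The sketch applies only to additional queue managers: the default queue manager is a special case, because its process name is QUEUE_MANAGE rather than a truncation of SYS$QUEUE_MANAGER.

```dcl
$ ! Hypothetical symbol names; PRINT_MANAGER is the example name used above.
$ manager_name = "PRINT_MANAGER"
$ ! The process name is the first twelve characters of the manager name.
$ process_name = F$EXTRACT(0, 12, manager_name)
$ WRITE SYS$OUTPUT process_name
```

For PRINT_MANAGER, the procedure displays PRINT_MANAGE.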

Multiple Queue Managers' Use of Queue Database Files

Multiple queue managers share a single master file. However, a queue database with multiple queue managers contains a queue file and a journal file for each queue manager, as explained in Section 12.2.

Commands for Managing Multiple Queue Managers

By default, the following commands affect the default queue manager SYS$QUEUE_MANAGER or the queues running on the default queue manager:

The /NAME_OF_MANAGER qualifier allows you to specify a different queue manager for these commands.

12.8.2 Creating Additional Queue Managers

To create one or more additional queue managers, follow these steps:
  1. Follow steps 1 and 2 in Section 12.5.
  2. To create an additional queue manager, enter a command in the following format:
    START/QUEUE/MANAGER/ADD/NAME_OF_MANAGER=name[/ON=(node,...)] [dirspec] 
    

    where:
    /ADD                    Creates an additional queue manager in the existing master file and creates new queue and journal files.
    /NAME_OF_MANAGER=name   Creates a non-default queue manager with a name up to 31 characters long. You can create a maximum of five queue managers.
    /ON=(node,...)          Allows you to customize failover of the queue manager. For more information, see Section 12.6.
    dirspec                 Specifies the location of the queue and journal files, as explained in Section 12.3.2. Use this parameter if you are creating the queue and journal files in a location other than the default.

Caution

Do not specify the /NEW_VERSION qualifier when you create an additional queue manager: multiple queue managers share a single master file. An additional queue file and journal file are created automatically for each additional queue manager.

Example

The command in the following example creates and starts a new queue manager named BATCH_MANAGER.

$ START/QUEUE/MANAGER/ADD/NAME_OF_MANAGER=BATCH_MANAGER/ON=(A,B,*) DUA2:[QUEUES]

12.8.2.1 Creating and Moving Queues with Multiple Queue Managers

When you create a queue with the INITIALIZE/QUEUE command, specify the name of the queue manager on which it is to run by including the /NAME_OF_MANAGER qualifier. If you do not specify the /NAME_OF_MANAGER qualifier, the queue is created to run on the default queue manager, SYS$QUEUE_MANAGER.

To move an existing queue from its original queue manager to a different queue manager, delete the queue with the DELETE/QUEUE command and re-create the queue with the INITIALIZE/QUEUE command.
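The move described above might look like the following sketch. The queue name NIGHT_BATCH and the node name OPAL are hypothetical, and the sketch assumes a batch queue. DELETE/QUEUE succeeds only after the queue has stopped and contains no jobs, so you might need to wait for the current job to finish and to delete or requeue pending jobs first. Any other settings of the original queue must be specified again on the INITIALIZE/QUEUE command.

```dcl
$ ! Hypothetical names: queue NIGHT_BATCH, node OPAL, manager BATCH_MANAGER.
$ ! Stop the queue after the currently executing job completes.
$ STOP/QUEUE/NEXT NIGHT_BATCH
$ ! Delete the queue from its original queue manager.
$ DELETE/QUEUE NIGHT_BATCH
$ ! Re-create the queue so that it runs on the other queue manager.
$ INITIALIZE/QUEUE/BATCH/START/NAME_OF_MANAGER=BATCH_MANAGER -
_$ /ON=OPAL:: NIGHT_BATCH
```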

12.8.2.2 Maintaining Queue Managers

When entering DCL commands to maintain the queue manager, be sure to specify the /NAME_OF_MANAGER qualifier to specify the queue manager to which the command is to apply. If you do not specify the /NAME_OF_MANAGER qualifier, the command is executed on the default queue manager, SYS$QUEUE_MANAGER.

Example

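The following example creates a queue database whose queue manager is named PRINT_MANAGER, adds a second queue manager named BATCH_MANAGER, and then displays information about both queue managers: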

$ START/QUEUE/MANAGER/NEW_VERSION/NAME_OF_MANAGER=PRINT_MANAGER -
_$ /ON=(JADE,RUBY,*)
$ START/QUEUE/MANAGER/ADD/NAME_OF_MANAGER=BATCH_MANAGER -
_$ /ON=(OPAL,PEARL,*)
$ SHOW QUEUE/MANAGERS/FULL                               
Master file:  SYS$COMMON:[SYSEXE]QMAN$MASTER.DAT; 
 
Queue manager PRINT_MANAGER, running, on JADE:: 
  /ON=(JADE,RUBY,*) 
  Database location:  SYS$COMMON:[SYSEXE] 
 
Queue manager BATCH_MANAGER, running, on OPAL:: 
  /ON=(OPAL,PEARL,*) 
  Database location:  SYS$COMMON:[SYSEXE] 

12.9 Saving and Restoring the Queue Database

Each time you want to preserve changes to your queue configuration, save a copy of your queue database files. In this way, if your queue database files are not accessible, you can restore the queue database you have saved; you thus avoid having to redefine forms and characteristics and reinitialize each queue.

12.9.1 Saving Queue Database Files

To save a record-by-record copy of your queue database files while the queuing system is functioning, perform the following steps. This procedure saves definitions of queues, forms, and characteristics. No job information is preserved. (Digital recommends not saving the journal file because timed and pending jobs might be reexecuted after the journal file is restored.)

How to Perform This Task

  1. To save the master file, enter an OpenVMS Convert utility (CONVERT) command in the following format:
    CONVERT/SHARE QMAN$MASTER.DAT master-filename 
    

    where master-filename is the name of the file to which QMAN$MASTER.DAT is to be copied.
    For more information about CONVERT, see the OpenVMS Record Management Utilities Reference Manual.
  2. Enter a CONVERT command in the following format to save the queue file:
    CONVERT/SHARE SYS$QUEUE_MANAGER.QMAN$QUEUES queue-filename 
    

    where queue-filename is the name of the file to which SYS$QUEUE_MANAGER.QMAN$QUEUES is to be copied.
  3. Use the Backup utility (BACKUP) to save the files created with CONVERT. Use a command in the following format:
    BACKUP/LOG masterfile-name, queue-filename device:saveset-name/LABEL=label 
    

    For more information about the Backup utility, see the OpenVMS System Management Utilities Reference Manual.

Example

The following example is a simple procedure showing how to save the queue database.

$ SET DEFAULT SYS$COMMON:[SYSEXE]
$ CONVERT/SHARE QMAN$MASTER.DAT MASTERFILE_9SEP.KEEP;
$ CONVERT/SHARE SYS$QUEUE_MANAGER.QMAN$QUEUES QFILE_9SEP.KEEP;
$ INITIALIZE MUA0: QDB
$ MOUNT/FOREIGN MUA0:
%MOUNT-I-MOUNTED, QDB mounted on _LILITH$MUA0:
$ BACKUP/LOG MASTERFILE_9SEP.KEEP,QFILE_9SEP.KEEP MUA0:QDB_9SEP.SAV/LABEL=QDB
%BACKUP-S-COPIED, copied SYS$COMMON:[SYSEXE]MASTERFILE_9SEP.KEEP;
%BACKUP-S-COPIED, copied SYS$COMMON:[SYSEXE]QFILE_9SEP.KEEP;
$ DISMOUNT MUA0:
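If you save the queue database regularly, a command procedure can build the date-stamped file names for you. The following is a sketch only: the symbol names are hypothetical, and it assumes the queue database is in the default location. It performs the two CONVERT steps; saving the resulting files with BACKUP is left as in the example above.

```dcl
$ ! Sketch of a save procedure; assumes the default database location.
$ SET DEFAULT SYS$COMMON:[SYSEXE]
$ ! F$CVTIME returns the date as yyyy-mm-dd; removing both hyphens
$ ! leaves a yyyymmdd stamp.
$ stamp = F$CVTIME("", "COMPARISON", "DATE") - "-" - "-"
$ master_copy = "MASTERFILE_" + stamp + ".KEEP"
$ queue_copy = "QFILE_" + stamp + ".KEEP"
$ ! Record-by-record copies, made while the queuing system is running.
$ CONVERT/SHARE QMAN$MASTER.DAT 'master_copy'
$ CONVERT/SHARE SYS$QUEUE_MANAGER.QMAN$QUEUES 'queue_copy'
$ WRITE SYS$OUTPUT "Saved ", master_copy, " and ", queue_copy
```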

12.9.2 Restoring Queue Database Files

When you restore queue database files, all queue, form, characteristic, and queue manager information is restored. However, information about jobs in the queues is not restored.

How to Perform This Task

  1. If the queue manager is running, stop it by entering the STOP/QUEUE/MANAGER/CLUSTER command.
  2. Delete all three queue database files. (You must delete all three files, even if only one or two of them are lost.)
  3. Use the MOUNT command to mount the disk or tape containing the queue database backup.
  4. Use the Backup utility (BACKUP) to restore the queue file and master file from the save set you created in step 3 of Section 12.9.1. If the master file or queue file is stored in a location other than the default, make sure you restore it to the correct location or that you specify the new location when you start the queue manager.

    Note

    When you restore your queue database, you must always restore both the master and queue files, even if you lost only one of those files.

  5. Start the queue manager with the START/QUEUE/MANAGER command. Do not enter the /NEW_VERSION qualifier: a new, empty journal file will be created automatically.

Example

The following example is a simple procedure showing how to restore the queue database from tape.

$ STOP/QUEUE/MANAGER/CLUSTER
$ SET DEFAULT SYS$COMMON:[SYSEXE]
$ DELETE SYS$QUEUE_MANAGER.QMAN$JOURNAL;,SYS$QUEUE_MANAGER.QMAN$QUEUES;, -
_$ QMAN$MASTER.DAT;
$ MOUNT/FOREIGN MUA0:
%MOUNT-I-MOUNTED, QDB mounted on _LILITH$MUA0:
$ BACKUP/LOG MUA0:QDB_9SEP.SAV/SELECT=[SYSEXE]MASTERFILE_9SEP.KEEP; -
_$ QMAN$MASTER.DAT;
%BACKUP-S-CREATED, created SYS$COMMON:[SYSEXE]QMAN$MASTER.DAT;1
$ SET MAGTAPE/REWIND MUA0:
$ BACKUP/LOG MUA0:QDB_9SEP.SAV/SELECT=[SYSEXE]QFILE_9SEP.KEEP; -
_$ SYS$QUEUE_MANAGER.QMAN$QUEUES
%BACKUP-S-CREATED, created SYS$COMMON:[SYSEXE]SYS$QUEUE_MANAGER.QMAN$QUEUES;1
$ DISMOUNT MUA0:
$ START/QUEUE/MANAGER

12.10 Maximizing Queuing System Performance

The following resources have the most effect on queuing system performance:

Use the following methods to maximize your queuing system's performance:

12.11 Solving Queue Manager Problems

Use the following sections to help solve queue manager problems:
Topic                                                                            For More Information
Avoiding common problems: a troubleshooting checklist                            Section 12.11.1
If the queue manager does not start                                              Section 12.11.2
If the queuing system stops or the queue manager does not run on specific nodes  Section 12.11.3
If the queue manager becomes unavailable                                         Section 12.11.4
If the queuing system does not work on a specific OpenVMS Cluster node           Section 12.11.5
If you see inconsistent queuing behavior on different OpenVMS Cluster nodes      Section 12.11.6
Reporting a queuing system problem to Digital support representatives            Section 12.12

12.11.1 Avoiding Common Problems: A Troubleshooting Checklist

To avoid the most common queuing system problems, make sure you have met the following requirements:
Requirement                                                             For More Information
QMAN$MASTER is identically defined on all nodes in the cluster.         Section 12.3
The queue database is in the specified location.                        Section 12.3
The queue database disk is mounted and available.                       Section 12.3
The node list specified with the /ON qualifier contains a sufficient    Section 12.11.4
  number of nodes. If you specify a node list, Digital recommends you
  include an asterisk (*) at the end of the node list.
The system address parameters SCSNODE and SCSSYSTEMID match the DECnet  Section 12.11.5
  for OpenVMS node name and node ID.
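The first three requirements in the checklist can be spot-checked from DCL on each node. The following sketch is not a complete diagnostic. It assumes that QMAN$MASTER, when defined, translates to the directory that holds the master file, and that the default location SYS$COMMON:[SYSEXE] applies otherwise. The symbol names are hypothetical.

```dcl
$ ! Choose the file specification according to whether QMAN$MASTER is defined.
$ IF F$TRNLNM("QMAN$MASTER") .NES. ""
$ THEN
$   master_spec = "QMAN$MASTER:QMAN$MASTER.DAT"
$ ELSE
$   master_spec = "SYS$COMMON:[SYSEXE]QMAN$MASTER.DAT"
$ ENDIF
$ ! F$SEARCH returns a null string if the file cannot be found.
$ found = F$SEARCH(master_spec)
$ IF found .EQS. ""
$ THEN
$   WRITE SYS$OUTPUT "Master file not found: ", master_spec
$ ELSE
$   WRITE SYS$OUTPUT "Master file found: ", found
$ ENDIF
```

Run the procedure on each node in the cluster and compare the results; a node that reports a different device or no file at all is a likely source of the problem.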

12.11.2 If the Queue Manager Does Not Start

If the queue manager does not start when you enter the START/QUEUE/MANAGER command, the system displays the following message:

%JBC-E-QMANNOTSTARTED, queue manager could not be started 

12.11.2.1 Investigating the Problem

Search the operator log file SYS$MANAGER:OPERATOR.LOG (or look on the operator console) for messages from the queue manager and job controller for information about the problem, as follows:

$ SEARCH SYS$MANAGER:OPERATOR.LOG/WINDOW=5 QUEUE_MANAGE,JOB_CONTROL,BATCH_MANAGE

Use the information provided with these messages to further investigate the problem, making sure you have met the requirements listed in Section 12.11.1.

12.11.2.2 Cause

The cause of the problem is the system's inability to find the master file. Often the logical name QMAN$MASTER is not defined correctly, or the disk is not available. For example, the following message indicates that the master file does not exist in the expected location:

%%%%%%%%%%%  OPCOM  13-MAR-1996 15:53:52.84  %%%%%%%%%%% 
Message from user JOB_CONTROL on ABDCEF 
%JBC-E-OPENERR, error opening SYS$COMMON:[SYSEXE]QMAN$MASTER.DAT 
 
%%%%%%%%%%%  OPCOM  13-MAR-1996 15:53:53.04  %%%%%%%%%%% 
Message from user JOB_CONTROL on ABDCEF 
-SYSTEM-W-NOSUCHFILE, no such file        

12.11.2.3 Correcting the Problem

On systems with multiple queue managers, search for messages displayed by additional queue managers by including their process names in the search string. To display information about queue managers running on your system, use the SHOW QUEUE/MANAGERS command as explained in Section 12.4. Correct any problem indicated in the displayed information.

Example

 
$ START/QUEUE/MANAGER DUA55:[SYSQUE] (1)  
%JBC-E-QMANNOTSTARTED, queue manager could not be started (2)
$ SEARCH SYS$MANAGER:OPERATOR.LOG /WINDOW=5 QUEUE_MANAGE,JOB_CONTROL (3)
%%%%%%%%%%%  OPCOM  14-APR-1996 18:55:18.23  %%%%%%%%%%% 
Message from user QUEUE_MANAGE on CATNIP 
%QMAN-E-OPENERR, error opening DUA55:[SYSQUE]SYS$QUEUE_MANAGER.QMAN$QUEUES; 
 
%%%%%%%%%%%  OPCOM  14-APR-1996 18:55:18.29  %%%%%%%%%%% 
Message from user QUEUE_MANAGE on CATNIP 
-RMS-F-DEV, error in device name or inappropriate device type for operation 
 
%%%%%%%%%%%  OPCOM  14-APR-1996 18:55:18.31  %%%%%%%%%%% 
Message from user QUEUE_MANAGE on CATNIP 
-SYSTEM-W-NOSUCHDEV, no such device available (4)
$ START/QUEUE/MANAGER DUA5:[SYSQUE] (5)
 
  1. This command attempts to start the queue manager, specifying DUA55:[SYSQUE] as the location of the queue and journal files.
  2. The error message indicates that the queue manager did not start.
  3. This command searches the operator log file for relevant messages. The SEARCH command does not include a second queue manager name, such as BATCH_MANAGE.
  4. This message indicates that the queue file could not be opened because device DUA55: does not exist.
  5. This command, which correctly specifies DUA5:[SYSQUE] as the location for the queue and journal files, successfully starts the queue manager.

For more information about multiple queue managers and their process names, see Section 12.8.1.

12.11.3 If the Queuing System Stops or the Queue Manager Does Not Run on Specific Nodes

Use this section if the queue manager does not run on a specific node in the cluster, or if the queuing system stops, especially after one of the following:

12.11.3.1 Investigating the Problem

Check the operator log that was current at the time the queue manager started up or failed over. Search the log for operator messages from the queue manager.

On systems with multiple queue managers, also search for messages displayed by additional queue managers by including their process names in the search string. To display information about queue managers running on your system, use the SHOW QUEUE/MANAGERS command, as explained in Section 12.4.

For more information about multiple queue managers and their process names, see Section 12.8.1.

The following messages indicate that the queue database is not in the specified location:

%%%%%%%%%%%  OPCOM   4-FEB-1996 15:06:25.21  %%%%%%%%%%% 
Message from user QUEUE_MANAGE on MANGLR 
%QMAN-E-OPENERR, error opening CLU$COMMON:[SYSEXE]SYS$QUEUE_MANAGER.QMAN$QUEUES; 
 
%%%%%%%%%%%  OPCOM   4-FEB-1996 15:06:27.29  %%%%%%%%%%% 
Message from user QUEUE_MANAGE on MANGLR 
-RMS-E-FNF, file not found 
 
%%%%%%%%%%%  OPCOM   4-FEB-1996 15:06:27.45  %%%%%%%%%%% 
Message from user QUEUE_MANAGE on MANGLR 
-SYSTEM-W-NOSUCHFILE, no such file 
 

The following messages indicate that the queue database disk is not mounted:

%%%%%%%%%%%  OPCOM   4-FEB-1996 15:36:49.15  %%%%%%%%%%% 
Message from user QUEUE_MANAGE on MANGLR 
%QMAN-E-OPENERR, error opening DISK888:[QUEUE_DATABASE]SYS$QUEUE_MANAGER.QMAN$QUEUES; 
 
%%%%%%%%%%%  OPCOM   4-FEB-1996 15:36:51.69  %%%%%%%%%%% 
Message from user QUEUE_MANAGE on MANGLR 
-RMS-F-DEV, error in device name or inappropriate device type for operation 
 
%%%%%%%%%%%  OPCOM   4-FEB-1996 15:36:52.20  %%%%%%%%%%% 
Message from user QUEUE_MANAGE on MANGLR 
-SYSTEM-W-NOSUCHDEV, no such device available 

12.11.3.2 Cause

The queuing system does not work correctly under the following circumstances:

In general, the queuing system shuts down completely if the queue manager encounters a serious error and forces a crash or failover twice within two minutes on the same node. Therefore, the queuing system might have stopped, or it might continue to run if, after the original failed startup, the queue manager moves to another node from which it can access the database.

12.11.3.3 Correcting the Problem

Perform the following steps:

  1. If the queue manager is stopped, enter START/QUEUE/MANAGER and include the following:
  2. On all nodes specified in the node list (except on any nodes that boot from the disk where the queue database files are stored), add a MOUNT command to the SYLOGICALS.COM procedure to mount the disk that holds the master file. You do not need to explicitly mount the disk on a node where it is the system disk.
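The MOUNT command described in step 2 might look like the following fragment. The device name DUA2: and the volume label QUEUE_DISK are hypothetical; substitute the values for the disk that holds your queue database. The test with the F$GETDVI lexical function skips the MOUNT command if the disk is already mounted on the node.

```dcl
$ ! Fragment for SYLOGICALS.COM.
$ ! DUA2: and QUEUE_DISK are hypothetical device and volume label names.
$ IF .NOT. F$GETDVI("DUA2:", "MNT") THEN MOUNT/SYSTEM/NOASSIST DUA2: QUEUE_DISK
```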

12.11.4 If the Queue Manager Becomes Unavailable

The queue manager becomes unavailable if it does not start or has stopped running.

12.11.4.1 Investigating the Problem

To investigate the problem, enter SHOW CLUSTER to see if the nodes on the list are available.

12.11.4.2 Cause

An insufficient failover node list might have been specified for the queue manager, so that none of the nodes in the failover list is available to run the queue manager.

12.11.4.3 Correcting the Problem

Make sure the queue manager list contains a sufficient number of nodes by entering START/QUEUE/MANAGER with the /ON qualifier to specify a node list appropriate for your configuration.

If you are in doubt about what nodes to specify, Digital recommends that you specify an asterisk (*) wildcard character as the last node in the list; the asterisk indicates that any remaining node in the cluster can run the queue manager. Specifying the asterisk prevents your queue manager from becoming unavailable because of an insufficient node list.

12.11.5 If the Queuing System Does Not Work on a Specific OpenVMS Cluster Node

Use this section if the queuing system does not work on a specific node when it starts up.

12.11.5.1 Investigating the Problem

Perform the following steps:

  1. Search the operator log that was current when the problem existed for the following messages. These messages are broadcast every 30 seconds after the affected node boots.
    %%%%%%%%%%%  OPCOM   4-FEB-1996 15:36:49.15  %%%%%%%%%%%
    Message from user QUEUE_MANAGE on ZNFNDL
    %QMAN-E-COMMERROR, unexpected error #5 in communicating with node CSID 000000

    %%%%%%%%%%%  OPCOM   4-FEB-1996 15:36:49.15  %%%%%%%%%%%
    Message from user QUEUE_MANAGE on ZNFNDL
    -SYSTEM-F-WRONGACP, wrong ACP for device

  2. Compare the node's value for the system address parameters SCSNODE and SCSSYSTEMID with the values for the DECnet node name and node ID, as follows:
    $ RUN SYS$SYSTEM:SYSMAN
    SYSMAN> PARAMETERS SHOW SCSSYSTEMID
    Parameter Name            Current    Default     Min.     Max.     Unit  Dynamic
    --------------            -------    -------    -------  -------   ----  -------
    SCSSYSTEMID                 19941          0        -1        -1 Pure-numbe
    SYSMAN> PARAMETERS SHOW SCSNODE
    Parameter Name            Current    Default     Min.     Max.     Unit  Dynamic
    --------------            -------    -------    -------  -------   ----  -------
    SCSNODE                 "RANDY  "    "    "    "    "    "ZZZZ" Ascii
    SYSMAN> EXIT
    $ RUN SYS$SYSTEM:NCP
    NCP> SHOW EXECUTOR SUMMARY


    Node Volatile Summary as of  5-FEB-1996 15:50:36

    Executor node = 19.45 (DREAMR)

    State                    = on
    Identification           = DECnet for OpenVMS V7.1


    NCP> EXIT
    $ WRITE SYS$OUTPUT 19*1024+45
    19501
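The final command in step 2 converts the DECnet address 19.45 to the value that SCSSYSTEMID is expected to hold: the area number multiplied by 1024, plus the node number. The following sketch makes the comparison explicit. The symbol names are hypothetical, and the values are copied by hand from the SYSMAN and NCP displays above.

```dcl
$ ! Values copied from the displays in step 2 (hypothetical symbol names).
$ area = 19
$ node = 45
$ scssystemid = 19941
$ ! Expected SCSSYSTEMID for DECnet address area.node.
$ expected = area * 1024 + node
$ IF scssystemid .EQ. expected
$ THEN
$   WRITE SYS$OUTPUT "SCSSYSTEMID matches the DECnet address"
$ ELSE
$   WRITE SYS$OUTPUT "Mismatch: SCSSYSTEMID is ''scssystemid', expected ''expected'"
$ ENDIF
```

With the values shown, the procedure reports a mismatch (19941 against an expected 19501). The node names also differ in the displays (RANDY and DREAMR).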
      

12.11.5.2 Cause

If the DECnet node name and node ID do not match the SCSNODE and SCSSYSTEMID system address parameters, IPC (interprocess communication, an operating system internal mechanism) cannot work properly and the affected node will not be able to participate in the queuing system.

12.11.5.3 Correcting the Problem

Perform the following steps:

  1. Modify the system address parameters SCSNODE and SCSSYSTEMID or modify the DECnet node name and node ID, so the values match.
    For more information on these system parameters, see the OpenVMS System Management Utilities Reference Manual. For more information on the DECnet node name and node ID, see the DECnet for OpenVMS Guide to Networking.
  2. Reboot the system.

12.11.6 If You See Inconsistent Queuing Behavior on Different OpenVMS Cluster Nodes

Use this section if you see the following symptoms:

12.11.6.1 Investigating the Problem

Perform the following steps:

  1. Enter SHOW LOGICAL to translate the QMAN$MASTER logical name within the environment of each node in the cluster. If there is no translation on any given node, then translate the default value of SYS$COMMON:[SYSEXE].
    If the SHOW LOGICAL translations show a different physical disk name on one or more nodes, you have identified the problem.
  2. Check the operator log files that were current at the time that one of the affected nodes booted. Search for an OPCOM message similar to the following from the process JOB_CONTROL:
    %%%%%%%%%%%  OPCOM   4-FEB-1996 14:41:20.88  %%%%%%%%%%%
    Message from user JOB_CONTROL on MANGLR
    %JBC-E-OPENERR, error opening BOGUS:[QUEUE_DIR]QMAN$MASTER.DAT;

    %%%%%%%%%%%  OPCOM   4-FEB-1996 14:41:21.12  %%%%%%%%%%%
    Message from user JOB_CONTROL on MANGLR
    -RMS-E-FNF, file not found
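One way to carry out step 1 without logging in to each node is to use the System Management utility (SYSMAN) to execute SHOW LOGICAL on every node in the cluster. The following is a sketch; it assumes your account has the privileges SYSMAN requires to execute commands clusterwide.

```dcl
$ RUN SYS$SYSTEM:SYSMAN
SYSMAN> SET ENVIRONMENT/CLUSTER
SYSMAN> DO SHOW LOGICAL QMAN$MASTER
SYSMAN> EXIT
```

SYSMAN displays the output from each node in turn, so that you can compare the translations.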
      

12.11.6.2 Cause

This problem may be caused by different definitions for the logical name QMAN$MASTER on different nodes in the cluster, causing multiple queuing environments. You typically find this problem in OpenVMS Cluster environments when you have just added a system disk or moved the queuing database.

12.11.6.3 Correcting the Problem

Perform the following steps:

  1. If only one queue manager and queue database exist, skip to step 2.
    If more than one queue manager and queue database exist, perform the following steps:
    1. Enter a command in the following format on one of the nodes where the QMAN$MASTER logical name is incorrectly defined:
      STOP/QUEUE/MANAGER/CLUSTER/NAME_OF_MANAGER=name

      where /NAME_OF_MANAGER specifies the name of the queue manager to be stopped.
    2. Delete all three files for the invalid queue database. (On systems with multiple queue managers, you might have more than three invalid files.)
  2. Reassign the logical name QMAN$MASTER on the affected systems and correct the definition in the startup procedure where the logical name is defined (usually SYLOGICALS.COM).
  3. Enter STOP/QUEUE/MANAGER/CLUSTER on an unaffected node to stop the valid queue manager.
  4. Enter START/QUEUE/MANAGER on any node and verify that the queuing system is working properly.

12.12 Reporting a Queuing System Problem to Digital

If you encounter problems with the queuing system that you need to report to a Digital support representative, provide the information in the following table. This information will help Digital representatives diagnose your problem. Please provide as much of the information as possible.