
Guide to DECthreads



Before your program creates a thread, it should set up all requirements that the new thread needs in order to execute. For example, if your program must set the new thread's scheduling parameters, do so with an attributes object when you create the thread, rather than trying to use pthread_setschedparam() or other routines afterward. To set global data for the new thread or to create synchronization objects, do so before you create the thread, or else set them in a pthread_once() initialization routine that is called from each thread.

3.1.3 Using Scheduling to Accomplish Synchronization

Avoid using scheduling policy and scheduling priority attributes of threads as a synchronization mechanism.

In a uniprocessor system, only one thread can run at a time, and when a higher-priority thread becomes runnable, it immediately preempts a lower-priority running thread. Therefore, a thread running at higher priority might erroneously be presumed not to need a mutex to access shared data.

On a multiprocessor system, higher- and lower-priority threads are likely to run at the same time. Situations can even arise where higher-priority threads are waiting to run while the threads that are running have a lower priority.

Even if you know that your code will run only on a uniprocessor implementation, never try to use scheduling as a synchronization mechanism. When you design the code correctly from the beginning, your code will be safer, more portable, and upwardly compatible with a new release of the system that adds SMP support.

3.2 Memory Synchronization Between Threads

Your multithreaded program that uses DECthreads must ensure that access to data shared between threads is synchronized.

The POSIX.1c standard guarantees that the following functions will synchronize memory with respect to other threads:
   fork()                       pthread_cond_signal()
   pthread_create()             pthread_cond_broadcast()
   pthread_join()               sem_post()
   pthread_mutex_lock()         sem_trywait()
   pthread_mutex_trylock()      sem_wait()
   pthread_mutex_unlock()       wait()
   pthread_cond_wait()          waitpid()
   pthread_cond_timedwait()

If a call to one of these functions returns an error, synchronization is not guaranteed. For example, an unsuccessful call to pthread_mutex_trylock() does not necessarily provide any synchronization.

3.3 Using Shared Memory

Most threads do not operate independently. They cooperate to accomplish a task, and cooperation requires communication. There are many ways that threads can communicate, and which method is most appropriate depends on the task.

Threads that cooperate only rarely (for example, a boss thread that only sends off a request for workers to do long tasks) may be satisfied with a relatively slow form of communication. Threads that must cooperate more closely (for example, a set of threads performing a parallelized matrix operation) need fast communication---maybe even to the extent of using machine-specific atomic hardware operations.

Most mechanisms for thread communication involve the use of shared memory, exploiting the fact that all threads within a process share their full address space. Although all addresses are shared, there are three kinds of memory that are characteristically used for communication. The following sections describe the scope (or, the range of locations in the program where code can access the memory) and lifetime (or, the length of time the memory exists) of each of the three types of memory.

3.3.1 Using Static Memory

Static memory is allocated by the language compiler when it translates source code, so the scope is controlled by the rules of the compiler. For example, in the C language, a variable declared as extern can be accessed anywhere, and a static variable can be referenced within the source module or routine, depending on where it is declared.

In this discussion static memory is not the same as the C language static storage class. Rather, static memory refers to any variable that is permanently allocated at a particular address for the life of the program.

It is appropriate to use static memory in your multithreaded program when you know that only one instance of an object exists throughout the application. For example, if you want to keep a list of active contexts or a mutex to control some shared resource, you would not want individual threads to have their own copies of that data.

The scope of static memory depends on your programming language's scoping rules. The lifetime of static memory is the life of the program.

3.3.2 Using Stack Memory

Stack memory is allocated by code generated by the language compiler at run time, generally when a routine is initially called. When the program returns from the routine, the storage ceases to be valid (although the addresses still exist and might be accessible).

Generally, the storage is valid for the entire execution of the routine, and the actual address can be calculated and passed to other threads; however, this depends on programming language rules. If you pass the address of stack memory to another thread, you must ensure that all other threads are finished processing that data before the routine returns; otherwise the stack will be cleared, and values might be altered by subsequent calls. The other threads will not be able to determine that this has happened, and erroneous behavior will result.

The scope of stack memory is the routine or a block within the routine. The lifetime is no longer than the time during which the routine executes.

3.3.3 Using Dynamic Memory

Dynamic memory is allocated by the program as a result of a call to some memory management function (for example, the C language run-time function malloc() or the OpenVMS common run-time function LIB$GET_VM).

Dynamic memory is referenced through pointer variables. Although the pointer variables are scoped depending on their declaration, the dynamic memory itself has no intrinsic scope or lifetime. It can be accessed from any routine or thread that is given its address and will exist until explicitly made free. In a language supporting automatic garbage collection, it will exist until the run-time system detects that there are no references to it. (If your language supports garbage collection, be sure the garbage collector is thread safe.)

The scope of dynamic memory is anywhere a pointer containing the address can be referenced. The lifetime is from allocation to deallocation.

Typically dynamic memory is appropriate to manage persistent context. For example, in a thread-reentrant routine that is called multiple times to return a stream of information (such as to list all active connections to a server or to return a list of users), using dynamic memory allows the program to create multiple contexts that are independent of all the program's threads. Thus, multiple threads could share a given context, or a single thread could have more than one context.

3.4 Managing a Thread's Stack

For each thread created by your program, DECthreads sets a default stack size that is acceptable to most applications. You can also set the stacksize attribute in a thread attributes object, to specify the stack size needed by the next thread created.

This section discusses the cases in which the stack size is insufficient (resulting in stack overflow) and how to determine the optimal size of the stack.

Most compilers on VAX systems do not probe the stack. Portable code that supports threads should use as little stack memory as practical.

Most compilers on Alpha systems generate code in the procedure prologue that probes the stack, ensuring there is enough space for the procedure to run.

3.4.1 Using a Stack Guard Region

DECthreads provides a stack guard region, or no-access memory at the end of each thread's stack, to allow a very fast and simple probe. If you create a thread that might need to allocate large arrays on the stack, create the thread using a thread attributes object that specifies a large guardsize attribute. A large stack guard region can help to prevent one thread from overflowing into another thread's stack region.

The low-level memory regions that form a stack guard region are also known as guard pages.

3.4.2 Handling Stack Overflow

A program can receive a memory error (access violation, bus error, or segmentation fault) when it overflows its stack. It is often necessary to run the program under control of your system's debugger to determine where these errors occur. However, if the debugger shares resources with the target process, perhaps allocating its own data objects on the target process's stack, the debugger might not function properly when the stack overflows. In this case, you might be required to analyze the target process by means other than the debugger.

To set the stacksize attribute in a thread attributes object, use the pthread_attr_setstacksize() routine. (See Section 2.3.2.4 for more information.)

A useful debugging technique: If a thread receives a memory access exception during a routine call or when accessing a local variable, increase the size of the stack. Of course, not all memory access exceptions indicate a stack overflow.

For programs that you cannot run under a debugger, determining a stack overflow is more difficult. This is especially true if the program continues to run after receiving a memory access exception. For example, if a stack overflow occurs while a mutex is locked, the mutex might not be released as the thread recovers or terminates. When the program attempts to lock that mutex again, it hangs.

3.4.3 Sizing the Stack

To determine the optimal size of a thread's stack, multiply the largest number of nested subroutine calls by the size of the call frames and local variables. Add to that number an extra amount of memory to accommodate interrupts. Determining this figure is difficult, because stack frames vary in size and because it might not be possible to estimate the depth of library function call frames.

You can also run your program using a profiling tool that measures actual stack use. This is commonly done by "poisoning" the stack before it is used by writing a distinctive pattern, and then checking for that pattern after the thread completes. DECthreads uses this mechanism when run with metering enabled. (For more information, see Section D.1.3.) Remember: Use of profiling or monitoring tools typically increases the amount of stack memory that your program uses.

3.5 Scheduling Issues

There are programming issues that are unique to the scheduling-related attributes of threads.

3.5.1 Real-Time Scheduling

Use care when writing code that uses real-time scheduling to control the priority of threads:

3.5.2 Priority Inversion

Priority inversion occurs when the interaction among a group of three or more threads causes that group's highest-priority thread to be blocked from executing. For example, a higher-priority thread waits for a resource locked by a low-priority thread, and the low-priority thread waits while a middle-priority thread executes. The higher-priority thread is made to wait while a thread of lower priority (the middle-priority thread) executes.

You can address the phenomenon of priority inversion as follows:

3.6 Using Synchronization Objects

The following sections discuss how to determine when to use a mutex versus a condition variable, and how to use mutexes to prevent two erroneous behaviors that are common in multithreaded programs: race conditions and deadlocks.

Also discussed is why you should signal a condition variable with the associated mutex locked.

3.6.1 Distinguishing Proper Usage of Mutexes and Condition Variables

Use a mutex for tasks with fine granularity. Examples of "fine-grained" tasks are those that serialize access to shared memory or make simple modifications to shared memory. A fine-grained task typically corresponds to a critical section of a few program statements or less.

Mutex waits are not interruptible. Threads waiting to acquire a mutex cannot be alerted or canceled.

Do not use a condition variable to protect access to data. Rather, use it to wait for data to assume a desired state. Always use a condition variable with a mutex that protects the shared data. Condition variable waits are interruptible.

See Section 2.4.1 and Section 2.4.2 for more information about mutexes and condition variables.

3.6.2 Avoiding Race Conditions

A race condition occurs when two or more threads perform an operation and the result of the operation depends on unpredictable timing factors: specifically, when each thread executes and waits, and when each thread completes the operation.

For example, if two threads execute routines and each increments the same variable (such as x = x + 1), the variable could be incremented twice and one of the threads could use the wrong value. For example:

  1. Thread A increments variable x.
  2. Thread A is interrupted (or blocked, or scheduled off), and thread B is started.
  3. Thread B starts and increments variable x.
  4. Thread B is interrupted (or blocked, or scheduled off), and thread A is started.
  5. Thread A checks the value of x and performs an action based on that value.
    The value of x differs from when thread A incremented it, and the program's behavior is incorrect.

Race conditions result from lack of (or ineffectual) synchronization. To avoid race conditions, ensure that any variable modified by more than one thread has only one mutex associated with it, and ensure that all accesses to the variable are made after acquiring that mutex.

See Section 3.6.4 for another example of a race condition.

3.6.3 Avoiding Deadlocks

A deadlock occurs when a thread holding a resource is waiting for a resource held by another thread, while that thread is also waiting for the first thread's resource. Any number of threads can be involved in a deadlock if there is at least one resource per thread. A thread can deadlock on itself. Other threads can also become blocked waiting for resources involved in the deadlock.

Following are two techniques you can use to avoid deadlocks:

3.6.4 Signaling a Condition Variable

When you are signaling a condition variable and that signal might cause the condition variable to be deleted, signal or broadcast the condition variable with the mutex locked.

The following C code fragment is executed by a releasing thread (Thread A):

 
   pthread_mutex_lock (m); 
 
   ... /* Change shared variables to allow another thread to proceed */ 
 
   predicate = TRUE; 
   pthread_mutex_unlock (m); 
                               (1)
   pthread_cond_signal (cv);   (2)
 

The following C code fragment is executed by a potentially blocking thread (Thread B):

 
   pthread_mutex_lock (m); 
   while (!predicate ) 
         pthread_cond_wait (cv, m); 
 
   pthread_mutex_unlock (m); 
   pthread_cond_destroy (cv); 
 
  1. If thread B is allowed to run while thread A is at this point, it finds the predicate true and continues without waiting on the condition variable. Thread B might then delete the condition variable with the pthread_cond_destroy() routine before thread A resumes execution.
  2. When thread A executes this statement, the condition variable does not exist and the program fails.

The previous code fragments also demonstrate a race condition. The program depends on a sequence of events among multiple threads, but it does not enforce the desired sequence. Signaling the condition variable while holding the associated mutex eliminates the race condition. That prevents thread B from deleting the condition variable until after thread A has signaled it.

This problem can occur when the releasing thread is a worker thread and the waiting thread is a boss thread, and the last worker thread tells the boss thread to delete the variables that are being shared by boss and worker.

Code the signaling of a condition variable with the mutex locked as follows:

pthread_mutex_lock (m); 
... /* Change shared variables to allow some other thread to proceed */ 
 
pthread_cond_signal (cv);   
pthread_mutex_unlock (m); 

3.7 One-Time Initialization

Your program might have one or more routines that must be executed before any thread executes code in your facility, but that must be executed only once, regardless of the sequence in which threads start executing. For example, your program can initialize mutexes, condition variables, or thread-specific data keys---each of which must be created only once---in a threads-oriented initialization routine.

Use the pthread_once() routine to ensure that your program's initialization routine executes only once---that is, by the first thread that attempts to initialize your program's threads-oriented resources. Multiple threads can call the pthread_once() routine, and DECthreads ensures that the specified routine is called only once.

On the other hand, rather than use the pthread_once() routine, your program can statically initialize a mutex and a flag, then simply lock the mutex and test the flag. In many cases, this technique might be more straightforward to implement.

Finally, you can use platform-specific initialization mechanisms, such as OpenVMS LIB$INITIALIZE, Digital UNIX dynamic loader __init_ code, or Win32 DLL initialization handlers for Windows NT and Windows 95.

3.8 Managing Dependencies Upon Other Libraries

Because multithreaded programming has become common only recently, many existing code libraries are incompatible with multithreaded routines. For example, many of the traditional C run-time library routines maintain state across multiple calls using static storage. This storage can become corrupted if routines are called from multiple threads at the same time. Even if the calls from multiple threads are serialized, code that depends upon a sequence of return values might not work.

For example, the Digital UNIX getpwent(3) routine returns the entries in the password file in sequence. If multiple threads call getpwent(3) repeatedly, even if the calls are serialized, no thread can obtain all entries in the password file.

Library routines might be compatible with multithreaded programming to different extents. The important distinctions are thread reentrancy and thread safety.

3.8.1 Thread Reentrancy

A routine is thread reentrant if it performs correctly despite being called simultaneously or sequentially by different threads. For example, the standard C run-time library function strtok() can be made thread reentrant most efficiently by adding an argument that specifies a context for the sequence of tokens. Thus, multiple threads can simultaneously parse different strings without interfering with each other.

The ideal thread-reentrant routine has no dependency on static data. Because static data must be synchronized using mutexes and condition variables, there is always a performance penalty, both in the time required to lock and unlock the mutex and in the loss of potential parallelism throughout the program. A routine that does not use any data that is shared between threads can proceed without locking.

If you are developing new interfaces, make sure that any persistent context information (like the last-token-returned pointer in strtok()) is passed explicitly so that multiple threads can process independent streams of information independently. Return information to the caller through routine values, output parameters (where the caller passes the address and length of a buffer), or by allocating dynamic memory and requiring the caller to free that memory when finished. Try to avoid using errno for returning error or diagnostic information; use routine return values instead.

3.8.2 Thread Safety

A routine is thread safe if it can be called simultaneously from multiple threads without risk of corruption. Generally this means that it does some simple level of locking (perhaps using the DECthreads global lock) to prevent simultaneously active calls in different threads. See Section 3.8.3.3 for information about the DECthreads global lock.

Thread-safe routines might be inefficient. For example, a Digital UNIX stdio package that is thread safe might still block all threads in the process while waiting to read or write data to a file. Routines such as localtime(3) or strtok(), which traditionally rely on static storage, can be made thread safe by using thread-specific data instead of static variables.

3.8.3 Lacking Thread Safety

When your program must call a routine that is not thread safe, your program must ensure serialization and exclusivity of the unsafe routine across all threads in the program.

If a routine is not specifically documented as thread reentrant or thread safe, assume that it is not safe to use as-is with your multithreaded program. Never assume that a routine is fully thread reentrant unless it is expressly documented as such; a routine can use static data in ways that are not obvious from its interface. A routine that is carefully written to be thread reentrant, but that calls some other routine that is not thread safe without proper protection, is itself not thread safe.

3.8.3.1 Using Mutex Around Call to Unsafe Code

Holding a mutex while calling unsafe code accomplishes the required serialization. All threads and libraries that use the routine must lock the same mutex. Note that even if two libraries carefully lock a mutex around every call to a given routine, but each library uses a different mutex, the routine is not protected against multiple simultaneous calls from different libraries.

Note that your program might be required to protect a series of calls, rather than just a single call, to routines that are not thread safe.

3.8.3.2 Using or Copying Static Data Before Releasing the Mutex

In many cases your program must protect more than just the call itself to a routine that is not thread safe. Your program must use or copy any static return values before releasing the mutex that is being held.

3.8.3.3 Using the DECthreads Global Lock

To ensure serialization and exclusivity of the unsafe code, DECthreads provides one global lock that can be used by all threads in a program when calling routines or code that is not thread safe. The global lock allows a thread to acquire the lock recursively, so that you do not need to be concerned if you call a routine that also may acquire the global lock.

Acquire the global lock by calling pthread_lock_global_np(); release the global lock by calling pthread_unlock_global_np().

Because there is only one global lock, you do not need to fully analyze all of the dependencies in unsafe code that your program calls. For example, if private locks were used instead to protect unsafe code, one lock might protect calls to the stdio routines, while another protects calls to the math routines. However, if a stdio routine then called a math routine without acquiring the math routine lock, the call would be just as unsafe as if no locks were used.




  6493P004.HTM
  OSSG Documentation
  22-NOV-1996 13:20:01.09

Copyright © Digital Equipment Corporation 1996. All Rights Reserved.
