Bugs to fix:

- some blocking routines (like sem_wait()) don't work if SA's aren't
  running yet, because the alarm system isn't up and running or there is no
  thread context to switch to. It would be weird to use them that
  way, but it's perfectly legal.
- There is a race between pthread_cancel() and
  pthread_cond_broadcast() or pthread_exit() about removing an item
  from the sleep queue. The locking protocols there need a little
  adjustment.
- pthread_sig.c: pthread__kill_self() passes a bogus ucontext to the handler.
  This is probably not very important.
- pthread_sig.c: Come up with a signal trampoline naming convention like
  libc's, so that GDB will have an easier time with things.
- Consider moving pthread__signal_tramp() to its own file, and building
  it with -fasync-unwind-tables, so that DWARF2 EH unwinding works through
  it.  (This is required for e.g. GCC's libjava.)
- Add locking to ld.elf_so so that multiple threads doing lazy binding
  doesn't trash things.
- Verify the cancel stub symbol trickery.


Interfaces/features to implement:
- pthread_atfork()
- priority scheduling
- libc integration: 
   - foo_r interfaces
- system integration
   - some macros and prototypes belong in headers other than pthread.h


Features that need more/better regression tests:
 - pthread_cond_broadcast()
 - pthread_once()
 - pthread_get/setspecific()
 - signals


Things that need fixing:
- Recycle dead threads for new threads.

Ideas to play with:
- Explore the trapcontext vs. usercontext distinction in ucontext_t.
- Get rid of thread structures when too many accumulate (is this
  actually a good idea?)
- Adaptive spin/sleep locks for mutexes.
- Currently, each thread uses two real pages of memory: one at the top
  of the stack for actual stack data, and one at the bottom for the
  pthread_st. If we can get suitable space above the initial stack for
  main(), we can cut this to one page per thread. Perhaps crt0 should
  do something different (give us more space) if libpthread is linked
  in?
- Figure out whether/how to expose the inline version of
  pthread_self().
- Along the same lines, figure out whether/how to use registers reserved
  in the ABI for thread-specific-data to implement pthread_self().
- Figure out what to do with changing stack sizes.
