This technical memo is a cautionary note on using the Netscape Portable
Runtime's (NSPR) IO timeout and interrupt features on Windows NT 3.51
and 4.0. Due to a limitation of the present implementation of NSPR IO on
NT, programs must observe the following guideline:

If a thread calls an NSPR IO function on a file descriptor and the IO
function fails with ``PR_IO_TIMEOUT_ERROR`` or
``PR_PENDING_INTERRUPT_ERROR``, the file descriptor must be closed
before the thread exits.

In this memo we explain the problem this guideline works around and
discuss its limitations.

.. _NSPR_IO_on_NT:

NSPR IO on NT
-------------

The IO model of NSPR 2.0 is synchronous and blocking. A thread calling
an IO function is blocked until the IO operation finishes, either due to
a successful IO completion or an error. If the IO operation cannot
complete before the specified timeout, the IO function returns with
``PR_IO_TIMEOUT_ERROR``. If the thread gets interrupted by another
thread's ``PR_Interrupt()`` call, the IO function returns with
``PR_PENDING_INTERRUPT_ERROR``.

On Windows NT, NSPR IO is implemented using NT's *overlapped* (also
called *asynchronous*) *IO*. When a thread calls an IO function, the
thread issues an overlapped IO request using the overlapped buffer in
its ``PRThread`` structure. Then the thread is put to sleep. In the
meantime, there are dedicated internal threads (called the *idle
threads*) monitoring the IO completion port for completed IO requests.
If a completed IO request appears at the IO completion port, an idle
thread fetches it and wakes up the thread that issued the IO request
earlier. This is the normal way the thread is awakened.

.. _IO_Timeout_and_Interrupt:

IO Timeout and Interrupt
------------------------

However, NSPR may wake up the thread in two other situations:

-  if the overlapped IO request is not completed before the specified
   timeout. (Note that we can't specify a timeout on overlapped IO
   requests, so the timeouts are all handled at the NSPR level.) In this
   case, the error is ``PR_IO_TIMEOUT_ERROR``.
-  if the thread gets interrupted by another thread's
   ``PR_Interrupt()`` call. In this case, the error is
   ``PR_PENDING_INTERRUPT_ERROR``.

These two errors are generated by the NSPR layer, so the OS is unaware
of them and the overlapped IO request is still in progress. The OS
still holds a pointer to the overlapped buffer in the thread's
``PRThread`` structure. If the thread subsequently exits and its
``PRThread`` structure gets deleted, that pointer refers to freed
memory, which the OS may still write to when the request eventually
completes.

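The hazard described above can be illustrated with a small model. This
is only a sketch under stated assumptions: ``ModelThread``, ``ModelOS``,
``WakeReason`` and the ``model_*`` functions are hypothetical stand-ins
invented for this illustration and are not part of NSPR or Win32. The
model tracks a single fact: whether the OS still references the thread's
overlapped buffer after the thread has been woken up.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration; not NSPR or Win32 names. */
typedef enum { WAKE_COMPLETED, WAKE_TIMEOUT, WAKE_INTERRUPT } WakeReason;

typedef struct {
    int overlapped_buf;      /* stands in for the per-thread overlapped buffer */
} ModelThread;

typedef struct {
    const int *pending_buf;  /* buffer the modelled OS still references, or NULL */
} ModelOS;

/* The thread issues an overlapped request that uses its own buffer. */
static void model_issue_io(ModelOS *os, ModelThread *t)
{
    os->pending_buf = &t->overlapped_buf;
}

/* The thread is woken up. Only a real completion retires the request.
   A timeout or interrupt is produced above the OS, so the modelled OS
   keeps the request and its pointer. */
static void model_wake(ModelOS *os, WakeReason why)
{
    if (why == WAKE_COMPLETED) {
        os->pending_buf = NULL;
    }
}

/* True if freeing the thread structure now would leave the modelled OS
   holding a pointer into freed memory. */
static bool model_exit_would_dangle(const ModelOS *os, const ModelThread *t)
{
    return os->pending_buf == &t->overlapped_buf;
}
```

In this model, a wake-up with ``WAKE_COMPLETED`` leaves nothing pending,
while ``WAKE_TIMEOUT`` and ``WAKE_INTERRUPT`` both leave
``model_exit_would_dangle()`` returning true until something cancels the
request.
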
.. _Canceling_Overlapped_IO_by_Closing_the_File_Descriptor:

Canceling Overlapped IO by Closing the File Descriptor
------------------------------------------------------

Therefore, we need to cancel the outstanding overlapped IO request
before the thread exits. NT's ``CancelIo()`` function would be
ideal for this purpose. Unfortunately, ``CancelIo()`` is not
available on NT 3.51, so we can't go this route as long as we are
supporting NT 3.51. The only reliable way to cancel an outstanding
overlapped IO request that works on both NT 3.51 and 4.0 is to close the
file descriptor, hence the rule of thumb stated at the beginning of this
memo.

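The rule of thumb can be sketched as control flow. The code below is a
model, not NSPR code: ``ModelFD``, ``ModelStatus`` and the ``model_*``
functions are hypothetical names chosen for this illustration, and the
assumption that other errors leave no request pending is a
simplification of the model, not a claim about NSPR.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for illustration; not NSPR's types or values. */
typedef enum { IO_OK, IO_TIMEOUT, IO_INTERRUPTED, IO_OTHER_ERROR } ModelStatus;

typedef struct {
    bool open;             /* the descriptor has not been closed */
    bool request_pending;  /* an overlapped request is still outstanding */
} ModelFD;

/* Model of an IO call; 'outcome' is the result the caller observes. */
static ModelStatus model_io(ModelFD *fd, ModelStatus outcome)
{
    /* A timeout or interrupt abandons the wait, not the OS-level request.
       Assumption of this model: other outcomes leave nothing pending. */
    fd->request_pending = (outcome == IO_TIMEOUT || outcome == IO_INTERRUPTED);
    return outcome;
}

/* Closing the descriptor is what cancels the outstanding request. */
static void model_close(ModelFD *fd)
{
    fd->request_pending = false;
    fd->open = false;
}

/* The guideline: after a timeout or interrupt, close the descriptor
   before the thread exits. Returns true if the descriptor is still
   usable afterwards. */
static bool model_io_following_guideline(ModelFD *fd, ModelStatus outcome)
{
    ModelStatus st = model_io(fd, outcome);
    if (st == IO_TIMEOUT || st == IO_INTERRUPTED) {
        model_close(fd);
    }
    return fd->open;
}

/* A thread may exit safely only when no request is still outstanding. */
static bool model_safe_to_exit(const ModelFD *fd)
{
    return !fd->request_pending;
}
```

Calling ``model_io()`` alone with ``IO_TIMEOUT`` leaves
``model_safe_to_exit()`` false. Going through
``model_io_following_guideline()`` makes it true again, at the cost of a
descriptor that is no longer open.
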
.. _Limitations:

Limitations
-----------

This seemingly harsh way to force the completion of outstanding
overlapped IO requests has the following limitations:

-  It is difficult for threads to share a file descriptor. For example,
   suppose thread A and thread B call ``PR_Accept()`` on the same
   socket, and they time out at the same time. Following the rule of
   thumb, both threads would close the socket. The first
   ``PR_Close()`` would succeed, but the second ``PR_Close()``
   would free memory that has already been freed. A solution that may
   work is to use a lock to ensure that only one thread is using that
   socket at any time.
-  Once there is a timeout or interrupt error, the file descriptor is no
   longer usable. If the file descriptor is intended to be used for
   the lifetime of the process (a log file, for example), this is
   not acceptable. A possible solution is to add a
   ``PR_DisableInterrupt()`` function to turn off interrupts when
   accessing such file descriptors.

..

   *A related known bug is that timeout and interrupt don't work for*
   ``PR_Connect()`` *on NT. This bug is due to a different
   limitation in our NT implementation.*

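For the shared-descriptor limitation, the memo suggests a lock. The
sketch below swaps in a narrower technique: a close-once guard built on
C11's ``atomic_flag``, so that only the first caller performs the close.
``SharedSocket`` and the ``shared_socket_*`` functions are hypothetical
names for this illustration and are not NSPR APIs. The guard addresses
only the double close. Unlike the lock, it does not stop another thread
from still using the socket when it is closed.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical wrapper for illustration; not an NSPR type. */
typedef struct {
    atomic_flag close_claimed;  /* set by the first caller that decides to close */
    int close_calls;            /* counts how often the real close would have run */
} SharedSocket;

static void shared_socket_init(SharedSocket *s)
{
    atomic_flag_clear(&s->close_claimed);
    s->close_calls = 0;
}

/* Returns true only for the caller that actually performed the close. */
static bool shared_socket_close_once(SharedSocket *s)
{
    /* test_and_set returns the previous value: true means another
       caller has already claimed the close. */
    if (atomic_flag_test_and_set(&s->close_claimed)) {
        return false;
    }
    s->close_calls++;  /* stands in for the single real close call */
    return true;
}
```

With this guard, two threads that time out on the same socket can both
call ``shared_socket_close_once()``, and only one of them reaches the
real close.
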
.. _Conclusions:

Conclusions
-----------

As long as we need to support NT 3.51, we need to program under the
guideline that after an IO timeout or interrupt error, the thread must
make sure the file descriptor is closed before it exits. Programs should
also take care when sharing file descriptors and when using IO timeout
or interrupt on files that need to stay open for the life of the
process.

When we stop supporting NT 3.51, we can look into using NT 4's
``CancelIo()`` function to cancel outstanding overlapped IO
requests when we get IO timeout or interrupt errors. If
``CancelIo()`` really works as advertised, that should
fundamentally solve this problem.

If these limitations with IO timeout and interrupt are not acceptable
for your programs, you can consider using the Win95 version of
NSPR. The Win95 version runs without trouble on NT, but you would lose
the better performance provided by NT fibers and asynchronous IO.

|

.. _Original_Document_Information:

Original Document Information
-----------------------------

-  Author: larryh@netscape.com
-  Last Updated Date: December 1, 2004