Wednesday, May 19, 2010

Understanding ReadDirectoryChangesW - Part 1

The longest, most detailed description in the world of how to successfully use ReadDirectoryChangesW.

This is Part 1 of 2. This part describes the theory and Part 2 describes the implementation.

Go to the GitHub repo for this article or just download the sample code.


I have spent this week digging into the barely-documented world of ReadDirectoryChangesW and I hope this article saves someone else some time. I believe I've read every article I could find on the subject, as well as numerous code samples. Almost all of the examples, including the one from Microsoft, either have significant shortcomings or contain outright mistakes.

You'd think that this problem would have been a piece of cake for me, having been the author of Multithreading Applications in Win32, where I wrote a chapter about the differences between synchronous I/O, signaled handles, overlapped I/O, and I/O completion ports. Except that I only write overlapped I/O code about once every five years, which is just about long enough for me to forget how painful it was the last time. This endeavor was no exception.

Four Ways to Monitor Files and Directories

First, a brief overview of monitoring directories and files. In the beginning there was SHChangeNotifyRegister. It was implemented using Windows messages and so required a window handle. It was driven by notifications from the shell (Explorer), so your application was only notified about things that the shell cared about - which almost never aligned with what you cared about. It was useful for monitoring things that the user did in Explorer, but not much else.

SHChangeNotifyRegister was fixed in Windows Vista so it could report all changes to all files, but it was too late - there are still several hundred million Windows XP users and that's not going to change any time soon.

SHChangeNotifyRegister also had a performance problem, since it was based on Windows messages. If there were too many changes, your application would start receiving roll-up messages that just said "something changed" and you had to figure out for yourself what had really happened. Fine for some applications, rather painful for others.

Windows 2000 brought two new interfaces, FindFirstChangeNotification and ReadDirectoryChangesW. FindFirstChangeNotification is fairly easy to use but doesn't give any information about what changed. Even so, it can be useful for applications such as fax servers and SMTP servers that can accept queue submissions by dropping a file in a directory. ReadDirectoryChangesW does tell you what changed and how, at the cost of additional complexity.

Similar to SHChangeNotifyRegister, both of these new functions suffer from a performance problem. They can run significantly faster than shell notifications, but moving a thousand files from one directory to another will still cause you to lose some (or many) notifications. The exact cause of the missing notifications is complicated. Surprisingly, it apparently has little to do with how fast you process notifications.

Note that FindFirstChangeNotification and ReadDirectoryChangesW are mutually exclusive. You would use one or the other, but not both.

Windows XP brought the ultimate solution, the Change Journal, which could track in detail every single change, even if your software wasn't running. Great technology, but equally complicated to use.

The fourth and final solution is to install a File System Filter, which was used in the popular SysInternals FileMon tool. There is a sample of this in the Windows Driver Kit (WDK). However, this solution is essentially a device driver and so can potentially cause system-wide stability problems if not implemented exactly correctly.

For my needs, ReadDirectoryChangesW was a good balance of performance versus complexity.

The Puzzle

The biggest challenge to using ReadDirectoryChangesW is that there are several hundred possibilities for combinations of I/O mode, handle signaling, waiting methods, and threading models. Unless you're an expert on Win32 I/O, it's extremely unlikely that you'll get it right, even in the simplest of scenarios. (In the list below, when I say "call", I mean a call to ReadDirectoryChangesW.)

A. First, here are the I/O modes:
  1. Blocking synchronous
  2. Signaled synchronous
  3. Overlapped asynchronous
  4. Completion Routine (aka Asynchronous Procedure Call or APC)
B. When calling the WaitForXxx functions, you can:
  1. Wait on the directory handle.
  2. Wait on an event object in the OVERLAPPED structure.
  3. Wait on nothing (for APCs.)
C. To handle notifications, you can use:
  1. Blocking
  2. WaitForSingleObject
  3. WaitForMultipleObjects
  4. WaitForMultipleObjectsEx
  5. MsgWaitForMultipleObjectsEx
  6. I/O Completion Ports
D. For threading models, you can use:
  1. One call per worker thread.
  2. Multiple calls per worker thread.
  3. Multiple calls on the primary thread.
  4. Multiple threads for multiple calls. (I/O Completion Ports)
Finally, when calling ReadDirectoryChangesW, you specify flags to choose what you want to monitor, including file creation, last modification date change, attribute changes, and other flags. You can use one flag per call and issue multiple calls, or you can use multiple flags in one call. Using multiple flags in one call is always the right solution. If you think you need to use multiple calls with one flag per call to make it easier to figure out what to do, then you need to read more about the data contained in the notification buffer returned by ReadDirectoryChangesW.
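To make the point about the notification buffer concrete, here is a minimal sketch of walking that buffer. Each record says what happened (the action code) and to which file, which is why one call with multiple flags is enough. The sketch is portable C++ rather than real Win32 code: NotifyHeader is a hypothetical mirror of the fixed-size part of FILE_NOTIFY_INFORMATION (NextEntryOffset, Action, FileNameLength), and Change and ParseNotifications are names invented for illustration. It assumes the file name follows the header as UTF-16 with its length given in bytes and no terminator.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>
#include <utility>
#include <vector>

// Hypothetical mirror of the fixed-size header of FILE_NOTIFY_INFORMATION.
struct NotifyHeader {
    uint32_t nextEntryOffset;  // bytes from this record to the next; 0 on the last record
    uint32_t action;           // FILE_ACTION_* code
    uint32_t fileNameLength;   // length of the name in BYTES, not characters
};

// Numeric values of the FILE_ACTION_* constants.
enum : uint32_t {
    kActionAdded = 1,
    kActionRemoved = 2,
    kActionModified = 3,
    kActionRenamedOldName = 4,
    kActionRenamedNewName = 5
};

// One decoded notification (illustrative type, not part of any API).
struct Change {
    uint32_t action;
    std::u16string name;  // path relative to the watched directory
};

// Walks every record in the buffer and returns the decoded changes.
// Stops early instead of reading past the end if a record looks truncated.
std::vector<Change> ParseNotifications(const uint8_t* buffer, size_t bytesReturned) {
    assert(buffer != nullptr || bytesReturned == 0);
    std::vector<Change> changes;
    size_t offset = 0;
    while (bytesReturned - offset >= sizeof(NotifyHeader)) {
        NotifyHeader header;
        std::memcpy(&header, buffer + offset, sizeof header);  // avoids alignment assumptions

        const size_t nameStart = offset + sizeof header;
        if (header.fileNameLength > bytesReturned - nameStart) break;  // truncated record

        // The name is not null-terminated, so copy exactly fileNameLength bytes.
        std::u16string name(header.fileNameLength / sizeof(char16_t), u'\0');
        std::memcpy(&name[0], buffer + nameStart, name.size() * sizeof(char16_t));
        changes.push_back(Change{header.action, std::move(name)});

        if (header.nextEntryOffset == 0) break;                          // last record
        if (header.nextEntryOffset > bytesReturned - offset) break;      // corrupt offset
        offset += header.nextEntryOffset;
    }
    return changes;
}
```

In real code the buffer and byte count would come from a completed ReadDirectoryChangesW call, and a rename would show up as an old-name record followed by a new-name record.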

If your head is now swimming in information overload, you can easily see why so many people have trouble getting this right.

Recommended Solutions

So what's the right answer? Here's my opinion, depending on what's most important:

Simplicity - A2C3D1 - Each call to ReadDirectoryChangesW runs in its own thread and sends the results to the primary thread with PostMessage. Most appropriate for GUI apps with minimal performance requirements. This is the strategy used in CDirectoryChangeWatcher on CodeProject. This is also the strategy used by Microsoft's FWATCH sample.

Performance - A4C6D4 - The highest performance solution is to use I/O completion ports, but, as an aggressively multithreaded solution, it's also a very complex solution that should be confined to servers. It's unlikely to be necessary in any GUI application. If you aren't a multithreading expert, stay away from this strategy.

Balanced - A4C5D3 - Do everything in one thread with Completion Routines. You can have as many outstanding calls to ReadDirectoryChangesW as you need. There are no handles to wait on, since Completion Routines are dispatched automatically. You embed the pointer to your object in the callback, so it's easy to keep callbacks matched up to their original data structure.

Originally I had thought that GUI applications could use MsgWaitForMultipleObjectsEx to intermingle change notifications with Windows messages. This turns out not to work because dialog boxes have their own message loop that's not alertable, so a dialog box being displayed would prevent notifications from being processed. Another good idea steamrolled by reality.

Wrong Techniques

As I was researching this solution, I saw a lot of recommendations that ranged from dubious to wrong to really, really wrong. Here's some commentary on what I saw.

If you are using the Simplicity solution above, don't use blocking calls, because the only ways to cancel them are the undocumented technique of closing the handle or CancelSynchronousIo, which is available only on Vista and later. Instead, use the Signaled Synchronous I/O mode by waiting on the directory handle. Also, to terminate threads, don't use TerminateThread, because that doesn't clean up resources and can cause all sorts of problems. Instead, create a manual-reset event object that is used as the second handle in the call to WaitForMultipleObjects. When the event is set, exit the thread.

If you have dozens or hundreds of directories to monitor, don't use the Simplicity solution. Switch to the Balanced solution. Alternatively, monitor a common root directory and ignore files you don't care about.

If you have to monitor a whole drive, think twice (or three times) about this idea. You'll be notified about every single temporary file, every Internet cache file, every Application Data change - in short, you'll be getting an enormous number of notifications that could slow down the entire system. If you need to monitor an entire drive, you should probably use the Change Journal instead. This will also allow you to track changes even if your app is not running. Don't even think about monitoring the whole drive with FILE_NOTIFY_CHANGE_LAST_ACCESS.

If you are using overlapped I/O without using an I/O completion port, don't wait on handles. Use Completion Routines instead. This removes the 64-handle limit of WaitForMultipleObjects, allows the operating system to handle call dispatch, and allows you to embed a pointer to your object in the OVERLAPPED structure. My example in a moment will show all of this.
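As a rough illustration of embedding the object pointer, here is a minimal sketch of one common way to do it: when a completion routine is supplied, the hEvent member of OVERLAPPED is free for the caller's use, so it can carry a pointer back to the owning object. WatchRequest and OnDirectoryChange are hypothetical names, not taken from the sample code, and the non-Windows branch defines simplified stand-in types purely so the sketch compiles elsewhere.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

#ifdef _WIN32
#include <windows.h>
#else
// Stand-ins so the sketch compiles off Windows. Simplified: the real
// OVERLAPPED wraps Offset/OffsetHigh in a union.
typedef uint32_t DWORD;
typedef void* HANDLE;
struct OVERLAPPED {
    uintptr_t Internal;
    uintptr_t InternalHigh;
    DWORD Offset;
    DWORD OffsetHigh;
    HANDLE hEvent;
};
#define CALLBACK
#endif

// Hypothetical per-directory state; one instance per outstanding call.
struct WatchRequest {
    OVERLAPPED overlapped;
    std::wstring directory;
    DWORD lastByteCount;
    int completions;

    explicit WatchRequest(const std::wstring& dir)
        : overlapped(), directory(dir), lastByteCount(0), completions(0) {
        // Store the back-pointer where the completion routine can find it.
        overlapped.hEvent = this;
    }

    // Copying would leave hEvent pointing at the original object.
    WatchRequest(const WatchRequest&) = delete;
    WatchRequest& operator=(const WatchRequest&) = delete;
};

// Same parameter list as a Win32 I/O completion routine:
// error code, bytes transferred, pointer to the OVERLAPPED that was passed in.
void CALLBACK OnDirectoryChange(DWORD errorCode, DWORD bytesTransferred,
                                OVERLAPPED* overlapped) {
    // Recover the owning object from the OVERLAPPED structure.
    WatchRequest* request = static_cast<WatchRequest*>(overlapped->hEvent);
    assert(request != nullptr);

    if (errorCode != 0) {
        return;  // for example, the request was cancelled
    }
    request->lastByteCount = bytesTransferred;
    ++request->completions;
    // A real handler would process the notification buffer here and then
    // issue the next ReadDirectoryChangesW call for this request.
}
```

On Windows, the address of request.overlapped and OnDirectoryChange would be passed as the last two arguments of ReadDirectoryChangesW. The routine only runs while the calling thread is in an alertable wait such as SleepEx or WaitForMultipleObjectsEx with the alertable flag set.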

If you are using worker threads, don't send results back to the primary thread with SendMessage. Use PostMessage instead. SendMessage is synchronous and will not return until the primary thread has processed the message, so the worker stalls whenever the primary thread is busy. This would defeat the purpose of using a worker thread in the first place.

It's tempting to try and solve the issue of lost notifications by providing a huge buffer. However, this may not be the wisest course of action. For any given buffer size, a similarly-sized buffer has to be allocated from the kernel non-paged memory pool. If you allocate too many large buffers, this can lead to serious problems, including a Blue Screen of Death. Thanks to an anonymous contributor in the MSDN Community Content.

Jump to Part 2 of this article.

21 comments:

  1. Hi Jim,

    thank you very much for your detailed explanation. After searching a while on the internet I can say your description helped me a lot and it is the most complete one.

    Cheers

  2. Thx for sharing. Great explanation of ReadDirectoryChangesW!

  3. Thanks for your article, and the source code, I found it very useful, saved a lot of time!

    I found one thing in CReadChangesRequest::ProcessNotification():

    if (wstrFilename.Right(1) != L"\\")

    Shouldn't this better be:

    if (m_wstrDirectory.Right(1) != L"\\")

    Regards,

    Jost

    ...

  4. Thank You!!! This helps

  5. Hi Jim,

    Thanks for the great article. Any idea how .NET System.IO.FileSystemWatcher implements its functionality. Would you recommend its use with a timer for watching files dropped via FTP?

    Dave W

    Replies
    1. Dave,

      I haven’t done much with .Net (I’m an SDK guy) so I don’t have a good answer for you about that.

      As I discuss near the beginning of Part 1, FindFirstChangeNotification is a much simpler way to monitor for new files. ReadDirectoryChangesW is more complicated than you need. However, you need to read the discussion about timeouts in the Comments after Part 2.

    2. We use FileSystemWatcher to monitor for SFTP changes coming in on a SAN.

      This complicates things as we don't see all create events for example, and the client renames their sftped files after copy onto the san.

      We monitor for all events in the directory, using them as an indication that 'something' is happening in the directory. Then we take a directory listing and act on that, looping until nothing is left in the directory, and just registering that other notifications might be occurring - although we take some care to not miss a notification that comes in after the directory listing that says there is nothing left to do.

      This 'strategy' means we are not subject to 'lost notifications'.

  6. Hi jim. Your article is great and useful. I have a question I hope you can help me. Anti virus softwares offer some feature they call Real-time protection or on-access scan. as wikipedia says:
    'real-time'means while data loaded into the computer's active memory: when inserting a CD, opening an email, or browsing the web, or when a file already on the computer is opened or executed.
    I'm interested in writing some code to implement this on-access or real-time functionality.
    do you have any suggestions to write a code which can monitor active memory changes and retrieve file address responsible for that change to trigger a scan by some tool.
    thank you very much.

    Replies
    1. Hello Rezatash,

      Windows has some built-in functions for allowing antivirus to do its job, but I've never worked with them. Device change notifications are available at the user level with notifications, but that's too late for antivirus, so it probably needs to be done at the device driver level. I have no experience, sorry.

  7. Very helpful. But a problem can be found in ThreadSafeQueue.h. CThreadSafeQueue::pop() never calls WaitForSingleObject() when the list is not empty.

  8. polyvertex and Anonymous,

    Thanks so much for your feedback on this article. My schedule at the moment is completely overwhelmed and I don't expect to have time to dig into this for at least several weeks. Your comments definitely point out the complexity of using these APIs. The good news is that this code has been in production use for several years on systems that log all crashes, and we haven't seen any related crashes.

  9. I have been searching for some help on 'tail -f' like solution. Finally, I came across your blog and solved my problem. Thanks!

  10. Hi,
    Great job, great description and sample.
    Do you already have a sample for the "A4C6D4" solution?
    Thx
    Fred

    Replies
    1. Unfortunately, no. One alternative that's somewhat easier to implement is as follows:
      - use the existing sample.
      - any "fast" processing can be done in the overlapped I/O callback.
      - any "slow" processing can be handed off to a Windows thread pool.

      Hope this helps.

    2. Thx for your answer. I'll see, if I can adapt your sample....
      Thx again for the article and the sample! Great!

  11. As far as directory monitoring goes, a much more reliable alternative to ReadDirectoryChangesW() is to use a little-known feature of NTFS known as the NTFS USN Journal. Before you say, "That requires admin rights", accessing the journal actually only requires admin rights on the OS volume (i.e. C:). Other volumes are freely available without having to elevate to Administrator.

    The far more difficult issue is that accessing the USN Journal itself is quite complex - the MSDN Library documentation on USN Journal operations is quite sparse and mostly focuses on v2 Journal records. If you want to be ReFS ready (assuming that ever happens), you have to also handle v3 records, which gets quite tricky. The much more difficult issue to deal with is that the USN Journal itself only provides a "file reference number" of the parent. To determine the full path and filename, you have to use OpenFileById() - a function only available on Vista and later even though the USN Journal was around long before that. If that call fails (e.g. the path no longer exists), you are hosed unless you have a reference to the ID/path saved somewhere. The other alternative to OpenFileById() is to read the $MFT and parse it...an esoteric exercise at best that few people have ever accomplished. VoidTools' "Everything" software, some Python scripts, and couple of forensic toolkits are about all I ran into when I was looking into reading the $MFT. However, with the USN Journal, you can monitor the entire file system quite efficiently. If you use the Overlapped I/O option on the relevant DeviceIoControl() calls, you can even get extremely close to real-time results - as soon as the kernel filesystem records the USN Journal record, your Overlapped I/O completes. The default USN Journal is rather large, so loss of information is pretty rare. Even on a fairly active system it doesn't roll over too frequently, so, generally-speaking, getting behind is only possible if your process exits and doesn't run for several hours. It's also configurable when it is created, so you can make it even bigger if you do find you get behind.

    The only downside I ran into that you have to consider is removable storage. The requisite CreateFile() call to open the device most likely locks the device so it can't be ejected properly if it is a USB thumbdrive or external hard drive or something like that. I didn't test it, but it's not a good idea to be using the USN Journal constantly even if it is the most efficient means to accessing directory changes on full NTFS volumes.

    A couple of other thoughts: Instead of using directory/journal monitoring as an IPC mechanism, if you control the source code to all of the applications involved, a mutex and a named pipe is probably a better solution.

    Another option could be to use a driver that creates a fake drive letter. Anything written to that "disk" would cause the driver to notify the application that it has something to do. Look up "Dokany" on GitHub. Dokany even has a FUSE wrapper, so you could write your application as a FUSE app and have it work on Linux and other OSes too. Then whatever it was that you were wanting to monitor, you just have it write to your fake drive letter/volume instead of a normal NTFS volume. The only downside is that this type of solution uses one of the rather limited 23 available DOS drive letters and Dokany or an application could accidentally trigger a BSOD. Playing with fire in a production environment is fun!

  12. Thanks Thomas. I mentioned the Change Journal partway into this article, but it's not something I've spent much time on. You make good points in your comments.

  13. Good article.

    > Windows 2000 brought two new interfaces

    Small point on the history though, from the 2000 MSDN Library:

    FindFirstChangeNotification:
    Windows NT/2000: Requires Windows NT 3.1 or later.
    Windows 95/98: Requires Windows 95 or later

    ReadDirectoryChangesW:
    Windows NT/2000: Requires Windows NT 3.51 SP3 or later.
    Windows 95/98: Unsupported.

  14. Good article. Thanks!
