2023-10-15

How to investigate disk cache usage in a Win32 application?

I have a workload similar to the following:

while True:
    data = get_data_from_network()            # next payload from the network
    filename = sha1(data)                     # file is named after the hash of its contents
    write_to_file(filename, data, len(data))

Occasionally I read back from the file, but it's not very common. Importantly, I get a lot of these network requests. It's not uncommon for me to write a gigabyte of data out to the disk this way. So for the most part I'm effectively just streaming large volumes of data to the disk. There is an article from Raymond Chen where he advises a customer not to use the FILE_FLAG_NO_BUFFERING flag, because, as Raymond puts it:

"If the application reads back from the file, the read can be satisfied from the disk cache, avoiding the physical I/O entirely."

But I'm not sure if this applies to me, because depending on the size of the cache, there's a pretty good chance that by the time I go to read that data again, it's already been pushed out by some other data.

I can bypass this with FILE_FLAG_NO_BUFFERING when I call CreateFile(), but before I just go and blindly do this, I'm wondering how I can investigate the impact of this from a performance point of view. I can just time my application, sure, but I'd like to go deeper.
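A minimal sketch of what the "just time my application" baseline might look like, written in Python to match the pseudocode above. The function name `write_blobs`, the `sync` parameter, and the random payloads standing in for network data are all assumptions made for illustration. It uses ordinary buffered writes and does not set FILE_FLAG_NO_BUFFERING.

```python
import hashlib
import os
import tempfile
import time


def write_blobs(directory, blobs, sync=False):
    """Write each blob to a file named after its SHA-1 digest.

    Returns (total_bytes_written, elapsed_seconds).
    With sync=True each file is flushed to the device before moving on,
    so the timing includes the physical write and not only the cache copy.
    """
    total = 0
    start = time.perf_counter()
    for data in blobs:
        name = hashlib.sha1(data).hexdigest()
        path = os.path.join(directory, name)
        with open(path, "wb") as f:
            f.write(data)
            if sync:
                f.flush()
                os.fsync(f.fileno())
        total += len(data)
    return total, time.perf_counter() - start


if __name__ == "__main__":
    # Hypothetical stand-in for get_data_from_network(): 8 random 1 MiB payloads.
    payloads = [os.urandom(1 << 20) for _ in range(8)]
    with tempfile.TemporaryDirectory() as tmp:
        nbytes, secs = write_blobs(tmp, payloads)
        print(f"wrote {nbytes} bytes in {secs:.3f} s")
```

Comparing runs with `sync=False` and `sync=True` gives a rough idea of how much of the measured time is the cache absorbing writes, but it says nothing about cache size or hit rates, which is the part of the question that timing alone cannot answer.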

For starters, how big even is the OS cache? Is it per-process, per-file, global? Is the size configurable? Can I query its size programmatically via an API? Is there a way for me to investigate whether it's being thrashed due to my workload? Is there a way to run my program and then determine how many disk reads were served from the memory cache as opposed to from the physical media?
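As a possible starting point for the "query its size via an API" part, here is a hedged sketch that calls `GetPerformanceInfo` from psapi through `ctypes`. That call fills a `PERFORMANCE_INFORMATION` structure whose `SystemCache` field is a system-wide figure in pages. The helper name `system_cache_bytes` is an assumption, and on non-Windows platforms it simply returns `None`.

```python
import ctypes
import sys


class PERFORMANCE_INFORMATION(ctypes.Structure):
    # Field order follows the documented Win32 structure; DWORD is 32 bits,
    # SIZE_T is pointer-sized.
    _fields_ = [
        ("cb", ctypes.c_uint32),
        ("CommitTotal", ctypes.c_size_t),
        ("CommitLimit", ctypes.c_size_t),
        ("CommitPeak", ctypes.c_size_t),
        ("PhysicalTotal", ctypes.c_size_t),
        ("PhysicalAvailable", ctypes.c_size_t),
        ("SystemCache", ctypes.c_size_t),
        ("KernelTotal", ctypes.c_size_t),
        ("KernelPaged", ctypes.c_size_t),
        ("KernelNonpaged", ctypes.c_size_t),
        ("PageSize", ctypes.c_size_t),
        ("HandleCount", ctypes.c_uint32),
        ("ProcessCount", ctypes.c_uint32),
        ("ThreadCount", ctypes.c_uint32),
    ]


def system_cache_bytes():
    """Return the current system cache size in bytes, or None off Windows."""
    if sys.platform != "win32":
        return None
    psapi = ctypes.WinDLL("psapi", use_last_error=True)
    psapi.GetPerformanceInfo.argtypes = [
        ctypes.POINTER(PERFORMANCE_INFORMATION),
        ctypes.c_uint32,
    ]
    psapi.GetPerformanceInfo.restype = ctypes.c_int  # BOOL
    info = PERFORMANCE_INFORMATION()
    info.cb = ctypes.sizeof(info)
    if not psapi.GetPerformanceInfo(ctypes.byref(info), info.cb):
        raise ctypes.WinError(ctypes.get_last_error())
    # SystemCache is reported in pages, so scale by the page size.
    return info.SystemCache * info.PageSize


if __name__ == "__main__":
    print(system_cache_bytes())
```

This only reports the cache's current size at one instant. It does not show a configured limit, and it does not say how many reads were served from the cache, so it covers only the first of the questions above.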
