2023-05-22

In MS Outlook VBA, how to find hidden data in a MailItem

Environment: MS Office LTSC Pro Plus 2021 under Windows 11 Pro 64

Background

A couple of weeks ago, I had the startling experience of watching my Outlook inbox fill up with thousands of copies of old e-mails stored in various Outlook folders. A Google search informed me that this is not a new problem. It's apparently caused by a long-standing Outlook bug that has never been fixed, but I found nothing about what triggers the bug or how to prevent it. It's a big problem because I use my inbox to keep a backlog of e-mails requiring later attention, and now those e-mails are drowning in a sea of random old e-mails. So I need to find a way to reliably identify and remove those copies.

To do this, I've been learning Outlook VBA. I'm experienced in VBA for Excel and Access, but new to it in Outlook. What I've discovered so far is that the property CreationTime is set to the time those copies were made. This has allowed me to determine that my inbox has about 8,000 of those errant copies, made in seven spurts on May 3, with each copied e-mail appearing between about two and seven times in the inbox. I have no idea why it happened those seven times on that day and why it hasn't happened again since then, and I live in fear of it suddenly happening again.

In a filesystem, one can run a comparison of two files to determine if they are identical. As far as I know, there is no such comparison facility that can be run between two e-mails. One has to pick a set of properties to compare and hope for the best. The system I've come up with looks for e-mails with the same values of the MailItem properties SenderName, To, and Subject and the same value of "timestamp", which I define as property SentOn if SenderEmailAddress is one of mine, and otherwise ReceivedTime. I suppose it would be more accurate to compare bodies, but I'm doing this by exporting properties to Excel, where the comparisons are run, and the e-mail bodies are too large for that. If I were more proficient in Outlook VBA, I could perhaps write a routine to do the comparisons there, but I haven't figured out how to do that. I thought of including Size as a proxy for body content, but I discovered that Size can differ between two e-mails that otherwise appear to be identical. More about that below.
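The comparison described above could in principle be run inside Outlook rather than in Excel. What follows is only a minimal sketch under stated assumptions, not a tested solution: the routine name and the MY_ADDRESSES constant are hypothetical placeholders, the key is exactly the four values named above (SenderName, To, Subject, and the "timestamp"), and for Exchange senders SenderEmailAddress may not be a plain SMTP address, in which case the "one of mine" test would need adjusting.

```vba
Option Explicit

' Hypothetical placeholder: the mailbox owner's own addresses,
' each wrapped in semicolons so InStr matches whole addresses only.
Private Const MY_ADDRESSES As String = ";me@example.com;me.too@example.com;"

Public Sub ListLikelyDuplicatesInInbox()
    Dim inbox As Outlook.Folder
    Dim itm As Object
    Dim mail As Outlook.MailItem
    Dim seen As Object              ' Scripting.Dictionary, late-bound
    Dim key As String
    Dim stamp As Date
    Dim dupCount As Long

    Set inbox = Application.Session.GetDefaultFolder(olFolderInbox)
    Set seen = CreateObject("Scripting.Dictionary")

    For Each itm In inbox.Items
        ' The inbox can also hold meeting requests, reports, etc.; skip those.
        If TypeOf itm Is Outlook.MailItem Then
            Set mail = itm

            ' "timestamp" as defined in the text: SentOn for own mail,
            ' ReceivedTime for everything else.
            If InStr(1, MY_ADDRESSES, ";" & mail.SenderEmailAddress & ";", _
                     vbTextCompare) > 0 Then
                stamp = mail.SentOn
            Else
                stamp = mail.ReceivedTime
            End If

            key = mail.SenderName & vbTab & mail.To & vbTab & _
                  mail.Subject & vbTab & Format$(stamp, "yyyy-mm-dd hh:nn:ss")

            If seen.Exists(key) Then
                ' Same key as an item already visited: report, do not delete.
                dupCount = dupCount + 1
                Debug.Print mail.CreationTime; vbTab; mail.Size; vbTab; mail.Subject
            Else
                seen.Add key, mail.EntryID
            End If
        End If
    Next itm

    Debug.Print "Items sharing a key with an earlier item: "; dupCount
End Sub
```

The sketch only reports matches in the Immediate window, which retains a limited number of lines, so for thousands of hits the output would need to go to a file or worksheet instead. The first item seen for a given key is not necessarily the original; CreationTime would still be needed to tell the May 3 copies from the e-mail they were copied from.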

That is the background of the question I ask below. In addition to an answer to that question, I'd also be grateful if someone can direct me to any technical information about the bug that caused those copies, what triggers it, and how to prevent it being triggered again in the future.

Hidden data

In my inspection of the data of those copies, I've made two (Edit: three) strange observations:

  • In order to keep down the size of my pst file, when I archive an e-mail in Outlook, I remove any sizable attachments for storage elsewhere. So I was flabbergasted to find that when Outlook generated these errant copies of old e-mails, many of them include attachments that I removed. This means that removing attachments from an e-mail does not reduce the size of the pst file at all, but merely hides the attachments!
  • There is a case of an e-mail whose size is 21 KB. As far as I can tell, it never had an attachment. There are seven copies of it in the inbox with CreationTime at seven different times on May 3. Six of those copies are 21 KB, but one is 136 KB. I've opened the original and the large copy, and I see no difference in the content. This means that there are 115 KB of data hiding somewhere in the data structure of that larger copy. If these were files, I would open them in Notepad++ to see if I could find where the differences are. But I don't know how to open the full content of an e-mail like that. I ran a routine to load one of the e-mails in VBA by its EntryID and then added it to the watch window to look at its structure. That 115 KB has to be hiding somewhere in there, but I couldn't tell from this where it is.
  • Edit: It's worse than that. Another thing I've been doing to try to keep down the size of the pst is that when an e-mail is large because of embedded images, I forward the e-mail to myself with the images deleted, and then permanently delete (or so I thought) the original e-mail. By "permanently delete," I mean that I first move the e-mail to a subfolder of "Deleted Items" called "Too big." Then, every so often, I delete this folder. When I delete a folder elsewhere, it gets moved to the "Deleted Items" folder. But when I delete a subfolder of "Deleted Items", I get a message, "Delete this folder and everything in it?" and when I click "Yes", the folder and its contents disappear. But guess what? Included in the errant copies in my inbox are copies of old e-mails that I thought I had removed from Outlook in this way. This means that when I delete a subfolder of "Deleted Items", Outlook does not discard its contents, but hides them somewhere.
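For the second observation, one way to narrow down where the extra size sits is to print the size-related values that the object model exposes for the small copy and the large copy and compare them side by side. This is a hedged sketch only: the routine name is hypothetical, it assumes the EntryID belongs to a MailItem in the current session, and it inspects nothing beyond the body lengths and the Attachments collection, so data stored in other MAPI properties would not show up here.

```vba
Option Explicit

' Hypothetical helper: call once per copy, passing each copy's EntryID,
' then compare the two printouts in the Immediate window.
Public Sub DumpSizeClues(ByVal entryId As String)
    Dim mail As Outlook.MailItem
    Dim att As Outlook.Attachment

    ' Assumes the ID refers to a MailItem; other item types raise a type mismatch.
    Set mail = Application.Session.GetItemFromID(entryId)

    Debug.Print "Subject:", mail.Subject
    Debug.Print "CreationTime:", mail.CreationTime
    Debug.Print "Size (bytes):", mail.Size
    Debug.Print "Body chars:", Len(mail.Body)
    Debug.Print "HTMLBody chars:", Len(mail.HTMLBody)
    Debug.Print "Attachment count:", mail.Attachments.Count

    ' Inline images and other embedded objects also appear in this collection,
    ' even when the reading pane shows no attachment.
    For Each att In mail.Attachments
        Debug.Print att.Index, att.Type, att.Size, att.DisplayName
    Next att
End Sub
```

If both copies report the same body lengths and the same attachments, the difference lies in properties this sketch does not read, and a MAPI-level inspection tool would be the next step.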

My question

Both the hidden attachments in the first observation above and the hidden 115 KB in the second have to be somewhere in the structure of the MailItem object. And (Edit) the hidden e-mails in the third observation must also be hidden somewhere, but I don't see any evidence of MailItem objects still existing for them. I have two questions about all this:

  • Where is this stuff hidden? Or how can I find out where it is?
  • Is there a way to actually remove it? I could trim gigabytes off the size of my pst file if all those attachments (Edit: and e-mails) that I've been removing for years could actually be removed instead of just hidden.

(Edit 2:) Second question

There's something that doesn't make sense in what I wrote above. The first and third observations tell me that Outlook never discards any data, whether deleted attachments or the contents of deleted subfolders of Deleted Items, but only hides it. For me, since I've been keeping almost all my e-mails for over ten years, I haven't been surprised to see the size of my pst file grow to over 10 GB. And I've always assumed that other people who allow their Deleted Items folder to regularly purge would have much smaller pst files. If that's correct and if my observations are correct, then it must be that:

  • Outlook does discard the data of e-mails purged from Deleted Items.
  • Outlook does not discard, but only hides, deleted attachments and deleted subfolders of Deleted Items.

That would seem like a strange modus operandi. Is that really how Outlook works, or is there something wrong in my reasoning, or maybe something in my settings that is causing Outlook to keep deleted data?
