The life and death of a document: where did it really go?

Feb 07, 2018 | 5 mins
Application Security, Data and Information Security, Email Clients

As the Moody Blues would say, what became of that letter you never meant to send?


I’m a curious person, as I bet you are, too. I wonder about a lot of things: not dark matter, but simple things, like where my documents are. What happened to them? Where have they gone? A cursory scan of your hard drive may reveal the sobering truth: you have many old documents you barely remember. By some online analyses, you have thousands of documents, most unopened for years, relics of your past work. My hard drive holds documents well over two decades old, some whose contents I barely remember, but fodder for future data archeologists, perhaps. But the real question is: where else might they be?

The birth of a document life cycle

You sit ready to generate another brilliant work and press New. Ever wonder what you have just started? From version to version, to the ultimate commitment of a complete first final draft (like the document you are now reading), you have started a document on a long-lived journey with no end in sight, and no telling where it goes or who reads it. When you press Save, the journey to your hard drive has only just begun. But what about your browser cache, your word processor’s temporary files and your Time Machine backups? That one document is copied and stored in locations you don’t generally observe. But the copies are there.

By some analyses, each laptop or desktop has thousands of dusty documents sitting at rest, long forgotten and never really gone. When you press Delete, do they really disappear? Remember, caches and backups store data, too. Delete doesn’t generally flush that document from all its hiding places. Worse, if that document was emailed or uploaded, all bets are off.
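To make the point about hiding places concrete, here is a minimal sketch in Python that looks for byte-for-byte copies of one document under a set of folders. The function names `file_digest` and `find_copies` are illustrative, not part of any tool mentioned in this article, and the approach only catches exact copies: an edited version, a cached fragment or a compressed backup would not match.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path, chunk_size: int = 65536) -> str:
    """Return the SHA-256 hex digest of a file's contents."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_copies(original: Path, search_roots: list[Path]) -> list[Path]:
    """List files under search_roots whose bytes match original exactly."""
    target = file_digest(original)
    target_size = original.stat().st_size
    matches = []
    for root in search_roots:
        for candidate in root.rglob("*"):
            try:
                # Compare sizes first so most files are never hashed.
                if (candidate.is_file()
                        and candidate.stat().st_size == target_size
                        and candidate.resolve() != original.resolve()
                        and file_digest(candidate) == target):
                    matches.append(candidate)
            except OSError:
                continue  # unreadable file or broken link; skip it
    return matches
```

Calling `find_copies(Path("TAX/2016/1040_TY2016.pdf"), [Path.home()])` would list identical copies elsewhere in your home folder. It says nothing about copies that have already left the machine, which is the harder problem.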

Consider a very real example. On your laptop, your TAX folder likely has subfolders named /2010, /2011, …, with last year’s /2016 folder soon to be joined by a new folder you are about to create, /2017. (It is near tax time, so my apologies for reminding you of the upcoming unpleasantness.) A peek inside /2016 reveals 1040_TY2016.pdf, alongside various spreadsheets and receipts. That 1040 document is the key to your financial privacy, yet it sits on your laptop at rest and unprotected. Who last opened that document? Who read your secrets? Clearly, the US IRS and your accountant both have copies, but does anyone else? Would you like to know the answer to that mystery? Is it possible to know who last read that document?

Send is the end of your control

As the Moody Blues would say, what became of that letter you never meant to send? Off it went in your email to a trusted recipient. Your company may have invested heavily in DLP technology to watch whether sensitive documents you created were inadvertently sent to your home email or, worse, to a competitor. But some of those sensitive documents you legitimately sent to the inbox of your company’s trusted law firm, accounting firm or intellectual property counsel, all covered under a corporate NDA. No problem. All legitimate. But do you really know where they went from there? In its zeal for efficiency, the trusted third party hired temporary workers who, in their own zeal to do a good job, read the documents at home on the laptops they later brought on vacation. And there are those pesky local caches again: even if the workers deleted the documents, the cache went on vacation, too.

So where are they now?

Most of the document files I’ve kept over the past twenty years hardly interest me, much less anyone else. But we all have documents that matter, and I’m sure that, like me, you’d feel a sense of panic or embarrassment if your half-finished memoir or legal files ended up elsewhere. Beyond tracking down personal files, companies are on the hook for new compliance regulations that control the flow of data. The issue is more than just satisfying curiosity; it’s a significant new liability in view of existing compliance requirements and, worse, the upcoming GDPR regulations. If your company has any business in the EU, it must of course comply or risk the stiff penalties of GDPR.

So, do you really know where your documents go, once you press Send?

Tracking a document with beacons

Documents can now be tracked with beacons. A beacon is an object embedded in a document that survives editing and signals home when the document is opened. Think of it as GPS for your data! There are many ways of implementing a beacon; some test the ambiguities of various laws and regulations, while others are entirely legal.
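The article does not describe how any particular beacon is built, so the following is only a sketch of the bookkeeping behind one common approach: each copy of a document carries a unique URL, and a server logs every request for it. The class `BeaconRegistry`, its methods and the host `beacon.example.com` are all hypothetical, and the step of embedding the URL in a file so that it survives editing is not shown.

```python
import hashlib
import hmac
import secrets
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class BeaconRegistry:
    """Issues one beacon URL per document copy and logs each call home."""
    secret: bytes = field(default_factory=lambda: secrets.token_bytes(32))
    base_url: str = "https://beacon.example.com/b"  # hypothetical endpoint
    issued: dict = field(default_factory=dict)      # token -> (document, recipient)
    opens: list = field(default_factory=list)       # (token, source address, UTC time)

    def _sign(self, token: str) -> str:
        # Short HMAC tag so that guessed or altered URLs can be rejected.
        return hmac.new(self.secret, token.encode(), hashlib.sha256).hexdigest()[:16]

    def issue(self, document: str, recipient: str) -> str:
        """Create the URL to embed in the copy sent to `recipient`."""
        token = secrets.token_urlsafe(12)
        self.issued[token] = (document, recipient)
        return f"{self.base_url}/{token}.{self._sign(token)}"

    def record_open(self, url: str, source_addr: str) -> bool:
        """Log a call home; reject URLs this registry did not issue."""
        token, _, sig = url.rsplit("/", 1)[-1].partition(".")
        if token not in self.issued or not hmac.compare_digest(sig, self._sign(token)):
            return False
        self.opens.append((token, source_addr, datetime.now(timezone.utc)))
        return True

    def report(self, document: str) -> list:
        """Return (recipient, source address, time) for every open of `document`."""
        return [
            (self.issued[t][1], addr, when)
            for t, addr, when in self.opens
            if self.issued[t][0] == document
        ]
```

In use, you would call `issue("1040_TY2016.pdf", "accountant")` before sending, and the server receiving beacon requests would call `record_open` with the requested URL and the caller's address. An open from an address you don't recognize, attributed to the accountant's copy, is the signal that the document traveled further than intended.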

Legitimate readers can freely review the document, but illegitimate readers can be revealed. You can know where your documents went and how many copies were made. Imagine that. Your documents may flow across the internet for days, weeks or more, and the curious can know where they are. Each document becomes its own Voyager, roaming endlessly on the net. This eye-opening new technology can finally answer the question: where did my document really go? Who actually read my 1040_TY2016.pdf? I really want to know the answer to that question.


Salvatore Stolfo is a tenured Columbia University professor, teaching computer science since 1979. He is the co-founder and CTO of Allure Security, a DARPA-funded cybersecurity startup specializing in data protection and the prevention of data breaches.

Dr. Stolfo is a people-person. And that makes him unique in a field where folks focus on making machines. As professor of artificial intelligence at Columbia University, Dr. Stolfo has spent a career figuring out how people think and how to make computers and systems think like people. Early in his career he realized that the best technology adapts to how humans work, not the other way around.

Dr. Stolfo has been granted over 75 patents and has published over 230 papers and books in the areas of parallel computing, AI knowledge-based systems, data mining, computer security and intrusion detection systems. His research has been supported by numerous government agencies, including DARPA, NSF, ONR, NSA, CIA, IARPA, AFOSR, ARO, NIST, and DHS.

See his full academic bio at Columbia University for more background.

The opinions expressed in this blog are those of Salvatore Stolfo and do not necessarily represent those of IDG Communications, Inc., its parent, subsidiary or affiliated companies.