Sometimes the best defense is deletion

Information Governance experts say that while storage costs are down, there's risk -- and cost -- associated with the growing 'data lake'

Most enterprises view Big Data as a very good thing. With the right analytics, it can generate meaning and business value. But as with many things, there can be too much of a good thing, say a number of Information Governance (IG) experts.

Their message is that enterprises need to do more than protect their data from theft or infection -- they need to get rid of some of it, for both economic and legal reasons.

Dumping data has gone by a variety of names so far, including defensible disposition, defensible deletion and active expiration. Barry Murphy, cofounder and principal analyst at eDJ Group, prefers defensible deletion (DD).

More important than the label, however, is the goal, which Murphy summed up in a post on eDiscovery Journal: "Companies can reduce costs and decrease risks by proactively getting rid of unnecessary information."

Murphy told CSO Online that the cost of storage, both on-premises and in the cloud, does continue to decrease. "One could argue that the decreasing cost of storage combined with lower-cost information processing platforms like Hadoop makes keeping information in perpetuity economically viable," he said. "But the rate at which information grows is faster than the rate at which the cost of storage decreases. So much corporate information is either duplicate or unnecessary that the cost of retaining it is greater than that of getting rid of it."
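Murphy's argument comes down to two compounding rates working against each other. The rough Python sketch below uses purely hypothetical figures (100 TB of data, $200 per TB per year, 40% annual data growth and a 20% annual drop in storage prices) to show how the total bill can keep rising even while the price per terabyte falls.

```python
def storage_cost_by_year(volume_tb, price_per_tb, growth_rate, price_decline, years):
    """Return the yearly storage bill for years 0..years when data volume
    compounds upward and unit price compounds downward.
    All inputs are hypothetical, illustrative rates."""
    costs = []
    for year in range(years + 1):
        volume = volume_tb * (1 + growth_rate) ** year      # data keeps piling up
        price = price_per_tb * (1 - price_decline) ** year  # storage gets cheaper
        costs.append(volume * price)
    return costs


# Hypothetical scenario: 100 TB today at $200/TB per year,
# data growing 40% a year while prices fall 20% a year.
costs = storage_cost_by_year(100, 200.0, 0.40, 0.20, 5)
for year, cost in enumerate(costs):
    print(f"year {year}: ${cost:,.0f}")
```

With these assumed rates the bill is multiplied by 1.4 x 0.8 = 1.12 each year, so it grows about 12% annually; it would shrink only if (1 + growth) x (1 - decline) fell below 1.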

Jim McGann, vice president of marketing for Index Engines, said in an interview with Government Technology last year that in the past five years he had seen organizations taking steps to "clean up the 'data lake' that has been generated."


The motivation is legal as well as economic, he said. Until about 15 years ago, organizations could save anything and easily hide content that could become a liability, but he said that won't work these days. "Lawyers and judges are more tech savvy and they won't accept excuses about complexity and cost issues anymore," he said.

Barry Murphy agrees. "The cost and risk of eDiscovery can poke a giant hole in any economic assessment of information management costs," he said.

The rules governing electronic information are different from those for paper documents, since electronic records usually include metadata that can be important as evidence. In a copyright case, for example, the date and time a document was written can be critical.

This doesn't mean a company can get rid of any electronic documents it fears might create a liability. But Murphy said the Federal Rules of Civil Procedure give companies a so-called "safe harbor" from liability for information deleted in accordance with standard operating procedures, "as long as a legal hold process is in place to stop deletion if information may be relevant to a litigation or regulatory matter."

Murphy said that in general, "any information assets that are duplicate or have no business value would fall into the pile of 'to be deleted.'" But he said too many organizations are not yet "mature" enough to put an accurate value on information. Instead, he said, they have "time-based retention policies."

"For example, many companies delete all email in an employee's inbox after 90 days. Any email the employee wants to keep longer needs to be dragged to a central archive folder, where the employee can access it beyond the 90-day period."

It is better, and much more defensible, he said, to have "legal hold management," which would be enough to convince a court that relevant ESI (electronically stored information) has been preserved. "The standard is reasonable effort rather than perfection," he added.
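A time-based rule with a legal-hold override might look something like the following Python sketch. It is illustrative only: the `Message` and `LegalHolds` types are hypothetical, holds are assumed to be scoped by custodian (real holds are often scoped by matter, date range or search terms), and mail in the archive folder is simply treated as outside the 90-day rule.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta

# The 90-day inbox window from Murphy's example.
RETENTION = timedelta(days=90)


@dataclass
class Message:
    msg_id: str
    custodian: str      # mailbox owner
    folder: str         # "inbox" or "archive" in this sketch
    received: datetime


@dataclass
class LegalHolds:
    """Hypothetical hold register keyed by custodian."""
    custodians: set = field(default_factory=set)

    def covers(self, msg: Message) -> bool:
        return msg.custodian in self.custodians


def eligible_for_deletion(msg: Message, holds: LegalHolds, now: datetime) -> bool:
    """Apply the time-based rule, but never delete anything under legal hold."""
    if holds.covers(msg):
        return False  # preservation duty overrides routine deletion
    if msg.folder != "inbox":
        return False  # mail moved to the archive folder is kept
    return now - msg.received > RETENTION
```

Checking the hold before the message's age mirrors the condition Murphy attaches to the safe harbor: routine deletion runs only where no hold applies.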

Jim McGann said he recommends that companies start small. "[It] could be with purging ex-employee data, or determining what data has not been accessed in five years and could be migrated to less expensive storage such as the cloud, or can eventually be purged," he said.
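Finding data that has not been touched in five years can start with a simple file-system scan. The Python sketch below is a hypothetical starting point, not a description of any vendor's product: it treats a file as stale only when both its last-access and last-modified timestamps are older than the threshold, because access times are not maintained on volumes mounted with options such as `noatime`.

```python
import time
from pathlib import Path

# Five years in seconds (ignores leap days; the threshold is an assumption).
FIVE_YEARS = 5 * 365 * 24 * 60 * 60


def stale_files(root, max_age_seconds=FIVE_YEARS, now=None):
    """Yield (path, last_touched) for files under `root` that have been
    neither read nor modified within `max_age_seconds`.

    Uses the later of st_atime and st_mtime, since st_atime alone is not
    updated on volumes mounted with noatime.
    """
    now = time.time() if now is None else now
    for path in Path(root).rglob("*"):
        if path.is_file():
            info = path.stat()
            last_touched = max(info.st_atime, info.st_mtime)
            if now - last_touched > max_age_seconds:
                yield path, last_touched
```

Files it flags would be candidates for cheaper storage first and, after review against any legal holds, for deletion.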

Even a small start requires setting priorities, he said. "The highest risk data environments are typically email servers and legacy backup tapes," he told Government Technology. "Email is the most common source of evidence produced for litigation and regulatory requests. Legacy backup tapes are a snapshot of everything, including email and files."

To set those priorities, he recommends creating a data map that records attributes such as the age of the data, its last accessed or modified date, owner, location, email sender and receiver, and even sensitive keywords. "A data map will deliver the knowledge required to make 'keep or delete' decisions for files and email. An actionable data map can then help you execute on these decisions and defensibly delete what is no longer required, and archive what must be kept," he said.
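A data map ultimately feeds a keep-or-delete decision for each item. The Python sketch below shows one way the attributes McGann lists could drive that decision. The field names, the keyword list and the rules themselves are hypothetical, and a legal hold is assumed to override everything else.

```python
from dataclasses import dataclass
from datetime import date

# Hypothetical watch-list; in practice legal and compliance teams would define it.
SENSITIVE_KEYWORDS = frozenset({"confidential", "settlement", "ssn"})


@dataclass
class DataMapEntry:
    location: str                        # server, share or mailbox
    owner: str                           # custodian or data owner
    last_accessed: date
    last_modified: date
    keywords: frozenset = frozenset()    # sensitive terms found by a content scan
    owner_departed: bool = False         # e.g. ex-employee data
    on_legal_hold: bool = False


def decide(entry, today, stale_after_days=5 * 365):
    """Return 'keep', 'archive' or 'delete' for one data-map row.

    Illustrative rules only: a legal hold always wins; stale items are
    deleted unless they contain sensitive terms, in which case they are
    archived for review; departed owners' non-sensitive data is deleted.
    """
    if entry.on_legal_hold:
        return "keep"
    last_touched = max(entry.last_accessed, entry.last_modified)
    idle_days = (today - last_touched).days
    sensitive = bool(entry.keywords & SENSITIVE_KEYWORDS)
    if idle_days > stale_after_days:
        return "archive" if sensitive else "delete"
    if entry.owner_departed and not sensitive:
        return "delete"
    return "keep"
```

Keeping the rules in one small function makes each decision easy to log and explain, which matters when the goal is deletion a court will accept as defensible.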
