Aggregated Data and the Threat of Re-Identification

I have written before about the risks of clauses in technology contracts that give the vendor broad and usually undefined rights in its customers' aggregated data. Specifically, I have discussed the need to spell out what constitutes “aggregation” (e.g., combination with other customers' data, with no identification of any individual or entity) and to require the vendor to assume liability for its use of the data. Recently, however, we have seen instances where data that was thought to have been properly aggregated was, in fact, easily re-identified using sophisticated data mining tools.

The threat of re-identification is not new. The Privacy Rule issued under the Health Insurance Portability and Accountability Act (HIPAA) goes so far as to include clear standards for de-identification of protected health information (see 45 CFR § 164.514). Similar standards should be used any time highly sensitive data is being aggregated. For example, a clear statement in the contract that the aggregation process must render re-identification statistically improbable (the HIPAA standard speaks of a “very small” risk) would be a fine start. The point is to have at least some nod to the fact that this risk exists, because simply saying the data will be “aggregated” is no longer sufficient.
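
To make the risk concrete, here is a minimal sketch of one check a vendor might run before releasing aggregated data: count how many records share each combination of indirectly identifying fields and flag any combination that falls below a minimum group size. The field names (`zip3`, `birth_year`, `sex`), the function names, and the threshold `k` are all hypothetical choices for illustration. This is not the HIPAA standard or any vendor's actual process, only one common way to reason about whether a record stands out.

```python
from collections import Counter


def group_sizes(records, quasi_identifiers):
    """Count how many records share each combination of quasi-identifier values."""
    return Counter(tuple(r[q] for q in quasi_identifiers) for r in records)


def risky_groups(records, quasi_identifiers, k):
    """Return the combinations shared by fewer than k records.

    A record in a group smaller than k is easier to single out, so these
    groups would need to be suppressed or generalized before release.
    """
    sizes = group_sizes(records, quasi_identifiers)
    return {combo: n for combo, n in sizes.items() if n < k}


# Hypothetical sample data: no names, yet one record is still unique.
records = [
    {"zip3": "606", "birth_year": 1975, "sex": "F"},
    {"zip3": "606", "birth_year": 1975, "sex": "F"},
    {"zip3": "606", "birth_year": 1975, "sex": "F"},
    {"zip3": "481", "birth_year": 1942, "sex": "M"},
]

# k=3 is an assumed threshold chosen only for this example.
flagged = risky_groups(records, ["zip3", "birth_year", "sex"], k=3)
print(flagged)
```

In this sample, the single record with the `481` prefix is flagged because nobody else shares its combination of values, even though the data contains no names. A contract that defines “aggregation” by reference to a measurable test of this kind gives both parties something concrete to hold the process to.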

Copyright © 2012 IDG Communications, Inc.