Republican data analytics firm exposes voting records on 198 million Americans

Deep Root Analytics left 1.1 TB of data unsecured on an Amazon S3 account

Researcher Chris Vickery has discovered nearly 200 million voter records in an unsecured Amazon S3 bucket maintained by Deep Root Analytics (DRA), a big data analytics firm that helps advertisers identify audiences for political ads.

The data was discovered on June 12, and secured two days later after Vickery reported the incident to federal regulators.

Salted Hash has been down this road before with Vickery, who is now a Cyber Risk Analyst for UpGuard. In 2015, Vickery discovered 191 million voter records being stored in an unsecured database.

At the time, determining who was responsible for the exposed database was nearly impossible, as everyone contacted denied any association. The records, sourced from a Nation Builder data set, were never officially claimed.

Now, with his most recent discovery, Vickery has once again highlighted the risk surrounding big data analytics and cloud storage. The records discovered on June 12 are a complete voter file. UpGuard says the records were compiled by DRA and at least two other contractors, Target Point Consulting Inc. and Data Trust.

Voter information, aside from a few elements protected by law, is public record. Sometimes access to this data can be expensive; other times it's freely available. Yet the golden rule tends to be that the information can't be used for commercial purposes.

However, there are laws in some states that affect the type of data discovered this month. California restricts voter information to political purposes only, and the information may not be made available to people outside of the U.S. South Dakota addresses repositories like this directly, stating that voter information "may not be placed for unrestricted access on the internet."

In 2015, Data Trust denied any connection to the exposed records discovered by Vickery. This time, however, a folder within the DRA S3 bucket ("data_trust") contains a massive collection of personal information representing between 150 and 198 million potential voters. Salted Hash has seen an example voter record, and many of the profile fields are similar to those from two years ago.

Using an internal "RNC ID," each voter in the database can be uniquely identified and associated with the logged data points.

Each record contains the voter's first and last name, home and mailing address, date of birth, phone number, party affiliation, ethnicity, voter registration data, and a flag should the person appear on the federal Do-Not-Call registry. However, the voter file also includes fields for modeled ethnicity and modeled religion, suggesting the usage of big data to predict answers when voters didn't offer such details directly, explained UpGuard's Dan O'Sullivan in a blog post.

Overall, the collection discovered by Vickery contains information on the 2008, 2012, and 2016 elections. However, the 2016 records only contain details on voters in Ohio and Florida.

Another folder in the S3 bucket Vickery discovered is from Target Point. The records in this data set used the same "RNC ID" and had update timestamps as recent as January 2017. According to O'Sullivan, the records "provide a rare glimpse in to a systematic large-scale analytics operation."

"The result is a database of frightening scope and intrusiveness into the modeled personal and political preferences of most of the country – adding up in total to an unsecured political treasure trove of data which was free to download online," O'Sullivan added.

Many of the Target Point records focused on post-election data gathered around President Trump's inauguration earlier this year. For example, one 50 GB file contained scores for potential voters, signifying their potential to support a given policy, such as President Trump's foreign policy stance of "America First", or how concerned they'll be with auto manufacturing as an issue.

Same Risk, Different Year:

This is the same risk from two years ago.

Using the "RNC ID" in this most recent discovery, it is possible to map a voter to a geographic location, or to a political topic or stance. The data contained in the voter profiles, compiled with both direct information and data science, is astonishingly accurate too.
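To see why a shared identifier matters, consider a minimal sketch of how two separate data sets can be linked once they carry the same key. Everything here is an assumption made for illustration: the field names (`rnc_id`, `region`, `topic`, `score`), the values, and the `link_by_id` helper are invented and are not taken from the exposed files.

```python
# Hypothetical records for illustration only; names and values are invented.
voter_file = [
    {"rnc_id": "A1", "state": "OH", "region": "north"},
    {"rnc_id": "B2", "state": "FL", "region": "south"},
]
modeled_scores = [
    {"rnc_id": "A1", "topic": "auto_manufacturing", "score": 0.82},
    {"rnc_id": "A1", "topic": "america_first", "score": 0.35},
    {"rnc_id": "B2", "topic": "america_first", "score": 0.67},
]


def link_by_id(voters, scores, key="rnc_id"):
    """Attach every score row to the voter record that shares the same key."""
    # Start each profile as a copy of the voter record with an empty score map.
    profiles = {v[key]: {**v, "scores": {}} for v in voters}
    for row in scores:
        profile = profiles.get(row[key])
        if profile is not None:  # ignore scores with no matching voter
            profile["scores"][row["topic"]] = row["score"]
    return profiles


profiles = link_by_id(voter_file, modeled_scores)
for rnc_id, profile in profiles.items():
    print(rnc_id, profile["state"], profile["region"], profile["scores"])
```

The point of the sketch is that no sophisticated tooling is needed: once location data and modeled preferences share one identifier, a single pass over the rows is enough to combine them into a per-person profile.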

Salted Hash has reached out to DRA for comment. It isn't clear whether anyone else accessed this data before Vickery discovered it. If someone malicious found the records first, the social engineering attack vectors commonly seen in general phishing and in more serious targeted attacks could be seriously enhanced with this type of information.

Another issue is how organizations use Amazon's cloud platform as a whole. The platform is a boon to organizations both large and small. However, organizations are still responsible for the security of their own accounts.

Amazon offers the tools and the guidance to help secure things, but it's up to the organization to follow the suggestions and use the tools properly. It isn't clear how the DRA data ended up publicly accessible, but there's something to be said about the need for constant internal checks against information leaks.

Especially if said information is critical to your organization.
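As one example of what such an internal check might look like, the sketch below scans a bucket's access control list for grants to Amazon's predefined "everyone" groups. It is an illustration under stated assumptions, not a description of DRA's setup: the `public_grants` helper and the sample ACL are hypothetical, and the input is assumed to be a dict shaped like the response from the `get_bucket_acl` call in boto3's S3 client.

```python
# URIs Amazon S3 uses for its predefined broad-access grantee groups.
PUBLIC_GROUP_URIS = {
    "http://acs.amazonaws.com/groups/global/AllUsers",
    "http://acs.amazonaws.com/groups/global/AuthenticatedUsers",
}


def public_grants(acl):
    """Return (grantee URI, permission) pairs that expose a bucket broadly.

    `acl` is assumed to look like a boto3 `get_bucket_acl` response:
    {"Grants": [{"Grantee": {...}, "Permission": "..."}]}
    """
    findings = []
    for grant in acl.get("Grants", []):
        grantee = grant.get("Grantee", {})
        if grantee.get("Type") == "Group" and grantee.get("URI") in PUBLIC_GROUP_URIS:
            findings.append((grantee["URI"], grant.get("Permission")))
    return findings


# Hypothetical ACL for illustration only; not taken from any real bucket.
example_acl = {
    "Grants": [
        {
            "Grantee": {"Type": "CanonicalUser", "ID": "owner-id"},
            "Permission": "FULL_CONTROL",
        },
        {
            "Grantee": {
                "Type": "Group",
                "URI": "http://acs.amazonaws.com/groups/global/AllUsers",
            },
            "Permission": "READ",
        },
    ]
}

for uri, permission in public_grants(example_acl):
    print(f"Public grant: {permission} for {uri}")
```

A check like this covers only ACLs. Bucket policies can also open data to the public, so a real audit would need to review those as well and run on a schedule rather than once.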

Update:

Deep Root Analytics issued a statement about Vickery's discovery. Their statement is printed below in full:

"Deep Root Analytics has become aware that a number of files within our online storage system were accessed without our knowledge.

Deep Root Analytics builds voter models to help enhance advertiser understanding of TV viewership. The data accessed was not built for or used by any specific client. It is our proprietary analysis to help inform local television ad buying.

The data that was accessed was, to the best of our knowledge this proprietary information as well as voter data that is publicly available and readily provided by state government offices.

Since this event has come to our attention, we have updated the access settings and put protocols in place to prevent further access.  We take full responsibility for this situation.

Deep Root Analytics maintains industry standard security protocols. We built our systems in keeping with these protocols and had last evaluated and updated our security settings on June 1, 2017.

We are conducting an internal review and have retained cyber security firm Stroz Friedberg to conduct a thorough investigation.  Through this process, which is currently underway, we have learned that access was gained through a recent change in asset access settings since June 1.

We accept full responsibility, will continue with our investigation, and based on the information we have gathered thus far, we do not believe that our systems have been hacked."

This story was also covered by Gizmodo and ZDNet.
