Biometrics can provide better data provenance

Mar 31, 2017 · 4 mins

Determining who created, touched or authorized data transformations plays a critical role in transparency and auditing

Imagine these scenarios:

  • Insiders at a financial institution place transactions using e-execution and then deny involvement when trades lose money.
  • Regulated individuals share secrets and collude to fix pricing via messaging services.
  • Fraud occurs through diverted funds within treasury departments.
  • Funds are embezzled or redirected for personal gain.
  • Confidential data is accessed for market price fixing, front running or gaining market advantage.
  • Executives ask staff members to access confidential or highly secure content to simplify a briefing process.
  • Data is accessed and leaked for personal benefit.

The common denominator in every one of these scenarios is individuals denying their involvement or abdicating responsibility for a transaction. Acts like these happen every day across virtually every industry, including pharma, finance and the public sector. They cost companies large sums to investigate and put operating licenses at risk.

Each of these scenarios also illustrates how critical legal non-repudiation (the ability to prove that a specific person performed an action, so that they cannot credibly deny it) is for organizations that want to provide end-to-end transparency, and the important role that authentication plays in all transactions.

A robust authentication system should offer at least three types of factor for multi-factor authentication (MFA): what you know (e.g., passwords), what you have (e.g., tokens), and what you are (e.g., biometrics). Passwords and tokens are insufficient by themselves because they can be shared or stolen: they show that a credential was presented, not which person presented it, so the system cannot reliably record the identity of the requestor. If the trader in the first scenario had authenticated the trade with a fingerprint, it would be far harder to deny ownership of it.
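To make the three factor categories concrete, here is a minimal Python sketch of a policy check. The names `Factor` and `meets_policy`, and the policy parameters, are hypothetical and chosen only for illustration; a real system would rely on its identity provider's policy engine.

```python
from enum import Enum


class Factor(Enum):
    KNOWLEDGE = "what you know"   # e.g., a password
    POSSESSION = "what you have"  # e.g., a hardware token
    INHERENCE = "what you are"    # e.g., a fingerprint


def meets_policy(presented: set[Factor], min_factors: int = 2,
                 require_biometric: bool = False) -> bool:
    """Return True if the presented factors satisfy this hypothetical policy."""
    if require_biometric and Factor.INHERENCE not in presented:
        return False
    return len(presented) >= min_factors


# A password plus a token is multi-factor, but neither identifies a person.
print(meets_policy({Factor.KNOWLEDGE, Factor.POSSESSION}))                          # True
print(meets_policy({Factor.KNOWLEDGE, Factor.POSSESSION}, require_biometric=True))  # False
print(meets_policy({Factor.KNOWLEDGE, Factor.INHERENCE}, require_biometric=True))   # True
```

The `require_biometric` flag reflects the argument above: for transactions where non-repudiation matters, a policy might insist on the "what you are" factor rather than accept any two factors.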

These issues of transparency and authentication are becoming increasingly pressing for enterprises. Most fall under the heading of data provenance. Data provenance is “showing your work”: the entire historical record for any piece of data, stored and searchable.
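One way to picture a stored, searchable historical record is an append-only log in which each entry includes a hash of the entry before it. The Python sketch below is an illustration under assumptions, not a description of any particular product: the `ProvenanceLog` class, its field names and the sample trades are all hypothetical.

```python
import hashlib
import json
from dataclasses import dataclass, field


@dataclass
class ProvenanceLog:
    """Append-only history; each entry commits to the one before it."""
    entries: list = field(default_factory=list)

    @staticmethod
    def _digest(body: dict) -> str:
        # Canonical JSON so the same body always hashes to the same value.
        return hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()

    def record(self, data_id: str, actor: str, action: str, timestamp: str) -> dict:
        prev = self.entries[-1]["hash"] if self.entries else "0" * 64
        body = {"data_id": data_id, "actor": actor, "action": action,
                "timestamp": timestamp, "prev": prev}
        entry = {**body, "hash": self._digest(body)}
        self.entries.append(entry)
        return entry

    def history(self, data_id: str) -> list:
        """The searchable part: every recorded event for one piece of data."""
        return [e for e in self.entries if e["data_id"] == data_id]

    def verify(self) -> bool:
        """Recompute each hash; an edited entry, or one removed from the
        middle of the log, breaks the chain."""
        prev = "0" * 64
        for e in self.entries:
            body = {k: v for k, v in e.items() if k != "hash"}
            if e["prev"] != prev or self._digest(body) != e["hash"]:
                return False
            prev = e["hash"]
        return True


log = ProvenanceLog()
log.record("trade-1001", "alice", "created", "2017-03-31T09:00:00Z")
log.record("trade-1001", "bob", "approved", "2017-03-31T09:05:00Z")
log.record("trade-1002", "alice", "created", "2017-03-31T09:07:00Z")
print([e["actor"] for e in log.history("trade-1001")])  # ['alice', 'bob']
print(log.verify())  # True
```

The hash chain makes tampering detectable, but it does not by itself prove who the actor was. That depends on how the actor was authenticated when the entry was written.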

A key driver of this is new transparency regulation in Europe. The GDPR (General Data Protection Regulation), scheduled to take effect in May 2018, requires organizations that handle personal data, including banks and other financial organizations, to keep records of how that data is processed. For financial firms, this means keeping a clear record of their data and associated metadata in order to show the lineage of financial information (transactions, trades, etc.). The GDPR’s controversial “right to explanation” mandates that “data subjects receive limited information (Articles 13-15) about the logic involved, as well as the significance and the envisaged consequences of automated decision-making systems.” Extensive data provenance facilities will be necessary to support GDPR requirements, including records of authentication decisions that show who or what entities were involved in transactions such as money transfers, currency exchanges, fraud alerts, and loan approvals.

Historically, three issues have stood in the way of capturing and storing the full provenance of every data element:

  1. The storage requirements and costs for data and associated metadata were prohibitive.
  2. The software and computing power needed to mine the data for insights were lacking.
  3. There was no reliable security infrastructure for recording data lineage.

We are overcoming all of these issues. First, storage is cheaper and more widely available on a global scale, and the advent of the cloud makes it easier to use than ever before. Second, data mining is now so commonplace that it is practically textbook, and the clear benefits of analyzing data for insights mean businesses are seeking out ever more opportunities to add value to their organizations.

Finally, we are now in the fourth generation of big data infrastructures, which can securely record provenance information and use multi-factor authentication, including biometrics.

The Hadoop ecosystem keeps evolving rapidly thanks to contributions from the open source community and vendors like Cloudera and Hortonworks. In the early days of Hadoop, anyone could run Hive or Pig queries against any dataset. Newer access control technologies for the Hadoop ecosystem, like Apache Knox, offer integration points for delegated authentication, including biometrics.

Next-generation big data platforms like Pachyderm offer native provenance mechanisms that can be coupled with biometric authentication to provide fine-grained, data-level authorization and access control, as well as signed transactions for non-repudiation. This infrastructure is developing quickly to support the coming need for the rich audit, data purging and explanation systems that regulatory compliance will require.
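A signed transaction supports non-repudiation because only the holder of a private key can produce the signature, while anyone with the public key can check it. The Python sketch below illustrates the idea with a Lamport one-time signature, chosen only because it needs nothing beyond the standard library. It is not how Pachyderm or any other platform mentioned here signs data, and a production system would more likely use a vetted scheme such as Ed25519 or ECDSA from a maintained cryptography library. All function names and the sample trade are hypothetical.

```python
import hashlib
import secrets


def _h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def generate_keypair():
    """Private key: 256 pairs of random secrets. Public key: their hashes."""
    private = [(secrets.token_bytes(32), secrets.token_bytes(32)) for _ in range(256)]
    public = [(_h(a), _h(b)) for a, b in private]
    return private, public


def _bits(message: bytes):
    # The 256 bits of the message digest select which secrets get revealed.
    digest = int.from_bytes(_h(message), "big")
    return [(digest >> i) & 1 for i in range(256)]


def sign(private, message: bytes):
    # Reveal one secret per digest bit. A key must never sign a second message.
    return [private[i][bit] for i, bit in enumerate(_bits(message))]


def verify(public, message: bytes, signature) -> bool:
    # Each revealed secret must hash to the public value chosen by that bit.
    return len(signature) == 256 and all(
        _h(sig) == public[i][bit]
        for i, (sig, bit) in enumerate(zip(signature, _bits(message)))
    )


private, public = generate_keypair()
trade = b"trader=alice;order=BUY 100 XYZ;ts=2017-03-31T09:00:00Z"
signature = sign(private, trade)
print(verify(public, trade, signature))                            # True
print(verify(public, trade.replace(b"BUY", b"SELL"), signature))   # False
```

In a biometric deployment, the assumption would be that the private key is released for signing only after a successful biometric match. This sketch does not model that step.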

Given the high stakes of its data and the tight regulations around its practices, the financial industry is particularly affected by the need for data provenance. Other industries should also take note, as a clear, searchable record of all transactions has important applications for everything from property deeds to birth records. And when it comes to tracking sensitive data, from a trader’s high-value transaction to an average ATM withdrawal, biometrics are the key to providing the legal non-repudiation necessary to meet transparency requirements, because they are the ideal way to track and confirm identity throughout the data’s lineage.

Biometrics is poised for growth for many reasons over the next few years, but regulations around data provenance are a clear driver.


John Callahan, Chief Technology Officer at Veridium, is responsible for the development of the company’s world-class, enterprise-ready biometric solutions, leading a global team of software developers, computer vision scientists and sales engineers. He previously served as the Associate Director for Information Dominance at the U.S. Navy’s Office of Naval Research Global, in its London, UK office, via an Intergovernmental Personnel Act assignment from the Johns Hopkins University Applied Physics Laboratory. John completed his PhD in Computer Science at the University of Maryland, College Park.

The opinions expressed in this blog are those of John Callahan and do not necessarily represent those of IDG Communications, Inc., its parent, subsidiary or affiliated companies.