Director of Data Science, ProtectWise

Artificial Intelligence: Buzzword or Bingo

Mar 23, 2017 | 6 mins

Data “at scale” used to be so rare that most decisions were made without it. Collecting and studying data on a given problem was relegated to academic status; in the business world, human intuition reigned supreme. This was a consequence both of never having been exposed to data-driven thinking and processes and of the general lack of data experts and facilities with which to implement data-driven thinking. Consequently, the number of experts and analysts required by industry remained modest far into the ’80s and ’90s, given the high cost of producing data for analysts to use and the even higher cost of the machinery to process it. Data analysis required painstaking work, with experts overseeing each step in the process to ensure that the best-quality data was procured and the most appropriate analytics were performed, because you weren’t going to get a second bite of the apple.

As we entered the Information and Computing Age, the possibilities of more data and more ubiquitous computing fomented a change in the business mindset. Human intuition was still valid, but verification with data started to seem more plausible and hence more valuable. Data volumes started their exponential trend skyward while computing costs fell just as quickly, leading to an explosion of possibilities based on an embarrassment of (possible) riches with respect to data science and analytics. A new set of large-scale data analysis techniques, such as regression and generalized linear models, iterative optimization techniques, and simulation-based maximization, was now well within reach, drawn from the larger set of what were once only theoretical possibilities. With the advent of cloud computing, the pace of disruption increased further, as storage and computing power shifted from being a capital cost to an operational cost, making it easier to justify further investment on economic grounds. Computing and storage could be scaled to suit the task at hand and decommissioned when no longer required. The data and machines were plentiful, but a lingering habit from the past, that a human must be the arbiter of the information contained in such data, continued to create a bottleneck. The data continued to grow in both size and complexity, and the experts required to analyze it were soon outnumbered and outgunned.

Emboldened by the rapid declines in the cost of computing, data-oriented industries came up with a new strategy: if humans are the bottleneck, can’t we train machines to do the human’s work when it comes to analyzing data, and remove the need for the high-cost, highly trained and hard-to-find human arbiters? Thus began the search for “Artificial Intelligence,” or AI, as a means of systematizing and automating data analysis, and with it the veritable hype-tsunami that is sweeping over InfoSec. Automated sentinels marching through your networks, mining every byte on their own and training themselves to interdict even the most novel threats automatically! Machine Learning, Deep Learning, and their kin promise to usher in a new era in which machines autonomously structure data and extract information. It certainly sounds enticing in theory, and those looking to succeed in this space have realized that a message of “We Make Data Complexity Simple Again” strikes a profitable chord with an analytically overwhelmed industrial landscape.

Unfortunately, the result is a game of Buzzword Bingo that leaves most people unable to tell what is real from what is just hype when it comes to data analytics in this new age.

For example, Forrester highlights machine learning platforms, deep learning platforms, AI-optimized hardware and decision management as some of the most important AI technologies, but where do you start? An estimated 62% of enterprises will use AI technologies by 2018, so does that mean you are falling behind if you aren’t? Not necessarily. “AI technologies” is a broad term that includes predictive and prescriptive analytics, automated written reporting and communications, and voice recognition and response, in addition to the more familiar machine learning. There is so much buzz that Gartner has coined the term “Algorithmic Business” to describe the shift of digital businesses from big data to AI, and VC funding in private AI companies reached $1B in Q2 2016, an all-time high. No wonder “vendor overload” is a cause of CISO burnout!

Step back from the precipice for a moment, and you’ll quickly realize that the basics of data analysis that were true back when data was a precious resource are still true today:

  • Collecting quality inputs and understanding how data collection affects and possibly biases the resultant data set;
  • Constructing information-rich and analytically tractable model features, and applying appropriate analytical and statistical techniques based on the known structure of data collection and any inherent biases, the structure of the constructed features, and the outcome of interest;
  • Assessing model fit for errors that may undermine the model’s internal and/or external validity, and when required, assessing its cross-validated/out-of-sample performance to ensure predictive accuracy;
  • Monitoring ongoing model performance in situ (including the path of learning for models that are continuously updating) to understand period-to-period evolution of the model and to ensure continued stability of the model fit and performance.

These concepts are as real and valid in producing accurate analytical results for a machine as they are for a human. Realizing this gives one a strong basis from which to demystify the verbiage that is ever-present in data-oriented industries today.
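To make the last two points in the list above more concrete, here is a minimal sketch in plain Python of k-fold cross-validation for a simple least-squares line, plus a crude check for degrading performance over time. It is only an illustration under stated assumptions: the helper names (`fit_line`, `k_fold_mse`, `drifted`), the synthetic data, and the 1.5x tolerance factor are all hypothetical and are not taken from any particular product or library.

```python
import random
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x; returns (a, b)."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx
    return y_bar - b * x_bar, b


def mse(model, xs, ys):
    """Mean squared error of a fitted (a, b) line on the given points."""
    a, b = model
    return mean((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))


def k_fold_mse(xs, ys, k=5, seed=0):
    """Average out-of-sample MSE: fit on k-1 folds, score on the held-out fold."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    scores = []
    for held_out in folds:
        held = set(held_out)
        train = [i for i in idx if i not in held]
        model = fit_line([xs[i] for i in train], [ys[i] for i in train])
        scores.append(mse(model, [xs[i] for i in held_out],
                          [ys[i] for i in held_out]))
    return mean(scores)


def drifted(baseline_mse, recent_mse, tolerance=1.5):
    """Flag a model whose recent error exceeds the baseline by a
    hypothetical tolerance factor (an assumption for illustration)."""
    return recent_mse > tolerance * baseline_mse


if __name__ == "__main__":
    # Synthetic data (assumed): a known linear signal plus Gaussian noise.
    rng = random.Random(42)
    xs = [i / 10 for i in range(100)]
    ys = [2.0 + 3.0 * x + rng.gauss(0, 1) for x in xs]

    in_sample = mse(fit_line(xs, ys), xs, ys)
    out_of_sample = k_fold_mse(xs, ys)
    print(f"in-sample MSE:     {in_sample:.3f}")
    print(f"out-of-sample MSE: {out_of_sample:.3f}")
    print("needs review:", drifted(out_of_sample, recent_mse=2 * out_of_sample))
```

The gap between the in-sample and out-of-sample error is the quantity a human analyst would inspect before trusting the model, and the drift check stands in for the period-to-period monitoring described above. In practice both would be tuned to the data and the outcome of interest.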

Machine learning is not going to magically solve all your data analysis and modeling problems or obviate the need for human structuring, design or supervision of tasks in InfoSec. The role of experts may be migrating to a higher-level set of functions, such as deciding what types of features to look for in the data, determining how algorithms are constructed and how they reason about data, recognizing the constraints of such algorithms, and guiding their application in threat detection. Even so, these tasks are indispensable and unavoidable, and machines cannot and should not diminish the importance of having such experts on hand. Organizations that can build cross-functional relationships between humans and machines, and that grasp the fundamentals of extracting information from data, will be rewarded with deeper insights at greater scale than ever before.

Director of Data Science, ProtectWise

Matt is an experienced analytics professional who uses statistically guided thought processes to find optimal and actionable solutions to problems. At ProtectWise he heads up the Data Science team, which is responsible for analytics & reporting on a petabyte-scale Data Warehouse, as well as algorithmic and threat-detection research with specific focus on anomaly detection methods and threat classification models. Prior to joining ProtectWise, Matt led Data Science and Analytics at several startups and established organizations within the Ad-Tech and eCommerce arenas. Prior to entering the Data Sciences space, Matt was an Equity Analyst with Janus Capital, with primary financial research responsibilities ranging across consumer products and retail, payment processing, auto & auto parts, and several other industries. Matt was a PhD candidate and has a Master’s in Statistics from Harvard University, and graduated Magna Cum Laude with a Bachelor’s in Mathematical Social Sciences from Dartmouth College.
