At the recent RSA Conference it was virtually impossible to find a vendor that was not claiming to use machine learning. Both new and established companies are now touting “machine learning” as a major component of the data science being used in their products. What the heck is machine learning anyway? And is it really going to reshape cyber security in 2016?
For brevity’s sake, I’ll define machine learning as the science of getting computers to act without being explicitly programmed. Over the past decade, machine learning has enabled self-driving cars, practical speech recognition, and effective web search, and has vastly improved our understanding of the human genome. Machine learning is so pervasive today that we use it dozens of times a day without knowing it. Many researchers also think machine learning is the best way to make progress towards human-level Artificial Intelligence.
With those concepts in mind, is “SkyNet” calling? Not quite, yet the incredible power of machine learning and its application to analytics is reshaping cyber security as we know it.
In particular, applying machine learning to behavioral analytics is profoundly improving our ability to make sense of the volumes of data generated by security products in the average enterprise. When machine learning concepts like automated and iterative algorithms are used to learn patterns in data, we can probe data for structure, even if we do not know what that structure looks like.
In the past, security products attempted to ‘correlate’ data to discern patterns and meaning. Today we go further and perform link analysis, which evaluates the relationships or connections between data nodes. Key relationships can be identified among various types of data nodes or objects, such as organizations, people, and transactions.
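To make the idea of link analysis concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not how any particular security product works: the events, the entity names, and the helper functions `build_graph` and `related` are all hypothetical. It treats each log record as a link between two entities and then asks which entities sit within a couple of hops of a given user.

```python
from collections import defaultdict, deque

# Hypothetical events: each tuple links two entities seen together in one record.
EVENTS = [
    ("user:alice", "host:db01"),
    ("user:alice", "ip:10.0.0.5"),
    ("user:bob", "host:db01"),
    ("user:carol", "host:web02"),
    ("ip:10.0.0.5", "txn:4411"),
]

def build_graph(events):
    """Build an undirected graph: every entity maps to the entities it was seen with."""
    graph = defaultdict(set)
    for a, b in events:
        graph[a].add(b)
        graph[b].add(a)
    return graph

def related(graph, start, max_hops=2):
    """Breadth-first search returning {entity: hops} for entities within max_hops of start."""
    seen = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if seen[node] == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbor in graph.get(node, ()):
            if neighbor not in seen:
                seen[neighbor] = seen[node] + 1
                queue.append(neighbor)
    del seen[start]  # the starting entity is not "related" to itself
    return seen

graph = build_graph(EVENTS)
links = related(graph, "user:bob")
print(links)
```

In this toy data set, bob and alice never appear in the same record, yet the graph connects them through the database host they both touched. That kind of indirect relationship is what simple pairwise correlation tends to miss.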
Machine learning is what enables us to bring together the huge volumes of data generated by normal user activity across disparate, even obscure, data sets, and to identify relationships that span time, place and actions. Since machine learning can be applied simultaneously to hundreds of thousands of discrete events from multiple data sets, “meaning” can be derived from behaviors and used as an early warning detection or prevention system.
The ultimate test for a machine-learning model is validation error on new data. In other words, machine learning is looking to match new data with what it’s seen before, and not to test it to disprove, reject or nullify an expected outcome. Since machine learning uses an iterative, automated approach, it can reprocess data until a robust pattern is found. This allows it to go beyond looking for “known” or “common” patterns.
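A small sketch may help show what “validation error on new data” means in practice. Everything here is assumed for illustration: the login counts are made up, and the “model” is just a threshold of mean plus three standard deviations, a deliberately simple stand-in for a real learning algorithm. The point is only that the model is fitted on one data set and scored on a separate, held-out one.

```python
from statistics import mean, stdev

# Hypothetical training data: daily login counts observed during normal activity.
train = [12, 15, 11, 14, 13, 16, 12, 15]

# Hypothetical held-out data the model has never seen: (login count, is_anomalous).
validation = [(13, False), (14, False), (60, True), (17, False), (45, True), (18, True)]

def fit(values, k=3.0):
    """Learn an upper threshold from training data: mean + k standard deviations."""
    return mean(values) + k * stdev(values)

def predict(threshold, value):
    """Flag a value as anomalous when it exceeds the learned threshold."""
    return value > threshold

threshold = fit(train)

# Validation error: the fraction of held-out examples the model labels incorrectly.
errors = sum(predict(threshold, value) != label for value, label in validation)
validation_error = errors / len(validation)
print(f"threshold={threshold:.2f} validation_error={validation_error:.2f}")
```

Here the model misses one borderline case in the held-out set, so its validation error is greater than zero even though it fits the training data well. An iterative approach would adjust the model and re-measure against held-out data until that error stops improving.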
In my next few posts, I’ll use real-world use cases from the recent Verizon Data Breach Digest report to illustrate potential ways that machine learning can be used to detect or prevent similar incidents in the future.
This article is published as part of the IDG Contributor Network.