



Machine learning and social engineering attacks

Apr 21, 2016 | 4 mins
Cybercrime | Data and Information Security | Security

Can analytics prevent IP theft after a successful account compromise?

In my last post I promised to use some real-world cases from the recent Verizon Data Breach Digest report to illustrate potential ways that machine learning can be used to detect or prevent similar incidents.

For my first example, I’ve chosen the case of a manufacturer whose designs for an innovative new model of heavy construction equipment were stolen following a social engineering attack. The company was tipped off when a primary competitor, located on another continent, introduced a new piece of equipment that looked like an exact copy of a model the victim company had recently developed.

To paraphrase the Verizon report, it went like this. The threat actors identified an employee they suspected would have access to the new product designs they were after: the chief design engineer. They targeted their victim with a spearphishing campaign built around a fictitious LinkedIn profile of a recruiter. The attackers began sending the victim emails containing fictitious employment opportunities; one contained a document attachment with a malware file embedded in it. When opened, the malware began beaconing to an external IP address used by the threat actors, who then installed a backdoor PHP reverse shell on the chief design engineer’s system. The rest, as they say, is history.

As we reflect on this scenario, what intercept points could have been used to uncover the anomalous behavior occurring with the chief engineer’s account? One was the presence and availability of multiple log files containing rich information about what data had been transferred, when, by whom, and to where. These are available from intrusion detection logs, NetFlow data, DLP logs, firewall logs, and anti-virus and malware reporting. By underutilizing this critical data, the victim company left itself wide open to several types of compromise.

True, not all organizations are capable of making sense of complex data from multiple sources. The volume and speed at which this data is produced can seem unmanageable. Also, the ability to bring together dissimilar data in a normalized and comparable manner may not be available to an organization. When this situation arises, it’s time for more advanced analysis with sophisticated mathematical support. Yes, I’m speaking of data normalization, analytics, and the application of machine learning.

Using machine learning can provide a more holistic view of the combined log data and expose suspicious activity. In addition to revealing malicious command-and-control traffic, machine learning models can shine a light on who is accessing, storing, and using data in “uncharacteristic” ways compared to normal and peer-group behavior. However, according to Sommer and Paxson, detecting account compromise via machine learning poses some unique challenges.

First, security professionals typically expect an extremely low false positive rate from network security tools. That expectation has driven the popularity of “whitelist” and “blacklist” approaches, which are too rigid to adapt to account compromise threats like this one. Even a one percent daily false alarm rate per user, scaled to an enterprise of 2,000 users, translates to 20 false alarms a day. Eventually, a tool that generates this many false positives will be ignored.
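The base-rate arithmetic is worth making explicit. A quick sketch, using the 2,000-user enterprise from the scenario above; the alternative rates are hypothetical, for comparison:

```python
# Expected daily false alarms for an enterprise, given a per-user daily
# false-positive rate. The 2,000-user figure comes from the text; the
# smaller rates are hypothetical, shown for comparison.
users = 2000

for daily_fp_rate in (0.01, 0.001, 0.0001):
    expected_alarms = users * daily_fp_rate
    print(f"{daily_fp_rate:.2%} per user -> {expected_alarms:g} expected alarms/day")
```

A rate that sounds tiny per user compounds quickly across an enterprise, which is why analysts tune per-user baselines rather than a single global threshold.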

Second, when an account is compromised, bad logins are typically sparse and mixed with good behavior in such a way that an algorithm or human operator may miss the bad behavior among the preponderance of good logins. The Expectation Maximization (EM) approach addresses this problem by treating the compromised account as a two-user model, in which each session may be produced either by the original user or by a new user. This causes benign sessions to fall out of the likelihood calculations, so that a mix of good and bad sessions is not swayed toward being evaluated as good overall.
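To make the two-user idea concrete, here is a toy EM fit of a two-component Gaussian mixture over a single session feature, login hour. Everything here is hypothetical: a real system would model many features per session, but the mechanics are the same, with the minority component free to absorb the sparse intruder sessions.

```python
import math
import random

# Hypothetical session data: 95 legitimate daytime logins plus 5 sparse
# off-hours logins from an intruder, represented by login hour alone.
random.seed(1)
sessions = ([random.gauss(9, 1) for _ in range(95)] +     # ~9 a.m. logins
            [random.gauss(3, 0.5) for _ in range(5)])     # ~3 a.m. logins

def gauss_pdf(x, mu, sigma):
    """Density of a 1-D Gaussian at x."""
    return math.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

# Two components: one per hypothesized "user". Rough initial guesses.
mu = [5.0, 10.0]
sigma = [2.0, 2.0]
weight = [0.5, 0.5]

for _ in range(50):
    # E-step: responsibility of each component for each session.
    resp = []
    for x in sessions:
        p = [weight[k] * gauss_pdf(x, mu[k], sigma[k]) for k in range(2)]
        total = sum(p)
        resp.append([pk / total for pk in p])
    # M-step: re-estimate each component from its weighted sessions.
    for k in range(2):
        nk = sum(r[k] for r in resp)
        mu[k] = sum(r[k] * x for r, x in zip(resp, sessions)) / nk
        var = sum(r[k] * (x - mu[k]) ** 2 for r, x in zip(resp, sessions)) / nk
        sigma[k] = max(math.sqrt(var), 1e-3)
        weight[k] = nk / len(sessions)

# The low-weight component should settle on the off-hours cluster.
print(f"means: {mu}, mixture weights: {weight}")
```

Because each session is explained by whichever component fits it best, the 95 benign logins do not dilute the evidence from the 5 anomalous ones, which is exactly the failure mode of averaging over the whole account.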

In this particular Verizon incident, if the victim company had employed machine learning to analyze the data already in hand, it likely would have been alerted to several suspicious activities: who was accessing the designs, where the files were being stored, how and where they were being moved, and non-typical access to sensitive data repositories, among other possibilities.

Since most organizations already have multiple security tools in place producing meaningful log data, applying machine learning algorithms to these information sources to profile user access and behavior is a logical next step.
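As a minimal sketch of what such profiling could look like, the snippet below scores one account’s daily outbound transfer volume against a peer group in similar roles. All figures and the threshold are hypothetical, and a real system would combine many signals, not just one.

```python
from statistics import mean, stdev

# Hypothetical daily outbound-transfer volumes (MB) for a peer group of
# design engineers, plus the account under scrutiny.
peer_volumes = [120, 95, 140, 110, 130, 105, 125]
suspect_volume = 980

def zscore(value, population):
    """Standard score of `value` against a peer population."""
    return (value - mean(population)) / stdev(population)

score = zscore(suspect_volume, peer_volumes)
# Flag accounts whose behavior deviates sharply from peer norms;
# the 3-sigma cutoff is an illustrative choice.
is_anomalous = score > 3.0
print(f"z-score: {score:.1f}, anomalous: {is_anomalous}")
```

Production systems replace this single-feature comparison with multivariate baselines over logins, file access, and destinations, but the peer-group comparison is the core idea.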

We’re just scratching the surface. In my next post I’ll discuss insider threats and how machine learning can specifically help identify and predict malicious activities by an organization’s trusted users.

[ MACHINE LEARNING SERIES: Part 2 and Part 3 ]


Leslie K. Lambert, CISSP, CISM, CISA, CRISC, CIPP/US/G, former CISO for Juniper Networks and Sun Microsystems, has over 30 years of experience in information security, IT risk and compliance, security policies, standards and procedures, incident management, intrusion detection, security awareness and threat vulnerability assessments and mitigation. She received CSO Magazine’s 2010 Compass Award for security leadership and was named one of Computerworld’s Premier 100 IT Leaders in 2009. An Anita Borg Institute Ambassador since 2006, Leslie has mentored women across the world in technology. Leslie has also served on the board of the Bay Area CSO Council since 2005. Lambert holds an MBA in Finance and Marketing from Santa Clara University and an MA and BA in Experimental Psychology.

The opinions expressed in this blog are those of Leslie K. Lambert and do not necessarily represent those of IDG Communications, Inc., its parent, subsidiary or affiliated companies.