Who wins in a world of 100% encrypted traffic?

With advances in artificial intelligence, security doesn’t have to come at the cost of privacy.

According to Google, more than 75 percent of requests to its servers are encrypted, and Mozilla reports that more than 55 percent of the web pages loaded by its browser are encrypted. Either way you slice it, the majority of internet traffic is now encrypted.

The share of encrypted internet traffic has risen steadily, especially as video streaming providers turned on encryption. Data breaches are at an all-time high, internet eavesdropping by governments and criminals is widespread, and failure to protect sensitive information can cause irreparable damage.

Cybersecurity leaders have worked hard to secure mobile devices and cloud access. Now enter wearables, environmental sensors, surveillance cameras, location beacons and the rest of the internet-of-things (IoT). From the very start of IoT, security researchers have sounded the alarm, but the parade of embarrassing security events marches on, including the Mirai botnet last October.

To decrypt or not to decrypt, that is the question

As websites and applications encrypt more traffic to preserve data privacy, part of the infosec community has responded by trying to decrypt internet-bound traffic so it can be inspected for inbound malware, outbound command-and-control communications or data leakage. But the growing variety of encryption protocols in use is making pervasive decryption increasingly difficult.

Many organizations still try to terminate and decrypt internet-bound HTTPS traffic by using forward proxies to “man-in-the-middle” the connections. But stronger cryptography, including longer keys, hardware-based crypto and biometrics, makes it harder to decrypt traffic without hurting the user experience through added latency or requiring a large investment in decryption capacity.

Meanwhile, several major online services and many mobile applications use certificate pinning to prevent man-in-the-middle attacks. With certificate pinning, the application embeds the specific certificates (or public keys) it expects, such as a particular root certificate, instead of trusting the device’s built-in (and extensible) list of trusted root certificates. Attackers’ attempts to insert themselves between the client and server are thwarted, but so are organizations’ forward proxies, whose substitute certificates chain to an enterprise root that the application was never told to trust.
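To see why pinning also defeats forward proxies, consider a minimal Python sketch of the check an application might perform. It assumes pins are stored as base64-encoded SHA-256 digests of DER-encoded certificates; real deployments often pin the public key (SPKI) instead, and the function names and the `PINNED` set mentioned below are hypothetical. A forward proxy that re-signs traffic presents a certificate whose digest is not in the pinned set, so the check fails exactly as it would for an attacker.

```python
import base64
import hashlib
import socket
import ssl


def cert_pin(der_cert: bytes) -> str:
    """Base64-encoded SHA-256 digest of a DER-encoded certificate (assumed pin format)."""
    return base64.b64encode(hashlib.sha256(der_cert).digest()).decode("ascii")


def chain_matches_pin(der_chain, pinned) -> bool:
    """True if any certificate in the presented chain matches a pinned digest.

    der_chain: iterable of DER-encoded certificates (leaf first)
    pinned:    set of base64 SHA-256 digests baked into the application
    """
    return any(cert_pin(der) in pinned for der in der_chain)


def fetch_leaf_cert(host: str, port: int = 443) -> bytes:
    """Connect with normal OS trust, then return the server's leaf certificate.

    getpeercert(binary_form=True) returns only the leaf; checking a pinned
    root would require the full verified chain.
    """
    ctx = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=5) as sock:
        with ctx.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert(binary_form=True)
```

In use, an application would call `chain_matches_pin([fetch_leaf_cert(host)], PINNED)` after the normal TLS handshake and abort the connection when it returns `False`. Because the proxy's certificate passes ordinary validation on a managed device but fails the pin check, pinning blocks the proxy even though the device trusts it.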

Separately, members of the law enforcement community have long argued that the “good guys” should be given some form of encryption backdoor, such as a master key that would allow them to break the crypto. However, adding a master-key capability to a cryptosystem inherently weakens it: it creates a second path to the plaintext, and anyone who obtains or reconstructs that key can decrypt everything it protects. And given governments’ history of failing to protect data, including important keys, there is great skepticism about how long any government-level master key would remain secret.

Meanwhile, attackers use the presence of lots of benign encrypted traffic to blend in and obfuscate their communications to the internet.

Artificial intelligence can read between the lines

Advances in artificial intelligence (AI) are fueling the next generation of threat detection products. Using AI techniques such as machine learning, data science and deep learning to automatically identify and stop threats reduces reliance on deep packet inspection, sandboxing and content analysis, measures that are becoming less effective at uncovering hidden threats as more traffic is encrypted.

Different applications and processes have distinctive communication patterns, and attacker traffic often looks very different from user traffic regardless of whether it is encrypted. The timing and duration of communications, packet sizes and the gaps between packets are all telling. Instead of prying off the lid to look inside the encrypted stream, detection systems apply sophisticated math to this metadata to find signals that indicate a threat.
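As an illustration, the sketch below computes a few of these metadata features for a single flow using only packet timestamps, sizes and direction, all of which remain visible when the payload is encrypted. The `Packet` record and the feature names are hypothetical choices for this example, not a standard schema.

```python
import statistics
from dataclasses import dataclass


@dataclass
class Packet:
    timestamp: float  # seconds since the start of the capture
    size: int         # bytes on the wire (ciphertext length is still observable)
    outbound: bool    # True if sent from the client toward the internet


def flow_features(packets):
    """Metadata features for one flow, computed without decrypting anything."""
    if len(packets) < 2:
        raise ValueError("need at least two packets to measure gaps")
    pkts = sorted(packets, key=lambda p: p.timestamp)
    # Gaps between consecutive packets capture the rhythm of the conversation.
    gaps = [b.timestamp - a.timestamp for a, b in zip(pkts, pkts[1:])]
    sizes = [p.size for p in pkts]
    total_bytes = sum(sizes)
    outbound_bytes = sum(p.size for p in pkts if p.outbound)
    return {
        "duration": pkts[-1].timestamp - pkts[0].timestamp,
        "packet_count": len(pkts),
        "mean_size": statistics.fmean(sizes),
        "size_stdev": statistics.pstdev(sizes),
        "mean_gap": statistics.fmean(gaps),
        "gap_stdev": statistics.pstdev(gaps),
        # Share of bytes leaving the network; relevant to data leakage.
        "outbound_ratio": outbound_bytes / total_bytes if total_bytes else 0.0,
    }
```

A beacon that checks in on a fixed schedule tends to produce a `gap_stdev` close to zero, whereas interactive browsing is usually bursty, which is the kind of contrast a detector can exploit.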

Signatures can key off only the setup of the encrypted exchange. Deep learning, by contrast, can train a neural network on the time series of the communication and then calculate the likelihood that any given connection is a command-and-control channel.
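To make the idea concrete, here is a deliberately small sketch in Python with NumPy: a one-hidden-layer neural network, the simplest relative of the deep models described above, trained with plain gradient descent to map a window of inter-packet gaps to a probability of being a command-and-control channel. The training data are synthetic and hypothetical (beacons modeled as 60-second check-ins with small jitter, user traffic as exponentially distributed gaps), as are the window size and hyperparameters; a production system would use much richer sequence models and real labeled traffic.

```python
import numpy as np

WINDOW = 16  # inter-packet gaps per sample (illustrative choice)


def normalize(gaps):
    """Scale each window by its own mean so the model sees rhythm, not absolute timing."""
    gaps = np.asarray(gaps, dtype=float)
    return gaps / gaps.mean(axis=-1, keepdims=True) - 1.0


def synthetic_dataset(n_per_class, rng):
    """Hypothetical training data: label 1 = beacon-like C2, label 0 = bursty user traffic."""
    beacons = 60.0 + rng.normal(0.0, 1.0, size=(n_per_class, WINDOW))
    users = rng.exponential(5.0, size=(n_per_class, WINDOW))
    X = normalize(np.vstack([beacons, users]))
    y = np.concatenate([np.ones(n_per_class), np.zeros(n_per_class)])
    return X, y


def forward(params, X):
    hidden = np.tanh(X @ params["W1"] + params["b1"])
    logits = hidden @ params["W2"] + params["b2"]
    return hidden, 1.0 / (1.0 + np.exp(-logits))  # sigmoid -> probability of C2


def train(X, y, hidden_units=8, epochs=2000, lr=0.1, seed=0):
    rng = np.random.default_rng(seed)
    params = {
        "W1": rng.normal(0.0, 0.5, size=(X.shape[1], hidden_units)),
        "b1": np.zeros(hidden_units),
        "W2": rng.normal(0.0, 0.5, size=hidden_units),
        "b2": 0.0,
    }
    losses = []
    eps = 1e-9  # avoids log(0)
    for _ in range(epochs):
        hidden, p = forward(params, X)
        losses.append(float(-np.mean(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps))))
        # Backpropagation of the mean binary cross-entropy loss.
        d_logits = (p - y) / len(y)
        d_hidden = np.outer(d_logits, params["W2"]) * (1.0 - hidden ** 2)  # through tanh
        params["W2"] -= lr * (hidden.T @ d_logits)
        params["b2"] -= lr * float(d_logits.sum())
        params["W1"] -= lr * (X.T @ d_hidden)
        params["b1"] -= lr * d_hidden.sum(axis=0)
    return params, losses


def c2_likelihood(params, gaps):
    """Probability, under this toy model, that a window of gaps is a C2 channel."""
    _, p = forward(params, normalize(gaps)[None, :])
    return float(p[0])
```

For example, after `params, losses = train(*synthetic_dataset(200, np.random.default_rng(42)))`, calling `c2_likelihood(params, gaps)` on a new window of 16 gaps returns a score between 0 and 1. Normalizing each window by its own mean is a design choice: it lets the model focus on how regular the timing is rather than on any particular check-in interval.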

Artificial intelligence also gives a threat detection system the brains to learn and put its discoveries in context, so it can pinpoint threats that pose the greatest risk to individual organizations.

Time to learn

Reaching this level of artificial intelligence requires both supervised and unsupervised machine learning.

With supervised machine learning, data scientists analyze massive volumes of previously categorized network traffic samples to identify distinguishing features that are common to attacker behaviors.

A threat detection platform can, for example, recognize a command-and-control channel pulling instructions from infrastructure it has never seen before, even if the IP address and domain are trusted and have a high reputation.

Unsupervised machine learning focuses on finding outliers in unlabeled data and can also derive a baseline against which to judge future traffic. Given its noisy nature – it will show you what’s different rather than what’s bad – it should be used judiciously in areas judged to be of high value to attackers.
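A minimal way to derive a baseline from unlabeled history and judge new traffic against it is a robust z-score built from the median and the median absolute deviation (MAD). The metric (for example, daily outbound megabytes per host), the host names and the 3.5 threshold are illustrative assumptions. The output only says a value is unusual, not that it is malicious, which is why the paragraph above recommends using this approach judiciously.

```python
import statistics

MAD_TO_SIGMA = 1.4826  # scales MAD to match a standard deviation for normally distributed data


def build_baseline(history):
    """Summarize unlabeled historical values with a robust center (median) and spread (MAD)."""
    center = statistics.median(history)
    mad = statistics.median([abs(x - center) for x in history])
    return center, mad


def robust_z(value, baseline):
    """How many robust standard deviations a value sits from the baseline center."""
    center, mad = baseline
    scale = MAD_TO_SIGMA * mad
    if scale == 0:
        # No spread in the history: any departure from the center is maximally unusual.
        return 0.0 if value == center else float("inf")
    return abs(value - center) / scale


def flag_outliers(observations, baseline, threshold=3.5):
    """Return the keys whose values deviate from the baseline by more than `threshold`."""
    return [key for key, value in observations.items() if robust_z(value, baseline) > threshold]
```

The median and MAD are used instead of the mean and standard deviation so that a few past anomalies in the history do not inflate the baseline and mask new ones.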

Threat detection with a brain

With data science and machine learning at the core of next-generation threat detection, organizations can detect and stop threats that have never been seen before – even if the traffic is encrypted.

Organizations no longer have to choose between keeping their data private with encryption and weakening their defenses by allowing attackers to hide amid their benign traffic.

This article is published as part of the IDG Contributor Network.