Mesh VPNs explained: Another step toward zero-trust networking

Organizations are taking a closer look at mesh virtual private networks as they need to support and secure a growing number of remote network connections.

Mesh VPN software definition

Mesh VPNs use a peer-to-peer architecture where every node or peer in the network can connect directly to any other peer without going through a central concentrator or gateway. This approach can be less expensive and easier to scale than a traditional VPN.

Mesh VPNs are not a new concept, but it has taken a long time for them to mature and expand beyond niche use. Until a few years ago, the VPN needs of most organizations were perfectly met by a traditional hub-and-spoke architecture. Most corporate firewalls and gateway security products include VPN functionality, which was convenient for companies that had only a few employees working remotely.

The move to hybrid cloud-based infrastructure and the growing remote workforce have finally put mesh networking solutions on the map. This started with the need to connect VMs and nodes running in different clouds, a technology commonly referred to as a "service mesh," and is now expanding to connect traditional endpoints such as laptops and mobile phones.

"I believe that in the long term, the distinction between service meshes and mesh VPNs will blur, as both products are working to solve the problem of moving packets securely and privately between devices," David Crawshaw, CTO and co-founder of mesh VPN startup Tailscale and a former Google software engineer who worked on distributed systems and experimental infrastructure projects, tells CSO. "The traditional distinction is whether the device is virtual (a VM or container) or physical (a phone or laptop or server), and that distinction is getting blurrier."

Why mesh VPNs are becoming more popular

The COVID-19 pandemic has profoundly changed IT operations in most companies, forcing them to adapt to a new work-from-home reality almost overnight. For some enterprise IT teams this has meant accelerating existing digital transformation plans to support remote work, while for others it meant scrambling to identify and deploy new solutions.

Large content delivery networks and cloud vendors including Cloudflare, Akamai and Google now offer remote access solutions that allow companies to make internal applications available through the browser to remote workers while enforcing strong access controls and identity and security checks. There's one problem though: Most of those products work only for web applications, meaning that apps and services that require other protocols to communicate need to be accessed via VPNs.

Traditionally, VPN architectures have followed a hub-and-spoke model in which a VPN gateway is the central hub to which all clients (the spokes) connect. VPN gateways can also be connected to each other to achieve a multiple-hubs-with-multiple-spokes design. That's often the case in practice, since each corporate office or branch has its own VPN gateway, but each of those gateways is still a choke point in the network architecture.
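The choke point is easy to see if the two designs are modeled as graphs. The minimal Python sketch below (all node names are hypothetical) builds a hub-and-spoke topology and a full mesh, then uses breadth-first search to find the fewest-hop path between two clients. In the hub-and-spoke case every client-to-client path runs through the gateway; in the full mesh the path is direct.

```python
from collections import deque


def hub_and_spoke(hub, spokes):
    """Each spoke (client) has exactly one tunnel, to the hub (VPN gateway)."""
    links = {hub: set(spokes)}
    for spoke in spokes:
        links[spoke] = {hub}
    return links


def full_mesh(nodes):
    """Every node has a direct tunnel to every other node."""
    return {node: set(nodes) - {node} for node in nodes}


def shortest_path(links, src, dst):
    """Breadth-first search for the path with the fewest hops."""
    previous = {src: None}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        for neighbor in sorted(links[node]):
            if neighbor not in previous:
                previous[neighbor] = node
                queue.append(neighbor)
    return None  # no route between src and dst


def tunnel_count(links):
    """Each tunnel appears in the neighbor sets of both of its ends."""
    return sum(len(neighbors) for neighbors in links.values()) // 2


if __name__ == "__main__":
    clients = ["laptop", "phone", "cloud-vm", "coworker-pc"]
    star = hub_and_spoke("gateway", clients)
    mesh = full_mesh(clients)
    print(shortest_path(star, "laptop", "cloud-vm"))  # ['laptop', 'gateway', 'cloud-vm']
    print(shortest_path(mesh, "laptop", "cloud-vm"))  # ['laptop', 'cloud-vm']
    print(tunnel_count(star), tunnel_count(mesh))     # 4 6
```

A full mesh needs n(n-1)/2 tunnels in total, but each endpoint terminates only its own n-1 of them, so no single device has to carry every connection the way the gateway does.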

VPN connections use encryption, which is computationally intensive, so VPN gateways are typically hardware devices built with enough CPU power and RAM to support a certain number of simultaneous users and connections. If a company suddenly needs to support a very large number of remote users, as happened during the pandemic, it might need to replace its VPN gateway with a more powerful one or add more VPN servers. Many VPN solutions also come with a per-seat license fee, so at the very least the company would need to buy more seats.

The internet bandwidth available to the company, and therefore to the VPN gateway, can also limit how many concurrent users can be supported. That's why many organizations don't route their workers' entire internet traffic through the VPN, potentially leaving them exposed when they use public Wi-Fi in insecure locations.
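Routing only some traffic through the VPN is usually called split tunneling. The minimal Python sketch below assumes a hypothetical list of corporate address ranges: in split-tunnel mode only traffic to those ranges enters the VPN, and everything else leaves directly through whatever local network the worker is on, which is the traffic left unprotected on public Wi-Fi.

```python
import ipaddress

# Hypothetical internal address ranges that are reachable only through the VPN.
CORPORATE_PREFIXES = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("172.16.0.0/12"),
]


def next_hop(destination, full_tunnel):
    """Return where a packet to `destination` is sent: 'vpn' or 'local'."""
    if full_tunnel:
        # Full tunnel: all traffic, including ordinary web browsing, goes
        # through the VPN gateway and consumes its bandwidth and CPU.
        return "vpn"
    address = ipaddress.ip_address(destination)
    if any(address in prefix for prefix in CORPORATE_PREFIXES):
        return "vpn"
    # Split tunnel: everything else bypasses the VPN entirely.
    return "local"


if __name__ == "__main__":
    for dst in ["10.1.2.3", "198.51.100.25"]:
        print(dst, next_hop(dst, full_tunnel=False), next_hop(dst, full_tunnel=True))
```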

Another problem is that not all the IT resources and applications someone needs to access are on premises in the company's office. They can be running on a virtual server in the cloud, or even on a coworker's laptop if it's a test version of an app the coworker is working on and wants to share. In those scenarios, the user's traffic must first go through the company's VPN gateway, which can be located in a different city or region, and then back out to the end server through the VPN link between that server and the gateway. This detour adds considerable latency to the connection and can severely degrade performance.
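A back-of-the-envelope calculation shows why the detour hurts. The latencies below are invented for illustration, not measurements: a home laptop and a cloud VM that are close to each other, with the corporate gateway in another region. This minimal Python sketch sums the one-way latency of each hop along a path.

```python
# Hypothetical one-way latencies in milliseconds; real values depend on geography and ISPs.
LATENCY_MS = {
    frozenset({"home-laptop", "hq-gateway"}): 40,
    frozenset({"hq-gateway", "cloud-vm"}): 35,
    frozenset({"home-laptop", "cloud-vm"}): 12,
}


def path_latency(path):
    """Sum the one-way latency of every hop along `path`."""
    return sum(LATENCY_MS[frozenset(hop)] for hop in zip(path, path[1:]))


def round_trip(path):
    """A request and its response both traverse the path."""
    return 2 * path_latency(path)


if __name__ == "__main__":
    hairpin = ["home-laptop", "hq-gateway", "cloud-vm"]
    direct = ["home-laptop", "cloud-vm"]
    print(round_trip(hairpin), round_trip(direct))  # 150 24
```

The sketch ignores the time the gateway itself spends decrypting and re-encrypting each packet, which only widens the gap.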

Enter peer-to-peer mesh VPNs, where every node can connect directly to any other peer without the need for a central concentrator or gateway. This is how the internet was originally designed, with organizations and users simply adding nodes to the network; security was not much of a concern back then.

How mesh VPNs work

Over time, the internet developed a backbone made up of tier-1 telecom companies, cloud vendors and content delivery networks that are linked to each other at high-speed internet exchange points. Also, the limited pool of public IPv4 addresses has led to greater use of network address translation (NAT), including at the ISP level, making the internet increasingly resemble a hub-and-spoke architecture. This is something the proponents of IPv6 are hoping to reverse.
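NAT is a large part of why direct peer-to-peer links are hard to establish. The toy Python model below is a deliberate simplification (real NATs differ widely in how they allocate ports and filter inbound packets, and all addresses here are hypothetical). It shows that a NAT drops unsolicited inbound packets, but that two peers behind NATs can still open a direct path if each learns the other's public endpoint from a third party and then both send to each other, a technique known as hole punching.

```python
class ToyNAT:
    """Toy NAT with one public address. Outbound packets create a port
    mapping; inbound packets are let in only from remote endpoints the
    inside host has already sent to. Real NATs differ in both respects."""

    def __init__(self, public_ip):
        self.public_ip = public_ip
        self.next_port = 40000
        self.mappings = {}   # (inside_ip, inside_port) -> public port
        self.contacted = {}  # public port -> set of remote (ip, port) endpoints

    def outbound(self, inside, remote):
        """Translate a packet from `inside` to `remote`; return its public source."""
        port = self.mappings.get(inside)
        if port is None:
            port = self.next_port
            self.next_port += 1
            self.mappings[inside] = port
            self.contacted[port] = set()
        self.contacted[port].add(remote)
        return (self.public_ip, port)

    def inbound(self, public_port, remote):
        """Is a packet from `remote` to our public port let through?"""
        return remote in self.contacted.get(public_port, set())


if __name__ == "__main__":
    server = ("192.0.2.1", 3478)  # hypothetical rendezvous server both peers can reach
    nat_a, nat_b = ToyNAT("198.51.100.1"), ToyNAT("203.0.113.7")
    a_inside, b_inside = ("192.168.1.10", 51820), ("10.0.0.5", 51820)

    # Each peer contacts the server, which sees (and can share) its public endpoint.
    a_public = nat_a.outbound(a_inside, server)
    b_public = nat_b.outbound(b_inside, server)

    # An unsolicited packet from A is dropped by B's NAT...
    print(nat_b.inbound(b_public[1], a_public))  # False

    # ...but once both sides have sent to each other, both directions get through.
    nat_a.outbound(a_inside, b_public)
    nat_b.outbound(b_inside, a_public)
    print(nat_b.inbound(b_public[1], a_public), nat_a.inbound(a_public[1], b_public))  # True True
```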

Some hardware VPN gateways support multipoint-to-multipoint configurations and can work similarly to mesh networks, but their configurations need to be constantly managed and updated whenever clients and nodes change. In a world of services and apps running in VMs across multiple clouds, that happens quite often.

"Networks rearrange constantly," Crawshaw says. "VMs get moved between data centers, phones move from the office to the coffee shop. A company can have an office for 30 years and build out dedicated network hardware on premises or rent an office by the month. Commonly when the term 'mesh' is used it implies automatic configuration of endpoints. That is, if you move a device on a network, it should re-establish communication with other devices on the mesh network without the intervention of IT administrators."

Mesh VPNs are implemented in software, so in that sense they are software-defined networks, although SD-WAN (software-defined wide-area network) has become a term that usually refers to solutions designed to simplify and automate the central management and control of traditional networking equipment in large datacenter and telecom infrastructure projects.
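Crawshaw's point that a moved device should re-establish communication without help from IT administrators can be illustrated with one concrete mechanism, endpoint roaming. WireGuard, for example, sends to the most recent address from which it received an authenticated packet from a given peer. The minimal Python sketch below assumes authentication has already happened elsewhere; the peer keys and addresses are hypothetical placeholders.

```python
class PeerEndpoints:
    """Tracks where each peer (identified by its public key) was last seen."""

    def __init__(self):
        self.endpoints = {}  # public key -> (ip, port)

    def on_authenticated_packet(self, peer_key, source):
        """Call only for packets whose cryptographic check succeeded, so a forged
        packet cannot redirect traffic. Returns True if the peer has moved."""
        moved = self.endpoints.get(peer_key) not in (None, source)
        self.endpoints[peer_key] = source
        return moved

    def send_to(self, peer_key):
        """The address replies for this peer should go to now."""
        return self.endpoints.get(peer_key)


if __name__ == "__main__":
    table = PeerEndpoints()
    table.on_authenticated_packet("phone-key", ("198.51.100.20", 51820))  # office Wi-Fi
    print(table.on_authenticated_packet("phone-key", ("203.0.113.99", 6000)))  # True: coffee shop
    print(table.send_to("phone-key"))  # ('203.0.113.99', 6000)
```

Because the peer is identified by its key rather than by its address, nothing has to be reconfigured by hand when the phone changes networks.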

Popular mesh VPN solutions

One of the first open-source VPN daemons designed for mesh networks is Tinc VPN, which dates back to 1998. It works on nearly all major operating systems including Windows, Linux, BSD and macOS and supports automatic full mesh routing, NAT traversal and bridging Ethernet segments. Tinc served as an inspiration for many of the later projects.

ZeroTier, set up in 2015, is a newer open-source mesh VPN solution that also has a commercial business behind it. ZeroTier describes itself as "a smart programmable Ethernet switch for planet Earth" that has most of the capabilities of an enterprise SDN switch, sitting on top of and controlling a peer-to-peer network of apps and devices spread across local networks or the internet. The result is a mesh VPN with a network hypervisor that provides enhanced management and monitoring capabilities, automatic peer discovery and configuration, and traffic encryption through a custom protocol.

Last year, software engineers from Slack released another open-source mesh VPN project called Nebula, which is built on top of the Noise Protocol Framework. They refer to Nebula as a "scalable overlay networking tool" designed to create a mutually authenticated peer-to-peer software-defined network that can link thousands of nodes running at multiple cloud service providers in various locations around the world. Nebula uses certificates issued by a certificate authority to identify nodes and to vouch for some of their attributes, such as their IP address, name and membership in user-defined groups.
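The idea of filtering traffic by certificate-backed group membership rather than by IP range can be sketched in a few lines of Python. This is not Nebula's configuration format or code: the `Node` and `Rule` types are hypothetical, and the sketch assumes each node's name, IP and groups come from a certificate that has already been checked against the CA. Because a rule names only a group, it keeps working when a node moves to another region or provider and gets a new address.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    """Identity attributes a CA-signed certificate would vouch for.
    Assumption: the certificate has already been verified."""
    name: str
    ip: str
    groups: frozenset


@dataclass(frozen=True)
class Rule:
    """Hypothetical inbound rule keyed on group membership, not IP ranges."""
    port: int
    proto: str
    group: str


def allowed(rules, peer, port, proto):
    """Allow the connection if any rule matches the port, protocol and a peer group."""
    return any(
        r.port == port and r.proto == proto and r.group in peer.groups
        for r in rules
    )


if __name__ == "__main__":
    db_rules = [Rule(port=5432, proto="tcp", group="api")]
    # Same group, different providers and address ranges: both are allowed.
    api_east = Node("api-1", "10.1.0.5", frozenset({"api"}))
    api_west = Node("api-2", "172.18.3.9", frozenset({"api"}))
    # Same subnet as api-1, but not in the group: denied.
    laptop = Node("laptop-7", "10.1.0.99", frozenset({"engineering"}))
    for peer in (api_east, api_west, laptop):
        print(peer.name, allowed(db_rules, peer, 5432, "tcp"))
```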

"Most cloud providers offer some form of user-defined network host grouping, often called 'security groups,' which allow you to filter network traffic based on group membership, as opposed to individually by IP address or range," the Slack engineers wrote in their announcement of Nebula. "Unfortunately, as of this writing, security groups are siloed to each individual region of a hosting provider. Additionally, there is no interoperable version of security groups between different hosting providers. This means that as you expand to multiple regions or providers, your only useful option becomes network segmentation by IP address or IP network range, which becomes complex to manage. Given our requirements, and the lack of off-the-shelf options that could meet our encryption, segmentation, and operational requirements, we decided to create our own solution."

Another newcomer in this space is Crawshaw's Tailscale, which is built around WireGuard, a new, high-performance VPN protocol that made its way into the Linux and OpenBSD kernels this year. Tailscale doesn't use the WireGuard kernel code yet; instead it relies on a userspace implementation of the protocol written in the Go programming language, and it is currently available for Windows, Linux, macOS, iOS and Android.

Most of Tailscale's code is open source, and its business model is centered on running a central directory service, or coordination node, that handles automatic peer discovery and configuration and lets administrators manage role-based access controls and logging. For large enterprise deployments, Tailscale offers the option of running on-premises coordination servers. Unlike a traditional VPN gateway, however, these servers carry no user traffic; they are used only to exchange configuration information.
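The key point is the split between a control plane and a data plane. The minimal Python sketch below is a hypothetical model of that split, not Tailscale's actual protocol or API: the server stores only public keys and endpoints and returns, for each node, the peers an access policy lets it reach. The node then connects to those peers directly, so packets never pass through the server.

```python
class CoordinationServer:
    """Control plane only: stores each node's public key and current endpoint
    and hands out peer lists. User traffic never passes through it."""

    def __init__(self, acl=None):
        self.nodes = {}  # node name -> {"public_key": ..., "endpoint": ...}
        self.acl = acl   # optional: node name -> set of peer names it may reach

    def register(self, name, public_key, endpoint):
        self.nodes[name] = {"public_key": public_key, "endpoint": endpoint}

    def peer_map(self, name):
        """Configuration `name` needs to connect directly to its permitted peers.
        With an ACL present, nodes missing from it get no peers (default deny)."""
        permitted = None if self.acl is None else self.acl.get(name, set())
        return {
            peer: dict(info)
            for peer, info in self.nodes.items()
            if peer != name and (permitted is None or peer in permitted)
        }


if __name__ == "__main__":
    server = CoordinationServer(acl={"laptop": {"build-server"}, "build-server": {"laptop"}})
    server.register("laptop", "pubkey-laptop", ("198.51.100.7", 51820))
    server.register("build-server", "pubkey-build", ("203.0.113.40", 51820))
    server.register("printer", "pubkey-printer", ("192.0.2.9", 51820))
    # The laptop learns how to reach the build server but not the printer;
    # its traffic then flows straight to 203.0.113.40.
    print(server.peer_map("laptop"))
```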

Identity and zero-trust networks

Zero-trust networking is seen as the future of enterprise networks. It's an architecture in which the identity of every user and device is verified before trust is assigned and access to corporate resources is granted. In most traditional corporate networks, devices on the internal network can connect to servers and services simply because they are on the same implicitly "trusted" network, which is one reason attackers are so successful at moving laterally through networks.
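The difference between the two models comes down to what an access decision looks at. In this minimal Python sketch (the network range, `Request` fields and user names are hypothetical), the perimeter check trusts any address on the internal network, while the zero-trust check ignores location and requires both a verified device and an authorized user. A compromised machine on the office LAN passes the first check and fails the second.

```python
import ipaddress
from dataclasses import dataclass
from typing import Optional

# Hypothetical "trusted" internal network in a traditional perimeter design.
INTERNAL_NET = ipaddress.ip_network("10.0.0.0/8")


@dataclass
class Request:
    source_ip: str
    device_verified: bool  # e.g. the device proved possession of an enrolled key
    user: Optional[str]    # authenticated user, or None if unknown


def perimeter_allows(req):
    """Traditional model: being on the internal network is enough."""
    return ipaddress.ip_address(req.source_ip) in INTERNAL_NET


def zero_trust_allows(req, authorized_users):
    """Zero trust: network location is ignored; identity must be verified."""
    return req.device_verified and req.user in authorized_users


if __name__ == "__main__":
    # An attacker who has compromised an unmanaged machine on the office LAN.
    intruder = Request("10.4.8.15", device_verified=False, user=None)
    print(perimeter_allows(intruder))              # True
    print(zero_trust_allows(intruder, {"alice"}))  # False
```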

Most mesh VPN solutions draw their inspiration from Google's BeyondCorp project and other zero-trust networking concepts, and they place a big focus on device or node identity verification. In ZeroTier, Nebula and Tailscale, node identity is established at the IP layer through some form of public-key cryptography.

"Traditional physical networks do not provide any notion of identity, and we are so used to that idea that modern work on bringing identity to network traffic has always started at higher levels of abstraction [for example HTTPS/TLS]," Crawshaw says, "but that’s not necessary. It is possible using network tunnels and modern cryptography to ensure that a packet’s IP source address can describe precisely who really sent it. The advantage of moving this concept to the IP layer is that it becomes compatible with existing software. You can take existing internal tools and move them slowly onto private virtual networks where the identity of all senders and receivers is known, slowly turning off access via traditional networks. You don’t have to rewrite all your software to be identity-aware."

The WireGuard protocol, for example, introduces the concept of cryptokey routing, in which a node's public key is tied to a list of IP addresses that the node is allowed to use inside the VPN tunnel. As long as a node's private key remains secure, no other node can impersonate it on the network. This gives network administrators and security teams the power to ensure that certain applications or resources are visible and accessible on the mesh network only to very specific devices and users.
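WireGuard's documentation describes cryptokey routing as a table that maps each peer's public key to its allowed IPs and is used in both directions: to choose which peer an outgoing packet is encrypted for, and to drop incoming packets whose inner source address is not one the authenticated sender may use. Below is a minimal Python sketch of that table, with placeholder strings instead of real Curve25519 public keys and hypothetical addresses; the actual key exchange and encryption are omitted.

```python
import ipaddress


class CryptokeyRouter:
    def __init__(self):
        self.allowed = {}  # peer public key -> list of allowed networks

    def add_peer(self, public_key, allowed_ips):
        self.allowed[public_key] = [ipaddress.ip_network(n) for n in allowed_ips]

    def peer_for_destination(self, dst):
        """Outbound: pick the peer whose allowed IPs contain dst (longest prefix wins)."""
        addr = ipaddress.ip_address(dst)
        best_key, best_len = None, -1
        for key, nets in self.allowed.items():
            for net in nets:
                if addr in net and net.prefixlen > best_len:
                    best_key, best_len = key, net.prefixlen
        return best_key  # None means no peer: the packet is dropped

    def accept_inbound(self, sender_key, inner_src):
        """Inbound: the packet decrypted under sender_key's session; accept it only
        if its inner source IP is one that this peer is allowed to use."""
        addr = ipaddress.ip_address(inner_src)
        return any(addr in net for net in self.allowed.get(sender_key, []))


if __name__ == "__main__":
    router = CryptokeyRouter()
    router.add_peer("pubkey-laptop", ["100.64.0.2/32"])
    router.add_peer("pubkey-server", ["100.64.0.10/32", "10.20.0.0/16"])
    print(router.peer_for_destination("10.20.5.5"))               # pubkey-server
    print(router.accept_inbound("pubkey-laptop", "100.64.0.2"))   # True
    # The laptop sends a packet claiming to come from the server's address.
    print(router.accept_inbound("pubkey-laptop", "100.64.0.10"))  # False
```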

On top of the device identity checks performed at the protocol level, mesh VPNs can also perform user identity checks. Tailscale, for example, supports integration with the most common identity providers already used by enterprises, including Google, Microsoft, Okta and SAP, and supports multi-factor authentication through them.

Companies can also combine a mesh VPN for access to non-HTTP applications and services with a zero-trust access gateway, such as those from Akamai, Cloudflare or Google, for access to their web applications. These gateways also perform device security checks through the browser or through a separate lightweight client installed on endpoints.

Copyright © 2020 IDG Communications, Inc.
