Inside Atlassian's zero trust implementation

CISO Adrian Ludwig says Atlassian's zero-trust implementation was nearly complete when the pandemic hit. His advice: Define policies to cover all cases first.


We're now past the initial shock of the COVID-19 pandemic, which forced many companies to quickly adapt to a new work-from-home reality. In the early days, business continuity, often at the expense of security, was the priority in decision making around remote access. Today, business leaders are realizing that employees will continue to work from home—perhaps permanently—so remote access needs to be both scalable and secure to protect business data.

To that end, some organizations are transitioning to zero-trust network architectures, where access and trust are granted based on device and user identity, security posture and user roles rather than location. One such company is Atlassian, a provider of development and collaboration tools such as Jira and Trello.

CSO spoke with Atlassian’s CISO, Adrian Ludwig, about its zero-trust initiative.

How has Atlassian tackled remote work challenges?

Ludwig: I've been with Atlassian about two and a half years, and I spent most of my career in what I think of as platform security or product security, so most of our focus is on making sure that our products are secure. That also means recognizing that a key part of the security of our products is the security of the environment within which we created those products.

So, zero-trust networking, more robust enablement of multi-factor authentication—two of the things that Google spearheaded several years ago—are definitely things that we tackled. We had been thinking about it already at Atlassian, but we really put it in the front of our efforts over the last two years and made sure that we got it done.

Then, obviously, we are a development organization of about 3,000 developers at this point who are globally distributed. We use agile methodologies, and so we use our own tools for almost everything, with the exception of maybe email and some of the accounting things. Thinking about how to use our tools to enable developers to be as efficient as possible and then thinking about how to make sure that we're as secure as possible is definitely at the forefront.

How did Atlassian approach the implementation of zero-trust networking?

Ludwig: In many ways the path that Atlassian went down in terms of implementing zero trust was similar to Google's from a rollout and strategy standpoint. Where it was quite different is that Google had a very web-centric approach, whereas our approach was much more at the network layer. We were using a VPN concentrator to route traffic to identify which services should be available. Theoretically, the result is very similar.

The biggest thing for us was to recognize that there were different types of devices that people were going to be connecting with. Therefore, different tiers of service were appropriate. There are some things like access to the internet, where if someone were to come into one of our offices—which doesn't happen very much these days, but back in the day it was something common—it should be easy for them to connect to that network [or] get access to internet services without going through silly processes like getting a guest password. So, any device is able to connect to that open network.

There is another set of services for which general accessibility by employees was very desirable, things like email and Slack for communication. We put those applications onto what we called a low tier. There are specific requirements about what types of devices are allowed to connect to that low tier, and device association with a known human is part of the checks we do.

Mobile devices that are BYOD devices—which is what all our mobile devices are—are able to connect to those services, but we know that a specific person is associated with that device. So, they should be connecting only to the particular instance of Gmail that's associated with their device.

One of our big concerns is lateral movement from one device to another, or the possibility of stealing a token and trying to use it on a different device. So, we have [security] posture checks—even for BYOD devices—that are a requirement as part of connecting to those low-tier services.
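To illustrate the token-theft concern Ludwig describes, here is a minimal sketch of one common way to tie a session token to a single device so that it fails when replayed elsewhere. It is not Atlassian's implementation; the key, the function names and the device identifiers are all hypothetical.

```python
import hashlib
import hmac

# Hypothetical signing key for illustration only; a real service would
# load this from a secrets manager.
SECRET_KEY = b"demo-only-secret"


def _signature(user: str, device_id: str) -> str:
    """HMAC over the user and the device the token is issued to."""
    payload = f"{user}|{device_id}".encode()
    return hmac.new(SECRET_KEY, payload, hashlib.sha256).hexdigest()


def issue_token(user: str, device_id: str) -> str:
    """Issue a token that is only valid on the enrolled device."""
    return f"{user}.{_signature(user, device_id)}"


def verify_token(token: str, presenting_device_id: str) -> bool:
    """Accept the token only if it is presented by the device it was bound to."""
    user, _, sig = token.rpartition(".")
    expected = _signature(user, presenting_device_id)
    return hmac.compare_digest(sig, expected)


token = issue_token("alice", "phone-1")
print(verify_token(token, "phone-1"))   # same device
print(verify_token(token, "laptop-9"))  # token copied to another device
```

Because the device identifier is part of the signed payload, a token lifted from one device does not verify when presented from another.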

How do you perform the security posture checks?

Ludwig: We use an MDM agent and require the user to associate the device with themselves and with an MDM policy. In that sense, it's not a cutting-edge implementation. It's a straightforward, simple implementation, but that's how we go about it.

Ours is a more classic MDM that implements a full policy that affects the device. There are a couple reasons why we do that. One is that it gives employees the ability to do certain things like create a workspace on an Android device where they can say, "I'm not at work anymore, so turn off those applications that are associated with work." Some capabilities are sort of pro-employee that are available in more heavyweight agents. It also connects into the way that we implement our policy a little bit cleaner. This covers the low-tier services and the devices that are able to connect to those.

The high-tier services are things like internal documents, any of the classic workplace management tools, the CRM, any kind of gateway that provides access to the production environment, CI/CD [continuous integration/continuous delivery] and other tools that developers would normally use on a regular basis. These services can only be accessed from a high-tier device. Our normal process is that high-tier devices are ones that are provided by the company. They're from a known supply and they have additional monitoring capabilities to make sure that we're able to do full validation of their environment, but also to put some additional security controls on them.
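The three tiers Ludwig describes (open, low and high) can be pictured as a simple policy check. The sketch below is an assumption-laden illustration, not Atlassian's code: the service names, the `Device` fields and the rule that unknown services default to the strictest tier are all hypothetical.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Optional


class Tier(IntEnum):
    OPEN = 0  # internet access; any device may connect
    LOW = 1   # email, chat; device must be tied to a known person
    HIGH = 2  # internal documents, CRM, CI/CD, production gateways


@dataclass(frozen=True)
class Device:
    owner: Optional[str]   # None means no known person is associated
    mdm_enrolled: bool     # device is under an MDM policy
    posture_ok: bool       # most recent posture check passed
    company_issued: bool   # hardware provided by the company


# Hypothetical mapping of services to the tier they require.
SERVICE_TIER = {
    "internet": Tier.OPEN,
    "email": Tier.LOW,
    "chat": Tier.LOW,
    "crm": Tier.HIGH,
    "ci_cd": Tier.HIGH,
}


def device_tier(device: Device) -> Tier:
    """Highest tier this device qualifies for under the assumed rules."""
    if device.owner and device.mdm_enrolled and device.posture_ok:
        return Tier.HIGH if device.company_issued else Tier.LOW
    return Tier.OPEN


def can_access(device: Device, service: str) -> bool:
    """Allow access when the device's tier meets the service's tier."""
    required = SERVICE_TIER.get(service, Tier.HIGH)  # unknown: strictest
    return device_tier(device) >= required


byod_phone = Device("alice", True, True, company_issued=False)
print(can_access(byod_phone, "email"))  # low-tier service
print(can_access(byod_phone, "ci_cd"))  # high-tier service
```

Under these assumptions a BYOD phone reaches email but not CI/CD, a guest device reaches only the internet, and a company laptop that passes its checks reaches everything.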

What challenges did Atlassian encounter with its network security architecture implementation after the pandemic started?

Ludwig: I would say we were 95-ish percent done with our rollout by the time people started working from home at mass scale. Some of the challenges that we had run into—that we were thinking about prior to everybody working from home—were around corner cases, like how to deal with consultants that come in for a short period of time.

One of the cases that we ran into that's super interesting and that people don't think about is how do you deal with really high-end consultants, meaning folks that are coming in for incident response and they need to have access to really powerful and important data. Do you carve out exceptions for them in the policy? Do you require that they work only on corporate laptops provided by Atlassian? There are some details there to figure out.

The board of directors was another interesting corner case that I don't think anybody thinks about ahead of time. These folks are not technically employees. They very much are going to want to work on their mobile phones, and they need access to certain types of data that are fairly restricted. Thinking through those implications [was something] that we had made good progress on before work from home happened.

Once COVID hit and we started to have 100% of people working from home, there were a couple issues that we ran into that were, I guess I would say, surprising. One was just that our VPN wasn't designed to scale. For the first two weeks we did a lot of work just ramping up the ability of the VPN to handle the load caused by people coming in. We do use VPN concentrators even though we didn't scale that out based on tiering to do zero-trust checks. So, network connectivity was an issue.

Another issue that we ran into—that probably had been there all along, but we didn't really think about it being a big problem—had to do with tunneling and how to do broader network connectivity. We had been doing sort of full tunneling, but then you can't access things like your printer at home, and it really spiked the volume [of traffic] that we had to deal with. So, we made a move toward split tunneling where some of the traffic that was coming off employee laptops was no longer going to be routed through our infrastructure.

It's good from a privacy standpoint for employees, and it's also really important for things like accessing a printer at home. If you're only working from home one day a week, that's an annoyance, but if you're working at home full time, it's a challenge if you need to do any kind of printing.
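The difference between full and split tunneling comes down to a routing decision per destination. The sketch below shows that decision in its simplest form; the corporate address ranges and the function name are assumptions chosen for illustration, and a real VPN client would apply the equivalent rules in the operating system's routing table.

```python
import ipaddress

# Hypothetical corporate address ranges; only traffic to these goes
# through the VPN when split tunneling is enabled.
CORPORATE_NETWORKS = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("172.16.0.0/12"),
]


def route_for(destination: str, split_tunnel: bool = True) -> str:
    """Return "vpn" or "local" for a destination IP address."""
    if not split_tunnel:
        return "vpn"  # full tunneling: everything goes through the VPN
    addr = ipaddress.ip_address(destination)
    if any(addr in net for net in CORPORATE_NETWORKS):
        return "vpn"
    return "local"


print(route_for("10.2.3.4"))      # internal service
print(route_for("192.168.1.50"))  # printer on the home network
print(route_for("192.168.1.50", split_tunnel=False))
```

With split tunneling, the home printer stays on the local network and general internet traffic no longer adds load to the corporate VPN; with full tunneling, both are forced through it.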

Another thing that we bumped into was how to think about network monitoring. In particular, we have intrusion detection systems that monitor and look for anomalies. The shape of traffic changed when people started working from home, as well as what our expectations were after that, so we had to work through it. The best places to have monitoring and what to look for have changed as well. I think we're going to find over time that it stabilizes a little bit in this environment because people are not moving around like before, but six months or nine months from now it will probably go the opposite direction where it will be even more complicated for us to monitor.

Have you considered a mesh VPN architecture to address some of the scalability problems?

Ludwig: The way I think about it is that the biggest challenge that we've faced is not a technical one per se. The basic controls were implemented in a couple of months. The definition of which services should go into which area took a little longer, but thinking through all the corner cases and then making sure that we've got those corner cases covered is what's taking a year or more.

There [might be] new technical approaches that we could take that would help with some of the underlying challenges that we have, but figuring out the policy and then actually getting that implemented to 100%, so that we've got everybody using our existing approach, has been the biggest challenge and that's what I would say to the CISOs of other organizations that are around our size.

I came from Google, so [the approach I’ve described] was what I expected everybody's network should look like. What I didn't realize until I started talking with my peers was that Atlassian was far ahead in terms of actually implementing this. I talk to other CISOs for orgs that are our size and they're just beginning to go down that journey. So, for us getting to a point where we've worked through all the corner cases is the priority.

How to deal with partners is another thing that we're thinking through now. We have technical support and certain levels of our support are provided by third-party partners, but they potentially have access to communication with customers. Thinking through what their infrastructure should look like and how to make sure that ties into ours in a way that doesn't degrade the overall security posture of our environment is the type of thing that we're working through. We're not looking to make any big technical changes right now, but maybe in the future.

Is it more critical now to make sure remote access to software development environments is done securely?

Ludwig: Yes, and that's one of the biggest challenges that we've had. We're making adjustments to the way that we think about zero trust to make sure that partners that do software development in partnership with our teams within Atlassian, for example, have appropriate constraints in their environment, and to work out what those look like. In many ways, we've moved toward having those partners designate employees within their environment who are going to be operating on our environment, and giving them hardware so that we have the monitoring we need to have visibility and respond to things. So, we treat all the software development processes at this point as high tier. That's both access to portals into our production environment and access to the CI/CD environment as well.

As you might expect, we use mostly our own tools for software development. We don't make development tools per se, but we do make the tools for management of code, security issues and bugs (like Jira and Bitbucket), pipelines for CI/CD, and so on. It's kind of a long, long topic, actually.

Copyright © 2020 IDG Communications, Inc.
