Data breached in translation

Online language translation software caused a data leak at Statoil. Use these best practices to keep translated information secure.


Before September 2017, translation didn’t matter, at least from an infosec standpoint. Taking content written in one language and converting it to another wasn’t at the top of most CSOs’ lists of data risks. Then Norwegian news network NRK uncovered a breach at Statoil, one of the world’s biggest oil and gas companies.

NRK reports that the $46 billion business used Translate.com, a free online tool, to translate “notices of dismissal, plans of workforce reductions and outsourcing, passwords, code information, and contracts.” According to the story, Lise Lyngsnes Randeberg, a college professor, then searched Google for Statoil and found the company’s translations in her results.

“Wow! What is this?” Randeberg thought, telling NRK, “This was information from organizations, private companies, government agencies.” In other words, stuff Statoil may not have wanted Randeberg — or any Google user — to read.

The translation industry saw the breach coming. “It was something that we had been warning companies about [for] 10 years or so,” says Don DePalma, Chief Strategist at Cambridge-based think tank Common Sense Advisory. “It's been a question that's been coming up, given the way [free online translation] works: Is that something that would expose information?”

How online translation services work

So how did it happen? This is what Translate.com had to say: "Translate.com’s free, volunteer-based machine translations were not breached. There are two versions of the Translate.com solution. The one in question, the free version, using various online translation services, also incorporated volunteer translators to review and correct translations. This 'old' volunteer segment is now closed, and all translations involving volunteers have been removed. The online machine translations, which are still available for free, will no longer be saved."

In general, here’s how free online translation works: the text you enter may be stored by the provider, where machine learning uses your entry, along with its translation, to improve future results. That means anyone who uses the tool after you benefits from your data, can see it, or both. Whether your information ends up in Google search results from there depends on where and how the tool provider stores it.

Create a translation policy with security in mind

When it comes to preventing your own translation-related data breach, the first step is to determine when employees can — and can’t — use free tools. At BASF, that answer is never. After learning employees were translating “important emails about new products, business plans, [and] PowerPoint presentations” online, independent technology consultant Kirti Vashee says the company blocked all free translation sites.

A less severe option is to limit the use of free translation tools by topic. Maybe it’s okay to enter product shipment details in the software, but not receiver contracts. Vashee says this is problematic, though: Employees often use free translation to see what something is about. “People will use Google [Translate] and Bing [Translator] because they get a memo in Chinese and just want to know, ‘What is he talking about?’” Employees who don’t speak a language might not realize content is about a sensitive topic until they’ve already translated it.

A more secure option is to build your own machine learning engines and move translation in-house. That’s what Volkswagen did, Vashee explains: “They specifically don’t want to use outside engines because of the risk of exposure.” Of course, Volkswagen’s 2016 revenue was $251.6 billion, more than the GDP of many sovereign nations, including Chile and Finland. At a company that large, bringing translation in-house is feasible. For most other businesses, it isn’t realistic.

Professional translation services are an option, but they have their own risks

So what can those companies do? Instead of plugging data into random online tools, tell employees to route all translation through a professional provider. Translation vendors are usually selected on quality, turnaround and cost. To ensure data security, also ask prospective vendors how they receive and deliver files for translation. If they say email, watch out. “[Email is] 10 times riskier than any [online] solution because it’s very easy to break into people’s email,” Vashee says.

Email is also easily forwarded, something many translation companies depend on. A human translator gets a job by specializing in the content type and the language direction needed (English into Polish, for example). If either of those factors changes, so does the translator. As a result, even the largest translation companies don’t have in-house resources for everything you need. “There's a lot of reselling in the industry,” DePalma says, with translation companies outsourcing work to other providers.

“Let's say somebody comes along and wants Albanian to Polish,” he explains. “There's a very small demand for that and they’re probably not going to provision for that on a 24/7/365 basis.” So after you email your file to your chosen translation company, it forwards the file to another one, likely a business you’ve never heard of that only offers Polish. But your data won’t stay there. That company forwards your files to an independent translator somewhere else.

“[It’s] an infinite chain of inheritance,” says DePalma. Twenty-six percent of the average translation company’s income comes from other translation companies, and that resold work represents about one-fourth of all words translated worldwide.

“As soon as [your file] goes outside the company,” he adds, “it's in the wild.” In the end, if no human translator is found, your project could wind up on Translate.com, except this time you’re paying a translation provider to put it there. According to Common Sense Advisory, 64 percent of translation professionals say their colleagues frequently use free translation services on the web.

“When [your data is] in the wild,” DePalma continues, “you then have to rely on the provisions, the security mechanisms, just the entire range of anybody who touches that information to keep it secret and secure.”

That’s a lot of trust for a single vendor. So as the Russian proverb says, trust but verify. To track your data while it’s in translation, Vashee recommends translation management software (TMS), an industry-specific tool that tracks every word from the moment it leaves your office to the moment it comes back.

A properly configured TMS can ensure that no one accesses data without your direct approval and that files cannot be forwarded without your knowledge. “You go in and you provide access,” Vashee says. “If you say, ‘Here are 100 valid IDs and the only people that will be able to touch this data use these 100 valid IDs,’ [you’ll] be able to know exactly what they did every time they touched the data. That’s a high level of security. A TMS system properly set up will give you some protection.”

This protection isn’t perfect. TMS systems are sold to both translation companies and clients; advanced systems extract content directly from GitHub, Adobe CQ, and other platforms where it’s created. Ask how that connection is secured. Then ask where and how the TMS stores your files.

Even more importantly, does the TMS you use let translators take data out? DePalma says translators are prone to removing material from a TMS and moving it into a tool they might like better. They log in, hit export, and suddenly your data is back in the wild. Tell your TMS provider that you want this option turned off.
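To make the controls Vashee and DePalma describe more concrete, here is a minimal Python sketch of how a project-level allowlist of approved IDs, an audit trail and a disabled export option can fit together. It is not modeled on any particular TMS product: the names TranslationProject, access, export_enabled and AccessDenied are hypothetical, and a real system would authenticate users and store its log durably rather than trust a bare ID string and keep the log in memory.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


class AccessDenied(Exception):
    """Raised when a translator ID or action is not permitted."""


@dataclass
class TranslationProject:
    # Hypothetical model: the translator IDs the client has approved.
    allowed_ids: set[str]
    # Export stays off unless the client explicitly enables it.
    export_enabled: bool = False
    # Each attempt is logged as (UTC timestamp, user ID, action, allowed?).
    audit_log: list[tuple[str, str, str, bool]] = field(default_factory=list)

    def access(self, user_id: str, action: str) -> None:
        """Check an action against the allowlist and record the attempt."""
        allowed = user_id in self.allowed_ids
        if action == "export" and not self.export_enabled:
            allowed = False  # even approved translators cannot export
        timestamp = datetime.now(timezone.utc).isoformat()
        self.audit_log.append((timestamp, user_id, action, allowed))
        if not allowed:
            raise AccessDenied(f"{user_id} may not {action}")


# Usage: one approved edit, then a blocked export and a blocked outsider.
project = TranslationProject(allowed_ids={"tr-001", "tr-002"})
project.access("tr-001", "edit")
for user, action in [("tr-001", "export"), ("tr-999", "view")]:
    try:
        project.access(user, action)
    except AccessDenied as err:
        print("blocked:", err)

for entry in project.audit_log:
    print(entry)
```

Denied attempts are logged alongside approved ones, so the record shows everything that happened to the data, not just what succeeded. As DePalma notes, though, none of this stops someone who can read the text on screen from capturing it some other way.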

In the end, though, DePalma says no matter how well you lock down the tech, the riskiest part of any translation project is the translator: “Even if they couldn't pull [your data] out exactly, what they could do is a screen capture, then do an OCR, and then from that, put it into another tool.” To DePalma’s knowledge, this type of breach is simply “theoretical.” But before September, so was Statoil’s.

Do language translation apps pose a risk?

You might be tempted to use one of the many translation apps now available. Most are designed for consumer use to, for example, aid in voice-to-voice communication or translate street signs. They can be used for small translation tasks for business, but they do come with significant risk.

In October 2019, the Australian Strategic Policy Institute (ASPI) published a report that raised concerns about China’s use of technology that collects data to “generate industrial knowledge graphs, algorithmic models, and visualization platforms for finance technology, intelligent manufacturing, smart cities, national security, and industry consulting and analysis for government and the private sector.”

China’s Global Tone Communications Technology (GTCOM) is leading this effort by analyzing unstructured data in bulk in at least 65 languages from more than 200 countries, according to the report. GTCOM is part of China’s Central Propaganda Department. Its translation services are embedded in Chinese e-commerce giant Alibaba’s cloud offering and in apps such as the JoveTrans voice recorder and translator and LanguageBox, which translates and transcribes conference sessions.

The ASPI report claims that GTCOM is “openly contributing to state security and intelligence data collection.” Clearly, businesses would want to steer clear of using its technology to translate any kind of sensitive data.

Popular web-based apps like Google Translate store and translate content on their own servers. Businesses lose some control over that data once it’s in the cloud, which also creates regulatory risk. For example, under the EU’s General Data Protection Regulation (GDPR), an organization remains accountable for personal data it hands to a third-party data processor, including when a breach occurs at that processor.

“Under normal circumstances, the user will perceive themselves as the owner of the words and documents that are being translated and as such they should have the ability to control how the information is used, shared and protected while being translated,” says Doug Graham, CSO for translation services company Lionbridge. “When it comes to these expectations, a user may get what they pay for…. If there is no cost for translation, how is the provider monetizing their services? Is it at the cost of security?”

Translation is no different from any other app or service, according to Graham. Businesses should ask app vendors the same kinds of questions concerning data protection and privacy, including:

  • Does the app encrypt data?
  • Will the application vendor enter into a contract that underpins their security responsibilities?
  • Does the app vendor have a dedicated security team?

The safest way to use translation apps is to keep sensitive data away from them. “From a security perspective, companies should consider the type of data involved and how essential it is that this data be protected,” says Graham. “If this data were only in – and were staying in – English, would you still load it in a free, third-party, online app? Common sense, with a good dose of privacy regulations and information handling procedures, should go a long way to tell people what can be translated using these apps and what shouldn’t.”
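Graham’s test can be written down as a simple routing rule that employees or tooling apply before anything is sent out for translation. The Python sketch below assumes a hypothetical four-level sensitivity scheme (public, internal, confidential, restricted) and hypothetical route names; your own classification labels and approved channels will differ. One design choice follows from Vashee’s warning about memos in unfamiliar languages: content with no label is treated as confidential, because the person submitting it may not be able to tell what it says.

```python
from enum import Enum
from typing import Optional


class Sensitivity(Enum):
    # Hypothetical classification labels; substitute your own scheme.
    PUBLIC = "public"
    INTERNAL = "internal"
    CONFIDENTIAL = "confidential"
    RESTRICTED = "restricted"


class Route(Enum):
    # Hypothetical translation channels a policy might allow.
    FREE_ONLINE_TOOL = "free online tool"
    APPROVED_PROVIDER = "approved provider, tracked in a TMS"
    IN_HOUSE_ONLY = "in-house translation only"


def choose_route(label: Optional[Sensitivity]) -> Route:
    """Map a document's sensitivity label to an allowed translation route."""
    # Unlabeled content defaults to CONFIDENTIAL: someone who can't read
    # the source language can't judge how sensitive it is.
    if label is None:
        label = Sensitivity.CONFIDENTIAL
    if label is Sensitivity.PUBLIC:
        return Route.FREE_ONLINE_TOOL
    if label in (Sensitivity.INTERNAL, Sensitivity.CONFIDENTIAL):
        return Route.APPROVED_PROVIDER
    return Route.IN_HOUSE_ONLY


print(choose_route(None).value)  # approved provider, tracked in a TMS
```

Only content already cleared as public reaches a free online tool; everything else goes to a vetted provider or stays inside the company.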

Copyright © 2019 IDG Communications, Inc.
