Is language-theoretic security the answer to Internet insecurity?

How some industry innovators are putting theory into practice with language-theoretic security (LangSec).

Historically, traditional security technologies have not been very effective. For the better part of the last two decades, IT and security teams have focused on defending the network, and attackers have figured out how to work around network controls.

Every five to 10 years a new technology comes along that needs to be tested. According to Kunal Anand, co-founder and CTO at Prevoty, language-theoretic security is the next generation of application security controls, one that broadens the solution space.

According to Upstanding Hackers, "Language-theoretic security, or LangSec, is the emerging field of digital security that treats code patterns and data formats as languages and their grammars for the purpose of preventing the introduction of malicious code into software."

LangSec enables companies to secure their prized data and applications without lag time. "It is security across initiatives," Anand said.

Anand has been a bit of a fish swimming upstream as he has grown in his understanding of security technologies. One key question that drives his ongoing development in application security, he said, is asking, "Is there an ability for us to add security so that we can make sure those apps don’t get compromised?"

Across sectors and industries, whether in financial services or retail, enterprises run a large body of software they have developed themselves. For most of these well-established companies, the issue with application security is rooted in the reality that the team that developed the software may no longer be there, said Anand. Vulnerabilities left in that code are the ones attackers expose.

Understanding which controls can predict actions before they happen can strengthen security and mitigate threats, but how is it possible to predict an event before it happens? Anand said controls like runtime application self-protection (RASP) and LangSec offer a deeper level of security than pattern matching.

"Pattern matching can block content in a list of patterns, but an attacker can get more specific and put spaces in between or insert upper/lower case letters," said Anand. "With new types of attack, payloads are generated really quickly. Criminals are using scripts like JJencode, so how do you guard against that using patterns?"
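
The evasion Anand describes can be illustrated with a minimal Python sketch. The blocklist, the `is_blocked` helper and the sample payloads are hypothetical and are not taken from any real product; they only show how a literal pattern list misses trivial variations in case and spacing.

```python
import re

# Hypothetical blocklist of the kind a pattern-matching filter might carry.
BLOCKLIST = [
    re.compile(r"<script>"),
    re.compile(r"UNION SELECT"),
]

def is_blocked(payload: str) -> bool:
    """Return True if any blocklist pattern appears in the payload."""
    return any(pattern.search(payload) for pattern in BLOCKLIST)

samples = [
    "<script>alert(1)</script>",              # verbatim match: caught
    "<ScRiPt>alert(1)</ScRiPt>",              # mixed case: missed
    "<script >alert(1)</script >",            # extra space in the tag: missed
    "1 UNION/**/SELECT password FROM users",  # comment instead of a space: missed
]

for sample in samples:
    print(is_blocked(sample), sample)
```

Each miss can be patched with another pattern, which is how such lists grow to the thousands of entries Anand mentions below.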

When Anand worked at MySpace, he said, "We had to run thousands of patterns. The problem is that false positives and false negatives are high in pattern matching." When security is focused on anomaly detection, you first have to determine what is normal.

"Applications are always changing, some change up to 40 times a day. The thing you are trying to detect is changing because the underlying application is always changing," said Anand.  

LangSec is the idea of understanding what something is going to do before it executes, so it looks at intent within context. Anand gave the example of how network controls handle SQL injection.

"Network controls are looking for 0s and 1s, but they aren’t living inside the application. If you can see every database query going between the application and database, you can look at it the same way that the database is going to look at it. You don’t have to guess. You can look at the query and understand exactly what it is going to do," Anand said.
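
One way to picture "looking at the query the way the database does" is to tokenize the final query and check whether user input changed its structure. The sketch below is an assumption-laden illustration, not Prevoty's method: the tiny token grammar, `build_query` and `changes_structure` are all hypothetical, and a real control would need the full grammar of the target database.

```python
import re

# Token classes for a deliberately tiny, hypothetical SQL subset.
TOKEN_RE = re.compile(r"""
    (?P<ws>\s+)
  | (?P<comment>--[^\n]*|/\*.*?\*/)
  | (?P<string>'(?:[^']|'')*')
  | (?P<number>\d+)
  | (?P<word>[A-Za-z_][A-Za-z_0-9]*)
  | (?P<op>[=<>!]+|[(),;*.])
""", re.VERBOSE | re.DOTALL)

def token_kinds(sql):
    """Reduce a query to its structure: keywords and operators are kept,
    literal values are collapsed to their token class."""
    kinds = []
    pos = 0
    while pos < len(sql):
        match = TOKEN_RE.match(sql, pos)
        if match is None:
            kinds.append("unknown")  # e.g. an unterminated quote
            pos += 1
            continue
        pos = match.end()
        kind = match.lastgroup
        if kind == "ws":
            continue
        if kind == "word":
            kinds.append(match.group().upper())
        elif kind == "op":
            kinds.append(match.group())
        else:
            kinds.append(kind)  # "string", "number" or "comment"
    return kinds

def build_query(name):
    # Hypothetical vulnerable application code: naive string concatenation.
    return "SELECT * FROM users WHERE name = '" + name + "'"

def changes_structure(user_input):
    """True if the input alters the query's token structure, compared
    with the same query built from a known-benign value."""
    expected = token_kinds(build_query("x"))
    return token_kinds(build_query(user_input)) != expected

print(changes_structure("alice"))          # False: still one string literal
print(changes_structure("x' OR '1'='1"))   # True: extra OR clause appears
```

The check never consults a list of attack strings. It only asks whether the input stayed a value or became part of the query's code.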

Coupling visibility that exists only inside the application with language analysis of content such as HTML, to catch attacks like command injection or SQL injection, lets security professionals see a payload the way the database or other target system would execute it.

"We don’t care about patterns because patterns change all the time. Applications are always changing. Data flow analysis may change all the time, but the benefit of being in the application is you don’t have to guess," Anand said. The result is a much lower rate of false positives and much faster analysis.

For LangSec to be accurate, though, you need to look at the payload in the right context. "Building a LangSec approach is really difficult on its own," Anand said. "An enterprise can have lots of databases that all diverge in unique ways. Oracle may have different functionalities from SQL. You have to understand how each target system is going to execute. You have to build formal tokenizers and language analysis tools for each one of those pieces of the tool chain." 
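
Why each target needs its own tokenizer can be seen in a small Python sketch. The two `Dialect` settings below are simplified assumptions for illustration, not complete descriptions of any product: by default MySQL treats a backslash as an escape character inside string literals and `#` as the start of a comment, while standard SQL escapes a quote only by doubling it.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Dialect:
    name: str
    hash_comments: bool      # does '#' start a line comment?
    backslash_escapes: bool  # does a backslash escape the next character in a string?

# Illustrative settings only; real dialects differ in many more ways.
MYSQL_LIKE = Dialect("mysql-like", hash_comments=True, backslash_escapes=True)
STANDARD_LIKE = Dialect("standard-like", hash_comments=False, backslash_escapes=False)

def segments(sql, dialect):
    """Split sql into ('code' | 'string' | 'comment', text) segments.
    Simplification: any '--' is treated as a comment start."""
    out = []
    i, n = 0, len(sql)
    while i < n:
        ch = sql[i]
        if ch == "'":
            j = i + 1
            while j < n:
                if dialect.backslash_escapes and sql[j] == "\\" and j + 1 < n:
                    j += 2  # skip the escaped character
                elif sql[j] == "'" and sql[j + 1:j + 2] == "'":
                    j += 2  # a doubled quote is an escaped quote
                elif sql[j] == "'":
                    break   # closing quote
                else:
                    j += 1
            end = min(j + 1, n)
            out.append(("string", sql[i:end]))
            i = end
        elif sql.startswith("--", i) or (dialect.hash_comments and ch == "#"):
            newline = sql.find("\n", i)
            end = n if newline == -1 else newline
            out.append(("comment", sql[i:end]))
            i = end
        else:
            if out and out[-1][0] == "code":
                out[-1] = ("code", out[-1][1] + ch)
            else:
                out.append(("code", ch))
            i += 1
    return out

query = "SELECT * FROM t WHERE a = 'x\\' OR 1=1 -- '"
for dialect in (MYSQL_LIKE, STANDARD_LIKE):
    print(dialect.name, [kind for kind, _ in segments(query, dialect)])
```

Under the MySQL-like rules the whole tail is one harmless string literal. Under the standard-like rules the same bytes end the string early and expose `OR 1=1` as code, so an analysis built for one target would misjudge the other.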

While some have overlooked LangSec as too theoretical or nerdy, more organizations are waking up to it. "Gaming companies are talking about applying LangSec," Anand said.

The greatest challenge right now is moving security professionals forward, beyond the habits of defending the network. Anand asked, "How do you explain to people who are entrenched in patterns and pattern matching?"  

The proof, as they say, will be in the pudding.

This article is published as part of the IDG Contributor Network.
