Security expert Evan Pena uses large language models (LLMs) almost daily “to confirm answers or come up with other ideas about how to investigate a vulnerability.” These natural language processing (NLP) tools that rely on artificial neural networks can generate text or code almost like humans, and they can also recognize patterns.

Tapping into their potential is part of Pena’s job. He is managing director of professional services at Google Cloud and has led Mandiant’s red team for over five years. For him, using large language models often means finishing tasks quickly, an essential factor in cybersecurity, a field in which the workload is often high and skill shortages are a real struggle.

At one point, Pena and his colleagues needed a C# utility to test a known username and password combination against a number of hosts within a network. “Since it was a red team, we did not want to use open-source tooling to accomplish this in order to avoid static indicators, and avoid detection by EDRs,” he says. “We were able to develop this tool and fully test it in a practice environment before using it in a production environment within a few hours.” The tool allowed them to identify local administrator access on a system and perform lateral movement within the environment.

Red and blue teams can use LLMs for many more tasks. Offensive security firm Bishop Fox explores how these models can power social engineering campaigns, cybersecurity solutions provider Check Point Software leverages AI to optimize malware investigation and vulnerability finding, while Cossack Labs uses it when recruiting security experts for its data protection solutions business.

How red and blue teams use LLMs in their work

Large language models have started to revolutionize the way red and blue teams do their work. These tools were first used to automate mundane tasks, which can free up valuable time and resources.
Little by little, though, they are beginning to reach into more complex areas of cybersecurity.

“It’s safe to say that LLMs and generative AI have revolutionized red teamer’s ability to conduct social engineering and phishing campaigns at scale,” says Brandon Kovacs, senior red team consultant for Bishop Fox. “For example, using LLMs that have been pre-trained on billions of parameters of human text, in addition to supplying these models with additional data from public sources regarding the target, has allowed us to create very convincing and personalized campaigns at scale. This would typically take hours or days to perform. However, thanks to AI, we’re able to create these instantaneously.”

Bishop Fox is also exploring ways to create and study new malware strains that were not previously seen in the wild. Additionally, it uses LLMs to perform source-code analysis to identify security vulnerabilities, a task that is also a top priority at Check Point Software, according to Sergey Shykevich, the company’s threat intelligence group manager. “We use a plugin named Pinokio, which is a Python script that uses the davinci-003 model to help with vulnerability research on functions decompiled by the IDA tool,” he says.

Check Point also relies on artificial intelligence to streamline the process of investigating malware. They use Gepetto, a Python script that uses GPT-3.5 and GPT-4 models to provide context to functions decompiled by the IDA tool. “Gepetto clarifies the role of specific code functions and can even automatically rename its variables,” Shykevich says.

Some red and blue teams have also found counterintuitive ways of getting help from AI. Anastasiia Voitova, head of security engineering at Cossack Labs, says her blue team is thinking about this technology in the recruitment process, trying to filter out candidates over-reliant on AI.
“When I hire new cybersecurity engineers, I give them a test task, and some of them just ask ChatGPT and then blindly copy-paste the answer without thinking,” Voitova says. “ChatGPT is a nice tool, but it’s not an engineer, so [by hiring candidates who don’t possess the right skill set,] the life of a blue team might become more difficult.”

Adding LLMs to red and blue teams

Red and blue teams looking to incorporate large language models into their workflow need to do it systematically. They have to “break their day-to-day work into steps/processes and then to review each step and determine if LLM can assist them in a specific step or not,” Shykevich says.

This process is not a simple one, and it requires security experts to think differently. It’s a “paradigm shift,” as Kovacs puts it. Trusting a machine to do cybersecurity-related tasks that were typically done by humans can be a challenging adjustment if the security risks posed by the new technology are not thoroughly discussed.

Luckily, though, the barriers to training and running your own AI models have lowered over the past year, in part thanks to the prevalence of online AI communities, such as HuggingFace, which allow anyone to access and download open-source models using an SDK. “For example, we can quickly download and run the Open Pre-trained Transformer Language Models (OPT) locally on our own infrastructure, which give us the equivalency of GPT-like responses, in only a few lines of code, minus the guard rails and restrictions typically implemented by the ChatGPT equivalent,” Kovacs says.

Both red and blue teams that want to use large language models must consider the potential ethical implications of this technology. These include privacy, the confidentiality of data, biases, and the lack of transparency around it.
As Kovacs puts it, “AI decision-making can be rather opaque.”

The human-AI red and blue teams

When using LLMs, though, both red and blue teams need to keep one thing in mind. “The technology isn’t perfect,” says Kovacs. “AI and LLMs are still relatively new and in their infancy stage. Whether it’s improving the security of the AI systems themselves or addressing the ethical and privacy concerns introduced by this technology, we still have a long way to go.”

Kovacs and most researchers see LLMs as a way to complement and assist red and blue teams, not replace them entirely, because while these models excel at processing data and drawing insights, they lack human intuition and context.

“LLMs are still far from being able to replace researchers or make decisions related to cyber research/red teams,” Shykevich says. “It is a tool that assists in the work, but the researchers still have to review its output.”

The quality of data is also important, as Kovacs notes: “The effectiveness of LLMs and the outputs they provide is greatly influenced by the quality of the data supplied during the training of the model.”

In the coming years, this technology will be increasingly embedded into the day-to-day lives of tech experts, potentially turning everyone into a “cybersecurity power user.” Tools that do that, such as CrowdStrike’s recently introduced Charlotte AI, have started to emerge. Charlotte AI is a generative AI-based security analyst that customers can query in plain English and dozens of other languages. “Large language models are built to incorporate knowledge from external data stores, as well as data generated from technologies like the Falcon platform,” a CrowdStrike spokesperson said.

In this context, staying up to date with the evolution of AI is a must for any red or blue team member.
In the years to come, increasingly sophisticated tools will be used both offensively and defensively. “On the offensive side, we can expect to see more advanced and automated attacks, in addition to increasingly advanced social engineering attacks such as deepfakes or voice phishing,” Kovacs says. “On the defensive side, we can expect to see AI playing a crucial role in threat detection and response, and helping security analysts to automate and sift through large data sets to identify potential threats.”

As Kovacs anticipates, hackers will continue to use LLMs and think of innovative ways to infiltrate organizations and break security rules. Therefore, security teams need to stay ahead of the curve. By combining human intelligence with AI capabilities, red and blue teams can help minimize the impact of cyberattacks.
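
For readers curious what the malware-analysis assistance described earlier might look like in practice, here is a minimal sketch of the general idea behind a tool that explains a decompiled function and renames its variables. It is not Gepetto's actual code. The function names `build_prompt` and `apply_renames` are hypothetical, the JSON reply format is an assumption, and no model is called: the reply is passed in as a string, so a researcher still reviews what the model suggested before anything is applied.

```python
import json
import re
from typing import Tuple

# Hypothetical helper: builds the text a researcher might send to an LLM.
def build_prompt(pseudocode: str) -> str:
    return (
        "Explain what this decompiled function does and suggest clearer "
        "variable names. Reply with JSON of the form "
        '{"summary": "...", "renames": {"old": "new"}}.\n\n' + pseudocode
    )

# Hypothetical helper: applies the rename map from an assumed JSON reply.
def apply_renames(pseudocode: str, reply: str) -> Tuple[str, str]:
    data = json.loads(reply)
    ident = re.compile(r"[A-Za-z_]\w*")
    # Keep only mappings between valid identifiers; ignore anything else.
    renames = {
        old: new
        for old, new in data.get("renames", {}).items()
        if isinstance(new, str) and ident.fullmatch(old) and ident.fullmatch(new)
    }
    summary = data.get("summary", "")
    if not renames:
        return summary, pseudocode
    # One pass with word boundaries, so renaming v1 leaves v10 untouched
    # and a renamed variable is never renamed a second time.
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, renames)) + r")\b")
    return summary, pattern.sub(lambda m: renames[m.group(1)], pseudocode)
```

Keeping the model call out of the sketch reflects Shykevich's point that the output is an aid to the researcher, not a decision: the suggested summary and names can be inspected, edited, or discarded before they touch the analysis.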