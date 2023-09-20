Cloud security vendor Skyhawk has unveiled a new benchmark for evaluating the ability of generative AI large language models (LLMs) to identify and score cybersecurity threats within cloud logs and telemetry. The free resource analyzes the performance of ChatGPT, Google Bard, Anthropic Claude, and other Llama 2-based open LLMs to see how accurately they predict the maliciousness of an attack sequence, according to the firm.

Generative AI chatbots and LLMs can be a double-edged sword from a risk perspective, but with proper use, they can help improve an organization’s cybersecurity in key ways. Among these is their ability to identify and dissect potential security threats faster and in higher volumes than human security analysts.

Generative AI models can be used to significantly enhance the scanning and filtering of security vulnerabilities, according to a Cloud Security Alliance (CSA) report exploring the cybersecurity implications of LLMs. In the paper, CSA demonstrated that OpenAI’s Codex API is an effective vulnerability scanner for programming languages such as C, C#, Java, and JavaScript. “We can anticipate that LLMs, like those in the Codex family, will become a standard component of future vulnerability scanners,” the paper read. For example, a scanner could be developed to detect and flag insecure code patterns in various languages, helping developers address potential vulnerabilities before they become critical security risks. The report found that generative AI/LLMs have notable threat filtering capabilities, too, explaining and adding valuable context to threat identifiers that might otherwise be missed by human security personnel.

LLM cyberthreat predictions rated in three ways

“The importance of swiftly and effectively detecting cloud security threats cannot be overstated. We firmly believe that harnessing generative AI can greatly benefit security teams in that regard, however, not all LLMs are created equal,” said Amir Shachar, director of AI and research at Skyhawk.

Skyhawk’s benchmark model tests LLM output on an attack sequence extracted and created by the company’s machine-learning models, comparing and scoring it against a sample of hundreds of human-labeled sequences in three ways: precision, recall, and F1 score, Skyhawk said in a press release. The closer the scores are to one, the more accurate the LLM’s predictions. The results are viewable on the company’s website.

“We can’t disclose the specifics of the tagged flows used in the scoring process because we have to protect our customers and our secret sauce,” Shachar tells CSO. “Overall, though, our conclusion is that LLMs can be very powerful and effective in threat detection, if you use them wisely.”

It’s important for organizations to understand that they can’t just throw data [at an LLM] and expect it to do the work for them, Shachar says. “We meticulously built our technology to be able to incorporate LLMs into real-time threat detection by utilizing the right concepts from the ground up, and now we’re leveraging that to provide a glimpse into LLM performance to the broader industry to strengthen the security community.”

Skyhawk said its data will be regularly updated and available to view free of charge via its website.
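
For readers unfamiliar with the three scores named above, the following is a minimal sketch of how precision, recall, and F1 are conventionally computed when a model's verdicts are compared with human labels. It is not Skyhawk's scoring code, which the company has not disclosed. The function name, the assumption that each sequence is simply tagged malicious or benign, and the eight sample labels are all hypothetical and invented for illustration.

```python
from typing import Sequence, Tuple


def precision_recall_f1(
    human: Sequence[bool], model: Sequence[bool]
) -> Tuple[float, float, float]:
    """Score model verdicts against human labels (True = malicious).

    Hypothetical helper for illustration; assumes a binary labeling scheme.
    """
    if len(human) != len(model):
        raise ValueError("label lists must have the same length")

    # True positives: both the human and the model say malicious.
    tp = sum(1 for h, m in zip(human, model) if h and m)
    # False positives: the model says malicious, the human says benign.
    fp = sum(1 for h, m in zip(human, model) if not h and m)
    # False negatives: the human says malicious, the model says benign.
    fn = sum(1 for h, m in zip(human, model) if h and not m)

    # Precision: share of the model's "malicious" calls that were correct.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Recall: share of truly malicious sequences the model caught.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1: harmonic mean of precision and recall.
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical labels for eight attack sequences (not real benchmark data).
human_labels = [True, True, True, True, False, False, False, False]
model_labels = [True, True, True, False, True, False, False, False]

precision, recall, f1 = precision_recall_f1(human_labels, model_labels)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
# prints: precision=0.75 recall=0.75 f1=0.75
```

In this made-up sample the model misses one malicious sequence and wrongly flags one benign one, so all three scores come out at 0.75. A model that agreed with the human labels on every sequence would score 1.0 on each.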