Code Interpreter And GTI For Gemini Malware Analysis
Using Google Threat Intelligence and Code Interpreter to Empower Gemini for Malware Analysis
What is Code Interpreter?
A code interpreter is a tool that converts human-readable code into commands that a computer can understand and carry out.
What is code obfuscation?
Code obfuscation is a technique that makes source code more difficult to understand or reverse engineer. It is frequently used to safeguard intellectual property and to hide data in software programs.
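As a small illustration of how obfuscation hides data, the sketch below reverses a string and Base64-encodes it so the plaintext no longer appears in the script. The function names and the example.com URL are placeholders chosen for this example, and this particular scheme is an assumption, not one taken from any real sample.

```python
import base64


def obfuscate(text: str) -> str:
    """Hide a string by reversing it and Base64-encoding the result."""
    return base64.b64encode(text[::-1].encode("utf-8")).decode("ascii")


def deobfuscate(blob: str) -> str:
    """Undo obfuscate(): Base64-decode, then reverse the characters."""
    return base64.b64decode(blob).decode("utf-8")[::-1]


# Placeholder URL for illustration only.
hidden = obfuscate("http://example.com/payload")
print(hidden)               # encoded form; the readable URL is gone
print(deobfuscate(hidden))  # original string recovered
```

Anyone who knows the two steps can reverse them, which is why an analyst, or a model that can run code, is able to recover the hidden value.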
Giving security experts up-to-date tools to help them fend off the newest attacks is one of Google Cloud's main goals. Moving toward a more autonomous, adaptive approach to threat intelligence automation is one aspect of that aim.
As part of its most recent developments in malware research, Google Cloud is giving Gemini new tools to tackle obfuscation strategies and get real-time information on indicators of compromise (IOCs). The Code Interpreter extension allows Gemini to dynamically create and run code to help deobfuscate specific strings or code sections, while Google Threat Intelligence (GTI) function calling allows Gemini to query GTI for more context on URLs, IPs, and domains found within malware samples. By improving Gemini's capacity to decipher obfuscated parts and obtain contextual information depending on the particulars of each sample, these tools represent a step toward making it a more versatile malware analysis tool.
This builds on earlier work. Google previously examined important preprocessing procedures using Gemini 1.5 Pro, whose large 2-million-token input window made it possible to analyze large portions of decompiled code in a single pass. With Gemini 1.5 Flash, it added automatic binary unpacking using Mandiant Backscatter before the decompilation phase to address specific obfuscation strategies, and significantly improved scalability. However, as any experienced malware researcher is aware, the real difficulty frequently starts once the code is exposed. Malware developers commonly use obfuscation techniques to hide important IOCs and underlying logic. Additionally, malware may download more malicious code, which makes it difficult to fully understand how a particular sample behaves.
Obfuscation techniques and additional payloads pose particular challenges for large language models (LLMs). Without specific decoding techniques, LLMs frequently "hallucinate" when working with obfuscated strings like URLs, IPs, domains, or file names. Furthermore, LLMs cannot access the URLs that host additional payloads, which frequently leads to speculative conclusions about the behavior of the sample.
Code Interpreter and GTI function calling offer focused ways to address these difficulties. With the help of Code Interpreter, Gemini can independently write and run custom scripts as necessary. It can use its own judgment to decode obfuscated elements in a sample, including strings encoded using XOR-based methods. This feature improves Gemini's capacity to uncover hidden logic without human intervention and reduces interpretation errors.
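To make the XOR case concrete, here is a minimal sketch of the kind of script such a tool might generate: it brute-forces a single-byte XOR key by checking whether the decoded bytes start with a known prefix. The helper names, the "http" prefix check, and the example.com URL are assumptions made for this illustration, and real samples often use longer or derived keys.

```python
def xor_bytes(data: bytes, key: int) -> bytes:
    """XOR every byte of data with the same single-byte key."""
    return bytes(b ^ key for b in data)


def recover_single_byte_xor(data: bytes, crib: bytes = b"http"):
    """Try all 256 keys; return (key, plaintext) for the first key whose
    output starts with the expected prefix, or None if no key matches."""
    for key in range(256):
        candidate = xor_bytes(data, key)
        if candidate.startswith(crib):
            return key, candidate
    return None


# Placeholder data: encode a made-up URL with key 0x5A, then recover it.
sample = xor_bytes(b"http://example.com/stage2", 0x5A)
result = recover_single_byte_xor(sample)
print(result)
```

Running the decoder, instead of guessing at the output, is what removes the opportunity for a model to invent a plausible-looking but wrong string.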
GTI function calling broadens Gemini's reach by retrieving contextualized data from Google Threat Intelligence on suspicious external resources such as URLs, IP addresses, or domains, providing validated insights without speculation. When combined, these tools enable Gemini to better handle externally hosted or obfuscated data, moving it closer to its objective of operating as an independent malware analysis agent.
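The general function-calling pattern can be sketched as follows: the model is given a declaration of a callable tool, emits a structured call, and the host application runs the matching handler and returns the result. Everything named here (`TOOL_DECLARATION`, `lookup_indicator`, `dispatch`, and the shape of the call) is hypothetical and is not the actual Gemini or GTI API; the lookup is a stub that performs no network request and returns no real intelligence.

```python
import json

# Hypothetical tool declaration that would be shown to the model.
TOOL_DECLARATION = {
    "name": "lookup_indicator",
    "description": "Return threat-intelligence context for a URL, IP, or domain.",
    "parameters": {
        "type": "object",
        "properties": {
            "indicator": {"type": "string"},
            "kind": {"type": "string", "enum": ["url", "ip", "domain"]},
        },
        "required": ["indicator", "kind"],
    },
}


def lookup_indicator(indicator: str, kind: str) -> dict:
    """Stub standing in for a real threat-intelligence query."""
    return {"indicator": indicator, "kind": kind, "context": "placeholder, no real data"}


HANDLERS = {"lookup_indicator": lookup_indicator}


def dispatch(call_json: str) -> dict:
    """Run the function named in a model-emitted call and return its result."""
    call = json.loads(call_json)
    handler = HANDLERS.get(call["name"])
    if handler is None:
        raise ValueError(f"unknown function: {call['name']}")
    return handler(**call["args"])


# Simulated call, as if emitted by the model after it decoded a domain.
model_call = json.dumps(
    {"name": "lookup_indicator", "args": {"indicator": "example.com", "kind": "domain"}}
)
reply = dispatch(model_call)
print(reply)
```

The design point is that the model only decides what to ask; the answer comes from the intelligence source, not from the model's own recall.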
Here's a real-world example to show how these improvements expand Gemini's potential. The sample is a PowerShell script that contains an obfuscated URL hosting a second-stage payload. This specific sample had already been analyzed with some of the most sophisticated publicly accessible LLMs, which include code generation and execution in their reasoning process. Despite these capabilities, each model "hallucinated," producing entirely fabricated URLs instead of the correct one.
Image credit: Google Cloud
Gemini determines that the script hides the download URL using an RC4-like XOR-based obfuscation method. Having recognized this pattern, it uses the Code Interpreter sandbox to automatically create and run a Python deobfuscation script, successfully exposing the external resource.
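The source does not reproduce the script Gemini generated or the exact variant used by the sample, so the following is only a sketch of textbook RC4, the cipher that an "RC4-like" scheme resembles: a key-scheduling step builds a permuted state, and a keystream derived from it is XORed with the data. The key and message are demo values, not taken from the sample.

```python
def rc4_keystream(key: bytes):
    """Yield RC4 keystream bytes for the given key (textbook algorithm)."""
    if not key:
        raise ValueError("key must not be empty")
    # Key-scheduling algorithm (KSA): permute the state using the key.
    s = list(range(256))
    j = 0
    for i in range(256):
        j = (j + s[i] + key[i % len(key)]) % 256
        s[i], s[j] = s[j], s[i]
    # Pseudo-random generation algorithm (PRGA): emit one byte per step.
    i = j = 0
    while True:
        i = (i + 1) % 256
        j = (j + s[i]) % 256
        s[i], s[j] = s[j], s[i]
        yield s[(s[i] + s[j]) % 256]


def rc4_xor(key: bytes, data: bytes) -> bytes:
    """XOR data with the RC4 keystream; the same call encrypts and decrypts."""
    return bytes(b ^ k for b, k in zip(data, rc4_keystream(key)))


# Demo values only: encrypt, then decrypt with the same key.
ciphertext = rc4_xor(b"Key", b"Plaintext")
print(ciphertext.hex())
print(rc4_xor(b"Key", ciphertext))
```

Because the operation is a plain XOR with a key-derived stream, recovering the hidden URL only requires finding the key and the encoded bytes in the script and running the same routine over them.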
After obtaining the URL, Gemini queries Google Threat Intelligence for more context using GTI function calling. According to the results, the URL is associated with UNC5687, a threat cluster known for deploying a remote access tool in phishing attacks that impersonate the Security Service of Ukraine.
As demonstrated, the incorporation of these tools has improved Gemini's capacity to operate as a malware analyst that can adapt its methodology to tackle obfuscation and obtain crucial information about IOCs. By integrating Code Interpreter and GTI function calling, Gemini is better able to handle complex samples, because it can contextualize external references and decipher hidden elements on its own.
Although these are important developments, many obstacles remain, particularly given the wide variety of malware and threat scenarios. Google Cloud is committed to making consistent progress, and upcoming upgrades will further expand Gemini's capabilities, moving one step closer to a threat intelligence automation strategy that is more autonomous and adaptive.