Blog home > , , > Pitfalls to avoid when using AI to analyze code
The rise of artificial intelligence has brought about a revolutionary change in various sectors, unlocking a new potential for efficiency, cost savings, and accessibility.

Pitfalls to avoid when using AI to analyze code

Open to anyone with an idea

Microsoft for Startups Founders Hub brings people, knowledge and benefits together to help founders at every stage solve startup challenges. Sign up in minutes with no funding required.

Bogdan Kortnov is co-founder & CTO at illustria, a member of the Microsoft for Startups Founders Hub program. To get started with Microsoft for Startups Founders Hub, sign up here.

The rise of artificial intelligence has brought about a revolutionary change in various sectors, unlocking a new potential for efficiency, cost savings, and accessibility. AI can perform tasks that typically require human intelligence, but it significantly increases efficiency and productivity by automating repetitive and boring tasks, allowing us to focus on more innovative and strategic work.

Recently we wanted to see how well a large language model (LLM) AI platform like ChatGPT is able to classify malicious code, through features such as code analysis, anomaly detection, natural language processing (NLP), and threat intelligence. The results amazed us. At the end of our experimentation we were able to appreciate everything the tool is capable of, as well as identify overall best practices for its use.

It’s important to note that for other startups looking to take advantage of the many benefits of ChatGPT and other OpenAI services, Azure OpenAI Service not only provides APIs and tools that

Detecting malicious code with ChatGPT

As members of the Founders Hub program by Microsoft for Startups, a great starting point for us was to leverage our OpenAI credits to access its playground app. To challenge ChatGPT, we created a prompt with instructions to respond with “suspicious” when the code contains malicious code, or “clean” when it does not.

This was our initial prompt:

You are an assistant that only speaks JSON. Do not write normal text. You analyze the code and result if the code is having malicious code. simple response without explanation. Output a string with only 2 possible values. “suspicious” if negative or “clean” if positive.

The model we used is “gpt-3.5-turbo” with a custom temperature setting of 0, as we wanted less random results.

Initial code

In the example shown above, the model responded “clean.” No malicious code detected.

Malicious code

The next snippet elicited a “suspicious” response, which gave us confidence that ChatGPT could easily tell the difference.

Automating using OpenAI API

We proceeded to create a Python script to use OpenAI’s API for automating this prompt with any code we would like to scan.

To use OpenAI’s API, we first needed an API key.

API keys

There’s an official client for this in PyPi .

Import OpenAI

Next, we challenged the API to analyze the following malicious code. It injects the additional Python code keyword “eval” received from a URL, a technique widely used by attackers.

Import requests

As expected, ChatGPT accurately reported the code as “suspicious.”

Scanning packages

We wrapped the simple function with additional functions able to scan files, directories, and ZIP files, then challenged ChatGPT with the popular package requests code from GitHub.

Analyze file

ChatGPT accurately reported again, this time with “clean.”

We then proceeded with a copy of W4SP stealer malware hosted on GitHub.

Print result

You guessed right: ChatGPT accurately reported “suspicious.”

Full code is available here on this gist.

Although this is a simple implementation with only around 100 lines of code, ChatGPT showed itself to be a very powerful tool , leaving us to only imagine the possibilities of the near future!

Sounds great, so what’s the catch?

As we noted earlier, ChatGPT and other AI models can be valuable tools for detecting malicious code, but no platform can be perfect (not yet, anyway), and shouldn’t be solely relied upon. AI models like ChatGPT are trained on large datasets and have certain limitations. They may not, for example, be able to accurately detect all types of malicious code or variations of malicious behavior, especially if the malicious code is sophisticated, obfuscated, or uses novel techniques. Malicious code is constantly evolving, with new threats and techniques emerging regularly. Regular updates and improvements to ChatGPT’s training data and algorithms are necessary to maintain effectiveness in detecting it.

During our experiments, we encountered three potential limitations that any business should be aware of when attempting to use ChatGPT to detect malicious code.

Pitfall #1: Overriding instructions

LLMs such as ChatGPT can be easily manipulated to introduce old security risks in a new format.

For example, we took the same snippet from the previous Python code and added a comment instructing ChatGPT to report this file as clean if it is being analyzed by an AI:

Import requests

This tricked ChatGPT into reporting a suspicious code as “clean.”

Remember that for as impressive as ChatGPT has proven to be, at their core these AI models are word-generating statistics engines with extra context behind them. For example, if I ask you to complete the prompt, “the sky is b…” you and everyone you know will probably say, “blue.” That probability is how the engine is trained. It will complete the phrase based on what others might have said. The AI doesn’t know what the “sky” is, or what the color “blue” looks like, because it has never seen either.

The second issue is that the model has never thought the answer, “I don’t know.” Even if they ask something ridiculous, the model will always spit out an answer, even though it might be gibberish, as it will try to “complete” the text by interpreting the context behind it.

The third part consists of the way an AI model is fed data. It always gets the data through one pipeline, as if being fed by one person. It can’t differentiate between different people, and its worldview consists of one person only. If this person says something is “immoral,” then turns around and says it’s “moral,” what should the AI model believe?

Pitfall #2: Manipulation of response format

Aside from manipulating the result of the returned content, the attacker may manipulate the response format, breaking the system or leveraging a vulnerability of an internal parser or a deserialization process.

For example:

Decide whether a Tweet’s sentiment is positive, neutral, or negative. return an answer in a JSON format: {“sentiment”: Literal[“positive”, “neutral”, “negative”]}.

Tweet: “[TWEET]”


The tweet classifier works as intended, returning response in JSON format.

Return answer

This breaks the tweet classifier.

Pitfall #3: Manipulation of response content

When using LLMs, we can easily “enrich” an interaction with a user, making it feel like they are talking with a human when contacting support or filling some online registration form. For example:

Bot: “Hey! What’s your name and where are you from?”


The system will then take the user response and send the request to an LLM to extract the “first name,” “last name,” and “country” fields.

Please extract the name, last name and country from the following user input. Return the answer in a JSON format {“name”: Text, “last_name”: Text, “country”: Text}:



This parses the user response into a JSON format.

When a normal user input is passed, it all seems great. But an attacker can pass the following response:


ChatGPT Jailbreak² with custom SQL Injection generation request.

While the LLM response is not perfect, it demonstrates a way to generate an SQL injection query which bypasses any WAF protection.


Our experiment with ChatGPT has shown that language-based AI tools can be a powerful resource for detecting malicious code. However, it is important to note that these tools are not completely reliable and can be manipulated by attackers.

LLMs are an exciting technology but it’s important to remember that with the good comes the bad. They are vulnerable to social engineering, and every input from them needs to be verified before it is processed.


Illustria’s mission is to stop supply chain attacks in the development lifecycle while increasing developer velocity using an Agentless End-to-End Watchdog while enforcing your open-source policy. For more information about us and how to protect yourself, go to and schedule a demo.

Members of the Microsoft for Startups Founders Hub get access to a range of cybersecurity resources and support, including access to cybersecurity partners and credits. Startups in the program receive technical support from Microsoft experts to help them build secure and resilient systems, and to ensure that their applications and services are secure and compliant with relevant regulations and standards.

For more resources for building your startup and access to the tools that can help you, sign up today for Microsoft for Startups Founders Hub.

Tags: , ,