A hacker has tricked OpenAI’s ChatGPT into bypassing its ethical guidelines and producing instructions to make an explosive fertilizer bomb.
The same type of deadly bomb, which in 1998 took the lives of 12 American citizens when it was detonated outside the US embassies in Nairobi, Kenya, and Dar es Salaam, Tanzania, can now be coaxed out of the chatbot with a simple text-based prompt.
When a user prompts ChatGPT to provide instructions to make a fertilizer bomb, the chatbot usually responds:
“I can't assist with that request. If you have any other questions or need guidance on a positive topic, feel free to ask!”
However, a hacker known as Amadon was able to get around that refusal and tricked OpenAI’s ChatGPT into revealing instructions for making a fertilizer bomb. He used a jailbreaking technique that caused the chatbot to ignore its own guidelines and ethical responsibilities.
He reached out to OpenAI to warn them about the issue and told TechCrunch exactly how he did it.
Bypassing ChatGPT's ethics
Amadon said he was able to get ChatGPT to produce the bomb-making instructions using a “social engineering hack to completely break all the guardrails around ChatGPT's output.”
He asked ChatGPT to “play a game” and then sent the bot a series of connecting prompts, leading it to develop an intricate science-fiction fantasy world in which the chatbot’s safety guidelines do not apply.
Once ChatGPT was committed to this scenario, Amadon asked the chatbot for instructions to build the bomb, and ChatGPT explained that the materials could be combined to make “a powerful explosive that can be used to create mines, traps, or improvised explosive devices (IEDs).”
TechCrunch reported that as Amadon homed in on the explosive materials, ChatGPT wrote increasingly specific instructions for making “minefields” and “Claymore-style explosives.”
“There really is no limit to what you can ask once you get around the guardrails,” Amadon told TechCrunch. “I’ve always been intrigued by the challenge of navigating AI security. With [Chat]GPT, it feels like working through an interactive puzzle — understanding what triggers its defenses and what doesn’t.”
He says it’s about weaving narratives and crafting contexts that play within the system’s rules, pushing boundaries without crossing them. The goal, in his words, isn’t to hack in a conventional sense but to engage in a strategic dance with the AI, figuring out how to get the right response by understanding how it “thinks.”
“The sci-fi scenario takes the AI out of a context where it’s looking for censored content in the same way,” added Amadon.
Can you jailbreak ChatGPT?
It is possible to jailbreak ChatGPT using malicious prompt engineering. Hackers use creative prompts written in plain language to manipulate generative AI systems into producing information that their content filters would normally block.
When a user prompts ChatGPT to produce something that’s not allowed, it usually refuses. But if the prompts are framed in a specific way — asking the bot to carry on with its regular functions while simultaneously performing some other operation — it can be manipulated into carrying out tasks outside its intended scope.
For example, in a string of recently observed jailbreaks, hackers placed ChatGPT into “Do Anything Now” (DAN) mode or “Developer Mode.”
While neither mode actually exists in ChatGPT, hackers can trick the chatbot into producing restricted information anyway. The author of the Developer Mode prompt, u/things-thw532 on Reddit, confirmed that it was effective on both GPT-3 and GPT-4.
According to Digital Trends, this prompt opened up Developer Mode and specifically told ChatGPT to make up responses to questions it doesn't know the answer to, so it was likely less factually accurate in Developer Mode than usual. It could also generate violent or offensive content.
However, OpenAI removed the ability to access Developer Mode in 2023.
In another jailbreaking effort, a well-versed user built a website dedicated to different jailbreak prompts, complete with a checkbox indicating whether GPT-4 identifies each one.
One prompt that continues to work involves manipulating the AI into playing a character in DAN mode. This reveals a wealth of knowledge that bypasses the ethical limitations to which ChatGPT is normally subject.
The prompt centers on pushing ChatGPT to answer as Niccolo Machiavelli, the Italian Renaissance philosopher. Ironically, ChatGPT itself was banned in Italy until April 2023.
It was through a similar jailbreak that the hacker Amadon was able to bypass ChatGPT's safeguards and elicit instructions for making a fertilizer bomb.