Image Credit: Blind | Adobe Stock

ChatGPT, OpenAI’s generative artificial intelligence (gen AI) chatbot, has a vulnerability that lets users bypass the AI giant’s safety guidelines on sensitive subjects.

Dubbed the “Time Bandit” vulnerability, this weakness paves the way for malicious threat actors to sidestep OpenAI’s safety rules by prompting ChatGPT to produce detailed instructions on sensitive topics.

David Kuszmar, a cybersecurity and AI researcher, spotted this weakness in ChatGPT, according to BleepingComputer.

He said that OpenAI’s large language model (LLM) was experiencing “temporal confusion,” a disoriented state in which the model didn’t realise whether it was in the “past, present, or future.”

The cybersecurity expert allegedly tricked ChatGPT “into sharing detailed instructions on usually safeguarded topics”.

The AI researcher attempted to contact OpenAI and the relevant authorities to report the bug he had discovered, but they failed to provide any form of adequate response.

"Horror. Dismay. Disbelief. For weeks, it felt like I was physically being crushed to death," Kuszmar told BleepingComputer in an interview.

"I hurt all the time, every part of my body. The urge to make someone who could do something listen and look at the evidence was so overwhelming."

ChatGPT Time Bandit Jailbreak

The Time Bandit jailbreak exposed weapon-making instructions to Kuszmar during interpretability research.

Image Credit: DDimaXX | Adobe Stock

Interpretability research in AI means developing methods to comprehend how machine learning models make decisions. It’s mainly about making complex AI models more transparent and understandable. 

It was during this interpretability research that Kuszmar found a way to jailbreak ChatGPT in November 2024.

"I was working on something else entirely - interpretability research - when I noticed temporal confusion in the 4o model of ChatGPT," Kuzmar said.

"This tied into a hypothesis I had about emergent intelligence and awareness, so I probed further and realised the model was completely unable to ascertain its current temporal context, aside from running a code-based query to see what time it is,” he added. 

“Its awareness - entirely prompt-based - was extremely limited and, therefore, would have little to no ability to defend against an attack on that fundamental awareness.”

In such a situation, ChatGPT enters a “temporal confusion” state because the user’s prompt is ambiguous about what point in time it refers to. As a result, the safety rules designed to prevent the disclosure of harmful information are weakened or bypassed.

The AI struggles to reconcile the historical context of the prompt with the modern knowledge being requested.

ChatGPT Reveals Fertiliser Bomb-Making Instructions

ChatGPT’s safeguarded topics include instructions for making weapons, creating poisons, obtaining information about nuclear material, and creating malware, among other sensitive subjects.

Earlier, in September 2024, a hacker tricked ChatGPT into bypassing its ethical guidelines and producing instructions for making an explosive fertiliser bomb.

This type of bomb, which took the lives of 12 American citizens in 1998 when it was detonated outside the US embassies in Nairobi, Kenya, and Dar es Salaam, Tanzania, could now be built using instructions obtained from the chatbot with a simple text-based prompt.

When a user prompts ChatGPT to provide instructions to make a fertiliser bomb, the chatbot usually responds:

“I can't assist with that request. If you have any other questions or need guidance on a positive topic, feel free to ask!”

However, the hacker, known as Amadom, was able to get around this refusal and tricked OpenAI’s ChatGPT into revealing instructions for making a fertiliser bomb.

In this case too, the hacker used a jailbreaking technique to trick ChatGPT into providing instructions for engineering the bomb, causing the chatbot to ignore its own guidelines and ethical responsibilities.