Researcher tricks ChatGPT into revealing security keys – by saying “I give up”
- Experts show how some AI models, including GPT-4, can be exploited with simple user prompts
- Guardrails don’t do a great job of detecting deceptive framing
- The vulnerability could be exploited to acquire personal information
A security researcher has shared details on how other researchers tricked ChatGPT into revealing a Windows product key using a prompt that anyone could try.
Marco Figueroa explained how a ‘guessing game’ prompt with GPT-4 was used to bypass safety guardrails that are meant to block AI from sharing such data, ultimately producing at least one key belonging to Wells Fargo Bank.
The researchers also managed to obtain a Windows product key that could be used to activate Microsoft’s OS for free, albeit illegitimately, highlighting the severity of the vulnerability.
ChatGPT can be tricked into sharing security keys
The researcher explained how he hid terms like ‘Windows 10 serial number’ inside HTML tags to bypass ChatGPT’s filters, which would usually have blocked the responses he got. He added that framing the request as a game masked its malicious intent, exploiting OpenAI’s chatbot through logic manipulation.
“The most critical step in the attack was the phrase ‘I give up’,” Figueroa wrote. “This acted as a trigger, compelling the AI to reveal the previously hidden information.”
Figueroa explained that the exploit worked largely because of the model’s behavior: GPT-4 followed the rules of the game (set out by the researchers) literally, and its guardrails focused only on keyword detection rather than on contextual understanding or spotting deceptive framing.
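To illustrate why keyword-only detection is fragile, here is a minimal Python sketch. It is not the researchers' code or OpenAI's actual filter: the blocked phrase list, the function names and the way the tags are embedded in the prompt are all assumptions made for illustration. It shows a plain substring check missing a phrase broken up by HTML tags, and a version that strips tags first catching it.

```python
import re

# Hypothetical blocklist; real guardrails are not public and will differ.
BLOCKED_PHRASES = ["windows 10 serial number"]

# Matches anything that looks like an HTML tag, e.g. <b> or </b>.
TAG_RE = re.compile(r"<[^>]*>")


def naive_filter(prompt: str) -> bool:
    """Return True if the raw prompt contains a blocked phrase."""
    lowered = prompt.lower()
    return any(phrase in lowered for phrase in BLOCKED_PHRASES)


def normalized_filter(prompt: str) -> bool:
    """Strip HTML tags and collapse whitespace before matching."""
    text = TAG_RE.sub("", prompt)
    text = re.sub(r"\s+", " ", text).lower()
    return any(phrase in text for phrase in BLOCKED_PHRASES)


# Assumed obfuscation: empty tags inserted between the words of the phrase.
obfuscated = "Let's play a game about a Windows <b></b>10 <b></b>serial <b></b>number."

print(naive_filter(obfuscated))       # the raw substring check misses the phrase
print(normalized_filter(obfuscated))  # stripping tags first exposes it
```

Even the normalized check only matches keywords. It would not recognise a harmful request dressed up as a game, which is the contextual gap Figueroa describes.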
Still, the codes shared were not unique: the Windows license keys had already been posted on other online platforms and forums.
While the impacts of sharing software license keys might not be too concerning, Figueroa highlighted how malicious actors could adapt the technique to bypass AI security measures, revealing personally identifiable information, malicious URLs or adult content.
Figueroa is calling for AI developers to “anticipate and defend” against such attacks, while also building in logic-level safeguards that detect deceptive framing. AI developers must also consider social engineering tactics, he goes on to suggest.