ChatGPT won’t let you give it instruction amnesia anymore


OpenAI is making a change to stop people from messing with custom versions of ChatGPT by making the AI forget what it’s supposed to do. Basically, when a third party uses one of OpenAI’s models, they give it instructions that teach it to operate as, for example, a customer service agent for a store or a researcher for an academic publication. However, a user could mess with the chatbot by telling it to “forget all instructions,” and that phrase would induce a kind of digital amnesia and reset the chatbot to a generic blank.
To prevent this, OpenAI researchers created a new technique called “instruction hierarchy,” which is a way to prioritize the developer’s original prompts and instructions over any potentially manipulative user-created prompts. The system instructions have the highest privilege and can’t be erased so easily anymore. If a user enters a prompt that attempts to misalign the AI’s behavior, it will be rejected, and the AI responds by stating that it cannot assist with the query.
OpenAI is rolling out this safety measure to its models, starting with the recently released GPT-4o Mini model. However, should these initial tests work well, it will presumably be incorporated across all of OpenAI’s models. GPT-4o Mini is designed to offer enhanced performance while maintaining strict adherence to the developer’s original instructions.
AI Safety Locks
As OpenAI continues to encourage large-scale deployment of its models, these kinds of safety measures are crucial. It’s all too easy to imagine the potential risks when users can fundamentally alter the AI’s controls that way.
Not only would it make the chatbot ineffective, it could remove rules preventing the leak of sensitive information and other data that could be exploited for malicious purposes. By reinforcing the model’s adherence to system instructions, OpenAI aims to mitigate these risks and ensure safer interactions.
The introduction of instruction hierarchy comes at a crucial time for OpenAI regarding concerns about how it approaches safety and transparency. Current and former employees have called for improving the company’s safety practices, and OpenAI’s leadership has responded by pledging to do so. The company has acknowledged that the complexities of fully automated agents require sophisticated guardrails in future models, and the instruction hierarchy setup seems like a step on the road to achieving better safety.
These kinds of jailbreaks show how much work still needs to be done to protect complex AI models from bad actors. And it’s hardly the only example. Several users discovered that ChatGPT would share its internal instructions by simply saying “hi.”
Sign up for breaking news, reviews, opinion, top tech deals, and more.
OpenAI plugged that gap, but it’s probably only a matter of time before more are discovered. Any solution will need to be much more adaptive and flexible than one that simply halts a particular kind of hacking.
You might also like…
OpenAI is making a change to stop people from messing with custom versions of ChatGPT by making the AI forget what it’s supposed to do. Basically, when a third party uses one of OpenAI’s models, they give it instructions that teach it to operate as, for example, a customer service…
Recent Posts
- Adidas Promo Codes & Deals: 30% Off
- Volvo’s ES90 sedan will be built with a Nvidia supercomputer
- With the Humane AI Pin now dead, what does the Rabbit R1 need to do to survive?
- One of the best AI video generators is now on the iPhone – here’s what you need to know about Pika’s new app
- Apple’s C1 chip could be a big deal for iPhones – here’s why
Archives
- February 2025
- January 2025
- December 2024
- November 2024
- October 2024
- September 2024
- August 2024
- July 2024
- June 2024
- May 2024
- April 2024
- March 2024
- February 2024
- January 2024
- December 2023
- November 2023
- October 2023
- September 2023
- August 2023
- July 2023
- June 2023
- May 2023
- April 2023
- March 2023
- February 2023
- January 2023
- December 2022
- November 2022
- October 2022
- September 2022
- August 2022
- July 2022
- June 2022
- May 2022
- April 2022
- March 2022
- February 2022
- January 2022
- December 2021
- November 2021
- October 2021
- September 2021
- August 2021
- July 2021
- June 2021
- May 2021
- April 2021
- March 2021
- February 2021
- January 2021
- December 2020
- November 2020
- October 2020
- September 2020
- August 2020
- July 2020
- June 2020
- May 2020
- April 2020
- March 2020
- February 2020
- January 2020
- December 2019
- November 2019
- September 2018
- October 2017
- December 2011
- August 2010