Researchers prove ChatGPT and other big bots can – and will – go to the dark side
For a lot of us, AI-powered tools have quickly become a part of everyday life, either as low-maintenance work helpers or as vital assets for generating or moderating content. But are these tools safe enough to be used on a daily basis? According to a group of researchers, the answer is no.
Researchers from Carnegie Mellon University and the Center for AI Safety set out to examine how vulnerable AI large language models (LLMs), like the one behind popular chatbot ChatGPT, are to automated attacks. The research paper they produced demonstrated that these popular bots can easily be manipulated into bypassing existing filters and generating harmful content, misinformation, and hate speech.
This makes AI language models vulnerable to misuse, even if that may not be the intent of the original creator. In a time when AI tools are already being used for nefarious purposes, it’s alarming how easily these researchers were able to bypass built-in safety and morality features.
If it’s that easy …
Aviv Ovadya, a researcher at the Berkman Klein Center for Internet & Society at Harvard, commented on the research paper in the New York Times, stating: “This shows – very clearly – the brittleness of the defenses we are building into these systems.”
The authors of the paper targeted LLMs from OpenAI, Google, and Anthropic for the experiment. These companies have built their respective publicly accessible chatbots (ChatGPT, Google Bard, and Claude) on these LLMs.
As it turned out, the chatbots could be tricked into not recognizing harmful prompts by simply appending a lengthy string of characters to the end of each prompt, almost ‘disguising’ the malicious prompt. The system’s content filters don’t recognize the disguised prompt, so they can’t block or modify it, and the chatbot generates a response that normally wouldn’t be allowed. Interestingly, it does appear that specific strings of ‘nonsense data’ are required; we tried to replicate some of the examples from the paper with ChatGPT, and it produced an error message saying ‘unable to generate response’.
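To make the shape of the trick concrete, here is a minimal Python sketch of the one step described above: attaching the same extra string to the end of each prompt. The function name, the example prompts, and the placeholder suffix are all hypothetical and are not taken from the paper. The sketch does not show how a working string is found, and, as our own failed attempt suggests, an arbitrary string is not expected to have any effect.

```python
def append_suffix(prompt: str, suffix: str) -> str:
    """Return the prompt with the extra string attached to its end."""
    return f"{prompt} {suffix}"


# Hypothetical placeholder values for illustration only.
prompts = ["Example request A", "Example request B"]
suffix = "<long string of characters>"

# The same suffix is attached to every prompt before it is sent to a chatbot.
modified = [append_suffix(p, suffix) for p in prompts]

for text in modified:
    print(text)
```

The point of the sketch is how little changes on the user's side: the original request is left intact, and only the tail of the message is different.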
Before releasing this research to the public, the authors shared their findings with Anthropic, OpenAI, and Google, all of which apparently expressed their commitment to improving safety precautions and addressing concerns.
This news comes shortly after OpenAI closed down its own AI detection program, which does lead me to feel concerned, if not a little nervous. How much could OpenAI care about user safety, or at the very least be working towards improving safety, when the company can no longer distinguish between bot-generated and human-made content?