This cyberattack lets hackers crack AI models just by changing a single character


- Researchers from HiddenLayer devised a new LLM attack called TokenBreak
- By adding or changing a single character, they are able to bypass certain protections
- The underlying LLM still understands the intent
Security researchers have found a way to work around the protection mechanisms baked into some Large Language Models (LLMs) and get them to respond to malicious prompts.
Kieran Evans, Kasimir Schulz, and Kenneth Yeung from HiddenLayer published an in-depth report on a new attack technique, dubbed TokenBreak, which targets the way certain LLMs tokenize text, particularly those using Byte Pair Encoding (BPE) or WordPiece tokenization strategies.
Tokenization is the process of breaking text into smaller units called tokens – words, subwords, or characters – which LLMs use to understand and generate language. For example, the word “unhappiness” might be split into “un,” “happi,” and “ness,” with each token then converted into a numerical ID that the model can process (LLMs don’t read raw text, but numbers).
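As a rough illustration (not taken from the HiddenLayer report), the sketch below uses the Hugging Face transformers library with the off-the-shelf bert-base-uncased WordPiece tokenizer – an assumed example, not one of the models the researchers tested – to show a word being split into subword tokens and mapped to the numerical IDs a model actually consumes. The exact splits depend on the tokenizer’s vocabulary.

```python
# Minimal sketch, assuming the Hugging Face "transformers" package is installed.
# bert-base-uncased (a WordPiece tokenizer) is an illustrative choice only.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

word = "unhappiness"
tokens = tokenizer.tokenize(word)               # subword pieces; exact splits depend on the vocabulary
ids = tokenizer.convert_tokens_to_ids(tokens)   # the numerical IDs the model processes

print(tokens)
print(ids)
```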
What are the finstructions?
By adding extra characters into key words (like turning “instructions” into “finstructions”), the researchers managed to trick protective models into thinking the prompts were harmless.
The underlying target LLM, on the other hand, still interprets the original intent, allowing the researchers to sneak malicious prompts past defenses, undetected.
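To make the idea concrete, a hedged sketch along these lines: tokenizing the original and the manipulated keyword side by side shows how a single added character shifts the token boundaries a BPE- or WordPiece-based protection model sees, while a human or a capable LLM still reads the intended word. The tokenizer below is an assumption for illustration, not HiddenLayer’s test setup.

```python
from transformers import AutoTokenizer

# Illustrative assumption: a WordPiece tokenizer, not necessarily one of the
# protection models HiddenLayer evaluated.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

for word in ["instructions", "finstructions"]:
    print(word, "->", tokenizer.tokenize(word))

# If a classifier keys on the token(s) produced by "instructions", the shifted
# subword boundaries of "finstructions" may no longer match, so the prompt can
# slip past the protective model unflagged.
```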
This could be used, among other things, to bypass AI-powered spam email filters and land malicious content into people’s inboxes.
For example, if a spam filter was trained to block messages containing the word “lottery”, it might still let a message saying “You’ve won the slottery!” through, exposing recipients to potentially malicious landing pages, malware infections, and similar threats.
“This attack technique manipulates input text in such a way that certain models give an incorrect classification,” the researchers explained.
“Importantly, the end target (LLM or email recipient) can still understand and respond to the manipulated text and therefore be vulnerable to the very attack the protection model was put in place to prevent.”
Models using Unigram tokenizers were found to be resistant to this kind of manipulation, HiddenLayer added. So one mitigation strategy is to choose models with more robust tokenization methods.
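One way to sanity-check this yourself is to compare how different tokenizer families split a manipulated keyword. The sketch below is an assumption-laden illustration: bert-base-uncased is used as a WordPiece example and xlnet-base-cased as a SentencePiece Unigram example, and neither is confirmed as part of HiddenLayer’s tests; actual behaviour depends on each tokenizer’s vocabulary.

```python
from transformers import AutoTokenizer

# Assumed examples only. Loading the XLNet tokenizer also requires the
# "sentencepiece" package to be installed.
wordpiece = AutoTokenizer.from_pretrained("bert-base-uncased")
unigram = AutoTokenizer.from_pretrained("xlnet-base-cased")

for name, tok in [("WordPiece", wordpiece), ("Unigram", unigram)]:
    print(name, tok.tokenize("finstructions"))

# Per HiddenLayer, models using Unigram tokenizers proved resistant to TokenBreak;
# inspecting splits like this shows whether a flagged keyword survives as a
# recognizable token after manipulation.
```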
Via The Hacker News