Amazon unveils the largest text-to-speech model ever made
Researchers at Amazon have introduced the largest text-to-speech model to date, which they say is better able to articulate complex sentences.
The model, BASE TTS, which stands for Big Adaptive Streamable TTS with Emergent abilities, could lay the foundation for more human-like interactions.
According to the research, extensive training appears to improve the reliability and versatility of TTS models in the same way it does for the large language models (LLMs) behind today's AI tools.
Amazon’s BASE TTS impresses researchers
The text-to-speech model has been trained on 100,000 hours of public-domain speech data, which gives the tool a "state-of-the-art naturalness." The data is predominantly English, though some German, Dutch and Spanish speech was also used.
Moreover, the researchers found that training a TTS model on as little as 10,000 hours of speech can noticeably improve its ability to articulate complex sentences naturally.
At 980 million parameters, BASE-large is, according to its creators, the largest text-to-speech model ever made. To compare results, the team also trained smaller models: a 400-million-parameter version on 10,000 hours of speech and a 150-million-parameter version on 1,000 hours.
Amazon's team describes BASE TTS as a "high-fidelity model capable of mimicking speaker characteristics with just a few seconds of reference audio," acknowledging its potential while recognizing that more research is needed.
Some of the key areas the researchers focused on were compound nouns, emotions, foreign words, paralinguistics, punctuation, questions, and syntactic complexity. Examples can be found on a dedicated web page.
With artificial intelligence dominating the headlines for most of 2023, text-to-speech breakthroughs like this one could continue to put once-futuristic technologies into the hands of the masses in 2024. Even so, the research team's cautious approach highlights the need for proper regulation amid security and privacy fears.