Papercup, the UK startup using AI for realistic-sounding voice translation, raises £8M funding


Papercup, the U.K.-based AI startup that has developed speech technology that translates people’s voices into other languages and is already being used in the video and television industry, has raised £8 million in funding.
The round was led by LocalGlobe and Sands Capital Ventures, alongside Sky, GMG Ventures, Entrepreneur First (EF) and BDMI. Papercup says the new capital will be used to invest further into machine learning research and to expand its “human-in-the-loop” quality control functionality, which is used to improve and customise the quality of its AI-translated videos.
Meanwhile, Papercup’s existing angel investors include William Tunstall-Pedoe, the founder of Evi Technologies — the company acquired by Amazon to create Alexa — and Zoubin Ghahramani, former chief scientist and VP of AI at Uber and now part of the Google Brain leadership team.
Founded in 2017 by Jesse Shemen and Jiameng Gao while going through EF’s company builder program, Papercup is building out an AI and machine learning-based system that it says is capable of translating a person’s voice and expressiveness into other languages. Unlike much text-to-speech output, the startup claims the resulting voice translation is “indistinguishable” from human speech, and, perhaps uniquely, it attempts to retain the characteristics of the original speaker’s voice.
Initially, the tech is being targeted at video producers, and is already in use by Sky News, Discovery and the YouTube channel Yoga with Adriene, along with DIY content creators. It is pitched as a much more scalable, and therefore lower-cost, alternative to pure human dubbing.
“Most of the world’s video and audio content is shackled to a single language,” says Papercup co-founder and CEO Shemen. “That includes billions of hours of videos on YouTube, millions of podcast episodes, tens of thousands of classes on Skillshare and Coursera, and thousands of hours of content on Netflix. Almost every content owner is scrambling to go international, but there is yet no simple and cost-effective way to translate content beyond subtitling”.
For “deep pocketed studios,” there is of course the option to employ high-end dubbing via a professional dubbing studio and voice actors, but this is far too expensive for most content owners. And even wealthy studios are often constrained in terms of how many languages they can accommodate.
“That leaves the mid and long tail of content owners — literally 99% of all content — stranded and incapable of reaching international audiences beyond subtitling,” says Shemen, which, of course, is where Papercup comes into play. “Our aim is to generate translated voices that sound as close to the original speaker as possible”.
To do that, he says that Papercup will need to tackle four things. First up is creating “natural sounding” voices, i.e. how clear and human-like the synthetic voices sound. The second challenge is retaining emotion and pacing to reflect how the original speaker expressed themselves (think: happy, sad, angry etc.). Third is capturing the uniqueness of someone’s voice (e.g. Morgan Freeman, but in German). Lastly, the resulting translated audio needs to be correctly aligned with the video itself.
Explains Shemen: “We started off by making our voices as human-like and natural sounding as possible, where we’ve made quite a significant leap in terms of quality by honing our technology to the task, and today we have one of the best Spanish speech synthesis systems in production.
“We’re now focusing on better retainment and transfer of the original emotion and expressiveness in the original speaker across languages, and meanwhile figuring out what it is exactly that makes for quality dubbing”.
The next challenge and arguably the toughest nut to crack is “speaker adaptation,” described as capturing the uniqueness of someone’s voice. “This is the last layer of adaptation,” notes the Papercup CEO, “but it was also one of our first breakthroughs in our research. While we have models that can accomplish this, we’re focusing more of our time on emotion and expressiveness”.
That’s not to say Papercup is entirely machine-powered, even if it might be one day. The company also employs a “human-in-the-loop” process to make corrections and adjustments to the translated audio track. This includes correcting for any speech recognition or machine translation errors that come up, making adjustments to the timings of the audio, as well as enforcing emotions (e.g. happy, sad) and changing the speed of the generated voice.
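The process described above, speech recognition, machine translation and speech synthesis, with a human reviewer able to correct the intermediate outputs and adjust emotion, timing and speed, can be sketched roughly as follows. This is an illustrative outline only: the stage implementations are stand-in stubs and all names are hypothetical, not Papercup’s actual system.

```python
from dataclasses import dataclass

@dataclass
class Segment:
    """One stretch of speech in the source video (all fields hypothetical)."""
    start: float              # seconds into the source video
    end: float
    transcript: str = ""
    translation: str = ""
    emotion: str = "neutral"  # reviewer-adjustable label, e.g. "happy", "sad"
    speed: float = 1.0        # reviewer-adjustable playback speed

def recognize(audio_chunk: bytes) -> str:
    return "hello world"      # stand-in for a speech recognition model

def translate(text: str, target_lang: str) -> str:
    return f"[{target_lang}] {text}"  # stand-in for a machine translation model

def synthesize(seg: Segment) -> bytes:
    # Stand-in for TTS conditioned on the segment's emotion and speed.
    return seg.translation.encode()

def dub(chunks, target_lang, review=None):
    """Run each audio chunk through the pipeline; `review` is the
    optional human-in-the-loop pass that can correct any field."""
    results = []
    for start, end, audio in chunks:
        seg = Segment(start, end)
        seg.transcript = recognize(audio)
        seg.translation = translate(seg.transcript, target_lang)
        if review:            # human corrects ASR/MT errors, timing, emotion
            review(seg)
        results.append((seg, synthesize(seg)))
    return results
```

In this sketch, how much the `review` callback changes corresponds to how much human-in-the-loop effort a given piece of content needs.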
How much human-in-the-loop work is required depends on the type of content and the priorities of the content owner, i.e. how realistic or polished the resulting video needs to be. In other words, it isn’t an all-or-nothing proposition, as good enough will be more than enough for a swathe of content owners at scale.
Asked about the technology’s beginnings, Shemen says Papercup started with research conducted by co-founder and CTO Jiameng Gao, “who is incredibly smart and oddly obsessed with speech processing”. Gao completed two Master’s degrees at the University of Cambridge (in machine learning and in speech and language technology) and wrote a thesis on speaker-adaptive speech processing. It was at Cambridge that he realised that something like Papercup was possible.
“When we started working together at Entrepreneur First at the end of 2017, we built our initial prototype systems that showed that this technology was even possible despite there being no precedent for it,” says Shemen. “Based on early conversations, the demand was clearly overwhelming for what we were building — it was just a function of actually building something that could be used in a production environment”.