Google’s Gemini 1.5 Pro can now hear
Google’s update to Gemini 1.5 Pro gives the model ears. The model can now listen to uploaded audio files and churn out information from things like earnings calls or audio from videos without the need to refer to a written transcript.
During its Google Next event, Google also announced it'll make Gemini 1.5 Pro available to the public for the first time through Vertex AI, its platform for building AI applications. Gemini 1.5 Pro was first announced in February.
This new version of Gemini Pro, which is meant to be the middle-weight model of the Gemini family, already outperforms Gemini Ultra, the family's biggest and most powerful model. Google claims Gemini 1.5 Pro can understand complicated instructions and eliminates the need to fine-tune models.
Gemini 1.5 Pro is not available to people without access to Vertex AI. Right now, most people encounter Gemini language models through the Gemini chatbot. Gemini Ultra powers the Gemini Advanced chatbot, and while it is powerful and also able to understand long commands, it’s not as fast as Gemini 1.5 Pro.
Gemini 1.5 Pro is not the only large AI model from Google getting an update. Imagen 2, the text-to-image generation model that helps power Gemini's image-generation capabilities, will also add inpainting and outpainting, which let users add or remove elements from images. Google also made its SynthID digital watermarking feature available on all pictures created through Imagen models. SynthID adds a watermark, invisible to viewers, that marks an image's provenance when the image is checked with a detection tool.
Google says it's also publicly previewing a way to ground its AI responses in Google Search so they answer with up-to-date information. Current answers aren't always a given with large language models, and sometimes that's by design: Google has deliberately kept Gemini from answering questions related to the 2024 US election.