Advanced audio dialog and generation with Gemini 2.5

  • aster.cloud
  • June 15, 2025
  • 3 minute read

Here’s a closer look at what’s new in Gemini 2.5 for audio dialog and generation.

Gemini is built from the ground up to be multimodal, natively understanding and generating content across text, images, audio, video and code. At I/O we showed how Gemini 2.5 marks a significant step forward with new capabilities in AI-powered audio dialog and generation.



We’re already using these models to bring audio to users globally, across numerous products, prototypes and languages. NotebookLM’s Audio Overviews and Project Astra are just two examples. Here’s a closer look at what you can do with Gemini 2.5 native audio capabilities.

Real-time audio dialog

Human conversation is rich and nuanced, with meaning conveyed not just by what is said, but how it’s spoken — through tone, accent and even non-speech vocalizations, like laughter. We believe conversation will be a key way we interact with AI. That’s why Gemini reasons and generates speech natively in audio, enabling effective, real-time communication.

Native audio dialog with Gemini 2.5 Flash preview features:

  • Natural conversation: Voice interactions of remarkable quality, with more appropriate expressivity and prosody (patterns of rhythm, stress and intonation), delivered with very low latency so you can converse fluidly.
  • Style control: Using natural language prompts, you can adapt the delivery within the conversation, steering it to adopt specific accents, produce a range of tones and expressions and even whisper.
  • Tool integration: Gemini 2.5 can use tools and function calling during dialog. This allows it to incorporate real-time information from sources like Google Search or use custom developer-built tools, making conversations more practical.
  • Conversation context awareness (proactive audio): Our system is trained to discern and disregard background speech, ambient conversations and other irrelevant audio, responding when appropriate. Basically, it understands when not to speak.
  • Audio-video understanding: With native support for streaming audio and video, Gemini 2.5 can converse with you about what it sees in a video feed or through screen sharing.
  • Multilinguality: Converse in any of our 24+ supported languages, or even easily mix languages within the same phrase.
  • Affective dialog: Gemini 2.5 responds to the user’s tone of voice, recognizing that the same words spoken differently can lead to very different conversations.
  • Advanced thinking dialog: Gemini’s reasoning capabilities can enhance its conversation, improving performance across all features and producing more coherent and intelligent interactions, particularly on complex reasoning tasks.
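
The tool integration point above can be sketched on the developer side. This is a minimal illustration, not the Gemini API itself: it assumes the model's function-call request reaches your code as a dictionary with a `name` and an `args` mapping, and the registry, the `tool` decorator, `get_store_hours` and `handle_function_call` are all hypothetical names invented for this example.

```python
import json
from typing import Any, Callable, Dict

# Hypothetical registry of developer-built tools, keyed by tool name.
TOOLS: Dict[str, Callable[..., Any]] = {}


def tool(fn: Callable[..., Any]) -> Callable[..., Any]:
    """Register a plain Python function as a callable tool."""
    TOOLS[fn.__name__] = fn
    return fn


@tool
def get_store_hours(city: str) -> dict:
    # Stand-in for a real lookup (database, HTTP API, search, ...).
    hours = {"Lisbon": "09:00-18:00", "Osaka": "10:00-20:00"}
    return {"city": city, "hours": hours.get(city, "unknown")}


def handle_function_call(call: dict) -> dict:
    """Run the tool named in a function-call request and wrap the result.

    `call` is assumed to look like {"name": ..., "args": {...}}.
    Errors are returned as data so the conversation can continue.
    """
    name = call.get("name")
    fn = TOOLS.get(name)
    if fn is None:
        return {"name": name, "response": {"error": f"unknown tool: {name}"}}
    try:
        result = fn(**call.get("args", {}))
    except TypeError as exc:  # wrong or missing arguments
        return {"name": name, "response": {"error": str(exc)}}
    return {"name": name, "response": result}


if __name__ == "__main__":
    request = {"name": "get_store_hours", "args": {"city": "Osaka"}}
    print(json.dumps(handle_function_call(request)))
```

Returning errors as ordinary responses, rather than raising, is a design choice for this sketch: in a live voice session it lets the model explain the failure to the user instead of the dialog stalling.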

Controllable text-to-speech (TTS)

Text-to-speech technology is evolving rapidly, and our latest models move beyond naturalness to give unprecedented control over generated audio. Now you can generate anything from short snippets to long-form narratives, precisely dictating style, tone, emotional expression and performance, all steerable through natural language prompts.

Additional controls and capabilities include:

  • Dynamic performance: These models can bring text to life with expressive readings of anything from poetry to newscasts to engaging storytelling. They can also perform with specific emotions and produce accents when requested.
  • Enhanced pace and pronunciation control: Control delivery speed and ensure more accuracy in pronunciation, including for specific words.
  • Multi-speaker dialogue generation: This model can generate a two-person “NotebookLM-style” audio overview from text input, making content more engaging through conversation.
  • Multilinguality: Create multilingual audio content effortlessly with Gemini 2.5, offering the same support for more than 24 languages.
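
Because style is steered through natural language, the input to a TTS request is largely a matter of how the prompt is assembled. The sketch below shows one plausible way to combine a style direction with a two-person script. The `Turn` type, the `build_tts_prompt` helper and the "Speaker: line" layout are assumptions made for illustration, not a format the article specifies.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Turn:
    speaker: str
    text: str


def build_tts_prompt(style: str, turns: List[Turn]) -> str:
    """Combine a natural-language style direction with a speaker-labeled script."""
    if not turns:
        raise ValueError("at least one turn is required")

    # Collect distinct speakers in order of first appearance.
    speakers: List[str] = []
    for turn in turns:
        if turn.speaker not in speakers:
            speakers.append(turn.speaker)
    if len(speakers) > 2:
        raise ValueError("this sketch targets two-person dialogue")

    lines = [f"{turn.speaker}: {turn.text.strip()}" for turn in turns]
    return f"{style.strip()}\n\n" + "\n".join(lines)


if __name__ == "__main__":
    prompt = build_tts_prompt(
        "Read this as a relaxed podcast. Ana sounds curious, Ben speaks slowly.",
        [
            Turn("Ana", "So what actually changed?"),
            Turn("Ben", "The model now generates speech natively."),
        ],
    )
    print(prompt)
```

Keeping the style direction separate from the script makes it easy to re-render the same dialogue with a different tone, pace or accent by changing only the first argument.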

For controllable speech generation (TTS), choose Gemini 2.5 Pro Preview for state-of-the-art quality on complex prompts, or Gemini 2.5 Flash Preview for cost-efficient everyday applications. This allows developers to dynamically create audio for announcements, stories, podcasts, video games and more.
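
The quality-versus-cost choice described above can be captured in a small helper. The model identifier strings below are assumptions and should be checked against the current model list; `pick_tts_model` is a hypothetical name for this sketch.

```python
# Assumed model identifiers; verify them against the current Gemini API
# model list before relying on them.
TTS_MODELS = {
    "quality": "gemini-2.5-pro-preview-tts",   # complex prompts
    "cost": "gemini-2.5-flash-preview-tts",    # everyday applications
}


def pick_tts_model(complex_prompt: bool) -> str:
    """Return the Pro model for complex prompts, otherwise the Flash model."""
    return TTS_MODELS["quality" if complex_prompt else "cost"]


if __name__ == "__main__":
    print(pick_tts_model(complex_prompt=False))
```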

Safety and responsibility

We’ve proactively assessed potential risks throughout every stage of the development process for these native audio features, using what we’ve learned to inform our mitigation strategies. We validate these measures through rigorous internal and external safety evaluations, including comprehensive red teaming for responsible deployment. Additionally, all audio outputs from our models are embedded with SynthID, our watermarking technology, to ensure transparency by making AI-generated audio identifiable.

Native audio capabilities for developers

We’re bringing native audio outputs to Gemini 2.5 models, giving developers new capabilities to build richer, more interactive applications via the Gemini API in Google AI Studio or Vertex AI.

To begin exploring, developers can try native audio dialog with Gemini 2.5 Flash preview in Google AI Studio’s stream tab. Controllable speech generation (TTS) is available in preview for both Gemini 2.5 Pro and Flash by selecting speech generation in the generate media tab within Google AI Studio.
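
Once audio comes back from a speech generation request, it usually needs to be written to a playable file. The sketch below uses only Python's standard `wave` module. It assumes the audio arrives as raw 16-bit PCM; the 24 kHz mono defaults are also assumptions, since the article does not state the output format, so confirm them in the API documentation. `save_pcm_as_wav` is a hypothetical helper name.

```python
import wave


def save_pcm_as_wav(
    path: str,
    pcm: bytes,
    sample_rate: int = 24000,  # assumed default, not stated in the article
    channels: int = 1,         # assumed mono
    sample_width: int = 2,     # bytes per sample, i.e. 16-bit PCM
) -> int:
    """Wrap raw PCM bytes in a WAV container and return the frame count."""
    frame_size = channels * sample_width
    if len(pcm) % frame_size != 0:
        raise ValueError("PCM byte length is not a whole number of frames")

    with wave.open(path, "wb") as wf:
        wf.setnchannels(channels)
        wf.setsampwidth(sample_width)
        wf.setframerate(sample_rate)
        wf.writeframes(pcm)
    return len(pcm) // frame_size


if __name__ == "__main__":
    # One second of silence stands in for real model output.
    silence = b"\x00\x00" * 24000
    frames = save_pcm_as_wav("out.wav", silence)
    print(f"wrote {frames} frames")
```

Validating the byte length up front catches truncated responses before a malformed file is written.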

Source: zedreviews.com

