OpenAI’s Voice Engine Can Recreate Human Voices with Emotions

Voice Engine can create emotive and realistic voices from a single 15-second sample.


OpenAI has announced Voice Engine, its latest model, which generates natural-sounding speech from text input and a single 15-second audio sample. Notably, the voices it creates from this brief sample are emotive and realistic.

OpenAI said the Voice Engine project began in late 2022 and initially powered the preset voices in its text-to-speech API, ChatGPT Voice, and Read Aloud features. However, due to concerns about potential misuse, the company has not yet released the model to the public, as is also the case with its text-to-video generation model, Sora.

“We recognize that generating speech that resembles people’s voices has serious risks, which are especially top of mind in an election year. We are engaging with U.S. and international partners from across government, media, entertainment, education, civil society and beyond to ensure we are incorporating their feedback as we build,” the company wrote in the blog post. 

OpenAI has been privately testing Voice Engine with a small group of trusted partners. Early trials have yielded promising applications across various sectors. These include:

Enhancing Education: Age of Learning, an education technology company, utilises Voice Engine to create pre-scripted voice-over content that provides reading assistance to non-readers and children. Combining Voice Engine with GPT-4 enables personalised real-time interactions, expanding content accessibility and audience reach.

Global Content Translation: HeyGen, an AI visual storytelling platform, employs Voice Engine for translation of videos and podcasts into multiple languages while preserving the native accent of the original speaker. This innovation facilitates global content dissemination and audience engagement.

Community Health Services: Dimagi leverages Voice Engine and GPT-4 to enhance essential service delivery in remote areas, particularly in healthcare settings. Interactive feedback in local languages helps community health workers provide counselling and support services effectively.

Assistive Communication: Livox, an AI communication app, integrates Voice Engine to offer non-robotic and customisable voices for individuals with speech-related disabilities. This advancement empowers users to express themselves authentically across different languages and communication contexts.

Clinical Applications: The Norman Prince Neurosciences Institute at Lifespan is exploring Voice Engine’s potential in clinical settings to restore speech for patients with speech impairments caused by medical conditions. Because it requires only a short audio sample, Voice Engine is a viable tool for speech rehabilitation and patient care.

Siddharth Jindal

Siddharth is a media graduate who loves exploring tech through journalism and putting forward ideas worth pondering in the era of artificial intelligence.