Chunghwa Telecom Laboratories | Artificial Intelligence - Audio Recognition and Generation

Audio Recognition And Generation

Overview

In recent years, the wave of AI has significantly advanced speech-to-text (STT) and text-to-speech (TTS) technology, which is now applied across many industries in products such as virtual assistants, intelligent customer service, smart homes, smart speakers, and in-vehicle voice control systems. Looking ahead, AI speech technology, with STT and TTS as its key elements, is expected to become a foundation for "metaverse" virtual worlds. Our research on AI speech technologies covers multilingual speech recognition, translation, and synthesis, as well as speaker verification, sound event detection, and related topics. We then turn these research results into API solutions and interactive speech applications.

Core Technology

Speech-to-Text/Speech Translation
Voiceprint Recognition/Fake Voice Detection
Sound Event Detection
Text-to-Speech
Audio Watermark
AI-Enhanced Audio Picture Books

Application Status

Audio Recognition: We use deep learning models to convert speech into text or to extract voice attributes. Our technologies include multilingual speech recognition and translation localized for Taiwan (Mandarin/English/Taiwanese/Hakka), voice attribute analysis (language/gender/age/emotion), voiceprint recognition, and detection of ambient sounds, sound events, and fake voices. These technologies have been applied to Chunghwa Telecom's MOD voice assistant, IVR voice navigation, outbound robots, voice-of-customer analysis, and a range of enterprise customer projects. We also won the TCCDA Excellent Customer Service Award for "Best Intelligent System Application Enterprise" in two consecutive years, 2023 and 2024, as well as a Bronze Medal and a Corporate Special Award at Taiwan Innotech Expo 2025.

Audio Generation: We use deep learning audio generation models to transform text into realistic speech that sounds vivid and natural. Our technologies include speech synthesis localized for Taiwan (Mandarin/English/Taiwanese/Hakka) and for additional languages (Japanese/Korean/Vietnamese/Thai), emotional speech synthesis, and multilingual speaker voice conversion, which enables personalized speech generation from only a few audio samples. These independently developed technologies are applied to the AI Factory platform, IVR voice navigation, outbound robots, AI-enhanced audio picture books, and various enterprise projects. This work also received an Award of Merit in the 2024 AI Competition and the Gold Medal at the Taiwan Innotech Expo (TIE).
