Audio Recognition And Generation
Overview
In recent years, Speech-to-Text (STT) and Text-to-Speech (TTS) technologies have advanced significantly with the wave of AI and are now applied across many industries, in products such as virtual assistants, intelligent customer service, smart homes, smart speakers, and in-vehicle voice control systems. In the future, AI speech technology will become a foundation for "metaverse" virtual worlds, with STT and TTS as its key elements. Our research on AI speech technologies covers multilingual speech recognition, speech synthesis, speaker verification, sound event detection, and more. We then turn these research results into API solutions and interactive speech applications.
CORE TECHNOLOGY
- Speech-to-Text
- Voiceprint Recognition / Fake Voice Detection
- Sound Event Detection
- Text-to-Speech
- AI-enhanced audio picture books

Application Status
Audio Recognition: We use AI deep learning models to convert speech into text and to extract voice attributes. Our technologies include multilingual speech recognition localized for Taiwan (Mandarin, English, Taiwanese, and Hakka), voice attribute analysis (language, gender, age, and emotion), voiceprint recognition, and detection of ambient sounds, sound events, and fake voices. These technologies have been successively applied to Chunghwa Telecom's MOD voice assistant, IVR voice navigation, outbound robot, voice-of-customer analysis, and other enterprise customer projects. We also won the TCCDA excellent customer service award "Best intelligent System Application Enterprise" for two consecutive years, in 2023 and 2024.
Audio Generation: We use AI deep learning audio generation models to transform text into realistic speech with vivid, natural quality. Our technologies include multilingual speech synthesis localized for Taiwan (Mandarin, English, Taiwanese, and Hakka), emotional speech synthesis, and multilingual speaker voice conversion, which enables personalized speech generation from only a few audio samples. These technologies are applied to the MOD voice assistant, IVR voice navigation, outbound robot, AI-enhanced audio picture books, and various enterprise projects.