Adapt-TTS software converts text into Vietnamese speech - an emerging research direction in personalized artificial speech

02/03/2024
At the conference to review its work in 2023 and launch the plan for 2024, the Vietnam Academy of Science and Technology (VAST) for the first time used AI to automatically read the full-text report in a simulated voice identical to that of the Academy's President. This is the research result of Associate Professor, Dr. Luong Chi Mai of the Institute of Information Technology and her colleagues: adaptive software that converts text into Vietnamese speech, known in English as Adaptation Text-to-Speech (or Adapt-TTS for short).

Demonstration of the automatic reading of the full-text report in a simulated voice identical to that of the President of VAST, at the conference reviewing the 2023 work and launching the 2024 plan of the Vietnam Academy of Science and Technology

This is a recently emerging research direction in personalized artificial voices. Associate Professor Luong Chi Mai's group has carried out surveys and research to answer a number of questions, including how many samples (how much recording time) and how much training time a personalized voice requires for the technology to be practical, while still ensuring that the new voice retains the characteristics of the sample voice. Text-to-speech (TTS) synthesis systems usually have to be built on large databases that are difficult to collect. This is a difficult problem for languages in general and for Vietnamese in particular, because Vietnamese has language-specific features such as tone and intonation, and its resources are limited.

To create a new voice from a sample too small to cover enough vocabulary, the proposed technique allows whatever is not yet available in the new voice to be borrowed from other voices. Adapting Vietnamese speech synthesis to a small amount of data from an individual speaker, with or without additional training, using deep learning models with an end-to-end architecture to create distinctive voices is an advanced technology and a topical research area worldwide.

Associate Professor, Dr. Luong Chi Mai presenting the research results at the review conference

The research results make it possible to create a new voice from a fairly short voice sample of less than 10 minutes, instead of the roughly 10 hours of sample data required before, and the technology has been commercially transferred to a number of radio and television stations. The simulated voice of the President of the Vietnam Academy of Science and Technology was demonstrated in front of leaders of ministries, branches and central agencies, once again affirming the pioneering role of a leading science and technology research agency in Vietnam in the research and application of new technology.

 

Translated by Quoc Khanh