【2020 Application Example】 AI Voice Synthesis Module, Bringing Warmth to Machine Narration
In response to current trends, digital learning and mobile educational materials have attracted widespread attention!
As technology advances rapidly, many businesses are continually concerned with how to cultivate professionals who can 'adapt to developmental changes'. In recent years, enterprises have progressively integrated 'digital learning' into employee training programs to improve learning outcomes, bringing 'digital learning' and 'mobile educational materials' into the spotlight.
Outsourced narration is costly and cannot handle large volumes of demand
▸ Differences in the digital educational material production process before and after the implementation of the AI voice synthesis system
Over the past several years, Strategic Breakthrough Corporation of Taiwan has helped companies convert many seminars, in-person courses, and training events held by the public sector into digital materials. The conversion process, however, required inviting instructors, finding and renting filming locations, and post-producing the audio and video. During recording, speakers' nervousness, discomfort in front of the camera, or mispronunciations could lead to poor recording quality or repeated retakes.
Custom narration for each client's educational materials could be outsourced, but outsourcing was costly and could not keep up with large volumes of demand. The company therefore hoped to introduce AI speech synthesis technology and develop an 'Intelligent Voice Synthesis Module' that instantly converts the text on slides into natural, human-like voice files, saving on narration costs.
Realistic Intelligent Voice Synthesis Module, providing a diversified selection of voices
▸ AI Voice Synthesis Module Illustration
Strategic Corporation of Taiwan collaborated with the AI technology team Magic Cube Digital Ltd., using Tacotron 2, which combines features of WaveNet and Tacotron. Characters are embedded and mapped to mel-scale spectrograms, and a modified WaveNet model acting as the vocoder then synthesizes the time-domain waveform from these spectrograms. The result is an intelligent voice synthesis module whose voice quality, evaluated by MOS (Mean Opinion Score), approaches that of a human voice.
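To make the term "mel-scale spectrogram" more concrete, the following is a minimal sketch of how frequencies are mapped onto the mel scale, which spaces frequency bands more finely at low frequencies, where human hearing is more sensitive. It assumes the widely used HTK-style formula; the function names, the 80-band count, and the 0 to 8000 Hz range are illustrative assumptions, not settings reported for this project.

```python
import math


def hz_to_mel(freq_hz: float) -> float:
    """Convert a frequency in Hz to mels (HTK-style formula, an assumption)."""
    return 2595.0 * math.log10(1.0 + freq_hz / 700.0)


def mel_to_hz(mel: float) -> float:
    """Inverse of hz_to_mel: convert mels back to Hz."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_band_edges(n_bands: int, f_min: float, f_max: float) -> list:
    """Return n_bands + 2 edge frequencies (Hz), evenly spaced on the mel scale.

    Each triangular mel filter spans three consecutive edges, so
    n_bands filters need n_bands + 2 edges.
    """
    m_min, m_max = hz_to_mel(f_min), hz_to_mel(f_max)
    step = (m_max - m_min) / (n_bands + 1)
    return [mel_to_hz(m_min + i * step) for i in range(n_bands + 2)]


# Illustrative values only (hypothetical, not the project's configuration).
edges = mel_band_edges(n_bands=80, f_min=0.0, f_max=8000.0)
print(f"lowest band width:  {edges[1] - edges[0]:.1f} Hz")
print(f"highest band width: {edges[-1] - edges[-2]:.1f} Hz")
```

Because the edges are evenly spaced in mels rather than in Hz, the bands widen as frequency rises. A mel-scale spectrogram sums spectral energy within such bands, giving the vocoder a compact, perceptually motivated representation to convert into a waveform.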
When testers evaluated this AI Intelligent Voice Synthesis Module using the MOS voice quality evaluation standard, it received a score of 4.3, exceeding the initial project target of 4.21 and surpassing WaveNet's score of 4.08, thereby demonstrating its effectiveness!
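For readers unfamiliar with MOS, the score is the arithmetic mean of listeners' ratings, typically given on a scale from 1 (bad) to 5 (excellent). The sketch below shows that calculation together with an approximate 95% confidence interval. The function name, the normal-approximation interval, and the sample ratings are assumptions made for illustration; they are not the project's evaluation procedure or data.

```python
import math
import statistics


def mean_opinion_score(ratings):
    """Return (mos, half_width) for listener ratings on a 1-5 scale.

    mos is the arithmetic mean of the ratings. half_width is the
    half-width of an approximate 95% confidence interval, using a
    normal approximation (an assumption of this sketch).
    """
    if len(ratings) < 2:
        raise ValueError("need at least two ratings")
    if any(r < 1 or r > 5 for r in ratings):
        raise ValueError("ratings must lie between 1 and 5")
    mos = statistics.fmean(ratings)
    std_err = statistics.stdev(ratings) / math.sqrt(len(ratings))
    return mos, 1.96 * std_err


# Hypothetical ratings for illustration only, not the project's test data.
example_ratings = [4, 5, 4, 3, 5, 4, 4, 5]
mos, half_width = mean_opinion_score(example_ratings)
print(f"MOS = {mos:.2f} +/- {half_width:.2f}")
```

Reporting the interval alongside the mean helps when comparing systems whose scores are close together, since small differences may fall within listener-to-listener variation.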
AI Intelligent Voice Synthesis Module, reducing costs and increasing profits, will effectively enhance Taiwan's digital learning industry environment!
▸ Costs have been significantly reduced after the implementation of the AI voice system, and profits have increased accordingly
This AI Intelligent Voice Synthesis Module not only reduces the cost of producing digital educational materials but also addresses the difficulties that Taiwan's industry, government, and academia face in disseminating them. It can improve customers' efficiency in producing digital teaching materials, significantly reduce the risks posed by labor shortages and cost structure, and improve profitability.
Strategic Corporation of Taiwan will also continue to develop the 'Intelligent Transcription Module' and introduce Robotic Process Automation (RPA) to replace the current manual processes, such as captioning, dubbing, and file conversion in the production of digital educational materials, assisting in the transformation and enhancement of the domestic digital learning industry.
「Translated content is generated by ChatGPT and is for reference only. Translation date:2024-05-19」