The third-quarter results of the SpeechIO TIOBE evaluation (hereafter "the evaluation") were recently announced, and Himalaya's automatic speech recognition (ASR) technology took first place. The technology is already widely used in the "AI manuscript function" of the Himalaya App, giving users a content consumption experience that combines listening and reading.
The SpeechIO TIOBE evaluation is a relatively authoritative public industry benchmark in China. It aims to objectively evaluate and record the recognition accuracy of public speech recognition services across different domains, using word accuracy as its test metric. The evaluation is held every quarter. In the third quarter of this year, Himalaya's ASR technology stood out and took first place with an error rate of just 2.16%. Other participants included Yitu, Tencent, Bilibili, Alibaba, Microsoft, iFLYTEK, and Baidu, among others.
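To make the metric concrete: error rates of this kind are commonly computed as the edit distance between the reference transcript and the recognizer output, divided by the reference length. The following is a minimal sketch under that assumption. The article does not describe the evaluation's exact scoring procedure, and the function names `edit_distance` and `error_rate` are illustrative only.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two token sequences.

    Counts the minimum number of substitutions, insertions, and
    deletions needed to turn `ref` into `hyp`.
    """
    prev = list(range(len(hyp) + 1))  # distances for the empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # turning i reference tokens into an empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1]


def error_rate(ref_tokens, hyp_tokens):
    """Edit distance normalized by reference length (assumed metric)."""
    if not ref_tokens:
        raise ValueError("reference must not be empty")
    return edit_distance(ref_tokens, hyp_tokens) / len(ref_tokens)


# Tokens can be words or, as is common for Chinese, single characters.
reference = "the cat sat on the mat".split()
hypothesis = "the cat sat on a mat".split()
rate = error_rate(reference, hypothesis)  # one substitution over six words
```

Under this definition, an error rate of 2.16% would correspond to roughly two edit errors per hundred reference tokens.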
Himalaya's ASR technology is a key voice technology developed by the Himalaya Intelligent Voice Laboratory. It transcribes audio on the Himalaya platform that has no accompanying manuscript and outputs the corresponding text, so that listeners can better understand the content. As speech recognition features see heavier use, meticulous optimization of details has become the key to a product's success. During development, Himalaya built its own end-to-end speech recognition framework on top of WeNet and deeply optimized the entire pipeline, including data loading, model structure, training methods, hot-word enhancement, and deployment. The team continually tried approaches from new papers and integrated them into the framework, effectively reducing the error rate to an industry-leading level.
Himalaya's ASR technology is now widely used in the AI manuscript function of the Himalaya App, where it recognizes audio that has no manuscript and generates one for it. For audio that already has an original manuscript, the AI manuscript function applies an alignment technique for ultra-long audio and text: it timestamps the audio against the manuscript and highlights the corresponding text in sync with playback, making it more convenient for users to listen and read at the same time.
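The synchronized highlighting described above can be illustrated with a small lookup: once alignment has produced a start and end time for each piece of text, the player only needs to find the segment that covers the current playback position. This is a minimal sketch assuming the alignment output is a sorted list of non-overlapping `(start, end, text)` tuples. That data layout and the name `active_segment` are assumptions for illustration, not Himalaya's actual implementation.

```python
from bisect import bisect_right


def active_segment(segments, position):
    """Return the index of the segment to highlight, or None.

    `segments` is a list of (start_seconds, end_seconds, text) tuples,
    sorted by start time and non-overlapping. `position` is the current
    playback time in seconds.
    """
    # A real player would cache this list instead of rebuilding it per call.
    starts = [start for start, _, _ in segments]
    # Last segment whose start time is <= position.
    i = bisect_right(starts, position) - 1
    if i < 0:
        return None  # playback is before the first segment
    _, end, _ = segments[i]
    # A gap between segments (e.g. music or silence) highlights nothing.
    return i if position < end else None


# Hypothetical alignment output for a short clip.
segments = [
    (0.0, 2.5, "Welcome to the show."),
    (2.5, 5.0, "Today we talk about speech recognition."),
    (6.0, 8.0, "Let's get started."),
]
current = active_segment(segments, 3.2)
```

Binary search keeps each lookup logarithmic in the number of segments, which matters for very long audio where the manuscript may contain many thousands of aligned segments.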
Himalaya will soon launch a new version of the AI manuscript function to comprehensively improve the user experience. Please stay tuned.
Himalaya has been working in the field of AI voice technology for many years and has established a dedicated core department, the Himalaya Intelligent Voice Laboratory, which has long focused on research and development in speech synthesis, speech recognition, speech signal processing, encoding and decoding, and intelligent sound effects. In addition to ASR, Himalaya's TTS (speech synthesis) technology is also at the forefront of the industry and is widely used to produce content such as storytelling, news, and novels, helping Himalaya explore the possibilities of AIGC beyond its existing "UGC + PGC + PUGC" content ecosystem. Himalaya's papers on its self-developed cross-lingual speech synthesis technology, and on speaker diarization technology developed jointly with the University of Science and Technology of China, have twice been accepted by ICASSP (International Conference on Acoustics, Speech, and Signal Processing), a top international audio conference, demonstrating Himalaya's strength in the field of voice technology.
Going forward, Himalaya will continue to use technology to empower culture, improve the content consumption experience, and enrich the content ecosystem, enhancing sound with technology and serving life with sound.