Of the approximately 7,000 known languages ​​in the world, nearly half of them, or four out of ten are word-of-mouthed and do not contain written content.

2025/06/2612:39:36 hotcomm 1998

Of all the approximately 7,000 known languages ​​in the world, nearly half of them are spoken by word of mouth and do not contain written content. These textless languages ​​present a unique problem for modern machine learning translation systems, as they often require converting spoken language into written text and reverting text to voice before translating to a new language, but Meta has addressed this problem with its latest open source language AI advances.

Of the approximately 7,000 known languages ​​in the world, nearly half of them, or four out of ten are word-of-mouthed and do not contain written content. - DayDayNews

As part of Meta's Universal Voice Translator (UST) project, which is working to develop real-time voice-to-voice translation so that residents of the metaverse can interact more easily (pronounced as: sexual harassment between each other). As part of this project, Meta researchers studied Fujian Minnan dialect, a non-text language scattered across Asia and one of the mainstream languages ​​in Taiwan.

Machine learning translation systems usually require a large number of tagged language examples, including written and spoken languages ​​for training - which is exactly what non-text languages ​​like Minnan dialect don't have. To solve this problem, "Meta uses voice-to-unit translation (S2UT) to convert the input speech directly into a sequence of acoustic units that Meta has pioneered," CEO Mark Zuckerberg explained in a blog post Wednesday. "We then generate waveforms from these units. In addition, UnitY is adopted as a dual-pass decoding mechanism, the first-pass decoder generates text in the relevant language ( Mandarin ) and the second-pass decoder creates the unit. "

" We use Mandarin as the intermediate language to create pseudo-tags, we first translate English (or Minnan dialect as mentioned above) into Mandarin text, and then we translate it into Minnan dialect (or English) and add it to the training data. "Currently, the system allows people who speak Fujian dialect to talk to English speakers, albeit bluntly, and the model can only translate one full sentence at a time. But Zuckerberg believes that the technology can eventually be applied to more languages ​​and will be improved to the point where it provides real-time translation.

Zuckerberg announced that in addition to Meta's open source model and training data from this project, the company will also release the first voice translation benchmark system based on the Minnan dialect discourse corpus, as well as "voice matrix, a voice translation library mined with Meta's innovative data mining technology LASER." This system will enable researchers to create their own voice-to-speech translation (S2ST) system.

hotcomm Category Latest News

Jiangxi Radio and Television Station and Chongqing Radio and Television Group focus on the main line and make precise planning, especially on the two platforms of satellite video channel and mobile, innovatively launch diverse programs, and do special editing and broadcasting wit - DayDayNews

Jiangxi Radio and Television Station and Chongqing Radio and Television Group focus on the main line and make precise planning, especially on the two platforms of satellite video channel and mobile, innovatively launch diverse programs, and do special editing and broadcasting wit

[Welcome the top 20] Jiangxi TV and Chongqing TV: Give full play to the advantages of satellite TV and mobile terminals, strengthen planning, create new products, strengthen choreography, and strong voice