Google has announced an ambitious new project to develop a single AI language model that supports the world's "1,000 most commonly used languages." As a first step toward that goal, the company unveiled an AI model trained on more than 400 languages, which it describes as "the largest language coverage seen in a speech model today."

Language and artificial intelligence have arguably always been at the core of Google's products, but recent advances in machine learning, especially the development of powerful, multipurpose "large language models," or LLMs, have brought renewed attention to these fields.
Google has begun integrating these models into products such as Google Search. But language models have many shortcomings, including a tendency to reproduce harmful social biases such as racism and xenophobia, and an inability to parse language with human sensitivity. Google itself notoriously fired its own researchers after they published papers outlining exactly these problems.
Still, these models are capable of many tasks, from language generation (such as OpenAI's GPT-3) to translation (see Meta's "No Language Left Behind" work). Google's "1,000 Languages Initiative" is not focused on any particular functionality, but instead aims to create a single system with a huge breadth of knowledge across the world's languages.
Zoubin Ghahramani, vice president of artificial intelligence research at Google, said the company believes that creating a model of this scale will make it easier to bring various AI capabilities to languages that are poorly represented in online spaces and in AI training datasets (also known as "low-resource languages").
"Languages are like organisms, they evolve from each other, and they have some similarities. By having a single model touch and train many different languages, we get better performance on low-resource languages," Ghahramani said. "Our approach to 1000 languages is not by building 1000 different models. Languages are like organisms, they evolved mutually, and they have some similarities. And, when we incorporate data from a new language into our 1000 language model and gain the ability to convert [what it learned] from a high-resource language to a low-resource language, we can find some pretty amazing progress in what we call zero-point learning."
Past research has demonstrated the effectiveness of this approach, and the scale of Google's planned model could deliver gains beyond that earlier work. Such ambitious projects have become typical of tech companies' drive to dominate AI research, and they draw on these companies' unique advantages in access to vast amounts of computing power and training data. A comparable project is Facebook parent company Meta's attempt to build a "universal speech translator."
Google says it will fund the collection of data for low-resource languages, including audio recordings and written texts, to support work on the 1,000-language model.
The company says it has no direct plans for where the model's functionality will be applied, only that it expects the model to have a range of uses across Google's products, from Google Translate to YouTube captions and more.
"The same language model can turn robot commands into code; it can solve mathematical problems; it can also be translated. One of the really interesting things about large language models and general language research is that they can do a lot of different tasks," Ghahramani said. "What's really interesting about language models is that they are becoming a repository of a lot of knowledge, and by probing them in different ways, you can get different useful features."
Google announced the 1,000-language model at a showcase of new AI products. The company also shared new research on text-to-video models, a prototype AI writing assistant called Wordcraft, and an update to its AI Test Kitchen app, which gives users limited access to AI models under development, such as its text-to-image model Imagen.