Research in the field of machine learning and AI, now a key technology in virtually every industry and company, is far too voluminous for anyone to read it all. This column, Perceptron, aims to collect some of the most relevant recent discoveries and papers – particularly in, but not limited to, artificial intelligence – and explain why they matter.
In this batch of recent research, Meta opened up a language system it claims is the first capable of translating between 200 different languages with “state-of-the-art” results. Not to be outdone, Google detailed a machine learning model, Minerva, that can solve quantitative reasoning problems, including math and science questions. And Microsoft released a language model, Godel, for generating “realistic” conversations along the lines of Google’s widely hyped Lamda. And then we have some new text-to-image generators with a twist.
Meta’s new model, NLLB-200, is part of the company’s No Language Left Behind initiative to develop machine translation capabilities for most of the world’s languages. Trained to understand languages such as Kamba (spoken by the Bantu ethnic group) and Lao (the official language of Laos), as well as 55 African languages poorly or not at all supported by previous translation systems, NLLB-200 will be used to translate languages on the Facebook News Feed and Instagram, in addition to powering the Wikimedia Foundation’s Content Translation Tool, Meta recently announced.
AI translation has the potential to dramatically scale – and already has scaled – the number of languages that can be translated without human expertise. But as some researchers have noted, AI-generated translations can contain errors ranging from incorrect terminology to omissions and outright mistranslations, because the systems are trained largely on data from the internet, not all of which is of high quality. For example, Google Translate once assumed that doctors were male while nurses were female, while Bing’s translator rendered phrases like “the table is sweet” with the feminine “die Tabelle” in German (which refers to a table of numbers).
For NLLB-200, Meta said it had “completely revised” its data cleansing pipeline with “major filtering steps” and toxicity filtering lists for the full set of 200 languages. How well this works in practice remains to be seen, but – as the Meta researchers behind NLLB-200 acknowledge in an academic paper outlining their methods – no system is completely free from bias.
Godel, similarly, is a language model trained on a large amount of text from the web. However, unlike NLLB-200, Godel was designed to handle “open” dialogue – conversations on a range of different topics.
Godel can answer a question about a restaurant or have a dialogue about a particular topic, like the history of a neighborhood or a recent sports game. Helpfully, and like Google Lamda, the system can draw on content from around the web that was not part of the training dataset, including restaurant reviews, Wikipedia articles, and other content on public websites.
But Godel encounters the same pitfalls as NLLB-200. In a paper, the team responsible for its creation notes that it “can generate harmful responses” due to “forms of social bias and other toxicities” in the data used to train it. Eliminating, or even mitigating, these biases remains an unresolved challenge in the field of AI – a challenge that may never be fully resolved.
Google’s Minerva model is less potentially problematic. As the team behind it describes in a blog post, the system learned from a 118GB dataset of scientific papers and web pages containing mathematical expressions to solve quantitative reasoning problems without using external tools like a calculator. Minerva can generate solutions that include numerical calculations and “symbolic manipulation,” achieving state-of-the-art performance on popular STEM benchmarks.
Minerva is not the first model developed to solve this type of problem. To name a few, Alphabet’s DeepMind has demonstrated several algorithms that can help mathematicians with complex and abstract tasks, and OpenAI has experimented with a system trained to solve math problems at the elementary school level. But Minerva incorporates recent techniques to better solve mathematical questions, the team says, including an approach that involves “prompting” the model with several step-by-step solutions to existing questions before presenting it with a new question.
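The “prompting” approach the team describes can be sketched in a few lines: several worked, step-by-step solutions are prepended to the prompt so the model imitates that reasoning format on the new question. The example problems and the `build_prompt` helper below are illustrative stand-ins, not Google’s actual prompt set:

```python
# A minimal sketch of few-shot, step-by-step prompting. The worked
# examples here are invented for illustration.
EXAMPLES = [
    ("What is 12 * 4?",
     "12 * 4 = 12 * 2 * 2 = 24 * 2 = 48. The answer is 48."),
    ("If x + 3 = 10, what is x?",
     "Subtract 3 from both sides: x = 10 - 3 = 7. The answer is 7."),
]

def build_prompt(question: str) -> str:
    """Assemble a few-shot prompt: worked examples first, then the new question."""
    parts = [f"Q: {q}\nA: {solution}" for q, solution in EXAMPLES]
    parts.append(f"Q: {question}\nA:")  # the model continues from here
    return "\n\n".join(parts)

print(build_prompt("What is 7 * 6?"))
```

The string this produces would be fed to the model, which then tends to continue with the same “show your work, then state the answer” pattern.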
Minerva still makes its share of mistakes, and sometimes arrives at a correct final answer through flawed reasoning. Still, the team hopes it will serve as the basis for models that “help push the frontiers of science and education.”
The question of what AI systems actually “know” is more philosophical than technical, but how they organize that knowledge is a fair and valid question. For example, an object recognition system may show that it “understands” that domestic cats and tigers are similar in some way by deliberately allowing the concepts to overlap in how it identifies them – or perhaps it doesn’t really understand them, and the two types of creatures are completely unrelated as far as it’s concerned.
Researchers at UCLA wanted to see if language models “understand” words in this sense, and developed a method called “semantic projection” that suggests that yes, they do. While you can’t simply ask the model to explain how and why a whale is different from a fish, you can see how closely it associates those words with other words, like mammal, large, scales, and so on. If whale associates strongly with mammal and large but not with scales, you know it has a decent idea of what it’s talking about.
As a simple example, they found that animal coincided with the concepts of size, gender, danger, and wetness (the selection was a little weird), while state coincided with weather, wealth, and partisanship. Animals are nonpartisan and states are genderless, so it all tracks.
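A rough sketch of the projection idea, with tiny made-up vectors standing in for a real model’s word embeddings (every number below is invented for illustration): a word is scored along a feature axis by projecting its embedding onto the direction running between two pole words.

```python
import numpy as np

# Toy 4-d "embeddings" – invented values, not weights from any real model.
emb = {
    "small":  np.array([ 1.0, 0.0, 0.0, 0.0]),
    "large":  np.array([-1.0, 0.1, 0.0, 0.0]),
    "whale":  np.array([-0.8, 0.2, 0.9, 0.1]),
    "minnow": np.array([ 0.9, 0.1, 0.8, 0.0]),
}

def project(word: str, pole_a: str, pole_b: str) -> float:
    """Score `word` along the semantic axis running from pole_a to pole_b."""
    axis = emb[pole_b] - emb[pole_a]
    axis = axis / np.linalg.norm(axis)      # unit direction of the axis
    return float(np.dot(emb[word], axis))   # signed position along it

# A whale should land nearer the "large" end of the axis than a minnow does.
print(project("whale", "small", "large") > project("minnow", "small", "large"))
```

With real embeddings, the same one-line projection is what lets you read off whether the model places whale closer to large, mammal, or scales.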
There’s no surer test today of whether a model understands certain words than asking it to draw them – and text-to-image models just keep getting better. Google’s “Pathways Autoregressive Text-to-Image” model, or Parti, looks to be one of the best yet, but it’s hard to compare it to the competition (DALL-E et al.) without access, which few of these models offer. You can read about the Parti approach here, at any rate.
An interesting aspect of Google’s write-up shows how the model works with an increasing number of parameters. See how the image improves as the numbers increase:
Does this mean that the best models will all have tens of billions of parameters, meaning they’ll take ages to train and run only on supercomputers? For now, sure – it’s kind of a brute-force approach to improving things, but the “tick-tock” of AI means the next step isn’t just to make models bigger and better, but to make them smaller and equivalent. We’ll see who pulls it off.
Not one to be left out, Meta also showcased a generative AI model this week, though it claims to give artists who use it more agency. Having played around with these generators myself a lot, part of the fun is seeing what they come up with, but they often come up with nonsensical layouts or don’t “get” the prompt. Meta’s Make-A-Scene aims to solve this problem.
It’s not an entirely original idea – you paint in a basic silhouette of what you’re talking about, and that serves as a base for generating an image on top of. We saw something like this in 2020 with Google’s Nightmare Generator. This is a similar concept, but scaled up to allow it to create realistic images from text prompts, using the sketch as a base but with plenty of room for interpretation. It could be useful for artists who have a general idea of what they’re thinking of but want to include the model’s boundless, weird creativity.
Like most of these systems, Make-A-Scene is not actually available for public use, because like the others, it is quite computationally intensive. Don’t worry, we’ll have some decent versions of these things at home soon.