Why Meta’s latest large language model only survived three days online

On 15 November Meta unveiled a new large language model called Galactica, intended to assist scientists. But instead of landing with the big bang Meta hoped for, Galactica has died with a whimper after three days of intense criticism. Yesterday the company took down the public demo that it had encouraged everyone to try out.

Meta’s misstep, and its hubris, shows once again that big tech has a blind spot about the severe limitations of large language models. A large body of research highlights the flaws of this technology, including its tendency to reproduce prejudice and assert falsehoods as facts.

However, Meta and other companies working on large language models, including Google, have failed to take that research seriously.

Galactica is a large language model for science, trained on 48 million scientific articles, websites, textbooks, lecture notes, and encyclopedias. Meta promoted its model as a shortcut for researchers and students. In Meta’s words, Galactica “can summarize academic papers, solve math problems, generate Wiki articles, write scientific code, annotate molecules and proteins, and more.”

But the shiny veneer wore through fast. Like all language models, Galactica is a mindless bot that cannot tell fact from fiction. Within hours, scientists were sharing Galactica’s biased and incorrect results on social media. 

“I am both astounded and unsurprised by this new effort,” says Chirag Shah at the University of Washington, who studies search technologies. “When it comes to demoing these things, they look so fantastic, magical, and intelligent. But people still don’t seem to grasp that in principle such things can’t work the way we hype them up to.”

Asked for a statement on why it had removed the demo, Meta pointed MIT Technology Review to a tweet that says: “Thank you everyone for trying the Galactica model demo. We appreciate the feedback we have received so far from the community, and have paused the demo for now. Our models are available for researchers who want to learn more about the work and reproduce results in the paper.”

A fundamental problem with Galactica is that it is not able to distinguish truth from falsehood, a basic requirement for a language model designed to generate scientific text. People found that it made up fake papers (sometimes attributing them to real people), and generated wiki articles about the history of bears in space as readily as ones about protein complexes and the speed of light. It’s easy to spot fiction when it involves alien bears, but not when it is about a subject that a user does not know about.

Many scientists pushed back hard. Michael Black, director at the Max Planck Institute for Intelligent Systems in Germany, who works on deep learning, tweeted: “In all cases, it was wrong or biased but sounded right and authoritative. I think it’s dangerous.”

Even more positive opinions came with clear caveats: “Excited to see where this is headed!” tweeted Miles Cranmer, an astrophysicist at Princeton. “You should never keep the output verbatim or trust it. Basically, treat it like an advanced Google search of (sketchy) secondary sources!”

Galactica also has problematic gaps in what it can handle. Asked to generate text on certain topics, such as “racism” and “AIDS”, the model responded with: “Sorry, your query didn’t pass our content filters. Try again and keep in mind this is a scientific language model.”

The Meta team behind Galactica argue that language models are better than search engines: “We believe this will be the next interface for how humans access scientific knowledge,” the researchers write.

This is because language models can “potentially store, combine and reason about” information. But that “potentially” is crucial. It’s a coded admission that language models cannot yet do all these things. And they may never be able to.

“Language models are not really knowledgeable beyond their ability to capture patterns of strings of words and spit them out in a probabilistic manner,” says Shah. “It gives a false sense of intelligence.”

Gary Marcus, a cognitive scientist at New York University and vocal critic of deep learning, gave his view in a Substack post called “A Few Words About Bullshit”: the ability of large language models to mimic human-written text is nothing more than “a superlative feat of statistics.”

And yet Meta is not the only company championing the idea that language models could replace search engines. For the last couple of years, Google has been showing off its language model PaLM as a way to look up information.

It’s a tantalizing idea, because the ability of language models to mimic human-written text is remarkable. But suggesting that this text will always contain trustworthy information, as Meta appeared to do in its promotion of Galactica, is reckless and irresponsible. It was an unforced error.

And it wasn’t just the fault of Meta’s marketing team. Yann LeCun, Turing award winner and Meta’s chief scientist, defended Galactica to the end. “Type a text and Galactica will generate a paper with relevant references, formulas, and everything,” LeCun tweeted on 15 November. Three days later, he tweeted: “Galactica demo is off line for now. It’s no longer possible to have some fun by casually misusing it. Happy?”

Galactica is not quite Meta’s Tay moment. In 2016, Microsoft launched a chatbot called Tay on Twitter, then shut it down 16 hours later after Twitter users had taught it to be racist, homophobic and more. But Meta’s handling of Galactica smacks of the same naivety.

“Big tech companies keep doing this—and mark my words, they will not stop—because they can,” says Shah. “And they feel like they must, otherwise someone else might. They think that this is the future of information access and knowledge systems, even if nobody asked for that future.”



from MIT Technology Review https://ift.tt/UOoervQ
