**ChatGPT Needs Some Help With Math Assignments**

‘Large language models’ supply grammatically correct answers but struggle with calculations

Josh Zumbrun, WSJ

Feb. 3, 2023 5:30 am ET

The artificial-intelligence chatbot ChatGPT has shaken educators since its November release. New York City public schools have banned it from their networks and school devices, and professors are revamping syllabi to prevent students from using it to complete their homework. The chatbot’s creator, OpenAI, even unveiled a tool to detect text generated by artificial intelligence to prevent abuse from cheaters, spammers and others.

There is, perhaps surprisingly, one subject area that doesn’t seem threatened. It turns out ChatGPT is quite bad at math.

“I’m not hearing math instructors express concern about it,” said Paul von Hippel, a professor at the University of Texas who studies data science and statistics and has written an essay about ChatGPT’s mathematical limitations. “I’m not sure it’s useful for math at all, which feels strange because mathematics was the first-use case for computing devices.”

While the bot gets many basic arithmetic questions correct, it stumbles when those questions are written in natural language. For example, ask ChatGPT “if a banana weighs 0.5 lbs and I have 7 lbs of bananas and nine oranges, how many pieces of fruit do I have?” The bot’s quick reply: “You have 16 pieces of fruit, seven bananas and nine oranges.”
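The arithmetic the question actually calls for is short: divide the total banana weight by the weight of one banana, then add the oranges. A minimal sketch in Python follows; the function and parameter names are illustrative and do not come from the original exchange.

```python
def count_fruit(banana_weight_lbs, total_banana_lbs, orange_count):
    """Return the total number of pieces of fruit.

    The banana count is inferred from total weight divided by the
    weight of a single banana (names here are illustrative only).
    """
    banana_count = total_banana_lbs / banana_weight_lbs
    return banana_count + orange_count


# 7 lbs of bananas at 0.5 lbs each is 14 bananas; plus 9 oranges.
print(count_fruit(0.5, 7, 9))  # 23.0
```

The bot's answer of 16 treats the 7 lbs as a count of bananas, which skips the division step entirely.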

It isn’t hard, and in fact is a little entertaining, to feed the bot questions to which it responds with confident nonsense.

If you ask ChatGPT who is taller, Shaquille O’Neal or Yao Ming, the bot accurately says Yao is 7’6” and O’Neal is 7’1” but then concludes that Shaq is taller. The bot miscalculates the square roots of large numbers. Ask it to show its math, and it often produces detailed formulas that look great but contain errors, such as 2 x 300 = 500.
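Each of these checks takes a line or two of ordinary computation. The sketch below is one way to verify them in Python; the helper name and the sample number passed to the square-root call are assumptions chosen for illustration, not values from the bot's conversations.

```python
import math


def height_in_inches(feet, inches):
    """Convert a feet-and-inches height to total inches."""
    return feet * 12 + inches


yao = height_in_inches(7, 6)   # Yao Ming, 7'6"
shaq = height_in_inches(7, 1)  # Shaquille O'Neal, 7'1"
taller = "Yao Ming" if yao > shaq else "Shaquille O'Neal"
print(taller)  # Yao Ming

# The multiplication the bot got wrong.
print(2 * 300)  # 600

# Exact integer square root of an arbitrary large number
# (the number itself is an illustrative choice).
print(math.isqrt(10**12))  # 1000000
```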

I asked ChatGPT to write five simple algebra problems and then to provide the answers. The AI answered only three of its own problems correctly.

ChatGPT’s struggle with math is inherent in this type of artificial intelligence, known as a large language model. It scans enormous reams of text from across the web and develops a model about what words are likely to follow others in a sentence. It’s a more sophisticated version of autocomplete that, after you type “I want to” on your device, guesses the next words are “dance with somebody,” “know what love is” or “be with you everywhere.”
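The autocomplete idea can be sketched with a toy word-pair counter. This is a deliberately tiny illustration and not how ChatGPT is actually built: the corpus, the function names and the one-word lookback are all assumptions made for the example.

```python
from collections import Counter, defaultdict


def train_bigrams(sentences):
    """Count, for each word, which words follow it in the corpus."""
    follows = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.lower().split()
        for current, nxt in zip(words, words[1:]):
            follows[current][nxt] += 1
    return follows


def predict_next(follows, word):
    """Return the most frequently seen follower of `word`, or None."""
    candidates = follows.get(word.lower())
    if not candidates:
        return None
    return candidates.most_common(1)[0][0]


# A made-up corpus: three song lines and three sums written as words.
corpus = [
    "i want to dance with somebody",
    "i want to know what love is",
    "i want to be with you everywhere",
    "two plus two is four",
    "two plus two is four",
    "two plus three is five",
]
model = train_bigrams(corpus)

print(predict_next(model, "want"))  # to
print(predict_next(model, "is"))    # four
```

The toy model says "four" after "is" because that pairing was most common in its text, whatever sum came before. It has counted words; it has not added anything.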

A Mad Libs-proficient supercomputer might be extremely effective for writing grammatically correct responses to essay prompts, but not for solving a math problem. That is the Achilles’ heel of ChatGPT: It responds in authoritative-sounding language with numbers that are grammatically correct and mathematically wrong.

As Mr. von Hippel wrote, “It acts like an expert, and sometimes it can provide a convincing impersonation of one. But often it is a kind of b.s. artist, mixing truth, error and fabrication in a way that can sound convincing unless you have some expertise yourself.”

In an email, I asked Debarghya Das, a search-engine engineer who has tweeted examples of ChatGPT botching basic math, why it gets some simple questions right but others completely wrong. “Maybe the right analogy is if you ask a room of people who have no idea what math is but have read many hieroglyphics, ‘What comes after 2+2,’ they might say, ‘Usually, we see a 4.’ That’s what ChatGPT is doing.” But, he adds, “math isn’t just a series of hieroglyphics, it’s computation.”

ChatGPT isn’t great for faking your way through a math class, because you recognize the mistakes only if you know the math. If it’s all hieroglyphics to you, the wrong answers seem plausible.

OpenAI Chief Executive Sam Altman said in December on Twitter that “ChatGPT is incredibly limited, but good enough at some things to create a misleading impression of greatness. It’s a mistake to be relying on it for anything important right now.”

When you begin a conversation with ChatGPT, it warns up front, “While we have safeguards in place, the system may occasionally generate incorrect or misleading information.”

Another reason that math instructors are less fussed by this innovation is that they have been here before. The field was upended for the first time decades ago with the general availability of computers and calculators.

Screenshot caption: No, the answer is X=7/3. (Screenshot: The Wall Street Journal)

“Math has had the biggest revolution based on machinery of any mainstream subject I could ever have thought of,” said Conrad Wolfram, the strategic director of Wolfram Research, which developed Mathematica, a technical computing software program, as well as Wolfram Alpha, a website for answering math queries.

Whereas English teachers are only now worrying about computers doing their students’ homework, math teachers have long wrestled with making sure students were actually learning and not just using a calculator. It’s why students have to show their work and take tests on paper.

The broader lesson is that AI, computers and calculators aren’t simply a shortcut. Math tools require math knowledge. A calculator can’t do calculus unless you know what you’re trying to solve. If you don’t know any math, Excel is just a tool for formatting tables with a lot of extra buttons.

“In the real world, since computers came along, have math, science and engineering gotten conceptually simpler? No, completely the opposite. We’re asking harder and harder questions, going up a level,” Mr. Wolfram said.

Eventually, artificial intelligence will probably get to the point where its mathematics answers are not only confident but correct. A pure large language model might not be up for the job, but the technology will improve. The next generation of AI could combine the language skills of ChatGPT with the math skills of Wolfram Alpha.

In general, however, AI, like calculators and computers, will likely ultimately be most useful for those who already know a field well: They know the questions to ask, how to identify the shortcomings and what to do with the answer. A tool, in other words, for those who know the most math, not the least.

Write to Josh Zumbrun at josh.zumbrun@wsj.com
