Nvidia to launch the Spritzler M1 chip. Capable of generating over 1,000 offensive comments per second, the 4th generation chip is considered the future of modern journalism.
Its more than 3 million terabytes of RM memory hold over 40 years of Spritzler Report content.
How a Shifting AI Chip Market Will Shape Nvidia’s Future
Customers’ needs are evolving as the artificial-intelligence industry transitions, presenting an opportunity for Nvidia and rivals
By Asa Fitch, WSJ
Feb. 25, 2024 5:30 am ET
The AI chip battle that Nvidia has dominated is already shifting to a new front—one that will be much larger but also more competitive.
Nvidia built itself into a $2 trillion company by supplying the chips essential for the incredibly complicated work of training artificial-intelligence models. As the industry rapidly evolves, the bigger opportunity will be selling chips that make those models run after they are trained, churning out text and images for the fast-growing population of companies and people actually using generative AI tools.
Right now, that shift is adding to Nvidia’s blockbuster sales. Chief Financial Officer Colette Kress said this past week that more than 40% of Nvidia’s data center business in the past year—when revenue exceeded $47 billion—was for deployment of AI systems and not training. That percentage was the first significant indication that the shift is under way.
Kress’s comments allayed some concerns that the shift toward chips for deploying AI systems—those that do what is called “inference” work—threatens Nvidia’s position because that work can be done with less-powerful and less-expensive chips than those that have made Nvidia the leader of the AI boom.
“There is a perception that Nvidia’s share will be lower in inferencing vs. training,” Ben Reitzes, an analyst at Melius Research, said in a note to clients. “This revelation helps shed light on its ability to benefit from the coming inferencing explosion.”
Many rivals believe they have a better shot in the AI market as chips for inferencing become more important.
Intel, which makes central processing units that go into data centers, believes its chips will be increasingly appealing as customers focus on driving down the cost of operating AI models. The kinds of chips Intel specializes in are already widely used in inferencing, and it isn’t as critical to have Nvidia’s cutting-edge and more expensive H100 AI chips when doing that task.
“The economics of inferencing are, I’m not going to stand up $40,000 H100 environments that suck too much power and require new management and security models and new IT infrastructure,” Intel Chief Executive Pat Gelsinger said in an interview in December. “If I can run those models on standard [Intel chips], it’s a no-brainer.”
Vivek Arya, an analyst at Bank of America, said the shift toward inference was perhaps the most significant news to emerge Wednesday from Nvidia’s quarterly earnings report, which beat Wall Street forecasts and led its stock to climb 8.5% for the week, pushing the company to a roughly $2 trillion valuation.
Arya said inference would rise as the focus shifts to generating revenue from AI models following a surge of investment in training them. That could be more competitive compared with AI training, where Nvidia dominates.
The rate at which inference is growing may be faster than earlier expected. Early this year, UBS analysts estimated 90% of chip demand stemmed from training, and that inference would drive just 20% of the market by next year. Nvidia deriving about 40% of its data center revenue from inference was “a bigger number than we would expect,” the analysts said in a note.
Indeed, Nvidia’s financial results Wednesday suggest its market share in AI chips of more than 80% isn’t yet being challenged in a serious way. Nvidia’s chips used for training AI systems are expected to remain in high demand for the foreseeable future.
In training AI systems, companies run vast oceans of data through their models to teach them to predict language in a way that enables human-sounding expression. The work requires enormous computing ability that is well suited to Nvidia’s graphics processing units, or GPUs.
Inference work is when those models are asked to process new bits of information and respond—a lighter lift.
In addition to Nvidia’s established competitors like Intel and Advanced Micro Devices, a number of AI-chip startups may also gain steam as inference takes center stage.
“We’re seeing our inference use case exploding,” said Rodrigo Liang, chief executive of SambaNova, a startup that makes a combination of AI chips and software that can do both inferencing and training. “People are starting to realize that 80%-plus of the cost is going to be in inferencing, and I need to look for alternate solutions,” he said.
Groq, a startup founded by former Google AI-chip engineer Jonathan Ross, also has seen a surge of interest in recent months after a demo on the company’s home page showed how quickly its inference chips could generate responses from a large language model. The company is on track to deploy 42,000 of its chips this year and one million next year, but is exploring increasing those totals to 220,000 this year and 1.5 million next year, Ross said.
One factor driving the shift, he said, was that some of the most advanced AI systems were being tuned to produce better responses without retraining them, pushing more of the computational work into inference. And Groq’s specialist chips, he said, were significantly quicker and cheaper to run than Nvidia’s or other chip companies’ offerings.
“For inference, what you can deploy depends on cost,” he said. “There are a bunch of models that would get trained at Google that worked but about 80% of them didn’t get deployed because they were too expensive to put into production.”
Big tech companies—including Meta, Microsoft, Alphabet’s Google and Amazon.com—have been working to develop inference chips in-house, recognizing the coming shift and the benefits of being able to do inference more cheaply.
Amazon, for example, has had inference chips since 2018, and inference represents 40% of computing costs for its Alexa smart assistant, Swami Sivasubramanian, a vice president of data and machine learning at the company’s cloud-computing arm, said last year.
Nvidia, for its part, is seeking to stay on top as the transition toward inference proceeds. A coming chip posted industry-leading results last year in a key AI inference benchmark, extending the company’s yearslong dominance in the competition.
In December, after AMD unveiled new AI chips that it said were better than Nvidia’s at inference, Nvidia fired back in a blog post disputing the claims. AMD didn’t use optimized software in making its performance claims, Nvidia said, and if it did, Nvidia’s chips would be twice as fast.
Write to Asa Fitch at asa.fitch@wsj.com