What happens when someone makes an integrated GPU that is faster than $200 discrete GPUs? If the rumors about Intel's upcoming Meteor Lake CPU are accurate, then they might be on the verge of doing exactly that. What's more, the way that they're doing it is clever and not what I anticipated.
There are three different effects here to understand. The first is that integrated GPUs have long been starved for memory bandwidth. Two channels of DDR3, then DDR4, then DDR5, shared with the CPU, just don't provide much bandwidth for a GPU. For comparison, Intel's latest consumer Raptor Lake CPUs top out at 89.6 GB/sec of bandwidth (with DDR5-5600), while the previous-generation GeForce RTX 3090 Ti goes up to 1008 GB/sec. You could make a big integrated GPU, but there was no point if it was still going to be slow for lack of memory bandwidth.
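Those bandwidth figures are just transfer rate times bus width. A quick sketch to check them; the DDR5-5600 and GDDR6X numbers come from the public specs, and the function name is mine:

    # Peak DRAM bandwidth = transfers/sec * bus width in bytes * channels.
    def peak_bandwidth_gb_s(mt_per_sec, bus_width_bits, channels):
        return mt_per_sec * 1e6 * (bus_width_bits / 8) * channels / 1e9

    # Raptor Lake: two 64-bit channels of DDR5-5600.
    print(peak_bandwidth_gb_s(5600, 64, 2))    # 89.6
    # GeForce RTX 3090 Ti: 21 Gbps GDDR6X on a 384-bit bus.
    print(peak_bandwidth_gb_s(21000, 384, 1))  # 1008.0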
The second is that GPUs can greatly reduce their memory bandwidth requirements by having a large L2 or L3 cache. Nvidia's top-of-the-line GeForce RTX 3090 Ti had a 6 MB L2 cache and no L3. In the same generation, AMD's bottom-of-the-line Radeon RX 6400 had a 16 MB L3 cache, and their top of the line went as high as 128 MB. If half of the memory accesses that would have gone to physical DRAM on a GeForce part got served by the L3 cache on a Radeon, then the latter only needs half as much memory bandwidth to give the same performance. Nvidia has copied that approach with the GeForce RTX 4000 series, but it's not really a new idea, as the Xbox One did something similar with a 32 MB ESRAM way back in 2013.
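The halving claim is just miss-rate arithmetic: every access served by the cache never touches DRAM. A minimal sketch, with the 50% hit rate as an illustrative assumption rather than a measured figure:

    # DRAM bandwidth a GPU needs = the traffic it generates * cache miss rate.
    def dram_bandwidth_needed(raw_traffic_gb_s, cache_hit_rate):
        return raw_traffic_gb_s * (1.0 - cache_hit_rate)

    # 160 GB/s of memory traffic with no cache becomes 80 GB/s of DRAM
    # traffic if half of the accesses hit in a large L3.
    print(dram_bandwidth_needed(160, 0.5))  # 80.0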
The third effect is one that people mostly haven't thought about in the consumer space, but rumor has it that Meteor Lake will change that. If you put chips on top of an interposer to connect them, then the interposer contains an awful lot of silicon, but only a small fraction of it is needed for the wires that connect the chips. You usually build the large interposer on a very old process node so that it is cheap and yields well. But most of it is just unused silicon, and it would be nice to actually use it.
If the rumors about Intel's upcoming "Adamantine" L4 cache are correct, then Intel has found a great use for it. Add a bunch of L4 cache in the space on the interposer that would otherwise go unused. And while you're at it, use a process node that isn't quite so old, so that the interposer can hold quite a lot of cache. As Intel didn't start producing chips for other companies until recently, they might have a lot of free capacity on their 14 nm node, and surely have quite a lot on 22 nm if they haven't already shut down those fabs (or perhaps rather, repurposed them for newer nodes).
Meanwhile, to be an effective last-level cache for a GPU, the cache doesn't need to be low latency. Even latency somewhat higher than off-chip DDR4 or DDR5 DRAM would be acceptable. What it does need is large capacity and high bandwidth. That's a perfect fit for what you can get from a large cache on an interposer.
Rumors are that the cache could be quite large. Rather than leaving much of the interposer unused, you might as well fill whatever would have been empty with more cache. Really, though, 64 MB would be plenty to cut the memory bandwidth requirements of the integrated GPU in half, and anything beyond that is a nice bonus. Intel is probably planning on using this as a CPU cache, too, to rival AMD's X3D parts with their large L3 cache.
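For a rough sense of scale, here is a back-of-the-envelope estimate of what 64 MB of SRAM would occupy on Intel's 22 nm node. The 0.092 square-micron bit cell is Intel's published figure for that node; the 2x overhead factor for tags, decoders, and wiring is my own guess:

    # Rough die area for 64 MB of SRAM on Intel 22 nm.
    BITCELL_UM2 = 0.092   # Intel's published 22 nm SRAM bit cell size
    OVERHEAD = 2.0        # assumed factor for tags, decoders, and routing

    bits = 64 * 8 * 2**20
    area_mm2 = bits * BITCELL_UM2 * OVERHEAD / 1e6
    print(round(area_mm2))  # ~99 mm^2: small next to an interposer spanning several dies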
But if you can cut your memory bandwidth needs by half, then you can double the size of the integrated GPU and still have ample memory bandwidth. Rumors say that Intel is going to do exactly that. If so, then the top integrated GPU in Meteor Lake could easily beat a Radeon RX 6500 and a GeForce GTX 1650 outright.
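Putting the first two effects together makes the scaling argument concrete: halve the miss traffic and you can double the GPU's raw memory demand without needing any more DRAM bandwidth. A sketch with made-up traffic numbers:

    # As before: DRAM traffic = raw traffic * cache miss rate.
    def dram_bandwidth_needed(raw_traffic_gb_s, cache_hit_rate):
        return raw_traffic_gb_s * (1.0 - cache_hit_rate)

    print(dram_bandwidth_needed(80, 0.0))   # 80.0: today's iGPU, no big cache
    print(dram_bandwidth_needed(160, 0.5))  # 80.0: twice the GPU, half the misses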
One downside to stacking dies like this is that the package can't handle as much heat without physically breaking the connections between dies. AMD has had trouble with this on their X3D parts, even though they've greatly restricted clock speeds and voltages to reduce heat output. Even so, if that means Meteor Lake tops out at 120 W or 150 W or some such instead of 300 W, I'd see that as a good thing rather than a bad thing, even if it costs a few percent of performance.
You might ask, couldn't AMD just do the same thing? Maybe they could, but not at the same price point as Intel. For starters, AMD has generally gone with monolithic dies for big integrated GPUs to save money, even as their higher-end parts are multi-chip modules, and adding more die space to your main die on a cutting-edge process node is expensive. In addition, TSMC, Samsung, and GlobalFoundries all have plenty of customers who don't need the cutting edge filling up their older process nodes, so AMD would have to pay up if they want a big cache die as an interposer. Intel's foundries don't have that competing demand.
There is also the question of whether Intel will bother to bring their largest integrated GPUs to desktops. In the past, they haven't, because it added cost and still wouldn't be very fast. But a $400 Meteor Lake desktop CPU whose integrated GPU beats a $200 discrete GPU could be a compelling part for budget gaming systems. Not needing the discrete GPU could also allow Intel to offer a legitimate gaming desktop in an ultra-small form factor. That would burn a lot more power than their usual NUCs, but they could probably cool a 150 W part just fine in a 6" x 6" x 4" box.
Finally, there is the question of whether this will all work as intended. And this is always a huge question. But we might find out later this year.
Comments
------------
Mixing different process nodes in the same package isn't new, of course. AMD has done that for years with their higher-end Ryzen and EPYC parts, and more recently with the Radeon RX 7000 series GPUs. What is new to PCs is a large cache backing an integrated GPU. That's what should make it possible for a GPU tile built on TSMC 5 nm to be much better than any integrated GPU we've seen so far in PCs.
------------
Meteor Lake is basically the successor to Raptor Lake, and is a consumer desktop/laptop CPU. It's also slated to be Intel's first part built on their upcoming Intel 4 process node. And for good measure, it's their first part to use tiles to mix process nodes in one package, as AMD has been doing for several years.
I suppose that it's conceivable that Meteor Lake could eventually be canceled entirely, but that would be catastrophic for Intel and truly a last resort. The only analogous part I can think of that Intel has ever canceled was Cannon Lake, and even that had a fake launch that Intel would now like everyone to forget about.
It's very plausible that Meteor Lake could be delayed, and not just because a lot of parts get delayed. I'd be very skeptical of it being canceled entirely, though. If it does get completely canceled, then it's time to start asking if Intel is facing bankruptcy.
https://www.newegg.com/intel-rnuc12wski50001-nuc-12-pro/p/N82E16856102367
CPU, GPU, motherboard, case, and power supply for $460. Buy your own memory and SSD, plug in peripherals, and you've got a complete desktop computer, albeit one based on a laptop CPU and with laptop-like performance.
But what if Meteor Lake can deliver roughly the performance of a $150 CPU and a $150 GPU in that form factor and price tag? It might, and that could make a very interesting product for a budget gaming desktop.
https://www.newegg.com/intel-bnuc11pahi50z/p/2SW-000B-003U3
Add your own SSD and memory and it's a complete desktop. My point is that if Meteor Lake is what the rumors say it will be, a NUC built around it could be very interesting as a budget gaming desktop. Or better yet, add an inch to the height to allow a larger cooler and bump the TDP to 95 W to allow higher clock speeds, and a Meteor Lake NUC could be a compelling pick for budget gaming.