I've often posted about the disaster that is Intel's 10 nm process node. But they're not the only foundry struggling to make new process nodes that deliver the gains to which we're accustomed. Obviously, GlobalFoundries has abandoned the leading edge. But it hasn't been smooth sailing for TSMC or Samsung, either.
Let's start with Samsung, as that's the more obvious case. Nvidia went with Samsung for their GeForce Ampere lineup. But while Samsung has a 7 nm process node, that's not what Nvidia is using. Nvidia is using Samsung's "8 nm" node, which is a modified version of their 10 nm process node that went into mass production in 2016. Nvidia is launching products on it four years later.
It's not that Samsung's 7 nm node is brand new. It's been more than two years since Samsung announced that their 7 nm LPP node was available for mass production. It's been nearly 15 months since Samsung launched the Exynos 9825 SoC on it. Normally, you'd think that should mean it's plenty mature enough to use. Nvidia didn't think so, and went with the older node instead.
Nvidia isn't the only customer to shy away from Samsung's 7 nm process node. Much has been made of TSMC having limited capacity on their 7 nm node. That's because everyone and his neighbor's dog wants to use it, precisely because they aren't impressed by Samsung's 7 nm node.
But that doesn't mean that all is well at TSMC, either. Like Samsung, TSMC had a 10 nm process node. Unlike Samsung's, it was a dud and didn't get used much. TSMC made that not matter much by delivering a market-leading 7 nm node, which allowed their 10 nm struggles to be forgotten. And that's surely Intel's hope with their 7 nm node, too, as delayed as it is.
But 7 nm is hardly TSMC's latest process node. They also have a "7+" node that uses EUV. They have a "6 nm" node that is designed to be easy to move to from 7 nm. They have a "5 nm" node already available and being used commercially. For both Zen 3 CPUs and Navi 2X GPUs, AMD said "nope" to all of that and went with the same 7 nm node that their previous generations had used.
So does that mean that Samsung 10/8 nm and TSMC 7 nm are good nodes, and the rest of the sub-14/12 nm crowd is junk? Hardly. Even the "good" nodes aren't all that great. The performance per watt gains for Nvidia's Ampere as compared to Turing are minimal, and much less than the 40% that you'd normally expect from a full node die shrink. Nvidia has been able to increase absolute performance considerably, but it has come at the expense of blowing out the power budget.
On the AMD side, if AMD's claims are to be believed, Navi 2X on 7 nm offers around double the energy efficiency of Vega on 14 nm. That's about what you'd hope for from that shrink. At least if it were all due to the die shrink, which it isn't. Vega was rather inefficient on 14 nm, and a cleaner comparison is of the Vega 10 die on 14 nm to the Vega 20 die on 7 nm. AMD was able to increase performance by maybe 1/4 in the same power envelope. From two full node die shrinks, you'd normally expect to about double your performance even without much in the way of architectural improvements.
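If you want to see the arithmetic behind those expectations, here's a quick sketch. The 1.4 per full node and the roughly 1/4 Vega gain are the figures from above; everything else is just multiplication:

```python
# Rule of thumb from above: ~40% better performance per watt per full node shrink.
# Going from 14 nm to 7 nm is two full node shrinks (14 -> 10 -> 7).
per_node_gain = 1.4
expected = per_node_gain ** 2          # ~1.96x, i.e. roughly double
observed = 1.25                        # Vega 10 (14 nm) -> Vega 20 (7 nm), "maybe 1/4"

print(f"Expected from two shrinks: ~{expected:.2f}x")
print(f"Observed (Vega 10 -> Vega 20): ~{observed:.2f}x")
```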
So does that mean all is lost and Moore's Law is now dead? Not necessarily. For one, AMD's Navi 2X demonstrates a possible way forward, at least for a while. AMD's 128 MB "infinity cache" relies heavily on the die shrinks to make it possible. Assuming it's standard 6T SRAM, that means using over 6.4 billion transistors just for that one cache. For comparison, the GeForce GTX 1080 GPU chip (GP104) used about 7.2 billion transistors for the entire chip.
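That transistor figure is easy to verify, assuming plain 6T SRAM cells and treating 128 MB as 128 MiB (tags and other overhead would push the real number a bit higher):

```python
# A 128 MB cache built from standard 6T SRAM cells, ignoring tag arrays
# and other overhead (so the real number is somewhat higher).
cache_bytes = 128 * 1024 * 1024        # treating 128 MB as 128 MiB
bits = cache_bytes * 8
transistors = bits * 6                 # 6 transistors per SRAM bit cell

print(f"{transistors / 1e9:.2f} billion transistors")   # ~6.44 billion
# For comparison, all of GP104 (GTX 1080) is about 7.2 billion transistors.
```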
Die shrinks enable much higher transistor counts in the same area, and caches don't use very much power, so they can still enable much larger caches without burning ridiculous amounts of power. But there's only so much gain that you can get out of large caches. If Navi 3X or whatever AMD calls it has a 256 MB "infinity cache", will that really be much better than the 128 MB cache in Navi 2X?
Comments
For roughly the last two decades, they've used argon fluoride excimer lasers for this. You may have learned in chemistry that noble gases such as argon normally don't participate in chemical reactions. Under sufficiently extreme conditions (in practice, an electrical discharge), they do. So they make a bunch of argon fluoride, which will really try hard to split back into argon and fluorine if you let it. Then they let it do so, and that reaction releases photons at 193 nm, which is in the deep ultraviolet range, all of exactly the same wavelength. Of course, it takes a lot of energy to make argon react with fluorine, so a 300 W laser might need something like 100 kW of input power.
The problem is that 193 nm photons can carve out 193 nm holes. That's not so great if you need to cut out 7 nm holes for your 7 nm process node. Well, they don't actually cut 7 nm holes, but they do need features that are around 50 nm or so. That's a lot smaller than 193 nm, so they use multiple patterning. If you cut two 193 nm holes whose centers are 100 nm apart, then the part where they overlap is smaller than 193 nm across and has a deeper hole. Double patterning like that isn't enough anymore, as they've moved to triple and quadruple and maybe sextuple patterning, which gets expensive.
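Taking that simplified overlapping-holes picture at face value (real multi-patterning flows are more involved than this), the arithmetic from the example above works out like so:

```python
# The simplified picture above: two 193 nm holes whose centers are 100 nm apart.
# The region where they overlap is narrower than either hole on its own.
hole_diameter = 193.0     # nm
center_spacing = 100.0    # nm

# Width of the overlap, measured along the line joining the two centers:
overlap_width = hole_diameter - center_spacing   # 96.5 + 96.5 - 100
print(f"Overlap width: {overlap_width:.0f} nm")  # 93 nm, well under 193 nm
```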
The real solution is to stop using 193 nm photons and instead use something with a shorter wavelength. EUV lithography with a 13.5 nm wavelength is the solution here, and the industry was supposed to move to it about a decade ago. But that turns out to be hard, for a lot of reasons. For one thing, generating light of sufficient power at that wavelength is hard to begin with. The photons are almost immediately absorbed by air, so you have to do it in a vacuum. And then they're such high energy that they'll immediately rip an electron off of any electrically neutral molecule known to man, and most of the ions that have already lost an electron too, which can tear up the masks that you were hoping to reuse. And so it got delayed and delayed and delayed.
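As a back-of-the-envelope check on why 13.5 nm photons are so much nastier than 193 nm ones, here's the photon energy at each wavelength (E = hc/λ, with hc ≈ 1240 eV·nm; typical molecules ionize somewhere around 10 to 16 eV):

```python
# Photon energy E = hc / wavelength, using hc ~ 1240 eV*nm.
HC_EV_NM = 1239.84

for wavelength_nm in (193.0, 13.5):
    energy_ev = HC_EV_NM / wavelength_nm
    print(f"{wavelength_nm:>6.1f} nm -> {energy_ev:5.1f} eV per photon")

# 193 nm  -> ~6.4 eV (deep UV)
# 13.5 nm -> ~92 eV  (EUV: enough to ionize essentially any molecule it hits)
```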
EUV lithography using the ASML Twinscan NXE:3400C is already in production on several process nodes, including Samsung's 7 nm LPP and TSMC's 7+ and 5 nm nodes. Intel's 7 nm node is also going to use EUV.
But oh look, those are all those advanced nodes that AMD and Nvidia are ignoring. Is this because the nodes aren't very good, or because they're too expensive, or what? If the latter, then EUV might become cheaper as the technology matures. But there are early claims that TSMC's 5 nm node isn't so great, and Intel's 7 nm is tremendously delayed. If EUV is supposed to save the industry, then shouldn't at least one of the EUV nodes actually be good?
Maybe it will be, and just needs time to mature, and will extend Moore's Law by quite a few more years. But so far, it hasn't done that. Hence the title: is this how Moore's Law dies, with new process nodes that aren't much better than the old?
I had to Google Moore's Law.
After looking it up, I think with something like Moore's Law, there comes a point where the technology hits a cap and you run into diminishing returns.
The reason why the transistor count doubles is that about every two years, you can do a full node die shrink that makes each transistor about half as large as it was before. (It's a scaling factor of about .7 in each dimension, but that means about half the area.) Making computer chips the same size on a new node as they were on the old with each transistor half as large means that you can fit twice as many transistors.
That's the sort of thing that you look at and say, that obviously can't continue forever. But Gordon Moore made his observation in 1965, and it has roughly continued ever since then. The Intel 4004 (1971) had 2250 transistors. The Nvidia A100 has more than 51 billion transistors.
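Both of those paragraphs are easy to sanity-check. A quick sketch using the figures above (0.7 linear scaling, 2,250 transistors for the 4004, and the 51 billion quoted for the A100):

```python
import math

# Linear shrink of ~0.7 per full node means about half the area per transistor.
print(f"Area per transistor scales by {0.7 ** 2:.2f} per node")   # ~0.49

# Implied doubling period from the Intel 4004 (1971, 2,250 transistors)
# to the Nvidia A100 (2020, ~51 billion, the figure quoted above).
doublings = math.log2(51e9 / 2250)
years = 2020 - 1971
print(f"{doublings:.1f} doublings in {years} years -> "
      f"one doubling every {years / doublings:.2f} years")        # ~2.0 years
```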
If you could go back in time a few centuries and try to explain Moore's Law to the people who lived then, they wouldn't get it. People simply don't understand the concept of something doubling every two years and continuing to do so for more than 50 years. The few people who had sufficient mathematical understanding to comprehend the claims would surely tell you that you're mistaken because what you're claiming is impossible, as that's just not how the universe works.
There is the old observation that if something can't go on forever, then it won't. Exponential growth in the number of transistors on a chip obviously can't continue forever. But that doesn't tell you how it will stop.
Moore's Law first started to break down in the 1990s, though not in the transistor counts; it was the power scaling that went first. It used to be that when you did the die shrink to get twice as many transistors, each transistor used only half as much power as before, so total chip power stayed about the same. Then the scaling changed such that each transistor used about 70% as much power as before. Double your transistor count and you're now using about 40% more power than before the shrink.
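In numbers, with the rough per-transistor power factors described above (0.5 under the old scaling, about 0.7 since then):

```python
# Per-chip power when a shrink doubles the transistor count, under the old
# scaling (each transistor uses half the power) vs. the newer, worse scaling
# (each transistor uses ~70% of the power).
def chip_power_factor(per_transistor_power, transistor_count_factor=2.0):
    return per_transistor_power * transistor_count_factor

print(chip_power_factor(0.5))   # 1.0 -> same total power as before the shrink
print(chip_power_factor(0.7))   # 1.4 -> ~40% more power for twice the transistors
```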
The original Pentium CPU from 1993 was notorious in its day for running very hot. It used 5.5 W. Note the decimal point; that's not a typo. For a number of years after that, they accepted that power consumption would increase every year, and you just build something that can cool it.
That eventually became a problem because you couldn't cool it. Intel's Pentium 4 was probably the last CPU to ignore power consumption. Intel promised that the architecture would scale to 10 GHz. It didn't even make it to 4 GHz before physics got in the way and said you can't do that.
That largely led to an era in which the amount of power that you could dissipate was fixed, and the question was how much performance you could get in that power envelope. At first, there was a lot of low-hanging fruit for reducing power consumption. But after a while, that all got eaten up, and all a die shrink could get you was about a 40% performance improvement at the same power consumption, not double.
But even that 40% doesn't mean 40% higher per-core CPU performance. CPUs ran into the problem that you can't just clock a CPU core arbitrarily high and have it work without runaway power consumption. Rather, in order to increase CPU performance, you had to add more cores. But it's harder for software to put those cores to good use than it was for a single core that you could simply make faster. GPUs still scale well to many shaders, or "cores", which is why GPU advancements have seemed to come faster than CPU advancements in recent years.
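Amdahl's law isn't mentioned above, but it's the standard way to put numbers on why extra cores are harder to use than a faster single core. A rough sketch with made-up but plausible parallel fractions and core counts:

```python
# Amdahl's law: if only a fraction p of the work can run in parallel,
# the speedup on n cores is 1 / ((1 - p) + p / n).
def amdahl_speedup(p, n):
    return 1.0 / ((1.0 - p) + p / n)

# A CPU workload that's 90% parallel gets nowhere near 8x from 8 cores...
print(f"{amdahl_speedup(0.90, 8):.2f}x on 8 cores")           # ~4.71x
# ...while a GPU-style workload that's 99.9% parallel keeps scaling much further.
print(f"{amdahl_speedup(0.999, 5000):.0f}x on 5000 shaders")  # ~833x
```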
And in recent process nodes, we're not necessarily even getting that 40% increase in performance per watt per full node die shrink anymore. That's what led to Nvidia's recent desperation move of having the GeForce RTX 3080 and 3090 use over 300 W. That wasn't desperation in the sense of "we're losing to AMD". It was desperation in the sense of, "we need to increase performance to get people to buy new hardware, and the methods we've used for the last few decades don't work anymore". If you need to increase performance and can't increase performance per watt, then you have to increase power consumption.
So if we're not even getting as large of gains out of die shrinks as we did several years ago, then what are we getting? Not very much, it seems. Hence the title.
"Human Knowledge Doubles every 10 years."
And when I said this to my science teacher in high school, he laughed and said, "Of course, that is because every 10 years, we learn that half the stuff we thought we knew was wrong."
It's kind of like claiming that GPUs will replace CPUs so that no one needs a CPU anymore. CPUs are still needed because they're really good at a lot of things that GPUs are bad at. GPUs have a lot more theoretical TFLOPS, but a lot of things can't put that to good use.
All of it.
G80 (2006): 0.68 billion
GT200 (2008): 1.4 billion
GF110 (2010): 3 billion
GM200 (2015): 8 billion
GP102 (2016): 12 billion
TU102 (2018): 18.6 billion
GA102 (2020): 28.3 billion
That's slower than doubling every two years, but it's still faster than doubling every three years.
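Checking that against the first and last entries in the list:

```python
import math

# First and last entries from the list above.
g80, ga102 = 0.68e9, 28.3e9
years = 2020 - 2006

doublings = math.log2(ga102 / g80)
print(f"One doubling every {years / doublings:.1f} years")   # ~2.6 years
```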
The problem with huge caches is that they can only offer so much benefit. If you're not waiting on DRAM, then even a theoretical infinitely large cache doesn't help. The power they save is also capped at whatever the memory controllers and physical memory would otherwise have burned. That doesn't mean that there shouldn't or won't be a move toward larger caches, only that there are limited gains to be had.
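One rough way to put numbers on those diminishing returns is the old empirical square-root rule of thumb for cache miss rates (not something claimed above, just a common approximation):

```python
# Old empirical rule of thumb: miss rate scales roughly with 1/sqrt(cache size).
# Only a crude model, but it shows why each doubling of an already-big cache
# buys less and less.
def relative_offchip_traffic(cache_mb, baseline_mb=128.0):
    return (baseline_mb / cache_mb) ** 0.5

for size_mb in (128, 256, 512):
    print(f"{size_mb:>4} MB -> ~{relative_offchip_traffic(size_mb):.2f}x "
          f"the off-chip traffic of a 128 MB cache")
# Doubling 128 MB to 256 MB only trims misses by ~29%; 512 MB gets you ~50%.
```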
Apple is using it for the new A14 ARM processor, which is going in the iPhone 12, the current generation of iPads, and presumably the first generation of ARM laptops. Apple sells an ungodly number of phones, and a decent number of iPads and laptops, so that's a pretty big order. Apple is TSMC's biggest single customer, and on its own accounts for about 20% of TSMC's revenue.
https://www.extremetech.com/computing/315186-apple-books-tsmcs-entire-5nm-production-capability
It appears that as additional 5 nm capacity is added, there is already a queue for it. Six other customers already have their bookings in:
AMD, Bitmain, Intel/Altera, MediaTek, Nvidia, and Qualcomm.
https://www.eenewsembedded.com/news/report-names-first-seven-tsmc-5nmn-customers
I want 7nm thin dammit!
And 7 nm isn't anywhere remotely close to the thickness of a wafer. Wafers tend to be on the order of a millimeter thick.
But I might have lost you on the joke: we can make the most complex, ever-shrinking chips, capable of so many precise computations, per Moore's Law.
...But it takes over a decade to develop a thinner condom. Bad joke, I know.