In 2011, Intel launched Sandy Bridge, which was substantially better than anything that came before it. In 2012, they launched Ivy Bridge, which was a bit faster than Sandy at stock speeds, but wouldn't overclock as far, so they were basically tied if you overclocked both. In 2013, Intel released Haswell, which likewise was a little faster at stock speeds but wouldn't overclock quite as far. So a CPU released in 2011 is still as fast as it gets if you overclock everything. Furthermore, with 2014's Broadwell rumored to not come to desktops at all, or perhaps only in a crippled, low power form, Sandy Bridge could easily remain nearly tied with the high end until Sky Lake arrives well into 2015--or maybe 2016, with the way everything is getting delayed lately.
And it's far from clear that Sky Lake will improve single-threaded CPU performance, either. There's a huge emphasis on bringing CPU power consumption down, whether for laptops, tablets, data centers, or whatever. Process nodes can be tuned for higher performance or lower power consumption, but there are trade-offs. While x86 CPUs were once built heavily for higher performance, now there's more emphasis on lower power consumption, so it's far from clear that future process nodes will even be able to clock as high as current ones. There's also the issue that the CPU cores proper now take up a very small percentage of total die space. Even if you can dissipate a fixed amount of heat from the die as a whole, heat that is produced very unevenly, concentrated in a few tiny areas, can still be a major pain to cool.
This isn't to say that we've reached the end on CPU performance period, of course. You can still add more CPU cores. But that only helps if code is designed to take advantage of more CPU cores, and that gets tricky for many programs once the core count gets high enough. In contrast, GPUs will still scale well to arbitrarily many shaders for the foreseeable future, and I see no reason to believe that GPU advances will slow down until Moore's Law dies. (Memory bandwidth is a big problem today, but that will get something of a reprieve at 14/16 nm with large L3 caches on the GPU die.)
I'm not willing to answer the question in the title with a "yes". The history of people predicting the end of technological advancements mostly features one wildly wrong prediction after another. But for the first time in the history of computers, it's a question to be taken seriously--and to get any further serious advancements might well require moving away from traditional silicon transistors to some other radical, new technology. Of course, the history of computers is one of doing that routinely, from solid state drives to LCD monitors to optical mice, to name a few of the relatively recent innovations.
Comments
Interesting post. Actually anything above 1 core adds layers of complexity and "CPU bookkeeping".
Sure, we have preemptive OSes, but that is basically just a layer to prevent instruction collisions and memory corruption. Even with a preemptive OS you need to consider race conditions at the application level.
So basically we already have massive platform diversity (even if we limit ourselves to the x86 and x64/AMD64 platforms).
We basically have to work with 32-bit and 64-bit memory spaces, and at least three different MS OSes (XP, 7 and 8) that have different preemptive kernels.
On the HW layer we have a multitude of options: anywhere from 1 to 12 CPU cores (or 2×1 to 2×12 on dual-socket boards), more or less intelligent block devices (disks/arrays), AND:
Most devtools implement their own "optimizing" before the code is fed to the OS.
In the end: on the application level/layer, the developer/programmer needs to choose a fitting compromise for his/her application...
For many applications (Word processors, web browser etc.) the default choice is: "Optimize for lowest common denominator".
For more demanding applications that are 90% CPU/MMU intensive, it's obvious: optimize for a high number of cores and a 64-bit address space. (Multimedia apps, database engines, etc.)
For applications that are I/O intensive (lots of bus activity and disk/GPU interdependency), and where you expect your target audience to be on extremely different platforms (think games), the picture gets a little blurred... You want the "bleeding edge" crew to see an advantage in having heftier HW, but you prefer not to set the bar so high that you exclude a lot of potential clients with lower-spec HW...
Thus you end up with what most already know: most games are optimized for ~2 cores and a 32-bit OS. Separation/optimization at the application level, between mem/block/socket activity and HID/GPU, would make sense in many MMOs, because that way you can leave the core bookkeeping to the OS and still be effective on 2 cores.
Edit: Sigh... I always skip the second p in preemptive - it's a mental race condition!
We dont need casuals in our games!!! Errm... Well we DO need casuals to fund and populate our games - But the games should be all about "hardcore" because: We dont need casuals in our games!!!
(repeat ad infinitum)
You don't try to write software to use exactly two CPU cores and no more unless you know exactly what hardware it's going to run on. If you can make your program scale well to 12 cores (or more), you can just do that and let the OS decide which threads to schedule when, and it will keep however many CPU cores you have busy.
But the difficulty of making the CPU code scale well to many cores depends greatly on your algorithm, and can range from trivial to impossible. Figuring out how to break a program up into chunks that can run at the same time with little to no dependence on each other has to be done by the programmer, not an OS or compiler. There is some work to be done by the compiler and OS in threading, but the tools are there and ready, and people just need to use them.
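Here's a minimal sketch of what I mean, in Java (the array-summing workload is just something I made up for illustration): split the work into independent chunks, hand them to a pool with one worker thread per core, and let the OS worry about which thread runs where and when.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ChunkedSum {
    public static void main(String[] args) throws Exception {
        // Made-up workload: sum a big array by splitting it into independent chunks.
        long[] data = new long[50_000_000];
        for (int i = 0; i < data.length; i++) data[i] = i % 7;

        // One worker thread per available core; the OS decides which thread
        // runs on which core and when.
        int cores = Runtime.getRuntime().availableProcessors();
        ExecutorService pool = Executors.newFixedThreadPool(cores);

        int chunkSize = data.length / cores;
        List<Future<Long>> parts = new ArrayList<>();
        for (int c = 0; c < cores; c++) {
            final int from = c * chunkSize;
            final int to = (c == cores - 1) ? data.length : from + chunkSize;
            parts.add(pool.submit(() -> {
                long sum = 0;
                for (int i = from; i < to; i++) sum += data[i];
                return sum;                 // each task only touches its own slice
            }));
        }

        long total = 0;
        for (Future<Long> part : parts) total += part.get();  // combine the partial sums
        pool.shutdown();
        System.out.println("total = " + total);
    }
}
```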
The common myth that games can only use two cores is just that: a myth, and a stupid one at that. Many games do have CPU code that scales well to many CPU cores, but sometimes you don't notice because something else is the bottleneck rather than the CPU. I think it's perpetuated to some degree by tech sites that go out of their way to pick badly coded games for their benchmarks, as those are the games that manage to run poorly on capable hardware.
How many years did it take Pentiums to stop appearing on bargain-basement motherboards from Joe'sBuildYerOwnCheapMachine.com? And how long for the last of the VGA games to drop off the market?
I mean, you're basically asking a question about consumer habits, more than retailers, yes?
I'm probably looking at it backwards; de facto instead of de jure. But there's always been a lag between the theoretical limits and the practical user limits, guess I'm trying to say.
Self-pity imprisons us in the walls of our own self-absorption. The whole world shrinks down to the size of our problem, and the more we dwell on it, the smaller we are and the larger the problem seems to grow.
No, it's about what hardware is out there, whether people buy it or not.
Don't we kind of bounce between the idea of single vs. parallel? Isn't parallel almost always hindered by timing issues? I thought that with parallel, latency occurs because some bits arrive later, forcing a wait on the bits that have already arrived. I have no idea really, but I always wondered why serial seems faster than parallel when it comes to cabling, and I wonder if the same problems occur with circuits and multi-processing.
I would have thought that multi-processing would have become more common by now; why else do we have 8 cores on 1 CPU, and in the server world it's easy to dedicate cores to tasks. What I don't understand is why this hasn't translated to the desktop. That is one of the main reasons I believe AMD is struggling: I think they took a gamble and thought multi-processing would be further along by this point.
I also assume that AMD thought by this point we would be able to leverage the power of the GPU much more easily for everyday tasks. It's funny how you have this powerful tool sitting in your computer, idle.
An excellent post on a topic that fascinates me. Programming most intensive applications for multiple cores is usually an incredibly complicated process. Sometimes things flow logically and mathematically, such as when a thread diverges, or when a new thread needs output values from two other threads before it can run.
When talking about gaming applications, we're usually limited in how in-depth our multicore efficiency can get. But for OS applications and the several self-contained threads running on your PC... multicore is almost a complete advantage.
Almost every developer working on any application that needs performance will go the multi-threaded route in some way or another. If it is something that obviously doesn't need to be multi-threaded, then a good developer won't waste time implementing it.
Designing processors and their cores is a dance of logic, intent, resources, and physics. While we're limited to printing circuits in two dimensions, our only route, with the materials we have, is to be more efficient with multiple cores. I'm sure everyone has heard of graphene and several other breakthroughs that might allow us to print circuits in three dimensions. If that technology hits, then you can expect a resurgence of 'single core' designs for the sake of simplicity.
I'd like to restate the reason we are using multi-core CPUs and systems. We've more or less reached the processing-power capacity of our single-core designs with the materials and designs available to us. So instead of simply building a bigger engine, we're making everything more efficient so that speed is still increasing as it used to... 'relative' to the observer.
When a single program is using both cores, then yes, timing everything is a huge issue. But the real advantage of multi-core systems on desktops and servers is that unrelated programs can run on any available core instead of waiting... usually.
Modern operating systems do try to leverage the power of the GPU, but the inherent design of the GPU stifles this. The GPU is a very specialized piece of hardware. Several companies I've worked with spent quite some time investigating GPU alternatives such as multi-purpose processing cards.
The problem with that idea is that you then run into the issue of specialization. If you're using the GPU for processing, you're still getting into the timing issues of multi-threaded applications... and in the end you could have just gotten another processor. If you're using the memory of the GPU/MPPC, you could have just gotten more memory. The list goes on. No one wanted to use them because there was already a cheaper, more specialized alternative available.
I think we can agree on most things. Ideally we should be able to write programs without a lot of consideration of the underlying OS/HW structure.
Sorry I wrote "~2" cores. It could just as well have been "~8" cores - my point was that you do not program for a set maximum number of cores - but you optimize for one. (Or suffer the consequences.)
Things MAY be very different in the games programming domain (I am not working in the gaming industry - I work with realtime applications, and block devices/socket programming is my main focus.)
My remarks are targeted at the MS Windows platform, and while you say the tools are "there and ready", please feel free to PM me and tell me exactly which tools you are referring to. (The C/C++ components and compiler in VS are not exactly "ready" for "fire and forget" programming, in my opinion.)
The stuff I highlighted in red: my point exactly... That's NOT an application that "scales well" - if it can't trust the underlying OS, the application needs to accept and adapt to that. End-user experience is more important than what the programmer thinks and the OS commercial said!
BTW: Sorry if I sort of drove the topic off the rails. That was never my intention. While we haven't reached the max yet, we are getting close to physical limits.
Assuming this is all true, wouldn't it turn into a non-issue pretty quickly once the wall was actually reached? Even assuming that we don't find some quantum tunneling effect or whatever that gives us a lot more performance per core, couldn't this all be bypassed by better software or firmware that allows a sixteen-core CPU to function as a four-core CPU at the OS level? I mean, that's really the stumbling block now: if you don't write your application to take advantage of multiple cores, you have an application that runs on a single core and that's it. If the system could divide the tasks up efficiently, and then return results all in order, then the number of cores wouldn't really be all that relevant.
I can not remember winning or losing a single debate on the internet.
I think there is a valid point there, but it wasn't framed very well.
The focus on lowering power right now is due to a sudden market shift towards mobile computing. Tablets, phones, and other devices represent a large chunk of processing that Intel hasn't been a part of. It would be a good guess to predict Intel is spending less on improving processing capability and spending more to reduce power consumption so they can get into the mobile market.
I'm not worried, though. Between graphene and quantum computing, the possibility for dramatic improvements in processors over the next ten years is there. It is just a matter of waiting to see who can implement it first.
It depends greatly on the algorithm being used and how often and how extensively different threads need to communicate. If you have several threads trying to operate on the same data and it needs to be kept coherent across all threads, you can easily get big problems with locking and synchronization. If you're not careful, you can even end up with a live-lock situation where everything comes to a screeching halt. Perhaps worse than that is a race condition in which you get the wrong answer without realizing it, and the answer depends on which threads finish their work first.
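To make the "wrong answer without realizing it" case concrete, here's a toy sketch in Java (nothing to do with any real program, just two threads bumping a shared counter): the unsynchronized counter quietly loses updates, while the atomic one doesn't.

```java
import java.util.concurrent.atomic.AtomicLong;

public class LostUpdates {
    static long plainCounter = 0;                       // unsynchronized shared state
    static final AtomicLong safeCounter = new AtomicLong();

    public static void main(String[] args) throws InterruptedException {
        Runnable work = () -> {
            for (int i = 0; i < 1_000_000; i++) {
                plainCounter++;                         // read-modify-write race: updates get lost
                safeCounter.incrementAndGet();          // atomic, so nothing is lost
            }
        };
        Thread a = new Thread(work), b = new Thread(work);
        a.start(); b.start();
        a.join();  b.join();

        // plainCounter usually comes out well short of 2,000,000, and the exact
        // value changes from run to run: a wrong answer with no error message.
        System.out.println("plain:  " + plainCounter);
        System.out.println("atomic: " + safeCounter.get());
    }
}
```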
But if a program can operate where you break things up into a bunch of threads that can each work on their own data set for tens of milliseconds at a time with no communication with other threads whatsoever in that time, and when they do communicate, it consists only of passing a pointer to a different thread so that some other thread can continue with the data set, then the overhead of threading is inconsequential. Or in extreme cases, a program can have threads that can go for minutes or hours at a time without needing to contact other threads. Think of GIMPS, where if the only way for one thread to communicate with another was by the postal service (aka, snail mail) and having a human manually transfer the data between a computer and a piece of paper, even that wouldn't be all that big of an impediment to performance.
Leveraging a GPU is very different, and much harder. A GPU needs very SIMD-heavy stuff, with little to no branching and massive amounts of parallelism available. In order to really exploit a GPU, you need to have many thousands of different computations going on with their own data sets, where it doesn't matter what order the computations are done in. Computer graphics actually fits that quite nicely: millions of pixels on a screen means millions of fragment shader invocations per frame that do not depend on each other at all.
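As a rough picture of that "millions of independent computations" shape, here's a toy sketch. It runs on CPU threads via a Java parallel stream rather than on a GPU, and the per-pixel formula is arbitrary, but the structure is the point: every pixel gets the same small computation, no pixel depends on any other, and the order doesn't matter.

```java
import java.util.stream.IntStream;

public class PerPixelSketch {
    public static void main(String[] args) {
        int width = 1920, height = 1080;
        float[] brightness = new float[width * height];

        // Same small computation for every "pixel", no dependencies between
        // pixels, any order is fine -- the embarrassingly parallel shape a GPU
        // wants. (A real fragment shader would run on the GPU in GLSL/HLSL.)
        IntStream.range(0, width * height).parallel().forEach(i -> {
            int x = i % width;
            int y = i / width;
            brightness[i] = (float) (Math.sin(x * 0.01) * Math.cos(y * 0.01));
        });

        System.out.println("center pixel = " + brightness[(height / 2) * width + width / 2]);
    }
}
```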
With typical games, most of the work consists of setting up which objects are going to be drawn where in a frame and then passing that along to the video card. Only one thread can pass data to the video card, but breaking the work of what needs to be passed into as many threads as you like is fairly trivial to do, apart from a little bit of per-frame overhead such as determining where the camera will be that has to precede everything else. Then having all of those other threads communicate with the rendering thread is pretty much a textbook case of a producer-consumer queue. So games are fairly easy to thread to use several cores.
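That producer-consumer part is worth a sketch. This isn't how any particular engine does it; it's just the textbook pattern with a BlockingQueue, and the DrawCommand class and object IDs are made up for illustration. Worker threads decide what needs to be drawn and enqueue it, and a single rendering thread is the only one that ever "talks to the video card".

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

public class RenderQueueSketch {
    // Stand-in for "what to draw"; a real engine would carry mesh, transform, material, etc.
    static class DrawCommand {
        final int objectId;
        DrawCommand(int objectId) { this.objectId = objectId; }
    }

    static final DrawCommand END = new DrawCommand(-1);   // sentinel to stop the consumer

    public static void main(String[] args) throws InterruptedException {
        BlockingQueue<DrawCommand> queue = new LinkedBlockingQueue<>();

        // The single rendering thread: the only one allowed to submit draw calls.
        Thread renderThread = new Thread(() -> {
            try {
                for (DrawCommand cmd; (cmd = queue.take()) != END; ) {
                    System.out.println("drawing object " + cmd.objectId);  // pretend draw call
                }
            } catch (InterruptedException ignored) { }
        });
        renderThread.start();

        // Producer threads: figure out what needs to be drawn and enqueue it.
        int workers = Runtime.getRuntime().availableProcessors();
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        for (int w = 0; w < workers; w++) {
            final int base = w * 100;
            pool.submit(() -> {
                for (int i = 0; i < 5; i++) queue.put(new DrawCommand(base + i));
                return null;    // Callable, so the checked InterruptedException is allowed
            });
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);

        queue.put(END);          // all producers are done; tell the consumer to stop
        renderThread.join();
    }
}
```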
One general principle is that more, weaker cores will beat fewer, stronger cores if the workload scales well to more cores and the weaker cores can handle it. That's the real reason to move to multi-core processing. If CPU core A only offers 1/4 of the performance of its contemporary CPU core B, then core A might well get you that performance with only 1/10 of the power consumption. Ten of CPU core A thus gets you 10 × 1/4 = 2.5 times the total performance of one CPU core B at the same power consumption--but only if your workload scales well to many cores.
Whether the tools to scale well to several cores are there and ready to be used depends tremendously on what you need to do. For games, it works fine. For things that are harder to thread, it might well still be problematic. In the case of games, there's only so much work to be done in a given frame, and so long as you're keeping the GPU busy and all of the CPU cores busy (until the CPU work for that frame is done), it doesn't matter that much which order stuff gets processed in. Thus, you can break the CPU work into a bunch of threads (with an intelligently chosen number based on how many CPU cores the system has), let the OS worry about which thread to schedule on which core when, and it just works. (Incidentally, my personal experience with this is with Java and Windows 7; I've run the same code on a Core i7 quad core and an AMD E-350 dual core.)
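Roughly, the shape I'm describing looks like this (Java again; the Thread.sleep is just a placeholder for real per-frame work like animation, culling, or AI, and none of this is lifted from an actual game):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class FrameWorkSketch {
    public static void main(String[] args) throws InterruptedException {
        // Size the pool from what the machine actually has instead of
        // hard-coding "2 cores" or "4 cores".
        int cores = Runtime.getRuntime().availableProcessors();
        ExecutorService pool = Executors.newFixedThreadPool(cores);

        for (int frame = 0; frame < 3; frame++) {       // pretend game loop
            // Per-frame setup that has to precede everything else (e.g. deciding
            // where the camera is) would happen here, single-threaded.

            // Then the rest of the frame's CPU work gets split into tasks.
            List<Callable<Void>> tasks = new ArrayList<>();
            for (int t = 0; t < cores; t++) {
                tasks.add(() -> {
                    Thread.sleep(5);     // placeholder for animation, culling, AI, ...
                    return null;
                });
            }

            // invokeAll blocks until every task for this frame is finished, so the
            // next frame can't start early; which core runs which task, and in what
            // order, is left entirely to the OS scheduler.
            pool.invokeAll(tasks);
            System.out.println("frame " + frame + ": CPU work done");
        }
        pool.shutdown();
    }
}
```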
Now, there are many things that are much harder to scale well to many cores. And there, you could argue that the tools to do it properly really aren't ready. Or that they might never be ready, because there are some things that just can't be done in hardware very well. Maybe your experience is more in dealing with those things.
It's not just mobile computing. Any processors that are going to be kept running around the clock will have a considerable electrical bill as a result (and sometimes a huge cooling bill, too), and this can easily be greater over the years than the cost of buying the CPU up front. If you're selling a processor that is identical to your competitor's except that it will cost $200 less in electricity and cooling over its lifetime, then you can charge $100 more for it and still seriously claim that it's cheaper. That's a huge deal to data centers, and often to other servers. Power consumption is also a huge deal in laptops.
Furthermore, the focus on power consumption is partially because we can make chips that burn so much more power than they used to. Twenty years ago, the difference between carefully optimizing for power consumption and simply not caring about power consumption might have been the difference between a chip that put out 2 W and one that used 5 W. Today, it can be the difference between a 50 W chip and a 200 W chip. The AMD FX-9590 has a ton of power-saving features, and still has a TDP of 220 W.
Quantum computing is not going to be a "now everything is faster" sort of thing. If it pans out, certain problems that classical computers are terrible at will suddenly be fast on a quantum computer. But quantum computers will still trail far behind classical computers in many, many things. I don't see quantum computers ever being relevant to gaming.
A very interesting discussion, everyone.
--- On Parallel Processing
I'm definitely in the 'algorithm is everything' camp when it comes to parallel processing. Some things can be broken into smaller, independent streams of code capable of running simultaneously on multiple processors, and some things simply can't. The performance of an individual processor core will be important for a considerable time to come, because some tasks will remain serialized.
Until there is a substantial (revolutionary) change in how we understand and develop tasks, algorithms will determine which tasks will benefit from parallel processing, and how efficient the parallel process can be.
--- On Single Core Processing Limits
I do feel that the hardware is at (or close to) its practical performance limits. I expect that the cost of improving performance on a single core is growing exponentially, and there is little or no chance to recoup the costs of pushing the conventional materials any further. There may be some gains to come from advances in materials science, but no one can predict whether those technologies will be cost effective, either.
I don't know that practical quantum computing is within anyone's lifespan. There may be early quantum computers in 25 years, but without equally revolutionary breakthroughs in mathematics and computing, those solutions would only give a boost to a very small number of potential applications. I'd guess a time frame of 100 to 125 years to be far more accurate for quantum computing to have any significant impact at the consumer level.
Logic, my dear, merely enables one to be wrong with great authority.
http://www.technologyreview.com/view/518426/how-to-save-the-troubled-graphene-transistor/
The smart people at MIT just figured out how to switch graphene from 0 to 1, which was the holdup for the past 10 or so years. Now they have a road map to create the first graphene processor, which can clock around 400 GHz (my guess is you will see the first graphene processor in the next 10-15 years). With a CPU clocking that high, will we even need a graphics card anymore? It will be interesting to see how this changes everything.
Meh,
Processors as we know them now may be hitting their limit. They can probably get down to <10 nm, but probably not <5 nm. Smaller dies don't necessarily translate to better IPC, but they give designers flexibility to get it. Then there's the GHz wall: we've only rarely seen chips sold at a stock 4.0 GHz (AMD just pushed their 9590 at 4.7/5.0, but it's pretty limited availability), and while we've seen some samples that can go as high as 6 and 7 GHz, nothing commercial can hang there, so we may see commercial stock speeds get upwards of 5-6 GHz, but probably not much past that.
So yeah, using our existing methods, we're about done with IPC. There are probably a few more tricks engineers can do with OOOE and predictive branching and caching and whatnot, but none of those will be huge gamechangers, since we'll still be stuck with the same silicon limits mentioned in the first paragraph.
However;
There are other technologies in the works.
Specialized processors are coming back in vogue - used to be the CPU and FPU were two different chips on a motherboard. Now we are seeing GPUs starting to be used for algorithms where SIMD really helps. We see specialized instructions for crypto. Add in things like Quantum, which are situational as well. Once some of these highly specialized processors start to become standard enough that routine compilers can just assume they will be present, we'll see software able to speed up more, even without an increase in the generic CPU IPC.
And there could always be another process breakthrough. Graphene is one, light-driven fiberoptic based CPU may be another - and there are hundreds of other ideas I haven't even heard of and couldn't even imagine.
And even if those technologies are a long way away:
I'm not worried. Sure, there are some computer processes that can use all the CPU power available and then some. But for my personal use, I don't use 1/10th of what I have available in my 5 year old computer now. Faster IPC is nice, and it allows for breakthroughs in software, but the software I use now sits around waiting on me most of the time, not vice versa.
One thing I'd like to point out is that, for years, we got both more cores and faster cores. Moving from Pentium D to Conroe to Penryn to Bloomfield to Lynnfield to Clarkdale to Sandy Bridge, we got faster per-core performance every step of the way at stock speeds, and also if you compare overclock versus overclock with only the exception of Bloomfield basically tying Lynnfield since the cores were basically identical and on the same process node.
The fastest single-core x86 processor ever made might be the AMD Sempron 150, basically a 2.9 GHz single-core version of an Athlon II (since it was basically an Athlon II X2 where one core didn't work). Today, that would get blown apart in single-threaded performance, not just by the latest and greatest Core i7, but also by recent lower end Intel Pentium and AMD A4 chips. As we moved from two cores to four, the trend again was toward both more cores and faster cores simultaneously.
But that seems to have come to a halt. We can still make CPUs with more cores, and AMD is still improving per-core performance, though that only serves to close the gap somewhat with Intel. (I expect that Kaveri will be to Haswell as Phenom II was to Bloomfield: slower, but respectable, and not a complete blowout as we saw with Phenom I versus Core 2 Quad or Bulldozer versus Sandy Bridge.) A few years ago, AMD said that they would improve their CPU performance by 10%-15% per year, and that that would be enough to keep pace with Intel. At the time, that seemed very pessimistic as it trailed far behind historical gains, but in retrospect, it seems too optimistic about the gains that either company would be able to deliver.
IMEC is already at 10nm
this is a very good look into the future
this is from 2011 but IMEC is about 10 years ahead of the industry
From the practical point of view, the new generation of consoles will influence consumer CPUs greatly. They've got eight 1.6 GHz cores, if I remember correctly. Most multiplatform games will be built with that in mind. Time will tell how that'll influence their performance on standard four-core CPUs. Maybe we'll see lots of games optimised for eight cores, or maybe hyperthreading will finally get used. We'll probably not see many games optimised exclusively for 2+ GHz cores.
If the average gamer doesn't need the newest Fancybridge CPU, he's less likely to buy it, and CPU producers are less likely to focus on it.
Umm, the new generation of consoles uses a consumer x86 CPU; it's an AMD with Jaguar cores and GCN graphics - nothing really new or terribly different from what you could already get in the wild. AMD has been selling 8-core CPUs for a long time, and Jaguar cores since the beginning of this year.
You can consider all Intel 4-core CPUs with Hyper-Threading (the desktop i7s) as 8-thread machines as well, and those have been available for a long time too.
Now maybe just the fact that it's being used in consoles will influence PC games - we can certainly attest to the fact that it did for the current generation. But the hardware is absolutely nothing special, so there is absolutely no reason to believe it will have any influence whatsoever on consumer CPUs, because it is (more or less) a consumer CPU.
You should give them a look; they're the industry leader in lithography. Half of the technology Intel uses (Tri-Gate, for example) came out of IMEC. A lot of Intel's R&D budget goes to IMEC.
They will open their new 450 mm wafer cleanroom with EU and Samsung/Intel support in 2014, the first in the world at that scale. It will help them get below 10 nm.
Yes, I'm not claiming it's anything new. However, the fact that new console games will be optimized to utilize eight cores may influence Intel's CPUs. So far, people have been ignoring those eight-core CPUs, because they're a bit crappy compared to Intel's offerings performance-wise. Most people ignore i7s, too, because they're just not worth the price at the moment. Whether they're available or not doesn't make much of a difference.