Originally posted by Quizzical: As only one thread can use execution resources at a time, if they're both ready to do computations in one clock cycle, one of them has to sit and wait.
As I pointed out above, the instructions are executed with out-of-order loading - you can process 2 threads within 1 clock cycle.
You are still talking about multithreading within a single-core system, without Hyperthreading. That is what you are describing there.
Hyperthreading starts at the decoder level, when 2 threads get decoded and are passed on to be processed concurrently. That is what Hyperthreading actually does.
Hyperthreading doesn't allow the CPU to execute two threads simultaneously; it allows it (ideally) to execute two threads faster. A single execution core cannot execute two threads at the same time. It can switch between them, of course.
For the small amount of money that can be saved, and that's often debatable, the i3 doesn't seem like a great first choice. For my computer needs it's not really a choice at all.
I'd say it was a poor choice all round really. At the low end, AMD's APUs provide the best available integrated graphics and decent CPU performance, and the FX 83xx is now cheap enough to make the i3 pretty pointless unless all you do is run single threaded software benchmarks.
Hyperthreading doesn't allow the CPU to execute two threads simultaneously
That is exactly what HT does; what else do you think those 2 logical processors with their own registers are there for?
The answer on superuser.com explains it well enough:
Hyper-threading is where your processor pretends to have 2 physical processor cores, yet only has 1 and some extra junk.
The point of hyperthreading is that many times when you are executing code in the processor, there are parts of the processor that are idle. By including an extra set of CPU registers, the processor can act like it has two cores and thus use all parts of the processor in parallel. When the 2 cores both need to use one component of the processor, then one core ends up waiting, of course. This is why it cannot replace dual-core and such processors.
See, it does process 2 threads simultaneously...was it so hard?
Just to give you an idea, a high-end 4-core i7 CPU retails for about $350.
edit:
i3: retails for about $150
i5: retails for about $250
i7: retails for about $350
Yeah, this is spot on. Not sure if someone is trying to pull a fast one on you, or if the PC you are looking at is $300 more because it's an i5 and some other things are upgraded as well - more RAM, maybe a video card? What are the specs on the PCs you are looking at?
In any game where the GPU is taxed, which is all the AAA titles, there is hardly any difference between the modern CPUs. But you can buy an i7 and be glad you have 160 frames in WoW instead of just 100.
Look around in this article (http://www.ocaholic.ch/modules/smartsection/item.php?itemid=1117&page=8); you'll see that running games on low settings at very low resolution, on the biggest GPU they could get their hands on, will show you the "real" difference in CPU capability.
Which is what those criticizing FX processors always do. But no one buys the biggest GPU to use those settings. It is a "scientific" test, but also very much a synthetic test.
But look at when the game is played in Full HD, with the settings cranked up to tax the GPU. This is a real-life situation, and much closer to the user experience.
You see that in the different games they all perform more or less the same, with only Skyrim as an outlier. And even in Skyrim the AMD CPU will give you plenty of FPS.
This is why people who have actually bought AMD FX CPUs write posts saying they are perfectly happy with their CPU and that it performs very well, to the dismay of people who think they can read benchmarks, but can't.
Buying an FX6300 on a low budget, or an FX8350 on a budget, makes very good sense. After that the i5 takes over.
I don't understand this entire Hyperthreading versus Multithreading derail.
Ok, that is the main difference between an i3 and i5 - in that context I can understand a little bit.
But I think we've pretty well established that real cores are better than simulated cores, and with that in mind, that a CPU with 4 (or more) real cores is better than one with 2 real cores plus 2 HT logical cores.
Past that, I think it's pretty obvious in the posts who understands what these processes really do, and who does not really understand but likes to throw around jargon.
If we stick to the i3 vs i5 question in the OP - the hands-down answer is that an i5 doesn't even cost $300 baseline, so the upgrade question probably indicates that the entire system cost is out of whack, and that the base model probably isn't worth it, let alone the upgrade.
Can we now just let this one die, or shift the discussion to another "CPU Deathmatch" thread?
Hyperthreading doesn't allow the CPU to execute two threads simultaneously
That is exactly what HT does; what else do you think those 2 logical processors with their own registers are there for?
I don't think you understand what registers are, either. Registers don't process data. Registers only store data. Registers are a very small (e.g., tens or hundreds of bytes), very short-term (on the order of nanoseconds) data storage location with very low latency. But you want the data that the CPU is actively processing to have somewhere that it can be stuck briefly and then be retrieved very, very fast when needed. CPUs also have a series of increasingly larger but longer latency caches (L1, L2, sometimes L3, then off to system memory) for when data may be needed again soonish, but not immediately. Registers are the smallest, lowest latency (probably 1 clock cycle) type of CPU cache.
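You can actually see that latency ladder from userland by timing dependent loads over growing buffers; the average time per load jumps at each cache boundary. A rough C sketch of the idea - the buffer sizes, iteration counts, and the exact boundaries it reveals are machine-dependent assumptions:

/* Pointer-chase latency sketch: each load depends on the previous one,
   so the average time per load exposes L1/L2/L3/DRAM boundaries. */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

static double ns_per_load(size_t n, long iters) {
    void **buf = malloc(n * sizeof *buf);
    size_t *perm = malloc(n * sizeof *perm);
    for (size_t i = 0; i < n; i++) perm[i] = i;
    for (size_t i = n - 1; i > 0; i--) {          /* shuffle so the prefetcher can't follow */
        size_t j = rand() % (i + 1), t = perm[i];
        perm[i] = perm[j]; perm[j] = t;
    }
    for (size_t i = 0; i < n; i++)                /* link the slots into one random cycle */
        buf[perm[i]] = &buf[perm[(i + 1) % n]];
    void **p = &buf[perm[0]];
    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (long i = 0; i < iters; i++) p = *p;      /* serialized, dependent loads */
    clock_gettime(CLOCK_MONOTONIC, &t1);
    double ns = (t1.tv_sec - t0.tv_sec) * 1e9 + (t1.tv_nsec - t0.tv_nsec);
    if (!p) puts("unreachable");                  /* keep the loop from being optimized out */
    free(buf); free(perm);
    return ns / iters;
}

int main(void) {
    for (size_t kb = 4; kb <= 32768; kb *= 4)     /* 4 KB ... 16 MB working sets */
        printf("%6zu KB: %5.1f ns/load\n", kb,
               ns_per_load(kb * 1024 / sizeof(void *), 5000000L));
    return 0;
}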
You probably don't actually want to know what hyperthreading is, but I made a picture to illustrate it for the sake of anyone who does.
At the top, we have four threads. Think of it as water or some such intermittently coming through a pipe--but at any given moment, a given thread likely has nothing available to send through.
The black part is the CPU core(s). A single core means that only one thread can be connected at a time. If we want to switch to a different thread, we have to stop to unscrew the top thread and screw on a different thread. During this transition time, no threads can make any progress at all.
Hyperthreading allows two threads to be connected at once. If we want to switch from the red thread to the green thread, there's no real penalty for doing so. If we want to switch from the red thread to the blue thread, the green thread can continue executing while we disconnect the red thread and attach the blue. This makes the cost of switching from one thread to another much less than in the single core case. But ultimately, there's only one set of execution resources available, so there's only one opening out the bottom. The greatest possible throughput with hyperthreading is no greater than for just a single core.
On the right, we have two cores. As with hyperthreading, both the red and green threads can be attached simultaneously. If we want to switch from the red thread to the blue thread, one core can't execute anything for a while while we switch, but the other core can keep running the green thread. But if both the red and green threads have stuff to execute at the same time, they can do so at the same time. This is unlike the hyperthreading case where one will have to wait a moment--and if both threads are very busy, they could easily spend 1/3 of their time or more having something ready to execute but having to wait their turn.
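That "1/3 of their time or more" figure is easy to sanity-check with a toy model: give each thread an op ready with some probability per cycle, share one execution slot in the HT case, and count the ready-but-waiting cycles. A C sketch of that model (the 0.9 readiness probability is an arbitrary assumption):

/* Toy model of the pipe picture: two threads, each with work ready on a
   given cycle with probability `busy`. Hyperthreading shares one execution
   slot; two cores give each thread its own. */
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    const long cycles = 1000000;
    const double busy = 0.9;                    /* assumed per-cycle readiness */
    long ht_done = 0, ht_waits = 0, dual_done = 0;
    srand(1);
    for (long c = 0; c < cycles; c++) {
        int a = rand() < busy * RAND_MAX;       /* thread A has an op ready */
        int b = rand() < busy * RAND_MAX;       /* thread B has an op ready */
        if (a && b) { ht_done++; ht_waits++; }  /* HT: one issues, one waits */
        else          ht_done += a + b;
        dual_done += a + b;                     /* two cores: both can issue */
    }
    printf("HT:        %ld ops, %ld ready-but-waiting thread-cycles\n", ht_done, ht_waits);
    printf("Two cores: %ld ops, 0 ready-but-waiting thread-cycles\n", dual_done);
    return 0;
}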
Originally posted by Quizzical: Hyperthreading allows two threads to be connected at once. [...] But ultimately, there's only one set of execution resources available, so there's only one opening out the bottom. The greatest possible throughput with hyperthreading is no greater than for just a single core.
You did not answer my question of what the 2 sets of registers are there for, and more importantly why. You do not really understand what happens to your thread past the FSB, therefore you cannot understand HT.
You keep looking at HT from the OS point of view; however, that is not how the threads are processed internally on the CPU.
There is a difference between multithreading - the ability of an application and the OS to manage multiple threads - and Hyperthreading - the ability of the CPU to process multiple threads simultaneously. One is software, the other is hardware. A difference you are utterly oblivious to.
So... umm...
So let's say the Red thread is running
MOV ebx, eax
And the Green thread is running
MOV ebx, 4h
And Hyperthreading swaps over from the Red to the Green thread. What value do we have in EBX when we go back to the Red thread?
The red thread is expecting the value of EAX to be in the EBX register.
The green thread, running independently, is expecting the number 4 to be in the EBX register.
So, since we don't exactly understand what is happening past the FSB - how does this work exactly with HT so that you don't corrupt another thread's data, and you don't invoke a massive context-switching penalty and have to swap out the entire set of registers every time you swap threads?
Originally posted by Gdemami: Yeah, HT allows truly concurrent processing of 2 threads...
Nope, not even close.
Concurrent processing means that on any given clock cycle, you are executing both processes simultaneously. HT absolutely cannot do that.
HT can take an otherwise idle processor and run a waiting thread, but it cannot run two threads simultaneously. It can only process the second thread if the first thread goes idle waiting for an external process or relinquishes itself (which is where the OS scheduler comes into play).
Out-of-order tricks can let you get one instruction a ways down the pipeline, and have multiple threads in the pipeline at any time - but again, you get back to: on any given clock cycle, you can only be executing a single thread of instructions.
Since the processor is sitting around waiting on something, it is a more efficient use of the processor's resources, and it can give the appearance of multiple threads executing simultaneously, but they are still executing in a serial fashion. That's why you get a variable performance increase of around 10-40%, based on how much dead time the particular process you are running has available and how aggressively the OS scheduler allows the threads to run. On any given clock cycle, the core can still only execute a single instruction.
But you are only truly processing concurrently if you are able to process simultaneous instructions on any given clock cycle.
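That variable speedup is measurable. A rough Linux-only C sketch, assuming pthreads and that logical CPUs 0 and 1 are HT siblings while 0 and 2 sit on separate cores (check /sys/devices/system/cpu/cpu0/topology/thread_siblings_list on the actual machine before trusting those labels):

/* Pin two compute-bound threads to chosen logical CPUs and time them.
   Build with: gcc -O2 -pthread ht_bench.c   (file name is arbitrary) */
#define _GNU_SOURCE
#include <pthread.h>
#include <sched.h>
#include <stdio.h>
#include <time.h>

static void *spin(void *arg) {
    volatile long x = 0;                          /* plain ALU work */
    for (long i = 0; i < 500000000L; i++) x += i;
    (void)arg;
    return NULL;
}

static void run_pair(int cpu_a, int cpu_b, const char *label) {
    pthread_t t[2];
    int cpus[2] = { cpu_a, cpu_b };
    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (int i = 0; i < 2; i++) {
        pthread_attr_t attr;
        cpu_set_t set;
        CPU_ZERO(&set);
        CPU_SET(cpus[i], &set);                   /* pin before the thread starts */
        pthread_attr_init(&attr);
        pthread_attr_setaffinity_np(&attr, sizeof set, &set);
        pthread_create(&t[i], &attr, spin, NULL);
        pthread_attr_destroy(&attr);
    }
    for (int i = 0; i < 2; i++) pthread_join(t[i], NULL);
    clock_gettime(CLOCK_MONOTONIC, &t1);
    printf("%s: %.2f s\n", label,
           (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9);
}

int main(void) {
    run_pair(0, 1, "two logical CPUs, same core (assumed siblings)");
    run_pair(0, 2, "two separate cores (assumed)");
    return 0;
}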
Thread pooling is managed by the OS, not the CPU. One is SW, the other is HW.
Running in circles...
Wait, you just accused us all of looking at this from an OS perspective and not looking at the hardware --
And that the OS had nothing to do with it...
And that HT could simultaneously run two threads, without OS intervention...
If hyper-threading could run two threads simultaneously it wouldn't need to exist in the first place. That's what you have two cores for. HT optimises a single core's processing of two threads but cannot run them at the same time.
If the OS has no knowledge of HT but does support SMP, it will not be very effective at all, as a single core with HT is nowhere near as powerful as two cores, even in ideal circumstances where the OS schedules with HT in mind.
Erm...no, apples and oranges there.
Where to start...I am not good at writing walls of text.
When the OS feeds data and instructions into the CPU, the instructions get decoded and low-level instructions (ops) are created. The instructions are fed by the OS in a particular order, but the ops are processed in a different order. Why?
Imagine there is a request to sum 4 values - 1+2+3+4.
Normally, this would take 3 cycles - 1+2, 3+3 and then 6+4. This is very inefficient.
Our example is simple, but real code consists of lots of conditions that determine the next step depending on a result - if value y, then run z; if value i, then run g; etc.
To help with this, pre-decoding and branch prediction are used in the decoding process. There, the decoder tries to figure out what the results and the next step will be without actually creating ops; the information is stored in a branch prediction table.
With branch prediction, our example would get processed as follows:
In the first cycle, values 1+2 and 3+4 are summed up. In the second cycle, we get 3+7.
We just saved 1 cycle!
But in order for this to work, the CPU needs to keep track of which ops belong to which instruction, since the ops are processed in a different order than the instructions. This tracking is stored in registers.
There is lots of scheduling done to utilize the execution units. Ops are processed in a different order so that idle time is minimized, and a single thread is processed in parallel - different execution units and "sub-units" are processing ops simultaneously.
Yet CPU utilization is low - there are still lots of branch-dependent calculations, instructions requiring the same execution unit, etc. The parallelism isn't all that great.
So now we have a picture of how a single thread is processed in parallel, but how do we increase utilization even further? Let's process one more thread!
And here comes HyperThreading: the CPU gets 2 virtual processors and 2 sets of registers to keep track of ops that belong to 2 different threads, and ops from the 2 threads can be processed simultaneously.
Very simplified, but that should cover the basics of how it works.
How much more powerful is HT? Well, that depends on the task. Imo, for gaming it provides that very nice boost that dual-core CPUs are lacking these days. The benchmarks are out there, so everyone can make their own judgement.
Branch prediction is for branching. The name is descriptive. For example:
if (x > 3) {
x += 5;
} else {
x *= 7;
}
If this is in a loop and x has been greater than three the last thousand times, then branch prediction might guess that x will still be greater than three the next time and that we're going to add five rather than multiply by seven. If there are latency issues such that waiting to find out for certain whether x is greater than three this time too would stall things, it might well start speculative execution and compute x+5, then figure out later whether we actually need that value. Or it might not; it depends on the architecture.
The addition example you gave doesn't involve any branching at all. Furthermore, that simple reordering of operations is something that a compiler can do at compile time (if it's a sensible thing to do on that particular architecture) so that you get the same benefit without even needing out of order execution.
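For instance, the dependent chain can be rewritten into two independent additions. A small C sketch of the idea (integer case, where the compiler may reassociate freely - for floating point it needs something like gcc's -ffast-math, since reassociation changes rounding):

/* Same total, different dependency shape. */
int sum_serial(const int v[4]) {
    return ((v[0] + v[1]) + v[2]) + v[3];   /* 3 adds, each waiting on the last */
}

int sum_paired(const int v[4]) {
    int a = v[0] + v[1];                    /* independent of... */
    int b = v[2] + v[3];                    /* ...this one: the two can overlap */
    return a + b;                           /* critical path: 2 adds, not 3 */
}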
But this has nothing to do with hyperthreading whatsoever. Hyperthreading does mean that a single core has multiple sets of registers and schedulers and such to handle multiple threads, but that doesn't allow both threads to execute instructions at the same time. Maybe one thread will figure out that it needs to compute 5+3 and the other thread figures out at the same time that it needs to compute 7*9. But that doesn't let the core actually compute 5+3 and 7*9 at the same time. Hyperthreading does not replicate the execution resources of a core, and while both threads may know what they need to compute, they still have to wait until there is an ALU free to actually carry out the desired computation.
Still stuck on OS level...pointless discussion.
Originally posted by Gdemami: In the first cycle, values 1+2 and 3+4 are summed up. In the second cycle, we get 3+7. We just saved 1 cycle!
Umm...
You can only add 1 thing at a time, because you only have 1 set of transistors dedicated to integer addition per core. There are also floating-point adders, and various multimedia extensions that include an addition function, but the ADD assembly instruction uses one particular set of transistors (the "adder", oddly enough), and there is only 1 set of those per core. Assembly instructions, as you may well know, translate directly into machine code, which is what runs directly on the CPU.
How do 1+2 and 3+4 get summed at the same time? Apart from that, it's a decent explanation of dynamic execution and how the CPU can use branch prediction to look ahead to try to optimize the instruction queue (although Quiz gets it better; you're just kinda scratching the surface there, mostly because you're doing it by accident, thinking you're talking about something else entirely), but that isn't the same thing at all as Hyperthreading. The older NetBurst P4s were infamous for having very deep instruction queues, which allowed them to do very extravagant out-of-order tricks. It turned out that deep queues really need faster cycles to get more performance, and Intel throttled back with the Core line, which actually has a shorter pipeline.
Also, your 1+2+3+4 example is the same thread. Or at least I am assuming so, because you put it on the same line of pseudocode. You could split it into multiple threads... but then you get into when you switch those threads, which you claim is definitely OS territory.
Yeah, tell me about accidental thinking when you ask those questions...
Originally posted by Ridelynn: I'm glad there is a Block feature on these forums. I'm sorry I even took the bait from this guy ><
When you do not truly understand how it works and I explain it to you, you just add your ignorance/misconceptions and keep arguing...vicious circle. You either want to discuss/learn or you want to argue; it is your choice.
http://en.wikipedia.org/wiki/Instruction-level_parallelism
There are 17 execution units that can run up to 8 ops within 1 cycle. Haswell got quite beefed up in parallel processing after Sandy and Ivy Bridge.