
Pascal and async, muddy waters?

HrimnirHrimnir Member RarePosts: 2,415

http://www.pcgamer.com/gtx-1080-review/

"As GPUs have become more programmable and more powerful, the workloads they run have changed. Early programmable shaders had some serious limitations, but with DX11/DX12 and other APIs, whole new realms of potential are opened up. The problem is that GPUs have generally lacked certain abilities that still limit what can be done. One of those is workload preemption, and with Pascal Nvidia brings their hardware up to the level of CPUs.

Maxwell had preemption at the command level, and now Pascal implements preemption at the pixel level for graphics, thread level for DX12, and instruction level for CUDA programs. This is basically as far as you need to go with preemption, as you can stop the rendering process at a very granular level. When utilized properly, preemption requires less than 100us and allows effective scheduling of time sensitive operations—asynchronous time warp in VR being a prime example.

Pascal has also added dynamic load balancing (for when a GPU is running graphics plus compute), where Maxwell used static partitioning. The goal is to allow a GPU to run both graphics and compute, since Nvidia GPUs can only run graphics commands or compute commands in a specific block. With Maxwell, the developer had to guess the workload split; now with Pascal, when either workload finishes, the rest of the GPU can pick up the remaining work and complete it more quickly.

AMD has made a lot of noise about GCN's asynchronous compute capabilities. What Nvidia is doing with preemption and dynamic load balancing isn't the same as asynchronous shaders, but it can be used to accomplish the same thing. We'll have to see how this plays out in developer use, but the potential is here. If you're wondering, AMD's async compute relies on the GCN architecture's ability to mix compute and graphics commands without the need to partition workloads. Preemption is still needed for certain tasks, however, like changing from one application workload to another."
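A quick back-of-the-envelope sketch puts those latency numbers in context for VR (the review only says Maxwell took "milliseconds", so the 2 ms figure below is an assumption for illustration):

```python
# Back-of-the-envelope: what one context switch costs as a share of a
# 90 Hz VR frame. The Pascal figure (<100 us) is from the review; the
# Maxwell figure is an illustrative assumption ("milliseconds"-scale).
VR_FRAME_BUDGET_S = 1 / 90           # ~11.1 ms per frame at 90 Hz

pascal_preempt_s = 100e-6            # ~100 microseconds (per the review)
maxwell_preempt_s = 2e-3             # assumed 2 ms, for illustration only

pascal_share = pascal_preempt_s / VR_FRAME_BUDGET_S
maxwell_share = maxwell_preempt_s / VR_FRAME_BUDGET_S

print(f"Pascal:  {pascal_share:.1%} of a 90 Hz frame")   # 0.9%
print(f"Maxwell: {maxwell_share:.1%} of a 90 Hz frame")  # 18.0%
```

On these assumed numbers a single Pascal preemption is roughly noise inside a VR frame, while a milliseconds-scale switch eats a large slice of it, which is why the review singles out asynchronous time warp.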

So, I'll be honest and state that my knowledge of the inner workings of GPUs is limited.  What I am gathering from this is that GCN has hardware-level asynchronous compute ability, and that Pascal essentially has instruction-level async compute.  So, not quite the same.  However, it *appears* that functionally it may not be an issue that it's not hardware-level, as at least from this review, the preemption effectively allows for the same effect.

All that being said, since it is implemented at the software level (albeit a low one), it does mean it will be reliant upon the software developer to properly implement it.
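That static-versus-dynamic distinction from the review can be illustrated with a toy timing model; every unit count and workload size below is made up purely for the sketch:

```python
# Toy model of Maxwell-style static partitioning vs Pascal-style dynamic
# load balancing. The 60/40 split guess and work amounts are invented.
TOTAL_UNITS = 100                    # pretend total GPU execution units

def static_partition(gfx_work, cmp_work, gfx_units):
    """Maxwell-style: the developer guesses a fixed split up front.
    Units on the side that finishes first sit idle afterwards."""
    cmp_units = TOTAL_UNITS - gfx_units
    return max(gfx_work / gfx_units, cmp_work / cmp_units)

def dynamic_balance(gfx_work, cmp_work):
    """Pascal-style idealization: when one workload drains, the freed
    units pick up the remaining work, so the whole GPU stays busy."""
    return (gfx_work + cmp_work) / TOTAL_UNITS

# The developer guessed 60/40, but the real workload turned out 80/20:
t_static = static_partition(gfx_work=800, cmp_work=200, gfx_units=60)
t_dynamic = dynamic_balance(gfx_work=800, cmp_work=200)
print(t_static, t_dynamic)   # ~13.3 vs 10.0 time units
```

The gap is entirely the cost of guessing the split wrong, which is the scenario the review says dynamic load balancing removes.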

Please feel free to correct me if I'm wrong or am misunderstanding the significance, or lack thereof, of these statements.



"The surest way to corrupt a youth is to instruct him to hold in higher esteem those who think alike than those who think differently."

- Friedrich Nietzsche

Comments

  • CleffyCleffy Member RarePosts: 6,413
    DX12 and Vulkan really do change how GPUs work. If you look at benchmarks there is only 1 DX12 title. It shows the 1080 getting beaten by AMD R9 Nanos in CrossFire. The Nano itself is equivalent to the R9 290X. When looking at cards with similar price points, the AMD cards are handily ahead of their nVidia counterparts.
    Pre-emption isn't going to magically make 8 schedulers appear. It's just a smarter scheduler trying to do the task of 8 dumber schedulers. Sure, it can do each task faster, but there's no way it can keep up with the volume. It's just a short-term fix. It's also not as important as it's made out to be. There is a substantial performance benefit, but it's not on the scale of 800%, because the workloads don't really over-saturate the schedulers.

    Also, how these shaders and game engines are built won't change just because nVidia isn't on board. Both major consoles use GCN-based hardware. The console developers and those who port to PC will continue maximizing the capability of the GCN hardware. It also doesn't seem like this will be different for any title developed using DX12 or Vulkan. The performance difference is just too great for it to be down to architecture-specific optimizations.

    I think this year will be a holdout year for nVidia as they properly integrate async capability into a future architecture, much the same way they did with Fermi until Maxwell arrived. At least this time around they didn't use wood screws.
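    Cleffy's scheduler argument is essentially a throughput claim: one fast scheduler versus several slower ones draining queues in parallel. A toy model (every rate and count below is invented for illustration) shows how the parallel side can win on volume even when the single scheduler is faster per dispatch:

```python
# Toy throughput model: one "smart" scheduler vs several slower ones
# working in parallel. All rates and command counts are invented.
def single_scheduler_time(n_commands, rate):
    """One scheduler draining n_commands at `rate` commands per tick."""
    return n_commands / rate

def parallel_scheduler_time(n_commands, n_schedulers, rate_each):
    """n_schedulers each independently draining an even share."""
    per_queue = n_commands / n_schedulers
    return per_queue / rate_each

# Say the single scheduler is 3x faster per dispatch, but faces 8 rivals:
t_single = single_scheduler_time(n_commands=8000, rate=3.0)
t_parallel = parallel_scheduler_time(n_commands=8000, n_schedulers=8,
                                     rate_each=1.0)
print(t_single, t_parallel)   # ~2666.7 vs 1000.0 ticks
```

Under these made-up numbers, eight schedulers at a third of the speed still finish the backlog well ahead of the lone fast one, which is the shape of the "can't keep up with the volume" claim.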
  • CleffyCleffy Member RarePosts: 6,413
    Because they will. Only about 5% of new games offer DX12 right now. That may go up with Microsoft's push to get PC games onto the Microsoft Store through console ports.
  • GdemamiGdemami Member EpicPosts: 12,342
    edited May 2016
    Cleffy said:
    Because they will. Only about 5% of new games offer DX12 right now.
    ...and how many of those use, and more importantly benefit from, async computation?

    The CPU isn't a bottleneck, with the exception of strategy titles, so it is questionable how much more performance an average title would get out of the feature.

    Time will tell.
  • CleffyCleffy Member RarePosts: 6,413
    Nearly all DX12 titles thus far have seen sizable benefits on AMD GPUs. There is a bit more going on behind the FPS numbers that is benefiting the GCN hardware.
  • GdemamiGdemami Member EpicPosts: 12,342
    edited May 2016
    Cleffy said:
    Nearly all DX12 titles thus far have seen sizable benefits on AMD GPUs. There is a bit more going on behind the FPS numbers that is benefiting the GCN hardware.

    Yes, but that doesn't mean they use async compute (or benefit from it), and generally there is no clear explanation for the significant performance difference either.
  • HrimnirHrimnir Member RarePosts: 2,415
    Cleffy said:
    Nearly all DX12 titles thus far have seen sizable benefits on AMD GPUs. There is a bit more going on behind the FPS numbers that is benefiting the GCN hardware.
    To be fair, that could be more down to AMD being bad at writing drivers (even by their own admission). Nvidia employs 2 full-time people whose entire job is to write drivers; AMD doesn't even have 1, they literally just take someone off something else and have them bang something out real quick.  So the fact that they are seeing gains could easily just be because their drivers had a lot more overhead than Nvidia's.

    Secondly, I still hesitate to treat either of the listed games as a good showcase for DX12: Ashes of the Singularity, by the developers' own admission, is not a good use case for async compute, and the other game kind of had DX12 hacked in, in that it wasn't designed for it from the ground up.

    Again, I really don't know.  I'm always a skeptic, particularly when something that specific gets heralded as the second coming of Christ by the company that happens to support it.  Nvidia has done that kind of nonsense with some of their GameWorks stuff that didn't pan out to be any significant benefit. Honestly, I feel it's a matter of wait and see.  However, from all the technical material I am reading from people in the know, the things they implemented with Pascal should clear up the majority of "issues" with async compute.


  • MalaboogaMalabooga Member UncommonPosts: 2,977
    edited May 2016
    Wut? Does that mean Nvidia sucks at writing drivers because their DX12 driver is so bad? Are you completely out of your mind? AMD doesn't have anyone to "write" drivers? Give me a source on that.

    They are all currently doing only light async because Nvidia is incapable of performing async. Both devs have said it's easy to implement and an easy performance win. The Hitman lead dev even prodded Nvidia and their "async". Also, consoles get up to 30% more performance from async. So yeah, the potential IS there.

    Nvidia still can't do it; they still have to switch between graphics and compute tasks, which still takes time. It's improved over Maxwell but still very rudimentary; most people guess that at least they won't LOSE performance from using async like Maxwell did.

    As for async, Deus Ex will use it, Warhammer will use it. And I'm pretty sure every EA Frostbite game will use it (there are rumors EA will go DX12-exclusive, as they say the advantages of ditching DX11 completely are tremendous, since current games deploy only a fraction of what DX12 brings to the table).

    The sooner DX11 is history, the better. And it's happening. 2016 is already a DX12 year, as pretty much every AAA game will have DX12.

    To put it into perspective, the GTX 1080 is 20-25% faster than the Titan X/980 Ti and that is touted as "great". But a 15-20% free performance boost just from async is somehow "worthless".

    And, to burst some bubbles before they even emerge - Vulkan can use Async too.
  • PhryPhry Member LegendaryPosts: 11,004
    DMKano said:
    Cleffy said:
    Because they will. Only about 5% of new games offer DX12 right now. That may go up with Microsoft's push to get PC games onto the Microsoft Store through console ports.

    It always depends on what the game developers choose to support - and that's almost always driven by what majority users are running.

    This is the reason why no dev studio right now is going crazy over anything that has a limited market - like VR and DX12.

    Game studios go for max sales potential - so whatever is the most available - that's usually the best platform to target.


    At the moment, that is probably DX11, as Win7 is still the most used OS. That may change in the future, but we're probably still years away from seeing a significant implementation of DX12. I don't think a few console ports to PC that are limited to UWP are really going to influence anyone. At the moment there is no guarantee that UWP will even succeed; it may well just be another GFWL, and like DX12 there is no incentive to develop for it unless there is a large enough userbase supporting it, which isn't the case at the moment, not by a long shot.
  • MalaboogaMalabooga Member UncommonPosts: 2,977
    edited May 2016
    Sorry, but Win10 is the most used OS among gamers. Publishers target a specific group of PC users, not the global OS install base.

    MS is pushing DX12 hard. They already have 3 DX12 exclusives (GOWU/QB/Forza). And in fact they will push it even harder (exclusive Win10 support for new (and some older) CPUs from both AMD and Intel, and so on).
    Post edited by Malabooga on
  • VrikaVrika Member LegendaryPosts: 7,973
    edited May 2016
    Malabooga said:
    Sorry, but Win10 is most used OS among gamers. Publisher target specific group among PC users, not global OS install base.
    Publishers don't arbitrarily limit their target audience.

    Remember when DX10 was launched? Microsoft made Halo DX10 exclusive to generate some hype, but large publishers all supported DX9 for years.
     
  • MalaboogaMalabooga Member UncommonPosts: 2,977
    oh they will, especially with "Incentives"

    EA already hinted at going DX12-exclusive quite fast on Frostbite.
  • MalaboogaMalabooga Member UncommonPosts: 2,977
    Pascal "improved" Async:

    directx-12-async-compute-performance-4k-gtx-1080-fury-x-gtx-980-ti

    It loses less performance than Maxwell lol
  • SEANMCADSEANMCAD Member EpicPosts: 16,775
    Hrimnir said:

    .. but with DX11/DX12 and other APIs...



    as a side note..

    that kind of phrasing really ticks me off. "And other", like what? OpenGL and... hmmm?

    Please do not respond to me, even if I ask you a question, its rhetorical.

    Please do not respond to me

  • SEANMCADSEANMCAD Member EpicPosts: 16,775
    Vulkan on a 1080 smashes expectations and DX12 is currently not working at all.

    how many times did the article mention Vulkan? zero
    How many times did it mention DX12? more than once.


  • SpottyGekkoSpottyGekko Member EpicPosts: 6,916
    Malabooga said:
    Sorry, but Win10 is most used OS among gamers. Publisher target specific group among PC users, not global OS install base.

    ....
    Ya think?

    The latest Steam survey numbers put both Win 10 and Win 7 at around 39% of respondents, with Win 8 at almost 13%.

    What is your source for that Win 10 dominance claim?
  • SEANMCADSEANMCAD Member EpicPosts: 16,775
    edited May 2016
    umm yeah...umm Vulkan?

    which works on all of it


  • QuizzicalQuizzical Member LegendaryPosts: 25,483
    The question isn't what the GPU can do eventually.  The question is how fast it can do things.  I read somewhere that Pascal's pre-emption takes about 100 microseconds to save everything off and switch contexts, and that's an improvement over Maxwell, which would have taken milliseconds.  But it's still not something you want to do several times per frame.

    What matters is not how fast a GPU can render a game in a particular manner.  What matters is how fast it can render the game at given settings, regardless of what is going on under the hood.  Sometimes different GPUs will take wildly different approaches to reach the same result, though this is the case less than it used to be.  Asynchronous compute is a means, not an end.  It's something to care about when trying to figure out why this game is relatively more pro-AMD and that one more pro-Nvidia, but not something that consumers should worry about too much in itself.

    If a GTX 1080 can't do asynchronous compute at all, but is twice as fast as a Fury X in games that use it, then the card is awesome and no one should care that it doesn't technically support asynchronous compute.  And if a GTX 1080 is half as fast as a Fury X in games that use it, then no matter how well it supports compute, the card is junk because it's slow.  The problem with asynchronous compute on Maxwell is that, when games could take advantage of the feature, AMD GPUs performed much better relative to Maxwell than in games that couldn't use asynchronous compute.
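    Quizzical's caution about preempting "several times per frame" can be checked with quick arithmetic; only the ~100 us figure comes from the thread, and the per-frame preemption counts below are arbitrary:

```python
# How much of a 60 Hz frame goes purely to context switches if we
# preempt N times per frame. Preempt counts are arbitrary examples.
FRAME_60HZ_S = 1 / 60                # ~16.7 ms frame budget
PREEMPT_S = 100e-6                   # ~100 us per switch, per the thread

def preempt_overhead(preempts_per_frame):
    """Fraction of a 60 Hz frame spent on context switches alone."""
    return preempts_per_frame * PREEMPT_S / FRAME_60HZ_S

for n in (1, 10, 50):
    print(n, f"{preempt_overhead(n):.1%}")
# 1 -> 0.6%, 10 -> 6.0%, 50 -> 30.0% of the frame
```

One switch per frame is cheap; dozens of them start to eat real rendering time, which is why preemption and fine-grained async compute are not interchangeable tools.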
  • RidelynnRidelynn Member EpicPosts: 7,383
    So I can test my understanding here, let me try to tl;dr this:

    Fury / Fury X are faster at DX12 than DX11, with Async Compute
    980 / 1080 are faster at DX11 than DX12, with Async Compute

    What I don't understand is why this matters so much, since the 1080 is faster than the Fury / Fury X at both DX11 and DX12.

    Now, this may not hold true once Vega comes out. It probably will remain unchanged with Polaris, since that isn't likely to be faster than Fury. But even if the 1080 is slower at DX12 than DX11 (marginally, it's not a huge amount), it still beats the Fury at DX12 even though the Fury gets a performance bump by using DX12.

    Or are we all scandalized by the fact that the 1080 "only" performs roughly the same at DX11 and DX12? 2%....
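    Ridelynn's tl;dr can be made concrete with hypothetical numbers; the FPS figures and percentages below are invented for the sketch, not benchmark data:

```python
# Hypothetical illustration of the tl;dr: the Fury gains from DX12 while
# the 1080 loses a little, yet the 1080 can still finish ahead overall.
# All FPS values and deltas are made up.
gtx1080_dx11 = 100.0
gtx1080_dx12 = gtx1080_dx11 * (1 - 0.02)    # "only" a ~2% drop
fury_dx11 = 70.0
fury_dx12 = fury_dx11 * (1 + 0.12)          # sizable async-compute gain

print(round(gtx1080_dx12, 1), round(fury_dx12, 1))   # 98.0 78.4
```

On these invented numbers, the card that loses a little under DX12 still ends up ahead of the card that gains, which is the distinction between relative uplift and absolute performance.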
  • MalaboogaMalabooga Member UncommonPosts: 2,977
    edited May 2016
    Ask Nvidia why they are advertising this "feature" they obviously don't really have...

    This ISN'T just about the 1080, but all Pascal cards vs all AMD cards. When someone buys a card, he looks at similar price points. Do you honestly believe Nvidia will sell 10-15% faster GPUs at the same price as AMD just to compensate for async?

    This is the present: DX12 with async is here. And Vulkan also has async.
  • QuizzicalQuizzical Member LegendaryPosts: 25,483
    Hrimnir said:
    Cleffy said:
    Nearly all DX12 titles thus far have seen sizable benefits on AMD GPUs. There is a bit more going on behind the FPS numbers that is benefiting the GCN hardware.
    To be fair, that could be more down to AMD being bad at writing drivers (even by their own admission). Nvidia employs 2 full-time people whose entire job is to write drivers; AMD doesn't even have 1, they literally just take someone off something else and have them bang something out real quick.  So the fact that they are seeing gains could easily just be because their drivers had a lot more overhead than Nvidia's.
    I'd be absolutely shocked if Nvidia and AMD don't both have a lot more than two people working on drivers.  Video drivers are incredibly complicated, and not the sort of thing that one person can crank out on his own.  Among other things, modern video drivers have to compile source code from many different shader and kernel languages to many different GPU architectures.
  • HrimnirHrimnir Member RarePosts: 2,415

    I'll have to dig up the link, Quizzical, but there was a statement on a forum where someone had shown links to statements from both companies in that regard.  I may not have bookmarked it.  But at the time I read it, they did provide links from employees within both companies that verified it.

    Like everything, that could have been wrong or changed by now.


  • MalaboogaMalabooga Member UncommonPosts: 2,977
    edited May 2016
    Until recently AMD had monthly WHQL driver releases. So no, they certainly did not have "no one working on drivers". You fell for some propaganda from a certain company.

    Quizzical said:
    The question isn't what the GPU can do eventually.  The question is how fast it can do things.  I read somewhere that Pascal's pre-emption takes about 100 microseconds to save everything off and switch contexts, and that's an improvement over Maxwell, which would have taken milliseconds.  But it's still not something you want to do several times per frame.

    What matters is not how fast a GPU can render a game in a particular manner.  What matters is how fast it can render the game at given settings, regardless of what is going on under the hood.  Sometimes different GPUs will take wildly different approaches to reach the same result, though this is the case less than it used to be.  Asynchronous compute is a means, not an end.  It's something to care about when trying to figure out why this game is relatively more pro-AMD and that one more pro-Nvidia, but not something that consumers should worry about too much in itself.

    If a GTX 1080 can't do asynchronous compute at all, but is twice as fast as a Fury X in games that use it, then the card is awesome and no one should care that it doesn't technically support asynchronous compute.  And if a GTX 1080 is half as fast as a Fury X in games that use it, then no matter how well it supports compute, the card is junk because it's slow.  The problem with asynchronous compute on Maxwell is that, when games could take advantage of the feature, AMD GPUs performed much better relative to Maxwell than in games that couldn't use asynchronous compute.

    Nvidia will show only "Nvidia-approved" PR benchmarks that completely omit async compute, but OTOH they will claim they have it, and that there are actual benefits to having it just like AMD does, and mislead their customers. Saying that consumers can just be ignorant about it... no.

    Once Nvidia includes a picture like the one up there to give complete information... until then it's buyer beware.

    Post edited by Malabooga on