While Nvidia's RTX line of cards has dedicated hardware support, with around 10-25% of Turing's GPU real estate reserved for ray tracing cores, AMD's forthcoming Navi cards won't have dedicated ray tracing hardware. That hasn't stopped AMD from working on ray tracing, though: the company has filed a patent here:
http://www.freepatentsonline.com/y2019/0197761.html
The patent, titled “Texture processor based ray tracing accelerator method and system,” describes a hybrid approach much like Nvidia's.
Around E3 there were confirmations and commentary that both the PS5 and the next Xbox will support ray tracing, so the patent details may not come as much of a surprise.
AMD's take on Ray Tracing techniques:
Software:
Ray tracing is a rendering technique that generates three-dimensional (3D) imagery by simulating the paths of photons in a scene. There are two primary approaches for implementing ray tracing: software based solutions that implement ray tracing purely in compute unit based shaders and fully hardware based solutions that implement the full ray tracing pipeline in hardware. Software based ray tracing solutions suffer drastically from the execution divergence of bounded volume hierarchy (BVH) traversal which can reduce performance substantially over what is theoretically achievable. Additionally, software based solutions fully utilize the shader resources, which prevents material shading and other work from being processed concurrently. Moreover, software based solutions are very power intensive and difficult to scale to higher performance levels without expending significant die area.
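To make the divergence problem concrete, here is a toy Python sketch (all names and the 1-D "scene" are hypothetical, purely for illustration) of the stack-based BVH traversal loop a compute shader would run. On a GPU, every ray in a wavefront executes this loop in lockstep, so rays that take different branches at each node leave lanes idle; that is the execution divergence the text refers to.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Toy 1-D "scene" for illustration: a ray is a point on a line,
# bounding boxes are intervals, and leaves hold hit distances.

@dataclass
class Node:
    lo: float
    hi: float
    children: List["Node"] = field(default_factory=list)
    hits: List[float] = field(default_factory=list)  # leaf payload

    @property
    def is_leaf(self) -> bool:
        return not self.children

def traverse(x: float, root: Node) -> Optional[float]:
    """Shader-style BVH traversal: per-ray stack, closest hit wins."""
    stack, best = [root], None
    while stack:
        node = stack.pop()
        if not (node.lo <= x <= node.hi):
            continue  # divergent branch: some rays skip, others descend
        if node.is_leaf:
            for t in node.hits:
                if best is None or t < best:
                    best = t
        else:
            stack.extend(node.children)
    return best
```

Run purely in shaders, every ray pays for its own branch decisions, which is why a software solution both diverges and monopolizes the shader resources.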
Hardware:
While hardware based solutions may have better performance and efficiency than software based solutions because they can completely eliminate divergence, they suffer from a lack of programmer flexibility as the ray tracing pipeline is fixed to a given hardware configuration. Hardware based solutions are also generally fairly area inefficient since they must keep large buffers of ray data to reorder memory transactions to achieve peak performance. These large buffers can be over twice as large as the fixed function logic that does the calculation. Moreover, fixed function hardware based solutions generally have high complexity as they have to replicate the scheduling of ray processing that would ordinarily be handled automatically in a software based solution.
What does AMD mean by hybrid approach:
The hybrid approach (fixed function acceleration for a single node of the BVH tree at a time, with a shader unit scheduling the processing) addresses the issues of solely hardware based and solely software based solutions. Flexibility is preserved since the shader unit can still control the overall calculation and can bypass the fixed function hardware where needed, while still getting the performance advantage of the fixed function hardware. In addition, by utilizing the texture processor infrastructure, the large buffers for ray storage and BVH caching typically required in a hardware ray tracing solution are eliminated, as the existing VGPRs and texture cache can be used in their place, which substantially reduces the area and complexity of the hardware solution.
The system includes a shader, texture processor (TP) and cache, which are interconnected. The TP includes a texture address unit (TA), a texture cache processor (TCP), a filter pipeline unit and a ray intersection engine. The shader sends a texture instruction which contains ray data and a pointer to a bounded volume hierarchy (BVH) node to the TA. The TCP uses an address provided by the TA to fetch BVH node data from the cache. The ray intersection engine performs ray-BVH node type intersection testing using the ray data and the BVH node data. The intersection testing results and indications for BVH traversal are returned to the shader via a texture data return path. The shader reviews the intersection results and the indications to decide how to traverse to the next BVH node.
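The division of labour described above can be sketched as a loop. This is a hypothetical Python stand-in (reusing a toy 1-D scene; `tp_intersect` is my illustrative name, not anything from the patent): the fixed-function unit tests exactly one BVH node per request, while the shader keeps the traversal loop and scheduling, which is why it can also bypass the unit for a custom traversal.

```python
# Sketch of the patent's shader / texture-processor split (all names
# hypothetical, 1-D toy scene for illustration only).

def tp_intersect(ray, node):
    """Stand-in for the texture processor's ray intersection engine:
    tests ONE node, returning children to visit plus any leaf hit."""
    if not (node["lo"] <= ray <= node["hi"]):
        return [], None                   # miss: nothing to traverse
    if "children" in node:
        return node["children"], None     # interior node: shader descends
    return [], min(node["hits"])          # leaf: closest hit at this leaf

def shader_traverse(ray, root):
    """Shader-side loop: issues one 'texture instruction' per node and
    decides from the returned results how to traverse the BVH."""
    stack, best = [root], None
    while stack:
        children, hit = tp_intersect(ray, stack.pop())
        if hit is not None and (best is None or hit < best):
            best = hit
        stack.extend(children)
    return best
```

Note the traversal state (the stack) stays on the shader side, which is the point: no large ray-reordering buffers in the fixed-function unit.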
AMD's take is that this hybrid approach will address some issues that can be found with solely hardware-based ray tracing solutions, and will bring major performance improvements to games taking advantage of it.
We'll have to wait and see....
Comments
Think of tensors as a mathematical operation of matrix multiplication and addition - [4x4] * [4x4] + [4x4]. It is just a matter of "organization" of ALUs performing basic operations.
Same ALUs used for any other computation, there are no extra "cores".
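For readers unfamiliar with the notation, the operation the comment describes is the fused multiply-add D = A×B + C on 4×4 matrices. A plain Python sketch (illustration only; a tensor core performs all 64 multiply-adds of this loop nest as a single operation):

```python
def mma_4x4(A, B, C):
    """D = A @ B + C for 4x4 matrices: the tensor-core primitive,
    spelled out as 4*4*4 = 64 multiply-adds plus the accumulate."""
    return [[sum(A[i][k] * B[k][j] for k in range(4)) + C[i][j]
             for j in range(4)]
            for i in range(4)]
```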
RTX cards have RT cores for ray tracing. Afaik those are pure ray tracing cores.
from wiki
Those are just mathematical operations, simple ones, performed in a cluster, it is an extension of ALU unit, not a separate computational unit itself.
I'm not sure exactly what Nvidia did for ray-tracing, but what counts as a "core" doesn't have a completely clear definition. Is a shader a "core"? If so, then what about the special function units? Are those cores on their own, or do you have to combine them with a shader to be a core? What about when Turing/Volta split the integer and floating-point functionality into separate parts? Are those each cores, or do you have to pick one of each to make a core? If it takes one each of multiple things to make a core, then what if there is more of one type of thing than another?
And what about other things on a GPU? Is a texture mapping unit a core? How about a raster unit? A tessellation unit? A render output? Even if you say "no" because they're fixed function logic that isn't functional on its own, that's true of shaders, too.
No matter whether you are doing rasterization, ray tracing or matrix computations, you are still using the same SM. There are no extra physical circuits to do the computation; they all use the same ALUs.
Just look at how a GTX card processes ray tracing - within the SM, on CUDA cores.
An RTX card doesn't do it differently, it only has an added pipeline for intersection evaluation - 1x "RT Core" per SM - which saves tons of cycles and speeds up the ray tracing algorithm significantly.
The "RT core" is just an extension of the ALU.
Link to larger image: https://i.imgur.com/kf6noSM.jpg
I think it's irrelevant whether it's an extension to ALU or something else. It's dedicated hardware.
Dedicated workstations were eventually replaced by "professional" gpus. NVidia entered the market for specialised gpus c. 20 years ago and in that time has established a dominant position commanding c. 70% of the market; AMD basically has the rest today. (There were other companies.)
What has changed is that first NVidia and now AMD have decided to dangle ray tracing for games.
Why Turing? The answer probably has multiple reasons. Obvious ones probably include:
- an update for their professional cards;
- spread development costs across professional and gaming;
- a big marketing splash;
- an attempt to keep discrete gaming gpus relevant - they have been following the path of discrete sound cards toward extinction. If, near term, developers create games that absolutely have to have ray tracing then, they hope, this should slow the onward march of on-board gpus.
AMD probably the same - although arguably they are in a much stronger position when it comes to discrete gpus (supplying both the next gen XBox and PS). Maybe they would be happy to see the demise of (most) gpus if they are able to get the bulk of the on-board gpu market!
Bottom line: nothing has been "rushed" when it comes to ray tracing.
Dedicated hardware makes sense, but it's wasted resources for what amounts to more accurate reflections than cube maps. You could probably do it with the current stream processors from both AMD and nVidia. As of right now, with AMD not supporting ray tracing and only high end nVidia cards supporting it, there won't be much developer support. I expect any future iterations won't support nVidia's solution.
1) You could technically use the tensor cores for denoising,
2) Using tensor cores for denoising runs fast enough to be a worthwhile thing to do if you have them available, and
3) Tensor cores are such an enormous speed improvement for denoising as to justify the die space they take on the GPU.
(1) is likely to be true. I'm more skeptical of (2), but it's at least plausible. I'm extremely skeptical of (3).
Doing a 4x4 matrix multiply-add of half-precision floating-point numbers is a really weird thing to do. Even if you need some half-precision arithmetic, that's far more rigid than ordinary packed math such as Vega or the lower end Turing cards offer. And that matrix multiply-add is 8 times the cost of an ordinary pair of half-precision fma operations, so you'd better be using a whole lot of the matrices to justify it.
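To put rough numbers on that cost comparison (my arithmetic, not from the comment; the second figure is one possible reading of the "8 times" claim, counting input operands rather than FMAs):

```python
# Rough operation and operand counts for D = A @ B + C on 4x4 matrices
# (illustrative arithmetic only).
n = 4
fmas = n * n * n                  # 4*4*4 = 64 fused multiply-adds per matmul-add
operands = 3 * n * n              # reads A, B and C: 48 input values
pair_operands = 2 * 3             # a pair of scalar FMAs reads 6 values
print(fmas)                       # 64
print(operands // pair_operands)  # 8x the input data of an FMA pair
```

Either way you count, a workload needs to be dense with these matrix operations before the fixed 4×4 shape pays for itself.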