While Nvidia's RTX line of cards has dedicated hardware support, with around 10-25% of Turing's GPU real estate reserved for ray tracing cores, AMD's forthcoming Navi cards won't have dedicated ray tracing hardware. That hasn't stopped AMD from working on ray tracing, though: the company has filed a patent here:
http://www.freepatentsonline.com/y2019/0197761.html
The patent, titled “Texture processor based ray tracing accelerator method and system,” describes a hybrid approach much like Nvidia's.
Around E3 there were confirmations and commentary that both the PS5 and the next Xbox will support ray tracing, so the patent details may not come as much of a surprise.
AMD's take on Ray Tracing techniques:
Software:
Ray tracing is a rendering technique that generates three-dimensional (3D) imagery by simulating the paths of photons in a scene. There are two primary approaches for implementing ray tracing: software based solutions that implement ray tracing purely in compute unit based shaders and fully hardware based solutions that implement the full ray tracing pipeline in hardware. Software based ray tracing solutions suffer drastically from the execution divergence of bounded volume hierarchy (BVH) traversal which can reduce performance substantially over what is theoretically achievable. Additionally, software based solutions fully utilize the shader resources, which prevents material shading and other work from being processed concurrently. Moreover, software based solutions are very power intensive and difficult to scale to higher performance levels without expending significant die area.
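To make the divergence problem concrete, here is a toy Python sketch (all names and the 1-D "scene" are hypothetical, purely for illustration) of the stack-based BVH traversal loop a compute shader would run. On a GPU, every ray in a wavefront executes this loop in lockstep, so rays that take different branches at each node leave lanes idle; that is the execution divergence the text refers to.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Toy 1-D "scene" for illustration: a ray is a point on a line,
# bounding boxes are intervals, and leaves hold hit distances.

@dataclass
class Node:
    lo: float
    hi: float
    children: List["Node"] = field(default_factory=list)
    hits: List[float] = field(default_factory=list)  # leaf payload

    @property
    def is_leaf(self) -> bool:
        return not self.children

def traverse(x: float, root: Node) -> Optional[float]:
    """Shader-style BVH traversal: per-ray stack, closest hit wins."""
    stack, best = [root], None
    while stack:
        node = stack.pop()
        if not (node.lo <= x <= node.hi):
            continue  # divergent branch: some rays skip, others descend
        if node.is_leaf:
            for t in node.hits:
                if best is None or t < best:
                    best = t
        else:
            stack.extend(node.children)
    return best
```

Run purely in shaders, every ray pays for its own branch decisions, which is why a software solution both diverges and monopolizes the shader resources.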
Hardware:
While hardware based solutions may have better performance and efficiency than software based solutions because they can completely eliminate divergence, they suffer from a lack of programmer flexibility as the ray tracing pipeline is fixed to a given hardware configuration. Hardware based solutions are also generally fairly area inefficient since they must keep large buffers of ray data to reorder memory transactions to achieve peak performance. These large buffers can be over twice as large as the fixed function logic that does the calculation. Moreover, fixed function hardware based solutions generally have high complexity as they have to replicate the scheduling of ray processing that would ordinarily be handled automatically in a software based solution.
What does AMD mean by hybrid approach:
The hybrid approach (fixed function acceleration for a single node of the BVH tree at a time, with a shader unit scheduling the processing) addresses the issues of solely hardware based and solely software based solutions. Flexibility is preserved since the shader unit can still control the overall calculation and can bypass the fixed function hardware where needed, while still getting the performance advantage of the fixed function hardware. In addition, by utilizing the texture processor infrastructure, the large buffers for ray storage and BVH caching typically required in a hardware ray tracing solution are eliminated, as the existing VGPRs and texture cache can be used in their place, which substantially reduces the area and complexity of the hardware solution.
The system includes a shader, texture processor (TP) and cache, which are interconnected. The TP includes a texture address unit (TA), a texture cache processor (TCP), a filter pipeline unit and a ray intersection engine. The shader sends a texture instruction which contains ray data and a pointer to a bounded volume hierarchy (BVH) node to the TA. The TCP uses an address provided by the TA to fetch BVH node data from the cache. The ray intersection engine performs ray-BVH node type intersection testing using the ray data and the BVH node data. The intersection testing results and indications for BVH traversal are returned to the shader via a texture data return path. The shader reviews the intersection results and the indications to decide how to traverse to the next BVH node.
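The division of labour described above can be sketched as a loop. This is a hypothetical Python stand-in (reusing a toy 1-D scene; `tp_intersect` is my illustrative name, not anything from the patent): the fixed-function unit tests exactly one BVH node per request, while the shader keeps the traversal loop and scheduling, which is why it can also bypass the unit for a custom traversal.

```python
# Sketch of the patent's shader / texture-processor split (all names
# hypothetical, 1-D toy scene for illustration only).

def tp_intersect(ray, node):
    """Stand-in for the texture processor's ray intersection engine:
    tests ONE node, returning children to visit plus any leaf hit."""
    if not (node["lo"] <= ray <= node["hi"]):
        return [], None                   # miss: nothing to traverse
    if "children" in node:
        return node["children"], None     # interior node: shader descends
    return [], min(node["hits"])          # leaf: closest hit at this leaf

def shader_traverse(ray, root):
    """Shader-side loop: issues one 'texture instruction' per node and
    decides from the returned results how to traverse the BVH."""
    stack, best = [root], None
    while stack:
        children, hit = tp_intersect(ray, stack.pop())
        if hit is not None and (best is None or hit < best):
            best = hit
        stack.extend(children)
    return best
```

Note the traversal state (the stack) stays on the shader side, which is the point: no large ray-reordering buffers in the fixed-function unit.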
AMD's take is that this hybrid approach will address some issues that can be found with solely hardware-based ray tracing solutions, and will bring major performance improvements to games taking advantage of it.
We'll have to wait and see....
Comments
Think of tensors as a mathematical operation of matrix multiplication and addition - [4x4] * [4x4] + [4x4]. It is just a matter of "organization" of ALUs performing basic operations.
Same ALUs used for any other computation, there are no extra "cores".
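For readers unfamiliar with the notation, the operation the comment describes is the fused multiply-add D = A×B + C on 4×4 matrices. A plain Python sketch (illustration only; a tensor core performs all 64 multiply-adds of this loop nest as a single operation):

```python
def mma_4x4(A, B, C):
    """D = A @ B + C for 4x4 matrices: the tensor-core primitive,
    spelled out as 4*4*4 = 64 multiply-adds plus the accumulate."""
    return [[sum(A[i][k] * B[k][j] for k in range(4)) + C[i][j]
             for j in range(4)]
            for i in range(4)]
```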
RTX cards have RT cores for ray tracing. Afaik those are pure ray tracing cores.
from wiki
Those are just mathematical operations, simple ones, performed in a cluster, it is an extension of ALU unit, not a separate computational unit itself.
I'm not sure exactly what Nvidia did for ray-tracing, but what counts as a "core" doesn't have a completely clear definition. Is a shader a "core"? If so, then what about the special function units? Are those cores on their own, or do you have to combine them with a shader to be a core? What about when Turing/Volta split the integer and floating-point functionality into separate parts? Are those each cores, or do you have to pick one of each to make a core? If it takes one each of multiple things to make a core, then what if there is more of one type of thing than another?
And what about other things on a GPU? Is a texture mapping unit a core? How about a raster unit? A tessellation unit? A render output? Even if you say "no" because they're fixed function logic that isn't functional on its own, that's true of shaders, too.
No matter whether you are doing rasterization, ray tracing or matrix computations, you are still using the same SM. There are no extra physical circuits to do the computation; they all use the same ALUs.
Just look at how a GTX card processes ray tracing - within the SM, on CUDA cores.
An RTX card doesn't do it differently, it only has an added pipeline for intersection evaluation - 1x "RT Core" per SM - which saves tons of cycles and speeds up the ray tracing algorithm significantly.
The "RT core" is just an extension of the ALU.
Link to larger image: https://i.imgur.com/kf6noSM.jpg
I think it's irrelevant whether it's an extension to ALU or something else. It's dedicated hardware.
Dedicated workstations were eventually replaced by "professional" gpus. NVidia entered the market for specialised gpus c. 20 years ago and in that time has established a dominant position commanding c. 70% of the market; AMD basically has the rest today. (There were other companies.)
What has changed is that first NVidia and now AMD have decided to dangle ray tracing for games.
Why Turing? The answer probably has multiple reasons. Obvious ones probably include:
- an update for their professional cards;
- spread development costs across professional and gaming;
- a big marketing splash;
- an attempt to keep discrete gaming gpus relevant - they have been following the path of discrete sound cards toward extinction. If, near term, developers create games that absolutely have to have ray tracing then, they hope, this should slow the onward march of on-board gpus.
AMD probably the same - although arguably they are in a much stronger position when it comes to discrete gpus (supplying both the next gen XBox and PS). Maybe they would be happy to see the demise of (most) gpus if they are able to get the bulk of the on-board gpu market!
Bottom line: nothing has been "rushed" when it comes to ray tracing.
Dedicated hardware makes sense, but it's wasted resources for what amounts to more accurate reflections than cube maps. You could probably do it with the current stream processors from both AMD and nVidia. As of right now, with AMD not supporting ray tracing and only high end nVidia cards supporting it, there won't be much developer support. I expect any future iterations won't support nVidia's solution.
1) You could technically use the tensor cores for denoising,
2) Using tensor cores for denoising runs fast enough to be a worthwhile thing to do if you have them available, and
3) Tensor cores are such an enormous speed improvement for denoising as to justify the die space they take on the GPU.
(1) is likely to be true. I'm more skeptical of (2), but it's at least plausible. I'm extremely skeptical of (3).
Doing a 4x4 matrix multiply-add of half-precision floating-point numbers is a really weird thing to do. Even if you need some half-precision arithmetic, that's far more rigid than ordinary packed math such as Vega or the lower end Turing cards offer. And that matrix multiply-add is 8 times the cost of an ordinary pair of half-precision fma operations, so you'd better be using a whole lot of the matrices to justify it.
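To put rough numbers on that cost comparison (my arithmetic, not from the comment; the second figure is one possible reading of the "8 times" claim, counting input operands rather than FMAs):

```python
# Rough operation and operand counts for D = A @ B + C on 4x4 matrices
# (illustrative arithmetic only).
n = 4
fmas = n * n * n                  # 4*4*4 = 64 fused multiply-adds per matmul-add
operands = 3 * n * n              # reads A, B and C: 48 input values
pair_operands = 2 * 3             # a pair of scalar FMAs reads 6 values
print(fmas)                       # 64
print(operands // pair_operands)  # 8x the input data of an FMA pair
```

Either way you count, a workload needs to be dense with these matrix operations before the fixed 4×4 shape pays for itself.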