public:cs:rasterization_on_larrabee

差别

这里会显示出您选择的修订版和当前版本之间的差别。

到此差别页面的链接

两侧同时换到之前的修订记录 前一修订版
后一修订版
前一修订版
public:cs:rasterization_on_larrabee [2015/09/21 19:06] – [A high-level view of Larrabee rasterization] oakfirepublic:cs:rasterization_on_larrabee [2021/01/06 17:31] (当前版本) – [Tile assignment] oakfire
行 277: 行 277:
 非光栅化也很类似这个, 我们很快就会看到. 非光栅化也很类似这个, 我们很快就会看到.
 ==== Tile assignment ==== ==== Tile assignment ====
 +瓦片(Tile)的分配
  
 As noted earlier, Larrabee uses chunked (also known as binned or tiled) rendering, where the target is divided into multiple rectangles, called tiles. The rendering commands are sorted according to the tiles they touch and stored in the corresponding bins, and then the contents of each bin are rendered separately to the corresponding tile. It's a bit complex, but it considerably improves cache coherence and parallelization. As noted earlier, Larrabee uses chunked (also known as binned or tiled) rendering, where the target is divided into multiple rectangles, called tiles. The rendering commands are sorted according to the tiles they touch and stored in the corresponding bins, and then the contents of each bin are rendered separately to the corresponding tile. It's a bit complex, but it considerably improves cache coherence and parallelization.
 +
 +之前提到, Larrabee 使用块渲染, 把目标分成多个四边形, 称为瓦片(Tile). 渲染指令按瓦片被碰触的情况来排序,并存储到相应容器, 然后每个容器的指令分别执行渲染相应的瓦片. 这有点复杂,  但它改进了缓冲的一致性与并行性.
  
 For chunking, rasterization consists of two steps; the first identifies which tiles a triangle touches, and the second rasterizes the triangle within each tile. So it's a two-stage process, and I'm going to discuss the two stages separately. For chunking, rasterization consists of two steps; the first identifies which tiles a triangle touches, and the second rasterizes the triangle within each tile. So it's a two-stage process, and I'm going to discuss the two stages separately.
 +
 +对于分块, 光栅化由两步组成; 第一步确认哪些瓦片碰触到三角形, 第二步对每个瓦片进行光栅化. 所以这个过程有两个阶段, 我将会分开讨论这两个阶段.
  
 Figure 12 shows an example of a triangle to be drawn to a tiled render target. The light blue area is a 256x256 render target, subdivided into four 128x128 tiles. Figure 12 shows an example of a triangle to be drawn to a tiled render target. The light blue area is a 256x256 render target, subdivided into four 128x128 tiles.
 +
 +图例 12 显示了一个三角形将被画到一个瓦片的例子. 亮蓝色区域是 256x256 的渲染对象, 被分成四个 128x128 的瓦片.
  
 {{:public:cs:fig_12.jpg?direct|}} {{:public:cs:fig_12.jpg?direct|}}
  
 **Figure 12**: A triangle to be drawn to a 256x256 render target consisting of four 128x128 tiles. **Figure 12**: A triangle to be drawn to a 256x256 render target consisting of four 128x128 tiles.
 +
 +图例 12 : 一个三角形要被画到由四个 128x128 瓦片组成的 256x256 渲染对象上.
  
 With Larrabee's chunking architecture, the first step in rasterizing the triangle in Figure 12 is to determine which tiles the triangle touches, and put the triangle in the bins for those tiles, tiles 1 and 3. Once all triangles have been binned, the second step, intra-tile rasterization, is to rasterize the triangle to tile 1 when the tile 1 bin is rendered, and to rasterize the triangle to tile 3 when the tile 3 bin is rendered. With Larrabee's chunking architecture, the first step in rasterizing the triangle in Figure 12 is to determine which tiles the triangle touches, and put the triangle in the bins for those tiles, tiles 1 and 3. Once all triangles have been binned, the second step, intra-tile rasterization, is to rasterize the triangle to tile 1 when the tile 1 bin is rendered, and to rasterize the triangle to tile 3 when the tile 3 bin is rendered.
 +
 +对于 Larrabee 的分块架构, 光栅化图例 12 三角形的第一步是判定哪些瓦片碰触到三角形, 然后把三角形数据放到这些瓦片容器里. 所有三角形都放置完毕后, 第二步就到瓦片内部进行光栅化, 当瓦片 1 被渲染时进行光栅化瓦片 1 中的三角形, 当瓦片 3 被渲染时光栅化 瓦片 3 中的三角形.
  
 Assignment of triangles to tiles can easily be performed for relatively small triangles - say, up to a tile in size, which covers 90% of all triangles - by doing bounding box tests. For example, it would be easy with bounding box tests to find out what two tiles the triangle in Figure 12 is in. Larger triangles are currently assigned to tiles by simply walking the bounding box and testing each tile against the triangle; that doesn't sound very efficient, but tile-assignment time is generally an insignificant part of total rendering time for larger triangles, since there's usually a lot of shading work and the like to do for those triangles. However, if large-triangle tile assignment time does turn out to be significant, we could use a sweep approach, as discussed earlier, or a variation of the hierarchical approach used for intra-tile rasterization, which I'll discuss next. This is a good example of how a CPU makes it easy to use two completely different approaches in order to do the 90% case well and the 10% case adequately (or well, but in a different way), rather than having to have one size fit all. Assignment of triangles to tiles can easily be performed for relatively small triangles - say, up to a tile in size, which covers 90% of all triangles - by doing bounding box tests. For example, it would be easy with bounding box tests to find out what two tiles the triangle in Figure 12 is in. Larger triangles are currently assigned to tiles by simply walking the bounding box and testing each tile against the triangle; that doesn't sound very efficient, but tile-assignment time is generally an insignificant part of total rendering time for larger triangles, since there's usually a lot of shading work and the like to do for those triangles. However, if large-triangle tile assignment time does turn out to be significant, we could use a sweep approach, as discussed earlier, or a variation of the hierarchical approach used for intra-tile rasterization, which I'll discuss next. This is a good example of how a CPU makes it easy to use two completely different approaches in order to do the 90% case well and the 10% case adequately (or well, but in a different way), rather than having to have one size fit all.
 +
 +通过使用盒包围测试能让相对小的三角形易分配到瓦片 - 所谓相对小是瓦片大小能达到覆盖三角形的90% :?:. 例如, 很容易使用盒包围测试来找出图例 12 中三角形在哪两个瓦片中. 更大的三角形的分配是移动盒包围与测定每个瓦片相对三角形位置; 这听起来不怎么有效率, 但瓦片分配时间在整个大三角形渲染中通常微不足道. 然而, 如果大三角形瓦片分配时间比重变大, 我们可以使用之前提到的扫描法, 或者使用瓦片内部光栅化的分级方法的变种, 这之后会讨论到.  这是个好例子, 来看 CPU 怎么方便地使用两种完全不同的方法来干好 90% 的情况并适当处理(或是做好, 用另一种办法)剩余 10% 的情况, 而不是只能采用一种方法来通用处理.
  
 Large-triangle assignment to tiles is performed with scalar code, for simplicity and because it's not a significant performance factor. Let's look at how that scalar process works, because it will help us understand vectorized intra-tile rasterization later. I'll use a small triangle for the example, for simplicity and to make the figures legible, but as noted above, normally such a small triangle would be assigned to its tile or tiles using bounding box tests. Large-triangle assignment to tiles is performed with scalar code, for simplicity and because it's not a significant performance factor. Let's look at how that scalar process works, because it will help us understand vectorized intra-tile rasterization later. I'll use a small triangle for the example, for simplicity and to make the figures legible, but as noted above, normally such a small triangle would be assigned to its tile or tiles using bounding box tests.
 +
 +大三角形的瓦片分配使用标量代码执行, 因为简单而且它不是性能的重要因素. 让我们看看标量过程是怎么工作的, 这会帮助我们理解之后的瓦片内光栅化的向量化. 为了让图例简单易懂, 我将使用一个小三角形的例子, 但如上面提到过的, 正常情况下小三角形是用盒包围测试来分配瓦片的.
  
 Once we've set up the equation for an edge (by calculating B and C, as discussed when we looked at Figure 1), the first thing we do is calculate its value at the trivial reject corner of each tile. The trivial reject corner is the corner at which an edge's equation is most negative within a tile; the selection of the trivial reject corner for a given edge is based on its slope, as we'll see shortly. We set things up so that negative means inside in order to allow us to generate masks directly from the sign bit, so you can think of the trivial reject corner as the point in the tile that's most inside the edge. If this point isn't inside the edge, no point in the tile can be inside the edge, and therefore the whole triangle can be ignored for that tile. Once we've set up the equation for an edge (by calculating B and C, as discussed when we looked at Figure 1), the first thing we do is calculate its value at the trivial reject corner of each tile. The trivial reject corner is the corner at which an edge's equation is most negative within a tile; the selection of the trivial reject corner for a given edge is based on its slope, as we'll see shortly. We set things up so that negative means inside in order to allow us to generate masks directly from the sign bit, so you can think of the trivial reject corner as the point in the tile that's most inside the edge. If this point isn't inside the edge, no point in the tile can be inside the edge, and therefore the whole triangle can be ignored for that tile.
 +
 +当我们建立三角形一条边的方程式后( 计算 B 和 C 系数, 如图例 1 我们所讨论的), 我们做的第一件事是计算每个瓦片的平凡拒绝角:?:的值. 平凡拒绝角就是瓦片里该边方程式值最小的角; 该边的平凡拒绝角的选择基于边的斜率, 我们很快就会看到. 前面我们已经确立了负值意味着在内部以方便直接遮罩标志位, 所以你可以认为一个瓦片的平凡拒绝角是该瓦片在该条边上最内部的点. 如果这个点不在该边内部, 那么就该瓦片就没有点在该边内部, 因此该瓦片就可以忽略整个三角形.
  
 Figure 13 shows the trivial reject test in action. Tile 0 is trivially rejected for the black edge and can be ignored, because its trivial reject corner is positive, and therefore the whole tile must be positive and must lie outside the triangle, while the other three tiles must be investigated further. You can see here how the trivial reject corner is the corner of each tile most inside the black edge; that is, the point with the most negative value in the tile. Figure 13 shows the trivial reject test in action. Tile 0 is trivially rejected for the black edge and can be ignored, because its trivial reject corner is positive, and therefore the whole tile must be positive and must lie outside the triangle, while the other three tiles must be investigated further. You can see here how the trivial reject corner is the corner of each tile most inside the black edge; that is, the point with the most negative value in the tile.
 +
 +图例 13 显示了测定平凡拒绝角的手法. 瓦片 0 被黑边平凡拒绝, 可以忽略, 因为它的平凡拒绝角是正的值, 因此整个瓦片肯定都是正的,位于三角形之外, 而另外三个瓦片则需要进一步研究. 这里你可以看到平凡拒绝角是每个瓦片在黑边最内部的角; 即瓦片上值最小的点.
  
 {{:public:cs:fig_13.jpg?direct|}} {{:public:cs:fig_13.jpg?direct|}}
  
 **Figure 13**: The tile trivial reject test. **Figure 13**: The tile trivial reject test.
 +
 +图例 13 瓦片平凡拒绝角的测定.
  
 Note that which corner is the trivial reject corner will vary from edge to edge, depending on slope. For example, it would be the lower left corner of each tile for the edge shown in red in Figure 14, because that's the corner that's most inside that edge. Note that which corner is the trivial reject corner will vary from edge to edge, depending on slope. For example, it would be the lower left corner of each tile for the edge shown in red in Figure 14, because that's the corner that's most inside that edge.
 +
 +注意哪个角是平凡拒绝角会随着边变化, 取决于边斜率. 比如, 图例 14 中红边的平凡拒绝角是各个瓦片的坐下角, 因为这是该边最内部的角.
  
 {{:public:cs:fig_14.jpg?direct|}} {{:public:cs:fig_14.jpg?direct|}}
  
 **Figure 14**: Which corner is the trivial reject corner varies with edge slope. Here the lower left corner of each tile is the trivial reject corner. **Figure 14**: Which corner is the trivial reject corner varies with edge slope. Here the lower left corner of each tile is the trivial reject corner.
 +
 +图例 14 : 平凡拒绝角随边斜率而改变, 这里各个瓦片的平凡拒绝角是左下角.
  
 If you understand what we've just discussed, you're ninety percent of the way to understanding the whole Larrabee rasterizer. The trivial reject test is actually very straightforward once you understand it - it's just a matter of evaluating the sign of a simple equation at the right point - but it can take a little while to get it, so you may find it useful to re-read the previous section if you're at all uncertain or confused. If you understand what we've just discussed, you're ninety percent of the way to understanding the whole Larrabee rasterizer. The trivial reject test is actually very straightforward once you understand it - it's just a matter of evaluating the sign of a simple equation at the right point - but it can take a little while to get it, so you may find it useful to re-read the previous section if you're at all uncertain or confused.
 +
 +如果你理解了刚才我们所讨论的, 你就能理解整个 Larrabee 光栅化的百分之九十. 平凡拒绝角测试是很直接的, 只要你理解了它 - 它仅是关于对适当的点进行方程式求值的问题 - 但理解它需要花费一些时间, 如果你还是很困惑, 重读一遍前面内容会很有用.
  
 So that's the tile trivial reject test. The other tile test is the trivial accept test. For this, we take the value at the trivial reject corner (the corner we just discussed) and add the amount that the edge equation changes for a step all the way to the diagonally opposite tile corner, the tile trivial accept corner. This is the point in the tile where the edge equation is most positive; you can think of this as the point in the tile that's most outside the edge. If the trivial accept corner for an edge is negative, that whole tile is trivially accepted for that edge, and there's no need to consider that edge when rasterizing within the tile. So that's the tile trivial reject test. The other tile test is the trivial accept test. For this, we take the value at the trivial reject corner (the corner we just discussed) and add the amount that the edge equation changes for a step all the way to the diagonally opposite tile corner, the tile trivial accept corner. This is the point in the tile where the edge equation is most positive; you can think of this as the point in the tile that's most outside the edge. If the trivial accept corner for an edge is negative, that whole tile is trivially accepted for that edge, and there's no need to consider that edge when rasterizing within the tile.
 +
 +这就是平凡拒绝角测验. 另一个瓦片测验是平凡接受测验. 这个测验从获取平凡拒绝角((刚讨论过的)边方程式的值转变成获取该瓦片对角的, 即平凡接受角. 这是瓦片上边方程式值最大的点; 你可以认为是该瓦片上离该边最外部的点. 如果该边的平凡接受角是负的, 那么整个瓦片被该边平凡接受, 那么就没必要在该瓦片光栅化时考虑这条边了.
  
 Figure 15 shows the trivial accept test in action. Since the trivial accept corner is the corner at which the edge's equation is most positive, if this point is negative - and therefore inside the edge - all points in the tile must be inside the edge. Thus, tiles 0 and 1 are not trivially accepted for the black edge, because the equation for the black edge is positive at their trivial accept corners, but tiles 2 and 3 are trivially accepted, so rasterization of this triangle in tiles 2 and 3 can ignore the black edge entirely, saving a good bit of work. Figure 15 shows the trivial accept test in action. Since the trivial accept corner is the corner at which the edge's equation is most positive, if this point is negative - and therefore inside the edge - all points in the tile must be inside the edge. Thus, tiles 0 and 1 are not trivially accepted for the black edge, because the equation for the black edge is positive at their trivial accept corners, but tiles 2 and 3 are trivially accepted, so rasterization of this triangle in tiles 2 and 3 can ignore the black edge entirely, saving a good bit of work.
 +
 +图例 15 显示了如何测验平方接受角. 由于平凡接受角是边方程式值最大的角, 如果这个点是负值 - 因此在该边内部 - 那么该瓦片所有点肯定都在该边内部. 这样, 瓦片 0 和 1 不被黑边平凡接受, 因为它们的平凡接受角的黑边方程式值是正的, 但瓦片 2 和 3 被平凡接受, 那么瓦片 2 和 3 的光栅化就可以忽略整条黑边, 节省了好一些工作量.
  
 {{:public:cs:fig_15.jpg?direct|}} {{:public:cs:fig_15.jpg?direct|}}
  
 **Figure 15**: The tile trivial accept test. **Figure 15**: The tile trivial accept test.
 +
 +图例 15 : 平凡接受角测验.
  
 There's an important asymmetry here. When we looked at trivial reject, we saw that it applies to the whole triangle in the tile being drawn to, in the sense that trivially rejecting a tile against any edge means the triangle doesn't touch that tile, and so the triangle can be ignored in that tile. However, trivial accept applies only to the edge being checked; that is, trivially accepting a tile against an edge only means that specific edge doesn't have to be checked when rasterizing the triangle to that tile, because the whole tile is inside that edge; it has no direct implication for the whole triangle. The tile may be trivially accepted against one edge, but not against the others; in fact, it may be trivially rejected against one or both of the other edges, in which case the triangle won't be drawn to the tile at all. This is illustrated in Figure 16, where tile 3 is trivially accepted against the black edge, so the black edge wouldn't need to be considered in rasterizing the triangle to that tile, but tile 3 is trivially rejected against the red edge, and that means that the triangle doesn't have to be rasterized to tile 3 at all. There's an important asymmetry here. When we looked at trivial reject, we saw that it applies to the whole triangle in the tile being drawn to, in the sense that trivially rejecting a tile against any edge means the triangle doesn't touch that tile, and so the triangle can be ignored in that tile. However, trivial accept applies only to the edge being checked; that is, trivially accepting a tile against an edge only means that specific edge doesn't have to be checked when rasterizing the triangle to that tile, because the whole tile is inside that edge; it has no direct implication for the whole triangle. The tile may be trivially accepted against one edge, but not against the others; in fact, it may be trivially rejected against one or both of the other edges, in which case the triangle won't be drawn to the tile at all. This is illustrated in Figure 16, where tile 3 is trivially accepted against the black edge, so the black edge wouldn't need to be considered in rasterizing the triangle to that tile, but tile 3 is trivially rejected against the red edge, and that means that the triangle doesn't have to be rasterized to tile 3 at all.
 +
 +这里有个重要的不对等. 当我们看平凡拒绝, 我们发现它应用于整个三角形, 意味着任意边的平凡拒绝都意味整个三角形都不碰触该瓦片, 该瓦片可忽略整个三角形. 然而, 平凡接受只应用于所测验的边; 也就是, 一条边平凡接受一个瓦片只意味着该边在该瓦片光栅化时可不被检查, 因为整个瓦片都在该边内部; 对整个三角形而言则没有直接意义. 一个瓦片可能被一条边平凡接受(译者: 此处作者可能有笔误), 但这不意味着被其它边平凡接受:?:; 实际上, 它可能被一条或两条边平凡拒绝, 这种情况就是三角形不会画在该瓦片上. 图 16 有说明, 瓦片 3 被黑边平凡接受, 这样在光栅化三角形到该瓦片时就没必要考虑黑边, 但瓦片 3 被红边平凡拒绝, 这意味着三角形不需要在瓦片 3 中光栅化.
  
 {{:public:cs:fig_16.jpg?direct|}} {{:public:cs:fig_16.jpg?direct|}}
  
 **Figure 16**: Tile 3 is trivially accepted against the black edge, but trivially rejected against the red edge. **Figure 16**: Tile 3 is trivially accepted against the black edge, but trivially rejected against the red edge.
 +
 +图例 16 :瓦片 3 被黑边平凡接受, 但被红边平凡拒绝.
  
 In Figure 17, however, tile 3 is trivially accepted by all three edges, and here we come to a key point. In Figure 17, however, tile 3 is trivially accepted by all three edges, and here we come to a key point.
 +
 +然而在图例 17 中, 瓦片 3 被三条边平凡接受, 这里有个关键点.
  
 {{:public:cs:fig_17.jpg?direct|}} {{:public:cs:fig_17.jpg?direct|}}
  
 **Figure 17**: Tile 3 is trivially accepted against all three edges. **Figure 17**: Tile 3 is trivially accepted against all three edges.
 +
 +图例 17 :瓦片 3 被三条边平凡接受.
  
 If all three edges are negative at their respective trivial accept corners, then the whole tile is inside the triangle, and no further rasterization tests are needed - and this is what I meant earlier when I said the rasterizer takes advantage of CPU smarts by not-rasterizing whenever possible. The tile-assignment code can just store a draw-whole-tile command in the bin, and the bin rendering code can simply do the equivalent of two nested loops around the shaders, resulting in a full-screen triangle rasterization speed of approximately infinity - one of my favorite performance numbers! If all three edges are negative at their respective trivial accept corners, then the whole tile is inside the triangle, and no further rasterization tests are needed - and this is what I meant earlier when I said the rasterizer takes advantage of CPU smarts by not-rasterizing whenever possible. The tile-assignment code can just store a draw-whole-tile command in the bin, and the bin rendering code can simply do the equivalent of two nested loops around the shaders, resulting in a full-screen triangle rasterization speed of approximately infinity - one of my favorite performance numbers!
  
-By the way, this whole process should be familiar to 3-D programmers, because testing of bounding boxes against planes (for example, for frustum culling) is normally +如果三条边各自的平凡接受角都是负值, 那么整个瓦片都在三角形内部, 就不需要进一步光栅化了 - 这就是我之前所说的利用 CPU 的优势来尽可能的非光栅化. 瓦片分配的代码可以仅仅储存一个"画整个瓦片"的命令在容器里, 容器渲染代码可以简单做两个渲染器的嵌套循环:?:, 结果是无限接近于全屏三角形光栅化的速度 - 我最爱的性能数字之一. 
-done in exactly the same way - although in three dimensions instead of two - with the + 
-same use of sign to indicate inside and outside for trivial accept and reject. Also, structures such as octrees employ a 3-D version of the hierarchical recursion used by the Larrabee rasterizer..+By the way, this whole process should be familiar to 3-D programmers, because testing of bounding boxes against planes (for example, for frustum culling) is normally done in exactly the same way - although in three dimensions instead of two - with the same use of sign to indicate inside and outside for trivial accept and reject. Also, structures such as octrees employ a 3-D version of the hierarchical recursion used by the Larrabee rasterizer.
 + 
 +顺便一提, 3D 程序员应该对这整个过程比较熟悉, 因为平面盒包围测试(比如圆锥截取)通常使用相同的方法 - 虽然是三维取代了二维 - 同样使用了标记平凡拒绝与平凡接受来区分内部与外部. 同样, 八叉树的架构使用了 3D 版本的Larrabee 光栅器分层递归.
  
 That completes our overview of how rasterization of large triangles for tile assignment works. As I said, this is done as a scalar evaluation in the Larrabee pipeline, so the trivial accept and reject tests for each tile are performed separately. Intra-tile rasterization, to which we turn next, is much the same, but vectorized, and it is this vectorization that will give us insight into applying the Larrabee New Instructions to semi-parallel tasks. That completes our overview of how rasterization of large triangles for tile assignment works. As I said, this is done as a scalar evaluation in the Larrabee pipeline, so the trivial accept and reject tests for each tile are performed separately. Intra-tile rasterization, to which we turn next, is much the same, but vectorized, and it is this vectorization that will give us insight into applying the Larrabee New Instructions to semi-parallel tasks.
  
-Intra-tile rasterization: 16x16 blocks+这就是完整的大三角形瓦片分配的概况. 如我所说, 在 Larrabee 处理管道中这些是作为标量来计算的, 所以每个瓦片的平凡接受与平凡拒绝测验是分开执行的. 我们接下来讲的瓦片内光栅化, 也很类似, 但它要被向量化, 这些向量化可以让我们理解怎么在半并行化任务上应用 Larrabee 新指令集. 
 + 
 +==== Intra-tile rasterization: 16x16 blocks ==== 
  
 Intra-tile rasterization starts at the level of a whole tile. Tile size varies, depending on factors such as pixel size, but let's assume that we're working with a 64x64 tile. Given that starting size, we calculate the edge equation values at the 16 trivial reject and trivial accept corners of the 16x16 blocks that make up the tile, just as we did at the tile level - but now we do these calculations 16 at a time. Let's start with the trivial reject test. Intra-tile rasterization starts at the level of a whole tile. Tile size varies, depending on factors such as pixel size, but let's assume that we're working with a 64x64 tile. Given that starting size, we calculate the edge equation values at the 16 trivial reject and trivial accept corners of the 16x16 blocks that make up the tile, just as we did at the tile level - but now we do these calculations 16 at a time. Let's start with the trivial reject test.
 +
 +瓦片内光栅化先从整个瓦片级别开始. 瓦片尺寸是可变的,取决于像素点尺寸等因素, 但我们先假设处理的瓦片是 64x64. 从这个尺寸, 我们计算组成瓦片的 16 个块的平凡拒绝角与平凡接受角, 和我们之前在瓦片级别干的一样 - 但是现在我们是同时计算 16 个. 让我们先开始平凡拒绝测验.
  
 First, we calculate which corner of the tile is the trivial reject corner, calculate the value of the edge equation at that point, and set up a table containing the 16 steps of the edge equation from the value at the tile trivial reject corner to the trivial reject corners of the 16x16 blocks that make up the tile. The signs of the 16 values that result tell us which of the blocks are entirely outside the edge, and can therefore be ignored, and which are at least partially accepted, and therefore have to be evaluated further. First, we calculate which corner of the tile is the trivial reject corner, calculate the value of the edge equation at that point, and set up a table containing the 16 steps of the edge equation from the value at the tile trivial reject corner to the trivial reject corners of the 16x16 blocks that make up the tile. The signs of the 16 values that result tell us which of the blocks are entirely outside the edge, and can therefore be ignored, and which are at least partially accepted, and therefore have to be evaluated further.
 +
 +首先, 我们计算哪个角是瓦片的平凡拒绝角, 并计算该点的边方程值, 然后建立一个表,以瓦片平凡拒绝角的值来步进以获得包括 16 个组成瓦片的块的平凡拒绝角的边方程式值. 16 个结果值的正负号告诉我们哪些块完全在边外部也就是能被忽略, 以及哪些块至少部分被接受,因此需要进一步评估.
  
 In Figure 18, for example, we calculate the trivial reject values for the black edge, by stepping from the value we calculated earlier at the trivial reject corner of the tile, and eliminate five of the 16x16 blocks that make up the tile. The trivial reject corner for the tile is shown in red, and the 16 trivial reject corners for the blocks are shown in white. The gray blocks are the ones that are rejected against the black edge; you can see that their trivial reject corners all have positive edge equation values. The other 11 blocks have negative values at their trivial reject corners, so they're at least partially inside the black edge. In Figure 18, for example, we calculate the trivial reject values for the black edge, by stepping from the value we calculated earlier at the trivial reject corner of the tile, and eliminate five of the 16x16 blocks that make up the tile. The trivial reject corner for the tile is shown in red, and the 16 trivial reject corners for the blocks are shown in white. The gray blocks are the ones that are rejected against the black edge; you can see that their trivial reject corners all have positive edge equation values. The other 11 blocks have negative values at their trivial reject corners, so they're at least partially inside the black edge.
 +
 +图例 18 中, 举个例子, 我们从先前计算出的瓦片的平凡拒绝角的值来步进获取黑边的块平凡拒绝角的值, 然后在组成瓦片的 16x16 块中去除了其中 5 个. 瓦片的平凡拒绝角显示为红色, 16 个块平凡拒绝角则显示为白色. 灰色块即被黑边拒绝的; 你可以看到它们的平凡拒绝角的边方程式值都是正的. 另外 11 个块的平凡拒绝角则是负值, 所以他们至少是部分在黑边内部. 
  
 {{:public:cs:fig_18.jpg?direct|}} {{:public:cs:fig_18.jpg?direct|}}
  
 **Figure 18**: The trivial reject tests for the 16 16x16 blocks in the tile. **Figure 18**: The trivial reject tests for the 16 16x16 blocks in the tile.
 +
 +图例 18 : 瓦片 16x16 个块的平凡拒绝角的测验.
  
 To make this process clearer, in Figure 19 the arrows represent the 16 steps that are added to the black edge's tile trivial reject value. Each of these steps is just an add, and we can do 16 adds with a single vector instruction, so it takes only one vector instruction to generate the 16 trivial reject values, and one more vector instruction - a compare - to test them. To make this process clearer, in Figure 19 the arrows represent the 16 steps that are added to the black edge's tile trivial reject value. Each of these steps is just an add, and we can do 16 adds with a single vector instruction, so it takes only one vector instruction to generate the 16 trivial reject values, and one more vector instruction - a compare - to test them.
 +
 +为了让这个过程更为清晰, 在图例 19 中展示了 19 个箭头到瓦片的黑边平凡拒绝角, 显示 16 个值的步进. 其中每个步进只是一个加法, 我们可以在一个向量结构中一次处理 16 个加法, 所以获取 16 个平凡拒绝角的值只需要做一个向量结构, 和另一次向量结构 - 用于比较 - 来测验.
  
 {{:public:cs:fig_19.jpg?direct|}} {{:public:cs:fig_19.jpg?direct|}}
  
 **Figure 19**: The steps from the tile trivial reject corner for the black edge to the trivial reject corners of the 16 16x16 blocks for the black edge. **Figure 19**: The steps from the tile trivial reject corner for the black edge to the trivial reject corners of the 16 16x16 blocks for the black edge.
 +
 +图例 19: 瓦片黑边平凡拒绝角到 16 个块的黑边平凡拒绝角的步进.
  
 All this comes down to just setting up the right values, then doing one vector add and one vector compare. Remember that the edge equation is of the form Bx + Cy; therefore, to step a distance across the tile, we just set x and y to the horizontal and vertical components of that distance, evaluate the equation, and add that to the starting value. So all we're doing in Figure 19 is adding the 16 values that step the edge equation to the 16 trivial reject corners. For example, to get the edge equation value at the trivial reject corner of the upper-left block, we'd start with the value at the tile trivial reject corner, and add the amount that the edge equation changes for a step of -48 pixels in x and -48 pixels in y, as shown by the yellow arrow in Figure 20. To get the edge equation value at the trivial reject corner of the lower-left block, we'd instead add the amount that the edge equation changes for a step of -48 pixels in x only, as shown by the purple arrow. And that's really all there is to the Larrabee rasterizer - it's just a matter of stepping the edge equation values around the tile so as to determine what blocks and pixels are inside and outside the edges. All this comes down to just setting up the right values, then doing one vector add and one vector compare. Remember that the edge equation is of the form Bx + Cy; therefore, to step a distance across the tile, we just set x and y to the horizontal and vertical components of that distance, evaluate the equation, and add that to the starting value. So all we're doing in Figure 19 is adding the 16 values that step the edge equation to the 16 trivial reject corners. For example, to get the edge equation value at the trivial reject corner of the upper-left block, we'd start with the value at the tile trivial reject corner, and add the amount that the edge equation changes for a step of -48 pixels in x and -48 pixels in y, as shown by the yellow arrow in Figure 20. To get the edge equation value at the trivial reject corner of the lower-left block, we'd instead add the amount that the edge equation changes for a step of -48 pixels in x only, as shown by the purple arrow. And that's really all there is to the Larrabee rasterizer - it's just a matter of stepping the edge equation values around the tile so as to determine what blocks and pixels are inside and outside the edges.
 +
 +所有这些归结为只要建立对应的值, 然后做一次向量加法与一次向量比较. 记得边方程式的形式是 Bx + Cy; 因此, 在瓦片上的步进, 可以把 x 和 y 设为横竖相对距离, 然后求值, 并加上初始值. 所以图例 19 中我们所做的就是把 16 个步进值加到 16 个平凡拒绝角. 比如, 为了得到左上角块的平凡拒绝角的边方程式值, 我们会从瓦片的平凡拒绝角的值开始, 加上 x 为 -48 及 y 为 -48 的边方程式的值, 如图例 20 中的黄色箭头. 为了得到左下角块的平凡拒绝角的值, 我们则加上 x 为 -48 及 y 为 0 的边方程式值, 如紫色箭头所示. 这就是 Larrabee 光栅化真正所有的 - 它就是步进这些边方程式值来决定哪些块和像素点是在边的外部还是内部.
  
 {{:public:cs:fig_20.jpg?direct|}} {{:public:cs:fig_20.jpg?direct|}}
  
 **Figure 20**: Examples of stepping the edge equation Bx + Cy. **Figure 20**: Examples of stepping the edge equation Bx + Cy.
 +
 +图例 20 : 边方程式 Bx + Cy 的步进例子.
  
 Once again, we'll do trivial accept tests as well as trivial reject tests. In Figure 21 we've calculated the trivial accept values, and determined that 6 of the 16x16 blocks are trivially accepted for the black edge, and 10 of them are not trivially accepted. We know this because the values of the equation of the black edge at the trivial accept corners of the 6 pink blocks are negative, so those blocks are entirely inside the edge, while the values at the trivial accept corners of the other 10 blocks are positive, so those blocks are not entirely inside the black edge. The trivial accept values for the blocks can be calculated by stepping in any of several different ways: from the tile trivial accept corner, from the tile trivial reject corner, or from the 16 trivial reject values for the blocks. Regardless, it again takes only one vector instruction to step and one vector instruction to test the results. Combined with the results of the trivial reject test, we also know that 5 blocks are partially accepted. Once again, we'll do trivial accept tests as well as trivial reject tests. In Figure 21 we've calculated the trivial accept values, and determined that 6 of the 16x16 blocks are trivially accepted for the black edge, and 10 of them are not trivially accepted. We know this because the values of the equation of the black edge at the trivial accept corners of the 6 pink blocks are negative, so those blocks are entirely inside the edge, while the values at the trivial accept corners of the other 10 blocks are positive, so those blocks are not entirely inside the black edge. The trivial accept values for the blocks can be calculated by stepping in any of several different ways: from the tile trivial accept corner, from the tile trivial reject corner, or from the 16 trivial reject values for the blocks. Regardless, it again takes only one vector instruction to step and one vector instruction to test the results. Combined with the results of the trivial reject test, we also know that 5 blocks are partially accepted.
 +
 +再次, 我们会像平凡拒绝角那样来做平凡接受角测验. 在图例 21 中我们已经计算出这些平凡接受角的值, 并判定 其中 6 个 16x16 的块是被黑边平凡接受的, 另外 10 个则不被平凡接受. 我们知道这是因为那 6 个粉丝块的黑边平凡接受角的值是负的, 那么这些块整体都在该边的内部; 而另外 10 个块的平凡接受角值是正的, 那么它们不是整体处于黑边内部. 这些块的平凡接受角的值的计算可以使用几种方法来步进: 从瓦片的平凡接受角来步进, 从瓦片的平凡拒绝角来步进, 或从 16 个平凡拒绝角的值来步进. 不管怎样, 这再次只耗费一次向量结构来步进与一次向量结构来测验. 结合平凡接受角的测验, 我们也知道有 5 个块被部分接受.
  
 {{:public:cs:fig_21.jpg?direct|}} {{:public:cs:fig_21.jpg?direct|}}
  
 **Figure 21**: The trivial accept tests for the 16 16x16 blocks in the tile. **Figure 21**: The trivial accept tests for the 16 16x16 blocks in the tile.
 +
 +图例 21 : 瓦片上 16 个 16x16 的块的平凡接受角测验.
  
 The 16 trivial reject and trivial accept values can be calculated with a total of just two vector adds per edge, using tables generated at triangle set-up time, and can be tested with two vector compares, which generate mask registers describing which blocks are trivially rejected and which are trivially accepted. We do this for the three edges, ANDing the results to create masks for the triangle, do some bit manipulation on the masks so they describe trivial and partial accept, and bit-scan through the results to find the trivially and partially accepted blocks. Each 16x16 that's trivially accepted against all three edges becomes one bin command; again, no further rasterization is needed for pixels in trivially accepted blocks. The 16 trivial reject and trivial accept values can be calculated with a total of just two vector adds per edge, using tables generated at triangle set-up time, and can be tested with two vector compares, which generate mask registers describing which blocks are trivially rejected and which are trivially accepted. We do this for the three edges, ANDing the results to create masks for the triangle, do some bit manipulation on the masks so they describe trivial and partial accept, and bit-scan through the results to find the trivially and partially accepted blocks. Each 16x16 that's trivially accepted against all three edges becomes one bin command; again, no further rasterization is needed for pixels in trivially accepted blocks.
 +
 +通过三角形建立时生成的表, 每个边的 16 个平凡拒绝和平凡接受的值可以仅由两次向量加法来计算, 并由两次向量比较来测验, 生成标记寄存器来标识哪些块被平凡拒绝哪些被平凡接受. 我们执行三条边, 生成建立三角形的标识, 在这些标识上做些位操作来表示被平凡拒绝或平凡接受, 之后通过结果的位扫描可以找出被平凡拒绝和平凡接受的块. 每个 16x16 的被三条边平凡接受的会转为一个二进制命令:?: 再次, 被平凡接受的块不需要进一步光栅化.
  
 This is not obvious stuff, so let's take a moment to visualize the process. First, let's just look at one edge and trivial accept. This is not obvious stuff, so let's take a moment to visualize the process. First, let's just look at one edge and trivial accept.
 +
 +这并不怎么明显, 所以让我们花一些时间来对这过程有个印象. 首先, 我们只看一条边和平凡接受.
  
 For a given edge, say edge number 1, we take the edge equation value at the tile trivial accept corner, broadcast it out to a vector, and vector add it to the precalculated values of the 16 steps to the trivial accept corners of the 16x16 blocks. This gives us the edge 1 values at the trivial accept corners of the 16 blocks, as shown in Figure 22. (The values shown are illustrative, and are not taken from a real triangle.) For a given edge, say edge number 1, we take the edge equation value at the tile trivial accept corner, broadcast it out to a vector, and vector add it to the precalculated values of the 16 steps to the trivial accept corners of the 16x16 blocks. This gives us the edge 1 values at the trivial accept corners of the 16 blocks, as shown in Figure 22. (The values shown are illustrative, and are not taken from a real triangle.)
 +
 +对给定边, 称为边 1, 我们取得瓦片平凡接受角的该边方程式的值, 广播给一个向量处理器(阵列处理器),阵列处理器把它加到预先计算好的 16x16 块的步进值. 这就给了我们 16 个块的边 1 平凡接受角的值. 如图 22. ( 显示的值是用过说明的, 所以不是取自真实三角形)
  
 {{:public:cs:fig_22.jpg?direct|}} {{:public:cs:fig_22.jpg?direct|}}
  
 **Figure 22**: Edge 1 trivial accept tests for the 16 16x16 blocks. **Figure 22**: Edge 1 trivial accept tests for the 16 16x16 blocks.
 +
 +图 22 : 16 个 16x16 块的边 1 的平凡接受测验.
  
 The step values shown on the second line in Figure 22 are computed when the triangle is set up, using a vector multiply and a vector multiply-add. At the top level of the rasterizer - testing the 16x16 blocks that make up the tile, as shown in Figure 22 - those set-up instructions are just direct additional rasterization costs, since the top level only gets executed once per triangle, so it would be accurate to add 6 instructions to the cost of the 16x16 code we'll look at shortly in Listings 1 and 2. However, as the hierarchy is descended, the tables for the lower levels (16x16-to-4x4 and 4x4-to-mask) get reused multiple times. For example, when descending from 16x16 to 4x4 blocks, the same table is used for all partial 16x16 blocks in the tile. Likewise, there is only one table for generating masks for partial 4x4 blocks, so the additional cost per iteration in Listing 3 due to table set-up would be 2 instructions divided by the number of partial 4x4 blocks in the tile. This is generally much less than 1 instruction per 4x4 block per edge, although it gets higher the smaller the triangle is. The step values shown on the second line in Figure 22 are computed when the triangle is set up, using a vector multiply and a vector multiply-add. At the top level of the rasterizer - testing the 16x16 blocks that make up the tile, as shown in Figure 22 - those set-up instructions are just direct additional rasterization costs, since the top level only gets executed once per triangle, so it would be accurate to add 6 instructions to the cost of the 16x16 code we'll look at shortly in Listings 1 and 2. However, as the hierarchy is descended, the tables for the lower levels (16x16-to-4x4 and 4x4-to-mask) get reused multiple times. For example, when descending from 16x16 to 4x4 blocks, the same table is used for all partial 16x16 blocks in the tile. Likewise, there is only one table for generating masks for partial 4x4 blocks, so the additional cost per iteration in Listing 3 due to table set-up would be 2 instructions divided by the number of partial 4x4 blocks in the tile. This is generally much less than 1 instruction per 4x4 block per edge, although it gets higher the smaller the triangle is.
  • public/cs/rasterization_on_larrabee.1442833602.txt.gz
  • 最后更改: 2015/09/21 19:06
  • oakfire