<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xml:lang="en-US" lang="en-US" xmlns="http://www.w3.org/1999/xhtml">
  <head>
    <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
    <meta http-equiv="Content-Style-Type" content="text/css" />
    <link rel="stylesheet" href="../css/manpage.css" type="text/css" />
    <link rel="stylesheet" href="../css/timetable.css" type="text/css" />
    <title>Shader Implementations That Result in Illegal Operations</title>
  </head>
  <body>
    <h1><a name="top">Shader Implementations That Result in Illegal Operations</a></h1>
    <div class="section">
      <p>
        Illegal hardware operations can occur when assembler instructions in a shader are executed in certain orders.<BR>
        If an illegal operation occurs, the GPU hangs.<BR>
        The shader assembler or shader linker outputs a warning if a shader implementation could result in illegal operations.<BR>
        <br>
        The conditions that give rise to illegal operations are described below.<br>
      </p>
    </div>

    <h2><a name="by_mova">Illegal Operations Caused by the <CODE>mova</CODE> Instruction</a></h2>
    <div class="section">
      <p>
        This section describes illegal operations related to the <CODE>mova</CODE> instruction.<BR>
        <br>
        Several conditions can lead to illegal operations. All of the conditions described for a case must be met for a problem to occur, but meeting them is not necessarily enough to cause one. In other words, illegal operations may or may not result even when the conditions are met.<BR>
        Whether illegal operations occur depends on the implementation of that part of the shader. In other words, it depends on the types of instructions executed and/or the combination of registers in use when an instruction that may cause a problem is reached.
        It does not depend on the exact time at which the instruction executes, the contents of registers (except for those related to branches), or other factors.<BR>
        An implementation that gives rise to illegal operations does so every time it runs, and an implementation that does not cause illegal operations never will.<BR>
        <br>
        Even if the conditions described below are met, you may continue using the shader in question as long as it runs without problems on the actual hardware.<br>
      </p>

      <h3><a name="last_instruction">Executing <CODE>mova</CODE> Immediately Before the Last Instruction</a></h3>
      <div class="section">
        <p>
          Hardware may hang if a <CODE>mova</CODE> instruction is executed immediately before the instruction executed last by the shader and that last instruction does not use the address written by the <CODE>mova</CODE>.<BR>
          In a vertex shader, the instruction that executes last is the one that writes to an output register last. In a geometry shader, the <CODE>end</CODE> instruction executes last.
          <br>
          Example of an illegal operation in a vertex shader:<br>
<pre class="definition">
// Vertex Shader 1
mova a0.x, r0 // The last instruction appears immediately after mova
mov o0, r2    // Hardware may hang because the last instruction does not use a0.x
end
</pre>
          Example of a legal operation in a vertex shader:<br>
<pre class="definition">
// Vertex Shader 2
mova a0.x, r0   // The last instruction appears immediately after mova
mov o0, c[a0.x] // There is no problem because the last instruction uses a0.x
end
</pre>
          Example of an illegal operation in a geometry shader:<BR>
<pre class="definition">
// Geometry Shader
mova a0.x, r0 // Execute a mova instruction
end           // Hardware may hang when an end instruction occurs immediately after mova
</pre>
          The shader assembler outputs a warning (400a0003) if <CODE>mova</CODE> occurs immediately before an <CODE>end</CODE> instruction.
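          <br>
          As a minimal sketch of one way to avoid the hang in the geometry shader example above, a <CODE>nop</CODE> instruction can separate the <CODE>mova</CODE> from the <CODE>end</CODE> instruction. The placement shown here is an assumption based on the conditions described in this section, so confirm the result on the actual hardware.<br>
<pre class="definition">
// Geometry Shader (sketch of a workaround)
mova a0.x, r0 // Execute a mova instruction
nop           // Assumed placement: mova is no longer immediately before end
end
</pre>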
          The shader assembler also outputs a warning (400a0004) when a <CODE>mova</CODE> instruction is followed by an instruction that writes to an output register, followed by an <CODE>end</CODE> instruction, if the second instruction does not use the address written by the <CODE>mova</CODE> instruction.<BR>
          You can avoid this condition by changing the order of instructions or by inserting a <CODE>nop</CODE> instruction immediately before the last instruction.<BR>
        </p>
      </div>

      <h3><a name="specific_instruction">Executing <CODE>mova</CODE> Both Before and After Certain Instructions</a></h3>
      <div class="section">
        <p>
          Hardware may hang depending on how <CODE>mova</CODE> is used with <CODE>else</CODE> and <CODE>endif</CODE>, <CODE>call</CODE> and <CODE>ret</CODE>, and <CODE>loop</CODE> and <CODE>endloop</CODE> instruction pairs.<BR>
          <br>
          Each case is described below.<br>
          <br>
          Hardware may hang if a <CODE>mova</CODE> instruction appears both immediately before the <CODE>else</CODE> instruction and immediately after the <CODE>endif</CODE> instruction of an <CODE>else</CODE>-<CODE>endif</CODE> pair that matches the same <CODE>ifb</CODE> or <CODE>ifc</CODE> instruction.<br>
          The shader assembler outputs a warning (40070003) if this type of implementation is detected.<br>
          This condition can be avoided by changing the order of instructions or inserting a <CODE>nop</CODE> instruction.<br>
          <br>
          Example when using <CODE>else</CODE>-<CODE>endif</CODE>:<br>
<pre class="definition">
ifb b0
    ...
    mova a0.x, r0 // Execute mova before else. After execution, jump to the line immediately after the endif.
else
    ...
endif
mova a0.x, r1 // Hardware may hang if jumped to from immediately before the "else".
</pre>
          <br>
          Hardware may hang if a <CODE>mova</CODE> instruction occurs both immediately before the final <CODE>ret</CODE> instruction of a subroutine called by a <CODE>call</CODE>, <CODE>callb</CODE>, or <CODE>callc</CODE> instruction and immediately after the calling instruction to which the subroutine returns.<br>
          The shader assembler outputs a warning (4009000c) if this type of implementation is detected.<BR>
          This condition can be avoided by changing the order of instructions or inserting a <CODE>nop</CODE> instruction.<br>
          <br>
          Example using <CODE>call</CODE>-<CODE>ret</CODE>:<br>
<pre class="definition">
main:           // main function
...
call l_function // Jump to l_function
mova a0.x, r0   // mova appears again immediately after returning from l_function
...
end             // The main function ends

l_function:     // Subroutine
...
mova a0.x, r1   // "mova" occurs immediately before "ret"
ret
</pre>
          <br>
          Hardware may hang if a <CODE>mova</CODE> instruction is used both immediately after a <CODE>loop</CODE> instruction and immediately before an <CODE>endloop</CODE> instruction.<br>
          The shader assembler outputs a warning (40070004) if this type of implementation is detected.<br>
          This condition can be avoided by changing the order of instructions or inserting a <CODE>nop</CODE> instruction.<br>
          <br>
          Example when using <CODE>loop</CODE>-<CODE>endloop</CODE>:<br>
<pre class="definition">
loop i0
    mova a0.y, r0 // Execute "mova" immediately after "loop".
    ...
    mova a0.x, r1 // Execute "mova" immediately before "endloop".
endloop
</pre>
        </p>
      </div>

      <h3><a name="stall_branch">Executing a Branch Instruction Immediately After a Stall Caused by a <CODE>mova</CODE> Instruction</a></h3>
      <div class="section">
        <p>
          Hardware may hang if a branch instruction is executed immediately after a <CODE>mova</CODE> instruction whose execution stalls because of a dependency on a previous instruction.<BR>
          This applies to any of the following branch instructions: <CODE>jpb</CODE>, <CODE>jpc</CODE>, <CODE>call</CODE>, <CODE>callb</CODE>, <CODE>callc</CODE>, <CODE>ifb</CODE>, <CODE>ifc</CODE>, and <CODE>breakc</CODE>.<BR>
          <br>
          Example:<br>
<pre class="definition">
dp4 r0, r1, r2  // A write is made to r0.
mova a0.x, r0.x // A stall occurs because the mova reads r0, which is written by dp4.
call l_function // A branch instruction is executed immediately after the mova instruction that stalls.
</pre>
          The shader assembler outputs a warning (400a0005) if a <CODE>mova</CODE> instruction whose src is a temporary register is followed by a branch instruction. It does not check whether the <CODE>mova</CODE> instruction actually stalls.<BR>
          This condition can be avoided by changing the order of instructions or by inserting a <CODE>nop</CODE> instruction.<BR>
          <br>
        </p>
      </div>

      <h3><a name="stall_branch">Illegal Operations Caused by the Register Dependencies of Instructions Before and After the <CODE>mova</CODE> Instruction</a></h3>
      <div class="section">
        <p>
          For any particular <CODE>mova</CODE> instruction, the GPU stops responding if all of the following conditions are satisfied.<br>
          <br>
          <ul>
            <li>Condition 1: The register read by the instruction immediately after the <CODE>mova</CODE> instruction is written by an instruction prior to the <CODE>mova</CODE>, and the instruction immediately after the <CODE>mova</CODE> causes a 1-clock cycle stall.
              In this case, the unconditional 3-clock cycle stall of the <CODE>mova</CODE> instruction is ignored.</li>
            <li>Condition 2: The <CODE>mova</CODE> instruction and both of the following two instructions all read an input register or temporary register. In this case, the indices and components of the registers are ignored.</li>
            <li>Condition 3: There is no dependency between the <CODE>mova</CODE> instruction and the previous instruction, and no stall occurs because of the <CODE>mova</CODE> register dependencies.</li>
          </ul>
          <br>
          The following example demonstrates a case where the GPU stops responding.<br>
          Example:<br>
<pre class="definition">
mul r0, c0, r1  // Writes r0, which is read by the instruction immediately after the mova (condition 1).
mova a0.x, v0.x // No dependency on the previous instruction, so the mova does not stall (condition 3).
add r0, r0, c1  // Reads r0, which is written by the instruction before the mova,
                // so a 1-clock cycle stall occurs (condition 1).
mov r5, v1      // Like the mova, this instruction reads an input register; indices and components are ignored (condition 2).
</pre>
          To indicate that the GPU may stop responding, the shader assembler outputs a warning (400a0006) if the following conditions are met.<br>
          <br>
          <ul>
            <li>Warning condition 1: The instruction immediately after the <CODE>mova</CODE> instruction reads a temporary register.</li>
            <li>Warning condition 2: The <CODE>mova</CODE> instruction and the following two instructions all read an input register or a temporary register.</li>
          </ul>
          <br>
          No check for instruction stalls is performed. If the warning is output, avoid meeting the three conditions above by changing the order of instructions or by inserting a <CODE>nop</CODE> instruction.
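          <br>
          <br>
          As a minimal sketch of the <CODE>nop</CODE> approach applied to the example above, the <CODE>nop</CODE> can be placed immediately after the <CODE>mova</CODE>. This placement is an assumption made here for illustration. With it, the instruction immediately after the <CODE>mova</CODE> no longer reads any register, so Condition 1 and Condition 2 as stated above should not hold. Confirm the behavior on the actual hardware.<br>
<pre class="definition">
mul r0, c0, r1
mova a0.x, v0.x
nop             // Assumed placement: the instruction immediately after the mova reads no register
add r0, r0, c1
mov r5, v1
</pre>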
        </p>
      </div>
    </div>

    <h2><a name="specific_order">Illegal Operations Caused by Executing Instructions in a Particular Order</a></h2>
    <div class="section">
      <p>
        Illegal operations will occur if four consecutive instructions meet all of the following conditions.<BR>
        These are necessary and sufficient conditions. The GPU will always hang if they are all met.<BR>
        <br>
        These conditions are described below.<br>
        <br>
        <ul>
          <li>Condition 1: The first and third instructions have an associated latency of 2 clock cycles (<CODE>flr</CODE>, <CODE>litp</CODE>, <CODE>max</CODE>, <CODE>min</CODE>, <CODE>mov</CODE>, <CODE>sge</CODE>, <CODE>slt</CODE>, or <CODE>abs</CODE>).</li>
          <li>Condition 2: The second instruction has an associated latency of 2 or fewer clock cycles (<CODE>flr</CODE>, <CODE>litp</CODE>, <CODE>max</CODE>, <CODE>min</CODE>, <CODE>mov</CODE>, <CODE>sge</CODE>, <CODE>slt</CODE>, <CODE>abs</CODE>, or <CODE>nop</CODE>).</li>
          <li>Condition 3: The fourth instruction is a branch instruction (<CODE>jpb</CODE>, <CODE>jpc</CODE>, <CODE>call</CODE>, <CODE>callb</CODE>, <CODE>callc</CODE>, <CODE>ifb</CODE>, <CODE>ifc</CODE>, or <CODE>breakc</CODE>).</li>
          <li>Condition 4: A stall occurs during execution of the first instruction.
            The stall must last at least two clock cycles if the second instruction is a <CODE>nop</CODE>, or at least three clock cycles if it is any other instruction.</li>
          <li>Condition 5: The dest register of the first instruction is not the same as any of the src registers of the second instruction.</li>
          <li>Condition 6: The dest register of the second instruction is not the same as any of the src registers of the third instruction.</li>
          <li>Condition 7: The dest register of the first instruction is the same as one of the src registers of the third instruction.</li>
          <li>Condition 8: The dest register of the third instruction is the same as one of the src registers of the third instruction.</li>
        </ul>
        <BR>In Conditions 5, 6, 7, and 8, "the same register" means a register with the same type and index, regardless of the component specification. (For example, r0 and r0 are the same register, and so are r1.x and r1.y.)<br>
        <br>
        The following example is a case where all of the above conditions are met.<br>
        <br>
        Example:<br>
<pre class="definition">
rcp r1, r2.x    // An instruction is executed that causes a stall on the first of the next four instructions
min r0, r1, r2  // The first instruction: From this point on, four consecutive instructions meet the conditions given
max r3, r4, r5  // The second instruction
slt r5, r0, r5  // The third instruction
call l_function // The fourth instruction
</pre>
        The first instruction, <CODE>min</CODE>, and the third instruction, <CODE>slt</CODE>, both have a latency of 2 clock cycles, meeting Condition 1.<BR>
        The second instruction, <CODE>max</CODE>, has a latency of 2 clock cycles, meeting Condition 2.<BR>
        The fourth instruction is <CODE>call</CODE>. This meets Condition 3.<BR>
        The <CODE>min</CODE> instruction results in a 3-clock cycle stall because it has to wait for <CODE>rcp</CODE> to write r1.
        This meets Condition 4.<BR>
        The dest register of <CODE>min</CODE> and the src register of <CODE>max</CODE> differ. This meets Condition 5. The dest register of <CODE>max</CODE> and the src register of <CODE>slt</CODE> differ. This meets Condition 6.<BR>
        The dest register of <CODE>min</CODE> and the src0 register of <CODE>slt</CODE> are the same. This meets Condition 7. The dest register of <CODE>slt</CODE> is the same as the src1 register of <CODE>slt</CODE>. This meets Condition 8.<BR>
        <br>
        <table class="timetable">
          <tr>
            <th></th>
            <th>1</th><th>2</th><th>3</th><th>4</th><th>5</th><th>6</th><th>7</th><th>8</th><th>9</th>
          </tr>
          <tr>
            <th>rcp</th>
            <td class="read">read</td>
            <td class="RCP" colspan="2">RCP / RSQ</td>
            <td class="post">post</td>
            <td class="write">write</td>
          </tr>
          <tr>
            <th>min</th>
            <td class="empty" colspan="1"></td>
            <td class="stall" colspan="3">STALL</td>
            <td class="read">read</td>
            <td class="MIN">MIN</td>
            <td class="write">write</td>
          </tr>
          <tr>
            <th>max</th>
            <td class="empty" colspan="5"></td>
            <td class="read">read</td>
            <td class="MAX">MAX</td>
            <td class="write">write</td>
          </tr>
          <tr>
            <th>slt</th>
            <td class="empty" colspan="6"></td>
            <td class="read">read</td>
            <td class="SLT">SLT</td>
            <td class="write">write</td>
            <td class="dummy"></td>
          </tr>
          <tr>
            <th>call</th>
            <td class="empty" colspan="7"></td>
            <td class="flow">call</td>
            <td class="dummy"></td>
          </tr>
        </table>
        <br>
        <br>
        The shader assembler outputs warning (400a0001) or (400a0002) if all conditions except Condition 4 are met.<br>
        The shader assembler cannot detect whether execution will stall long enough to meet Condition 4.<br>
        <br>
        Although the performance check feature of the shader linker can be used to obtain an approximate estimate of the number of clock cycles execution will stall, perfect detection is not possible.
        <br>
        Conditions for stalling depend on things like the instructions executed up to that point and how registers are being used.<br>
        <br>
        A shader that results in illegal operations does so every time it runs, and a shader that does not never will. If you do not observe any illegal operations even though the shader assembler outputs a warning, you can assume that Condition 4 has not been met and continue using the shader without problems.<BR>
        If illegal operations do occur, they can be avoided by changing the order of instructions or by inserting a <CODE>nop</CODE> instruction.<BR>
      </p>
    </div>

    <h2>Revision History</h2>
    <div class="section">
      <dl class="history">
        <dt>2012/06/20</dt>
        <dd>Added "Illegal Operations Caused by the Register Dependencies of Instructions Before and After the <CODE>mova</CODE> Instruction."<br />
        </dd>
        <dt>2011/12/20</dt>
        <dd>Initial version.<br />
        </dd>
      </dl>
    </div>

    <hr><p>CONFIDENTIAL</p></body>
</html>