Illegal hardware operations can occur when assembler instructions in a shader are executed in certain orders.
If illegal operations result, the GPU hangs.
The shader assembler or shader linker outputs a warning if the shader implementation could result in illegal operations.
The following sections describe conditions that give rise to illegal operations.
mova Instruction
This section describes illegal operations related to the mova instruction.
Several conditions may cause illegal operations. These conditions are necessary but not always sufficient: problems occur only when all of them are met, yet meeting all of them does not necessarily cause problems. In other words, illegal operations may or may not result even when the conditions are met.
Whether illegal operations occur depends on the implementation of that part of the shader. In other words, it depends on the type of instructions executed and the combination of registers being used when an instruction that may cause a problem is reached. It does not depend on the exact time that the instruction executes, the contents of registers (except for those related to branches), or other factors.
For a given implementation, the outcome is consistent: if the implementation gives rise to illegal operations, problems always occur, and if it does not, they never occur.
Even if the conditions that cause possible illegal operations as described in this section are met, you may continue using the shader in question if there are no problems with operations on the actual hardware.
mova Immediately Before the Last Instruction
Hardware may hang if a mova instruction is executed immediately before the last instruction executed by the shader and that last instruction does not depend on the mova instruction.
In a vertex shader, the instruction that executes last is the one that writes to an output register last. In a geometry shader, the end instruction executes last.
Example of an illegal operation in a vertex shader:
// Vertex Shader 1.
mova a0.x, r0 // The last instruction appears immediately after mova.
mov o0, r2 // Hardware may hang because the last instruction does not use a0.x.
end
Example of a legal operation in a vertex shader:
// Vertex Shader 2.
mova a0.x, r0 // The last instruction appears immediately after mova.
mov o0, c[a0.x] // There is no problem because the last instruction uses a0.x.
end
Example of an illegal operation in a geometry shader:
// Geometry shader.
mova a0.x, r0 // Execute a mova instruction.
end // Hardware may hang when an end instruction occurs immediately after mova.
The shader assembler outputs a warning (400a0003) if mova occurs immediately before an end instruction. The shader assembler also outputs a warning (400a0004) when a mova instruction is followed by an instruction that writes to an output register, followed by an end instruction, if the second instruction does not use the address written by the mova instruction.
This condition can be avoided by inserting a nop instruction immediately before the last instruction.
mova Both Before and After Certain Instructions
Hardware may hang depending on how mova is used with else and endif, call and ret, and loop and endloop instruction pairs.
Each case is described below.
Hardware may hang if a mova instruction appears both immediately before the else instruction and immediately after the endif instruction of an else-endif pair that matches the same ifb or ifc instruction.
The shader assembler outputs a warning (40070003) if this type of implementation is detected.
This condition can be avoided by changing the order of instructions or inserting a nop instruction.
Example when using else-endif:
ifb b0
...
mova a0.x, r0 // Execute mova before else. After execution, jump to the line immediately after the endif.
else
...
endif
mova a0.x, r1 // Hardware may hang if jumped to from immediately before the "else".
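One way to apply the nop workaround to the else-endif example above is sketched here. This is a minimal sketch, assuming that a single nop between the mova and the else is enough to break the pattern; placing the nop between the endif and the second mova would be the alternative. Confirm the result against the assembler warning (40070003) and on actual hardware.

```
ifb b0
...
mova a0.x, r0
nop              // Assumption: separates the mova from the else.
else
...
endif
mova a0.x, r1    // A mova no longer appears immediately before the else.
```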
Hardware may hang if a mova instruction occurs both immediately before the final ret instruction of a subroutine called by the call, callb, or callc instructions, and immediately after the instruction making the call and accepting the return.
This condition can be avoided by inserting a nop instruction.
Example when using call-ret:
main: // main function.
...
call l_function // Jump to l_function.
mova a0.x, r0 // mova appears again immediately after returning from l_function.
...
end // The main function ends.
l_function: // Subroutine.
...
mova a0.x, r1 // "mova" occurs immediately before "ret".
ret
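The same workaround can be sketched for the call-ret case. This is a hedged sketch, assuming that a nop between the subroutine's last mova and its ret is sufficient; a nop between the call and the mova that follows it should serve equally well. Verify the behavior on actual hardware.

```
main:
...
call l_function
mova a0.x, r0    // Still immediately after the call.
...
end
l_function:
...
mova a0.x, r1
nop              // Assumption: separates the mova from the ret.
ret
```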
Hardware may hang if a mova instruction is used both immediately after a loop instruction and immediately before an endloop instruction.
This condition can be avoided by inserting a nop instruction.
Example when using loop-endloop:
loop i0
mova a0.y, r0 // Execute "mova" immediately after "loop".
...
mova a0.x, r1 // Execute "mova" immediately before "endloop".
endloop
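For the loop-endloop case, a nop-based variant might look like the following. This is a sketch under the assumption that one nop before the endloop breaks the pattern. The nop sits inside the loop body and therefore executes on every iteration, so reordering instructions may be preferable where an unrelated instruction can take its place.

```
loop i0
mova a0.y, r0    // Still immediately after the loop.
...
mova a0.x, r1
nop              // Assumption: separates the mova from the endloop.
endloop
```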
Branch Instruction Immediately After a Stalled mova Instruction
Hardware may hang if a mova instruction stalls because of a dependency on a previous instruction and a branch instruction is executed immediately after it.
This applies to any of the following branch instructions: jpb, jpc, call, callb, callc, ifb, ifc, and breakc.
Example:
dp4 r0, r1, r2 // A write is made to r0.
mova a0.x, r0.x // A stall occurs because mova reads r0, which dp4 writes.
call l_function // A branch instruction is executed immediately after the mova instruction that causes the stall.
The shader assembler outputs a warning (400a0005) if a mova instruction whose src is a temporary register is followed by a branch instruction. No check is made as to whether the mova instruction actually stalls.
This condition can be avoided by inserting a nop instruction.
mova Instruction
For any particular mova instruction, the GPU stops responding if all of the following conditions are satisfied.
1. A register read by the instruction immediately after the mova instruction is written by an instruction prior to the mova, and the instruction immediately after the mova causes a 1-clock cycle stall. In this case, the unconditional 3-clock cycle stall of the mova instruction is ignored.
2. The mova instruction and both of the following two instructions all read an input register or temporary register. In this case, the indices and components of the registers are ignored.
3. There is no dependency between the mova instruction and the previous instruction, and no stall occurs because of the mova register dependencies.
Example:
mul r0, c0, r1 // Writes r0, which the instruction immediately after the mova reads (condition 1).
mova a0.x, v0.x // No stall occurs because there is no dependency on the previous instruction (condition 3).
add r0, r0, c1 // Reads r0, which was written before the mova, so a 1-clock cycle stall occurs (condition 1).
mov r5, v1 // Like the mova, this instruction reads an input register (condition 2).
To warn that the GPU may stop responding, the shader assembler outputs a warning (400a0006) if the following conditions are met.
1. The instruction immediately after the mova instruction reads a temporary register.
2. The mova instruction and the following two instructions all read an input register or a temporary register.
This condition can be avoided by inserting a nop instruction.
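The nop workaround can be sketched against the mul, mova, add, mov example above. This is a minimal sketch under an assumption the text does not confirm: that the nop goes immediately after the mova, so that the instruction directly following the mova neither stalls nor reads a register. Check the result against the assembler warning (400a0006) and on actual hardware.

```
mul r0, c0, r1
mova a0.x, v0.x
nop              // Assumption: placed immediately after the mova.
add r0, r0, c1   // No longer the instruction immediately after the mova.
mov r5, v1
```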
Illegal operations occur if four consecutive instructions meet all of the following conditions.
These are necessary and sufficient conditions. The GPU always hangs if they are all met.
These conditions are described below.
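1. The first and third instructions each have a latency of 2 clock cycles (flr, litp, max, min, mov, sge, slt, or abs).
2. The second instruction has a latency of 2 clock cycles or is a nop (flr, litp, max, min, mov, sge, slt, abs, or nop).
3. The fourth instruction is a branch instruction (jpb, jpc, call, callb, callc, ifb, ifc, or breakc).
4. The first instruction stalls for a minimum number of clock cycles that depends on the second instruction: at least three clock cycles if the second instruction is other than a nop, and a separate minimum if it is a nop.
5. The dest register of the first instruction and the src register of the second instruction differ.
6. The dest register of the second instruction and the src register of the third instruction differ.
7. The dest register of the first instruction and the src0 register of the third instruction are the same.
8. The dest register of the third instruction is the same as the src1 register of the third instruction.
Example:
rcp r1, r2.x // An instruction is executed that causes a stall on the first of the next four instructions.
min r0, r1, r2 // The first instruction: From this point on, four consecutive instructions meet the specified conditions.
max r3, r4, r5 // The second instruction.
slt r5, r0, r5 // The third instruction.
call l_function // The fourth instruction.
The first instruction, min, and the third instruction, slt, both have a latency of 2 clock cycles, meeting Condition 1.
The second instruction, max, has a latency of 2 clock cycles, meeting Condition 2.
The fourth instruction is a branch instruction, call. This meets Condition 3.
The min instruction results in a 3-clock cycle stall because it has to wait for rcp to write r1. This meets Condition 4.
The dest register of min and the src registers of max differ. This meets Condition 5. The dest register of max and the src registers of slt differ. This meets Condition 6.
The dest register of min and the src0 register of slt are the same. This meets Condition 7. The dest register of slt is the same as the src1 register of slt. This meets Condition 8.
The following table lists the pipeline stages of each instruction in the example, in execution order, over clock cycles 1 through 9.
| Instruction | Pipeline stages, in order |
|---|---|
| rcp | read, RCP / RSQ, post, write |
| min | STALL, read, MIN, write |
| max | read, MAX, write |
| slt | read, SLT, write |
| call | call |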
This condition can be avoided by inserting a nop instruction.