Shader Implementations That Result in Illegal Operations

Illegal hardware operations can occur when the assembler instructions in a shader are executed in certain orders.
If an illegal operation occurs, the GPU hangs.
The shader assembler or shader linker outputs a warning if a shader implementation could result in illegal operations.

Conditions that give rise to illegal operations are described below.

Illegal Operations Caused by the mova Instruction

This section describes illegal operations related to the mova instruction.

Several conditions can cause illegal operations. All of the conditions must be met for a problem to occur, but meeting them does not guarantee one: an illegal operation may or may not result even when all of the conditions are met.
Whether an illegal operation occurs depends on how that part of the shader is implemented, that is, on the types of instructions executed and the combination of registers in use when a potentially problematic instruction is reached. It does not depend on the exact time at which the instruction executes, on the contents of registers (except those that affect branching), or on other factors.
The result is therefore deterministic: if an implementation gives rise to illegal operations, the problem always occurs, and if it does not, the problem never occurs.

Even if a shader meets the conditions for possible illegal operations described below, you may continue to use it if it operates without problems on the actual hardware.

Executing mova Immediately Before the Last Instruction

Hardware may hang if a mova instruction is executed immediately before the last instruction the shader executes and that last instruction does not depend on the mova (that is, it does not use the address written by the mova).
In a vertex shader, the instruction that executes last is the one that writes to an output register last. In a geometry shader, the end instruction executes last.
Example of an illegal operation in a vertex shader:

// Vertex Shader 1
mova    a0.x, r0    // The last instruction appears immediately after mova
mov     o0,   r2    // Hardware may hang because the last instruction does not use a0.x
end
Example of a legal operation in a vertex shader:
// Vertex Shader 2
mova    a0.x, r0    // The last instruction appears immediately after mova
mov     o0,   c[a0.x]   // There is no problem because the last instruction uses a0.x
end
Example of an illegal operation in a geometry shader:
// Geometry Shader
mova    a0.x, r0    // Execute a mova instruction
end                 // Hardware may hang when an end instruction occurs immediately after mova
The shader assembler outputs a warning (400a0003) if mova occurs immediately before an end instruction. It also outputs a warning (400a0004) if mova is followed by an instruction that writes to an output register, which is in turn followed by end, and that output-writing instruction does not use the address written by the mova.
You can avoid this condition by changing the order of instructions or by inserting a nop instruction immediately before the last instruction.
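To make the two warning patterns concrete, here is a minimal Python sketch of a lint that looks for them in shader source text. It is a hypothetical illustration, not the shader assembler's implementation: the function names, the one-instruction-per-line parsing, and the treatment of a destination operand starting with `o` as an output register are assumptions.

```python
import re


def instructions(source: str) -> list[list[str]]:
    """Split shader source into [opcode, operand, ...] lists (hypothetical parser).

    Comments (// ...), blank lines, "..." placeholders and labels are skipped.
    """
    result = []
    for line in source.splitlines():
        code = line.split("//", 1)[0].strip()
        if not code or code == "..." or code.endswith(":"):
            continue
        opcode, _, rest = code.partition(" ")
        result.append([opcode] + [op.strip() for op in rest.split(",") if op.strip()])
    return result


def address_components(operands: list[str]) -> set[str]:
    """Collect the address-register components (a0.x, a0.y) named in operands."""
    found = set()
    for op in operands:
        for reg, comps in re.findall(r"(a0)\.([xy]+)", op):
            found |= {f"{reg}.{c}" for c in comps}
    return found


def check_last_instruction(source: str) -> list[str]:
    """Hypothetical lint for the 400a0003 and 400a0004 patterns."""
    ins = instructions(source)
    warnings = []
    for i in range(len(ins) - 1):
        if ins[i][0] != "mova":
            continue
        nxt = ins[i + 1]
        if nxt[0] == "end":
            # mova immediately before end
            warnings.append("400a0003")
        elif (i + 2 < len(ins) and ins[i + 2][0] == "end"
              and len(nxt) > 1 and nxt[1].startswith("o")
              and not (address_components(ins[i][1:2]) & address_components(nxt[2:]))):
            # mova, then an output write that ignores the new address, then end
            warnings.append("400a0004")
    return warnings
```

Run on Vertex Shader 1 this sketch reports 400a0004, on the geometry shader it reports 400a0003, and on Vertex Shader 2 it reports nothing. Inserting a nop between the mova and the last instruction also clears the warning.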

Executing mova Both Before and After Certain Instructions

Hardware may hang depending on how mova is used with else and endif, call and ret, and loop and endloop instruction pairs.

Each case is described below.

Hardware may hang if a mova instruction appears both immediately before the else instruction and immediately after the endif instruction of an else-endif pair that matches the same ifb or ifc instruction.
The shader assembler outputs a warning (40070003) if this type of implementation is detected.
This condition can be avoided by changing the order of instructions or inserting a nop instruction.

Example when using else-endif:

ifb     b0
  ...
  mova    a0.x, r0  // Execute mova before else. After execution, jump to the line immediately after the endif.
else
  ...
endif
mova    a0.x, r1    // Hardware may hang if jumped to from immediately before the "else".
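The else-endif check can be sketched in Python with a stack that matches each else and endif to its ifb or ifc. This is a hypothetical lint under simplifying assumptions (one instruction per line, no labels inside the if block), not the assembler's actual algorithm; the function names are illustrative.

```python
def instructions(source: str) -> list[list[str]]:
    """Split shader source into [opcode, operand, ...] lists (hypothetical parser)."""
    result = []
    for line in source.splitlines():
        code = line.split("//", 1)[0].strip()
        if not code or code == "..." or code.endswith(":"):
            continue
        opcode, _, rest = code.partition(" ")
        result.append([opcode] + [op.strip() for op in rest.split(",") if op.strip()])
    return result


def check_else_endif(source: str) -> list[str]:
    """Hypothetical lint for the 40070003 pattern."""
    ins = instructions(source)
    warnings = []
    stack = []  # one entry per open ifb/ifc: was mova immediately before its else?
    for i, inst in enumerate(ins):
        op = inst[0]
        if op in ("ifb", "ifc"):
            stack.append(False)
        elif op == "else" and stack:
            stack[-1] = i > 0 and ins[i - 1][0] == "mova"
        elif op == "endif" and stack:
            mova_before_else = stack.pop()
            mova_after_endif = i + 1 < len(ins) and ins[i + 1][0] == "mova"
            if mova_before_else and mova_after_endif:
                warnings.append("40070003")
    return warnings
```

Applied to the example above, the sketch reports 40070003; putting a nop between endif and the second mova removes the match.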

Hardware may hang if a mova instruction occurs both immediately before the final ret instruction of a subroutine called by a call, callb, or callc instruction and immediately after that calling instruction, where execution resumes when the subroutine returns.
The shader assembler outputs a warning (4009000c) if this type of implementation is detected.

This condition can be avoided by changing the order of instructions or inserting a nop instruction.

Example using call-ret:
main:   // main function
...
call    l_function  // Jump to l_function
mova    a0.x, r0    // mova appears again immediately after returning from l_function
...
end                 // The main function ends

l_function: // Subroutine
...
mova    a0.x, r1    // "mova" occurs immediately before "ret"
ret

Hardware may hang if a mova instruction is used both immediately after a loop instruction and immediately before the matching endloop instruction.
The shader assembler outputs a warning (40070004) if this type of implementation is detected.
This condition can be avoided by changing the order of instructions or inserting a nop instruction.

Example when using loop-endloop:
loop    i0
  mova    a0.y, r0  // Execute "mova" immediately after "loop".
  ...
  mova    a0.x, r1  // Execute "mova" immediately before "endloop".
endloop
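The loop-endloop check reduces to comparing the instructions adjacent to each matched pair. The Python sketch below is a hypothetical lint (illustrative names, one instruction per line), not the assembler's implementation.

```python
def instructions(source: str) -> list[list[str]]:
    """Split shader source into [opcode, operand, ...] lists (hypothetical parser)."""
    result = []
    for line in source.splitlines():
        code = line.split("//", 1)[0].strip()
        if not code or code == "..." or code.endswith(":"):
            continue
        opcode, _, rest = code.partition(" ")
        result.append([opcode] + [op.strip() for op in rest.split(",") if op.strip()])
    return result


def check_loop(source: str) -> list[str]:
    """Hypothetical lint for the 40070004 pattern."""
    ins = instructions(source)
    warnings, starts = [], []  # starts: indices of currently open loop instructions
    for i, inst in enumerate(ins):
        if inst[0] == "loop":
            starts.append(i)
        elif inst[0] == "endloop" and starts:
            start = starts.pop()
            mova_after_loop = ins[start + 1][0] == "mova"
            mova_before_endloop = ins[i - 1][0] == "mova"
            if mova_after_loop and mova_before_endloop:
                warnings.append("40070004")
    return warnings
```

The loop example above triggers 40070004 in this sketch; adding a nop just before endloop avoids it.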

Executing a Branch Instruction Immediately After a Stall Caused by a mova Instruction

Hardware may hang if a branch instruction is executed immediately after a mova instruction whose execution stalls because of a dependency on a previous instruction.
This applies to any of the following branch instructions: jpb, jpc, call, callb, callc, ifb, ifc, and breakc.

Example:

dp4     r0, r1, r2  // A write is made to r0.
mova    a0.x, r0.x  // A stall occurs because mova reads r0, which depends on dp4.
call    l_function  // A branch instruction is executed immediately after the mova instruction that causes the stall.
The shader assembler outputs a warning (400a0005) if a mova instruction whose src is a temporary register is followed by a branch instruction. It does not check whether the mova instruction actually stalls.
This condition can be avoided by changing the order of instructions or by inserting a nop instruction.
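Like the assembler as described, a lint for this pattern can only look at the instruction sequence, not at actual stalls. A minimal Python sketch, assuming temporary registers are written `r` followed by an index (the names are hypothetical):

```python
import re

# Branch instructions listed above.
BRANCHES = {"jpb", "jpc", "call", "callb", "callc", "ifb", "ifc", "breakc"}


def instructions(source: str) -> list[list[str]]:
    """Split shader source into [opcode, operand, ...] lists (hypothetical parser)."""
    result = []
    for line in source.splitlines():
        code = line.split("//", 1)[0].strip()
        if not code or code == "..." or code.endswith(":"):
            continue
        opcode, _, rest = code.partition(" ")
        result.append([opcode] + [op.strip() for op in rest.split(",") if op.strip()])
    return result


def check_mova_branch(source: str) -> list[str]:
    """Hypothetical lint for the 400a0005 pattern (no stall analysis)."""
    ins = instructions(source)
    warnings = []
    for i in range(len(ins) - 1):
        inst = ins[i]
        if inst[0] != "mova" or len(inst) < 3:
            continue
        src_is_temp = re.match(r"-?r\d+", inst[2]) is not None
        if src_is_temp and ins[i + 1][0] in BRANCHES:
            warnings.append("400a0005")
    return warnings
```

The dp4/mova/call example is flagged; reading an input register such as v0.x instead, or placing a nop before the call, is not.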

Illegal Operations Caused by the Register Dependencies of Instructions Before and After the mova Instruction

For any particular mova instruction, the GPU stops responding if all of the following conditions are satisfied.


The following example demonstrates a case where the GPU stops responding.
Example:
mul        r0, c0, r1   // Causes the instruction immediately after the mova to stall (condition 1).
mova       a0.x, v0.x   // No stall occurs because of a dependency on the previous instruction (condition 3).
add        r0, r0, c1   // Because add reads r0, which is written by mul (the instruction before the mova),
                        // a 1-clock cycle stall occurs (condition 1).
mov        r5, v1       // The same input register that is read by the mova is read by the other instruction (condition 2).
The shader assembler outputs a warning (400a0006), indicating that the GPU may stop responding, if the following conditions are met.


The assembler does not check whether instructions actually stall. If this warning is output, avoid meeting all three conditions by changing the order of instructions or by inserting a nop instruction.

Illegal Operations Caused by Executing Instructions in a Particular Order

Illegal operations will occur if four consecutive instructions meet all of the following conditions.
These are necessary and sufficient conditions. The GPU will always hang if they are all met.

These conditions are described below.


The term "same register" in Conditions 5, 6, 7, and 8 refers to registers of the same type and index, regardless of their component specification. (For example, r0 and r0 are the same register, and so are r1.x and r1.y.)

The following example is a case where all of the above conditions are met.

Example:
rcp     r1, r2.x    // An instruction is executed that causes a stall on the first of the next four instructions
min     r0, r1, r2  // The first instruction: From this point on, four consecutive instructions meet the conditions given
max     r3, r4, r5  // The second instruction
slt     r5, r0, r5  // The third instruction
call    l_function  // The fourth instruction
The first instruction, min, and the third instruction, slt, both have a latency of 2 clock cycles, meeting Condition 1.
The second instruction, max, has a latency of 2 clock cycles, meeting Condition 2.
The fourth instruction is call. This meets Condition 3.
The min instruction stalls for 3 clock cycles because it must wait for rcp to write r1. This meets Condition 4.
The dest register of min and the src register of max differ. This meets Condition 5. The dest register of max and the src register of slt differ. This meets Condition 6.
The dest register of min and the src0 register of slt are the same. This meets Condition 7. The dest register of slt is the same as the src1 register of slt. This meets Condition 8.

Timing chart for the example (clock cycles 1 to 9):
rcp:  read, RCP/RSQ, post, write
min:  STALL, read, MIN, write
max:  read, MAX, write
slt:  read, SLT, write
call: call


The shader assembler outputs a warning (400a0001 or 400a0002) if all conditions except Condition 4 are met.
The shader assembler cannot detect whether execution will stall long enough to meet Condition 4.

The performance check feature of the shader linker can give an approximate count of the clock cycles for which execution stalls, but it cannot detect stalls perfectly.
Conditions for stalling depend on things like the instructions executed up to that point and how registers are being used.

Because the behavior is deterministic, a shader that does not cause illegal operations never will, and a shader that does cause them always will. If you see no illegal operations even though the shader assembler outputs a warning, you can assume that Condition 4 is not met and continue using the shader without problems.
If illegal operations do occur, you can avoid them by changing the order of instructions or by inserting a nop instruction.
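The register relationships in Conditions 5 to 8, together with the "same register" rule, can be checked mechanically. The Python sketch below evaluates only those four conditions as they are explained for the min/max/slt/call example; it does not model latencies, the fourth instruction, or stall length (Conditions 1 to 4). Interpreting "the src register" in Conditions 5 and 6 as any source operand is an assumption, and the function names are hypothetical.

```python
import re


def same_register_key(operand: str) -> str:
    """Reduce an operand to register type and index, ignoring components.

    r1.x and r1.y both map to "r1"; a leading negation is ignored (assumption).
    """
    m = re.match(r"-?([a-z]+\d+)", operand.strip())
    return m.group(1) if m else operand.strip()


def parse(line: str) -> tuple[str, str, list[str]]:
    """Return (opcode, dest, srcs) for one instruction line."""
    code = line.split("//", 1)[0].strip()
    opcode, _, rest = code.partition(" ")
    operands = [same_register_key(op) for op in rest.split(",") if op.strip()]
    dest, srcs = (operands[0], operands[1:]) if operands else ("", [])
    return opcode, dest, srcs


def register_conditions(block: list[str]) -> dict[int, bool]:
    """Evaluate Conditions 5-8 for four consecutive instruction lines."""
    (_, d1, _), (_, d2, s2), (_, d3, s3), _ = [parse(line) for line in block]
    return {
        5: d1 not in s2,                  # dest of 1st differs from every src of 2nd
        6: d2 not in s3,                  # dest of 2nd differs from every src of 3rd
        7: len(s3) > 0 and d1 == s3[0],   # dest of 1st is src0 of 3rd
        8: len(s3) > 1 and d3 == s3[1],   # dest of 3rd is src1 of 3rd
    }
```

For the example sequence all four register conditions evaluate to True, and they still do if component selectors such as .x or .y are added, because only type and index are compared.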

Revision History

2012/06/20
Added "Illegal Operations Caused by the Register Dependencies of Instructions Before and After the mova Instruction."
2011/12/20
Initial version.

CONFIDENTIAL