ctr_VertexShaderLinker links object files output by ctr_VertexShaderAssembler, and outputs an executable file.
% ctr_VertexShaderLinker32 <input file name> [options...]
Input files must be object files, and at least one of them must include a main function. Options can be omitted. Help is displayed if the linker is executed without any arguments.
Specify the object files to be input in place of <input file name>.
If the input files include more than one main object, the application must pass the glShaderBinary function as many shader objects as there are main objects.
In that case, the shader objects correspond to the main objects in the order in which the main objects were specified as arguments to ctr_VertexShaderLinker32.
For the file name, specify a string of 128 or fewer characters that contains no spaces and consists only of ASCII alphanumeric characters and symbols other than \ / : * ? " < > and |.
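The file name rules can be expressed as a small validation function. This is a minimal Python sketch that checks only a bare file name, not a path. Two details are assumptions: the allowed symbols are taken to be Python's `string.punctuation` set with the forbidden characters removed, and an empty name is rejected. The function name is hypothetical.

```python
import string

# Symbols the manual forbids in file names.
FORBIDDEN_SYMBOLS = set('\\/:*?"<>|')

# Assumption: "symbols" means ASCII punctuation (Python's string.punctuation).
ALLOWED_CHARS = (
    set(string.ascii_letters)
    | set(string.digits)
    | (set(string.punctuation) - FORBIDDEN_SYMBOLS)
)

MAX_FILE_NAME_LENGTH = 128


def is_valid_input_file_name(name: str) -> bool:
    """Return True if name follows the documented file name rules."""
    # Assumption: an empty name is rejected.
    if not name or len(name) > MAX_FILE_NAME_LENGTH:
        return False
    # Spaces and non-ASCII characters are not in ALLOWED_CHARS, so they fail here.
    return all(ch in ALLOWED_CHARS for ch in name)


print(is_valid_input_file_name("VShader.o"))   # True
print(is_valid_input_file_name("V Shader.o"))  # False (contains a space)
```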
The following options can be specified in place of [options...].
| Options | Description |
|---|---|
| -I<file path> | Specifies a directory in which to search for input files. Input files are searched for in the current directory and then in the directories specified with this option, in the order they are specified. |
| -O <file name> | Specifies the output file name. The default is "shader.bin". |
| -M | Outputs a map file. This file has the same name as the executable file, with its extension changed to ".map". See Map File Output Features for more details about map files. |
| -debug | Links all object files with debug information included. |
| -nodebug | Links all object files without including debug information. |
| -check_consistency | Checks the consistency of all main objects being linked. For details on the consistency check, see Consistency Check Feature. |
| -check_performance | Checks the performance of all main objects being linked. See Performance Check Features for more information about performance checks. |
| -? or -help | Displays help. |
Running ctr_VertexShaderLinker32 with the -M option outputs information about the linked executable file. This output file is called a map file.
The map file is generated using the same name as the executable file, with the extension changed to ".map".
The following information is output: the order in which objects are loaded, image sizes, program code information, object information, and swizzle pattern data.
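The map file takes the executable's name and changes its extension to ".map". The following Python sketch derives that path, assuming the executable keeps the default name "shader.bin" unless -O is given. The helper name is hypothetical.

```python
from pathlib import PurePath


def map_file_path(executable: str = "shader.bin") -> PurePath:
    """Same name as the executable, with the extension changed to ".map"."""
    return PurePath(executable).with_suffix(".map")


print(map_file_path())               # shader.map
print(map_file_path("vs_main.bin"))  # vs_main.map
```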
This item shows the order in which main objects have been linked.
Main objects are linked in the order in which they were specified as arguments to ctr_VertexShaderLinker32. The objects are listed here in the same order in which the shader objects specified to the glShaderBinary function reference them. Main objects that are debug builds are also indicated.
This item shows the size of the linked object files.
"Instruction" represents the number of assembler instructions, and "Swizzle" represents the number of swizzling and masking patterns. (For details, see "Number of Swizzling and Masking Patterns" under "Notes and Limitations.")
"Total" represents the total size after linking.
This item shows the storage location for program code inside the executable file.
"Program code offset" represents the offset (in bytes) from the start of the file to the address at which program data is stored in the executable file. "Program code size" shows the size (in bytes) of program code data.
This item shows the symbol information, output data attribute information, and program start address for each main object that has been linked.
These represent, respectively, the settings specified using #pragma bind_symbol statements, the settings specified using #pragma output_map statements, and the program address set by the main label.
This item shows the swizzle pattern data of the executable file.
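The "Program code offset" and "Program code size" values in the map file locate the program code inside the executable file. The following minimal Python sketch assumes you have already read both values from the map file. The map file's exact text layout is not described in this section, so no map parser is shown, and the helper name is hypothetical.

```python
def extract_program_code(binary: bytes, offset: int, size: int) -> bytes:
    """Return the program code bytes at [offset, offset + size) of the executable."""
    if offset < 0 or size < 0 or offset + size > len(binary):
        raise ValueError("program code range lies outside the executable file")
    return binary[offset:offset + size]


# Usage, with offset and size taken from the map file (placeholders here):
# with open("shader.bin", "rb") as f:
#     code = extract_program_code(f.read(), offset, size)
```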
The consistency check feature checks whether the shader assembler code has been implemented correctly.
This feature is implemented in the linker and can be enabled by specifying the -check_consistency option to ctr_VertexShaderLinker32.exe.
Several items are checked for each main object being linked. The check traverses each instruction in the main object from the main label to the endmain label, and a warning is output whenever code that matches one of the checked items is found.
Each instruction is checked according to the following conditions.
- Subroutines called by call instructions, as well as the source that called them, are checked.
- jpb, jpc, callb, and callc instructions are not considered.
- For ifb and ifc instructions, both the if side and the else side are checked.
- For recursive calls made with the call instruction (when the same subroutine is called again inside a nested call), execution proceeds to the next instruction without executing that call.
The term "if instruction" used in this section includes both ifb and ifc instructions.
end Instruction
This feature checks whether an end instruction has been called correctly.
A warning is output for the following checked items.
- No end instruction can be found.
- An end instruction is found between a pair of loop-endloop instructions.
- An end instruction is found in only one of a pair of if-else and else-endif blocks. If no else instruction corresponds to an if instruction, a warning is output if an end instruction is found between a pair of if-endif instructions.
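A simplified checker can illustrate these rules, although it is not the linker's implementation. This Python sketch makes several assumptions:

- The input is a flat, well-nested list of mnemonics in execution order.
- call and other branches are not followed.
- ifb/ifc are the if instructions.
- The no-else case is reported with the same message as the one-sided if/else case.

The messages reuse the warning texts from the error code table. The `Block` class and the function name are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional

END_NOT_FOUND = "end instruction is not found."
END_IN_LOOP = "end instruction is found in loop statement."
END_IN_ONE_BRANCH = "end instruction is found in only one of if and else statement."


@dataclass
class Block:
    kind: str                            # "loop" or "if"
    then_has_end: bool = False
    else_has_end: Optional[bool] = None  # None until an else is seen


def check_end_instruction(instructions: List[str]) -> List[str]:
    """Check end placement in a flat list of mnemonics (calls are not followed)."""
    warnings: List[str] = []
    found_end = False
    stack: List[Block] = []  # currently open loop/if blocks

    for op in instructions:
        if op == "loop":
            stack.append(Block("loop"))
        elif op in ("ifb", "ifc"):
            stack.append(Block("if"))
        elif op == "else":
            stack[-1].else_has_end = False  # now on the else side
        elif op in ("endloop", "endif"):
            block = stack.pop()
            if op == "endif":
                if block.else_has_end is None:
                    # No else: any end between if and endif is reported.
                    if block.then_has_end:
                        warnings.append(END_IN_ONE_BRANCH)
                elif block.then_has_end != block.else_has_end:
                    warnings.append(END_IN_ONE_BRANCH)
        elif op == "end":
            found_end = True
            if any(b.kind == "loop" for b in stack):
                warnings.append(END_IN_LOOP)
            # Record the end on the current side of every enclosing if.
            for b in stack:
                if b.kind == "if":
                    if b.else_has_end is None:
                        b.then_has_end = True
                    else:
                        b.else_has_end = True

    if not found_end:
        warnings.append(END_NOT_FOUND)
    return warnings
```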
This feature checks whether input registers are being used correctly.
This check is performed on the input registers and components specified using a #pragma bind_symbol statement.
A warning is output for the following checked items.
- An input register defined by a #pragma bind_symbol statement is not used.
- The input registers used differ between a pair of if-else instructions and a pair of else-endif instructions. This is reported because the way in which input registers are used may differ depending on how if instructions branch. If no else instruction corresponds to an if instruction, a warning is output if an input register is used between a pair of if-endif instructions.
This feature checks whether output registers are being used correctly.
Output registers specified using #pragma output_map statements are checked. All xyzw components are also checked.
A warning is output for the following checked items.
- An output register is not written.
- An output register is written more than once.
- An output register is written between a pair of loop-endloop instructions. This is reported because an output register may be written more than once, depending on the number of times the loop instruction repeats.
- The output registers written differ between a pair of if-else instructions and a pair of else-endif instructions. This is reported because the way in which output registers are used may differ depending on how if instructions branch. If no else instruction corresponds to an if instruction, a warning is output if an output register is written between a pair of if-endif instructions.
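The "not written" and "written more than once" checks can be sketched in the same simplified way. This Python sketch assumes straight-line code with no loops or branches. It also assumes that an output register counts as written only when all four xyzw components have been written. Register names such as o0, the (register, write mask) input format, and the function name are hypothetical. The message texts come from the error code table.

```python
from typing import Dict, Iterable, List, Set, Tuple

ALL_COMPONENTS = set("xyzw")


def check_output_writes(declared: Iterable[str],
                        writes: Iterable[Tuple[str, str]]) -> List[str]:
    """declared: output registers mapped with #pragma output_map (e.g. "o0").
    writes: (register, write mask) pairs in execution order (e.g. ("o0", "xy")).
    """
    declared = list(declared)
    written: Dict[str, Set[str]] = {reg: set() for reg in declared}
    warnings: List[str] = []

    for reg, mask in writes:
        if reg not in written:
            continue  # not an output register declared with output_map
        if written[reg] & set(mask):
            warnings.append(f'output register "{reg}" is already set before.')
        written[reg] |= set(mask)

    for reg in declared:
        # Assumption: a register counts as set only if all of xyzw were written.
        if written[reg] != ALL_COMPONENTS:
            warnings.append(f'output register "{reg}" is not set.')
    return warnings
```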
The performance check features include a feature that detects instructions that cause execution to stall and a feature that estimates the number of clock cycles required per vertex to execute the shader assembler code.
This feature is implemented in the linker, and can be enabled by specifying the -check_performance option to ctr_VertexShaderLinker32.exe.
Each main object being linked is checked. The check traverses each instruction in the main object from the main label up to an end instruction or the endmain label. Results of the check are output to a file and to the command prompt from which the linker was executed. The output file is named after the binary file created by the linker, with its extension replaced by "perf.txt," and is generated in the same location as the binary file.
Each instruction is checked according to the following conditions.
- Subroutines called by call instructions, as well as the source that called them, are checked.
- jpb, jpc, callb, and callc instructions are not considered. (Their conditions are always treated as FALSE.)
- For ifb and ifc instructions, only the else side is checked. (The condition is always treated as FALSE.)
- For recursive calls made with the call instruction (when the same subroutine is called again inside a nested call instruction), execution proceeds to the next instruction without executing that call instruction.
You can use this feature to detect several reasons for stalling. A detailed description of each associated item is given below.
You can detect stalling due to dependency on the register used by an instruction.
Given two instructions where the second instruction references the result calculated by the first, a stall occurs if the second instruction is executed before the first has finished its calculation, because the second instruction must wait (stall) until it can reference the result.
The number of clock cycles that execution stalls depends on the latency of the first instruction and on the number of instructions executed between the two instructions. For details on the latency of each instruction, see Instruction Latency. The register in question may be either a temporary register or a status register. For status registers, ifc, callc, and breakc instructions may depend on a cmp instruction.
Example:
dp4 r0, r1, r2
add r3 , r4 , r0
In the example above, the result calculated by dp4 for r0 is used by the add instruction that immediately follows. Since the latency of dp4 is 5 clock cycles, a stall of 4 clock cycles is the detected result when add executes.
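The example fits a simple model in which every instruction issues in one clock cycle. Under that model, the stall is the producer's latency minus one, minus the number of instructions executed in between. This Python sketch assumes that model and also assumes that the intervening instructions do not stall themselves. The linker's actual calculation may account for more factors, and the function name is hypothetical.

```python
def dependency_stall(producer_latency: int, instructions_between: int) -> int:
    """Estimated stall of an instruction that reads a producer's result.

    Assumed model: one instruction issues per clock cycle, the result can be
    read producer_latency cycles after the producer issues, and the
    instructions in between do not stall themselves.
    """
    return max(0, producer_latency - 1 - instructions_between)


# dp4 (latency 5) immediately followed by a dependent add:
print(dependency_stall(5, 0))  # 4
```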
Some instructions depend on more than one other instruction. In this case, the number of clock cycles for the instruction with the greatest stall time is used as the stall count for the dependent instruction.
Example:
dp4 r0.x, r1, r2
mov r0.y, c0
add r3 , r4 , r0
In the example above, the third add instruction depends on both dp4 and mov for r0. add stalls for 3 clock cycles due to dp4 and 1 clock cycle due to mov. Since the highest stall time of the two is 3 clock cycles, a stall of 3 clock cycles is used as the detected result.
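Under the same simplified one-instruction-per-clock model, an instruction with several dependencies stalls for the largest individual stall. The mov latency of 2 clock cycles used below is inferred from the 1-clock-cycle stall in the example, not taken from the latency table (see Instruction Latency for actual values). The function name is hypothetical.

```python
from typing import Iterable, Tuple


def stall_for_dependencies(deps: Iterable[Tuple[int, int]]) -> int:
    """deps holds (producer_latency, instructions_between) for each producer.

    Each producer's stall uses the simplified one-instruction-per-clock model;
    the dependent instruction stalls for the largest of them.
    """
    return max((max(0, latency - 1 - between) for latency, between in deps),
               default=0)


# dp4 r0.x (latency 5, one instruction between) and mov r0.y (assumed
# latency 2, adjacent) both feed the add:
print(stall_for_dependencies([(5, 1), (2, 0)]))  # 3
```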
mova Instruction
A 3-clock cycle stall results unconditionally when mova is called.
A 2-clock cycle stall results if executing a call or if instruction changes the program counter by anything other than +1. A 2-clock cycle stall also results at the instruction executed immediately after ret, when execution returns to the code that issued the call.
With some instructions, there may be multiple reasons for stalling. In this case, the stall clock cycle count for the reason having the longest stall time is used.
Example:
main:
call label0 // Call to label0
...
end
label0:
dp4 r0.x, r1, r2
mova a0.x, r0.x
ret
In the above example, when the mova instruction at the end of the subroutine named label0 is reached, a 4-clock cycle stall occurs due to dependence on the immediately preceding dp4 instruction, a 3-clock cycle stall occurs due to calling the mova itself, and a 2-clock cycle stall occurs due to branching. Since the longest stall time out of all these reasons for stalling is 4 clock cycles, the calculated result is a 4-clock cycle stall.
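The following Python sketch shows how the longest stall is chosen across all reasons. The 3- and 2-clock-cycle constants come from the descriptions above. The dependency stall is passed in already computed, and the function and parameter names are hypothetical.

```python
MOVA_STALL = 3    # mova always stalls for 3 clock cycles
BRANCH_STALL = 2  # non-sequential program counter change, or return from a call


def combined_stall(dependency_stall: int = 0, is_mova: bool = False,
                   after_branch: bool = False) -> int:
    """Return the stall for one instruction: the longest of all reasons."""
    reasons = [dependency_stall]
    if is_mova:
        reasons.append(MOVA_STALL)
    if after_branch:
        reasons.append(BRANCH_STALL)
    return max(reasons)


# The mova in label0: 4 (dependency on dp4), 3 (mova), 2 (branch).
print(combined_stall(4, is_mova=True, after_branch=True))  # 4
```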
Results of the performance checks are output to a file and the command prompt that executed the linker.
The output file is generated under the name of the binary file created by the linker, with its extension replaced by "perf.txt."
The following content is output to the file:
Main object:obj\VShader.o
Total executed clock count 14 clock
Total executed instruction count 7 instructions
Detail of stall
============================================================
VShader.vsh(26):2 clock stall is caused by branch.
VShader.vsh(36):4 clock stall is caused.
|
+--- 3 clock stall is by mova instruction.
|
+--- 2 clock stall is by branch.
|
+--- VShader.vsh(35):4 clock is to wait for this instruction to finish writing r0.
VShader.vsh(30):1 clock stall is caused by reading temporary register.
|
+--- VShader.vsh(29):To wait for this instruction to finish writing r1.
...
"Main object" is the name of the main object being checked. Each check result is output for all main objects being linked.
"Total executed clock count" is the total number of clock cycles required for execution per vertex.
"Total executed instruction count" is the number of instructions executed.
The location where a stall occurred, the reason for the stall, and the number of clock cycles that execution stalled are listed under "Detail of stall." In the case of stalling due to instruction dependencies, the location of the instruction being depended upon and the register indicating the reason are shown.
If there is more than one reason for a stall, the stall count for each reason and the resulting stall count for the instruction (the longest of them) are shown.
Check results are output in the order instructions are executed, starting from the main label. If the same subroutine is called more than once, the same stall details are shown multiple times, because a check is made for each instance the subroutine executes.
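You may want to post-process the report, for example to track clock counts across builds. The following Python sketch parses the layout shown in the sample output above: a "Main object:" line for each object, followed by its totals and top-level stall lines. The report format is not formally specified, so treat the regular expressions as assumptions. The function name is hypothetical.

```python
import re
from typing import Dict, List

# Top-level stall lines, e.g. "VShader.vsh(26):2 clock stall is caused by branch."
_STALL_LINE = re.compile(
    r"^(?P<file>[^()\s]+)\((?P<line>\d+)\):(?P<clocks>\d+) clock stall is caused")
_TOTAL_LINE = re.compile(
    r"^Total executed (?P<kind>clock|instruction) count\s+(?P<value>\d+)")


def parse_perf_report(text: str) -> List[Dict]:
    """Split a performance report into one summary dict per main object."""
    reports: List[Dict] = []
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("Main object:"):
            reports.append({"main_object": line[len("Main object:"):].strip(),
                            "stalls": []})
            continue
        if not reports:
            continue  # ignore anything before the first main object
        total = _TOTAL_LINE.match(line)
        if total:
            key = "clocks" if total["kind"] == "clock" else "instructions"
            reports[-1][key] = int(total["value"])
            continue
        stall = _STALL_LINE.match(line)
        if stall:
            # Indented dependency lines start with "|" or "+---" and never match.
            reports[-1]["stalls"].append(
                (stall["file"], int(stall["line"]), int(stall["clocks"])))
    return reports
```

Pass the full contents of the report file to `parse_perf_report`. It returns one entry for each main object checked.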
Details nearly identical to those output to the file are also output to the command prompt that executed the linker. (Information regarding dependencies is indented.)
If the linker is executed from an environment such as Microsoft Visual Studio, you can jump to the shader assembler source file by clicking the output result of a reason for stalling in the output window. This is useful when you want to look at the location of a problem.
All assembler objects being linked can be forced to be linked as debug builds by specifying the -debug option to ctr_VertexShaderLinker32.
All assembler objects being linked can be forced to be linked as non-debug builds by specifying the -nodebug option to ctr_VertexShaderLinker32.
If a mixture of debug-build and non-debug-build assembler objects is linked, any main object that references at least one debug-build assembler object also results in a debug build.
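The following Python sketch summarizes the build-type rules in a short function. Its inputs are hypothetical and the linker works these out internally: the mapping from each main object to the assembler objects it references, and the set of debug-build objects. The function name is also hypothetical.

```python
from typing import Dict, Iterable, Set


def resolve_debug_builds(main_refs: Dict[str, Iterable[str]],
                         debug_objects: Set[str],
                         force_debug: bool = False,
                         force_nodebug: bool = False) -> Dict[str, bool]:
    """Return, for each main object, whether it ends up as a debug build.

    main_refs maps each main object to the assembler objects it references
    (including itself); debug_objects holds the objects assembled as debug builds.
    """
    if force_debug and force_nodebug:
        # Corresponds to linker error 80080031.
        raise ValueError("-debug and -nodebug cannot be specified together.")
    result = {}
    for main, refs in main_refs.items():
        if force_debug:
            result[main] = True
        elif force_nodebug:
            result[main] = False
        else:
            # One debug-build reference is enough to make the main object a debug build.
            result[main] = any(obj in debug_objects for obj in refs)
    return result
```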
This page describes error messages output by the linker. Errors are output in the following format.
Input file name (line number of error): Error level (error code): Error description
Errors have an error level of either "warning" or "error." Execution can continue in the case of a "warning" level error. The input file name and/or error line number may not be displayed, depending on the type of error.
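Messages in this format can be split into their parts, for example to filter warnings in a build script. The exact spacing and capitalization of the real output are not specified here, so this Python sketch uses a tolerant regular expression. As described above, it treats the file name and line number as optional. The class and function names are hypothetical.

```python
import re
from typing import NamedTuple, Optional


class LinkerMessage(NamedTuple):
    file: Optional[str]
    line: Optional[int]
    level: str   # "warning" or "error"
    code: str    # eight hex digits, e.g. "80080005"
    text: str


# Assumed layout: "<file>(<line>): <level> (<code>): <description>",
# where the "<file>(<line>):" prefix may be partly or entirely missing.
_MESSAGE = re.compile(
    r"^(?:(?P<file>[^():]+?)\s*(?:\((?P<line>\d+)\))?\s*:\s*)?"
    r"(?P<level>warning|error)\s*\((?P<code>[0-9a-f]{8})\)\s*:\s*(?P<text>.*)$",
    re.IGNORECASE,
)


def parse_linker_message(line: str) -> Optional[LinkerMessage]:
    """Split one output line into its parts, or return None if it does not match."""
    m = _MESSAGE.match(line.strip())
    if m is None:
        return None
    return LinkerMessage(
        file=m["file"],
        line=int(m["line"]) if m["line"] else None,
        level=m["level"].lower(),
        code=m["code"].lower(),
        text=m["text"],
    )


def can_continue(msg: LinkerMessage) -> bool:
    # Execution can continue after a "warning"-level message.
    return msg.level == "warning"
```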
This table gives the error codes output by the linker and their description.
| Error Code | Message | Description |
|---|---|---|
| 80080001 | Input file is not specified. | No input file was specified. |
| 80080005 | “argument” is not found. | The input file could not be found. |
| 80080006 | Exceeded maximum number of long swizzle masks/patterns. | The number of swizzling patterns for map exceeds the upper limit. |
| 80080007 | Exceeded maximum number of swizzle masks/patterns. | The number of swizzling patterns exceeds the upper limit. |
| 8008000f | Label “label name” is duplicated. | The same label name is defined more than once in the subroutine object. |
| 80080012 | Cannot open output file. | The executable file could not be generated. Check whether a read-only file with the same name already exists. |
| 80080014 | “input file name” is invalid file format. | The input file is not an object file. |
| 80080015 | Some input files are the same name. | Input files with the same name have been specified. |
| 8008001d | “label name” is not subroutine. | A label called as a subroutine by a call instruction has no corresponding ret instruction. |
| 8008001f | “label name” cannot be found in input object files. | The label referenced in the input file cannot be found. |
| 80080020 | Vertex shader size is over the limit. | The number of instructions in the shader exceeds the upper limit. Shaders consisting of up to 512 instructions can be linked. |
| 80080022 | “register name” is duplicately defined in “object name” and “object name”. | More than one object defines the same register with different values using def, defi, or defb instructions. |
| 80080024 | “register name” is duplicately defined in “object name” and “object name”. | More than one object maps the same output register to different output data attributes using #pragma output_map statements. |
| 80080025 | symbol “symbol name” is duplicately defined in “object name” and “object name”. | More than one object binds the same symbol name to different registers using #pragma bind_symbol definitions. |
| 8008002a | symbol “symbol name” in “object name” and “symbol name” in “object name” are bound to the same register. | A symbol in an object is bound to the same input register as another symbol in an object through #pragma bind_symbol definitions. |
| 8008002b | “label name” is duplicately defined in “subroutine object name”. | A label name in the main object is also defined in a subroutine object. |
| 8008002c | “output data attribute name” is duplicately defined in “object name” and “object name”. | More than one object maps the same output data attribute to different output registers using #pragma output_map statements. |
| 8008002d | Main routine cannot be found. | None of the input files contains an object that includes both main and endmain labels. |
| 8008002e | Cannot open map file. | The map file could not be generated. Check whether a read-only file with the same name already exists. |
| 8008002f | No input attribute is defined. | No input attributes are defined. |
| 80080030 | No output map is defined. | No output attributes are defined. |
| 80080031 | -debug and -nodebug cannot be specified together. | The -debug and -nodebug options cannot be specified at the same time. |
| 80080032 | def(bi) in ***.obj and bind_symbol in ***.obj specify the same register **. | The same register cannot be defined by both a def instruction and a bind_symbol statement. |
| 80080033 | texture1 and texture2 need to be mapped to same register, if 4 textures are mapped. | If four textures have been defined using an output_map statement, texture1 and texture2 must be mapped to the same register. |
| Error Code | Message | Description |
|---|---|---|
| 40090001 | end instruction is not found. | An end instruction could not be found. |
| 40090002 | end instruction is found in loop statement. | An end instruction was found between a pair of loop-endloop instructions. |
| 40090003 | end instruction is found in only one of if and else statement. | An end instruction was found in only one of a pair of if-else and else-endif blocks. |
| 40090004 | input register "label name" is not used. | The input register defined in a #pragma bind_symbol statement may not be used. |
| 40090005 | The access patterns of input registers are different between if and else statement. | The input registers used differ between a pair of if-else instructions and a pair of else-endif instructions. |
| 40090006 | output register "register name" is not set. | The output register defined by a #pragma output_map statement may not be written. |
| 40090007 | output register is set in loop statement. | An output register is written between a pair of loop-endloop instructions. |
| 40090008 | The access patterns of output registers are different between if and else statement. | The output registers written differ between a pair of if-else instructions and a pair of else-endif instructions. |
| 40090009 | output register "register name" is already set before. | The output register is written more than once. |
| 4009000a | Recursive call is found, and skipped. | A subroutine is being called recursively. The consistency check feature skips the call instruction that causes the recursive call. |
| 4009000b | Cannot open file for performance report. | The file for storing the results of the performance check feature could not be created. |
| 4009000c | mova instruction both before and after returning from subroutine might cause hardware hang-up. | A malfunction may occur if mova instructions are placed both immediately before and immediately after returning from a subroutine called by a call, callb, or callc instruction. See "Malfunctions Caused by the mova Instruction." |
CONFIDENTIAL