ctr_VertexShaderLinker

Introduction

ctr_VertexShaderLinker links object files output by ctr_VertexShaderAssembler, and outputs an executable file.

How to Use

Commands

% ctr_VertexShaderLinker32 <input file name> [options...]

Input files must be object files that include at least one main function. Options can be omitted. Help is displayed if the linker is executed without any arguments.

Input Files

Specify the object files to link in place of <input file name>.
If the input files include more than one main object, the application must specify the same number of shader objects to the glShaderBinary function as there are main objects.
In that case, the shader objects reference the main objects in the order in which the main objects were specified as arguments to ctr_VertexShaderLinker32.
For the filename, specify a string of 128 or fewer alphanumeric characters that contains no spaces or any of the following symbols: \ / : * ? " < > |.
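
The filename rule above can be checked before invoking the linker. The following is a minimal sketch, not part of the tool: the function name is hypothetical, and it assumes that "alphanumeric" means plain ASCII text and that the rule applies to the file name itself rather than to a directory path.

```python
# Symbols that the text lists as not allowed in an input file name.
FORBIDDEN = set('\\/:*?"<>|')
MAX_LENGTH = 128


def is_valid_input_name(name: str) -> bool:
    """Return True if `name` satisfies the filename rule as read here."""
    if not name or len(name) > MAX_LENGTH:
        return False
    if any(ch.isspace() for ch in name):
        return False
    if any(ch in FORBIDDEN for ch in name):
        return False
    # Assumption: "alphanumeric" is interpreted as ASCII-only text.
    return name.isascii()
```

For example, is_valid_input_name("VShader.o") is accepted under this reading, while "my shader.o" is rejected because of the space.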

Options

The following options can be specified in place of [options...].

Option                Description
-I<file path>         Specifies the path in which to look for input files.
                      Input files are looked for in the current directory, and then in the directories specified using this option, in the order specified.
-O <File Name>        Specifies the output filename.
                      The default is shader.bin.
-M                    Outputs a map file.
                      This file has the same name as the executable file, with its extension changed to .map. For more information about map files, see Map File Output Feature.
-debug                Links all object files with debug information included.
-nodebug              Links all object files without including debug information.
-check_consistency    Checks the consistency of all main objects being linked.
                      For details on the consistency check, see Consistency Check Feature.
-check_performance    Checks the performance of all main objects being linked.
                      For more information about performance checks, see Performance Check Features.
-? or -help           Displays help.
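
As an illustration of how these options combine, the following sketch assembles an argument list for a build script. The function and its parameters are hypothetical helpers, not part of the tool. It assumes the spacing shown in the table (-I written directly before the path, -O followed by a separate file name) and uses a three-state debug parameter so that -debug and -nodebug are never passed together.

```python
def build_linker_command(inputs, output=None, include_dirs=(), map_file=False,
                         debug=None, check_consistency=False,
                         check_performance=False):
    """Assemble an argument list for ctr_VertexShaderLinker32 (sketch)."""
    if not inputs:
        raise ValueError("at least one input object file is required")
    args = ["ctr_VertexShaderLinker32", *inputs]
    for directory in include_dirs:
        args.append(f"-I{directory}")   # -I<file path>, no space (as in the table)
    if output is not None:
        args += ["-O", output]          # -O <File Name>; default is shader.bin
    if map_file:
        args.append("-M")
    if debug is True:
        args.append("-debug")
    elif debug is False:
        args.append("-nodebug")
    if check_consistency:
        args.append("-check_consistency")
    if check_performance:
        args.append("-check_performance")
    return args
```

The returned list could be passed to subprocess.run in a build script; whether the linker accepts options in any order is not stated in this document.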

Map File Output Feature

Executing ctr_VertexShaderLinker32 with the -M option specified outputs information about the linked executable file. The output file is called a map file.
The map file is generated using the same name as the executable file, with the extension changed to .map.
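
The naming rule can be sketched as follows. The helper name is hypothetical, and the behavior for an executable name that has no extension is an assumption, because the text does not cover that case.

```python
from pathlib import PureWindowsPath


def map_file_name(executable: str = "shader.bin") -> str:
    """Return the map file name for an executable name (sketch).

    The extension is swapped for .map. A name without an extension simply
    gets .map appended here, which is an assumption.
    """
    return str(PureWindowsPath(executable).with_suffix(".map"))


print(map_file_name())  # shader.map
```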


The following information is generated: the loading objects order, image sizes, program code information, object information, and swizzle pattern data.

Loading Objects Order

This item shows the order in which main objects have been linked.
Main objects are linked in the same order that they were specified as arguments to ctr_VertexShaderLinker32. The order shown here is also the order in which the shader objects specified to the glShaderBinary function reference the main objects. Main objects that are debug builds are indicated as well.

Image Sizes

This item shows the size of the linked object files.

Program Code Information

This item shows the storage location for program code inside the executable file.
"Program code offset" represents the offset (in bytes) from the start of the file to the address at which program data is stored in the executable file. "Program code size" shows the size (in bytes) of program code data.

Object Information

This item shows the symbol information, output data attribute information, and program start address for each main object that has been linked.
These represent, respectively, the settings specified using #pragma bind_symbol statements, the settings specified using #pragma output_map statements, and the program address set by the main label.

Swizzle Pattern Data

This item shows the swizzle pattern data of the executable file.

Consistency Check Feature

The consistency check feature checks whether the shader assembler code has been implemented correctly.
This feature is implemented in the linker and can be enabled by specifying the -check_consistency option to ctr_VertexShaderLinker32.exe.
Several items are checked for each main object being linked. The check traverses each instruction in the main object from the main label up to the endmain label. A warning is output if code that matches one of the checked items is found.

Each instruction is checked according to the following conditions.

The consistency check covers the items listed below. Each item is described in detail below. The term "if instruction" used in this section includes both ifb and ifc instructions.

Checks for Execution of an end Instruction

This feature checks whether an end instruction has been called correctly. A warning is output for the following checked items.

Checks Reads from Input Registers

This feature checks whether input registers are being used correctly. This check is performed on the input registers and components specified by using a #pragma bind_symbol statement. A warning is output for the following checked items.

Checks Writes to Output Registers

This feature checks whether output registers are being used correctly. Output registers specified using #pragma output_map statements are checked. All xyzw components are also checked. A warning is output for the following checked items.

Performance Check Features

The performance check features consist of a feature that detects instructions that cause execution to stall, and a feature that estimates the number of clock cycles required per vertex when a shader assembler implementation is executed.
This feature is implemented in the linker, and can be enabled by specifying the -check_performance option to ctr_VertexShaderLinker32.exe.
Each main object being linked is checked. The check traverses each instruction in the main object from the main label up to an end instruction or endmain label. Results of the check are output to a file and to the command prompt that executed the linker. The output file is generated in the same location as the binary file created by the linker, under the same name with its extension replaced by perf.txt.

Each instruction is checked according to the following conditions.

The following items are included in output results.
The total number of clock cycles is calculated by counting one clock cycle per instruction and, whenever a stall occurs, adding the number of clock cycles that execution is stalled.
The reasons output for stalling do not cover all of the reasons that might result in a stall on actual hardware. Note that the calculated total number of clock cycles should only be used as a guideline, because it does not necessarily match the number of clock cycles executed on actual hardware.
This feature is described in detail below.

Detectable Reasons for Stalling

You can use this feature to detect several reasons for stalling. Each associated item is described in detail below.

Stalling Caused by Instruction Dependencies

You can detect stalling caused by dependency on the register used by an instruction.
When the second of two instructions references the calculation result of the first, a stall occurs if the second instruction is called before the first has completed its calculation, because the second instruction must wait to reference the result.

The number of clock cycles for which execution stalls depends on the latency of the first instruction and on the number of instructions executed between the two instructions. For more information about the latency of each instruction, see Instruction Latency. The register may be a temporary register or a status register. In the case of status registers, ifc, callc, and breakc may be dependent on cmp.

Example:

dp4 r0 , r1 , r2
add r3 , r4 , r0

In this example, the result calculated by dp4 for r0 is used by the add instruction that immediately follows. Because the latency of dp4 is 5 clock cycles, a stall of 4 clock cycles is the detected result when add executes.


Some instructions depend on more than one other instruction. In this case, the number of clock cycles for the instruction with the greatest stall time is used as the stall count for the dependent instruction.

Example:

dp4 r0.x , r1 , r2
mov r0.y, c0
add r3 , r4 , r0

In this example, the add instruction on the third line depends on both dp4 and mov for r0. add stalls for 3 clock cycles because of dp4 and for 1 clock cycle because of mov. Because the longer of the two stall times is 3 clock cycles, a stall of 3 clock cycles is the detected result.
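
The two examples above are consistent with a simple model, sketched below. This is an inference from the examples, not the linker's documented algorithm: the stall is taken as the producer's latency minus one, minus the number of instructions executed in between. The dp4 latency of 5 comes from the text; the mov latency of 2 is an assumption derived from the 1-clock stall in the second example.

```python
# Latencies in clock cycles. dp4 = 5 is stated in the text; mov = 2 is an
# assumption inferred from the second example.
LATENCY = {"dp4": 5, "mov": 2}


def dependency_stall(producer: str, instructions_between: int) -> int:
    """Stall seen by an instruction that reads the result of `producer`.

    Assumed model: latency - 1 - (instructions executed in between),
    never below zero.
    """
    return max(0, LATENCY[producer] - 1 - instructions_between)


def detected_stall(dependencies) -> int:
    """Longest stall among (producer, instructions_between) pairs."""
    return max((dependency_stall(p, gap) for p, gap in dependencies), default=0)


first = detected_stall([("dp4", 0)])               # dp4 directly followed by add
second = detected_stall([("dp4", 1), ("mov", 0)])  # dp4, mov, then add
print(first, second)  # 4 3
```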

Stalling Caused by Calling the mova Instruction

A 3-clock cycle stall results unconditionally when mova is called.

Stalling Caused by Branching

A 2-clock cycle stall results if the program counter changes by a value other than +1 while executing a call or if instruction. A 2-clock cycle stall also results at the instruction immediately after ret, when execution returns to the location from which call was executed.

When Multiple Reasons for Stalling Occur

Some instructions may stall for multiple reasons. In this case, the stall clock cycle count for the reason having the longest stall time is used.

Example:

main:
call label0 // Call to label0
...
end

label0:
dp4 r0.x , r1, r2
mova a0.x, r0.x
ret

In this example, when the mova instruction at the end of the subroutine named label0 is reached, a 4-clock cycle stall occurs because of a dependency on the immediately preceding dp4 instruction, a 3-clock cycle stall occurs when calling the mova itself, and a 2-clock cycle stall occurs due to branching. Because the longest stall time out of all these reasons for stalling is 4 clock cycles, the calculated result is a 4-clock cycle stall.
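
The selection rule in this example can be sketched as follows. The function name and the reason labels are illustrative only; the clock counts are the ones given in the example.

```python
def combined_stall(reasons: dict) -> tuple:
    """Return (reason, clocks) for the longest stall, or (None, 0) if none."""
    if not reasons:
        return (None, 0)
    reason = max(reasons, key=reasons.get)
    return (reason, reasons[reason])


# Stall candidates for the mova instruction in the example above.
mova_reasons = {"dependency on dp4": 4, "mova instruction": 3, "branch": 2}
print(combined_stall(mova_reasons))  # ('dependency on dp4', 4)
```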

Result Output

Results of the performance checks are output to a file and the command prompt that executed the linker.
The output file is generated under the name of the binary file created by the linker, with its extension replaced by perf.txt.

The following content is output to the file.

Main object:obj\VShader.o

Total executed clock count 14 clock
Total executed instruction count 7 instructions

Detail of stall
============================================================

VShader.vsh(26):2 clock stall is caused by branch.

VShader.vsh(36):4 clock stall is caused.
|
+--- 3 clock stall is by mova instruction.
|
+--- 2 clock stall is by branch.
|
+--- VShader.vsh(35):4 clock is to wait for this instruction to finish writing r0.

VShader.vsh(30):1 clock stall is caused by reading temporary register.
|
+--- VShader.vsh(29):To wait for this instruction to finish writing r1.
...

"Main object" is the name of the main object being checked. Each check result is output for all main objects being linked.
"Total executed clock count" is the total number of clock cycles required for execution per vertex.
"Total executed instruction count" is the number of instructions executed.
The location where a stall occurred, the reason for the stall, and the number of clock cycles that execution stalled are listed under "Detail of stall." In the case of stalling due to instruction dependencies, the location of the instruction being depended upon and the register indicating the reason are shown.
If there is more than one reason for a stall, the stall clock cycle count for each separate reason is shown together with the resulting overall stall clock cycle count.
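
One way to read the total is as one clock cycle per executed instruction plus the detected stall cycles. The sketch below applies that reading to the sample output, assuming the elided part of the sample lists no further stalls; under that assumption the result agrees with the reported total of 14.

```python
def total_clock_count(instruction_count: int, stalls) -> int:
    """One clock per executed instruction plus all detected stall clocks."""
    return instruction_count + sum(stalls)


# Sample output: 7 instructions, with stalls of 2, 4, and 1 clock cycles.
print(total_clock_count(7, [2, 4, 1]))  # 14
```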

The check results are output in the order instructions are executed, starting from the main label. If the same subroutine is called more than once, the same stall details are shown multiple times because a check is made for each instance the subroutine executes.

Details nearly identical to those output to file are also output to the command prompt where the linker was run. (Information about dependencies is indented.)
If the linker is executed from an environment such as Microsoft Visual Studio, you can jump to the shader assembler source file by clicking the output result of a reason for stalling in the output window. This is useful when you want to look at the location of a problem.
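
For scripted analysis, the top-level stall lines of the report can be picked out with a regular expression. The following sketch is based only on the sample output shown above, so the pattern is an assumption about the format rather than a specification. It skips the dependency detail lines that begin with "|" or "+---".

```python
import re

# Matches lines such as "VShader.vsh(26):2 clock stall is caused by branch."
STALL_LINE = re.compile(
    r"^(?P<file>[^\s(|+][^(]*)\((?P<line>\d+)\):(?P<clocks>\d+) clock stall is caused"
)


def parse_stalls(report: str):
    """Return (file, line, clocks) for each top-level stall line."""
    stalls = []
    for raw in report.splitlines():
        match = STALL_LINE.match(raw)
        if match:
            stalls.append((match["file"], int(match["line"]), int(match["clocks"])))
    return stalls


SAMPLE = """\
VShader.vsh(26):2 clock stall is caused by branch.

VShader.vsh(36):4 clock stall is caused.
|
+--- 3 clock stall is by mova instruction.
|
+--- VShader.vsh(35):4 clock is to wait for this instruction to finish writing r0.

VShader.vsh(30):1 clock stall is caused by reading temporary register.
"""
print(parse_stalls(SAMPLE))
```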

Debug Build

All assembler objects being linked can be forced to form a debug build by specifying the -debug option to ctr_VertexShaderLinker.
All assembler objects being linked can be forced to form a non-debug build by specifying the -nodebug option to ctr_VertexShaderLinker.


If a mixture of debug build and non-debug build assembler objects is linked without either option, a main object results in a debug build if even one of the assembler objects that it references is a debug build.
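
The rules in this section can be summarized in a short sketch. The function and its parameters are hypothetical, and it assumes that the objects referenced by a main object include the main object itself.

```python
def main_is_debug(referenced_objects_debug, force=None) -> bool:
    """Decide whether a main object ends up as a debug build (sketch).

    referenced_objects_debug: one bool per assembler object referenced by the
        main object, True if that object is a debug build.
    force: "debug" for -debug, "nodebug" for -nodebug, or None for neither.
    """
    if force == "debug":
        return True
    if force == "nodebug":
        return False
    # Mixed case: a single debug object makes the main object a debug build.
    return any(referenced_objects_debug)
```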

Error Codes (Linker)

Error Message Format

This page describes error messages output by the linker. Errors are output in the following format.

Input file name (line number of error): Error level (error code): Error description

The error level is either "warning" or "error." Execution can continue in the case of a "warning" level error. The input filename or error line number may not be displayed, depending on the type of error.
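
A build script could split such messages into fields as sketched below. The exact spacing and punctuation of real linker output are not shown in this document, so the pattern tolerates optional whitespace, and the sample strings in the usage lines are hypothetical messages composed from the format above.

```python
import re

# Optional "file (line):" prefix, then "level (code): description".
MESSAGE = re.compile(
    r"^(?:(?P<file>.*?)\s*(?:\((?P<line>\d+)\))?\s*:\s*)?"
    r"(?P<level>warning|error)\s*\((?P<code>[0-9A-Fa-f]{8})\)\s*:\s*(?P<text>.*)$"
)


def parse_message(text: str):
    """Split a linker message into fields, or return None if it does not match."""
    match = MESSAGE.match(text.strip())
    if match is None:
        return None
    line = match["line"]
    return {
        "file": match["file"] or None,   # may be absent for some errors
        "line": int(line) if line is not None else None,
        "level": match["level"],
        "code": match["code"].lower(),
        "description": match["text"],
    }


# Hypothetical sample messages built from the documented format.
parsed = parse_message("VShader.o (12): error (8008001f): label cannot be found")
bare = parse_message("error (80080001): Input file is not specified.")
```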

Error Message

The following tables give the error codes output by the linker and their descriptions.

8008xxxx

Error Code Message / Description
80080001 Input file is not specified.
Input file not specified.
80080005 “argument” is not found.
Input file could not be found.
80080006 Exceeded maximum number of long swizzle masks/patterns.
The number of swizzling patterns for map exceeds the upper limit.
80080007 Exceeded maximum number of swizzle masks/patterns.
The number of swizzling patterns exceeds the upper limit.
8008000f Label “label name” is duplicate.
The same label name is defined more than once in the subroutine object.
80080012 Cannot open output file.
An executable file could not be generated.
Check whether a read-only file having the same name already exists.
80080014 “input file name” is invalid file format.
The input file is not an object file.
80080015 Some input files are the same name.
Input files having the same name have been specified.
8008001d “label name” is not subroutine.
A ret instruction has not been set for a label called as a subroutine by the call instruction.
8008001f “label name” cannot be found in input object files.
The label referenced in the input file cannot be found.
80080020 Vertex shader size is over the limit.
The number of instructions in the shader exceeds the upper limit.
Shaders consisting of up to 512 instructions can be linked.
80080022 “register name” is duplicately defined in “object name” and “object name”.
A register is defined as having different values by more than one object through use of the def, defi, or defb instructions.
80080024 “register name” is duplicately defined in “object name” and “object name”.
An output register is mapped to different output data attributes by more than one object through the use of a #pragma output_map statement.
80080025 symbol “symbol name” is duplicately defined in “object name” and “object name”.
A symbol name is bound to different registers by more than one object through the use of #pragma bind_symbol definitions.
8008002a symbol “symbol name” in “object name” and “symbol name” in “object name” are bound to the same register.
A symbol in an object is bound to the same input register as another symbol in an object through the use of #pragma bind_symbol definitions.
8008002b “label name” is duplicately defined in “subroutine object name”.
A label name in the main object is also defined in a subroutine object.
8008002c “output data attribute name” is duplicately defined in “object name” and “object name”.
An output data attribute is mapped to different output registers by more than one object through the use of a #pragma output_map statement.
8008002d Main routine cannot be found.
An object that includes both main and endmain labels is not included among input files.
8008002e Cannot open map file.
A map file cannot be generated.
Check whether a read-only file having the same name already exists.
8008002f No input attribute is defined.
No input attributes are defined.
80080030 No output map is defined.
No output attributes are defined.
80080031 -debug and -nodebug cannot be specified together.
The -debug and -nodebug options cannot be specified at the same time.
80080032 def(bi) in ***.obj and bind_symbol in ***.obj specify the same register **.
The same register cannot be defined by both a def instruction and bind_symbol.
80080033 texture1 and texture2 need to be mapped to same register, if 4 textures are mapped.
If four textures have been defined using an output_map statement, texture1 and texture2 must be mapped to the same register.

4009xxxx

Error Code Message / Description
40090001 end instruction is not found.
An end instruction could not be found.
40090002 end instruction is found in loop statement.
An end instruction was found between a pair of loop-endloop instructions.
40090003 end instruction is found in only one of if and else statement.
An end instruction was found in only one of the if-else and else-endif pairs.
40090004 input register "label name" is not used.
The input register defined in a #pragma bind_symbol statement may not be used.
40090005 The access patterns of input registers are different between if and else statement.
The input register being used differs between a pair of if-else and a pair of else-endif statements.
40090006 output register "register name" is not set.
The output register defined by a #pragma output_map statement may not be written.
40090007 output register is set in loop statement.
The output register is written to between a pair of loop-endloop statements.
40090008 The access patterns of output registers are different between if and else statement.
The output register being written differs between a pair of if-else and a pair of else-endif statements.
40090009 output register "register name" is already set before.
The output register is being written to more than once.
4009000a Recursive call is found, and skipped.
A subroutine is being called recursively. The call statement causing the subroutine to be called recursively is skipped by the consistency check feature.
4009000b Cannot open file for performance report.
A file for storing output results of the performance check feature cannot be created.
4009000c mova instruction both before and after returning from subroutine might cause hardware hang-up.
A malfunction may occur if a mova instruction occurs both immediately before and after a subroutine called by the call, callb, or callc instructions.
See "Malfunctions Caused by the mova Instruction."

Revision History

2012/01/31
Corrected the error code messages for error codes 800800{22, 24, 25, 2b, 2c}.
2011/12/20
Initial version.
