ctr_VertexShaderLinker

Introduction

ctr_VertexShaderLinker links the object files output by ctr_VertexShaderAssembler and outputs an executable file.

How to Use

Commands

% ctr_VertexShaderLinker32 <input file name> [options...]

The input files must be object files and must include at least one main object (an object that contains both main and endmain labels). Options can be omitted. Help is displayed if the linker is executed without any arguments.

Input Files

Specify the object files to be input in place of <input file name>.
If the input files include more than one main object, the application must pass the glShaderBinary function the same number of shader objects as there are main objects.
In this case, the shader objects are associated with the main objects in the order in which the main objects were specified as arguments to ctr_VertexShaderLinker32.
For each file name, specify a string of 128 or fewer characters that contains no spaces and consists only of ASCII alphanumeric characters and symbols other than \ / : * ? " < > and |.
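As an illustration of the file-name rule above, the following Python sketch checks a single name against it. The function name is_valid_input_name is hypothetical and not part of the toolchain; "symbols" is assumed here to mean ASCII punctuation characters, and the check applies to a file name only, not to a directory path.

```python
import string

# Symbols the text above forbids in input file names.
FORBIDDEN = set('\\/:*?"<>|')

# ASCII letters, digits, and punctuation, minus the forbidden symbols.
# Space is not in string.punctuation, so spaces are rejected as well.
ALLOWED = set(string.ascii_letters + string.digits + string.punctuation) - FORBIDDEN


def is_valid_input_name(name: str) -> bool:
    """Return True if `name` is 1 to 128 characters long and uses only allowed characters."""
    return 0 < len(name) <= 128 and all(ch in ALLOWED for ch in name)


print(is_valid_input_name("VShader.obj"))    # True
print(is_valid_input_name("my shader.obj"))  # False: contains a space
print(is_valid_input_name("a/b.obj"))        # False: '/' is forbidden
```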

Options

The following options can be specified in place of [options...].

Option              Description
-I<file path>       Specifies a path to search for input files.
                    Input files are searched for in the current directory first, and then in the directories specified with this option, in the order in which they are specified.
-O <file name>      Specifies the output file name.
                    The default is "shader.bin".
-M                  Outputs a map file.
                    The map file has the same name as the executable file, with its extension changed to ".map". See Map File Output Feature for details about map files.
-debug              Links all object files with debug information included.
-nodebug            Links all object files without debug information.
-check_consistency  Checks the consistency of all main objects being linked.
                    For details, see Consistency Check Feature.
-check_performance  Checks the performance of all main objects being linked.
                    For details, see Performance Check Features.
-? or -help         Displays help.
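The search order described for -I can be sketched as follows. resolve_input is a hypothetical helper, not part of the linker; it assumes that the current directory is searched before every -I directory and that the first match wins. The exists parameter is only there so the lookup can be tried without touching the file system.

```python
from pathlib import Path
from typing import Callable, Optional, Sequence


def resolve_input(
    name: str,
    include_dirs: Sequence[str],
    cwd: str = ".",
    exists: Callable[[Path], bool] = Path.is_file,
) -> Optional[Path]:
    """Return the first existing candidate for `name`, or None if there is none."""
    # Current directory first, then each -I directory in command-line order.
    for directory in [cwd, *include_dirs]:
        candidate = Path(directory) / name
        if exists(candidate):
            return candidate
    # The linker reports this case as error 80080005 ("argument" is not found).
    return None


# Example: the equivalent of "-Iobj -Ilib" when VShader.obj exists only in lib/.
print(resolve_input("VShader.obj", ["obj", "lib"],
                    exists=lambda p: p == Path("lib/VShader.obj")))
```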

Map File Output Feature

When ctr_VertexShaderLinker32 is executed with the -M option specified, it outputs information about the linked executable file. This output file is called a map file.
The map file is generated using the same name as the executable file, with the extension changed to ".map".
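The naming rule can be expressed with Python's pathlib. This is a sketch of the rule as stated, not the linker's own code; how the linker names the map file when the executable name has no extension is not described here.

```python
from pathlib import Path


def map_file_name(executable: str) -> str:
    """Same name as the executable file, with its extension changed to .map."""
    return str(Path(executable).with_suffix(".map"))


print(map_file_name("shader.bin"))  # shader.map (default -O name)
```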

The following information is generated: Loading objects order, image sizes, program code information, object information, and swizzle pattern data.

Loading Objects Order

This item shows the order in which main objects have been linked.
Main objects are linked in the order in which they were specified as arguments to ctr_VertexShaderLinker32. The order shown here is also the order in which the shader objects specified to the glShaderBinary function reference the main objects. Main objects built as debug builds are also indicated.

Image Sizes

This item shows the size of the linked object files.
"Instruction" represents the number of assembler instructions, and "Swizzle" represents the number of swizzling and masking patterns (for details, see "Number of Swizzling and Masking Patterns" under "Notes and Limitations").
"Total" represents the total size after linking.

Program Code Information

This item shows the storage location for program code inside the executable file.
"Program code offset" represents the offset (in bytes) from the start of the file to the address at which program data is stored in the executable file. "Program code size" shows the size (in bytes) of program code data.

Object Information

This item shows the symbol information, output data attribute information, and program start address for each main object that has been linked.
These correspond, respectively, to the settings made with #pragma bind_symbol statements, the settings made with #pragma output_map statements, and the program address set by the main label.

Swizzle Pattern Data

This item shows the swizzle pattern data of the executable file.

Consistency Check Feature

The consistency check feature checks whether the shader assembler code has been implemented correctly.
This feature is implemented in the linker and is enabled by specifying the -check_consistency option to ctr_VertexShaderLinker32.exe.
Several items are checked for each main object being linked. The check traverses each instruction in the main object from the main label up to the endmain label, and outputs a warning if it finds code that matches one of the checked items.

Each instruction is checked according to the following conditions.

The consistency check checks the items listed below, each of which is described in detail below. In this section, the term "if instruction" refers to both ifb and ifc instructions.

Checks for Execution of an end Instruction

This feature checks whether an end instruction has been called correctly.

A warning is output for each checked item that is detected. The corresponding warnings are 40090001 to 40090003; see Error Codes (Linker).
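Two of the end-instruction checks (warnings 40090001 and 40090002) can be approximated with the hypothetical sketch below. It assumes a flat list of opcode names from main to endmain and, unlike the real check, does not follow call instructions into subroutines or compare if/else branches (warning 40090003).

```python
def check_end(opcodes):
    """Return warning codes for a flat list of opcodes from main to endmain."""
    warnings = []
    loop_depth = 0
    found_end = False
    for op in opcodes:
        if op == "loop":
            loop_depth += 1
        elif op == "endloop":
            loop_depth -= 1
        elif op == "end":
            found_end = True
            if loop_depth > 0:
                warnings.append("40090002")  # end inside a loop-endloop pair
    if not found_end:
        warnings.append("40090001")  # no end instruction at all
    return warnings


print(check_end(["dp4", "mov", "end"]))       # []
print(check_end(["loop", "end", "endloop"]))  # ['40090002']
```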

Checks Reads from Input Registers

This feature checks whether input registers are being used correctly.
This check is performed on the input registers and components specified using a #pragma bind_symbol statement.

A warning is output for each checked item that is detected. The corresponding warnings are 40090004 and 40090005; see Error Codes (Linker).
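A simplified version of the unused-input check (warning 40090004) might look like this. The tuple format (opcode, destination, sources...) and the function name are assumptions made for illustration; component-level checking and the if/else comparison (warning 40090005) are left out.

```python
def unused_inputs(bound_inputs, instructions):
    """Return input registers bound with #pragma bind_symbol that are never read.

    instructions: (opcode, destination, *sources) tuples, e.g. ("mov", "o1", "v1.xy").
    """
    # Strip component selectors such as ".xy" so "v1.xy" counts as a read of v1.
    read = {src.split(".")[0] for _, _, *sources in instructions for src in sources}
    return sorted(set(bound_inputs) - read)


program = [("dp4", "o0.x", "v0", "c0"), ("mov", "o1", "v1.xy")]
print(unused_inputs({"v0", "v1", "v2"}, program))  # ['v2'] -> warning 40090004
```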

Checks Writes to Output Registers

This feature checks whether output registers are being used correctly.
Output registers specified using #pragma output_map statements are checked. All xyzw components are also checked.

A warning is output for each checked item that is detected. The corresponding warnings are 40090006 to 40090009; see Error Codes (Linker).
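The output-register checks along a single execution path can be sketched as follows. The instruction tuple format and the function name are hypothetical. The sketch tracks writes per xyzw component and reports a component written twice as 40090009; as an assumption, it reports a mapped output register whose four components are not all written as 40090006. Loops and branches (warnings 40090007 and 40090008) are not modeled.

```python
def check_outputs(mapped_outputs, instructions):
    """Check output-register writes along one execution path.

    mapped_outputs: output registers named in #pragma output_map, e.g. {"o0"}.
    instructions: (opcode, destination, *sources) tuples in execution order.
    """
    warnings = []
    written = {}  # output register -> components written so far
    for _, dest, *_ in instructions:
        reg, _, mask = dest.partition(".")
        if not reg.startswith("o"):
            continue  # only output registers are checked here
        components = set(mask or "xyzw")  # no mask means all four components
        done = written.setdefault(reg, set())
        if done & components:
            warnings.append(("40090009", reg))  # a component is written again
        done |= components
    for reg in sorted(mapped_outputs):
        if written.get(reg, set()) != set("xyzw"):
            warnings.append(("40090006", reg))  # assumption: partial write counts
    return warnings


program = [("mov", "o0", "v0"), ("mov", "o1.xy", "v1"), ("mov", "o0.w", "c0")]
print(check_outputs({"o0", "o1"}, program))
# [('40090009', 'o0'), ('40090006', 'o1')]
```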

Performance Check Features

The performance check features include a feature that detects instructions that cause execution to stall and a feature that estimates the number of clock cycles required per vertex when the shader assembler code is executed.
This feature is implemented in the linker, and can be enabled by specifying the -check_performance option to ctr_VertexShaderLinker32.exe.
Each main object being linked is checked. The check traverses each instruction in the main object from the main label up to an end instruction or the endmain label. Results of the check are output to a file and to the command prompt that executed the linker. The output file is given the name of the binary file created by the linker with its extension replaced by "perf.txt", and is generated in the same location as the binary file.

Each instruction is checked according to the following conditions.

The following items are included in output results.
When the total number of clock cycles is calculated, each executed instruction counts as one clock cycle, and whenever a stall occurs, the number of clock cycles stalled is added.
The stall reasons that are output do not cover every hardware condition that can actually cause a stall. Use the calculated total number of clock cycles only as a guideline; it does not necessarily match the number of clock cycles executed on actual hardware.
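Read as described above, the estimate is the executed instruction count plus the stalled clock cycles. A minimal sketch under that reading, with made-up numbers (total_clocks is a hypothetical name):

```python
def total_clocks(instruction_count, stall_cycles):
    """Estimated clocks per vertex: one per executed instruction plus all stalled clocks."""
    return instruction_count + sum(stall_cycles)


# Hypothetical run: 10 instructions executed, with stalls of 4 and 2 clocks.
print(total_clocks(10, [4, 2]))  # 16
```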
A detailed description of this feature is given below.

Detectable Reasons for Stalling

You can use this feature to detect several reasons for stalling. A detailed description of each associated item is given below.

Stalling Due to Instruction Dependencies

You can detect stalling due to dependency on the register used by an instruction.
Given two instructions where the second references the calculation result of the first, a stall occurs if the second instruction is executed before the first has finished its calculation, because the second instruction must wait (stall) until the result is available.

The number of clock cycles that execution stalls depends on the latency of the first instruction and on the number of instructions executed between the two. For the latency of each instruction, see Instruction Latency. The register involved may be a temporary register or a status register. For the status register, ifc, callc, and breakc may depend on a preceding cmp instruction.

Example:

dp4 r0, r1, r2
add r3, r4, r0

In the example above, the add instruction immediately after dp4 uses the result that dp4 writes to r0. Because dp4 has a latency of 5 clock cycles, a 4-clock cycle stall is detected when add executes.
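The examples in this section (4 stall cycles when dp4 is directly followed by the instruction that uses its result, 3 when one instruction sits between them) are consistent with the simple formula below. It is a sketch that assumes one instruction issues per clock and that no other stall occurs between the two instructions; it is not the linker's actual algorithm.

```python
def dependency_stall(latency, instructions_between):
    """Stall cycles for an instruction that reads the result of an earlier one.

    Assumes one instruction issues per clock and nothing else stalls in
    between, so each intervening instruction hides one cycle of latency.
    """
    return max(0, latency - 1 - instructions_between)


print(dependency_stall(5, 0))  # 4: add directly after dp4 (latency 5)
print(dependency_stall(5, 1))  # 3: one instruction between dp4 and add
```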

Some instructions depend on more than one other instruction. In this case, the longest of the individual stall times is used as the stall count for the dependent instruction.

Example:

dp4 r0.x, r1, r2
mov r0.y, c0
add r3, r4, r0

In the example above, the add instruction on the third line depends on both dp4 and mov for r0. add would stall for 3 clock cycles because of dp4 and for 1 clock cycle because of mov. Because the longer of the two is 3 clock cycles, a 3-clock cycle stall is the detected result.
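Extending the same idea to several producing instructions, the stall is the maximum over the individual dependencies. This is a hypothetical sketch; the latency used for mov is an assumption chosen only so that the result matches the 1-cycle stall stated in the example, so see Instruction Latency for the real value.

```python
def stall_for_dependencies(deps):
    """deps: (latency, instructions_between) pairs, one per producing instruction.

    Each pair gives a stall of max(0, latency - 1 - instructions_between);
    the dependent instruction stalls for the longest of these.
    """
    return max((max(0, lat - 1 - between) for lat, between in deps), default=0)


# dp4 (latency 5) has one instruction between it and add; mov is adjacent.
# mov's latency of 2 is assumed only to reproduce the 1-cycle stall above.
print(stall_for_dependencies([(5, 1), (2, 0)]))  # 3
```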

Stalling Due to Calling the mova Instruction

A 3-clock cycle stall results unconditionally when mova is called.

Stalling Due to Branching

A 2-clock cycle stall occurs if executing a call or if instruction changes the program counter by anything other than +1. A 2-clock cycle stall also occurs at the instruction immediately after ret, when execution returns to the caller of call.

When Multiple Reasons for Stalling Occur

With some instructions, there may be multiple reasons for stalling. In this case, the stall clock cycle count for the reason having the longest stall time is used.

Example:

main:
call label0 // Call to label0
...
end

label0:
dp4 r0.x, r1, r2
mova a0.x, r0.x
ret

In the above example, when the mova instruction at the end of the subroutine named label0 is reached, a 4-clock cycle stall occurs due to dependence on the immediately preceding dp4 instruction, a 3-clock cycle stall occurs due to calling the mova itself, and a 2-clock cycle stall occurs due to branching. Since the longest stall time out of all these reasons for stalling is 4 clock cycles, the calculated result is a 4-clock cycle stall.
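The rule for multiple stall reasons reduces to taking a maximum, not a sum. The sketch below replays the label0 example using only the values given in the text; combined_stall is a hypothetical name.

```python
def combined_stall(reasons):
    """reasons: stall reason -> stall cycles for one instruction.

    Only the longest stall counts; the individual stalls are not added.
    """
    return max(reasons.values(), default=0)


# The mova instruction in label0 from the example above.
mova_reasons = {
    "dependency on dp4": 5 - 1,  # dp4 latency 5, no instruction in between
    "mova instruction": 3,       # unconditional mova stall
    "branch": 2,                 # branch stall
}
print(combined_stall(mova_reasons))  # 4
```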

Result Output

Results of the performance checks are output to a file and the command prompt that executed the linker.
The output file is generated under the name of the binary file created by the linker, with its extension replaced by "perf.txt."

The following content is output to the file:

Main object:obj\VShader.o

Total executed clock count 14 clock
Total executed instruction count 7 instructions

Detail of stall
============================================================

VShader.vsh(26):2 clock stall is caused by branch.

VShader.vsh(36):4 clock stall is caused.
|
+--- 3 clock stall is by mova instruction.
|
+--- 2 clock stall is by branch.
|
+--- VShader.vsh(35):4 clock is to wait for this instruction to finish writing r0.

VShader.vsh(30):1 clock stall is caused by reading temporary register.
|
+--- VShader.vsh(29):To wait for this instruction to finish writing r1.
...

"Main object" is the name of the main object being checked. Each check result is output for all main objects being linked.
"Total executed clock count" is the total number of clock cycles required for execution per vertex.
"Total executed instruction count" is the number of instructions executed.
The location where a stall occurred, the reason for the stall, and the number of clock cycles that execution stalled are listed under "Detail of stall." In the case of stalling due to instruction dependencies, the location of the instruction being depended upon and the register indicating the reason are shown.
If a stall has more than one reason, the stall clock cycle count for each reason is shown, together with the resulting stall clock cycle count for that instruction.

Check results are output in the order in which instructions are executed, starting from the main label. If the same subroutine is called more than once, its stall details are shown once for each call, because the check is performed each time the subroutine is executed.

Details nearly identical to those output to the file are also output to the command prompt that executed the linker. (Information regarding dependencies is indented.)
If the linker is executed from an environment such as Microsoft Visual Studio, you can jump to the shader assembler source file by clicking the output result of a reason for stalling in the output window. This is useful when you want to look at the location of a problem.

Debug Build

Specifying the -debug option to ctr_VertexShaderLinker32 forces all assembler objects being linked to be built as debug builds.
Specifying the -nodebug option to ctr_VertexShaderLinker32 forces all assembler objects being linked to be built as non-debug builds.

If debug-build and non-debug-build assembler objects are linked together, a main object becomes a debug build if even one of the assembler objects it references is a debug build.
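The debug-build rules in this section can be summarized in a small decision function. It is a hypothetical sketch of the documented behavior, not the linker's code; the input is one flag per assembler object that the main object references.

```python
def main_object_is_debug(referenced_is_debug, force_debug=False, force_nodebug=False):
    """Decide whether a main object is linked as a debug build.

    referenced_is_debug: one bool per assembler object the main object references.
    force_debug / force_nodebug: whether -debug / -nodebug was specified.
    """
    if force_debug and force_nodebug:
        # The linker rejects this combination with error 80080031.
        raise ValueError("-debug and -nodebug cannot be specified together")
    if force_debug:
        return True
    if force_nodebug:
        return False
    # Mixed link: a single debug-build object makes the main object a debug build.
    return any(referenced_is_debug)


print(main_object_is_debug([False, True, False]))               # True
print(main_object_is_debug([False, True], force_nodebug=True))  # False
```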

Error Codes (Linker)

Error Message Format

This section describes the error messages output by the linker. Errors are output in the following format.

Input file name (line number of error): Error level (error code): Error description

Errors have an error level of either "warning" or "error." Execution can continue in the case of a "warning" level error. The input file name and/or error line number may not be displayed, depending on the type of error.
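For tools that post-process linker output (for example, to collect warnings in a build log), the format above could be parsed with a regular expression like the one below. This is a sketch that assumes the spacing may vary and that the file name contains no parentheses or colons; the sample messages in the usage lines are hypothetical, and parse_message and LinkerMessage are illustrative names.

```python
import re
from typing import NamedTuple, Optional


class LinkerMessage(NamedTuple):
    file: Optional[str]   # may be absent, depending on the error
    line: Optional[int]   # may be absent, depending on the error
    level: str            # "warning" or "error"
    code: str             # e.g. "40090006"
    text: str


# Optional "file(line):" or "file:" prefix, then "level (code): description".
_PATTERN = re.compile(
    r"(?:(?P<file>[^():]+?)\s*(?:\((?P<line>\d+)\))?\s*:\s*)?"
    r"(?P<level>warning|error)\s*\((?P<code>[0-9A-Fa-f]{8})\)\s*:\s*"
    r"(?P<text>.*)"
)


def parse_message(message: str) -> Optional[LinkerMessage]:
    """Parse one linker message line; return None if it does not match."""
    m = _PATTERN.fullmatch(message.strip())
    if m is None:
        return None
    line = m.group("line")
    return LinkerMessage(m.group("file"), int(line) if line else None,
                         m.group("level"), m.group("code").lower(), m.group("text"))


print(parse_message('VShader.vsh(26): warning (40090006): output register "o0" is not set.'))
print(parse_message("error (8008002d): Main routine cannot be found."))
```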

Error Message

This table gives the error codes output by the linker and their description.

8008xxxx

Error Code Message / Description
80080001 Input file is not specified.
Input file not specified.
80080005 “argument” is not found.
Input file could not be found.
80080006 Exceeded maximum number of long swizzle masks/patterns.
The number of swizzling patterns for map exceeds the upper limit.
80080007 Exceeded maximum number of swizzle masks/patterns.
The number of swizzling patterns exceeds the upper limit.
8008000f Label “label name” is duplicated.
The same label name is defined more than once in the subroutine object.
80080012 Cannot open output file.
An executable file could not be generated.
Check whether a read-only file having the same name already exists.
80080014 “input file name” is invalid file format.
The input file is not an object file.
80080015 Some input files are the same name.
Input files having the same name have been specified.
8008001d “label name” is not subroutine.
A ret instruction has not been set for a label that is called as a subroutine by a call instruction.
8008001f “label name” cannot be found in input object files.
The label referenced in the input file cannot be found.
80080020 Vertex shader size is over the limit.
The number of instructions in the shader exceeds the upper limit.
Shaders consisting of up to 512 instructions can be linked.
80080022 “register name” is duplicately defined in “object name” and “object name”.
A given register is defined as having different values by more than one object through use of the def, defi, or defb instructions.
80080024 “register name” is duplicately defined in “object name” and “object name”.
A given output register is mapped to different output data attributes by more than one object through the use of a #pragma output_map statement.
80080025 symbol “symbol name” is duplicately defined in “object name” and “object name”.
A given symbol name is bound to different registers by more than one object through the use of #pragma bind_symbol definitions.
8008002a symbol “symbol name” in “object name” and “symbol name” in “object name” are bound to the same register.
A given symbol in an object is bound to the same input register as another symbol in an object through the use of #pragma bind_symbol definitions.
8008002b “label name” is duplicately defined in “subroutine object name”.
A label name in the main object is also defined in a subroutine object.
8008002c “output data attribute name” is duplicately defined in “object name” and “object name”.
A given output data attribute is mapped to different output registers by more than one object through the use of a #pragma output_map statement.
8008002d Main routine cannot be found.
An object that includes both main and endmain labels is not included among input files.
8008002e Cannot open map file.
A map file cannot be generated.
Check whether a read-only file having the same name already exists.
8008002f No input attribute is defined.
No input attributes are defined.
80080030 No output map is defined.
No output attributes are defined.
80080031 -debug and -nodebug cannot be specified together.
The -debug and -nodebug options cannot be specified at the same time.
80080032 def(bi) in ***.obj and bind_symbol in ***.obj specify the same register **.
The same register cannot be defined by both a def instruction and bind_symbol.
80080033 texture1 and texture2 need to be mapped to same register, if 4 textures are mapped.
If four textures have been defined using an output_map statement, texture1 and texture2 must be mapped to the same register.

4009xxxx

Error Code Message / Description
40090001 end instruction is not found.
An end instruction could not be found.
40090002 end instruction is found in loop statement.
An end instruction was found between a pair of loop-endloop instructions.
40090003 end instruction is found in only one of if and else statement.
An end instruction was found in only one of the if-else and else-endif blocks of an if statement.
40090004 input register "label name" is not used.
The input register defined in a #pragma bind_symbol statement may not be used.
40090005 The access patterns of input registers are different between if and else statement.
The input registers used differ between the if-else block and the else-endif block.
40090006 output register "register name" is not set.
The output register defined by a #pragma output_map statement may not be written.
40090007 output register is set in loop statement.
The output register is written to between a pair of loop-endloop statements.
40090008 The access patterns of output registers are different between if and else statement.
The output registers written differ between the if-else block and the else-endif block.
40090009 output register "register name" is already set before.
The output register is being written to more than once.
4009000a Recursive call is found, and skipped.
A subroutine is being called recursively. The call statement causing the subroutine to be called recursively is skipped by the consistency check feature.
4009000b Cannot open file for performance report.
A file for storing output results of the performance check feature cannot be created.
4009000c mova instruction both before and after returning from subroutine might cause hardware hang-up.
A malfunction may occur if mova instructions are executed both immediately before and immediately after returning from a subroutine called by the call, callb, or callc instruction.
See "Malfunctions Caused by the mova Instruction."

Revision History

2012/01/31
Corrected the error code messages for error codes 800800{22, 24, 25, 2b, 2c}.
2011/12/20
Initial version.

CONFIDENTIAL