This section describes the hardware specifications of the vertex shader processor.
The main features of the vertex shader processor are as follows.
- Four vertex shader units built into the GPU (of which one is shared with the geometry shader processor).
- Operations use 24-bit floating-point numbers: 1 sign bit, 7 exponent bits, and 16 mantissa bits.
- 32-bit fixed length instructions.
- Support for flow control instructions as well as instructions for indexing.
- Support for masking of output components and rearranging (swizzling) of input components.
- Cannot read or write any data other than register data.
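The 24-bit floating-point layout listed above (1 sign bit, 7 exponent bits, 16 mantissa bits) can be illustrated with a small conversion sketch. This is not taken from the hardware documentation: the exponent bias of 63, the implicit leading 1, truncation of extra mantissa bits, and the function names `encode_f24` and `decode_f24` are all assumptions made for illustration, and special values (denormals, infinity, NaN) are not handled.

```python
import math

EXP_BIAS = 63          # assumption: IEEE-style bias of 2**(7-1) - 1
MANT_SCALE = 1 << 16   # 16 mantissa bits

def decode_f24(bits: int) -> float:
    """Interpret a 24-bit pattern as sign(1) | exponent(7) | mantissa(16)."""
    sign = (bits >> 23) & 0x1
    exp = (bits >> 16) & 0x7F
    mant = bits & 0xFFFF
    if exp == 0 and mant == 0:
        return -0.0 if sign else 0.0
    # Assumption: normalized value with an implicit leading 1.
    value = (1.0 + mant / MANT_SCALE) * 2.0 ** (exp - EXP_BIAS)
    return -value if sign else value

def encode_f24(x: float) -> int:
    """Pack a Python float into the assumed 24-bit layout (truncating)."""
    if x == 0.0:
        return 0
    sign = 1 if x < 0 else 0
    m, e = math.frexp(abs(x))      # abs(x) == m * 2**e with 0.5 <= m < 1
    exp = e - 1 + EXP_BIAS         # rewrite as (2m) * 2**(e-1), 1 <= 2m < 2
    if not 1 <= exp <= 0x7F:
        raise OverflowError("value outside the range of this sketch")
    mant = int((2.0 * m - 1.0) * MANT_SCALE)  # drop bits beyond 16
    return (sign << 23) | (exp << 16) | mant

print(hex(encode_f24(1.0)))            # 0x3f0000
print(decode_f24(encode_f24(-2.5)))    # -2.5
```

With 16 mantissa bits, a value that is not exactly representable comes back with a relative error below 2^-16 under these assumptions, which is noticeably coarser than a 32-bit float.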
The vertex shader processor has the following stages.
| Stage name | Symbol in timetable | Description |
| --- | --- | --- |
| Prefetch | | Prefetches instruction from program RAM into cache. |
| Fetch | | Fetches instruction from instruction cache. |
| Decode | | Decodes fetched instruction. |
| Read | | Reads data from register. This stage may not exist for some instructions. |
| Execute | etc. | Executes the instruction. Conducts such tasks as flow control, copying of registers, and computations in the arithmetic units. Some instructions use multiple arithmetic units for computations, so sometimes this stage can take 2 or more clock cycles. |
| Post-processing | | Performs post-processing on the instruction. This stage may not exist for some instructions. |
| Write-back | | Writes the result of instruction execution to a register. |
The instruction timetables in the Assembler Reference show the stages from read through write-back.
Execution latency is less than the number of clock ticks shown in the timetable. Processing does not stall even when the write-back of one instruction takes place in the same clock cycle as the read stage of the next instruction.
The vertex shader processor incorporates the following arithmetic units.
| Unit name | Number installed | Clock cycles required for operation | Symbol in timetable | Description |
| --- | --- | --- | --- | --- |
| MUL | 4 | 1 | | Calculates the product of two values. |
| ADD | 4 | 1 | | Calculates the sum of two values. |
| RCP / RSQ | 2 | 2 | | Calculates the reciprocal or the reciprocal square root. |
| FLOOR | 4 | 1 | | Calculates the greatest integer less than or equal to the specified value. |
| LOG | 1 | 2 | | Calculates a binary logarithm. |
| EXP | 1 | 2 | | Calculates a power of two. |
| MAX | 4 | 1 | | Selects the larger of two given values. |
| MIN | 4 | 1 | | Selects the smaller of two given values. |
| SGE | 4 | 1 | | Compares two values to determine whether one is greater than or equal to the other. |
| SLT | 4 | 1 | | Compares two values to determine whether one is less than the other. |
| CMP | 2 | 2 | | Compares two values. |
The vertex shader processor incorporates the following register set.
| Register type | No. of components | Number | R/W | Bit width | Description |
| --- | --- | --- | --- | --- | --- |
| Input registers | 4 | 16 | R | 24 | Floating-point registers storing vertex attribute data. |
| Temporary registers | 4 | 16 | RW | 24 | Reusable floating-point registers for temporarily holding the results of calculations. The content is maintained until overwritten. |
| Floating-point constant registers | 4 | 96 | R | 24 | Floating-point registers storing constants for operations. The register index can be offset by an address register or by the loop-counter register. |
| Address registers | 2 | 1 | RW | 8 | Integer-type register for specifying the register index offset for the floating-point constant registers. |
| Boolean registers | 1 | 16 | R | 1 | Boolean registers used for conditional branching and jumping. One of these registers (b15) is reserved for use by the geometry shader. |
| Integer registers | 1 | 4 | R | 24 | Integer registers used for controlling loop instructions. |
| Loop-counter register | 1 | 1 | R | 8 | Integer-type register that stores the counter value for loop instructions. Can be used for specifying the register index offset for the floating-point constant registers. |
| Output registers | 4 | 16 | W | 24 | Floating-point registers for storing the vertex attribute data at the completion of processing by the vertex shader processor. The contents of these registers are output to later stages in the graphics pipeline (or to the geometry shader processor). |
| Status registers | 1 | 2 | RW | 1 | Boolean registers that hold the results of comparison instructions. |
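The register table says that the index of a floating-point constant register can be offset by an address register or by the loop-counter register. The sketch below models that addressing as a plain `base + offset` lookup into 96 four-component registers. The names `read_constant` and `resolve_constant_index` are hypothetical, and raising an error on an out-of-range index is a choice made for this sketch; the table does not say how the hardware behaves in that case.

```python
NUM_FLOAT_CONSTANTS = 96  # from the register table

def resolve_constant_index(base: int, offset: int = 0) -> int:
    """Return the effective constant register index for base + offset."""
    index = base + offset
    # Assumption: out-of-range indices are treated as errors in this model.
    if not 0 <= index < NUM_FLOAT_CONSTANTS:
        raise IndexError(f"constant register index {index} out of range")
    return index

def read_constant(constants, base: int, offset: int = 0):
    """Read one 4-component constant register using an index offset."""
    return constants[resolve_constant_index(base, offset)]

# Dummy register file: register i holds (i, i, i, i).
constants = [(float(i),) * 4 for i in range(NUM_FLOAT_CONSTANTS)]

# Loop-counter style access: read four consecutive registers starting at 10,
# for example the rows of a matrix stored in constants 10 to 13.
rows = [read_constant(constants, 10, loop_counter) for loop_counter in range(4)]
print([row[0] for row in rows])  # [10.0, 11.0, 12.0, 13.0]
```

Indexing relative to a base in this way is what lets one block of shader code walk through an array of constants, such as a set of matrices, without a separate instruction for every register.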
The post-vertex cache stores the results of vertex data processing by the vertex shader processor.
When a vertex buffer and vertex indices are used to input vertex data, the result produced by the vertex shader processor is stored in the cache, and the cached result is output when the same vertex index is input again. Cache hits boost performance because processing by the vertex shader processor is skipped.
The cache holds 32 entries. On a cache miss, the entry that was referenced least recently is evicted first.
Cache hits are more likely when 32 or fewer sets of vertex data are referenced repeatedly. Depending on how the vertex indices have been created, rendering with TRIANGLES is in some cases more efficient than rendering with TRIANGLE_STRIP.
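The caching behavior described above can be approximated offline to compare index orderings. The following sketch models the post-vertex cache as a 32-entry least-recently-referenced cache keyed by vertex index and counts how many times the vertex shader would have to run. The function name `count_shader_runs` is hypothetical, and the model ignores any hardware detail beyond what is stated above.

```python
from collections import OrderedDict

CACHE_ENTRIES = 32  # from the description above

def count_shader_runs(indices, capacity=CACHE_ENTRIES):
    """Count cache misses (vertex shader executions) for an index stream."""
    cache = OrderedDict()  # keys are vertex indices, oldest reference first
    runs = 0
    for idx in indices:
        if idx in cache:
            cache.move_to_end(idx)         # hit: mark as most recently referenced
        else:
            runs += 1                      # miss: the vertex must be processed
            cache[idx] = True
            if len(cache) > capacity:
                cache.popitem(last=False)  # evict the least recently referenced
    return runs

# Two triangles sharing an edge: 6 indices, only 4 distinct vertices.
print(count_shader_runs([0, 1, 2, 2, 1, 3]))    # 4

# Cycling through 33 vertices defeats a 32-entry cache completely.
print(count_shader_runs(list(range(33)) * 2))   # 66
```

Running an index list through a model like this before export gives a rough indication of whether a given TRIANGLES or TRIANGLE_STRIP ordering keeps reused vertices within the 32-entry window.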