Hardware Specifications

This section describes the vertex shader processor and the related hardware specifications.

Vertex Shader Processor

The following describes the hardware specifications of the vertex shader processor.

Features

Below are the main features of the vertex shader processor.


Stage Structure

The vertex shader processor has the following stages.

Stage name        Symbol in timetable        Description
Prefetch          p.fetch                    Prefetches instructions from program RAM into the instruction cache.
Fetch             fetch                      Fetches an instruction from the instruction cache.
Decode            decode                     Decodes the fetched instruction.
Read              read                       Reads data from registers. Some instructions do not have this stage.
Execute           ifb, mov, MUL, MAX, etc.   Executes the instruction, for example flow control, register copies, or computations in the arithmetic units. Some instructions use multiple arithmetic units, so this stage can take 2 or more clock cycles.
Post-processing   post                       Performs post-processing on the instruction. Some instructions do not have this stage.
Write-back        write                      Writes the result of instruction execution to a register.

The instruction timetables in the Assembler Reference show the stages from read through write-back.
The effective execution latency is shorter than the number of clock ticks shown in a timetable: processing does not stall even when the write-back stage of an instruction falls in the same clock cycle as the read stage of the next instruction.


Arithmetic Unit Structure

The vertex shader processor incorporates the following arithmetic units.

Unit name   Number installed  Clock cycles per operation   Description
MUL         4                 1                            Calculates the product of two values.
ADD         4                 1                            Calculates the sum of two values.
RCP / RSQ   2                 2                            Calculates the reciprocal or the reciprocal square root of a value.
FLOOR       4                 1                            Calculates the greatest integer less than or equal to a value.
LOG         1                 2                            Calculates the base-2 logarithm of a value.
EXP         1                 2                            Calculates 2 raised to the power of a value.
MAX         4                 1                            Selects the larger of two values.
MIN         4                 1                            Selects the smaller of two values.
SGE         4                 1                            Compares two values to determine whether the first is greater than or equal to the second.
SLT         4                 1                            Compares two values to determine whether the first is less than the second.
CMP         2                 2                            Compares two values.
In the instruction timetables, each unit is shown by its unit name.


Register Set

The vertex shader processor incorporates the following register set.

Register type                      Components  Count  R/W  Bit width  Description
Input registers                    4           16     R    24         Floating-point registers that store vertex attribute data.
Temporary registers                4           16     RW   24         Reusable floating-point registers that temporarily hold calculation results. Their contents are retained until overwritten.
Floating-point constant registers  4           96     R    24         Floating-point registers that store constants used in operations. The register index can be offset by an address register or the loop-counter register.
Address registers                  2           1      RW   8          Integer register used to specify an index offset for the floating-point constant registers.
Boolean registers                  1           16     R    1          Boolean registers used for conditional branches and jumps. One of them (b15) is reserved for use by the geometry shader.
Integer registers                  1           4      R    24         Integer registers used to control loop instructions.
Loop-counter register              1           1      R    8          Integer register that holds the counter value of loop instructions. It can also be used to offset the index of the floating-point constant registers.
Output registers                   4           16     W    24         Floating-point registers that hold the vertex attribute data produced by the vertex shader processor. Their contents are passed to later stages of the graphics pipeline (or to the geometry shader processor).
Status registers                   1           2      RW   1          One-bit registers that hold status flags.


Post-Vertex Cache

The post-vertex cache stores the results of vertex data processing by the vertex shader processor.

When vertex data is input using a vertex buffer and vertex indices, the processing result of the vertex shader processor is stored in the cache, and the cached result is output when the same vertex index is input again. Cache hits improve performance because the vertex shader processor does not need to process those vertices again.

The cache holds 32 entries. On a cache miss, the least recently referenced entry is evicted first.
Cache hits are more likely when 32 or fewer distinct vertices are referenced repeatedly. Depending on how the vertex indices are generated, rendering with TRIANGLES can in some cases be more efficient than rendering with TRIANGLE_STRIP.


Revision History

2011/12/20
Initial version.

CONFIDENTIAL