GXSetGPMetric

Syntax

#include <revolution/gx.h>

void GXSetGPMetric( GXPerf0 perf0, GXPerf1 perf1 );

#define GXSetGP0Metric(perf0)    GXSetGPMetric((perf0), GX_PERF1_NONE)
#define GXSetGP1Metric(perf1)    GXSetGPMetric(GX_PERF0_NONE, (perf1))

Arguments

perf0 Performance metric to measure with counter 0.
perf1 Performance metric to measure with counter 1.

Return Values

None.

Description

The graphics processor (GP) can count many internal events that give detailed information on performance. This function selects the two performance metrics to measure, perf0 and perf1. The initial metrics are GX_PERF0_NONE and GX_PERF1_NONE, so calling the GXReadGPMetric function before any metric has been set returns a count of zero.

Because this function reads results from CPU-accessible registers in the GP, this command must not be used in a display list. Furthermore, in some cases the performance counters are triggered by sending tokens through the graphics FIFO to the GP. This implies that the function should only be used in immediate mode (when the graphics FIFO is connected simultaneously to the CPU and the GP). It may also be necessary to send a rendering synchronization token using the GXSetDrawSync function, or call the GXSetDrawDone function after the GXReadGPMetric function, to ensure that the state has actually been processed by the GP.

u32 verts, texels;

GXSetGPMetric( GX_PERF0_VERTICES, GX_PERF1_TEXELS );
GXClearGPMetric();
drawSphere();
/* Wait until the GP has processed everything up to the sync token. */
GXSetDrawSync(0xbabe);
while (0xbabe != GXReadDrawSync())
    ;
GXReadGPMetric(&verts, &texels);
OSReport("vertices in sphere %d, texels %d\n", verts, texels);

The GXReadGPMetric and GXClearGPMetric functions can be used in the callback associated with the render synchronization interrupt. See the GXSetDrawSyncCallback function. The GXSetGPMetric function should not be used in the render synchronization callback, because it will randomly insert tokens in the GP command stream.

#define OBJECTS 3
u32 count[OBJECTS][2];    /* [object][0] = vertices, [object][1] = texels */

/* Called when the GP reaches each draw sync token (1..OBJECTS). */
void myDrawSyncCallback( u16 token )
{
    GXReadGPMetric(&count[token-1][0], &count[token-1][1]);
    GXClearGPMetric();
}

void myDraw( void )
{
    u32 i;

    GXSetDrawSyncCallback( myDrawSyncCallback );
    GXSetGPMetric( GX_PERF0_VERTICES, GX_PERF1_TEXELS );
    drawSphere();
    GXSetDrawSync(1);
    drawCube();
    GXSetDrawSync(2);
    drawCylinder();
    GXSetDrawSync(3);
    GXDrawDone();
    for (i = 0; i < OBJECTS; i++)
        OSReport("object %d: vertices %d, texels %d\n", i, count[i][0],
            count[i][1]);
}

Each performance counter has a unique set of events or ratios that it can count. In some cases the same metric can be counted using both counters. For example, there are GX_PERF0_VERTICES and GX_PERF1_VERTICES. Ratios (metric names ending in _RATIO) are multiplied by 1000 (1000 = all misses/clips, etc., 0 = no misses/clips, etc.).
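Because ratio metrics are reported as integers scaled by 1000, a raw count usually needs to be rescaled before it is displayed. The following is a minimal sketch of that conversion, not SDK code. The helper name ratioToPercent is hypothetical, the u32 typedef is a stand-in for the SDK type, and raw is assumed to be a value already read back for a _RATIO metric.

```c
#include <stdint.h>

typedef uint32_t u32;   /* stand-in for the SDK's u32 type */

/* Hypothetical helper: convert a _RATIO count (0..1000) to a percentage.
   1000 maps to 100.0 (all misses/clips), 0 maps to 0.0 (none). */
double ratioToPercent(u32 raw)
{
    return (double)raw / 10.0;
}
```

For example, a raw count of 250 would be reported as 25 percent.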

Convenience Functions

The GXSetGP0Metric/GXReadGP0Metric and GXSetGP1Metric/GXReadGP1Metric functions can be used when you need to read only one counter. Do not use both pairs at the same time: as the macro definitions in Syntax show, GXSetGP0Metric sets counter 1 to GX_PERF1_NONE and GXSetGP1Metric sets counter 0 to GX_PERF0_NONE, which disables the other counter. To measure with both counters, call the GXSetGPMetric and GXReadGPMetric functions instead.

Counter 0 Details

GX_PERF0_CLIP_VTX

Returns the number of vertices clipped by the GP.

GX_PERF0_CLIP_CLKS

Returns the number of GP clock cycles spent clipping.

GX_PERF0_XF_WAIT_IN
GX_PERF0_XF_WAIT_OUT
GX_PERF0_XF_XFRM_CLKS
GX_PERF0_XF_LIT_CLKS
GX_PERF0_XF_BOT_CLKS
GX_PERF0_XF_REGLD_CLKS
GX_PERF0_XF_REGRD_CLKS

The GP transform engine (XF) is a pipeline with an input stage, parallel transform and lighting stages, and a 'bottom of pipe' processor that merges the results of lighting and texture coordinate generation. These performance counters measure how many cycles are spent in each stage of the XF.

GX_PERF0_XF_WAIT_IN measures how many cycles the XF spends waiting on input.
If the XF is waiting for a large percentage of the total time, it may indicate that the CPU is not supplying data fast enough to keep the GP busy.

GX_PERF0_XF_WAIT_OUT measures how many cycles the XF spends waiting to send its output to the rest of the GP pipeline.
If the XF cannot output, it may indicate that the GP is limited by the current fill-rate.

GX_PERF0_XF_XFRM_CLKS indicates the number of cycles during which the transform engine is busy.
GX_PERF0_XF_LIT_CLKS indicates the number of cycles during which the lighting engine is busy.
GX_PERF0_XF_BOT_CLKS indicates the number of cycles during which the bottom of the pipe is busy.

The XF contains state registers that control its processing. These registers are normally set using various GX API functions.
GX_PERF0_XF_REGLD_CLKS measures how many cycles are spent loading XF registers and GX_PERF0_XF_REGRD_CLKS measures for how many cycles the XF reads the state registers.
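To judge whether an XF stage count is a "large percentage of the total time", it can be compared against the number of elapsed GP clocks. One possible way to obtain both values in a single pass is to pair a counter 0 XF metric with GX_PERF1_CLOCKS on counter 1, for example GXSetGPMetric(GX_PERF0_XF_WAIT_IN, GX_PERF1_CLOCKS). The sketch below shows only the arithmetic. The helper name xfStagePercent is hypothetical, the u32 typedef is a stand-in for the SDK type, and the two arguments are assumed to be the values read back for that pairing.

```c
#include <stdint.h>

typedef uint32_t u32;   /* stand-in for the SDK's u32 type */

/* Hypothetical helper: share of elapsed GP clocks spent in one XF stage,
   as a percentage. Returns 0.0 when no clocks were counted, to avoid
   dividing by zero. */
double xfStagePercent(u32 stageClks, u32 totalClks)
{
    if (totalClks == 0)
        return 0.0;
    return 100.0 * (double)stageClks / (double)totalClks;
}
```

A high result for GX_PERF0_XF_WAIT_IN would suggest the XF is starved for input, and a high result for GX_PERF0_XF_WAIT_OUT would suggest a downstream bottleneck.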

GX_PERF0_TRIANGLES*

The triangle metrics allow for the counting of triangles under specific conditions or with specific attributes.
GX_PERF0_TRIANGLES counts all triangles. 
GX_PERF0_TRIANGLES_CULLED counts triangles that failed the front/back-face culling test.
GX_PERF0_TRIANGLES_PASSED counts triangles that passed that test.
GX_PERF0_TRIANGLES_SCISSORED counts the scissored triangles.

The GX_PERF0_TRIANGLES_*TEX metrics count triangles based on the number of texture coordinates supplied.
The GX_PERF0_TRIANGLES_*CLR metrics count triangles based on the number of colors supplied.

GX_PERF0_QUAD*

The quad metrics allow you to count the number of quads (2x2 pixel blocks) the GP processed.
Coverage indicates how many pixels in the quad are actually part of the triangles being rasterized. For example, a coverage of 4 means all pixels in the quad intersect triangles.
A coverage of 1 indicates that only 1 pixel in the quad intersects a triangle.

GX_PERF0_QUAD_0CVG indicates the number of quads with 0 coverage.
GX_PERF0_QUAD_NON0CVG counts the number of quads that have greater than zero coverage.
GX_PERF0_QUAD_[1-4]CVG counts the quads with a given coverage.

GX_PERF0_AVG_QUAD_CNT indicates the average quad count.
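The per-coverage quad counts can be combined into a single figure that shows how much of the rasterized quad area is actually covered by triangles. The following is a sketch of one such calculation, not an SDK facility. The helper name quadCoverageEfficiency is hypothetical, the u32 typedef is a stand-in for the SDK type, and the four arguments are assumed to be GX_PERF0_QUAD_1CVG through GX_PERF0_QUAD_4CVG counts gathered over repeated passes of the same scene, since counter 0 measures one metric at a time.

```c
#include <stdint.h>

typedef uint32_t u32;   /* stand-in for the SDK's u32 type */

/* Hypothetical helper: fraction of pixel slots in non-empty quads that are
   covered by triangles. q1..q4 are the counts of quads with coverage 1..4.
   Returns 1.0 when every quad is fully covered, 0.25 when every quad has
   only one covered pixel, and 0.0 when no quads were counted. */
double quadCoverageEfficiency(u32 q1, u32 q2, u32 q3, u32 q4)
{
    double covered = 1.0 * q1 + 2.0 * q2 + 3.0 * q3 + 4.0 * q4;
    double slots   = 4.0 * ((double)q1 + (double)q2 + (double)q3 + (double)q4);

    if (slots == 0.0)
        return 0.0;
    return covered / slots;
}
```

A low value indicates that many quads are only partially covered, which is typical of scenes made up of small or thin triangles.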

GX_PERF0_CLOCKS

GX_PERF0_CLOCKS counts the number of GP clock cycles that have elapsed since the previous GXReadGP0Metric function call.

GX_PERF0_NONE

This metric is used to disable counting on GP counter 0 and clears the current count.

Counter 1 Details

GX_PERF1_TEXELS

This metric returns the number of texels processed by the GP.

GX_PERF1_TX_IDLE

Returns the number of clock cycles for which the texture unit (TX) is idle.

GX_PERF1_TX_REGS

Returns the number of GP clock cycles spent writing to state registers in the TX unit.

GX_PERF1_TX_MEMSTALL

Returns the number of GP clock cycles the TX unit is stalled while waiting for main memory.

GX_PERF1_TC_CHECK1_2
GX_PERF1_TC_CHECK3_4
GX_PERF1_TC_CHECK5_6
GX_PERF1_TC_CHECK7_8
GX_PERF1_TC_MISS

These metrics can be used to compute the texture cache (TC) miss rate. The GX_PERF1_TC_CHECK* metrics count how many texture cache lines are accessed for each pixel. In the worst case, for a mipmap, up to 8 cache lines may be accessed to produce one textured pixel. GX_PERF1_TC_MISS counts the number of texture cache misses.

To compute the cache miss rate, compute:

GX_PERF1_TC_MISS / (GX_PERF1_TC_CHECK1_2 + GX_PERF1_TC_CHECK3_4 + GX_PERF1_TC_CHECK5_6 + GX_PERF1_TC_CHECK7_8)
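
The formula can be sketched in C as follows. This is an illustrative helper, not SDK code: the name tcMissRate is hypothetical, the u32 typedef is a stand-in for the SDK type, and the five arguments are assumed to be counts gathered over repeated passes of the same scene, since counter 1 measures one metric at a time.

```c
#include <stdint.h>

typedef uint32_t u32;   /* stand-in for the SDK's u32 type */

/* Hypothetical helper: texture cache miss rate as a fraction (0.0 to 1.0),
   following the formula literally. Returns 0.0 when no cache checks were
   counted, to avoid dividing by zero. */
double tcMissRate(u32 miss, u32 check1_2, u32 check3_4,
                  u32 check5_6, u32 check7_8)
{
    double checks = (double)check1_2 + (double)check3_4
                  + (double)check5_6 + (double)check7_8;

    if (checks == 0.0)
        return 0.0;
    return (double)miss / checks;
}
```

Because the counts come from separate passes, the result is only meaningful if each pass renders identical content.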

GX_PERF1_VC_ELEMQ_FULL
GX_PERF1_VC_MISSQ
GX_PERF1_VC_MEMREQ_FULL
GX_PERF1_VC_STATUS7
GX_PERF1_VC_MISSREP_FULL
GX_PERF1_VC_STREAMBUF_LOW
GX_PERF1_VC_ALL_STALLS

These metrics count different vertex cache stall conditions.

GX_PERF1_VERTICES

This metric returns the number of vertices processed by the GP.

GX_PERF1_FIFO_REQ

This metric counts the number of (32-byte) lines read from the GP FIFO.

GX_PERF1_CALL_REQ

This metric counts the number of (32-byte) lines read from called display lists (GXCallDisplayList).

GX_PERF1_VC_MISS_REQ

This metric counts the number of vertex cache miss requests. Each miss request is in the form of a 32-byte transfer from main memory.

GX_PERF1_CP_ALL_REQ

This metric counts all requests (32 bytes per request) from the GP command processor (CP).
It should be equal to the sum of the counts returned by GX_PERF1_FIFO_REQ, GX_PERF1_CALL_REQ and GX_PERF1_VC_MISS_REQ.
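
The relationship between the request metrics, and the conversion from requests to bytes, can be sketched as below. The helper names cpExpectedAllReq and cpReqBytes and the constant CP_REQ_BYTES are hypothetical, the u32 typedef is a stand-in for the SDK type, and the arguments are assumed to be counts gathered over separate passes of the same scene.

```c
#include <stdint.h>

typedef uint32_t u32;   /* stand-in for the SDK's u32 type */

#define CP_REQ_BYTES 32u   /* each CP request transfers 32 bytes */

/* Hypothetical helper: the value GX_PERF1_CP_ALL_REQ is expected to report,
   given the three component request counts. */
u32 cpExpectedAllReq(u32 fifoReq, u32 callReq, u32 vcMissReq)
{
    return fifoReq + callReq + vcMissReq;
}

/* Hypothetical helper: bytes transferred for a given number of requests.
   Uses a 64-bit result so large counts do not overflow. */
uint64_t cpReqBytes(u32 requests)
{
    return (uint64_t)requests * CP_REQ_BYTES;
}
```

Comparing cpExpectedAllReq against a measured GX_PERF1_CP_ALL_REQ count is a simple sanity check that the passes were consistent.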

GX_PERF1_CLOCKS

GX_PERF1_CLOCKS counts the number of GP clock cycles that have elapsed since the previous call to the GXReadGP1Metric function.

GX_PERF1_NONE

This metric is used to disable counting on GP counter 1 and clears the current count.

See Also

GXReadMemMetric, GXClearMemMetric, GXReadPixMetric, GXClearPixMetric, GXSetVCacheMetric, GXReadVCacheMetric, GXClearVCacheMetric

Revision History

2006/03/01 Initial version.

