Command-Buffer Jumps

Overview
Features of Command-Buffer Jumps

Benefits
Drawbacks

Functions That Support Command-Buffer Jumps

Only the nngx API
GR Library and nngx API
GD Library
Directly Generating 3D Commands

Comments

Placement of Command Buffers
Flushing the CPU Cache When Inserting Split Commands (When Using nngx API Functions)
Execution Status of the Command List

Revision History

Overview

By specifying the address and size of a command buffer and then issuing a kick command, you can make execution jump to a command buffer at a different address.

The libraries provided by the SDK have API functions for two kinds of jump. A normal, unidirectional jump transfers execution to another command buffer and does not come back. A subroutine jump executes a command buffer in a different location and then jumps back to resume execution of the subsequent commands in the original command buffer.

Features of Command-Buffer Jumps

Command-buffer jumps have both benefits and drawbacks, which are described below.

Benefits

Enables the reuse of command buffers without adding and duplicating command requests, thereby reducing the load on the CPU.
Destination commands can be referenced directly from the GPU without copying to the current command buffer, so you can optimize the size of buffer assigned to the command list.

Drawbacks

The application must not only create the destination command buffers but also place them and perform all other related tasks.
When you use many jumps, it becomes difficult to track down the causes of rendering-related bugs.
The jumping process itself places a higher load on the GPU, so heavy use of command-buffer jumps can have an overall negative effect.

Functions That Support Command-Buffer Jumps

Only the `nngx` API

Use the nngxAddJumpCommand function for unidirectional jumps, and the nngxAddSubroutineCommand function for command-buffer jumps performed as subroutines.

The nngx API functions automatically adjust the byte alignment and size internally.
A command buffer that is called as a subroutine must end with a channel 1 kick command. (To add it, you can use the nn::gr::MakeChannelKickCommand function, which is described later.)
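The difference between the two kinds of jump can be illustrated with a small toy model. The sketch below is not SDK code: the `Op`, `Command`, `CommandBuffer`, and `Execute` names are hypothetical, and the model assumes a single return slot that the subroutine's final kick (here `Op::Return`) consumes. It only shows the order in which commands would run.

```cpp
#include <cstddef>
#include <vector>

// Toy model only. These types are hypothetical and are not part of the SDK.
enum class Op { Draw, Jump, Call, Return };

struct Command {
    Op op;
    int arg;  // Draw: object id. Jump/Call: index of the destination buffer. Return: unused.
};

using CommandBuffer = std::vector<Command>;

// Executes buffers[0] from its first command and returns the ids of the
// Draw commands in the order they ran. Destination indices are assumed valid.
std::vector<int> Execute(const std::vector<CommandBuffer>& buffers) {
    std::vector<int> drawn;
    std::size_t buf = 0;
    std::size_t pc = 0;
    bool hasReturn = false;  // Single return slot (assumption of this model).
    std::size_t retBuf = 0;
    std::size_t retPc = 0;

    while (pc < buffers[buf].size()) {
        const Command& c = buffers[buf][pc];
        switch (c.op) {
        case Op::Draw:
            drawn.push_back(c.arg);
            ++pc;
            break;
        case Op::Jump:  // Unidirectional: the rest of this buffer never runs.
            buf = static_cast<std::size_t>(c.arg);
            pc = 0;
            break;
        case Op::Call:  // Subroutine: remember where to resume, then jump.
            retBuf = buf;
            retPc = pc + 1;
            hasReturn = true;
            buf = static_cast<std::size_t>(c.arg);
            pc = 0;
            break;
        case Op::Return:  // Stands in for the kick command that ends a subroutine.
            if (!hasReturn) {
                return drawn;  // Nothing to return to: stop.
            }
            buf = retBuf;
            pc = retPc;
            hasReturn = false;
            break;
        }
    }
    return drawn;
}
```

In this model, a buffer that reaches the destination through `Op::Call` resumes after the call once the destination ends with `Op::Return`, whereas a buffer that uses `Op::Jump` never executes the commands that follow the jump.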

To add a 3D rendering command request that includes a jump (that is, to add a split command), use one of the following functions:

nngxFlush3DCommand function. (The nngxSplitDrawCmdlist function is also fine, but not recommended.)
nngxFlush3DCommandNoCacheFlush function.

If you use the latter function, you must explicitly flush the CPU cache for the command buffer.
The nngxFlush3DCommandPartially function does not perform this flush internally.

GR Library and `nngx` API

For unidirectional jumps, use the nn::gr::MakeChannel0JumpCommand function or the nn::gr::MakeChannel1JumpCommand function. For subroutines, use the nn::gr::MakeChannel0SubroutineCommand function or the nn::gr::MakeChannel1SubroutineCommand function.
In addition, the nn::gr::MakeChannelKickCommand function is provided for adding only the kick command itself.

The GR library does not take the command buffer size and alignment into consideration when adding jump-related commands.
You must make adjustments based on the size of commands added by the API functions.
In exchange, you can choose which channel to use and which commands to add. The nn::gr::CommandBufferJumpHelper class helps you adjust command sizes and create kick commands, but it does not allow you to select channels.

API	Size (in Bytes) of the Added Command
`MakeChannelKickCommand`	8
`MakeChannel0SubroutineCommand`	24
`MakeChannel1SubroutineCommand`	32
`MakeChannel0JumpCommand`	24
`MakeChannel1JumpCommand`	24

To add a split command you must use the nngxFlush3DCommandPartially function.
For the argument, specify the size from the start of the command buffer to the (first) kick command.
If you are using the GR library, you must explicitly flush the CPU cache for the command buffer.
The nngxFlush3DCommandPartially function does not perform this flush internally.
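The size bookkeeping described above can be sketched as follows. The byte counts are copied from the table, but everything else is an assumption made for illustration: the `GrJumpCommand`, `AddedSize`, `PaddingBefore`, and `SizeThroughCommand` names are hypothetical, the alignment value is a parameter because this page does not state the required value, and the sketch assumes that the kick is the last thing each added command writes.

```cpp
#include <cstddef>

// Jump-related commands listed in the table above (hypothetical enum).
enum class GrJumpCommand {
    Kick,
    Channel0Subroutine,
    Channel1Subroutine,
    Channel0Jump,
    Channel1Jump
};

// Size in bytes that each command adds, copied from the table above.
std::size_t AddedSize(GrJumpCommand command) {
    switch (command) {
    case GrJumpCommand::Kick:               return 8;
    case GrJumpCommand::Channel0Subroutine: return 24;
    case GrJumpCommand::Channel1Subroutine: return 32;
    case GrJumpCommand::Channel0Jump:       return 24;
    case GrJumpCommand::Channel1Jump:       return 24;
    }
    return 0;  // Not reached for valid enumerators.
}

// Padding in bytes to insert before the command so that the command ends
// on a multiple of `alignment`. The alignment value is an assumption.
std::size_t PaddingBefore(std::size_t offset, GrJumpCommand command,
                          std::size_t alignment) {
    const std::size_t end = offset + AddedSize(command);
    return (alignment - end % alignment) % alignment;
}

// Size from the start of the buffer through the end of the added command,
// including the padding. Assumes the kick is the final part of the command.
std::size_t SizeThroughCommand(std::size_t offset, GrJumpCommand command,
                               std::size_t alignment) {
    return offset + PaddingBefore(offset, command, alignment) + AddedSize(command);
}
```

For example, with 48 bytes already written and an assumed alignment of 16, adding only a kick command needs 8 bytes of padding, and the size through the kick is 64 bytes. A value computed this way is the kind of size you would pass to the nngxFlush3DCommandPartially function.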

GD Library

If you use the GD library, you do not need to make direct calls to the nngx functions for command-buffer jumping. The library acts internally to call the necessary functions.

Specify RECORD_3D_COMMAND_BUFFER_FOR_JUMP for the usage parameter of the nn::gd::System::StartRecordingPackets function, and then create the 3D command buffer that you want to use as a subroutine. After the necessary commands have been created, call the nn::gd::System::StopRecordingPackets function. The jump command is added internally when you pass the saved RecordedPacketId to the nn::gd::System::ReplayPackets function.

Directly Generating 3D Commands

You can also jump by directly creating commands to send to the GPU. Use registers 0x238 to 0x23d. For more information, see the documentation.

However, creating your own commands offers no real benefit, so we recommend that you normally use one of the other methods.

Comments

This section provides supplemental information about precautions to take when implementing command-buffer jumps and about ways to improve efficiency.

Placement of Command Buffers

The GPU accesses command buffers faster when subroutine commands are stored in VRAM rather than in main memory. If access to command buffers in main memory becomes a bottleneck, you can expect an overall boost in processing speed by placing them in VRAM.

To place commands in VRAM, use the nngxAddVramDmaCommand function or the nngxAddVramDmaCommandNoCacheFlush function.

Flushing the CPU Cache When Inserting Split Commands (When Using `nngx` API Functions)

If you implement a subroutine using the nngx API functions, calling the nngxFlush3DCommand function when adding a split command increases the load on the CPU because the CPU cache is flushed each time it is called.

For this reason, if 3D rendering command requests include a subroutine call, it is more efficient to keep them together without splitting them.
If you need 3D rendering command requests that contain multiple subroutines, use the nngxFlush3DCommandNoCacheFlush function when adding each split command, and then call the nn::gx::UpdateBuffer function once for the entire required region so that all of the flush operations are done together.
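One way to find "the entire required region" is to track the address range of each command buffer that was written and compute a single range that covers all of them. The following is a minimal sketch under that assumption. The `Region` type and the `CoveringRegion` function are hypothetical helpers, not SDK APIs, and the sketch does not call any SDK function.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Address range of one command buffer written by the CPU (hypothetical type).
struct Region {
    std::uintptr_t begin;
    std::size_t size;
};

// Returns the smallest single region that covers every region in the list.
// An empty list yields an empty region.
Region CoveringRegion(const std::vector<Region>& regions) {
    if (regions.empty()) {
        return Region{0, 0};
    }
    std::uintptr_t lo = regions[0].begin;
    std::uintptr_t hi = regions[0].begin + regions[0].size;
    for (const Region& r : regions) {
        lo = std::min(lo, r.begin);
        hi = std::max(hi, r.begin + r.size);
    }
    return Region{lo, static_cast<std::size_t>(hi - lo)};
}
```

The resulting address and size are what you would pass to a single cache-flush call. Note that the covering region also includes any memory that lies between the buffers, so this approach suits buffers that are placed close together.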

Execution Status of the Command List

If there are no unexecuted command requests in the command list called by the nngxRunCmdlist function, that command list enters the "waiting to run" state.
In this state, if a new command request is added to the list, that request begins to run. During command request processing, the command list is in the "running" state.

When the command list is in the running state, some nngx API functions generate an error and interrupt the processing.
You must be particularly careful with the previously mentioned nngxFlush3DCommandPartially function.

For implementations similar to the following example, depending on the timing, the intended commands may not be generated and the GPU may hang.

Example of a bad implementation (using one command list)

// Bad implementation.
// One command list is used while it remains in the "waiting to run" state.

Draw()
{
    // Add a command request to clear the render buffer. ... (A)
    nngxAddMemoryFillCommand(...);

    // Create some rendering command. (Assume that the GR library is internally applying the jump.)
    DrawObjects();
    // Add a split command. (Add a 3D rendering command request.)... (B)
    nngxFlush3DCommandPartially(buffersize);
    // Flush the CPU cache for the command buffer.
    nngxUpdateBuffer(...);

    // Add a command request to transfer data to the display buffer. ... (C)
    nngxTransferRenderImage(...);

    // Wait for execution to complete.
    nngxWaitCmdlistDone();
    // Swap buffers.
    Swap();
    // Clear the command list.
    nngxClearCmdlist();
}

If the command list is in the "waiting to run" state, it transitions to the "running" state when the command request is added in (A).
If (B) is reached before command request (A) completes, the nngxFlush3DCommandPartially function generates the GL_ERROR_80AD_DMP error.
Because of the error, no split command is generated in (B). However, when the nngxTransferRenderImage function is called in (C), a split command is added for any unprocessed portion of the command buffer, and a 3D rendering command request is created.
The execution size of that request is not the intended size (the size from the start of the command buffer to the first kick command), so the correct render result is not obtained and, in some cases, the GPU may hang.

There are several possible workarounds.

Use two command lists instead of one (double-buffer the command list).
Immediately before step (B), call the nngxWaitCmdlistDone function to wait for the command list to finish running.
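The failure and the second workaround can be illustrated with a toy model of the behavior described above. This is not SDK code: the `ToyCommandList` class and the `RunFrame` function are hypothetical, the byte counts are made up, and the model always assumes the worst-case timing in which request (A) has not completed when (B) is reached. It also assumes that a successful partial flush accounts for the whole accumulated buffer, because the rest is reached through the jumps.

```cpp
#include <cstddef>
#include <vector>

// Toy model of the behavior described above. It is not SDK code.
class ToyCommandList {
public:
    explicit ToyCommandList(std::size_t accumulatedBytes)
        : unprocessedBytes_(accumulatedBytes) {}

    // Models (A): adding a command request puts the list in the running state.
    void AddRequest() { ++pendingRequests_; }

    // Models waiting for the list to finish: no pending requests remain.
    void WaitDone() { pendingRequests_ = 0; }

    // Models (B): fails while the list is running. On success, records a
    // 3D request of `size` bytes and treats the rest of the buffer as
    // handled, because it is reached through the jumps (assumption).
    bool FlushPartially(std::size_t size) {
        if (pendingRequests_ > 0) {
            return false;
        }
        requestSizes_.push_back(size);
        unprocessedBytes_ = 0;
        ++pendingRequests_;
        return true;
    }

    // Models (C): any unprocessed bytes are split off as one 3D request
    // before the transfer request is added.
    void Transfer() {
        if (unprocessedBytes_ > 0) {
            requestSizes_.push_back(unprocessedBytes_);
            unprocessedBytes_ = 0;
            ++pendingRequests_;
        }
        ++pendingRequests_;
    }

    const std::vector<std::size_t>& RequestSizes() const { return requestSizes_; }

private:
    std::size_t unprocessedBytes_;
    int pendingRequests_ = 0;
    std::vector<std::size_t> requestSizes_;
};

// Runs one frame and returns the sizes of the 3D requests that were created.
std::vector<std::size_t> RunFrame(bool waitBeforeFlush) {
    ToyCommandList list(256);  // 256 bytes written; the first kick ends at byte 64.
    list.AddRequest();         // (A)
    if (waitBeforeFlush) {
        list.WaitDone();       // Workaround: wait immediately before (B).
    }
    list.FlushPartially(64);   // (B)
    list.Transfer();           // (C)
    return list.RequestSizes();
}
```

In this model, `RunFrame(false)` ends up with a single 3D request that covers all 256 bytes instead of the intended 64, while `RunFrame(true)` produces the intended 64-byte request.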

Revision History

2012/06/26: Added a note about the nn::gr::CommandBufferJumpHelper class.
2012/02/17: Added a table of contents and information about the GD library.
2012/02/08: Initial version.