1<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
2<html xml:lang="en-US" lang="en-US" xmlns="http://www.w3.org/1999/xhtml">
3  <head>
4    <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
5    <meta http-equiv="Content-Style-Type" content="text/css" />
6    <link rel="stylesheet" href="../css/manpage.css" type="text/css" />
7    <link rel="stylesheet" href="../css/timetable.css" type="text/css" />
8    <title>Instruction Latency</title>
9  </head>
10  <body>
11    <h1><a name="top">Instruction Latency</a></h1>
12    <div class="section">
13      <p>
14        The latency associated with execution of each instruction is given below.<BR> <br>
15      </p>
16      <table>
17        <thead>
18        <tr>
19          <td>Instructions</td>
20          <td>Latency (in clock cycles)</td>
21        </tr>
22        </thead>
23        <tbody>
24        <tr>
25          <th><CODE>add</CODE></th>
26          <td>3</td>
27        </tr>
28        <tr>
29          <th><CODE>dp3</CODE></th>
30          <td>5</td>
31        </tr>
32        <tr>
33          <th><CODE>dp4</CODE></th>
34          <td>5</td>
35        </tr>
36        <tr>
37          <th><CODE>dph</CODE></th>
38          <td>5</td>
39        </tr>
40        <tr>
41          <th><CODE>dst</CODE></th>
42          <td>3</td>
43        </tr>
44        <tr>
45          <th><CODE>exp</CODE></th>
46          <td>4</td>
47        </tr>
48        <tr>
49          <th><CODE>flr</CODE></th>
50          <td>2</td>
51        </tr>
52        <tr>
53          <th><CODE>litp</CODE></th>
54          <td>4</td>
55        </tr>
56        <tr>
57          <th><CODE>log</CODE></th>
58          <td>4</td>
59        </tr>
60        <tr>
61          <th><CODE>mad</CODE></th>
62          <td>4</td>
63        </tr>
64        <tr>
65          <th><CODE>max</CODE></th>
66          <td>2</td>
67        </tr>
68        <tr>
69          <th><CODE>min</CODE></th>
70          <td>2</td>
71        </tr>
72        <tr>
73          <th><CODE>mov</CODE></th>
74          <td>2</td>
75        </tr>
76        <tr>
77          <th><CODE>mova</CODE></th>
78          <td>4</td>
79        </tr>
80        <tr>
81          <th><CODE>mul</CODE></th>
82          <td>3</td>
83        </tr>
84        <tr>
85          <th><CODE>nop</CODE></th>
86          <td>1</td>
87        </tr>
88        <tr>
89          <th><CODE>rcp</CODE></th>
90          <td>4</td>
91        </tr>
92        <tr>
93          <th><CODE>rsq</CODE></th>
94          <td>4</td>
95        </tr>
96        <tr>
97          <th><CODE>sge</CODE></th>
98          <td>2</td>
99        </tr>
100        <tr>
101          <th><CODE>slt</CODE></th>
102          <td>2</td>
103        </tr>
104        <tr>
105          <th><CODE>cmp</CODE></th>
106          <td>4</td>
107        </tr>
108        <tr>
109          <th>Other Branch Instructions</th>
110          <td>1 &ndash; 3</td>
111        </tr>
112        </tbody>
113      </table>
114    </div>
115
116    <h2><a name="arithmetic_cmp">Latency of Arithmetic Instructions and the <CODE>cmp</CODE> Instruction</a></h2>
117    <div class="section">
118      <p>
        The clock cycle counts given in the table above for arithmetic instructions and the <CODE>cmp</CODE> instruction are approximate. Actual latency depends on the instructions that come before and after each instruction. Latency can be reduced by placing instructions that use unrelated registers between an instruction and the instruction that depends on its result.<BR> <br> <br> Example where latency increases (each <CODE>mad</CODE> immediately follows the instruction that writes one of its source registers):<br>
<pre class="definition">
mul     r0, r1, r2
mad     r1, r0, r4, r5
add     r8, r9, r10
mad     r9, r8, r10, r11
add     r7, c1, r12
</pre>
127        <br>
128        <table class="timetable">
129          <tr>
130            <th></th>
131            <th>1</th><th>2</th><th>3</th><th>4</th><th>5</th><th>6</th><th>7</th><th>8</th><th>9</th><th>10</th>
132            <th>11</th><th>12</th><th>13</th><th>14</th>
133          </tr>
134          <tr>
135            <th>mul</th>
136            <td class="read">read</td>
137            <td class="MUL">MUL</td>
138            <td class="post">post</td>
139            <td class="write">write</td>
140          </tr>
141          <tr>
142            <th>mad</th>
143            <td class="empty" colspan="1"></td>
144            <td class="stall" colspan="2">STALL</td>
145            <td class="read">read</td>
146            <td class="MUL">MUL</td>
147            <td class="ADD">ADD</td>
148            <td class="post">post</td>
149            <td class="write">write</td>
150          </tr>
151          <tr>
152            <th>add</th>
153            <td class="empty" colspan="4"></td>
154            <td class="read">read</td>
155            <td class="stall" colspan="1">STALL</td>
156            <td class="ADD">ADD</td>
157            <td class="post">post</td>
158            <td class="write">write</td>
159          </tr>
160          <tr>
161            <th>mad</th>
162            <td class="empty" colspan="6"></td>
163            <td class="stall" colspan="2">STALL</td>
164            <td class="read">read</td>
165            <td class="MUL">MUL</td>
166            <td class="ADD">ADD</td>
167            <td class="post">post</td>
168            <td class="write">write</td>
169          </tr>
170          <tr>
171            <th>add</th>
172            <td class="empty" colspan="9"></td>
173            <td class="read">read</td>
174            <td class="stall" colspan="1">STALL</td>
175            <td class="ADD">ADD</td>
176            <td class="post">post</td>
177            <td class="write">write</td>
178            <td class="dummy"></td>
179          </tr>
180        </table>
        <br> <br> Example where latency decreases (the same five instructions, reordered so that neither <CODE>mad</CODE> has to wait for its source register):<br>
<pre class="definition">
mul     r0, r1, r2
add     r8, r9, r10
add     r7, c1, r12
mad     r1, r0, r4, r5
mad     r9, r8, r10, r11
</pre>
189        <table class="timetable">
190          <tr>
191            <th></th>
192            <th>1</th><th>2</th><th>3</th><th>4</th><th>5</th><th>6</th><th>7</th><th>8</th><th>9</th>
193          </tr>
194          <tr>
195            <th>mul</th>
196            <td class="read">read</td>
197            <td class="MUL">MUL</td>
198            <td class="post">post</td>
199            <td class="write">write</td>
200          </tr>
201          <tr>
202            <th>add</th>
203            <td class="empty" colspan="1"></td>
204            <td class="read">read</td>
205            <td class="ADD">ADD</td>
206            <td class="post">post</td>
207            <td class="write">write</td>
208          </tr>
209          <tr>
210            <th>add</th>
211            <td class="empty" colspan="2"></td>
212            <td class="read">read</td>
213            <td class="ADD">ADD</td>
214            <td class="post">post</td>
215            <td class="write">write</td>
216          </tr>
217          <tr>
218            <th>mad</th>
219            <td class="empty" colspan="3"></td>
220            <td class="read">read</td>
221            <td class="MUL">MUL</td>
222            <td class="ADD">ADD</td>
223            <td class="post">post</td>
224            <td class="write">write</td>
225          </tr>
226          <tr>
227            <th>mad</th>
228            <td class="empty" colspan="4"></td>
229            <td class="read">read</td>
230            <td class="MUL">MUL</td>
231            <td class="ADD">ADD</td>
232            <td class="post">post</td>
233            <td class="write">write</td>
234            <td class="dummy"></td>
235          </tr>
236        </table>
237        <br>
238      </p>
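      <p>
        The two timetables above can be reproduced with a small model. The following Python sketch is not part of the hardware documentation. It assumes that a source register can be read in the same cycle in which it is written, that only one instruction can occupy the ADD stage in a given cycle, and that no instruction is read in a cycle in which an earlier instruction is stalled. These rules are inferred from the timetables, the names <CODE>STAGES</CODE> and <CODE>schedule</CODE> are hypothetical, and only <CODE>mul</CODE>, <CODE>add</CODE>, and <CODE>mad</CODE> are covered.
      </p>
<pre class="definition">
```python
# Execution stages between "read" and "post", as drawn in the timetables.
STAGES = {"mul": ["MUL"], "add": ["ADD"], "mad": ["MUL", "ADD"]}


def schedule(program):
    """Return the clock cycle in which the last result is written.

    program is a list of (opcode, destination, sources) tuples.
    This is an assumed model inferred from the timetables, not a hardware spec.
    """
    ready = {}        # register -> cycle in which its pending write lands
    add_busy = set()  # cycles in which the ADD stage is already occupied
    stalled = set()   # cycles in which an issued instruction is stalled
    prev_read = 0
    last_write = 0
    for opcode, dest, sources in program:
        read = prev_read + 1
        # Wait out earlier stalls, then wait until every source is written.
        while read in stalled or any(ready.get(s, 0) > read for s in sources):
            read += 1
        cycle = read
        for stage in STAGES[opcode]:
            cycle += 1
            if stage == "ADD":
                # Unit conflict: wait for a cycle in which the ADD stage is free.
                while cycle in add_busy:
                    stalled.add(cycle)
                    cycle += 1
                add_busy.add(cycle)
        write = cycle + 2  # one "post" cycle, then the "write" cycle
        ready[dest] = write
        prev_read = read
        last_write = max(last_write, write)
    return last_write


slow = [
    ("mul", "r0", ["r1", "r2"]),
    ("mad", "r1", ["r0", "r4", "r5"]),
    ("add", "r8", ["r9", "r10"]),
    ("mad", "r9", ["r8", "r10", "r11"]),
    ("add", "r7", ["c1", "r12"]),
]
fast = [slow[0], slow[2], slow[4], slow[1], slow[3]]

print(schedule(slow), schedule(fast))  # prints "14 9"
```
</pre>
      <p>
        Under these assumptions the first ordering finishes in cycle 14 and the reordered one in cycle 9, matching the two timetables.
      </p>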
239    </div>
240
241    <h2><a name="branch_latency">Latency of Branch Instructions</a></h2>
242    <div class="section">
243      <p>
        The table above lists a latency of 1 to 3 clock cycles for branch instructions. A branch instruction has a latency of 1 if it changes the program counter by +1, 2 if it changes the program counter by +2, and 3 if it changes the program counter by +3.<BR> This is because, when the program counter changes by anything other than +1, the previously read instruction and the instruction scheduled to execute next are canceled and the instruction check is performed again. Note, however, that if the program counter changes by +2, only the instruction scheduled to execute next is canceled; the previously read instruction is unaffected.<BR> <br> Example:<br>
<pre class="definition">
ifb     b0
  add     r0, r1, r2
endif
mul     r0, r1, r2
mul     r1, r2, r3
</pre>
        If b0 is true (<CODE>ifb</CODE> takes 1 clock cycle):<br> <br>
253        <table class="timetable">
254          <tr>
255            <th></th>
256            <th>1</th><th>2</th><th>3</th><th>4</th><th>5</th><th>6</th><th>7</th>
257          </tr>
258          <tr>
259            <th>ifb</th>
260            <td class="flow">ifb</td>
261          </tr>
262          <tr>
263            <th>add</th>
264            <td class="empty" colspan="1"></td>
265            <td class="read">read</td>
266            <td class="ADD">ADD</td>
267            <td class="post">post</td>
268            <td class="write">write</td>
269          </tr>
270          <tr>
271            <th>mul</th>
272            <td class="empty" colspan="2"></td>
273            <td class="read">read</td>
274            <td class="MUL">MUL</td>
275            <td class="post">post</td>
276            <td class="write">write</td>
277          </tr>
278          <tr>
279            <th>mul</th>
280            <td class="empty" colspan="3"></td>
281            <td class="read">read</td>
282            <td class="MUL">MUL</td>
283            <td class="post">post</td>
284            <td class="write">write</td>
285            <td class="dummy"></td>
286          </tr>
287        </table>
        <BR>If b0 is false (the <CODE>add</CODE> instruction is skipped and <CODE>ifb</CODE> takes 2 clock cycles):<br> <br>
289        <table class="timetable">
290          <tr>
291            <th></th>
292            <th>1</th><th>2</th><th>3</th><th>4</th><th>5</th><th>6</th><th>7</th>
293          </tr>
294          <tr>
295            <th>ifb</th>
296            <td class="flow">ifb</td>
297            <td class="stall" colspan="1">STALL</td>
298          </tr>
299          <tr>
300            <th>mul</th>
301            <td class="empty" colspan="2"></td>
302            <td class="read">read</td>
303            <td class="MUL">MUL</td>
304            <td class="post">post</td>
305            <td class="write">write</td>
306          </tr>
307          <tr>
308            <th>mul</th>
309            <td class="empty" colspan="3"></td>
310            <td class="read">read</td>
311            <td class="MUL">MUL</td>
312            <td class="post">post</td>
313            <td class="write">write</td>
314            <td class="dummy"></td>
315          </tr>
316        </table>
317      </p>
318    </div>
319
320
321    <h2><a name="output_result">Output Order of Calculation Results</a></h2>
322    <div class="section">
323      <p>
        The result of a later instruction never affects the result of an earlier instruction. This is guaranteed, without stalling, regardless of the latencies of the instructions involved, because the registers used by an earlier instruction are always read before those used by a later instruction.<BR> <br> For example, if a later instruction writes to a source register of an earlier instruction, the value written by the later instruction is never used as input to the earlier instruction.<br> <br> Example 1<br>
<pre class="definition">
exp     r0, r1.x
mov     r1, c0
</pre>
        In the above example, the source register of the earlier, high-latency instruction and the destination register of the later, low-latency instruction are the same register. (The latency of <CODE>exp</CODE> is 4 clock cycles and that of <CODE>mov</CODE> is 2 clock cycles.)<br>When code of this type is executed, the output of <CODE>mov</CODE> is not used as input to <CODE>exp</CODE>, so the result of <CODE>mov</CODE> does not affect the calculation made by <CODE>exp</CODE>.<br> <br> In addition, calculated results are guaranteed to be output in the order in which the instructions execute.<br> <br> <br>Example 2<br>
<pre class="definition">
exp     r0, r1.x
mov     r0, c0
</pre>
        In the above example, the same register is the destination of both the earlier, high-latency instruction and the later, low-latency instruction. When code of this type is executed, the output of <CODE>exp</CODE> is never written after the output of <CODE>mov</CODE>.<BR> With this type of register dependency, correct ordering is guaranteed without any stalling because the write by the earlier instruction is canceled at the point the later instruction is decoded. The write is canceled when the earlier and later instructions are found to write to the same register. Detection and cancellation are performed separately for each component. In other words, a component write by the earlier instruction is canceled only if the later instruction writes to the same component of the same register.<BR> <br>
335        <table class="timetable">
336          <tr>
337            <th></th>
338            <th>1</th><th>2</th><th>3</th><th>4</th><th>5</th>
339          </tr>
340          <tr>
341            <th>exp</th>
342            <td class="read">read</td>
343            <td class="EXP" colspan="2">EXP</td>
344            <td class="post">post</td>
345            <td class="cancel">cancel</td>
346            <td class="dummy"></td>
347          </tr>
348          <tr>
349            <th>mov</th>
350            <td class="empty" colspan="1"></td>
351            <td class="read">read</td>
352            <td class="MOV">mov</td>
353            <td class="write">write</td>
354          </tr>
355        </table>
        <br> <br> Consecutive writes to the same register, as in the above example, do not cause a stall even when the same components are written.<br> <br> Example 3<br>
<pre class="definition">
exp     r0.x, r1.x
mov     r0.y, c0
</pre>
        In the above example, r0 is written to twice in a row, but the components written to (r0.x and r0.y) are different, so there is no overlap. The write to r0.x by <CODE>exp</CODE> is not canceled, and execution does not stall.<BR> <br>
362        <table class="timetable">
363          <tr>
364            <th></th>
365            <th>1</th><th>2</th><th>3</th><th>4</th><th>5</th>
366          </tr>
367          <tr>
368            <th>exp</th>
369            <td class="read">read</td>
370            <td class="EXP" colspan="2">EXP</td>
371            <td class="post">post</td>
372            <td class="write">write</td>
373            <td class="dummy"></td>
374          </tr>
375          <tr>
376            <th>mov</th>
377            <td class="empty" colspan="1"></td>
378            <td class="read">read</td>
379            <td class="MOV">mov</td>
380            <td class="write">write</td>
381          </tr>
382        </table>
        <br> <br> Example 4<br>
<pre class="definition">
exp     r0.xyz,  r1.x
mov     r0.xyzw, c0
</pre>
        In the example above, every component written by <CODE>exp</CODE> (r0.xyz) is also written by <CODE>mov</CODE>, so the write to r0.xyz by <CODE>exp</CODE> is canceled. Execution does not stall.<BR> <br>
389        <table class="timetable">
390          <tr>
391            <th></th>
392            <th>1</th><th>2</th><th>3</th><th>4</th><th>5</th>
393          </tr>
394          <tr>
395            <th>exp</th>
396            <td class="read">read</td>
397            <td class="EXP" colspan="2">EXP</td>
398            <td class="post">post</td>
399            <td class="cancel">cancel</td>
400            <td class="dummy"></td>
401          </tr>
402          <tr>
403            <th>mov</th>
404            <td class="empty" colspan="1"></td>
405            <td class="read">read</td>
406            <td class="MOV">mov</td>
407            <td class="write">write</td>
408            <td class="dummy"></td>
409          </tr>
410        </table>
411        <br>
412      </p>
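      <p>
        The per-component cancellation rule in Examples 2 to 4 can be sketched as a comparison of write masks. The Python function below is only an illustration; <CODE>canceled_components</CODE> is a hypothetical name, and it assumes that a destination written without a component suffix covers all four components (xyzw).
      </p>
<pre class="definition">
```python
def canceled_components(earlier_dest, later_dest):
    """Return the components of the earlier write that the later write cancels."""
    def split(operand):
        # "r0.xyz" -> ("r0", "xyz"); assumed: a bare register means "xyzw".
        register, _, mask = operand.partition(".")
        return register, mask or "xyzw"

    earlier_reg, earlier_mask = split(earlier_dest)
    later_reg, later_mask = split(later_dest)
    if earlier_reg != later_reg:
        return ""  # different registers: nothing is canceled
    return "".join(c for c in earlier_mask if c in later_mask)


print(repr(canceled_components("r0", "r0")))           # Example 2: 'xyzw'
print(repr(canceled_components("r0.x", "r0.y")))       # Example 3: ''
print(repr(canceled_components("r0.xyz", "r0.xyzw")))  # Example 4: 'xyz'
```
</pre>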
413    </div>
414
415    <h2><a name="output_stall">Stalling Due to Calculation Result Output Timing Conflicts</a></h2>
416    <div class="section">
417      <p>
        More than one register cannot be written to at the same time. If a low-latency instruction executes after a high-latency instruction and the two instructions would complete in the same clock cycle, the result of the later instruction is output 1 clock cycle late.<BR> <br> Example:<br>
<pre class="definition">
exp     r0, r1.x
mul     r2, c3, r4
</pre>
        When the above code is executed, the results for r0 and r2 would otherwise be output in the same clock cycle, so the output to r2 is delayed by 1 clock cycle.<BR> <br>
424        <table class="timetable">
425          <tr>
426            <th></th>
427            <th>1</th><th>2</th><th>3</th><th>4</th><th>5</th><th>6</th>
428          </tr>
429          <tr>
430            <th>exp</th>
431            <td class="read">read</td>
432            <td class="EXP" colspan="2">EXP</td>
433            <td class="post">post</td>
434            <td class="write">write</td>
435            <td class="dummy"></td>
436          </tr>
437          <tr>
438            <th>mul</th>
439            <td class="empty" colspan="1"></td>
440            <td class="read">read</td>
441            <td class="MUL">MUL</td>
442            <td class="post">post</td>
443            <td class="stall" colspan="1">STALL</td>
444            <td class="write">write</td>
445            <td class="dummy"></td>
446          </tr>
447        </table>
448        <br>
449      </p>
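      <p>
        The one-write-per-cycle rule can be sketched as follows. This Python function is an illustration only; <CODE>resolve_write_cycles</CODE> is a hypothetical name, and applying the delay repeatedly when more than two writes collide is an assumption that the text above does not confirm.
      </p>
<pre class="definition">
```python
def resolve_write_cycles(nominal_cycles):
    """Delay writes so that no two results are written in the same cycle.

    nominal_cycles lists, in program order, the cycle in which each
    instruction would write if there were no conflict.
    """
    taken = set()
    resolved = []
    for cycle in nominal_cycles:
        while cycle in taken:
            cycle += 1  # one STALL cycle before the write
        taken.add(cycle)
        resolved.append(cycle)
    return resolved


# exp is read in cycle 1 and would write in cycle 5;
# mul is read in cycle 2 and would also write in cycle 5.
print(resolve_write_cycles([5, 5]))  # prints "[5, 6]"
```
</pre>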
450    </div>
451
452    <h2><a name="compete_stall">Stalling Due to Arithmetic Unit Conflicts</a></h2>
453    <div class="section">
454      <p>
        The <CODE>mad</CODE>, <CODE>dp3</CODE>, <CODE>dp4</CODE>, <CODE>dph</CODE>, and <CODE>add</CODE> instructions contend for access to the arithmetic unit.<br> <br> If these instructions are executed consecutively and the periods in which they use the arithmetic unit overlap, latency may increase because later instructions must wait for earlier instructions to finish using the unit.<br> The arithmetic unit is used for the first cycle of an <CODE>add</CODE> instruction, the second cycle of a <CODE>mad</CODE> instruction, and the second and third cycles of a <CODE>dp3</CODE>, <CODE>dp4</CODE>, or <CODE>dph</CODE> instruction. Cycles are counted from the cycle after the register read and appear as ADD in the timetable below.<br> <br>
456        <table class="timetable">
457          <tr>
458            <th></th>
459            <th>1</th><th>2</th><th>3</th><th>4</th><th>5</th><th>6</th><th>7</th><th>8</th>
460          </tr>
461          <tr>
462            <th>dp3</th>
463            <td class="read">read</td>
464            <td class="MUL">MUL</td>
465            <td class="ADD">ADD</td>
466            <td class="ADD">ADD</td>
467            <td class="post">post</td>
468            <td class="write">write</td>
469          </tr>
470          <tr>
471            <th>mad</th>
472            <td class="empty"></td>
473            <td class="read">read</td>
474            <td class="MUL">MUL</td>
475            <td class="stall" colspan="1">STALL</td>
476            <td class="ADD">ADD</td>
477            <td class="post">post</td>
478            <td class="write">write</td>
479          </tr>
480          <tr>
481            <th>add</th>
482            <td class="empty" colspan="2"></td>
483            <td class="read">read</td>
484            <td class="stall" colspan="2">STALL</td>
485            <td class="ADD">ADD</td>
486            <td class="post">post</td>
487            <td class="write">write</td>
488            <td class="dummy"></td>
489          </tr>
490        </table>
        <br> <br>Consecutive execution of the <CODE>dp3</CODE>, <CODE>dp4</CODE>, and <CODE>dph</CODE> instructions is an exception, however. Stalling due to arithmetic unit conflicts does not occur when multiple <CODE>dp3</CODE> instructions (or <CODE>dp4</CODE> or <CODE>dph</CODE> instructions) are executed consecutively, or when any combination of <CODE>dp3</CODE>, <CODE>dp4</CODE>, and <CODE>dph</CODE> is executed consecutively.<BR> <br>
492        <table class="timetable">
493          <tr>
494            <th></th>
495            <th>1</th><th>2</th><th>3</th><th>4</th><th>5</th><th>6</th><th>7</th><th>8</th>
496          </tr>
497          <tr>
498            <th>dp3</th>
499            <td class="read">read</td>
500            <td class="MUL">MUL</td>
501            <td class="ADD">ADD</td>
502            <td class="ADD">ADD</td>
503            <td class="post">post</td>
504            <td class="write">write</td>
505          </tr>
506          <tr>
507            <th>dp4</th>
508            <td class="empty" colspan="1"></td>
509            <td class="read">read</td>
510            <td class="MUL">MUL</td>
511            <td class="ADD">ADD</td>
512            <td class="ADD">ADD</td>
513            <td class="post">post</td>
514            <td class="write">write</td>
515          </tr>
516          <tr>
517            <th>dph</th>
518            <td class="empty" colspan="2"></td>
519            <td class="read">read</td>
520            <td class="MUL">MUL</td>
521            <td class="ADD">ADD</td>
522            <td class="ADD">ADD</td>
523            <td class="post">post</td>
524            <td class="write">write</td>
525            <td class="dummy"></td>
526          </tr>
527        </table>
528        <br>
529      </p>
530    </div>
531
532    <h2><a name="dependence_stall">Stalling Due to Instruction Dependencies</a></h2>
533    <div class="section">
534      <p>
        Execution sometimes stalls because of dependencies between instructions. This occurs when the destination register of one instruction is used as a source register by the instruction that immediately follows.<br> Example:<br>
<pre class="definition">
add     r0, r1, r2
mul     r4, r0, r3
</pre>
        When this type of code is executed, execution stalls because r0, the destination of the first instruction, is used as a source register by the instruction that immediately follows.<BR> <br>
541        <table class="timetable">
542          <tr>
543            <th></th>
544            <th>1</th><th>2</th><th>3</th><th>4</th><th>5</th><th>6</th><th>7</th>
545          </tr>
546          <tr>
547            <th>add</th>
548            <td class="read">read</td>
549            <td class="ADD">ADD</td>
550            <td class="post">post</td>
551            <td class="write">write</td>
552          </tr>
553          <tr>
554            <th>mul</th>
555            <td class="empty" colspan="1"></td>
556            <td class="stall" colspan="2">STALL</td>
557            <td class="read">read</td>
558            <td class="MUL">MUL</td>
559            <td class="post">post</td>
560            <td class="write">write</td>
561            <td class="dummy"></td>
562          </tr>
563        </table>
        <br> Execution stalls if the registers are the same, even if the components differ.<br> <br> Example:<br>
<pre class="definition">
add     r0.x, r1,   r2
mul     r4,   r0.y, r3
</pre>
        With code of this type, the next instruction does not read the component written by the earlier instruction (r0.x), but execution still stalls because it reads r0 itself (through r0.y).<BR> <br>
570        <table class="timetable">
571          <tr>
572            <th></th>
573            <th>1</th><th>2</th><th>3</th><th>4</th><th>5</th><th>6</th><th>7</th>
574          </tr>
575          <tr>
576            <th>add</th>
577            <td class="read">read</td>
578            <td class="ADD">ADD</td>
579            <td class="post">post</td>
580            <td class="write">write</td>
581          </tr>
582          <tr>
583            <th>mul</th>
584            <td class="empty" colspan="1"></td>
585            <td class="stall" colspan="2">STALL</td>
586            <td class="read">read</td>
587            <td class="MUL">MUL</td>
588            <td class="post">post</td>
589            <td class="write">write</td>
590            <td class="dummy"></td>
591          </tr>
592        </table>
        <br> <br>If successive writes are made to the same register, the write made by the first instruction is canceled (see <a href="#output_result">Output Order of Calculation Results</a> for details), but a subsequent instruction that reads that register may still stall until the canceled instruction completes.<br> <br> Example:<br>
<pre class="definition">
dp4     r0.x, r1, r2
mov     r0.x, r1
mul     r4, r0, r3
</pre>
        Here, the write by <CODE>dp4</CODE> is canceled because <CODE>dp4</CODE> and <CODE>mov</CODE> both write to the same register, and execution of <CODE>mul</CODE> stalls because it reads the register written by <CODE>dp4</CODE> and <CODE>mov</CODE>.<BR> Execution of <CODE>mul</CODE> stalls until <CODE>dp4</CODE> completes, not just until <CODE>mov</CODE> completes, because, as seen from <CODE>mul</CODE>, the latency of <CODE>dp4</CODE> (two instructions earlier) is larger than that of <CODE>mov</CODE> (one instruction earlier).<BR> <br>
600        <table class="timetable">
601          <tr>
602            <th></th>
603            <th>1</th><th>2</th><th>3</th><th>4</th><th>5</th><th>6</th><th>7</th><th>8</th><th>9</th>
604          </tr>
605          <tr>
606            <th>dp4</th>
607            <td class="read">read</td>
608            <td class="MUL">MUL</td>
609            <td class="ADD">ADD</td>
610            <td class="ADD">ADD</td>
611            <td class="post">post</td>
612            <td class="cancel">cancel</td>
613          </tr>
614          <tr>
615            <th>mov</th>
616            <td class="empty" colspan="1"></td>
617            <td class="read">read</td>
618            <td class="MOV">mov</td>
619            <td class="write">write</td>
620          </tr>
621          <tr>
622            <th>mul</th>
623            <td class="empty" colspan="2"></td>
624            <td class="stall" colspan="3">STALL</td>
625            <td class="read">read</td>
626            <td class="MUL">MUL</td>
627            <td class="post">post</td>
628            <td class="write">write</td>
629            <td class="dummy"></td>
630          </tr>
631        </table>
632        <br>
633      </p>
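      <p>
        The three timetables in this section follow one pattern: an instruction is read no earlier than the latest write (or cancel) cycle of any earlier instruction that targets one of its source registers, with components ignored. The Python sketch below illustrates that reading of the timetables; <CODE>first_read_cycle</CODE> and its parameters are hypothetical names, and the rule itself is inferred from the examples rather than stated by the hardware documentation.
      </p>
<pre class="definition">
```python
def first_read_cycle(slot, sources, pending_writes):
    """Return the cycle in which an instruction can read its sources.

    slot is the cycle in which the instruction would be read without stalls.
    pending_writes lists (destination, cycle) for earlier instructions, where
    cycle is the write cycle (or the cancel cycle of a canceled write).
    """
    def register(operand):
        return operand.split(".")[0]  # components are ignored: r0.y -> r0

    wanted = {register(s) for s in sources}
    waits = [cycle for dest, cycle in pending_writes if register(dest) in wanted]
    return max([slot] + waits)


# add r0.x (write in cycle 4), then mul r4, r0.y, r3 (slot 2)
print(first_read_cycle(2, ["r0.y", "r3"], [("r0.x", 4)]))             # prints "4"
# dp4 r0.x (cancel in cycle 6), mov r0.x (write in cycle 4), then mul (slot 3)
print(first_read_cycle(3, ["r0", "r3"], [("r0.x", 6), ("r0.x", 4)]))  # prints "6"
```
</pre>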
634    </div>
635
636    <h2><a name="force_stall">Unconditional Stalls</a></h2>
637    <div class="section">
638      <p>
        Executing the <CODE>mova</CODE> instruction results in an unconditional stall of 3 clock cycles.<BR> <br> Unlike stalls due to instruction dependencies, this stall occurs whenever <CODE>mova</CODE> is executed. When a <CODE>mova</CODE> instruction is followed by an instruction that reads the address register it writes, the stall cannot be avoided by placing unrelated instructions (instructions that use registers unrelated to either instruction) between them.<br> <br> Example:<br>
<pre class="definition">
mova    a0.x, r0
nop
nop
nop
mov     r1, c[a0.x]
</pre>
647        Here, a <CODE>mova</CODE> instruction is followed by three consecutive <CODE>nop</CODE> instructions, in turn followed by a <CODE>mov</CODE> instruction that reads the address register that the <CODE>mova</CODE> instruction writes to. Execution stalls at the <CODE>mova</CODE> instruction whether the <CODE>nop</CODE> instructions are included or not.<BR> <br>
648        <table class="timetable">
649          <tr>
650            <th></th>
651            <th>1</th><th>2</th><th>3</th><th>4</th><th>5</th><th>6</th><th>7</th><th>8</th><th>9</th><th>10</th>
652          </tr>
653          <tr>
654            <th>mova</th>
655            <td class="read">read</td>
656            <td class="MOVA">mova</td>
657          </tr>
658          <tr>
659            <th>nop</th>
660            <td class="empty" colspan="1"></td>
661            <td class="stall" colspan="3">STALL</td>
662            <td class="NOP">NOP</td>
663          </tr>
664          <tr>
665            <th>nop</th>
666            <td class="empty" colspan="5"></td>
667            <td class="NOP">NOP</td>
668          </tr>
669          <tr>
670            <th>nop</th>
671            <td class="empty" colspan="6"></td>
672            <td class="NOP">NOP</td>
673          </tr>
674          <tr>
675            <th>mov</th>
676            <td class="empty" colspan="7"></td>
677            <td class="read">read</td>
678            <td class="MOV">mov</td>
679            <td class="write">write</td>
680            <td class="dummy"></td>
681          </tr>
682        </table>
683        <br>
684      </p>
685    </div>
686
687
688  <h2>Revision History</h2>
689  <div class="section">
690    <dl class="history">
691      <dt>2011/12/20</dt>
692      <dd>Initial version.<br />
693      </dd>
694    </dl>
695  </div>
696
697  <hr><p>CONFIDENTIAL</p></body>
698</html>
699