Although the first LDW instruction has not yet loaded the A4 register correctly by the time the ADD is executed, the D1 functional unit becomes available in the clock cycle right after the one in which LDW is executed.
To clarify the execution of instructions with delay slots, let's consider the following example of the LDW instruction. Let's assume A10 = 0x0100 and A2 = 1, and that your intent is to load A9 with the 32-bit word that A10 points to (address 0x0100) while advancing A10 by one word to 0x0104. The 3 MV instructions are not related to the LDW instruction; they do something else.
    1     LDW   .D1   *A10++[A2], A9
    2     MV    .L1   A10, A8
    3     MV    .L1   A1, A10
    4     MV    .L1   A1, A2
    5     ...
We can ask several interesting questions at this point:
- What is the value loaded to A8? That is, in which clock cycle is the address pointer updated?
- Can we load the address offset register A2 before the LDW instruction finishes the actual loading?
- Is it legal to load to A10 before the first LDW finishes loading the memory content to A9? That is, can we change the address pointer before the 4 delay slots elapse?
Here are the answers:
- Although it takes an extra 4 clock cycles for the LDW instruction to load the memory content to A9, the address pointer and offset registers (A10 and A2) are read and updated in the clock cycle in which the LDW instruction is issued. Therefore, in line 2, A8 is loaded with the updated A10, that is, A10 = A8 = 0x104.
- Because the LDW reads the A10 and A2 registers in the first clock cycle, you are free to change these registers afterwards without affecting the operation of the first LDW.
- This was already answered above.
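The timing described in these answers can be sketched with a toy cycle model. The following Python sketch is not a C6x simulator: it assumes one instruction per clock cycle, a made-up initial value for A1, and made-up memory contents, and it models the post-increment form used in the listing, where the load uses the pointer value before the update. The function name `run_example` and the dictionaries are hypothetical names chosen for illustration.

```python
def run_example():
    """Toy cycle-by-cycle model of the five-line LDW example (illustrative only)."""
    regs = {"A1": 0x2000, "A2": 1, "A8": 0, "A9": 0, "A10": 0x0100}  # A1 is made up
    memory = {0x0100: 0x11111111, 0x0104: 0x22222222}               # made-up contents
    pending = []  # (cycle at whose end the write lands, register, value)
    trace = []    # (cycle, snapshot of the registers at the end of that cycle)

    program = [
        ("LDW", "A10", "A2", "A9"),  # line 1: LDW .D1 *A10++[A2], A9
        ("MV", "A10", "A8"),         # line 2: MV  .L1 A10, A8
        ("MV", "A1", "A10"),         # line 3: MV  .L1 A1, A10
        ("MV", "A1", "A2"),          # line 4: MV  .L1 A1, A2
        ("NOP",),
        ("NOP",),
    ]

    for cycle, instr in enumerate(program, start=1):
        if instr[0] == "LDW":
            _, base, offset, dest = instr
            address = regs[base]             # post-increment: load uses the old pointer
            regs[base] += regs[offset] * 4   # pointer is updated in the issue cycle
            # The loaded word lands at the end of the 4th delay slot.
            pending.append((cycle + 4, dest, memory[address]))
        elif instr[0] == "MV":
            _, src, dest = instr
            regs[dest] = regs[src]

        # Retire delayed writes that become visible after this cycle.
        for item in [p for p in pending if p[0] == cycle]:
            regs[item[1]] = item[2]
            pending.remove(item)

        trace.append((cycle, dict(regs)))
    return trace


if __name__ == "__main__":
    for cycle, snapshot in run_example():
        print(cycle, {name: hex(value) for name, value in snapshot.items()})
```

In this model, A8 holds 0x104 after cycle 2, A9 still holds its old value at the end of cycle 4, and the loaded word appears at the end of cycle 5 (the last of the 4 delay slots), even though lines 3 and 4 have already overwritten A10 and A2.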
A similar theory holds for the MPY and B (when using a register as a branch address) instructions. The MPY reads in the source values in the first clock cycle and loads the multiplication result after the 2nd clock cycle. For B, the address pointer is read in the first clock cycle, and the actual branching occurs after the 5 delay slots that follow. Thus, after the first clock cycle, you are free to modify the source or the address pointer registers. For more details, refer to Table 3-5 in the instruction set description or read the description of the individual instruction.
Addition, subtraction and multiplication
There are several instructions for addition, subtraction and multiplication on the C6x CPU. The basic instructions are ADD, SUB, and MPY. ADD and SUB have 0 delay slots (meaning the results of the operation are immediately available), but MPY has 1 delay slot (the result of the multiplication is valid after one additional clock cycle).
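One way to see what these delay-slot counts mean for a sequence of dependent instructions is to count the idle cycles needed before each result is read. The sketch below is a hypothetical helper, `insert_nops`, written in Python for illustration; it assumes one instruction per clock cycle, uses only the delay-slot counts stated above, and omits functional-unit specifiers from the simplified instruction strings it produces.

```python
# Delay slots as stated in the text above.
DELAY_SLOTS = {"ADD": 0, "SUB": 0, "MPY": 1}


def insert_nops(program):
    """Insert NOPs so that every source register is valid when it is read.

    program: list of (mnemonic, src1, src2, dst) tuples, one per clock cycle.
    Returns a list of simplified instruction strings.
    """
    ready = {}   # register -> first cycle in which its new value can be read
    cycle = 1
    scheduled = []
    for mnemonic, src1, src2, dst in program:
        needed = max(ready.get(src1, 1), ready.get(src2, 1))
        if needed > cycle:
            # Wait out the producer's delay slots.
            scheduled.append(f"NOP {needed - cycle}")
            cycle = needed
        scheduled.append(f"{mnemonic} {src1}, {src2}, {dst}")
        ready[dst] = cycle + DELAY_SLOTS[mnemonic] + 1
        cycle += 1
    return scheduled


if __name__ == "__main__":
    example = [
        ("ADD", "A1", "A2", "A3"),
        ("MPY", "A3", "A4", "A5"),  # may use A3 right away (ADD has 0 delay slots)
        ("SUB", "A5", "A1", "A6"),  # must wait one cycle for the MPY result
    ]
    print("\n".join(insert_nops(example)))
```

For this example the helper places a single `NOP 1` between the MPY and the SUB, and none between the ADD and the MPY.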
(Add, subtract, and multiply): Write an assembly program to compute (0000 ef35h + 0000 33dch - 0000 1234h) * 0000 0007h.
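When checking an assembly solution, it helps to have reference values computed independently. The Python sketch below computes the intermediate sum and the full product. It also models, as an assumption to verify against the MPY instruction description, a multiply that uses only the signed lower 16 bits of each operand; `to_signed16` and `mpy16` are hypothetical helper names.

```python
def to_signed16(value):
    """Interpret the lower 16 bits of value as a signed 16-bit number."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def mpy16(src1, src2):
    """Assumed MPY model: signed 16 x 16 multiply of the low halves, 32-bit result."""
    return (to_signed16(src1) * to_signed16(src2)) & 0xFFFFFFFF


partial = 0x0000EF35 + 0x000033DC - 0x00001234   # value before the multiplication
full_product = partial * 0x00000007              # mathematically exact result
low_half_product = mpy16(partial, 0x00000007)    # what a low-half multiply would give

if __name__ == "__main__":
    print(f"partial          = {partial:08x}h")
    print(f"full product     = {full_product:08x}h")
    print(f"low-half product = {low_half_product:08x}h")
```

Here `partial` is 0x000110DD, which does not fit in 16 bits, so under that assumption a single multiply of the low halves gives 0x0000760B rather than the full product 0x0007760B.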
Branching and conditional operations
Often you need to control the flow of the program execution by branching to another block of code. The B instruction does the job in the C6x CPU. The address of the branch can be specified either by a displacement or stored in a register to be used by the B instruction. The B instruction has 5 delay slots, meaning that the instructions in the 5 clock cycles following the B are still executed before the branch actually occurs.
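The effect of the 5 delay slots on program flow can be sketched with a small model. The Python function below is a hypothetical illustration, not a C6x simulator: it assumes one instruction per clock cycle, uses list indices in place of branch addresses, and handles only one outstanding branch at a time.

```python
BRANCH_DELAY_SLOTS = 5  # as stated in the text above


def run(program, max_cycles=50):
    """Return the instructions executed, in order.

    program: list of strings. "B n" branches to list index n once its
    5 delay slots have elapsed. Everything else is treated as an opaque
    one-cycle instruction.
    """
    pc = 0
    pending = None   # (delay slots left, target index) for the branch in flight
    executed = []
    for _ in range(max_cycles):
        if pc >= len(program):
            break
        instr = program[pc]
        executed.append(instr)
        pc += 1
        if pending is not None:
            slots_left, target = pending
            slots_left -= 1
            if slots_left == 0:
                pc = target          # the branch finally takes effect
                pending = None
            else:
                pending = (slots_left, target)
        if instr.startswith("B "):
            # The target is captured in the cycle the branch is issued.
            pending = (BRANCH_DELAY_SLOTS, int(instr.split()[1]))
    return executed


if __name__ == "__main__":
    code = ["B 7", "slot1", "slot2", "slot3", "slot4", "slot5",
            "skipped", "target", "end"]
    print(run(code))
```

In this run the five instructions after the B still execute, the entry labeled `skipped` never does, and execution continues at `target`.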