
C6x assembly programming

On the C6x processor, the instruction fetch consists of 4 phases: generate fetch address (F1), send address to memory (F2), wait for data (F3), and read opcode from memory (F4). Decoding consists of 2 phases: dispatching to functional units (D1) and decoding (D2). The execution step may consist of up to 6 phases (E1 to E6), depending on the instruction. For example, the multiply (MPY) instruction has 1 delay slot, resulting in 2 execution phases. Similarly, load (LDx) and branch (B) instructions have 4 and 5 delay slots, respectively.
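The phase counts above can be tabulated with a small Python sketch. It is only a bookkeeping model that follows the text's counting (execution phases = 1 + delay slots); the names DELAY_SLOTS and pipeline_phases are made up for this illustration and are not part of any TI tool.

```python
# Delay slots quoted in the text; instructions not listed are assumed single-cycle.
DELAY_SLOTS = {"MPY": 1, "LDH": 4, "B": 5}

FETCH = ["F1", "F2", "F3", "F4"]   # the four fetch phases
DECODE = ["D1", "D2"]              # dispatch and decode


def pipeline_phases(mnemonic):
    """List the phases an instruction passes through in this simplified model."""
    execute_count = 1 + DELAY_SLOTS.get(mnemonic, 0)
    execute = [f"E{i}" for i in range(1, execute_count + 1)]
    return FETCH + DECODE + execute


for m in ("ADD", "MPY", "LDH", "B"):
    phases = pipeline_phases(m)
    print(m, "ends in", phases[-1], "after", len(phases), "phases")
```

In this model ADD ends in E1, MPY in E2, LDH in E5, and B in E6, which matches the "up to 6 phases" stated above.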

When the outcome of an instruction is used by the next instruction, an appropriate number of NOPs (no operation, or delay) must be added after multiply (one NOP), load (four NOPs, or NOP 4), and branch (five NOPs, or NOP 5) instructions in order to allow the pipeline to operate properly. Otherwise, the pipeline executes the following instructions before the outcome of the current instruction is available, generating undesired results. The following code is an example of pipelined code with NOPs inserted:

     1          MVK   40,A2
     2  loop:   LDH   *A5++,A0
     3          LDH   *A6++,A1
     4          NOP   4
     5          MPY   A0,A1,A3
     6          NOP
     7          ADD   A3,A4,A4
     8          SUB   A2,1,A2
     9  [A2]    B     loop
    10          NOP   5
    11          STH   A4,*A7

In line 4, we need 4 NOPs because A1 is loaded by the LDH instruction in line 3, which has 4 delay slots. After those 4 delay slots, the value of A1 is available to be used by MPY A0,A1,A3 in line 5. The single NOP in line 6 likewise covers the 1 delay slot of MPY before ADD reads A3 in line 7. Similarly, we need 5 delay slots after the [A2] B loop instruction in line 9 to prevent the execution of STH A4,*A7 before branching occurs.
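The NOP counts in the listing can be checked with a toy issue-cycle model in Python. This is a sketch under stated assumptions, not a C6x simulator: one instruction issues per cycle, a result becomes readable 1 + (delay slots) cycles after issue, memory never stalls, and pointer updates such as A5++ are ignored. The branch delay slots are counted in the cycle total but not checked. The names LOOP_BODY and check are hypothetical.

```python
# Delay slots from the text; anything not listed is treated as single-cycle.
DELAY_SLOTS = {"MPY": 1, "LDH": 4, "B": 5}

# (mnemonic, destination register or None, source registers, NOP count)
# Lines 2 to 10 of the listing, i.e. one pass through the loop body.
LOOP_BODY = [
    ("LDH", "A0", [], 0),
    ("LDH", "A1", [], 0),
    ("NOP", None, [], 4),
    ("MPY", "A3", ["A0", "A1"], 0),
    ("NOP", None, [], 1),
    ("ADD", "A4", ["A3", "A4"], 0),
    ("SUB", "A2", ["A2"], 0),
    ("B", None, ["A2"], 0),
    ("NOP", None, [], 5),
]


def check(program):
    """Return (cycles, hazards) for a straight-line run of the program.

    hazards lists (mnemonic, register, cycles_too_early) for every source
    register that is read before its pending result is available.
    """
    ready = {}     # register -> first cycle in which its new value can be read
    hazards = []
    cycle = 0
    for mnemonic, dest, sources, nops in program:
        if mnemonic == "NOP":
            cycle += nops          # NOP n simply lets n cycles pass
            continue
        for reg in sources:
            if ready.get(reg, 0) > cycle:
                hazards.append((mnemonic, reg, ready[reg] - cycle))
        if dest is not None:
            ready[dest] = cycle + 1 + DELAY_SLOTS.get(mnemonic, 0)
        cycle += 1
    return cycle, hazards


print(check(LOOP_BODY))                                   # listing as written
without_nop4 = [ins for i, ins in enumerate(LOOP_BODY) if i != 2]
print(check(without_nop4))                                # NOP 4 removed
```

With the listing as written, the model reports 16 cycles per iteration and no hazards. With NOP 4 removed, it reports that MPY reads A0 three cycles early and A1 four cycles early. Under the same assumptions, the whole listing (MVK, 40 iterations, STH) would take 1 + 40 × 16 + 1 = 642 cycles.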

In the C6x Very Large Instruction Word (VLIW) architecture, several instructions are fetched and processed simultaneously. The group of instructions fetched together is referred to as a Fetch Packet (FP); it allows the C6x to fetch eight instructions simultaneously from on-chip memory. Among the 8 instructions fetched together, several can be executed at the same time if they do not use the same CPU resources. Because the CPU has 8 separate functional units, a maximum of 8 instructions can be executed in parallel, although the combinations of parallel instructions are limited because they must not conflict with each other in using CPU resources. In an assembly listing, parallel instructions are indicated by double pipe symbols (||). When writing assembly code, designing the code to maximize parallel execution of instructions (through proper functional unit assignments, etc.) reduces the number of execution cycles.
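How the || marker splits a fetch packet into groups that run together can be sketched in Python. The fetch packet below is hypothetical and written only to exercise the grouping; the unit names (.L1, .L2, .S1, .S2, .M1, .M2, .D1, .D2) follow TI's naming for the 8 functional units. Only one constraint is modelled, that no unit is used twice in the same group; the real processor imposes further resource constraints that this sketch ignores.

```python
UNITS = {".L1", ".L2", ".S1", ".S2", ".M1", ".M2", ".D1", ".D2"}

# (starts_with_bar, unit, text): starts_with_bar is True when the line begins
# with || in the listing, meaning it executes together with the previous line.
# Hypothetical fetch packet of 8 instructions, for illustration only.
FETCH_PACKET = [
    (False, ".D1", "LDH *A5++,A0"),
    (True,  ".D2", "LDH *B5++,B0"),
    (False, ".M1", "MPY A0,A1,A3"),
    (True,  ".L1", "ADD A3,A4,A4"),
    (True,  ".S2", "SUB B2,1,B2"),
    (False, ".L2", "ADD B3,B4,B4"),
    (True,  ".S1", "MVK 40,A2"),
    (False, ".D1", "STH A4,*A7"),
]


def execute_packets(fetch_packet):
    """Group instructions so that each group executes in the same cycle."""
    packets = []
    for parallel, unit, text in fetch_packet:
        if parallel and packets:
            packets[-1].append((unit, text))   # joins the previous instruction
        else:
            packets.append([(unit, text)])     # starts a new group
    return packets


def unit_conflicts(packet):
    """Return the functional units that are requested more than once."""
    seen, clashes = set(), []
    for unit, _ in packet:
        if unit in seen:
            clashes.append(unit)
        seen.add(unit)
    return clashes


for number, packet in enumerate(execute_packets(FETCH_PACKET), start=1):
    print(number, [text for _, text in packet], unit_conflicts(packet))
```

In this example the 8 instructions fall into 4 groups of sizes 2, 3, 2, and 1, so they issue in 4 cycles rather than 8, and no group asks for the same unit twice. Two instructions that both name .L1 in one group would be reported as a conflict.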

Parallel instructions and constraints

We have seen that the C62x CPU has 8 functional units. Each assembly instruction is executed in one of these 8 functional units, and it takes exactly one clock cycle for the execution. Then, while one instruction is being executed in one of the functional units, what are the other 7 functional units doing? Can the other functional units execute other instructions at the same time?

OpenStax, Dsp lab with ti c6x dsp and c6713 dsk. OpenStax CNX. Feb 18, 2013 Download for free at http://cnx.org/content/col11264/1.6