

Delay slots

On the C6x CPU, each instruction executes in exactly one CPU clock cycle. However, instructions such as LDW must access memory, and their results are not available immediately at the end of that cycle. The cycles that pass before such a result becomes usable are called delay slots.

For example, suppose we want to load the word stored at the memory address held in A10 into A1, and then copy the loaded value to A2. You might be tempted to write this simple two-line assembly code:

    LDW .D1 *A10, A1
    MV  .D1 A1, A2

What is wrong with this code? The result of the LDW instruction is not available immediately after LDW executes, so the MV instruction copies the old contents of A1 to A2 rather than the newly loaded value. To prevent this, the CPU must wait until the result of the LDW has been written to A1 before it executes MV. Load instructions need 4 extra clock cycles before their results are valid, so we insert 4 NOP (no operation) instructions between LDW and MV. Each NOP keeps the CPU idle for one clock cycle. The resulting code looks like this:

    LDW .D1 *A10, A1
    NOP
    NOP
    NOP
    NOP
    MV  .D1 A1, A2

or, more compactly, you can use a single multi-cycle NOP:

    LDW .D1 *A10, A1
    NOP 4
    MV  .D1 A1, A2
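To make the hazard concrete, here is a minimal Python sketch of a toy timing model. It is not TI code and does not model the full C6x pipeline. It assumes one instruction issues per cycle, that a load issued in cycle t makes its result visible to instructions issued in cycle t + 4 + 1 or later, and that MV has no delay slots. The tuple encoding, the `run` function, and the register and memory values are hypothetical, chosen only for illustration.

```python
# Toy model of C6x load delay slots (illustration only, not TI code).
# Assumption: an instruction issued in cycle t with d delay slots makes its
# result visible to instructions issued in cycle t + d + 1 or later.
LDW_DELAY_SLOTS = 4


def run(program, regs, memory):
    """Execute (opcode, ...) tuples, issuing one instruction per cycle.

    Hypothetical encoding:
      ("LDW", addr_reg, dst)  dst <- memory[regs[addr_reg]], 4 delay slots
      ("MV", src, dst)        dst <- src, no delay slots
      ("NOP", n)              idle for n cycles
    Returns the register contents after all pending loads have landed.
    """
    regs = dict(regs)
    pending = []  # (first cycle the value is visible, register, value)
    cycle = 0

    def commit(now):
        # Write back every load whose delay slots have elapsed by `now`.
        for ready, reg, value in pending:
            if ready <= now:
                regs[reg] = value
        pending[:] = [p for p in pending if p[0] > now]

    for instr in program:
        commit(cycle)
        op = instr[0]
        if op == "LDW":
            _, addr_reg, dst = instr
            ready = cycle + LDW_DELAY_SLOTS + 1
            pending.append((ready, dst, memory[regs[addr_reg]]))
            cycle += 1
        elif op == "MV":
            _, src, dst = instr
            regs[dst] = regs[src]  # reads whatever src holds in this cycle
            cycle += 1
        elif op == "NOP":
            cycle += instr[1]
        else:
            raise ValueError(f"unsupported opcode: {op}")
    commit(float("inf"))  # let outstanding loads complete
    return regs


memory = {0x100: 42}                      # hypothetical word at address 0x100
start = {"A10": 0x100, "A1": 7, "A2": 0}  # A1 holds a stale 7 beforehand

naive = [("LDW", "A10", "A1"), ("MV", "A1", "A2")]
four_nops = [("LDW", "A10", "A1")] + [("NOP", 1)] * 4 + [("MV", "A1", "A2")]
nop4 = [("LDW", "A10", "A1"), ("NOP", 4), ("MV", "A1", "A2")]

print(run(naive, start, memory)["A2"])      # 7  (stale A1 copied)
print(run(four_nops, start, memory)["A2"])  # 42
print(run(nop4, start, memory)["A2"])       # 42
```

In this model the naive sequence copies the stale value 7, while the four single NOPs and the single NOP 4 both delay MV to cycle 5, so it copies the loaded value 42. With only three NOPs, MV would still read the old A1.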

Why didn't the CPU designers simply make LDW take 5 clock cycles, instead of making the programmer insert 4 NOPs? Because the delay slots do not have to be filled with NOPs: you can place any other instructions there, as long as they do not use the result of the LDW. The CPU then does useful work while it waits for the load result to become valid, which can greatly reduce the total execution time of the program.
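The trade-off can be sketched with a small Python checker that counts issue cycles and rejects any instruction that reads a register before its producer's delay slots have elapsed. This is a simplified model, not a TI tool. It assumes LDW has 4 delay slots and ADD and MV have none, as in the delay-slot table; the `Instr` class, `count_cycles`, and the four ADD instructions on .L1 are hypothetical stand-ins for "other instructions that do not use the LDW result".

```python
from dataclasses import dataclass


@dataclass
class Instr:
    text: str              # assembly text, for messages only
    reads: tuple = ()      # source registers
    writes: tuple = ()     # destination registers
    delay_slots: int = 0   # 4 for LDW, 0 for single-cycle instructions
    cycles: int = 1        # issue cycles occupied; NOP n occupies n


def count_cycles(program):
    """Return the total cycle count of a straight-line program, raising
    ValueError if an instruction reads a register too early (toy model)."""
    visible_at = {}  # register -> first cycle its newest value can be read
    cycle = 0
    for ins in program:
        for reg in ins.reads:
            if visible_at.get(reg, 0) > cycle:
                raise ValueError(f"{ins.text!r} reads {reg} at cycle {cycle}, "
                                 f"but it is not valid until cycle {visible_at[reg]}")
        for reg in ins.writes:
            visible_at[reg] = cycle + ins.delay_slots + 1
        cycle += ins.cycles
    return cycle


ldw = Instr("LDW .D1 *A10, A1", reads=("A10",), writes=("A1",), delay_slots=4)
mv = Instr("MV .D1 A1,A2", reads=("A1",), writes=("A2",))
nop4 = Instr("NOP 4", cycles=4)
# Hypothetical independent work: none of these instructions touch A1.
others = [
    Instr(f"ADD .L1 {a},{b},{d}", reads=(a, b), writes=(d,))
    for a, b, d in [("A3", "A4", "A5"), ("A6", "A7", "A8"),
                    ("A9", "A11", "A12"), ("A13", "A14", "A15")]
]

padded = [ldw, nop4, mv] + others  # wait with NOPs, then do the other work
filled = [ldw] + others + [mv]     # do the other work inside the delay slots

print(count_cycles(padded))  # 10
print(count_cycles(filled))  # 6
```

Filling the four delay slots with independent work removes the four idle cycles, so the same seven useful instructions finish in 6 cycles instead of 10. Moving MV any earlier, for example directly after the first ADD, makes `count_cycles` raise because A1 is not yet valid.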

Delay slots of C6x instructions:

    Description    Instructions                           Delay slots
    Single cycle   All instructions except those below    0
    Multiply       MPY, SMPY, etc.                        1
    Load           LDB, LDH, LDW                          4
    Branch         B                                      5
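For data-producing instructions, the table reduces to a simple rule: if an instruction has d delay slots, an instruction that uses its result must issue at least d + 1 cycles after it, and every independent single-cycle instruction placed in between counts toward that wait. A minimal Python sketch of the rule follows; `delay_slots` and `nops_needed` are hypothetical helper names. For B the rule does not apply in this form: its 5 delay slots are the instructions that still execute before the branch takes effect.

```python
# Delay slots from the table above. Only the mnemonics it names are listed;
# the "etc." in the multiply row is not expanded here.
DELAY_SLOTS = {"MPY": 1, "SMPY": 1, "LDB": 4, "LDH": 4, "LDW": 4, "B": 5}


def delay_slots(mnemonic):
    """Delay slots of an instruction; unlisted ones are single-cycle (0)."""
    return DELAY_SLOTS.get(mnemonic.upper(), 0)


def nops_needed(producer, independent_between=0):
    """NOP cycles still required before an instruction that uses the result
    of `producer`, given the number of independent single-cycle
    instructions already scheduled in between."""
    return max(0, delay_slots(producer) - independent_between)


print(nops_needed("LDW"))     # 4
print(nops_needed("MPY"))     # 1
print(nops_needed("ADD"))     # 0
print(nops_needed("LDH", 3))  # 1
```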

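The functional unit latency is the number of clock cycles during which an instruction actually occupies a functional unit. Almost all C6x instructions have a functional unit latency of 1 (the exceptions are a few C67x double-precision instructions such as MPYDP), meaning that the functional unit is ready to execute the next instruction after one clock cycle, regardless of the instruction's delay slots. Therefore, the following two instructions may issue on the same .D1 unit in consecutive cycles, since ADD does not use the result of the LDW: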

    LDW .D1 *A10, A4
    ADD .D1 A1, A2, A3
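The difference between delay slots and functional unit latency can be sketched as a resource check in Python: each instruction reserves its unit for `latency` cycles starting at its issue cycle, and two instructions clash only if they need the same unit in the same cycle. This is a simplified model, not a TI tool; `unit_conflicts` and the (cycle, unit, text) tuples are hypothetical. It checks only unit usage, not data dependences (here ADD does not read A4, so there is none).

```python
def unit_conflicts(schedule, latency=1):
    """schedule: list of (issue_cycle, unit, text) tuples.

    Each instruction reserves its functional unit for `latency` cycles
    starting at its issue cycle (latency=1 models the behaviour described
    above). Returns (earlier, later) text pairs that need the same unit in
    the same cycle.
    """
    busy = {}  # (unit, cycle) -> text of the instruction holding the unit
    conflicts = []
    for cycle, unit, text in schedule:
        span = range(cycle, cycle + latency)
        clash = next((busy[(unit, c)] for c in span if (unit, c) in busy), None)
        if clash is not None:
            conflicts.append((clash, text))
        for c in span:
            busy.setdefault((unit, c), text)
    return conflicts


ldw = (0, ".D1", "LDW .D1 *A10, A4")
add_next = (1, ".D1", "ADD .D1 A1,A2,A3")  # next cycle, same unit
add_same = (0, ".D1", "ADD .D1 A1,A2,A3")  # same cycle, same unit

print(unit_conflicts([ldw, add_next]))                  # []
print(unit_conflicts([ldw, add_same]))                  # one conflicting pair
print(len(unit_conflicts([ldw, add_next], latency=5)))  # 1
```

With a latency of 1, the back-to-back LDW and ADD on .D1 do not conflict even though the load result is still 4 cycles away. Only if .D1 stayed busy for the whole load (the hypothetical `latency=5` call) would the ADD have to wait.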

Source:  OpenStax, Dsp lab with ti c6x dsp and c6713 dsk. OpenStax CNX. Feb 18, 2013 Download for free at http://cnx.org/content/col11264/1.6