
Like the Intel 8088, this is not a load-store architecture. The fadds instruction adds a value from memory to a value in a register ( fp0 ) and leaves the result of the addition in the register. Unlike the Intel 8088, the MC68020 has enough registers to keep quite a few of the values used throughout the loop ( I , N , and the addresses of A , B , and C ) in registers, which saves memory operations.

C on the mc68020

In the next example, we compiled the C version of the loop with the normal optimization ( -O ) turned on. We see the C perspective on arrays in this code. C views arrays as extensions of pointers; the loop index advances as an offset from a pointer to the beginning of the array:


! d3 = I
! d1 = Address of A
! d2 = Address of B
! d0 = Address of C
! a6@(20) = N
        moveq   #0,d3               ! Initialize I
        bras    L5                  ! Jump to End of the loop
L1:     movl    d3,a1               ! Make copy of I
        movl    a1,d4               ! Again
        asll    #2,d4               ! Multiply by 4 (word size)
        movl    d4,a1               ! Put back in an address register
        fmoves  a1@(0,d2:l),fp0     ! Load B(I)
        movl    a6@(16),d0          ! Get address of C
        fadds   a1@(0,d0:l),fp0     ! Add C(I)
        fmoves  fp0,a1@(0,d1:l)     ! Store into A(I)
        addql   #1,d3               ! Increment I
L5:     cmpl    a6@(20),d3
        blts    L1

We first see the value of I being copied into several registers and multiplied by 4 (using a left shift of 2, strength reduction). Interestingly, the value in register a1 is I multiplied by 4. Registers d0 , d1 , and d2 hold the addresses of C , A , and B respectively; d0 is reloaded from the stack frame on every iteration. In the load, add, and store, a1 is the base of the address computation and d2 , d0 , and d1 are added as an offset to a1 to compute each address.
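The address arithmetic described above can be sketched in C. This is only an illustration, not the source that was compiled: the function name add_indexed is hypothetical, and it assumes a 4-byte float, matching the "multiply by 4" step in the listing. Each iteration forms one byte offset from I and adds it to the base address of each array.

```c
#include <stddef.h>

/* Hypothetical helper (not from the original text) that mirrors the
   -O listing: one scaled byte offset, added to each array's base. */
void add_indexed(float *a, const float *b, const float *c, int n)
{
    int i;

    for (i = 0; i < n; i++) {
        /* asll #2: shift left by 2 instead of multiplying by 4.
           Assumes sizeof(float) == 4. */
        size_t offset = (size_t)i << 2;

        const float *bp = (const float *)((const char *)b + offset);
        const float *cp = (const float *)((const char *)c + offset);
        float *ap = (float *)((char *)a + offset);

        *ap = *bp + *cp;    /* load B(I), add C(I), store A(I) */
    }
}
```

Calling add_indexed(a, b, c, n) on three float arrays of length n gives the same result as writing a[i] = b[i] + c[i] in a loop; the explicit offset only makes visible the work the compiler repeats on every pass.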

This is a simplistic optimization that is primarily trying to maximize the values that are kept in registers during loop execution. Overall, it’s a relatively literal translation of the C language semantics into assembly. In many ways, C was designed to generate relatively efficient code without requiring a highly sophisticated optimizer.

More optimization

In this example, we are back to the FORTRAN version on the MC68020. We have compiled it with the highest level of optimization ( -OLM ) available on this compiler. Now we see a much more aggressive approach to the loop:


! a0 = Address of C(I)
! a1 = Address of B(I)
! a2 = Address of A(I)
L3:     fmoves  a1@,fp0     ! Load B(I)
        fadds   a0@,fp0     ! Add C(I)
        fmoves  fp0,a2@     ! Store A(I)
        addql   #4,a0       ! Advance by 4
        addql   #4,a1       ! Advance by 4
        addql   #4,a2       ! Advance by 4
        subql   #1,d0       ! Decrement I
        tstl    d0
        bnes    L3

First off, the compiler is smart enough to do all of its address adjustment outside the loop and store the adjusted addresses of A , B , and C in registers. We do the load, add, and store in quick succession. Then we advance the array addresses by 4 and perform the subtraction to determine when the loop is complete.
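The same loop shape can be written by hand in C as a rough sketch. The function name add_bumped and the guard for an empty range are assumptions added for illustration; the listing shows only the loop body, so whatever setup the compiler emitted before L3 is not reproduced here. The three pointers play the role of a0 , a1 , and a2 , and the counter plays the role of d0 .

```c
/* Hypothetical helper (not from the original text) that mirrors the
   -OLM listing: three pointers advanced in step, plus a counter that
   runs down to zero. */
void add_bumped(float *a, const float *b, const float *c, int n)
{
    int count = n;

    /* Assumption: skip the loop for an empty range. The listing does
       not show how the compiler handles this case. */
    if (count <= 0)
        return;

    do {
        *a = *b + *c;   /* fmoves, fadds, fmoves */
        a++;            /* addql #4: advance one 4-byte element */
        b++;
        c++;
        count--;        /* subql #1,d0 */
    } while (count != 0);   /* tstl d0 / bnes L3 */
}
```

Compared with recomputing an offset from I on every pass, no index arithmetic remains inside the loop: each pointer advances by one element, and the only test is whether the counter has reached zero.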

Source:  OpenStax, High performance computing. OpenStax CNX. Aug 25, 2010 Download for free at http://cnx.org/content/col11136/1.5