Assembly language

Like the Intel 8088, the MC68020 is not a load-store architecture. The fadds instruction adds a value from memory to a value in a register ( fp0 ) and leaves the result of the addition in the register. Unlike the Intel 8088, it has enough registers to keep quite a few of the values used throughout the loop ( I , N , and the addresses of A , B , and C ) in registers, which saves memory operations.

C on the MC68020

In the next example, we compiled the C version of the loop with the normal optimization ( -O ) turned on. This code shows the C perspective on arrays. C treats an array as an extension of a pointer, so the loop index advances as an offset from a pointer to the beginning of the array:


    ! d3 = I
    ! d1 = Address of A
    ! d2 = Address of B
    ! d0 = Address of C
    ! a6@(20) = N
          moveq   #0,d3             ! Initialize I
          bras    L5                ! Jump to End of the loop
    L1:   movl    d3,a1             ! Make copy of I
          movl    a1,d4             ! Again
          asll    #2,d4             ! Multiply by 4 (word size)
          movl    d4,a1             ! Put back in an address register
          fmoves  a1@(0,d2:l),fp0   ! Load B(I)
          movl    a6@(16),d0        ! Get address of C
          fadds   a1@(0,d0:l),fp0   ! Add C(I)
          fmoves  fp0,a1@(0,d1:l)   ! Store into A(I)
          addql   #1,d3             ! Increment I
    L5:   cmpl    a6@(20),d3        ! Compare I with N
          blts    L1                ! Loop while I < N

We first see the value of I being copied into several registers and multiplied by 4 (using a left shift of 2, a strength reduction). Interestingly, the value in register a1 is I multiplied by 4. Registers d0 , d1 , and d2 hold the addresses of C , A , and B respectively. In the load, add, and store, a1 is the base of the address computation, and d2 , d0 , and d1 are added to it as offsets to compute the addresses of B(I) , C(I) , and A(I) .
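The address arithmetic described above can be sketched in C. The source loop is not reproduced on this page, so the function names, the signature, and the use of float arrays (suggested by the single-precision fmoves and fadds instructions) are assumptions made for illustration. The second function spells out, by hand, roughly what the -O code does on each iteration: scale the index to a byte offset, then add that offset to each array's base address.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical source form of the loop: a[i] = b[i] + c[i]. */
void add_indexed(float *a, const float *b, const float *c, int n)
{
    for (int i = 0; i < n; i++)
        a[i] = b[i] + c[i];
}

/* The same loop with the address computation written out explicitly,
 * mirroring the scaled index held in a1 in the listing. */
void add_scaled_offset(float *a, const float *b, const float *c, int n)
{
    assert(sizeof(float) == 4);        /* the shift by 2 assumes 4-byte floats */

    for (int i = 0; i < n; i++) {
        size_t off = (size_t)i << 2;   /* asll #2: I * 4 */
        const float *bp = (const float *)((const char *)b + off);
        const float *cp = (const float *)((const char *)c + off);
        float *ap = (float *)((char *)a + off);
        *ap = *bp + *cp;               /* load B(I), add C(I), store A(I) */
    }
}
```

Both functions should produce identical results. The second form recomputes the offset from the index on every pass, which is the work that a more aggressive optimizer can remove.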

This is a simplistic optimization that primarily tries to keep as many values as possible in registers during loop execution. Overall, it is a relatively literal translation of the C semantics into assembly. In many ways, C was designed to generate relatively efficient code without requiring a highly sophisticated optimizer.

More optimization

In this example, we are back to the FORTRAN version on the MC68020. We have compiled it with the highest level of optimization ( -OLM ) available on this compiler. Now we see a much more aggressive approach to the loop:


    ! a0 = Address of C(I)
    ! a1 = Address of B(I)
    ! a2 = Address of A(I)
    L3:   fmoves  a1@,fp0   ! Load B(I)
          fadds   a0@,fp0   ! Add C(I)
          fmoves  fp0,a2@   ! Store A(I)
          addql   #4,a0     ! Advance by 4
          addql   #4,a1     ! Advance by 4
          addql   #4,a2     ! Advance by 4
          subql   #1,d0     ! Decrement I
          tstl    d0
          bnes    L3

First off, the compiler is smart enough to do all of its address adjustment outside the loop and store the adjusted addresses of A , B , and C in registers. We do the load, add, and store in quick succession. Then we advance the array addresses by 4 and perform the subtraction to determine when the loop is complete.
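As a rough C analogue of this loop shape, the sketch below keeps three moving pointers and a down-counter instead of an index. The function name and signature are hypothetical, and the early return for a non-positive count is an assumption: the listing shows only the loop body, not the setup code that precedes it.

```c
#include <assert.h>

/* Sketch of the optimized loop: one pointer per array, advanced by one
 * element (4 bytes for a 4-byte float) per iteration, plus a counter
 * that runs down to zero. */
void add_pointer_bump(float *a, const float *b, const float *c, int n)
{
    if (n <= 0)               /* assumed guard; not visible in the listing */
        return;
    assert(a && b && c);

    int count = n;            /* d0: iterations remaining */
    do {
        *a = *b + *c;         /* fmoves, fadds, fmoves */
        c++;                  /* addql #4,a0 */
        b++;                  /* addql #4,a1 */
        a++;                  /* addql #4,a2 */
        count--;              /* subql #1,d0 */
    } while (count != 0);     /* tstl d0 ; bnes L3 */
}
```

Compared with an indexed loop, no shift or index-to-address conversion remains inside the loop body. Each array access is a plain indirect load or store through a register.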

OpenStax, High performance computing. OpenStax CNX. Aug 25, 2010 Download for free at http://cnx.org/content/col11136/1.5