
In this appendix, we take a look at the assembly language produced by a number of different compilers on a number of different architectures. In this survey we revisit some of the issues of CISC versus RISC, and the strengths and weaknesses of different architectures.

For this survey, two roughly identical segments of code were used. The code was a relatively long loop adding two arrays and storing the result in a third array. The loops were written both in FORTRAN and C.

The FORTRAN loop was as follows:


      SUBROUTINE ADDEM(A,B,C,N)
      REAL A(10000),B(10000),C(10000)
      INTEGER N,I
      DO 10 I=1,N
        A(I) = B(I) + C(I)
 10   ENDDO
      END

The C version was:


for (i = 0; i < n; i++)
    a[i] = b[i] + c[i];
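The C fragment above omits its declarations. For readers who want to compile it and look at the assembly their own compiler produces, here is a minimal self-contained sketch. The function name addem and the float element type (chosen to match the FORTRAN REAL) are assumptions, not part of the original survey code.

```c
#include <stddef.h>

/* Hypothetical wrapper around the survey loop. The name addem and the
   float element type are assumptions made for illustration only. */
void addem(float *a, const float *b, const float *c, int n)
{
    int i;

    /* Same loop as in the text: add b and c element by element into a. */
    for (i = 0; i < n; i++)
        a[i] = b[i] + c[i];
}
```

Most compilers can emit assembly for a file like this (for example, with a -S option), which is how listings like the ones in this appendix are typically obtained.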

We have gathered these examples over the years from a number of different compilers, and the results are not particularly scientific. This is not intended to review a particular architecture or compiler version, but rather just to show an example of the kinds of things you can learn from looking at the output of the compiler.

Intel 8088

The Intel 8088 processor used in the original IBM Personal Computer is a very traditional CISC processing system with features severely limited by its transistor count. It has very few registers, and the registers generally have rather specific functions. To support a large memory model, it must set its segment register leading up to each memory operation. This limitation means that every memory access takes a minimum of three instructions. Interestingly, a similar pattern occurs on RISC processors.
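To make the role of the segment register concrete, the following sketch models how the 8088 forms a 20-bit physical address from a 16-bit segment and a 16-bit offset. The function name physical_address is hypothetical; the shift-by-four-and-add rule is the standard 8086/8088 addressing scheme.

```c
#include <stdint.h>

/* Model of 8088 address formation: the 16-bit segment value is shifted
   left by four bits and added to the 16-bit offset. The processor has
   only 20 address lines, so the result is truncated to 20 bits.
   The function name is an assumption made for illustration. */
uint32_t physical_address(uint16_t segment, uint16_t offset)
{
    return (((uint32_t)segment << 4) + offset) & 0xFFFFFu;
}
```

Because no single 16-bit register can hold a full address, the compiler has to compute the offset, load a segment register, and only then touch memory, which is where the three-instruction minimum comes from.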

You notice that at one point, the code moves a value from the ax register to the bx register because it needs to perform another computation that can only be done in the ax register. Note that this is only an integer computation, as the Intel


        mov word ptr -2[bp],0       # bp is I
$11:
        mov ax,word ptr -2[bp]      # Load I
        cmp ax,word ptr 18[bp]      # Check I>=N
        bge $10
        shl ax,1                    # Multiply I by 2
        mov bx,ax                   # Done - now move to bx
        add bx,word ptr 10[bp]      # bx = Address of B + Offset
        mov es,word ptr 12[bp]      # Top part of address
        mov ax,es: word ptr [bx]    # Load B(i)
        mov bx,word ptr -2[bp]      # Load I
        shl bx,1                    # Multiply I by 2
        add bx,word ptr 14[bp]      # bx = Address of C + Offset
        mov es,word ptr 16[bp]      # Top part of address
        add ax,es: word ptr [bx]    # Load C(I)
        mov bx,word ptr -2[bp]      # Load I
        shl bx,1                    # Multiply I by 2
        add bx,word ptr 6[bp]       # bx = Address of A + Offset
        mov es,word ptr 8[bp]       # Top part of address
        mov es: word ptr [bx],ax    # Store
$9:
        inc word ptr -2[bp]         # Increment I in memory
        jmp $11
$10:

Because there are so few registers, the variable I is kept in memory and loaded several times throughout the loop. The inc instruction at the end of the loop actually updates the value in memory. Interestingly, at the top of the loop, the value is then reloaded from memory.
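The behavior of the generated code can be approximated in C. The sketch below is only a rough model: it uses volatile to force every use of the loop index to be a fresh load from memory, and it recomputes the byte offset I * 2 before each array access, as the 8088 code does. The function name addem_8088_model is hypothetical, and the 16-bit element type is an assumption inferred from the shift by one in the listing.

```c
#include <stdint.h>

/* Rough C model of the 8088 listing. The name and the 16-bit element
   type are assumptions; segment handling is not modeled. */
void addem_8088_model(int16_t *a, const int16_t *b, const int16_t *c,
                      int16_t n)
{
    volatile int16_t i = 0;          /* I lives in memory, not a register */

    while (i < n) {                  /* load I, compare with N */
        int16_t ax;
        uint16_t bx;

        bx = (uint16_t)(i << 1);     /* load I, multiply by 2 */
        ax = *(const int16_t *)((const char *)b + bx);   /* load B(I) */

        bx = (uint16_t)(i << 1);     /* load I again, multiply by 2 */
        ax = (int16_t)(ax + *(const int16_t *)((const char *)c + bx));

        bx = (uint16_t)(i << 1);     /* load I a third time */
        *(int16_t *)((char *)a + bx) = ax;               /* store A(I) */

        i = (int16_t)(i + 1);        /* increment I in memory */
    }
}
```

Counting the reads of i in this model gives four loads and one store of the index per iteration, which mirrors the memory traffic visible in the assembly.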

Source:  OpenStax, High performance computing. OpenStax CNX. Aug 25, 2010 Download for free at http://cnx.org/content/col11136/1.5