A = -B + C * D / E

Taken all at once, this statement has four operators (/, *, +, and unary -, which negates) and four operands (B, C, D, and E). This is clearly too much to fit into one quadruple. We need a form with exactly one operator and, at most, two operands per statement. The recast version that follows manages to do this, employing temporary variables to hold the intermediate results:


T1 = D / E
T2 = C * T1
T3 = -B
A = T3 + T2
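The recasting can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the nested-tuple expression tree, the `neg` opcode name, and the function `to_quadruples` are all hypothetical, and the tree is written by hand with the same grouping the text uses (C multiplied by D / E). Because the sketch visits operands left to right, its temporaries are numbered differently from the recast version in the text, but the same four operations appear.

```python
from itertools import count

# Hand-built tree for  -B + C * (D / E).  Each node is (operator, operand, ...);
# a bare string is a variable.  This layout is an assumption of the sketch.
EXPRESSION = ("+", ("neg", "B"), ("*", "C", ("/", "D", "E")))


def to_quadruples(target, tree):
    """Flatten a tree into (result, operator, operand1, operand2) quadruples."""
    if isinstance(tree, str):                # plain copy, nothing to flatten
        return [(target, "copy", tree, None)]

    quads = []
    temps = count(1)

    def visit(node):
        if isinstance(node, str):            # a variable needs no quadruple
            return node
        op, *args = node
        names = [visit(arg) for arg in args]  # operands first (post-order)
        result = f"T{next(temps)}"
        second = names[1] if len(names) > 1 else None
        quads.append((result, op, names[0], second))
        return result

    visit(tree)
    # The last quadruple computes the whole expression, so it writes the target.
    _, op, first, second = quads[-1]
    quads[-1] = (target, op, first, second)
    return quads


for quad in to_quadruples("A", EXPRESSION):
    print(quad)
```

Every emitted tuple has one operator and at most two operands, which is the property the text asks for.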

A workable intermediate language would, of course, need some other features, like pointers. We’re going to suggest that we create our own intermediate language to investigate how optimizations work. To begin, we need to establish a few rules:

  • Instructions consist of one opcode, two operands, and a result. Depending on the instruction, the operands may be empty.
  • Assignments are of the form X := Y op Z, meaning X gets the result of op applied to Y and Z.
  • All memory references are explicit loads from or stores to “temporaries” tn (t1, t2, and so on).
  • Logical values used in branches are calculated separately from the actual branch.
  • Jumps go to absolute addresses.

If we were building a compiler, we’d need to be a little more specific. For our purposes, this will do. Consider the following bit of C code:


while (j < n) {
    k = k + j * 2;
    m = j * 2;
    j++;
}

This loop translates into the intermediate language representation shown here:


A:: t1 := j
    t2 := n
    t3 := t1 < t2
    jmp (B) t3
    jmp (C) TRUE
B:: t4 := k
    t5 := j
    t6 := t5 * 2
    t7 := t4 + t6
    k := t7
    t8 := j
    t9 := t8 * 2
    m := t9
    t10 := j
    t11 := t10 + 1
    j := t11
    jmp (A) TRUE
C::
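One way to convince yourself that the IL really means the same thing as the C loop is to execute it. The following is a minimal sketch of an interpreter, assuming the textual layout of the listing above (a label ends in `::`, a jump reads `jmp (LABEL) condition`, and tokens are separated by spaces). The function names `parse`, `run`, and `c_loop` are hypothetical, and only the three operators that appear in the listing are supported.

```python
import operator

IL = """
A:: t1 := j
    t2 := n
    t3 := t1 < t2
    jmp (B) t3
    jmp (C) TRUE
B:: t4 := k
    t5 := j
    t6 := t5 * 2
    t7 := t4 + t6
    k := t7
    t8 := j
    t9 := t8 * 2
    m := t9
    t10 := j
    t11 := t10 + 1
    j := t11
    jmp (A) TRUE
C::
"""

OPS = {"+": operator.add, "*": operator.mul, "<": operator.lt}


def parse(il_text):
    """Return (label -> instruction index, list of tokenized instructions)."""
    labels, code = {}, []
    for line in il_text.strip().splitlines():
        line = line.strip()
        if "::" in line:
            name, line = line.split("::", 1)
            labels[name.strip()] = len(code)   # label marks the next instruction
            line = line.strip()
        if line:
            code.append(line.split())
    return labels, code


def value(token, env):
    """Evaluate an operand: the constant TRUE, an integer literal, or a name."""
    if token == "TRUE":
        return True
    if token.isdigit():
        return int(token)
    return env[token]


def run(il_text, variables):
    labels, code = parse(il_text)
    env = dict(variables)
    pc = 0
    while pc < len(code):                      # running off the end stops the program
        words = code[pc]
        pc += 1
        if words[0] == "jmp":                  # jmp (LABEL) condition
            if value(words[2], env):
                pc = labels[words[1].strip("()")]
        elif len(words) == 3:                  # X := Y
            env[words[0]] = value(words[2], env)
        else:                                  # X := Y op Z
            x, _, y, op, z = words
            env[x] = OPS[op](value(y, env), value(z, env))
    return env


def c_loop(j, k, m, n):
    """The original loop, written directly, for comparison."""
    while j < n:
        k = k + j * 2
        m = j * 2
        j += 1
    return j, k, m


result = run(IL, {"j": 0, "k": 0, "m": 0, "n": 5})
print((result["j"], result["k"], result["m"]), c_loop(0, 0, 0, 5))
```

Starting from j = 0, k = 0, m = 0, and n = 5, both versions should finish with the same values of j, k, and m.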

Each C source line is represented by several IL statements. On many RISC processors, our IL code is so close to machine language that we could turn it directly into object code. See [link] for some examples of machine code translated directly from intermediate language. Often the lowest optimization level does a literal translation from the intermediate language to machine code. When this is done, the code generally is very large and performs very poorly. Looking at it, you can see places to save a few instructions. For instance, j gets loaded into temporaries in four places; surely we can reduce that. We have to do some analysis and make some optimizations.

Basic blocks

After generating our intermediate language, we want to cut it into basic blocks. These are code sequences that start with an instruction that either follows a branch or is itself a target for a branch, and that run until the next branch. Put another way, each basic block has one entrance (at the top) and one exit (at the bottom). [link] represents our IL code as a group of three basic blocks. Basic blocks make code easier to analyze. By restricting flow of control within a basic block from top to bottom and eliminating all the branches, we can be sure that if the first statement gets executed, the second one does too, and so on. Of course, the branches haven’t disappeared, but we have forced them outside the blocks in the form of the connecting arrows, which make up the flow graph.

Intermediate language divided into basic blocks

The figure shows the IL code as three blocks joined by arrows:

Block 1 (labeled A::): t1 := j, t2 := n, t3 := t1 < t2, jmp (B) t3
Block 2: jmp (C) TRUE
Block 3 (labeled B::): t4 := k, t5 := j, t6 := t5 * 2, t7 := t4 + t6, k := t7, t8 := j, t9 := t8 * 2, m := t9, t10 := j, t11 := t10 + 1, j := t11, jmp (A) TRUE

One arrow runs from block 1 to block 2 and another from block 1 to block 3. An arrow runs from the end of block 3 back to the beginning of block 1. A dashed arrow leaves block 2 and points away from the blocks.
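The division into blocks can be done mechanically. The sketch below applies the rule stated above: a block starts at the first instruction, at any branch target, and at any instruction that follows a branch. It then derives the arrows of the flow graph from each block's final instruction. The function names and the text format of the IL are assumptions made for illustration, and blocks are numbered from 0, so blocks 0, 1, and 2 correspond to the three blocks in the figure.

```python
IL = """
A:: t1 := j
    t2 := n
    t3 := t1 < t2
    jmp (B) t3
    jmp (C) TRUE
B:: t4 := k
    t5 := j
    t6 := t5 * 2
    t7 := t4 + t6
    k := t7
    t8 := j
    t9 := t8 * 2
    m := t9
    t10 := j
    t11 := t10 + 1
    j := t11
    jmp (A) TRUE
C::
"""


def read_il(il_text):
    """Return (label -> instruction index, list of instruction strings)."""
    labels, code = {}, []
    for line in il_text.strip().splitlines():
        line = line.strip()
        if "::" in line:
            name, line = line.split("::", 1)
            labels[name.strip()] = len(code)
            line = line.strip()
        if line:
            code.append(line)
    return labels, code


def block_starts(labels, code):
    """Indices of instructions that begin a basic block."""
    starts = {0}
    for i, instr in enumerate(code):
        if instr.startswith("jmp"):
            starts.add(labels[instr.split()[1].strip("()")])  # branch target
            starts.add(i + 1)                                  # follows a branch
    return sorted(s for s in starts if s < len(code))


def basic_blocks(il_text):
    labels, code = read_il(il_text)
    starts = block_starts(labels, code)
    return [code[a:b] for a, b in zip(starts, starts[1:] + [len(code)])]


def flow_edges(il_text):
    """Edges (from_block, to_block); 'exit' means control leaves the listing."""
    labels, code = read_il(il_text)
    starts = block_starts(labels, code)
    block_of = {start: n for n, start in enumerate(starts)}
    edges = []
    for n, start in enumerate(starts):
        end = starts[n + 1] if n + 1 < len(starts) else len(code)
        last = code[end - 1].split()
        if last[0] == "jmp":
            target = labels[last[1].strip("()")]
            edges.append((n, block_of.get(target, "exit")))
            if last[2] != "TRUE":            # conditional: may also fall through
                edges.append((n, block_of.get(end, "exit")))
        else:                                # no branch: simply fall through
            edges.append((n, block_of.get(end, "exit")))
    return edges


for number, block in enumerate(basic_blocks(IL)):
    print(number, block)
print(flow_edges(IL))
```

The `exit` edge out of block 1 plays the role of the dashed arrow in the figure: it is the jump to C, which leaves the loop.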

We are now free to extract information from the blocks themselves. For instance, we can say with certainty which variables a given block uses and which variables it defines (sets the value of). We might not be able to do that if the block contained a branch. We can also gather the same kind of information about the calculations it performs. After we have analyzed the blocks so that we know what goes in and what comes out, we can modify them to improve performance and just worry about the interaction between blocks.
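As a rough illustration of this kind of bookkeeping, the sketch below scans each of the three blocks from top to bottom and records two sets. Here "uses" is taken to mean names the block reads before it has set them itself (what goes in), and "defs" means every name the block assigns. That reading of "uses", the function name `uses_and_defs`, and the block names are assumptions of this sketch, not definitions from the text.

```python
import re

# The three basic blocks of the loop, written out by hand.
BLOCKS = {
    "first": ["t1 := j", "t2 := n", "t3 := t1 < t2", "jmp (B) t3"],
    "second": ["jmp (C) TRUE"],
    "third": [
        "t4 := k", "t5 := j", "t6 := t5 * 2", "t7 := t4 + t6", "k := t7",
        "t8 := j", "t9 := t8 * 2", "m := t9",
        "t10 := j", "t11 := t10 + 1", "j := t11", "jmp (A) TRUE",
    ],
}

NAME = re.compile(r"[A-Za-z_]\w*")   # identifiers only; skips numbers and operators


def uses_and_defs(block):
    """Return (names read before being set in the block, names set in the block)."""
    uses, defs = set(), set()
    for instr in block:
        if instr.startswith("jmp"):
            target = None
            read_part = instr.split()[2]        # a jump reads only its condition
        else:
            target, read_part = instr.split(":=")
            target = target.strip()
        for name in NAME.findall(read_part):
            if name != "TRUE" and name not in defs:
                uses.add(name)                   # value comes from outside the block
        if target:
            defs.add(target)
    return uses, defs


for label, block in BLOCKS.items():
    uses, defs = uses_and_defs(block)
    print(label, sorted(uses), sorted(defs))
```

Because control can only run from the top of a block to the bottom, a single forward scan is enough. In the third block, for example, j counts as a use even though the block also defines it, since every read of j happens before the assignment `j := t11`.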

Source:  OpenStax, High performance computing. OpenStax CNX. Aug 25, 2010 Download for free at http://cnx.org/content/col11136/1.5