<< Chapter < Page Chapter >> Page >

% prof stuff>stuff.prof

To explore how the prof command works, we have created the following ridiculous little application, loops.c . It contains a main routine and three subroutines for which you can predict the time distribution just by looking at the code.


main () { int l;for (l=0;l<1000;l++) { if (l == 2*(l/2)) foo ();bar(); baz();} }foo (){ int j;for (j=0;j<200;j++) }bar () { int i;for (i=0;i<200;i++); }baz () { int k;for (k=0;k<300;k++); }

Again, you need to compile and link loops with the –p flag, run the program, and then run the prof utility to extract a profile, as follows:


% cc loops.c -p -o loops % ./loops% prof loops>loops.prof

The following example shows what a loops.prof should look like. There are six columns.


%Time Seconds Cumsecs #Calls msec/call Name 56.8 0.50 0.50 1000 0.500 _baz27.3 0.24 0.74 1000 0.240 _bar 15.9 0.14 0.88 500 0.28 _foo0.0 0.00 0.88 1 0. _creat 0.0 0.00 0.88 2 0. _profil0.0 0.00 0.88 1 0. _main 0.0 0.00 0.88 3 0. _getenv0.0 0.00 0.88 1 0. _strcpy 0.0 0.00 0.88 1 0. _write

The columns can be described as follows:

  • %Time Percentage of CPU time consumed by this routine
  • Seconds CPU time consumed by this routine
  • Cumsecs A running total of time consumed by this and all preceding routines in the list
  • Calls The number of times this particular routine was called
  • msec/call Seconds divided by number of calls giving the average length of time taken by each invocation of the routine
  • Name The name of this routine

The top three routines listed are from loops.c itself. You can see an entry for the “main” routine more than halfway down the list. Depending on the vendor, the names of the routines may contain leading or trailing underscores, and there will always be some routines listed you don’t recognize. These are contributions from the C library and possibly the FORTRAN libraries, if you are using FORTRAN. Profiling also introduces some overhead into the run, and often shows up as one or two subroutines in the prof output. In this case, the entry for _profil represents code inserted by the linker for collecting runtime profiling data.

If it was our intention to tune loops , we would consider a profile like the one in the figure above to be a fairly good sign. The lead routine takes 50% of the runtime, so at least there is a chance we could do something with it that would have a significant impact on the overall runtime. (Of course with a program as trivial as loops , there is plenty we can do, since loops does nothing.)

Gprof

Just as it’s important to know how time is distributed when your program runs, it’s also valuable to be able to tell who called who in the list of routines. Imagine, for instance, if something labeled _exp showed up high in the list in the prof output. You might say: “Hmmm, I don’t remember calling anything named exp() . I wonder where that came from.” A call tree helps you find it.

Get Jobilize Job Search Mobile App in your pocket Now!

Get it on Google Play Download on the App Store Now




Source:  OpenStax, High performance computing. OpenStax CNX. Aug 25, 2010 Download for free at http://cnx.org/content/col11136/1.5
Google Play and the Google Play logo are trademarks of Google Inc.

Notification Switch

Would you like to follow the 'High performance computing' conversation and receive update notifications?

Ask