Listing 3. Part of the psprocess output from the optimized version of the loop. The Processor and System Information and Cache Information sections are the same.

Index  Description                                      Counter Value
===================================================
  1    Conditional branch instructions........ 49627213
  2    Branch instructions.................... 49971420
  3    Conditional branch ins mispredicted....... 97630
  4    Conditional branch ins taken........... 49089592
  5    Branch target address cache misses......... 3816
  6    Requests for excl access to clean cache ln. 820
  7    Requests for cache line invalidation.......... 0
  8    Requests for cache line intervention....... 2796
  9    Requests for excl access to shared cache ln. 494
 10    Floating point multiply instructions.......... 0
 11    Floating point divide instructions............ 0
 12    Floating point instructions........... 189564951
 13    Hardware interrupts........................ 2577
 14    Total cycles......................... 2471179766
 15    Instructions issued................... 513936102
 16    Instructions completed................ 509580537
 17    Vector/SIMD instructions...................... 0
 18    Level 1 data cache accesses........... 372965600
 19    Level 1 data cache misses.............. 23010188
 20    Level 1 instruction cache accesses... 2769671237
 21    Level 1 instruction cache misses........... 2369
 22    Level 1 instruction cache reads...... 2746595553
 23    Level 1 load misses.................... 25980065
 24    Level 1 store misses........................ 995
 25    Level 1 cache misses................... 25772544
 26    Level 2 data cache reads............... 25617201
 27    Level 2 data cache writes................... 935
 28    Level 2 instruction cache accesses......... 2405
 29    Level 2 instruction cache reads............ 2652
 30    Level 2 cache misses................... 25287572
 31    Cycles stalled on any resource....... 2199590592
 32    Instruction TLB misses........................ 0

Statistics
==================================================
Counting domain.............................. user
Multiplexed................................... yes
Graduated floating point ins per cycle...... 0.077
Vector ins per cycle........................ 0.000
Floating point ins per graduated ins........ 0.372
Vector ins per graduated ins................ 0.000
Floating point ins per L1 data cache access. 0.508
Graduated ins per cycle..................... 0.206
Issued ins per cycle........................ 0.208
Graduated ins per issued ins................ 0.992
Issued ins per L1 ins cache miss....... 216942.213
Graduated ins per L1 ins cache miss.... 215103.646
Level 1 ins cache miss ratio................ 0.000
Level 1 data cache access per graduated ins. 0.732
% floating point ins of all graduated ins.. 37.200
% cycles stalled on any resource........... 89.010
Level 1 ins cache misses per issued ins..... 0.000
Level 1 cache read miss ratio (instruction). 0.000
Level 1 cache miss ratio (data)............. 0.062
Level 1 cache miss ratio (instruction)...... 0.000
Bandwidth used to level 1 cache (MB/s).... 332.792
Bandwidth used to level 2 cache (MB/s).... 326.530
MFLIPS (cycles)............................ 76.493
MFLIPS (wall clock)........................ 66.787
MVOPS (cycles).............................. 0.000
MVOPS (wall clock).......................... 0.000
MIPS (cycles)............................. 205.626
MIPS (wall clock)......................... 179.533
CPU time (seconds).......................... 2.478
Wall clock time (seconds)................... 2.838
% CPU utilization.......................... 87.310
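Most of the Statistics section is simple quotients of the raw counters above. The sketch below recomputes several of them, assuming that "graduated ins" means counter 16 (Instructions completed) and "issued ins" means counter 15 (Instructions issued). These formulas are inferred from the way the numbers line up, not taken from PerfSuite's documentation, and the `COUNTERS` dictionary and `derived_metrics` function are hypothetical names chosen for illustration.

```python
# Hypothetical reconstruction: recompute some psprocess derived statistics
# from the raw counter values in Listing 3. The formulas are assumed
# (inferred from the reported numbers), not PerfSuite's own code.

COUNTERS = {
    "fp_ins": 189_564_951,        # 12 Floating point instructions
    "cycles": 2_471_179_766,      # 14 Total cycles
    "issued": 513_936_102,        # 15 Instructions issued
    "graduated": 509_580_537,     # 16 Instructions completed
    "l1d_access": 372_965_600,    # 18 Level 1 data cache accesses
    "l1d_miss": 23_010_188,       # 19 Level 1 data cache misses
    "l1i_miss": 2_369,            # 21 Level 1 instruction cache misses
    "stall_cycles": 2_199_590_592 # 31 Cycles stalled on any resource
}

def derived_metrics(c):
    """Return a dict of ratios, each rounded to 3 decimals like psprocess."""
    raw = {
        "fp_ins_per_cycle": c["fp_ins"] / c["cycles"],
        "fp_ins_per_graduated_ins": c["fp_ins"] / c["graduated"],
        "fp_ins_per_l1d_access": c["fp_ins"] / c["l1d_access"],
        "graduated_ins_per_cycle": c["graduated"] / c["cycles"],
        "issued_ins_per_cycle": c["issued"] / c["cycles"],
        "graduated_per_issued": c["graduated"] / c["issued"],
        "issued_per_l1i_miss": c["issued"] / c["l1i_miss"],
        "graduated_per_l1i_miss": c["graduated"] / c["l1i_miss"],
        "l1d_access_per_graduated_ins": c["l1d_access"] / c["graduated"],
        "pct_fp_of_graduated": 100.0 * c["fp_ins"] / c["graduated"],
        "pct_cycles_stalled": 100.0 * c["stall_cycles"] / c["cycles"],
        "l1d_miss_ratio": c["l1d_miss"] / c["l1d_access"],
    }
    return {name: round(value, 3) for name, value in raw.items()}

if __name__ == "__main__":
    for name, value in derived_metrics(COUNTERS).items():
        print(f"{name:32s} {value}")
```

Run against the Listing 3 counters, these quotients reproduce the reported values (for example, 0.206 graduated instructions per cycle and 89.010% of cycles stalled). That is what makes the optimized loop's profile easy to read: roughly nine cycles in ten are spent stalled, mostly waiting on the memory hierarchy.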