This chapter describes a sample session when using some of the instrumentation tools. Refer to the command descriptions for details on each individual command.
As a basis for this example, the Coremark www.eembc.org/coremark/ benchmark will be run and analyzed in Simics. The source code is downloadable with an Apache license and very easy to compile and run. In this example Coremark will be run on a x86 QSP system running Linux.
The Coremark binary was compiled on standard x86 Linux host system and in this example we are using the simics-agent to transfer the binary to the target system. The script in figure 12 shows how the binary is uploaded to the target, enabling instrumentation only for the coremark executable and starting coremark and waiting for it to be finished.
run-command-file "targets/qsp-x86/firststeps.simics" script-branch { local $con = board.serconsole.con local $bin = coremark.exe local $params = " 0x0 0x0 0x66 10000 7 1 2000" # run1 parameters (10000 iterations) local $prompt = "# " # Wait for a linux prompt to appear and then upload our binary # to the target system using simics-agent bp.console_string.wait-then-write $con $prompt "simics-agent &\n" $am = (start-agent-manager) $m = ($am.connect-to-agent) $m.change-directory "/root" $j = ($m.upload -executable $bin) $m.wait-for-job $j # Create and connect instrumentation tools $ihn = (new-instruction-histogram -connect-all view = x86-normalized) $ihs = (new-instruction-histogram -connect-all view = size) $bp = (new-x86-branch-profiler -connect-all) $mp = (new-memory-profiler -connect-all -read-logical -write-logical -execute-logical) # Only let the instrumentation tools monitor our process $pattern ="name='" + $bin + "'" $pf = (new-process-filter software-component = board.software pattern = $pattern) $ihn.add-filter filter = $pf $ihs.add-filter filter = $pf $bp.add-filter filter = $pf $mp.add-filter filter = $pf # Start the process, wait for it to finish, and then stop simulation $con.input "\n" bp.console_string.wait-then-write $con $prompt ("./" + $bin + $params + "\n") $con.bp-wait-for-string $prompt stop } run
The instrumentation data gathered by these tools will be analyzed a bit in the next chapters. To get a bit more readable big numbers, the following commands splits the numbers with underscores on logical units; digit-grouping 10 3 for decimal values and digit-grouping 16 4 for hexadecimal values.
First lets take a look of which instructions that are most commonly executed shown in figure 13.
simics> ihist0.histogram max = 10 ┌─────────┬────────────────────────────────┬─────────────┬───────┬───────────┐ │ Row # │ x86 normalized │ Count │ Count%│Accumulated│ │ │ │ │ │ Count% │ ├─────────┼────────────────────────────────┼─────────────┼───────┼───────────┤ │ 1│mov r32,r32 │ 330_467_974│ 9.60%│ 9.60%│ │ 2│jne imm │ 308_597_223│ 8.96%│ 18.56%│ │ 3│mov r64,r64 │ 204_178_395│ 5.93%│ 24.49%│ │ 4│je imm │ 190_302_166│ 5.53%│ 30.01%│ │ 5│add r32,r32 │ 142_190_315│ 4.13%│ 34.14%│ │ 6│test r64,r64 │ 140_221_077│ 4.07%│ 38.21%│ │ 7│mov r64,qword ptr [r64] │ 139_282_002│ 4.04%│ 42.26%│ │ 8│and r32,imm │ 130_455_258│ 3.79%│ 46.05%│ │ 9│cmp r8,imm │ 123_723_347│ 3.59%│ 49.64%│ │ 10│movsx r32,word ptr [r64+r64*imm]│ 123_120_000│ 3.58%│ 53.22%│ ├─────────┼────────────────────────────────┼─────────────┼───────┼───────────┤ │# 10/425 │ │ │ │ │ │Sum (all)│ │3_443_579_955│100.00%│ │ │Sum shown│ │1_832_537_757│ 53.22%│ │ └─────────┴────────────────────────────────┴─────────────┴───────┴───────────┘
simics> ihist0.histogram sort-on-column = "x86 normalized" instruction-regexp = ".*x128" Table reduced from 425 to 29 rows ┌─────┬──────────────────────────────────┬─────┐ │Row #│ x86 normalized │Count│ ├─────┼──────────────────────────────────┼─────┤ │ 1│cvtsi2sd(64) x128,r64 │ 5│ │ 2│divsd x128,[rip+imm] │ 4│ │ 3│divsd x128,x128 │ 1│ │ 4│movapd x128,x128 │ 1│ │ 5│movaps xmmword ptr [r64+imm],x128 │ 16│ │ 6│movd r64,x128 │ 10│ │ 7│movd x128,r32 │ 42│ │ 8│movd x128,r64 │ 6│ │ 9│movdqa x128,x128 │ 71│ │ 10│movdqa x128,xmmword ptr [r64+imm] │ 1│ │ 11│movdqu x128,xmmword ptr [r64+imm] │ 45│ │ 12│movdqu x128,xmmword ptr [r64] │ 48│ │ 13│movsd [r64+imm],x128 │ 1│ │ 14│movsd [r64-imm],x128 │ 2│ │ 15│movsd x128,[r64+imm] │ 1│ │ 16│movsd x128,[r64] │ 2│ │ 17│movsd x128,[rip+imm] │ 1│ │ 18│pcmpeqb x128,x128 │ 178│ │ 19│pcmpeqb x128,xmmword ptr [r64+imm]│ 9│ │ 20│pcmpeqb x128,xmmword ptr [r64] │ 1│ │ 21│pminub x128,[r64+imm] │ 3│ │ 22│pmovmskb r32,x128 │ 103│ │ 23│por x128,x128 │ 85│ │ 24│pshufd x128,x128,imm │ 41│ │ 25│punpcklbw x128,x128 │ 41│ │ 26│punpcklwd x128,x128 │ 41│ │ 27│pxor x128,x128 │ 68│ │ 28│ucomisd x128,[rip+imm] │ 1│ │ 29│ucomisd x128,x128 │ 1│ ├─────┼──────────────────────────────────┼─────┤ │Sum │ │ 829│ └─────┴──────────────────────────────────┴─────┘
simics> ihist1.histogram ┌─────┬───────┬─────────────┬───────┬───────────┐ │Row #│ Size │ Count │ Count%│Accumulated│ │ │(bytes)│ │ │ Count% │ ├─────┼───────┼─────────────┼───────┼───────────┤ │ 1│ 3│1_552_922_442│ 45.10%│ 45.10%│ │ 2│ 2│ 867_469_439│ 25.19%│ 70.29%│ │ 3│ 4│ 575_089_349│ 16.70%│ 86.99%│ │ 4│ 5│ 253_701_408│ 7.37%│ 94.35%│ │ 5│ 6│ 145_314_324│ 4.22%│ 98.57%│ │ 6│ 1│ 34_766_864│ 1.01%│ 99.58%│ │ 7│ 7│ 9_451_053│ 0.27%│ 99.86%│ │ 8│ 8│ 4_273_215│ 0.12%│ 99.98%│ │ 9│ 9│ 550_962│ 0.02%│ 100.00%│ │ 10│ 10│ 40_428│ 0.00%│ 100.00%│ │ 11│ 11│ 456│ 0.00%│ 100.00%│ │ 12│ 12│ 14│ 0.00%│ 100.00%│ ├─────┼───────┼─────────────┼───────┼───────────┤ │Sum │ │3_443_579_954│100.00%│ │ └─────┴───────┴─────────────┴───────┴───────────┘ Average size of executed instruction: 3.187071 (25 bits)
The branch-profiler will report more details on the branch instructions shown in figure 16.
simics> bprof0.histogram max = 15 ┌─────────┬───────────┬───────────┬───────────┬───────────┬───────┬───────────┐ │ Row # │ Branch │ Taken │ Non-taken│ Total │ Total%│Accumulated│ │ │Instruction│ │ │ │ │ Total% │ ├─────────┼───────────┼───────────┼───────────┼───────────┼───────┼───────────┤ │ 1│jne │273_402_780│ 35_194_443│308_597_223│ 47.44%│ 47.44%│ │ 2│je │ 10_877_292│179_424_874│190_302_166│ 29.25%│ 76.69%│ │ 3│jmp │ 34_532_902│ -│ 34_532_902│ 5.31%│ 82.00%│ │ 4│ja │ 6_862_422│ 15_880_527│ 22_742_949│ 3.50%│ 85.50%│ │ 5│call │ 20_884_324│ -│ 20_884_324│ 3.21%│ 88.71%│ │ 6│ret │ 20_884_320│ -│ 20_884_320│ 3.21%│ 91.92%│ │ 7│jbe │ 14_880_208│ 2_120_535│ 17_000_743│ 2.61%│ 94.53%│ │ 8│jg │ 6_640_020│ 6_320_100│ 12_960_120│ 1.99%│ 96.52%│ │ 9│jb │ 7_021_647│ 880_696│ 7_902_343│ 1.21%│ 97.74%│ │ 10│js │ 3_991_228│ 2_330_079│ 6_321_307│ 0.97%│ 98.71%│ │ 11│jle │ 2_665_772│ 2_280_227│ 4_945_999│ 0.76%│ 99.47%│ │ 12│jae │ 360_746│ 3_080_521│ 3_441_267│ 0.53%│ 100.00%│ │ 13│jns │ 29│ 10│ 39│ 0.00%│ 100.00%│ │ 14│jl │ 14│ 4│ 18│ 0.00%│ 100.00%│ │ 15│jcxz │ 0│ 0│ 0│ 0.00%│ 100.00%│ ├─────────┼───────────┼───────────┼───────────┼───────────┼───────┼───────────┤ │# 15/22 │ │ │ │ │ │ │ │Sum (all)│ │403_003_704│247_512_016│650_515_720│100.00%│ │ │Sum shown│ │403_003_704│247_512_016│650_515_720│100.00%│ │ └─────────┴───────────┴───────────┴───────────┴───────────┴───────┴───────────┘
simics> mprof0.profile view = execute-logical max = 10 ┌─────────┬────────────────┬────────────────┬─────────────┬───────┬───────────┐ │ Row # │ Start │ Stop │ Count │ Count%│Accumulated│ │ │ │ │ │ │ Count% │ ├─────────┼────────────────┼────────────────┼─────────────┼───────┼───────────┤ │ 1│ 0x0040_1000│ 0x0040_1fff│2_454_100_308│ 71.27%│ 71.27%│ │ 2│ 0x0040_2000│ 0x0040_2fff│ 909_181_060│ 26.40%│ 97.67%│ │ 3│ 0x0040_0000│ 0x0040_0fff│ 80_070_862│ 2.33%│ 99.99%│ │ 4│0x7ff1_b614_4000│0x7ff1_b614_4fff│ 56_044│ 0.00%│ 100.00%│ │ 5│0x7ff1_b65d_c000│0x7ff1_b65d_cfff│ 48_656│ 0.00%│ 100.00%│ │ 6│0x7ff1_b65d_e000│0x7ff1_b65d_efff│ 29_808│ 0.00%│ 100.00%│ │ 7│0x7ff1_b65e_b000│0x7ff1_b65e_bfff│ 22_112│ 0.00%│ 100.00%│ │ 8│0x7ff1_b65d_b000│0x7ff1_b65d_bfff│ 15_855│ 0.00%│ 100.00%│ │ 9│0x7ff1_b65e_3000│0x7ff1_b65e_3fff│ 8_683│ 0.00%│ 100.00%│ │ 10│0x7ff1_b609_b000│0x7ff1_b609_bfff│ 7_780│ 0.00%│ 100.00%│ ├─────────┼────────────────┼────────────────┼─────────────┼───────┼───────────┤ │# 10/90 │ │ │ │ │ │ │Sum (all)│ │ │3_443_579_954│100.00%│ │ │Sum shown│ │ │3_443_541_168│100.00%│ │ └─────────┴────────────────┴────────────────┴─────────────┴───────┴───────────┘
simics> mprof0.profile view = execute-logical granularity = 1 max = 10 ┌──────────┬───────────┬───────────┬─────────────┬───────┬───────────┐ │ Row # │ Start │ Stop │ Count │ Count%│Accumulated│ │ │ │ │ │ │ Count% │ ├──────────┼───────────┼───────────┼─────────────┼───────┼───────────┤ │ 1│0x0040_1333│0x0040_1333│ 61_200_000│ 1.78%│ 1.78%│ │ 2│0x0040_1336│0x0040_1336│ 61_200_000│ 1.78%│ 3.55%│ │ 3│0x0040_1339│0x0040_1339│ 61_200_000│ 1.78%│ 5.33%│ │ 4│0x0040_133c│0x0040_133c│ 61_200_000│ 1.78%│ 7.11%│ │ 5│0x0040_133f│0x0040_133f│ 61_200_000│ 1.78%│ 8.89%│ │ 6│0x0040_1330│0x0040_1330│ 59_160_000│ 1.72%│ 10.60%│ │ 7│0x0040_112a│0x0040_112a│ 29_160_000│ 0.85%│ 11.45%│ │ 8│0x0040_112d│0x0040_112d│ 29_160_000│ 0.85%│ 12.30%│ │ 9│0x0040_1950│0x0040_1950│ 29_160_000│ 0.85%│ 13.14%│ │ 10│0x0040_1953│0x0040_1953│ 29_160_000│ 0.85%│ 13.99%│ ├──────────┼───────────┼───────────┼─────────────┼───────┼───────────┤ │# 10/12542│ │ │ │ │ │ │Sum (all) │ │ │3_443_579_954│100.00%│ │ │Sum shown │ │ │ 481_800_000│ 13.99%│ │ └──────────┴───────────┴───────────┴─────────────┴───────┴───────────┘
For the same coremark run, now examine a bit on how many data accesses that happened, using the memory-profile tool again shown in figure 19.
simics> mprof0.profile view = read-logical max = 10 ┌─────────┬────────────────┬────────────────┬─────────────┬───────┬───────────┐ │ Row # │ Start │ Stop │ Count │ Count%│Accumulated│ │ │ │ │ │ │ Count% │ ├─────────┼────────────────┼────────────────┼─────────────┼───────┼───────────┤ │ 1│ 0x0060_5000│ 0x0060_5fff│2_064_749_345│ 76.15%│ 76.15%│ │ 2│0x7fff_15e5_e000│0x7fff_15e5_efff│ 646_513_938│ 23.84%│ 99.99%│ │ 3│ 0x0000│ 0x0fff│ 79_652│ 0.00%│ 99.99%│ │ 4│0x7ff1_b67f_3000│0x7ff1_b67f_3fff│ 48_740│ 0.00%│ 99.99%│ │ 5│0x7fff_15e5_d000│0x7fff_15e5_dfff│ 24_501│ 0.00%│ 99.99%│ │ 6│0x7ff1_b67f_6000│0x7ff1_b67f_6fff│ 12_934│ 0.00%│ 99.99%│ │ 7│0x7ff1_b67f_0000│0x7ff1_b67f_0fff│ 12_375│ 0.00%│ 100.00%│ │ 8│0x7ff1_b63c_6000│0x7ff1_b63c_6fff│ 10_768│ 0.00%│ 100.00%│ │ 9│0x7ff1_b67f_5000│0x7ff1_b67f_5fff│ 5_678│ 0.00%│ 100.00%│ │ 10│0x7ff1_b63c_4000│0x7ff1_b63c_4fff│ 4_988│ 0.00%│ 100.00%│ ├─────────┼────────────────┼────────────────┼─────────────┼───────┼───────────┤ │# 10/88 │ │ │ │ │ │ │Sum (all)│ │ │2_711_569_260│100.00%│ │ │Sum shown│ │ │2_711_462_919│100.00%│ │ └─────────┴────────────────┴────────────────┴─────────────┴───────┴───────────┘
simics> mprof0.profile view = write-logical max = 10 ┌─────────┬─────────────────────┬─────────────────────┬─────────────┬───────┬───────────┐ │ Row # │ Start │ Stop │ Count │ Count%│Accumulated│ │ │ │ │ │ │ Count% │ ├─────────┼─────────────────────┼─────────────────────┼─────────────┼───────┼───────────┤ │ 1│ 0x0060_5000│ 0x0060_5fff│ 595_005_622│ 53.23%│ 53.23%│ │ 2│ 0x7fff_15e5_e000│ 0x7fff_15e5_efff│ 522_602_567│ 46.75%│ 99.99%│ │ 3│0xffff_8801_f35a_f000│0xffff_8801_f35a_ffff│ 81_720│ 0.01%│ 99.99%│ │ 4│ 0x7fff_15e5_d000│ 0x7fff_15e5_dfff│ 22_052│ 0.00%│ 100.00%│ │ 5│ 0x7ff1_b63c_6000│ 0x7ff1_b63c_6fff│ 5_924│ 0.00%│ 100.00%│ │ 6│ 0x7ff1_b602_3000│ 0x7ff1_b602_3fff│ 4_093│ 0.00%│ 100.00%│ │ 7│ 0x7ff1_b67f_3000│ 0x7ff1_b67f_3fff│ 3_690│ 0.00%│ 100.00%│ │ 8│ 0x7ff1_b65d_2000│ 0x7ff1_b65d_2fff│ 2_893│ 0.00%│ 100.00%│ │ 9│ 0x7ff1_b63c_4000│ 0x7ff1_b63c_4fff│ 2_712│ 0.00%│ 100.00%│ │ 10│ 0x7ff1_b67e_e000│ 0x7ff1_b67e_efff│ 2_512│ 0.00%│ 100.00%│ ├─────────┼─────────────────────┼─────────────────────┼─────────────┼───────┼───────────┤ │# 10/31 │ │ │ │ │ │ │Sum (all)│ │ │1_117_752_782│100.00%│ │ │Sum shown│ │ │1_117_733_785│100.00%│ │ └─────────┴─────────────────────┴─────────────────────┴─────────────┴───────┴───────────┘
Not very surprisingly, the writes goes to the same two pages, but writes are less frequent than reads. And finally say that we are interested in the distribution of the writes on the most commonly read stack-page, this is shown in figure 21.
simics> mprof0.profile view = read-logical start = 0x7fff_15e5_e000 stop = 0x7fff_15e5_efff granularity = 128 sort-on-column = Start ┌─────┬────────────────┬────────────────┬───────────┐ │Row #│ Start │ Stop │ Count │ ├─────┼────────────────┼────────────────┼───────────┤ │ 1│0x7fff_15e5_e000│0x7fff_15e5_e07f│ 700│ │ 2│0x7fff_15e5_e080│0x7fff_15e5_e0ff│ 9_029│ │ 3│0x7fff_15e5_e100│0x7fff_15e5_e17f│ 14_585│ │ 4│0x7fff_15e5_e180│0x7fff_15e5_e1ff│ 17_138│ │ 5│0x7fff_15e5_e200│0x7fff_15e5_e27f│ 10_315│ │ 6│0x7fff_15e5_e280│0x7fff_15e5_e2ff│414_573_166│ │ 7│0x7fff_15e5_e300│0x7fff_15e5_e37f│171_369_107│ │ 8│0x7fff_15e5_e380│0x7fff_15e5_e3ff│ 56_832_627│ │ 9│0x7fff_15e5_e400│0x7fff_15e5_e47f│ 1_004_298│ │ 10│0x7fff_15e5_e480│0x7fff_15e5_e4ff│ 2_681_185│ │ 11│0x7fff_15e5_e500│0x7fff_15e5_e57f│ 746│ │ 12│0x7fff_15e5_e580│0x7fff_15e5_e5ff│ 408│ │ 13│0x7fff_15e5_e600│0x7fff_15e5_e67f│ 240│ │ 14│0x7fff_15e5_e680│0x7fff_15e5_e6ff│ 220│ │ 15│0x7fff_15e5_e700│0x7fff_15e5_e77f│ 76│ │ 16│0x7fff_15e5_e780│0x7fff_15e5_e7ff│ 98│ ├─────┼────────────────┼────────────────┼───────────┤ │Sum │ │ │646_513_938│ └─────┴────────────────┴────────────────┴───────────┘