When QEMU executes a guest using software emulation (“TCG”), it translates blocks of guest code to native code and then executes them (the TCG translation process is described in the talk “Towards multi-threaded TCG” by Alex Bennée and Frederic Konrad). If you’re interested in tracing guest code — perhaps in order to look at what code is being run or to benchmark it — it should be possible to instrument the translated blocks. And in fact it is. However I thought I’d document this since it took me ages to work out and it’s not exactly obvious how to do it.
Firstly you have to compile QEMU from source. Before compiling it, read docs/tracing.txt
carefully. Also edit trace-events
and remove the disable
keyword from the following lines in that file:
# TCG related tracing (mostly disabled by default) # cpu-exec.c -disable exec_tb(void *tb, uintptr_t pc) "tb:%p pc=0x%"PRIxPTR -disable exec_tb_nocache(void *tb, uintptr_t pc) "tb:%p pc=0x%"PRIxPTR -disable exec_tb_exit(void *next_tb, unsigned int flags) "tb:%p flags=%x" +exec_tb(void *tb, uintptr_t pc) "tb:%p pc=0x%"PRIxPTR +exec_tb_nocache(void *tb, uintptr_t pc) "tb:%p pc=0x%"PRIxPTR +exec_tb_exit(void *next_tb, unsigned int flags) "tb:%p flags=%x"
Add those trace events to your /tmp/events
file. Also it’s useful to put your full qemu command line into a script, with the additional -trace events=/tmp/events
parameter, so you have a way to rerun the trace.
What you end up with after you’ve done this and run your guest under the tracing test conditions is an enormous trace file. My trace file, simply from a kernel boot, was 3.9 GB.
You can now analyze the log using the scripts/simpletrace.py
script, as described in the QEMU tracing documentation. Again, the output will be enormous. Mine begins like this, I’ve aligned the output to make it a bit easier to read:
$ ./scripts/simpletrace.py trace-events trace-4491 | head exec_tb 0.000 pid=4491 tb=0x7fa869afe010 pc=0xfffffff0 exec_tb_exit 1.953 pid=4491 next_tb=0x0 flags=0x0 exec_tb 15.525 pid=4491 tb=0x7fa869afe090 pc=0xfe05b exec_tb_exit 0.800 pid=4491 next_tb=0x7fa869afe090 flags=0x0 exec_tb 7.215 pid=4491 tb=0x7fa869afe110 pc=0xfe066 exec_tb_exit 0.234 pid=4491 next_tb=0x0 flags=0x0 exec_tb 5.940 pid=4491 tb=0x7fa869afe190 pc=0xfe06a exec_tb_exit 0.222 pid=4491 next_tb=0x0 flags=0x0 exec_tb 2.945 pid=4491 tb=0x7fa869afe210 pc=0xfe070 exec_tb_exit 0.222 pid=4491 next_tb=0x0 flags=0x0
The pid
and *tb
fields are not very interesting, being the QEMU PID and the internal address of the translated blocks.
However the pc
(program counter) field and the timestamp (µs delta from previous trace event) are useful: Notice that the first translated block of guest code is located at guest address 0xffff_fff0
, which is the linear address where x86 CPUs boot from, and the second at 0xf_e05b
(segmented address F000:E05B
) which is the start of the BIOS ROM, so that’s encouraging.
Assuming you’ve now decoded the whole file (the decoded trace takes 5.5GB for me!), how can we turn these raw timestamp deltas and raw linear addresses into useful information? At this point we’re on our own, and I ended up writing Perl scripts to process and analyze the data.
The first Perl script is simple enough, and is just used to associate absolute timestamps with the program counter:
#!/usr/bin/perl -w use warnings; use strict; my $ts = 0; while (<>) { my $ts_delta; my $pc; if (m{^exec_tb(_nocache)? ([-\d.]+).*pc=0x([a-fA-F0-9]+)}) { $ts_delta = $2; $pc = "$3"; } elsif (m{^exec_tb_exit ([-\d.]+)}) { $ts_delta = $1; } elsif (m{^Dropped_Event ([-\d.]+)}) { $ts_delta = $1; } else { die "could not parse output: $_" } $ts += $ts_delta; if (defined $pc) { print "$ts $pc\n"; } }
How do we know what program counter corresponds to what code? For this it’s helpful to know some magic numbers associated with booting PCs:
Address | Meaning |
---|---|
0xfffffff0 | Initial PC |
0xfe05b | BIOS ROM entry point |
0x7c00 | Bootloader entry point Note: not used if you load the kernel using the -kernel option. |
0x1000000 | 64 bit kernel entry point This may be different for your kernel. Use readelf -e vmlinux |
With that knowledge I can put together a timeline of my kernel boot by hand:
Time | Stage |
---|---|
0 | BIOS |
1.16s | Enter kernel |
38.4s | Power off |
Not too interesting. The next step is to reverse the PC addresses into kernel symbols. There used to be a useful script called ksymoops to do this, but it seems to have disappeared, so I wrote my own:
#!/usr/bin/perl -w # # Find everything that looks like a kernel address in the input # and turn it into a symbol using gdb. # # Usage: # ksyms.pl vmlinux < input > output # where 'vmlinux' is the kernel image which must contain debug # symbols (on Fedora, find this in kernel-debuginfo). use warnings; use strict; my $vmlinux = shift; my %cache = (); while (<>) { s{(^|\s)([0-9a-f]{6,16})(\s|$)}{ $1 . lookup ($2) . $3 }gei; print } sub lookup { local $_; my $addr = $_[0]; return $cache{$addr} if exists $cache{$addr}; # Run gdb to lookup this symbol. my $cmd = "gdb -batch -s '$vmlinux' -ex 'info symbol 0x$addr'"; open PIPE, "$cmd 2>&1 |" or die "$cmd: $!"; my $r = <PIPE>; close PIPE; chomp $r; if ($r =~ m/^No symbol/) { # No match, just return the original string, but add the original # string to the cache so we don't do the lookup again. $cache{$addr} = $addr; return $addr; } # Memoize the match and return it. $cache{$addr} = $r; return $r; }
You can run it like this:
$ ksyms.pl /usr/lib/debug/lib/modules/4.4.4-301.fc23.x86_64/vmlinux times.log > symbols.log
Come back tomorrow for further analysis …
Pingback: Tracing QEMU guest execution part 2 | Richard WM Jones
hello, i followed your introduction “tracing QEMU guest excution” and i noticed nothing printed in my trace file. did i miss something ? i used this instruction “qemu-system-x86_64 -m 2G -trace events=/tmp/events ../qemu/test.img” to get trace. i checked my /tmp/events file and configured with trace-backends=simple option. what is wrong my system ? plz kindly help me. Thanks. !
Here is a minimal fully automated example: https://github.com/cirosantilli/linux-kernel-module-cheat/tree/467923860b78bb5d0c787f1433682dfc9c83223a#count-instructions
Also explained at: https://stackoverflow.com/questions/39149446/qemu-trace-backend/46497873#46497873
Could any one help me to understand how to add trace events to the /tmp/events file ??
I used “echo memory_region_ops_read > /tmp/events ” but I didn’t find any trace file after launching qemu !
Hey. Nice post. But how do I switch off the tb caching so that the exec_tb_nocache logs the traces? Any idea?
You don’t need to disable caching – but force nochain by adding -d nochain to your invocation.
what i understand that pc returns the address of tb so the address of *tb did match the adress of what .
other question please tc-ptr adress is the adress of host code in all cases (i want to say before translating or after it match to the code of the host architecture.thanks
Could any one help me to understand how to add trace events to the /tmp/events file ??
I used “echo memory_region_ops_read > /tmp/events ” but I didn’t find any trace file after launching qemu !
Patching trace-events is not needed anymore since 5caa262fdadae08f7c5af7f1328d424823811aaa (Jul 17) I think.
EDIT: actually as of 2.11 they moved to: accel/tcg/trace-events as in https://github.com/cirosantilli/qemu/commit/8dbff6d3728ff78c4da74d667bb4833edade06aa have to use the generated trace-events-all file as in: https://github.com/cirosantilli/linux-kernel-module-cheat/tree/0c2bd829d47f6815500420789ad765c7a125b54c#count-boot-instructions
Hello again!!! 🙂
I have created a “one-liner” addr2line based alternative to ksyms.pl that achieves good performance without memoization: https://github.com/cirosantilli/linux-kernel-module-cheat/blob/f5065183a881664ca8304667b71f970b9502d13f/trace2line
Usage at: https://github.com/cirosantilli/linux-kernel-module-cheat/tree/f5065183a881664ca8304667b71f970b9502d13f#trace-source-lines
Awesome articles. Do you know how to print instruction flow into a file?
Improved version of ksyms.pl script:
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
ksyms_improved.pl
hosted with ❤ by GitHub
It have super-fast performance and more descriptive source code positions output 🙂
Thank you for making this post. It’s super interesting! I tried but it’s slow so I wonder if you have any ways to speed it up?