Tag Archives: linux

Tracing QEMU guest execution part 4

The final two questions that I posed last time were to do with constructing a timeline of what this guest is spending time on.

We can easily see system calls in the trace log, and we can also see when a kernel function is entered the first time (indicating that a new bit of the kernel is now running), and I wrote a Perl script to analyze that. That gave me a 115K line log file from which I did the rest of the analysis by hand to generate a timeline.

I would reproduce it here, but the results aren’t very enlightening. In particular I doubt it’s more interesting that what you can get by reading the kernel printk’s from a boot log.

What is my conclusion after using these techniques to analyze a booting guest? Here I go:

  • It’s clunky and undocumented. Hopefully this series should help a little.
  • It would be much more powerful with stack traces. It should be possible to get them from QEMU, at least in theory, but it’s a lot of work to do so.
  • It would be much more powerful if we could analyze into kernel modules and userspace.
  • More tooling around this might make it more bearable.


Filed under Uncategorized

Tracing QEMU guest execution part 3

In the previous post I posed three questions about my detailed function-level trace of the kernel booting under QEMU. The first one was Which kernel functions consume the most time?

We don’t have stack traces, although that would be a good direction for future work. So if a function “A” calls another function “B” like this:

  --> calls B
  <-- B returns

then we’re going to assign just the time in the two parts of “A” to “A”. In other words, “A” doesn’t inherit the time taken running “B”. (Except if “B” is inlined, in which case the trace cannot distinguish which code is in “A” and which is in “B”).

The other limitation is lack of insight into what kernel modules are doing. This is very difficult to solve: compiling kernel modules into the kernel proper would change what I’m trying to measure considerably.

Given those limitations, I wrote a Perl script to do all that from my previous trace. I’m not going to bother reproducing the script here because it’s quite simple, but also very specific to the trace format I’m using. If you got this far, you’ll have no trouble writing your own analysis tools.

The results are at the end of the post. For each function that was called, I had a look into the kernel code to find out what it seems to be doing, and those notes are below.

  • sha256_transform is the SHA256 function. While I’m not sure what it is being used for (some kind of module signature check seems to be the most likely explanation), the more interesting thing is that we’re not using any specialized version of the function (eg. with AVX suppport). That’s easily explained: we’re using TCG, not KVM, so no host processor features are available. However on a real boot we would be using AVX, so the function should take a lot less time, and I think we can discount it.
  • native_safe_halt is the function which halts the processor when it is idle (eg. waiting for an interrupt). Is it worrying that we spend 1/50th of the time not working? Would it help to have more virtual CPUs or is there an inherent lack of parallelism in the boot process?
  • It’s interesting that we spend such a large amount of time in the sort function. It’s used all over the place, eg. for sorting the BIOS E820 map, sorting memory mappings, checking boot parameters, sorting lists of wake-up events …
  • task_tick_fair is part of the Completely Fair Scheduler.
  • If there’s a boot option to disable ftrace, I cannot find it.

My main conclusion is there is no “smoking gun” here. Everything generally points to things that a kernel (or at least, the core, non-module parts of a kernel) should be doing. So this analysis has not been very helpful.

Continue reading

1 Comment

Filed under Uncategorized

Tracing QEMU guest execution part 2

After an overnight 12+ hour run of my Perl scripts I now have a 52 million line file that consists of timestamps, kernel symbols, and other untranslated linear addresses. The only possible way to analyze this will be with yet more scripts, but already there are lots of interesting things.

Here is the kernel code entering userspace:

11409434.8589973 prepare_exit_to_usermode in section .text
11409435.5189973 retint_user + 8 in section .text
11409436.4899973 7fb95ab3cd20
11409447.6899973 7fb95ab4c930
11409469.2169973 7fb95ab577f0

Userspace symbols cannot be decoded because we don’t know which process is being run. More importantly, code in kernel modules cannot be decoded, so we only see core kernel functions.

Handling a timer interrupt:

18000723.5261105 apic_timer_interrupt in section .text
18000725.2241105 smp_apic_timer_interrupt in section .text
18000726.7681105 native_apic_mem_write in section .text
18000729.2691105 smp_apic_timer_interrupt + 46 in section .text
18000729.8361105 irq_enter in section .text
18000730.3941105 rcu_irq_enter in section .text
18000731.1401105 rcu_irq_enter + 92 in section .text
18000731.7111105 irq_enter + 14 in section .text
18000732.3171105 smp_apic_timer_interrupt + 51 in section .text
18000740.9941105 exit_idle in section .text
18000741.5481105 smp_apic_timer_interrupt + 56 in section .text
18000742.0881105 local_apic_timer_interrupt in section .text
18000744.0341105 tick_handle_periodic in section .text
18000744.6341105 _raw_spin_lock in section .text
18000745.9291105 tick_periodic + 67 in section .text
18000747.3941105 do_timer in section .text

Userspace loading a kernel module:

7806760.57896065 40087d
7806765.09696065 4442b0
7806765.65396065 entry_SYSCALL_64 in section .text
7806766.14496065 entry_SYSCALL_64 + 32 in section .text
7806766.46296065 entry_SYSCALL_64 + 36 in section .text
7806788.75296065 sys_init_module in section .text
7806796.76296065 sys_init_module + 62 in section .text
7806797.28296065 sys_init_module + 62 in section .text
7806800.64896065 sys_init_module + 65 in section .text
7806801.94496065 capable in section .text
7806802.91196065 security_capable in section .text
7806804.30796065 cap_capable in section .text
7806804.87796065 security_capable + 72 in section .text
7806805.43596065 ns_capable + 41 in section .text
7806805.92096065 capable + 23 in section .text
7806810.46796065 sys_init_module + 75 in section .text
7806815.59796065 sys_init_module + 86 in section .text
7806821.10196065 sys_init_module + 96 in section .text
7806827.28496065 sys_init_module + 109 in section .text
7806831.23396065 sys_init_module + 129 in section .text
7806839.75396065 security_kernel_module_from_file in section .text

What am I interested in knowing? My overall goal is to find areas in the kernel and userspace that we can optimize to make boot faster. Specifically it seems interesting at first to look at two questions:

  1. Which kernel functions consume the most time? This is “profiling 101”. It may indicate kernel functions which are particularly slow (eg. they contain sleeps).
  2. The time elapsed before each kernel function is called the first time. Because studying the whole trace is impossible (by hand), finding the first time each kernel function is called may give me an insight into major timeline events — eg. the first time one of the ext4 mount functions is called, we know that userspace has got to the point where it mounts the root filesystem.
  3. When does userspace make system calls? This is related to the previous point. It’s also the easiest type of analysis to do because system call entry is easy to detect from the trace.

More to follow …


Filed under Uncategorized

Run Linux on RISC-V in your browser


Previously, running Linux/RISC-V on qemu.

Leave a comment

Filed under Uncategorized

AMD previewing Seattle at the Red Hat Summit

The announcement is here, and according to that announcement the development boards will be available in “second half” 2015.

Leave a comment

Filed under Uncategorized

MIPS Creator (32 bit dual core MIPS)

As I mentioned in passing yesterday I have a MIPS Creator CI20. It’s a tiny single board dual core 32 bit MIPS computer. The cost was £55 including tax and delivery.


It comes with Debian mipsel pre-installed, so all I had to do was boot it the first time with an HDMI monitor and keyboard, configure the WLAN, disable the autologin, reboot and done: I have a tiny MIPS server.

One tip: The hardware clock was way off, so installing NTP at the earliest opportunity is a good idea.

The hardware isn’t going to win any prizes for speed, but it’s quite adequate for development. /proc/cpuinfo is attached at the end.

Also I think Imagination have done a good job seeding the community, setting up a mailing list, IRC, and very extensive wiki pages.

Update: Also, no native OCaml compiler … boo 😦

$ cat /proc/cpuinfo 
system type		: CI20
processor		: 0
cpu model		: Ingenic Xburst V4.15  FPU V0.0
BogoMIPS		: 1196.85
wait instruction	: yes
microsecond timers	: no
tlb_entries		: 32
extra interrupt vector	: yes
hardware watchpoint	: yes, count: 1, address/irw mask: [0x0fff]
microMIPS		: no
ASEs implemented	: mxu
shadow register sets	: 1
kscratch registers	: 0
core			: 0
VCED exceptions		: not available
VCEI exceptions		: not available
Hardware		: CI20
Serial			: 6f1b2731 1181a003 f9080000 8861aa4f

processor		: 1
cpu model		: Ingenic Xburst V4.15  FPU V0.0
BogoMIPS		: 1196.85
wait instruction	: yes
microsecond timers	: no
tlb_entries		: 32
extra interrupt vector	: yes
hardware watchpoint	: yes, count: 1, address/irw mask: [0x0fff]
microMIPS		: no
ASEs implemented	: mxu
shadow register sets	: 1
kscratch registers	: 0
core			: 0
VCED exceptions		: not available
VCEI exceptions		: not available
Hardware		: CI20
Serial			: 6f1b2731 1181a003 f9080000 8861aa4f


Filed under Uncategorized

Not very serious quest to run Linux on every architecture

Just looking at the Debian Ports page, how many of these architectures do I own?

Arch Notes
Alpha Good, working Alpha hardware is getting hard to obtain these days.
amd64 (x86-64) Obviously …
ARM (32 bit) I don’t really care about anything less than ARM version 7 these days, although I do have a Sheevaplug which I think is v5 or v6. My “go-to” 32 bit ARM board is the Cubietruck.
ARM (64 bit) I don’t own, but have under my desk at work, the Applied Micro X-gene (picture), and a lovely piece of kit it is too.
HP-PA I would love to get my hands on some HP-PA RISC hardware, but it’s even harder to get hold of than Alpha. Edit: If you really want HP PA then the HP C8000 workstation is available on eBay for $300 or so.
i386 I own a Lenovo X60s laptop, which is one of the few 32 bit Intel processors with (very slow) hardware virtualization
ia64 HP Itanium RX2620
M68K As I started out my professional career writing 68k assembler, I’d love to buy an MVME crate, but incredibly they go for upwards of $1000 on ebay, even 20 years after they stopped being made.
MIPS ✓/✗ I just bought the MIPS Creator CI20. I don’t own any Big Endian MIPS hardware, nor any 64 bit MIPS hardware.
OpenRISC I have this burned into an FPGA, but in storage.
PPC (32 bit) I recently sent a couple of old 32 bit Mac G4s to recycling.
PPC/POWER (64 bit) ✓/✗ I own a Mac G5 running Linux (picture). However I don’t own, and as far as I know cannot get my hands on any of the more interesting POWER7 or POWER8 hardware, although I have remote access to it through Red Hat.
s390/s390x Similar to ppc64, no possible way to own it, but I am able to remote access it through Red Hat.
SPARC (32 bit) Currently sitting in storage, but it’s there and it used to run Linux, albeit with only 128 MB of RAM
SPARC (64 bit) Is it possible to buy 64 bit SPARC hardware for a reasonable price?

Looks like I’m about half way there!


Filed under Uncategorized