Running UserModeLinux under Valgrind (memcheck)
John Reiser, BitWagon Software LLC; December 2007
revised 2008-08-30
Patches have been developed which enable UserModeLinux for i686 to run under the memcheck tool of Valgrind on i686. Thus it is possible to check dynamically the memory accesses made by a running Linux kernel against memcheck's model of allowed behavior. This work was sponsored by Google Inc.
The combined patches are at "alpha" quality. They have memchecked an entire trivial session (boot UML, login, halt), and have identified a couple specific problems in kernel code. The steps necessary to reach "beta" quality (a motivated kernel developer can get useful results) have been outlined and are being pursued.
The patches:
15KB http://bitwagon.com/valgrind+uml/valgrind-3.3.0-2007-12-27.patch.gz
60KB http://bitwagon.com/valgrind+uml/uml-2.6.22.5-2007-12-27.patch.gz
As a convenience for when the official sites are not responding, here are copies of the original unpatched software that is required:
4MB http://bitwagon.com/valgrind+uml/valgrind-3.3.0.tar.bz2
45MB http://bitwagon.com/valgrind+uml/linux-2.6.22.5.tar.bz2
103MB http://bitwagon.com/valgrind+uml/FedoraCore5-x86-root_fs.bz2
Approximately 2.5GB of disk space is required in all.
This page will be updated, rewritten, and re-organized as appropriate.
2008-08-30 Steve VanDeBogart has made significant progress: rebase to linux-2.6.26.2, reduce in the direction of minimal changes, use Ubuntu FeistyFawn, etc. http://uml.jfdi.org/uml/Wiki.jsp?page=ValgrindingUML
2008-01-27 Patches to valgrind have been cleaned for svn revision 7358.
23KB http://bitwagon.com/valgrind+uml/vg330-patches-jreiser0127.tgz
2008-01-23 Two more test cases for modified code.
23KB http://bitwagon.com/valgrind+uml/vg330-patches-jreiser0123.tgz
2008-01-21 Patch cleanup: remove leftover debugging debris, separate into logical groups, add testcases, add documentation.
20KB http://bitwagon.com/valgrind+uml/vg330-patches-jreiser0121.tgz
2008-01-01 Initial posting.
1. Initial reconnaissance: valgrind did not deal with all flag combinations to the clone system call as used by UML.
2. Support more flag bit combinations for clone in valgrind, including specifying a new stack pointer for a fork-like clone.
3. Support switching subroutine stacks in longjmp used by UML. This involved creating a new client request VALGRIND_STACK_SWITCH and modifying longjmp for UML.
4. Steps 1, 2, and 3 were enough to detect a memory overrun (into uninit bits) by UML in the sector bitmap used during bootup. Reported to [uml-devel] on 2007-12-04, and credited to valgrind.
5. Implement new functionality: a bit in the flag argument to clone requests that the child run natively, and not be tracked by any virtualizer (such as memcheck.) Use by UML for non-kernel code simplifies life because UML need not track the virtualization. [Jeff Dike tip.]
6. Race found: glibc caches getpid(), and updates unsafely after fork-like clone(). https://bugzilla.redhat.com/show_bug.cgi?id=417521 This confuses the debugging human.
7. Valgrind supports native child of clone: “let go” of control, with correct environment for signal handlers, and native values for register %gs and register %es.
8. “Bug” found: UML pipes uninitialized bytes for synchronization only. The fix is trivial.
9. Annotate linux/mm/slab.c to make memcheck understand the SLAB allocator. Initially this is VALGRIND_MALLOCLIKE_BLOCK and VALGRIND_FREELIKE_BLOCK, plus enough VALGRIND_MAKE_MEM_DEFINED and VALGRIND_MAKE_MEM_UNDEFINED to cover the manipulation of supervisory data that a client should not see. Later, most MALLOCLIKE and FREELIKE were turned into DEFINED and UNDEFINED because of the semantic clash: kmalloc+kfree do not have the same semantics as malloc+free.
10. Bug found: /dev/urandom uses uninitialized bytes. Reported to [linux-kernel] 2007-12-14.
11. Valgrind remembers both ends of the new stack when switching stacks.
12. Add a trailing size parameter to all callers of kmalloc, in an attempt to deal with the semantic clash.
13. Explore new optional mode for memcheck: --complain-asap=yes makes it easier to find the cause of “Conditional jump or move depends on uninitialized data.”
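The annotation of an object allocator with DEFINED and UNDEFINED client requests, as described for linux/mm/slab.c above, can be illustrated in user space. This is only a minimal sketch and not the actual kernel patch: the names toy_obj, toy_alloc, and toy_free are hypothetical, the pool is a tiny fixed array, and the client-request macros fall back to no-op stubs (an assumption made here) when valgrind/memcheck.h is not installed.

```c
#include <assert.h>
#include <stddef.h>

/* Use the real client requests when the Valgrind headers are installed;
 * otherwise fall back to no-op stubs so that the sketch compiles anywhere. */
#if defined(__has_include)
#  if __has_include(<valgrind/memcheck.h>)
#    include <valgrind/memcheck.h>
#    define HAVE_MEMCHECK_H 1
#  endif
#endif
#ifndef HAVE_MEMCHECK_H
#  define VALGRIND_MAKE_MEM_DEFINED(addr, len)   ((void)(addr), (void)(len))
#  define VALGRIND_MAKE_MEM_UNDEFINED(addr, len) ((void)(addr), (void)(len))
#endif

#define OBJ_SIZE  32
#define OBJ_COUNT 4

/* Hypothetical cached object: while free, its first word is a free-list
 * link, which is supervisory data that a client should not see. */
union toy_obj {
    union toy_obj *next_free;
    unsigned char  payload[OBJ_SIZE];
};

static union toy_obj  toy_pool[OBJ_COUNT];
static union toy_obj *toy_free_list;
static int            toy_ready;

static void toy_init(void)
{
    toy_free_list = NULL;
    for (size_t i = 0; i < OBJ_COUNT; i++) {
        toy_pool[i].next_free = toy_free_list;
        toy_free_list = &toy_pool[i];
    }
    toy_ready = 1;
}

void *toy_alloc(void)
{
    if (!toy_ready)
        toy_init();
    union toy_obj *obj = toy_free_list;
    if (obj == NULL)
        return NULL;                    /* pool exhausted */
    /* The allocator is about to read its own link word inside the object. */
    (void)VALGRIND_MAKE_MEM_DEFINED(obj, sizeof obj->next_free);
    toy_free_list = obj->next_free;
    /* From the client's viewpoint the object starts out uninitialized. */
    (void)VALGRIND_MAKE_MEM_UNDEFINED(obj, sizeof *obj);
    return obj;
}

void toy_free(void *p)
{
    union toy_obj *obj = p;
    assert(obj >= toy_pool && obj < toy_pool + OBJ_COUNT);
    /* Stale client data should not be trusted after the free. */
    (void)VALGRIND_MAKE_MEM_UNDEFINED(obj, sizeof *obj);
    obj->next_free = toy_free_list;     /* this store defines the link word */
    toy_free_list = obj;
}
```

Marking the whole object UNDEFINED on both alloc and free mimics malloc-like semantics. For clients that rely on contents carried over from a previous use, this marking would produce complaints, which is the semantic clash noted in the list.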
VGUML_TOP=$PWD/vguml-top # a new directory to contain the files
rm -rf $VGUML_TOP
mkdir -p $VGUML_TOP # need 2.5GB eventually
# Download official sources.
cd $VGUML_TOP
wget http://uml.nagafix.co.uk/FedoraCore5/FedoraCore5-x86-root_fs.bz2 # 103MB
wget http://www.kernel.org/pub/linux/kernel/v2.6/linux-2.6.22.5.tar.bz2 # 45MB
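wget http://valgrind.org/downloads/valgrind-3.3.0.tar.bz2 # 4.5MB
# Download the patches that are applied below.
wget http://bitwagon.com/valgrind+uml/valgrind-3.3.0-2007-12-27.patch.gz # 15KB
wget http://bitwagon.com/valgrind+uml/uml-2.6.22.5-2007-12-27.patch.gz # 60KB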
# Expand the root filesystem that will be used by UML.
bunzip2 -dc < FedoraCore5-x86-root_fs.bz2 > FedoraCore5-x86-root_fs # 1.6GB
# Build Valgrind
cd $VGUML_TOP
tar xfj valgrind-3.3.0.tar.bz2 # 34MB
cd valgrind-3.3.0
zcat $VGUML_TOP/valgrind-3.3.0-2007-12-27.patch.gz | patch -p1 --backup
./configure --prefix=/usr/local/vguml-3.3.0 # prefix will be unused
make # 92.5MB
# Build UserModeLinux (UML)
cd $VGUML_TOP
tar xfj linux-2.6.22.5.tar.bz2 # 300MB
cd linux-2.6.22.5
zcat $VGUML_TOP/uml-2.6.22.5-2007-12-27.patch.gz | patch -p1 --backup
make ARCH=um defconfig
make ARCH=um # 513MB
# Invoke newly-built modified valgrind without installing it.
export VALGRIND_LAUNCHER=$VGUML_TOP/valgrind-3.3.0/coregrind/valgrind
cd $VGUML_TOP/linux-2.6.22.5
$VGUML_TOP/valgrind-3.3.0/.in_place/x86-linux/memcheck \
./linux nosysemu ubda=../FedoraCore5-x86-root_fs mem=128M
# NOTE: on a 2GHz i686, it will take about 2.5 minutes to reach “login: ”.
# Login as “root” followed by <Enter>.
# Stop the instance via “halt” followed by <Enter>.
# If necessary, stop everything using “killall -KILL memcheck”.
# Restore the xterm with “\n stty sane \n” where '\n' is linefeed (<Control>j).
# The UserModeLinux home page is http://user-mode-linux.sourceforge.net/
Invoking memcheck directly under the debugger gdb-6.6-8 often was not successful because threads and child processes got in the way. Even when using the gdb command "set detach-on-fork off" in an attempt to capture all children, gdb often would hang or abort. Instead, it was more productive to use printf, VG_(printf), and VALGRIND_PRINTF. What was effective was to code a deliberate infinite loop "for(;;);" then use the shell command "ps a" to see which PID was running up CPU time, and attach gdb directly to that thread. It also helped to keep event histories in static circular buffers, to be examined and correlated later.
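A static circular buffer of events, of the kind mentioned above, might look like the following. This is a minimal sketch under stated assumptions, not the code that was actually used: the names event_ring, event_record, and event_get are hypothetical, the record layout is invented for illustration, and no locking is attempted.

```c
#include <assert.h>
#include <stddef.h>

#define EVENT_SLOTS 8   /* a power of two, so that the index wraps with a mask */

struct event {
    unsigned long seq;      /* global sequence number, for later correlation */
    int           tag;      /* caller-chosen event code */
    unsigned long value;    /* caller-chosen datum, such as a PID or an address */
};

static struct event  event_ring[EVENT_SLOTS];
static unsigned long event_next;    /* total number of events ever recorded */

/* Record one event, overwriting the oldest entry once the ring is full. */
void event_record(int tag, unsigned long value)
{
    assert((EVENT_SLOTS & (EVENT_SLOTS - 1)) == 0);
    struct event *e = &event_ring[event_next & (EVENT_SLOTS - 1)];
    e->seq   = event_next++;
    e->tag   = tag;
    e->value = value;
}

/* Return the i-th oldest surviving event (0 is the oldest),
 * or NULL when i is out of range. */
const struct event *event_get(unsigned long i)
{
    unsigned long kept = event_next < EVENT_SLOTS ? event_next : EVENT_SLOTS;
    if (i >= kept)
        return NULL;
    return &event_ring[(event_next - kept + i) & (EVENT_SLOTS - 1)];
}
```

Because the buffer is static, it can be examined after the fact from an attached gdb with a plain print of event_ring, without calling any function in the stopped process.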
Two gdb macros were useful for dealing with the UML code that was being run by memcheck:
define umlsyms
add-symbol-file linux-2.6.22.5/linux 0x08055170
end
# umlregs takes one argument: the thread number
define umlregs
p vgPlain_get_ThreadState( $arg0 )->arch.vex
end
where the magic constant 0x08055170 is from
objdump --section-headers linux-2.6.22.5/linux | grep text
When debugging memcheck, often it was handy to construct a hand traceback of the UML code by knowing that register %ebp usually points to the stack data structure:
struct frame {
struct frame *ebp_previous;
void *return_pc;
};
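The hand traceback amounts to following the chain of saved frame pointers and noting each return address. A minimal sketch follows, assuming the struct frame layout shown above. The function name hand_traceback is hypothetical, and the check that each link moves toward higher addresses is an added safeguard against corrupt or looping chains. It assumes a stack that grows downward and would stop early at a point where UML has switched stacks.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct frame {
    struct frame *ebp_previous;   /* saved %ebp of the caller */
    void         *return_pc;      /* return address into the caller */
};

/* Collect up to max return addresses, starting from the frame that
 * %ebp points to.  Returns the number of addresses stored in pcs[]. */
size_t hand_traceback(const struct frame *fp, void **pcs, size_t max)
{
    size_t n = 0;
    assert(pcs != NULL || max == 0);
    while (fp != NULL && n < max) {
        const struct frame *prev = fp->ebp_previous;
        pcs[n++] = fp->return_pc;
        /* Caller frames live at higher addresses when the stack grows
         * downward; anything else is treated as the end of the chain. */
        if (prev != NULL && (uintptr_t)prev <= (uintptr_t)fp)
            break;
        fp = prev;
    }
    return n;
}
```

In a debugging session the starting pointer would be the guest's %ebp value, for example as displayed by the umlregs macro, and each collected address would be resolved against the symbols loaded by umlsyms.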
The patched combination of software has reached the “alpha” level of development: it works much of the time, in constrained circumstances, when run by the originator of the patches. Further work is required to reach the “beta” level where a motivated software engineer working on a Linux kernel, but who is not the originator of the patches, can get useful results. This work is estimated to consist of four major elements:
Deal with the semantic clash of kmalloc+kfree versus malloc+free.
Enhance memcheck with the option to complain immediately upon fetching uninitialized bits, instead of waiting until such bits influence flow-of-control, input/output, or array indexing.
Catch up with software evolution of the underlying Valgrind and UML software.
Investigate and fix bugs; support users.
Although similarly named and motivated, the Linux kernel functions kmalloc and kfree have different properties than the malloc and free which are fundamental to memcheck. The basic difference is that an object and its contents persist from kfree to kmalloc, while a block that is passed to free [logically] loses both its contents and its identity. The kernel functions manage a cache of identified objects that are used and re-used, somewhat like books in a lending library. After being returned to the cache by kfree, then the same kernel object (together with its old contents, including any changes) is re-issued to the next borrower by kmalloc. Instead, the memcheck model treats an allocated block as a sized collection of contiguous bytes that are [logically] erased upon free, and possibly re-sized and re-aggregated by malloc. The size is an important parameter to malloc, but to kmalloc the size is implicit: the object which will be re-used keeps its original size.
Despite this semantic difference of implicit size and content carry-over from kfree to kmalloc, some kernel clients would be happy with the semantics of malloc and free. The existence of the options SLAB_POISON and SLAB_RED_ZONE for the SLAB allocator shows this, because SLAB_POISON negates carry-over of contents and SLAB_RED_ZONE facilitates debugging of allocations with varying effective size.
The work required in this area for Valgrind+UML to reach beta usability consists of reconciling the usage models of kmalloc+kfree versus malloc+free. This will involve highlighting and publicizing the difference, and educating developers about it. It also requires identifying which kernel clients and usages depend on the difference, then adapting them if possible and isolating them if not. Some exploration has been done already: the size has been added as a parameter to all calls to kmalloc, and the blocks are marked as DEFINED and UNDEFINED instead of MALLOCLIKE and FREELIKE. The concept of a memory leak also needs clarification. A loadable device driver really wants to identify such problems, but the usage model tends to hide them.
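The lending-library behavior described above can be made concrete with a toy cache. This is a sketch only, written in user-space C with hypothetical names (struct book, book_borrow, book_return); it does not reproduce the kernel's SLAB code. It shows the two properties that clash with the malloc+free model: the same object is re-issued, and its contents survive the return.

```c
#include <assert.h>
#include <stddef.h>

struct book {
    int  in_use;            /* supervisory flag kept by the cache */
    int  times_borrowed;    /* state that survives from one borrower to the next */
    char note[16];          /* contents left behind by the last borrower */
};

#define SHELF_SIZE 2
static struct book shelf[SHELF_SIZE];   /* zero-filled once, at program start */

/* Plays the role of kmalloc on one cache: re-issue an existing object. */
struct book *book_borrow(void)
{
    for (size_t i = 0; i < SHELF_SIZE; i++) {
        if (!shelf[i].in_use) {
            shelf[i].in_use = 1;
            shelf[i].times_borrowed++;  /* depends on the old contents */
            return &shelf[i];
        }
    }
    return NULL;                        /* every object is lent out */
}

/* Plays the role of kfree: the object keeps its identity and contents. */
void book_return(struct book *b)
{
    assert(b != NULL && b->in_use);
    b->in_use = 0;
}
```

If book_return marked the object as UNDEFINED in the manner of free, the increment in book_borrow would operate on bits that memcheck considers uninitialized, even though the code is behaving as designed.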
One of memcheck's advantages is its low rate of “false positive” complaints about uninitialized bits. Memcheck silently allows fetch, store, and arithmetic on uninitialized data. This covers “holes” for alignment or padding in structures, as well as a multitude of “don't care” conditions. Memcheck complains only when uninitialized bits affect flow of control, input/output, or array indexing. Thus a complaint from memcheck nearly always represents a real bug in user code, and this certainty builds trust with its users. However, responding to a report from memcheck can be difficult because the origin of the problem may be long ago and far away from the complaint. This difficulty is exacerbated by the kernel's general usage model: caches of objects which accumulate history.
An experimental mode has been added to memcheck: “complain immediately upon load of uninitialized bits.” This pinpoints problems much more quickly, at the cost of many more false positive complaints due to uninitialized “holes” in structures and bitfields, uninitialized “don't care” bits that are ignored later, and various optimizations by gcc (such as over-fetching: on i686 it is less expensive to fetch 4 bytes even though the data size is short.) Therefore palliatives are needed: use glibc-audit (http://bitwagon.com/glibc-audit/glibc-audit.html), which fixes the “intentional” uninitialized bits in glibc; remove holes in structures; and analyze known cases and write rules to suppress their complaints.
The initial implementation of a commandline option “--complain-asap=yes” for memcheck is contained in valgrind-3.3.0-2007-12-27.patch.gz and also is available as a separate patch to valgrind-3.3.0, independent of the valgrind+UML work:
10KB http://bitwagon.com/valgrind+uml/mc_main-asap.patch
A glibc that has been audited to be “quiet” with respect to memcheck is described at:
http://bitwagon.com/glibc-audit/glibc-audit.html
One important pending modification for “complain ASAP” mode is to be silent when reloading saved registers (%ebx, %esi, %edi) upon subroutine exit, even if the saved bits might be uninitialized.
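The structure “holes” that cause false positives in a complain-on-load mode are easy to demonstrate. The sketch below uses hypothetical names (struct with_hole, hole_bytes, copy_all_bytes) and assumes only standard C. It measures the padding that the compiler inserts, and performs a whole-structure copy that necessarily touches those never-written bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct with_hole {
    char tag;       /* 1 byte */
    int  value;     /* usually aligned to 4 bytes, leaving a hole after tag */
};

/* Number of padding bytes between tag and value on this ABI. */
size_t hole_bytes(void)
{
    return offsetof(struct with_hole, value)
         - (offsetof(struct with_hole, tag) + sizeof(char));
}

/* Copy every byte of the structure, including any padding hole that the
 * program never wrote.  The copy is harmless, but it loads those bytes. */
struct with_hole copy_all_bytes(const struct with_hole *src)
{
    struct with_hole dst;
    assert(src != NULL);
    memcpy(&dst, src, sizeof dst);
    return dst;
}
```

Default memcheck stays silent on such a copy because the hole never influences control flow, input/output, or indexing. A mode that complains at load time could report it, depending on how the loads are emitted, which is why removing holes or writing suppression rules becomes necessary.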
Today, the work uses valgrind-3.3.0 and uml-2.6.22.5. The underlying Valgrind is only a couple weeks old, but UML currently is at 2.6.23 and soon will be 2.6.24. The UML user community is thin and sparse; testing coverage suffers. Immediately jumping to a new version is not always a good idea. For example: a patch to make SKAS3 mode work properly on uml-2.6.23 was not released until 2007-12-08, which was two months after the release of linux-2.6.23.1. The coming merge of x86 and x86_64 architectures in linux-2.6.24 may cause unknown problems for the integration of Valgrind and UML. Nevertheless, uml-2.6.22.5 now is four months old. Work should transition quickly to the relatively recent 2.6.22.15 (Dec.14), and then attempt to track current state as closely as possible.
The patches for running the combination Valgrind+UML are not yet robust. Signal handling and timing-dependent races may well reveal or cause problems. Initial hand-holding and some ongoing support probably will be necessary or appropriate.