tunelfso -- faster exec with shared libraries on Linux/x86

Would you like to get 15, 20, or even 30 minutes more useful work out of your CPU each day?  If the machine runs many ELF-format executables that use shared libraries, such as when compiling large source trees (Linux kernel, glibc, X11, ...) or traversing file systems with shell scripts and utilities, then a surprisingly large amount of time is spent relocating symbolic references related to shared libraries.  The POSIX requirement that an application can override an individual subroutine within a shared library just by defining a visible subroutine with the same name, together with its implementation via "weak" symbols and module search order rules, costs a lot even though 99% of the time the override is not used.

How much does it cost?  Look at the stderr output of

    LD_DEBUG=symbols /bin/date   # /bin/sh or compatible

or

    setenv LD_DEBUG symbols; /bin/date; unsetenv LD_DEBUG   # C shell or compatible

to see that running the date command requires about 860 runtime symbol table lookups under Linux 2.2/glibc 2.1!  Processes with more subroutines or more shared libraries take even longer.  If a symbol table lookup averages 11 microseconds or so (3300 cycles at 300 MHz; remember that a cache miss can cost 20 cycles or more, and a page fault starts at 10,000 cycles), then 860 lookups take almost 0.01 seconds.  At 100,000 execs per day, that is about 16 minutes.

Notice also that most of the symbols looked up begin with one underscore '_' or two '__'.  Those are reserved names that an application writer is not supposed to touch without special dispensation, so looking up such a symbol almost always resolves it within the same shared library.  The answer could be computed once and for all before any exec, then relocated anonymously at runtime by adding a numerical constant (the library's load address), instead of being resolved by name.

There are cases where dynamic overriding helps or is necessary.  Its use is inherent in the glibc2 implementation of localization (character classification, etc.), in the global communication of shell environment variables between application and runtime system, and in the magic that connects stdin/stdout/stderr everywhere.  Also, many of the advantageous features of LD_PRELOAD would not be possible without runtime overriding.  Still, most actual uses of LD_PRELOAD override a small, known-in-advance set of routines, such as the memory allocation routines malloc and free.

The tunelfso package lets you customize an existing ELF shared library by substituting anonymous relocation for named relocation on a symbol-by-symbol basis, and helps you cope with large numbers of symbols and decide which ones to substitute.  A binary utility program rewrites the relocations, using extended regular expressions to specify which symbols to include in, and exclude from, the change.  A shell script determines which symbols can be changed without functional effect for a given shared library used by a given set of applications.  You are responsible for picking a representative set of applications.

Applying tunelfso to /lib/ld-linux.so and /lib/libc.so.6 reduced the 860 lookups for /bin/date to about 440, and every application that was tried still ran, including a well-known browser, graphics system, and window manager.  Still, tunelfso is for EXPERTS ONLY.  THERE IS NO WARRANTY OF ANY KIND.

WARNING: IF THERE IS A BUG, OR IF YOU MAKE A MISTAKE, OR IF SOMETHING GOES WRONG WHEN REPLACING EITHER OF THESE TWO CRITICAL FILES, THEN YOU MIGHT NOT BE ABLE TO RUN PROGRAMS OR BOOT YOUR SYSTEM AT ALL.  YOU MUST HAVE A TESTED, WORKING ALTERNATE BOOT METHOD (SUCH AS ANOTHER ROOT PARTITION, OR A BOOT+ROOT "RESCUE" FLOPPY DISK) THAT ALLOWS YOU TO FIX PROBLEMS AND/OR RESTORE THE PREVIOUS STATE.  BACK UP THE ROOT PARTITION BEFORE YOU TRY.

tunelfso is for Linux 2.x on x86, is distributed in source form under GNU GPLv2, and is available at http://www.BitWagon.com/tunelfso-0.1.tgz  (4.1 Kbytes).