Elfvector is a package for generating and using a transfer vector for subroutine linkage between an ELF executable and an ELF shared library under Linux on x86, in order to save space and application startup time.
In an ELF main executable, a call to a subroutine in a shared library requires 16 bytes in the dynamic symbol table .dynsym, 16 bytes in the program linkage table .plt, 8 bytes in the relocation table .rel.plt, about 7 bytes in .hash (4 bytes of chain, plus about 4*(2/3) bytes of bucket), and the length of its name in .dynstr. Total: over 50 bytes in the executable to support just one function. In the shared library, the target function again requires 16 bytes in .dynsym, about 7 bytes in .hash, and the length of its name in .dynstr. If that function is also called from another source file that was linked into the shared library, then there are a further 16 bytes of .plt and 8 bytes of .rel.plt. Total: another 50 bytes per function. On top of that, the runtime dynamic linker has to hook together the linkages upon each invocation.
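The per-function arithmetic above can be tallied with a short shell sketch. The function names exe_cost and lib_cost are made up for illustration and are not part of Elfvector; the byte counts are the ones quoted in the text, and the name is counted by its string length only.

```shell
#!/bin/sh
# Sketch only: tallies the per-function byte counts quoted in the text.
# The function names are illustrative and not part of Elfvector.

# Bytes used in the main executable for one imported function.
exe_cost() {
    name=$1
    # .dynsym 16 + .plt 16 + .rel.plt 8 + .hash about 7 + name in .dynstr
    echo $(( 16 + 16 + 8 + 7 + ${#name} ))
}

# Bytes used in the shared library for one exported function.
# Second argument: 1 if the function is also called from another source
# file inside the library (adds .plt and .rel.plt), else 0.
lib_cost() {
    name=$1
    internal=${2:-0}
    cost=$(( 16 + 7 + ${#name} ))
    if [ "$internal" -eq 1 ]; then
        cost=$(( cost + 16 + 8 ))
    fi
    echo "$cost"
}

exe_cost printf      # prints 53
lib_cost printf 1    # prints 53
```

For a six-character name such as printf, both sides come to 53 bytes, which matches the "over 50 bytes" estimate.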
All that space and time is used to enable the executable to override symbols in the shared library on a symbol-by-symbol basis, just as if the application had been linked with an archive library (*.a) instead. It also enables changing either the executable or the shared library (or both) with the least detrimental impact on each other. However, most applications treat a shared library as an integrated package of subroutines. Only very few applications use the ability to override symbols. And some system administrators (such as those with applications on an embedded system) are willing to use additional care to coordinate changes between applications and shared libraries.
In a shared library, Elfvector requires 4 bytes per function, plus 16 bytes of overhead, which is a per-function saving of 43 bytes plus name length. In a main program, Elfvector uses 12 bytes per target function, plus 5 bytes per group of up to 33 functions, plus 56 bytes to relocate the addresses at runtime, which is a per-function saving of 35 bytes plus name length. The execution cost is one relative jmp per call, which is faster than the standard PLT linkage with its indirect jmp. The first call relocates the table of used pointers many times faster than PLT relocation. Elfvector is thread safe.
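The Elfvector figures can be turned into a size estimate in the same way. This is a sketch under stated assumptions: the function names are hypothetical, and the import estimate assumes that all n functions come from a single export vector, so that the number of groups is n/33 rounded up.

```shell
#!/bin/sh
# Sketch only: byte counts for Elfvector linkage, from the figures in the text.
# Function names are illustrative. Assumes all n functions come from one
# export vector, so the number of groups is ceil(n / 33).

# Export vector size in the shared library: 4 bytes per function + 16 overhead.
export_vector_bytes() {
    echo $(( 4 * $1 + 16 ))
}

# Import vector size in the main program: 12 bytes per function,
# 5 bytes per group of up to 33 functions, 56 bytes of relocation code.
import_vector_bytes() {
    n=$1
    groups=$(( (n + 32) / 33 ))
    echo $(( 12 * n + 5 * groups + 56 ))
}

export_vector_bytes 100    # prints 416
import_vector_bytes 100    # prints 1276
```

By comparison, 100 functions at 47 bytes each under standard linkage would need 4700 bytes on each side, before counting the names.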
Elfvector contains five main pieces:
expvec
constructs an export vector for a shared library,
impvec
constructs an import vector for an application,
retrovec
squeezes an export vector into an already built shared library,
retroimp
squeezes an import vector into an already built application; and
vectool
manages the use of expvec and impvec to create and maintain a group of shared libraries with non-overlapping fixed addresses (to reduce startup time of KDE and its applications, for instance).
In general, the export vector built by expvec contains all the global functions of the library, the import vector contains just the global functions that are actually referenced by the application, and a retrovector contains a subset of frequently used global functions. Both an export vector and an import vector can be tailored to include or exclude functions by using regular expression matches against function names.
expvec [ --rebuild ] incl excl vscript loader_args... > export.vec 3> export.vec.o
incl
specifies which symbols to include in the vector; commonly . for all symbols
excl
specifies which symbols to exclude from the vector; often '' for no symbols
vscript
a version script to control the symbol visibility of the output shared library; often '' for no script
loader_args...
command line arguments to build the shared library (must contain -o output)
export.vec
text file with name of vector and order of symbols
export.vec.o
binary compiled version of export.vec
If the first character of the incl or excl argument is an at-sign @, then the rest of the argument is a pathname to a file containing regular expressions which are matched against symbol names. If the first character is not @, then the argument is a literal list of regular expressions, for convenience in small cases, or in expert shell usage. Regular expressions follow <regex.h>, and are separated by white space. Order of regular expressions matters in incl, but does not matter for excl. Comments beginning with sharp sign # cause the rest of the line to be ignored.
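The rules for the incl and excl arguments can be sketched as a small shell function. The name expand_patterns is hypothetical, and expvec's own parsing may differ in detail; in particular, this sketch strips # comments from literal arguments as well as from files.

```shell
#!/bin/sh
# Sketch only: expand an incl or excl argument into one regular expression
# per line, following the rules described above. The function name
# expand_patterns is illustrative and not part of Elfvector.
expand_patterns() {
    arg=$1
    case $arg in
        @*) cat "${arg#@}" ;;          # @path: read the patterns from a file
        *)  printf '%s\n' "$arg" ;;    # otherwise: a literal pattern list
    esac |
    sed 's/#.*//' |                    # drop comments to end of line
    tr -s ' \t' '\n\n' |               # split on white space
    sed '/^$/d'                        # drop empty lines
}

expand_patterns '^free$ ^malloc$'
```

Keeping one pattern per line preserves the order of the patterns, which matters for incl.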
Output file export.vec begins with the name of the transfer vector, which starts with __V_ and is based on the .soname of the shared library. The four characters __V_ also serve as a magic number in bytes 0 through 3 of export.vec. As generated by expvec, the rest of export.vec contains one symbol per line, giving the actual order in the vector. The user may edit export.vec to add comments, but the __V_ vector name must come first in the file, and adding, deleting, modifying, or re-arranging any symbols will cause errors in later use. Output file export.vec.o is the compiled version of the export vector. This is useful when reloading the library without making changes to the export vector, such as when fixing a bug in one of the functions contained in the library.
Optional flag --rebuild facilitates making a new vector which is backward compatible with an old vector. --rebuild causes the patterns in incl to be interpreted as whole literal symbol names, which must match exactly (strcmp instead of regexec). To maintain compatibility with existing uses of the old export vector, the new incl should begin with the entire contents of the old export vector. expvec still generates a new export vector, which will contain all non-excluded new functions at the end. The simplest usage would be --rebuild @copy_of_old_export.vec, which rebuilds export.vec.o, for instance to recover from deleting *.o. Remember to use a copy of the old export vector file (and not the original) when redirecting stdout onto the same filename.
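One way to check a rebuilt vector is to confirm that the old symbol list is a prefix of the new one. The following is a minimal sketch, not part of Elfvector: the function names are hypothetical, and it assumes that comments in export.vec begin with #, as they do in incl and excl files.

```shell
#!/bin/sh
# Sketch only: check that a new export.vec keeps every slot of an old one,
# that is, the old symbol list is a prefix of the new list. Function names
# are illustrative. Assumes '#' starts a comment, as in incl/excl files.

# Print the vector name and symbols, without comments or blank lines.
vec_symbols() {
    sed -e 's/#.*//' -e 's/[[:space:]]*$//' -e '/^$/d' "$1"
}

# usage: is_backward_compatible old_export.vec new_export.vec
is_backward_compatible() {
    old=$1
    new=$2
    # The file must begin with the magic characters __V_ in bytes 0 through 3.
    [ "$(head -c 4 "$new")" = "__V_" ] || return 1
    n=$(( $(vec_symbols "$old" | wc -l) ))
    [ "$(vec_symbols "$old")" = "$(vec_symbols "$new" | head -n "$n")" ]
}
```

A typical use would be is_backward_compatible copy_of_old_export.vec export.vec, run right after expvec --rebuild.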
The loader_args... are the command arguments to build the shared library. They must contain -o output_filename so that expvec can identify the shared library file.
expvec subinvokes loader_args, processes the symbols to construct the assembly language source code for the vector, subinvokes gcc to assemble the vector, then subinvokes loader_args again (modified) to build the final shared library. The vscript argument is the version script to use on the final load; it replaces any -Wl,-version-script= piece of loader_args. A null vscript argument ("" or '') is ignored, leaving the version script (if there is one) unchanged.
For example, consider an invocation of expvec whose arguments are @foo.order '' vscript gcc -shared -Wl,-soname=foo.so.5 -o foo.so.5 foo.o, with standard output redirected to foo.vec. Here the input file foo.order contains the regular expressions which determine the contents and order of symbols in the export vector. The explicit null argument '' says that no symbols are to be excluded from the vector. Input file vscript is the version script for the final load. Output file foo.vec will contain the name of the export vector and the actual symbol order. The part
gcc -shared -Wl,-soname=foo.so.5 -o foo.so.5 foo.o
is a command line to build the shared library with .soname foo.so.5 as output file foo.so.5, using input file foo.o.
impvec xclude export.vec... loader_args... > import.s
xclude
specifies symbols to exclude from the import vector; often '' to exclude no symbols
export.vec...
pathnames to expvec .vec files
loader_args...
arguments to load the application, using the shared libraries built by expvec
import.s
assembly-language source for import vector
impvec subinvokes loader_args... with -Wl,--noinhibit-exec. Then impvec subinvokes ldd on the executable to determine the shared libraries, matches undefined symbols in the executable with symbols from the export.vec files, and generates code for the import vector. To complete the construction of a runnable application: assemble the import vector, and load it into the application.
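The matching step can be illustrated with a sketch that pairs a list of undefined symbols with the symbols of one export vector. This is not impvec's own code: the function name is hypothetical, slot numbers are assumed to start at 0 for the first symbol after the vector name, and the list of undefined symbols is assumed to be non-empty, one name per line.

```shell
#!/bin/sh
# Sketch only: report which undefined symbols are satisfied by an export
# vector, and the slot of each. Names are illustrative; impvec itself does
# more (ldd lookup, exclusions, code generation).
# Assumptions: slots are numbered from 0, and '#' starts a comment.
# usage: match_imports undefined.txt export.vec
match_imports() {
    awk '
        BEGIN { slot = 0 }
        NR == FNR { wanted[$1] = 1; next }   # first file: undefined symbols
        { sub(/#.*/, "") }                   # second file: strip comments
        NF == 0 { next }
        vec == "" { vec = $1; next }         # first entry is the __V_ name
        { if ($1 in wanted) print vec, slot, $1; slot++ }
    ' "$1" "$2"
}
```

Each output line gives the vector name, the slot, and the symbol; symbols that no vector satisfies stay with the ordinary dynamic linkage.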
For example, consider an invocation of impvec whose arguments are '^free$ ^malloc$ ^calloc$ ^realloc$' foo.vec gcc -o testmain testmain.o foo.so.5, with standard output redirected to foo_imp.s. The quoted first argument
'^free$ ^malloc$ ^calloc$ ^realloc$'
excludes those functions from the import vector, so that they can be manipulated ("wrapped") in the future using LD_PRELOAD.
The input file foo.vec is the export vector built by expvec, and output file foo_imp.s is the code for the import vector. The part
gcc -o testmain testmain.o foo.so.5
is the ordinary recipe to construct the testmain executable. Shared library foo.so.5 must be found in a directory named by environment variable LD_LIBRARY_PATH, so that impvec can subinvoke ldd to determine the shared libraries that are used.
retrovec export.vec foo.so > vecfoo.so
export.vec
export vector file
foo.so
existing shared library
vecfoo.so
new shared library with export vector
The export.vec argument must be a vector file containing first the vector name and then the literal names of functions. No regular expression support is available. A subset must be used, and the care necessary to construct a literal list helps to avoid surprises later. Use the same export.vec file in subsequent runs of impvec.
WARNING: ANY BUG OR MISTAKE IN REPLACING /lib/libc.so.* CAN MAKE YOUR SYSTEM UNBOOTABLE, OR CAUSE THE LOSS OF DATA. DO NOT ATTEMPT TO REPLACE /lib/libc.so.* UNLESS YOU HAVE A TESTED, SUCCESSFUL, ALTERNATE ROOT FILESYSTEM, SUCH AS A BOOT _AND_ ROOT BOOTABLE FLOPPY DISK, TO USE FOR RECOVERY. BACK UP THE ROOT FILESYSTEM BEFORE YOU START. And, of course, conduct preliminary tests using LD_LIBRARY_PATH and/or chroot.
The .hash section of an ELF module uses separate chaining to search the runtime dynamic symbol table. Reducing the number of chains by 10% probably will have only a small impact on searches. The utility program readelf from the GNU binutils package prints a histogram of the length of the chains (option --histogram).
Here's one way to help decide which functions to include. The shell script
#!/bin/sh
# List the undefined dynamic symbols of the dynamically linked executables
# among the arguments, most frequently referenced first.
nm -u --dynamic ` file "$@" | sed -n -e '/dynamically/s/:.*//p' ` |
  sed '/:$/d' |
  sort | uniq -c | sort -rn
determines which argument files are executables that use shared libraries, extracts their undefined dynamic symbols, and lists such symbols by frequency of reference. One possibility for glibc is to run the script on /bin/* /usr/bin/* and keep only the 100 or so most frequently referenced functions. This will tend to save the most space and time for the cost of reducing the number of chains by 100.
retroimp xclude export.vec... a.elf > new_a.elf
xclude
specifies symbols to exclude from the import vector
export.vec...
pathnames to expvec .vec files
a.elf
already linked application
new_a.elf
modified application which uses the vectors
retroimp rewrites the Program Linkage Table (PLT) of the application, and takes a small amount of space from the .hash table for the names of the vectors that are used. retroimp has a limit of no more than 31 used export.vec files. retroimp changes the e_entry point so that the rewritten PLT entries are all relocated at runtime, before transferring control to the original entry point. The relocation code requires that there be at least 13 + 5 * nUsedVectors slots in the PLT that have been rewritten.
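The two numeric limits on retroimp can be checked with a few lines of shell. This is only a sketch of the arithmetic stated above; the function names are hypothetical and retroimp performs its own checks.

```shell
#!/bin/sh
# Sketch only: the two numeric limits stated above for retroimp.
# Function names are illustrative and not part of Elfvector.

# Minimum number of rewritten PLT slots needed by the relocation code.
min_plt_slots() {
    echo $(( 13 + 5 * $1 ))
}

# Succeeds if an application that uses n_vectors export vectors and has
# n_rewritten rewritten PLT slots satisfies both limits.
retroimp_fits() {
    n_vectors=$1
    n_rewritten=$2
    [ "$n_vectors" -le 31 ] &&
    [ "$n_rewritten" -ge "$(min_plt_slots "$n_vectors")" ]
}

min_plt_slots 2    # prints 23
```

So an application that uses two export vectors needs at least 23 PLT entries that retroimp can rewrite.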
vectool --mode=[ lib | prog ] --addrfile=file --rpathdir=dir --exclexp="patterns" --inclrel="patterns" --exclrel="patterns" loader_args...
--mode=lib
for building a library
--mode=prog
for building a main program
--addrfile=file
file containing the assignments of fixed address for each .soname
--rpathdir=dir
directory to hold symbolic links to libraries and export vectors, when building multiple libraries before installing them into their final destinations
--exclexp=
regular expressions of symbol names to exclude from the export vector; default '^__V_ ^_init$ ^_fini$'
--inclrel=
regular expressions of symbol names to include in relvec processing; default '.' [matches every symbol]
--exclrel=
regular expressions of symbol names to exclude from relvec processing; default '^__ti ^__environ$ ^__ctype_'
loader_args...
command line arguments to build the shared library (must contain -o output and -Wl,-soname)
vectool manages the creation and maintenance of a group of shared libraries with fixed addresses, to speed process startup by reducing the burden of runtime relocation. Even though expvec and impvec already reduce the relocations for external linkage, some shared libraries can still contain many relocations, for instance from the tables which g++ uses to support virtual functions in C++. By using a fixed base address, these relocations can be performed by static binding instead of at runtime. Then the problem becomes creating and managing the assignment of fixed addresses; vectool can help.
The vectool methodology is as follows. Use expvec to build a normal, trial version of the output shared library with variable runtime base address, and construct the export vector. Use ldd to find the dependencies on other libraries. If some unfound dependency lives at a fixed address, then there could be trouble at runtime due to overlap of assigned addresses. Construct an import vector using the export vectors (if any) of the NEEDED first-level dependencies, as revealed by objdump -p.
See how much address space this library uses now, and calculate the fixed base address. Find the upper bound from the minimum of the standard ELF main (0x08048000) and all other libraries with fixed base addresses, whether or not this library depends on them. (The _next_ module could depend on any subset of the libraries, so no overlap is allowed.) If there is a base address from last time, then see if the library still fits between it and the upper bound. If there is no old base, then calculate a base which allows some room for expansion (33%, plus a few pages), but otherwise abuts the upper bound. Store the base in a loader script and in the file of assigned addresses.
Then build a shared library with fixed addresses using the loader script. Finalize some of the relocations (in particular, the relocations for the vector itself, and R_386_RELATIVE relocations) using relvec. After installing the library in the final destination directory, remove the construction directories from DT_RPATH by using rpathrm.
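The base address calculation described above can be sketched in shell arithmetic. This is an illustration and not vectool's own code: the function names are hypothetical, the page size is assumed to be 4096 bytes, "a few pages" is taken to be 4 pages, and the result is rounded down to a page boundary.

```shell
#!/bin/sh
# Sketch only: pick a fixed base address below an upper bound, as in the
# methodology above. Names are illustrative. Assumptions: 4096-byte pages,
# and "a few pages" taken as 4 pages.
PAGE=4096
EXTRA_PAGES=4

# usage: upper_bound addr...   (decimal bases of the other fixed libraries)
# Prints the minimum of those and the standard ELF main address 0x08048000.
upper_bound() {
    min=134512640                      # 0x08048000
    for a in "$@"; do
        [ "$a" -lt "$min" ] && min=$a
    done
    echo "$min"
}

# usage: choose_base upper size   (both in decimal bytes)
choose_base() {
    upper=$1
    size=$2
    room=$(( size + size / 3 + EXTRA_PAGES * PAGE ))   # size + 33% + a few pages
    base=$(( upper - room ))
    echo $(( base / PAGE * PAGE ))                     # round down to a page
}

# usage: still_fits old_base size upper
# Succeeds if the library still fits between its old base and the upper bound.
still_fits() {
    [ $(( $1 + $2 )) -le "$3" ]
}
```

For instance, choose_base "$(upper_bound 132120576)" 1228800 yields a base just below libc.so.6 at 0x07e00000, with room left for the library to grow by a third.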
Although a library built by vectool lives at a fixed address at runtime, no other module (library or main executable) knows what that address is until it is instantiated in a process. Thus the address assignment can be changed arbitrarily by running vectool again. The only constraint is choosing non-overlapping address assignments, and vectool automates the initial choice and maintenance.
The fixed_addrs file contains one line per library, with the address first and the .soname second. The address must be in decimal because bash versions prior to 2.05 (April 2001) do not understand hexadecimal constants. The addresses must be in decreasing order. A sharp sign # introduces a comment to the end-of-line. vectool appends one line to the file when there is no previous address assignment for the .soname.
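A file in this format can be checked mechanically before handing it to vectool. The following sketch is not part of Elfvector, and the function name check_addrfile is hypothetical; it tests only the two rules stated above, decimal addresses and decreasing order.

```shell
#!/bin/sh
# Sketch only: check a fixed-address file against the format described
# above. The function name is illustrative, not part of vectool.
# Fails if an address is not decimal or the addresses are not decreasing.
check_addrfile() {
    awk '
        BEGIN { bad = 0; seen = 0 }
        { sub(/#.*/, "") }                 # comments run to end of line
        NF == 0 { next }
        $1 !~ /^[0-9]+$/ { print "line " NR ": address not decimal"; bad = 1; next }
        seen && $1 + 0 >= prev { print "line " NR ": not in decreasing order"; bad = 1 }
        { prev = $1 + 0; seen = 1 }
        END { exit bad }
    ' "$1"
}

# A hexadecimal address can be converted for the file with printf:
printf '%d\n' 0x07e00000    # prints 132120576
```

Keeping the hexadecimal form in a trailing comment, as in the libc.so.6 example, makes the file easier to compare with linker maps.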
As long as each library fits within its existing assignment, then vectool handles normal development: recompilation, bug fixing, adding new functions. Deleting a function from a vector is detected as an error, because some client may be relying on the slot number, or any larger slot number. (If you know that this is not the case, then edit the export vector by hand before invoking vectool.) When a library no longer fits (vectool detects this situation), then manual intervention is required. The easiest fix is to delete the assignment from fixed_addrs, and also delete the .lds loader script file. Then rerun vectool, which will assign a new address lower than all existing addresses. This will leave a "hole" at the old base address, but the hole is also more room to expand for the library at the next lower address. An alternate fix is to take enough unused expansion space from the library at the next lower address, edit the fixed_addrs file and the .lds loader script, then rerun vectool.
glibc-2.x (libc.so.6): first build with the usual procedure
such as rpm --recompile libc*.src.rpm (or cd glibc-build; make), and save
the transcript output in order to capture the command which builds
libc.so. Also preserve the .lds loader script.
Initialize the fixed address list (here /usr/lib/SHLIB_ADDRS) with
132120576 libc.so.6 # 0x07e00000
then run
vectool --mode=lib --addrfile=/usr/lib/SHLIB_ADDRS --rpathdir=/usr/lib/RPATHDIR \
  --exclexp='^__V_ ^_init$ ^_fini$ ^free$ ^cfree$ ^malloc$ ^calloc$ ^realloc$ ^memalign$ pthread_ ^__strtoq_internal$ ^__strtouq_internal$ ^res_init$ ^atexit$' \
  --inclrel='^__ ^_[A-Z]' gcc -shared ...
Here atexit, res_init, __strtou*q_internal are excluded because of versioned symbol conflicts; free/malloc/etc because of frequent LD_PRELOAD for allocation debugging, and pthread_ because of intentional internal overriding. The symbols for --inclrel= are the ones which ISO C89 reserves for library implementors. They should never be overridden, whereas many other symbols can be.
XFree86 4.x: Again, build the usual way first. Then in file xc/config/cf/lnxLib.rules, change every line containing SHLIBLDFLAGS by adding vectool --mode=lib --addrfile=/usr/lib/SHLIB_ADDRS --rpathdir=/usr/lib/RPATHDIR before $(CC). Remove all output shared library files from the first build. Then go to directory xc, issue make -f xmakefile -w Makefiles, and finally go to directory xc/lib and perform make.
libqt: In qt-2.3.0/src/Makefile, change to SYSCONF_LINK_SHLIB = vectool --mode=lib --addrfile=/usr/lib/SHLIB_ADDRS --rpathdir=/usr/lib/RPATHDIR g++
KDE: Still experimental. Most kdelibs-2.2 libraries use functions from libqt but do not mention -lqt when built, and this is a problem. Also see the Notes; several C++ vs. linker issues remain.
The use of relvec is still subject to trial and error. (Comment out that line of the vectool script to avoid this.) For instance, even with only libc.so.6 being processed by relvec, one of the early commands in the startkde script crashes in nl_langinfo(). But if just one R_386_RELATIVE relocation persists in libc.so.6, then it works. I do not understand this.
Any Common block or multiply-defined function which depends on the C++ one definition rule must be excluded from relvec processing. relvec is equivalent to a partial -Bsymbolic. In fact, using -Bsymbolic would be a good strategy except that it forces the definition of Common blocks, and labels the DYNAMIC section with DT_SYMBOLIC. A utility to erase DT_SYMBOLIC would be easy, but forcing the postponement of the definition of Common blocks is something that the GNU ld linker cannot do under -Bsymbolic. The use of relvec speeds runtime invocation, but the present implementation does not save any space.
For greatest speed benefit, the programmer, C++ language, and linker must get together and cooperate to reduce the costs that are presently mandated by the combination of intentional multiple definitions (especially runtime type info rtti, and inline functions with static data), "accidental" function overriding, and the conflict between "shared library of independent functions" versus "dynamic shared object which implements an integrated module with specified external programming interface." It is a goal of elfvector and vectool to demonstrate by example how much the traditional shared library approach costs in time, space, and complexity.
Elfvector doesn't use the bfd package because it was faster to code from scratch than to understand the bfd documentation as of libbfd-2.9.1.0.23. Also, I could fix my own bugs faster than the bugs in bfd. For instance, -Wl,-noinhibit-exec puts undefined symbols only in the .symtab table, and not in the .dynsym table (bug gcc-queue/1644 at http://www-gnats.gnu.org:8080/cgi-bin/wwwgnats.pl). Also, objdump --section-headers a.out frequently omits .symtab and .strtab even though they are there.
Note that the final vscript for expvec can be quite different from the version script used in loader_args. For instance, many of the symbols in the export vector can be declared local in vscript. Some symbols can be both global and in the vector.
Probably impvec should do the assembly and final load, too. [?]
expvec could take a flag argument to exclude symbols based on type (.bss, .data, absolute, weak, etc.)
impvec ought to accept an LD_LIBRARY_PATH= argument to set the environment (like the Bourne shell, at the beginning of loader_args...) when subinvoking its builds.
Investigate why adding -Wl,-Bsymbolic doesn't save more space in the build of glibc-2.1.3. Why is the .plt still so big?
Remember to put ^ and $ in regular expressions: ^printf$
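A quick experiment shows why the anchors matter. In this sketch the symbol list is made up for illustration, and grep -E stands in for the <regex.h> matching done by the tools.

```shell
#!/bin/sh
# Sketch only: why the anchors matter. The symbol list is made up for
# illustration; grep -E stands in for the <regex.h> matching of the tools.
symbols='printf
sprintf
vprintf
printf_size'

printf '%s\n' "$symbols" | grep -E 'printf'      # matches all four names
printf '%s\n' "$symbols" | grep -E '^printf$'    # matches only printf
```

Without the anchors, a pattern meant to exclude one function silently excludes every function whose name contains it.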
For use on glibc, you probably want to exclude ^malloc$ ^calloc$ ^free$ ^_IO_ etc. ld-linux.so.2 depends on libc's malloc, for instance.