|  | The design of crazy_linker: | 
|  | =========================== | 
|  |  | 
|  | Introduction: | 
|  | ------------- | 
|  |  | 
|  | A system linker (e.g. ld.so on Linux, or /system/bin/linker on Android) is a | 
|  | particularly sophisticated piece of code because it is used to load and start | 
|  | _executables_ on the system. This requires dealing with really low-level | 
|  | details like: | 
|  |  | 
|  | - The way the kernel loads and initializes binaries into a new process. | 
|  |  | 
|  | - The way it passes initialization data (e.g. command-line arguments) to | 
|  | the process being launched. | 
|  |  | 
|  | - Setting up the C runtime library, thread-local storage, and other state | 
|  | properly before calling main(). | 
|  |  | 
|  | - Operating very carefully, because the linker is also used to load set-uid | 
|  | programs. | 
|  |  | 
|  | - Supporting a flurry of exotic flags and environment variables that affect | 
|  | runtime behaviour in "interesting" but mostly unpredictable ways | 
|  | (see the manpages for dlopen, dlsym and ld.so for details). | 
|  |  | 
|  | On top of this, most of the work must be done before the C library is loaded | 
|  | or initialized. No wonder this code is really complex. | 
|  |  | 
|  | By contrast, crazy_linker is a static library whose only purpose is to load | 
|  | ELF shared libraries, inside an _existing_ executable process. This makes it | 
|  | considerably simpler: | 
|  |  | 
|  | - The runtime environment (C library, libstdc++) is available and properly | 
|  | initialized. | 
|  |  | 
|  | - No need to care about kernel interfaces. Everything uses mmap() and simple | 
|  | file accesses. | 
|  |  | 
|  | - The API is simple and straightforward (no hidden behaviour changes due to | 
|  | environment variables). | 
|  |  | 
|  | This document explains how the crazy_linker works. A good understanding of the | 
|  | ELF file format is recommended, though not necessary. | 
|  |  | 
|  |  | 
|  | I. ELF Loading Basics: | 
|  | ---------------------- | 
|  |  | 
|  | When it comes to loading shared libraries, an ELF file consists mainly of the | 
|  | following parts: | 
|  |  | 
|  | - A fixed-size header that identifies the file as an ELF file and gives | 
|  | offsets/sizes to other tables. | 
|  |  | 
|  | - A table (called the "program header table"), containing entries describing | 
|  | 'segments' of interest in the ELF file. | 
|  |  | 
|  | - A table (called the "dynamic table"), containing entries describing | 
|  | properties of the ELF library. The most interesting entries list the | 
|  | libraries the current one depends on. | 
|  |  | 
|  | - A table describing the symbols (function or global names) that the library | 
|  | references or exports. | 
|  |  | 
|  | - One or more tables containing 'relocations'. Because libraries can be loaded | 
|  | at any page-aligned address in memory, numerical pointers they contain must | 
|  | be adjusted after load. That's what the relocation entries do. They can | 
|  | also reference symbols to be found in other libraries. | 
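
The header and the dynamic table described above can be inspected with a few lines of code. The following is a minimal sketch, not crazy_linker's own code: it assumes a 64-bit ELF file, the types and constants from the Linux <elf.h> header, and a dynamic string table that has already been located. The function names IsElf64SharedObject() and NeededLibraries() are made up for this illustration, and the sketch uses std::vector and std::string for brevity.

```cpp
#include <elf.h>

#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <vector>

// Returns true if |header| starts with the ELF magic bytes and describes a
// 64-bit shared object (ET_DYN). Hypothetical helper for illustration.
bool IsElf64SharedObject(const Elf64_Ehdr& header) {
  return std::memcmp(header.e_ident, ELFMAG, SELFMAG) == 0 &&
         header.e_ident[EI_CLASS] == ELFCLASS64 &&
         header.e_type == ET_DYN;
}

// Walks a dynamic table (terminated by a DT_NULL entry) and returns the names
// of the libraries it depends on. Each DT_NEEDED value is an offset into the
// dynamic string table |strtab|.
std::vector<std::string> NeededLibraries(const Elf64_Dyn* dynamic,
                                         const char* strtab) {
  assert(dynamic != nullptr && strtab != nullptr);
  std::vector<std::string> names;
  for (const Elf64_Dyn* entry = dynamic; entry->d_tag != DT_NULL; ++entry) {
    if (entry->d_tag == DT_NEEDED)
      names.emplace_back(strtab + entry->d_un.d_val);
  }
  return names;
}
```

A caller would typically read the header first, reject the file if IsElf64SharedObject() returns false, and only then look at the other tables.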
|  |  | 
|  | The process of loading a given ELF shared library can be decomposed into 4 steps: | 
|  |  | 
|  | 1) Map loadable segments into memory. | 
|  |  | 
|  | This step parses the program header table to identify 'loadable' segments, | 
|  | reserves the corresponding address space, then maps them directly into | 
|  | memory with mmap(). | 
|  |  | 
|  | Related: src/crazy_linker_elf_loader.cpp | 
|  |  | 
|  |  | 
|  | 2) Load library dependencies. | 
|  |  | 
|  | This step parses the dynamic table to identify all the other shared | 
|  | libraries the current one depends on, then _recursively_ loads them. | 
|  |  | 
|  | Related: src/crazy_linker_library_list.cpp | 
|  | (crazy::LibraryList::LoadLibrary()) | 
|  |  | 
|  | 3) Apply all relocations. | 
|  |  | 
|  | This step adjusts all pointers within the library for the actual load | 
|  | address. Relocations can also reference symbols that appear in other | 
|  | libraries loaded in step 2). | 
|  |  | 
|  | Related: src/crazy_linker_elf_relocator.cpp | 
|  |  | 
|  | 4) Run constructors. | 
|  |  | 
|  | Libraries include a list of functions to be run at load time, typically | 
|  | to perform static C++ initialization. | 
|  |  | 
|  | Related: src/crazy_linker_shared_library.cpp | 
|  | (SharedLibrary::RunConstructors()) | 
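
Steps 1) and 3) can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the code found in the files listed above: it handles 64-bit ELF only, uses the types from the Linux <elf.h> header, and supports only "relative" relocations with explicit addends (Elf64_Rela). ComputeLoadSize() and ApplyRelativeRelocations() are hypothetical names.

```cpp
#include <elf.h>

#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Step 1: returns the number of bytes of address space needed to hold every
// PT_LOAD segment, rounded out to page boundaries. |page_size| must be a
// power of two. Returns 0 if there is no loadable segment.
size_t ComputeLoadSize(const Elf64_Phdr* phdrs, size_t count,
                       size_t page_size) {
  assert(page_size != 0 && (page_size & (page_size - 1)) == 0);
  Elf64_Addr min_vaddr = UINT64_MAX;
  Elf64_Addr max_vaddr = 0;
  bool found = false;
  for (size_t i = 0; i < count; ++i) {
    if (phdrs[i].p_type != PT_LOAD)
      continue;
    found = true;
    const Elf64_Addr end = phdrs[i].p_vaddr + phdrs[i].p_memsz;
    if (phdrs[i].p_vaddr < min_vaddr) min_vaddr = phdrs[i].p_vaddr;
    if (end > max_vaddr) max_vaddr = end;
  }
  if (!found)
    return 0;
  const Elf64_Addr mask = page_size - 1;
  const Elf64_Addr start = min_vaddr & ~mask;       // round down
  const Elf64_Addr end = (max_vaddr + mask) & ~mask;  // round up
  return static_cast<size_t>(end - start);
}

// Step 3: applies every relocation of type |relative_type| (for example
// R_X86_64_RELATIVE or R_AARCH64_RELATIVE). Each one stores
// load_bias + r_addend at offset r_offset. In a real loader the target
// address is simply load_bias + r_offset; |image| is a separate parameter
// here only so that the sketch can be exercised on a plain buffer.
void ApplyRelativeRelocations(unsigned char* image, Elf64_Addr load_bias,
                              const Elf64_Rela* relocs, size_t count,
                              uint32_t relative_type) {
  for (size_t i = 0; i < count; ++i) {
    if (ELF64_R_TYPE(relocs[i].r_info) != relative_type)
      continue;
    const Elf64_Addr value =
        load_bias + static_cast<Elf64_Addr>(relocs[i].r_addend);
    // memcpy avoids any alignment assumption about the target location.
    std::memcpy(image + relocs[i].r_offset, &value, sizeof(value));
  }
}
```

The size returned by ComputeLoadSize() is what a loader would typically pass to an anonymous PROT_NONE mmap() call to reserve the address range before mapping each segment into it. Relocations that reference symbols need a symbol lookup as well and are left out of this sketch.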
|  |  | 
|  | Unloading a library is similar, but in reverse order: | 
|  |  | 
|  | 1) Run destructors. | 
|  | 2) Unload dependencies recursively. | 
|  | 3) Unmap loadable segments. | 
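
The ordering of constructors and destructors can be illustrated as follows. This is a sketch assuming the conventional ELF behaviour, where the initializer array runs front to back at load time and the finalizer array runs back to front at unload time; the actual logic lives in src/crazy_linker_shared_library.cpp and may differ in detail. The InitFunction type and both function names are hypothetical, and the entries are assumed to take no arguments.

```cpp
#include <cassert>
#include <cstddef>

// Signature assumed for the entries of the initializer/finalizer arrays.
using InitFunction = void (*)();

// Calls every non-null entry of |functions| in forward order, as is
// conventionally done for an initializer array at load time.
void RunConstructors(const InitFunction* functions, size_t count) {
  assert(functions != nullptr || count == 0);
  for (size_t i = 0; i < count; ++i) {
    if (functions[i])
      functions[i]();
  }
}

// Calls every non-null entry of |functions| in reverse order, as is
// conventionally done for a finalizer array at unload time.
void RunDestructors(const InitFunction* functions, size_t count) {
  assert(functions != nullptr || count == 0);
  for (size_t i = count; i > 0; --i) {
    if (functions[i - 1])
      functions[i - 1]();
  }
}
```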
|  |  | 
|  |  | 
|  | II. Managing the list of libraries: | 
|  | ----------------------------------- | 
|  |  | 
|  | It is crucial to avoid loading the same library twice in the same process, | 
|  | otherwise the two copies would each carry their own global state, which can | 
|  | lead to undefined behaviour. | 
|  |  | 
|  | This implies that, inside an Android application process, all system libraries | 
|  | should be loaded by the system linker (because otherwise, the Dalvik-based | 
|  | framework might load the same library on demand, at an unpredictable time). | 
|  |  | 
|  | To handle this, the crazy_linker uses a custom class (crazy::LibraryList) where | 
|  | each entry (crazy::LibraryView) is reference-counted, and either references: | 
|  |  | 
|  | - An application shared library, loaded by the crazy_linker itself. | 
|  | - A system shared library, loaded through the system dlopen(). | 
|  |  | 
|  | Libraries loaded by the crazy_linker are modelled by a crazy::SharedLibrary | 
|  | object. The source code comments often refer to these objects as | 
|  | "crazy libraries", as opposed to "system libraries". | 
|  |  | 
|  | As an example, here's a diagram that shows the list after loading a library | 
|  | 'libfoo.so' that depends on the system libraries 'libc.so', 'libm.so' and | 
|  | 'libOpenSLES.so'. | 
|  |  | 
|  | +-------------+ | 
|  | | LibraryList | | 
|  | +-------------+ | 
|  |        | | 
|  |        |    +-------------+ | 
|  |        +----| LibraryView | ----> libc.so | 
|  |        |    +-------------+ | 
|  |        | | 
|  |        |    +-------------+ | 
|  |        +----| LibraryView | ----> libm.so | 
|  |        |    +-------------+ | 
|  |        | | 
|  |        |    +-------------+ | 
|  |        +----| LibraryView | ----> libOpenSLES.so | 
|  |        |    +-------------+ | 
|  |        | | 
|  |        |    +-------------+      +-------------+ | 
|  |        +----| LibraryView |----->|SharedLibrary| ---> libfoo.so | 
|  |        |    +-------------+      +-------------+ | 
|  |        | | 
|  |       ___ | 
|  |        _ | 
|  |  | 
|  | System libraries are identified by name. Only the official NDK system | 
|  | libraries are listed. Using crazy_linker to load non-NDK system libraries | 
|  | is unlikely to work correctly, so don't do it. | 
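
The reference-counting idea behind the list can be shown with a toy model. This sketch is not the real crazy::LibraryList: ToyLibraryList is a hypothetical class that only tracks names and counts, and it uses std::map and std::string for brevity even though the real implementation avoids them.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>

// Hypothetical stand-in for a library list: one entry per library name,
// each with a reference count, so a library is never recorded twice.
class ToyLibraryList {
 public:
  // Records a load request for |name| and returns the new reference count.
  // A second request for the same name reuses the existing entry.
  int Load(const std::string& name) {
    assert(!name.empty());
    return ++ref_counts_[name];
  }

  // Drops one reference to |name| and returns the remaining count. The
  // entry is removed when the count reaches zero. Unknown names return 0.
  int Unload(const std::string& name) {
    auto it = ref_counts_.find(name);
    if (it == ref_counts_.end())
      return 0;
    const int remaining = --it->second;
    if (remaining == 0)
      ref_counts_.erase(it);
    return remaining;
  }

  bool IsLoaded(const std::string& name) const {
    return ref_counts_.count(name) != 0;
  }

  size_t size() const { return ref_counts_.size(); }

 private:
  std::map<std::string, int> ref_counts_;
};
```

In this model, loading 'libfoo.so' twice leaves a single entry with a count of 2, and the entry disappears only after the second Unload() call.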
|  |  | 
|  |  | 
|  | III. Wrapping of linker symbols within crazy ones: | 
|  | -------------------------------------------------- | 
|  |  | 
|  | Libraries loaded by the crazy linker are not visible to the system linker. | 
|  |  | 
|  | This means that directly calling the system dlopen() or dlsym() from code in | 
|  | a library loaded by the crazy_linker will not work properly. | 
|  |  | 
|  | To work around this, crazy_linker redirects all linker symbols to its own | 
|  | wrapper implementation. This redirection happens transparently. | 
|  |  | 
|  | Related: src/crazy_linker_wrappers.cpp | 
|  |  | 
|  | This also includes a few "hidden" dynamic linker symbols which are used for | 
|  | stack-unwinding. This guarantees that C++ exception propagation works. | 
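
One way to picture the redirection is a small name-to-wrapper table that is consulted before any other symbol lookup. The sketch below is an assumption about the general mechanism, not a copy of src/crazy_linker_wrappers.cpp: the ToyWrap* functions are empty stubs, and FindLinkerWrapper() and WrapperEntry are hypothetical names.

```cpp
#include <cassert>
#include <cstring>

// Stub wrappers standing in for real implementations (hypothetical).
void* ToyWrapDlopen(const char* /*path*/, int /*mode*/) { return nullptr; }
void* ToyWrapDlsym(void* /*handle*/, const char* /*name*/) { return nullptr; }
int ToyWrapDlclose(void* /*handle*/) { return 0; }

struct WrapperEntry {
  const char* name;
  void* address;
};

// Returns the wrapper address if |symbol_name| is one of the redirected
// linker symbols, or nullptr so the caller can continue with its normal
// symbol lookup.
void* FindLinkerWrapper(const char* symbol_name) {
  assert(symbol_name != nullptr);
  static const WrapperEntry kWrappers[] = {
      {"dlopen", reinterpret_cast<void*>(&ToyWrapDlopen)},
      {"dlsym", reinterpret_cast<void*>(&ToyWrapDlsym)},
      {"dlclose", reinterpret_cast<void*>(&ToyWrapDlclose)},
  };
  for (const WrapperEntry& entry : kWrappers) {
    if (std::strcmp(entry.name, symbol_name) == 0)
      return entry.address;
  }
  return nullptr;
}
```

With a table like this, a relocation that names dlopen resolves to the wrapper instead of the system function, and the loaded library needs no changes.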
|  |  | 
|  |  | 
|  | IV. GDB support: | 
|  | ---------------- | 
|  |  | 
|  | The crazy_linker contains support code to ensure that libraries loaded with it | 
|  | are visible through GDB at runtime. For more details, see the extensive comments | 
|  | in src/crazy_linker_rdebug.h. | 
|  |  | 
|  |  | 
|  | V. Other Implementation details: | 
|  | -------------------------------- | 
|  |  | 
|  | The crazy_linker is written in C++, but its API is completely C-based. | 
|  |  | 
|  | The implementation doesn't require any C++ standard library feature (except | 
|  | for operators new and delete). | 
|  |  | 
|  | Very little of the code is actually Android-specific. The target system's | 
|  | bitness is abstracted through a C++ traits class (see src/elf_traits.h). | 
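
A traits class of this kind might look like the sketch below. It is an illustration of the technique only: ToyElfTraits and CountLoadableSegments() are hypothetical names, the layout of the real src/elf_traits.h may differ, and the ELF types are assumed to come from the Linux <elf.h> header.

```cpp
#include <elf.h>

#include <cassert>
#include <cstddef>

// Primary template, specialized below for each supported bitness.
template <int kBits>
struct ToyElfTraits;

template <>
struct ToyElfTraits<32> {
  using Ehdr = Elf32_Ehdr;
  using Phdr = Elf32_Phdr;
  using Dyn = Elf32_Dyn;
  using Addr = Elf32_Addr;
};

template <>
struct ToyElfTraits<64> {
  using Ehdr = Elf64_Ehdr;
  using Phdr = Elf64_Phdr;
  using Dyn = Elf64_Dyn;
  using Addr = Elf64_Addr;
};

// Code written once against the traits works for both bitnesses: this
// counts the PT_LOAD entries of a program header table.
template <typename Traits>
size_t CountLoadableSegments(const typename Traits::Phdr* phdrs,
                             size_t count) {
  assert(phdrs != nullptr || count == 0);
  size_t loadable = 0;
  for (size_t i = 0; i < count; ++i) {
    if (phdrs[i].p_type == PT_LOAD)
      ++loadable;
  }
  return loadable;
}
```

A build would then select one specialization for the target, so the rest of the code never names the 32-bit or 64-bit types directly.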
|  |  | 
|  | The code was originally written for Chrome, so it follows the Chromium coding | 
|  | style, which can be enforced with the 'clang-format' tool: | 
|  |  | 
|  | cd /path/to/crazy_linker/ | 
|  | find . -name "*.h" -o -name "*.cpp" | xargs clang-format -style=Chromium -i | 