--- /dev/null
+<html>
+<head>
+ <title>Linker</title>
+</head>
+<body>
+
+
+<h1>
+ Inside the Linker
+</h1>
+<div class="doc_author">
+ <p>Written by <a href="mailto:kledzik@apple.com">Nick Kledzik</a></p>
+</div>
+
+
+<h2>
+ <a name="introduction">Introduction</a>
+</h2>
+
+<p>The Darwin linker is a new generation of linker. It is not "section" based
+like traditional linkers which mostly just interlace sections from multiple
+object files into the output file. The Darwin linker is based on "Atoms".
+Traditional section based linking work well for simple linking, but their model
+makes advanced linking features difficult to implement. Features like dead code
+stripping, reordering functions for locality, and C++ coalescing require the
+linker to work at a finer grain.
+</p>
+
+<p>An atom is an indivisible chunk of code or data. An atom has a set of
+attributes, such as: name, scope, content-type, alignment, etc. An atom also
+has a list of Fixups. A Fixup contains: a kind, an optional offset, an optional
+addend, and an optional target atom.</p>
+
+<p>The Atom model allows the linker to use standard graph theory models for
+linking data structures. Each atom is a node, and each Fixup is an edge.
+The feature of dead code stripping is implemented by following edges to mark
+all live atoms, and then delete the non-live atoms.</p>
+<br>
+<h2>
+ <a name="Atom model">Atom model</a>
+</h2>
+
+<p>An atom is an indivisible chuck of code or data. Typically each user
+written function or global variable is an atom. In addition, the compiler may
+emit other atoms, such as for literal c-strings or floating point constants, or
+for runtime data structures like dwarf unwind info or pointers to initializers.
+</p>
+
+<p>A simple "hello world" object file would be modeled like this:</p>
+<img src="hello.png" alt="hello world graphic"/>
+<p>There are two atoms: main and an anonymous atom containing the c-string
+literal "hello world". The Atom "main" has two fixups. One is the call site
+for the call to printf, and the other is a fixup for the instruction that loads
+the address of the c-string literal. </p>
+
+<br>
+<h2>
+ <a name="File model">File model</a>
+</h2>
+
+<p>The linker views the input files as basically containers of Atoms and Fixups,
+ and just a few attributes of their own. The linker works with three kinds
+of files: object files, static libraries, and dynamic libraries. Each kind
+of file has reader object which presents the file in the model expected by
+the linker.</p>
+<h4> <a>Object File</a>
+</h4>
+An object file is just a container of atoms. When linking with
+an object file, all atoms are added to the initial graph of atoms.
+
+<h4> <a>Static Library (Archive)</a>
+</h4>
+This is the traditional unix static archive which is just a collection of
+object files with a "table of contents". When linking with a static library,
+by default nothing is added to the initial graph of atoms. Instead, if there
+are unresolved references (dangling edges) in the master graph of all atoms,
+and the table of contents for a static library says that one of the object files
+in the library defines one of the missing symbols (dangling edge),
+the set of atoms from the specified object file in the static library is added
+to the master graph of atoms.
+
+<h4> <a>Dynamic Library (Shared Object)</a>
+</h4>
+Dynamic libraries are unique in that the don't directly add add any atoms.
+Their purpose is to check at build time that all references are resolved and
+provide a list of dynamic libraries (SO_NEEDED) that will be needed at runtime.
+The way this is modeled in the linker is that a dynamic library contributes
+no atoms to the initial graph of atoms. Instead, (like static libraries) if
+there are unresolved references (dangling edges) in the master graph of all atoms,
+if a dynamic library exports a required symbol, then a "proxy" atom is
+instantiated by the linker. The proxy atom allows the master atom graph to have
+all edges resolved and also records from which dynamic library a symbol came.</p>
+
+<br>
+<h2>
+ <a name="Linking Steps">Linking Steps</a>
+</h2>
+<p>Through the use of abstract Atoms, the core of linking is architecture
+independent and file format independent. All command line parsing is factored
+out into a separate "options" abstraction which enables the linker to be driven
+with different command line sets.</p>
+<p>The overall steps in linking are:<p>
+<ol>
+ <li>Command line processing</li>
+ <li>Parsing input files</li>
+ <li>Resolving</li>
+ <li>Passes/Optimizations</li>
+ <li>Generate output file</li>
+</ol>
+
+<p>The Resolving and Passes steps are done purely on the master graph of atoms,
+so they have no notion of file formats such as mach-o or ELF.</p>
+
+<h4> <a>Resolving</a>
+</h4>
+<p>The resolving step takes all the atoms graphs from each object file and
+combines them into one master object graph. Unfortunately, it is not as simple
+as appending the atom list from each file into one big list. There are many
+cases where atoms need to be coalesced. That is, two or more atoms need to
+be coalesced into one atom. This is necessary to support: C language
+ "tentative definitions", C++ weak symbols for templates and inlines defined
+in headers, and for merging copies of constants like c-strings and floating
+point constants.</p>
+
+<p>The linker support coalescing by-name and by-content. By-name is used for
+tentative definitions and weak symbols. By-content is used for constant data
+that can be merged. </p>
+
+<p>When one atom has a reference (FixUp) to another atom, there is also a binding
+type: by-name, direct, or indirect. A Fixup contains a tagged union that if
+the binding type is by-name, the union field is a pointer to a c-string. If
+the binding type is direct, the union is a pointer to an Atom. If the binding
+type is indirect, the union is a index into a table of pointers to Atoms. Below
+is a graphical representation of the binding types:</p>
+<img src="bindings.png" alt="binding types graphic"/>
+
+<p>Input file Atoms contain only direct and by-name references. Direct
+references are used for atoms defined in the same object file for which the
+target atom is either unnamed or cannot change. For instance, calling
+a static function in a translation unit will result in a direct reference
+to the static functions's atom. Also the FDE (dwarf unwind info) for a function
+has a direct reference to its function. On the other hand references to
+global symbols (e.g. call to printf) use by-name binding in object files.
+</p>
+
+<p>The resolving process maintains some global linking "state", including:
+a "symbol table" which is a map from c-string to Atom*, an indirect symbol
+table which is a growable array of Atom*, and for each kind of coalesable
+constants there is a content to Atom* map. With these data structures,
+the linker walks all atoms in all input files. For each
+atom, it checks if the atom should be in one symbol table or one of the
+coalescing tables. If so, it attempts to add the atom. If there already is
+a matching atom in that table, that means the current atom needs to be
+coalesced with the found atom.
+</p>
+
+<p>To support coalescing, all references to coalesable atoms are changed to
+indirect binding and an entry is added to the indirect table which points
+to the current chosen atom. When all input atoms have been processed by
+the resolver, there should be only direct and indirect bindings left. If
+there are any NULL entries in the indirect table, that means there are
+undefined references. The linker then looks to the supplied libraries (both
+static and dynamic) to resolve those references.
+</p>
+
+<p>Dead code stripping (if requested) is done at the end of resolving. The
+linker does a simple mark-and-sweep. It starts with "root" atoms (like "main"
+in a main executable) and follows each references and marks each Atom that
+it visits as "live". When done, all atoms not marked "live" are removed.
+</p>
+
+<h4> <a>Passes</a>
+</h4>
+<p>The Passes step
+is an open ended set of routines that each get a change to modify or enhance
+the master graph of atoms. Passes are only run if the master graph of
+atoms is completely resolved (no dangling edges).
+The current set of Passes in the Darwin linker are:</p>
+<ul>
+ <li>Objective-C optimizations (Apple)</li>
+ <li>stub (PLT) generation</li>
+ <li>GOT instantiation</li>
+ <li>TLV instantiation (Apple)</li>
+ <li>order_file optimization</li>
+ <li>branch island generation</li>
+ <li>branch shim generation</li>
+ <li>dtrace probe processing (Apple)</li>
+ <li>compact unwind encoding (Apple)</li>
+</ul>
+<p>Some of these passes are specific to Apple's runtime environments. But many
+of the passes are applicable to any OS (such as generating branch island for
+out of range branch instructions).</p>
+
+<p>The general structure of a pass is to walk the master graph inspecting each
+atom and doing something. For instance, the stub pass, walks the graph looking
+for atoms with call sites to proxy atoms (e.g. call to printf). It then
+instantiates a "stub" atom (PLT entry) and a "lazy pointer" atom for each
+proxy atom needed, and these new atoms are added to the master graph. Next
+all the noted call sites to proxy atoms are replaced with calls to the
+corresponding stub atom.</p>
+
+<h4><a>Generate Output File</a>
+</h4>
+<p>Once the passes are done, the output file generator is given a sorted list
+of atoms. Its job is to create the executable content file wrapper and place
+the content of the atoms into it.
+</p>
+
+
+<h2>
+ <a name="Future Directions">Future Directions</a>
+</h2>
+
+<h4><a>Sections</a>
+</h4>
+<p>The current use of sections in mach-o .o files over-constrains the linker.
+By default, the linker should preserve the section an atom is in. But since
+all sections must be contiguous in the output, that limits the ability of
+the linker to order atoms for locality. It would be helpful to enrich the
+object file with with reason something is in the section it is. For instance,
+is the section found at runtime? Or was the use of a section just a quick
+way to group some content together?
+</p>
+<p>The ELF model for sections is a little better than mach-o because ELF
+sections have write and execute bits, whereas mach-o sections must be in some
+segment and the segment has the write and execute bits.
+</p>
+
+<h4><a>Mach-o Object File Format</a>
+</h4>
+<p>
+The messiest part of the linker is the mach-o parser. This is because mach-o
+is a traditional section and symbols based file format. The parser must infer
+atom boundaries using two approaches. The first is that some section types have
+well defined content which the linker can parse into atoms (e.g. __cstring,
+__eh_frame). The other approach is a naming convention (which the compiler follows)
+by which the linker breaks sections into atoms at any non-local (not starting
+with 'L') symbol. The processing the linker has to do parse mach-o .o files is a
+significant part of the link time.
+</p>
+
+<p>Given that the assembler writes object files once, whereas the linker reads
+them many times (during development), it would make sense to optimize the object
+file format to be something the linker can read/parse efficiently.</p>
+
+<h4><a>New Object File Model</a>
+</h4>
+<p>LLVM has a nice model for its IR. There are three representations:
+the binary bit code file, the in-memory object model, and a textual
+representation. LLVM contains utility possible code for converting between these
+representations. The same model makes sense for atoms too. There should be
+three representations for atoms: binary file, in-memory, and textual. The Darwin
+linker already has an in-memory C++ object model for Atoms. All we need is a
+textual representation and binary file format.
+</p>
+<p>Note: in the darwin linker the binary format for input object files is
+independent of the output executable format. That is, we could have one
+universal object file format which the linker could use as input to produce
+mach-o, ELF, or PE executables.</p>
+<p>
+The object file binary format should be designed to instantiate into atoms
+as fast as possible. The obvious way to do that is that the
+file format would be an array of atoms. The linker just mmaps in the file and
+looks at the header to see how many atoms there and instantiate that many atoms
+with the atom attribute information coming from that array. The trick is
+designing this in a way that can be extended as the Atom mode evolves and new
+attributes are added.
+</p>
+<p>
+In designing a textual format we want something easy for humans to read and
+easy for the linker to parse. Since an atom has lots of attributes most of
+which are usually just the default, we should define default values for
+every attribute so that those can be omitted from the text representation.
+One possile format is YAML. Here is the atoms for a simple hello world
+program expressed in YAML.
+</p>
+<pre>
+---
+target-triple: x86_64-apple-darwin11
+source:
+
+atoms:
+ - name: _main
+ scope: linkage-unit
+ type: code
+ alignment:
+ power: 4
+ content: [ 55, 48, 89, e5, 48, 8d, 3d, 00, 00, 00, 00, 30, c0, e8, 00, 00,
+ 00, 00, 31, c0, 5d, c3 ]
+ fixups:
+ - offset: 07
+ kind: pcrel32
+ target: 2
+ - offset: 0E
+ kind: call32
+ target: _fprintf
+
+ - type: c-string
+ merge: by-content
+ content: [ 73, 5A, 00 ]
+
+...
+</pre>
+
+<p>One big use for the textual format will be writing test cases. The Darwin
+linker test suite test cases are written mostly in C/C++ and a few assembly
+files. The use of C means the same test case can be compiled for different
+architectures. But writing test cases in C is problematic because the compiler
+may vary its output over time for its own optimization reasons which my
+inadvertently disable or break the linker feature trying to be tested. By
+writing test cases in the linkers own textual format, we can exactly specify
+every attribute of every atom and thus target specific linker logic.
+</p>
+
+<h4><a>Debug Info</a>
+</h4>
+<p>Around 2005 when Apple switched from using STABS to using DWARF for debug
+information, we made a design decision to have the linker ignore DWARF in
+.o files. This improves linking performance because the linker is not
+copying tons of debug info. Instead, the linker adds "debug notes" into
+output binary that contain the paths of the original .o files. During development
+the Darwin debugger will notice the debug notes and the load the dwarf
+debug information from the original object files. For release builds,
+a tool named dsymutil is run on the program. It finds the debug notes and
+then the original object files, then reads, merges and optimizes all the dwarf
+debug information into one .dSYM file which can be loaded by the debugger
+if needed.</p>
+
+<p>The current way DWARF is generated is that all debug information for all
+functions in a translation unit are merged and optimized into sections based
+on debug info kind. For instance the mapping of instructions to source line
+numbers for all functions is compressed and put in one section. This does not
+play well in an Atom based file format. One idea is to have the compiler
+emit some intermediate representation debug information (one which is
+partitioned per atom) into the Atom based file format. The linker could
+then have code to convert that intermediate debug into to final dwarf.
+This is still an open question.</p>
+
+<h4><a>Extending Atom attributes to ELF and XCOFF</a>
+</h4>
+<p>The current set of attributes defined for Atoms in the darwin linker
+were chosen to meet the requirements of developing code to run on iOS and
+Mac OS X. Below is a list of the attributes and their possible values.
+It may just require adding more values to support ELF and XCOFF. Or there
+may need to be new attributes added to capture new functionality.
+</p>
+<ul>
+ <li>Name</li>
+ <li>Size</li>
+ <li>Section (I'd like to get rid of this)</li>
+ <li>ContentType (currently some of this comes from section)</li>
+ <ul>
+ <li>code</li>
+ <li>stub</li>
+ <li>data</li>
+ <li>zeroFill</li>
+ <li>initializerPointer</li>
+ <li>objc1Class</li>
+ <li>objc2Class</li>
+ <li>objcClassPointer</li>
+ <li>objc2CategoryList</li>
+ <li>non-lazy-pointer</li>
+ <li>lazy-pointer</li>
+ <li>constant</li>
+ <li>literal4</li>
+ <li>literal8</li>
+ <li>literal16</li>
+ <li>cstring</li>
+ <li>cstringPointer</li>
+ <li>utf16string</li>
+ <li>CFString</li>
+ <li>CFI</li>
+ <li>LSDA</li>
+ </ul>
+ </li>
+ <li>Scope
+ <ul>
+ <li>translationUnit (static functions)</li>
+ <li>linkageUnit (visibility hidden)</li>
+ <li>global</li>
+ </ul>
+ </li>
+ <li>DefinitionKind
+ <ul>
+ <li>regular</li>
+ <li>tentative (ANSI C feature)</li>
+ <li>absolute (assembly code feature)</li>
+ <li>proxy (stand-in for dynamic library symbol)</li>
+ </ul>
+ </li>
+ <li>Combine
+ <ul>
+ <li>never</li>
+ <li>byName (weak symbols)</li>
+ <li>byContent (simple constants)</li>
+ <li>byContentAndReferences (complex constants)</li>
+ </ul>
+ </li>
+ <li>SymbolTableStatus
+ <ul>
+ <li>In</li>
+ <li>notIn (anonymous)</li>
+ <li>inAsAbsolute (assembly code feature)</li>
+ <li>inAndNeverStrip (tell strip tool to leave)</li>
+ <li>inWithRandomName (mach-o .o feature)</li>
+ </ul>
+ <li>Alignment
+ <ul>
+ <li>powerOfTwo</li>
+ <li>modulus</li>
+ </ul>
+ <li>NeverDeadStrip (boolean)</li>
+ <li>IsThumb (ARM specific)</li>
+</ul>
+<p>Where does dllexport fit in here? Where does visibility protected and
+internal fit? Protected seems like scope=global plus the rule to not
+indirect references to it. Internal is like hidden plus enables some
+compiler optimizations. I'm not sure the linker needs to know about internal.
+</p>
+
+</body>
+</html>
+