ld64-133.3.tar.gz

[apple/ld64.git] / doc / design / linker.html
diff --git a/doc/design/linker.html b/doc/design/linker.html

new file mode 100644 (file)

index 0000000..0e39f2f
--- /dev/null
+++ b/doc/design/linker.html
@@ -0,0 +1,423 @@
+<html>
+<head>
+  <title>Linker</title>
+</head>
+<body>
+
+
+<h1>
+  Inside the Linker 
+</h1>
+<div class="doc_author">
+  <p>Written by <a href="mailto:kledzik@apple.com">Nick Kledzik</a></p>
+</div>
+
+
+<h2>
+  <a name="introduction">Introduction</a>
+</h2>
+
+<p>The Darwin linker is a new generation of linker.  It is not "section" based
+like traditional linkers which mostly just interlace sections from multiple
+object files into the output file.  The Darwin linker is based on "Atoms".
+Traditional section based linking work well for simple linking, but their model
+makes advanced linking features difficult to implement.  Features like dead code 
+stripping, reordering functions for locality, and C++ coalescing require the
+linker to work at a finer grain.
+</p>
+
+<p>An atom is an indivisible chunk of code or data.  An atom has a set of
+attributes, such as: name, scope, content-type, alignment, etc.  An atom also
+has a list of Fixups.  A Fixup contains: a kind, an optional offset, an optional
+addend, and an optional target atom.</p>
+
+<p>The Atom model allows the linker to use standard graph theory models for 
+linking data structures.  Each atom is a node, and each Fixup is an edge. 
+The feature of dead code stripping is implemented by following edges to mark
+all live atoms, and then delete the non-live atoms.</p>
+<br>
+<h2>
+  <a name="Atom model">Atom model</a>
+</h2>
+
+<p>An atom is an indivisible chuck of code or data.  Typically each user
+written function or global variable is an atom.  In addition, the compiler may
+emit other atoms, such as for literal c-strings or floating point constants, or
+for runtime data structures like dwarf unwind info or pointers to initializers.
+</p>
+
+<p>A simple "hello world" object file would be modeled like this:</p>
+<img src="hello.png" alt="hello world graphic"/>
+<p>There are two atoms: main and an anonymous atom containing the c-string
+literal "hello world".  The Atom "main" has two fixups.  One is the call site
+for the call to printf, and the other is a fixup for the instruction that loads
+the address of the c-string literal. </p>
+
+<br>
+<h2>
+  <a name="File model">File model</a>
+</h2>
+
+<p>The linker views the input files as basically containers of Atoms and Fixups,
+ and just a few attributes of their own.  The linker works with three kinds
+of files: object files, static libraries, and dynamic libraries.  Each kind
+of file has reader object which presents the file in the model expected by 
+the linker.</p>
+<h4> <a>Object File</a> 
+</h4>
+An object file is just a container of atoms.  When linking with
+an object file, all atoms are added to the initial graph of atoms.
+
+<h4> <a>Static Library (Archive)</a> 
+</h4>
+This is the traditional unix static archive which is just a collection of
+object files with a "table of contents". When linking with a static library,
+by default nothing is added to the initial graph of atoms. Instead, if there
+are unresolved references (dangling edges) in the master graph of all atoms, 
+and the table of contents for a static library says that one of the object files 
+in the library defines one of the missing symbols (dangling edge), 
+the set of atoms from the specified object file in the static library is added 
+to the master graph of atoms. 
+
+<h4> <a>Dynamic Library (Shared Object)</a> 
+</h4>
+Dynamic libraries are unique in that the don't directly add add any atoms.  
+Their purpose is to check at build time that all references are resolved and
+provide a list of dynamic libraries (SO_NEEDED) that will be needed at runtime. 
+The way this is modeled in the linker is that a dynamic library contributes
+no atoms to the initial graph of atoms.  Instead, (like static libraries) if
+there are unresolved references (dangling edges) in the master graph of all atoms, 
+if a dynamic library exports a required symbol, then a "proxy" atom is 
+instantiated by the linker.  The proxy atom allows the master atom graph to have
+all edges resolved and also records from which dynamic library a symbol came.</p>
+
+<br>
+<h2>
+  <a name="Linking Steps">Linking Steps</a>
+</h2>
+<p>Through the use of abstract Atoms, the core of linking is architecture 
+independent and file format independent.  All command line parsing is factored
+out into a separate "options" abstraction which enables the linker to be driven
+with different command line sets.</p>
+<p>The overall steps in linking are:<p>
+<ol>
+  <li>Command line processing</li>
+  <li>Parsing input files</li>
+  <li>Resolving</li>
+  <li>Passes/Optimizations</li>
+  <li>Generate output file</li>
+</ol>
+
+<p>The Resolving and Passes steps are done purely on the master graph of atoms, 
+so they have no notion of file formats such as mach-o or ELF.</p>
+
+<h4> <a>Resolving</a> 
+</h4>
+<p>The resolving step takes all the atoms graphs from each object file and 
+combines them into one master object graph.  Unfortunately, it is not as simple
+as appending the atom list from each file into one big list.  There are many
+cases where atoms need to be coalesced.  That is, two or more atoms need to 
+be coalesced into one atom.  This is necessary to support: C language
+ "tentative definitions", C++ weak symbols for templates and inlines defined
+in headers, and for merging copies of constants like c-strings and floating
+point constants.</p>
+
+<p>The linker support coalescing by-name and by-content. By-name is used for
+tentative definitions and weak symbols.  By-content is used for constant data
+that can be merged. </p>
+
+<p>When one atom has a reference (FixUp) to another atom, there is also a binding
+type: by-name, direct, or indirect. A Fixup contains a tagged union that if
+the binding type is by-name, the union field is a pointer to a c-string.  If
+the binding type is direct, the union is a pointer to an Atom.  If the binding
+type is indirect, the union is a index into a table of pointers to Atoms. Below
+is a graphical representation of the binding types:</p>
+<img src="bindings.png" alt="binding types graphic"/>
+
+<p>Input file Atoms contain only direct and by-name references.  Direct 
+references are used for atoms defined in the same object file for which the 
+target atom is either unnamed or cannot change.  For instance, calling 
+a static function in a translation unit will result in a direct reference 
+to the static functions's atom.  Also the FDE (dwarf unwind info) for a function
+has a direct reference to its function.  On the other hand references to 
+global symbols (e.g. call to printf) use by-name binding in object files.
+</p>
+
+<p>The resolving process maintains some global linking "state", including:
+a "symbol table" which is a map from c-string to Atom*, an indirect symbol
+table which is a growable array of Atom*, and for each kind of coalesable
+constants there is a content to Atom* map.  With these data structures,
+the linker walks all atoms in all input files. For each
+atom, it checks if the atom should be in one symbol table or one of the 
+coalescing tables.  If so, it attempts to add the atom.  If there already is
+a matching atom in that table, that means the current atom needs to be 
+coalesced with the found atom.  
+</p>
+
+<p>To support coalescing, all references to coalesable atoms are changed to
+indirect binding and an entry is added to the indirect table which points
+to the current chosen atom.  When all input atoms have been processed by
+the resolver, there should be only direct and indirect bindings left.  If
+there are any NULL entries in the indirect table, that means there are  
+undefined references.  The linker then looks to the supplied libraries (both
+static and dynamic) to resolve those references.  
+</p>
+
+<p>Dead code stripping (if requested) is done at the end of resolving.  The
+linker does a simple mark-and-sweep. It starts with "root" atoms (like "main"
+in a main executable) and follows each references and marks each Atom that
+it visits as "live".  When done, all atoms not marked "live" are removed.
+</p>
+
+<h4> <a>Passes</a> 
+</h4>
+<p>The Passes step
+is an open ended set of routines that each get a change to modify or enhance
+the master graph of atoms. Passes are only run if the master graph of 
+atoms is completely resolved (no dangling edges). 
+The current set of Passes in the Darwin linker are:</p>
+<ul>
+  <li>Objective-C optimizations (Apple)</li>
+  <li>stub (PLT) generation</li>
+  <li>GOT instantiation</li>
+  <li>TLV instantiation (Apple)</li>
+  <li>order_file optimization</li>
+  <li>branch island generation</li>
+  <li>branch shim generation</li>
+  <li>dtrace probe processing (Apple)</li>
+  <li>compact unwind encoding (Apple)</li>
+</ul>
+<p>Some of these passes are specific to Apple's runtime environments.  But many
+of the passes are applicable to any OS (such as generating branch island for 
+out of range branch instructions).</p>
+
+<p>The general structure of a pass is to walk the master graph inspecting each
+atom and doing something.  For instance, the stub pass, walks the graph looking
+for atoms with call sites to proxy atoms (e.g. call to printf).  It then
+instantiates a "stub" atom (PLT entry) and a "lazy pointer" atom for each 
+proxy atom needed, and these new atoms are added to the master graph.  Next
+all the noted call sites to proxy atoms are replaced with calls to the 
+corresponding stub atom.</p>  
+
+<h4><a>Generate Output File</a> 
+</h4>
+<p>Once the passes are done, the output file generator is given a sorted list
+of atoms.  Its job is to create the executable content file wrapper and place
+the content of the atoms into it. 
+</p>
+
+
+<h2>
+  <a name="Future Directions">Future Directions</a>
+</h2>
+
+<h4><a>Sections</a> 
+</h4>
+<p>The current use of sections in mach-o .o files over-constrains the linker.
+By default, the linker should preserve the section an atom is in.  But since
+all sections must be contiguous in the output, that limits the ability of
+the linker to order atoms for locality.  It would be helpful to enrich the
+object file with with reason something is in the section it is.  For instance,
+is the section found at runtime? Or was the use of a section just a quick
+way to group some content together?
+</p>
+<p>The ELF model for sections is a little better than mach-o because ELF
+sections have write and execute bits, whereas mach-o sections must be in some
+segment and the segment has the write and execute bits.  
+</p>
+
+<h4><a>Mach-o Object File Format</a> 
+</h4>
+<p>
+The messiest part of the linker is the mach-o parser. This is because mach-o
+is a traditional section and symbols based file format.  The parser must infer
+atom boundaries using two approaches.  The first is that some section types have  
+well defined content which the linker can parse into atoms (e.g.  __cstring, 
+__eh_frame). The other approach is a naming convention (which the compiler follows)
+by which the linker breaks sections into atoms at any non-local (not starting 
+with 'L') symbol. The processing the linker has to do parse mach-o .o files is a
+significant part of the link time. 
+</p>
+
+<p>Given that the assembler writes object files once, whereas the linker reads
+them many times (during development), it would make sense to optimize the object
+file format to be something the linker can read/parse efficiently.</p>  
+
+<h4><a>New Object File Model</a> 
+</h4>
+<p>LLVM has a nice model for its IR.  There are three representations:
+the binary bit code file, the in-memory object model, and a textual 
+representation.  LLVM contains utility possible code for converting between these
+representations.  The same model makes sense for atoms too.  There should be
+three representations for atoms: binary file, in-memory, and textual. The Darwin 
+linker already has an in-memory C++ object model for Atoms.  All we need is a 
+textual representation and binary file format.
+</p>
+<p>Note: in the darwin linker the binary format for input object files is  
+independent of the output executable format.  That is, we could have one 
+universal object file format which the linker could use as input to produce 
+mach-o, ELF, or PE executables.</p>
+<p>
+The object file binary format should be designed to instantiate into atoms
+as fast as possible.  The obvious way to do that is that the 
+file format would be an array of atoms.  The linker just mmaps in the file and
+looks at the header to see how many atoms there and instantiate that many atoms
+with the atom attribute information coming from that array.  The trick is 
+designing this in a way that can be extended as the Atom mode evolves and new
+attributes are added.
+</p>
+<p>
+In designing a textual format we want something easy for humans to read and
+easy for the linker to parse.  Since an atom has lots of attributes most of
+which are usually just the default, we should define default values for 
+every attribute so that those can be omitted from the text representation.
+One possile format is YAML.  Here is the atoms for a simple hello world
+program expressed in YAML.
+</p>
+<pre>
+---
+target-triple:   x86_64-apple-darwin11
+source:
+
+atoms:
+    - name:    _main
+      scope:   linkage-unit
+      type:    code
+      alignment: 
+          power: 4
+      content: [ 55, 48, 89, e5, 48, 8d, 3d, 00, 00, 00, 00, 30, c0, e8, 00, 00,
+                 00, 00, 31, c0, 5d, c3 ]
+      fixups:
+      - offset: 07
+        kind:   pcrel32
+        target: 2
+      - offset: 0E
+        kind:   call32
+        target: _fprintf
+
+    - type:    c-string
+      merge:   by-content
+      content: [ 73, 5A, 00 ]
+
+...
+</pre>
+
+<p>One big use for the textual format will be writing test cases. The Darwin
+linker test suite test cases are written mostly in C/C++ and a few assembly
+files.  The use of C means the same test case can be compiled for different
+architectures.  But writing test cases in C is problematic because the compiler 
+may vary its output over time for its own optimization reasons which my 
+inadvertently disable or break the linker feature trying to be tested. By 
+writing test cases in the linkers own textual format, we can exactly specify 
+every attribute of every atom and thus target specific linker logic.
+</p>
+
+<h4><a>Debug Info</a> 
+</h4>
+<p>Around 2005 when Apple switched from using STABS to using DWARF for debug 
+information, we made a design decision to have the linker ignore DWARF in
+.o files.  This improves linking performance because the linker is not
+copying tons of debug info.  Instead, the linker adds "debug notes" into
+output binary that contain the paths of the original .o files. During development
+the Darwin debugger will notice the debug notes and the load the dwarf
+debug information from the original object files.  For release builds,
+a tool named dsymutil is run on the program.  It finds the debug notes and
+then the original object files, then reads, merges and optimizes all the dwarf
+debug information into one .dSYM file which can be loaded by the debugger
+if needed.</p>
+
+<p>The current way DWARF is generated is that all debug information for all
+functions in a translation unit are merged and optimized into sections based 
+on debug info kind.  For instance the mapping of instructions to source line
+numbers for all functions is compressed and put in one section. This does not
+play well in an Atom based file format.  One idea is to have the compiler
+emit some intermediate representation debug information (one which is 
+partitioned per atom) into the Atom based file format.  The linker could 
+then have code to convert that intermediate debug into to final dwarf.
+This is still an open question.</p>
+
+<h4><a>Extending Atom attributes to ELF and XCOFF</a> 
+</h4>
+<p>The current set of attributes defined for Atoms in the darwin linker
+were chosen to meet the requirements of developing code to run on iOS and 
+Mac OS X.  Below is a list of the attributes and their possible values.
+It may just require adding more values to support ELF and XCOFF.  Or there
+may need to be new attributes added to capture new functionality.
+</p>
+<ul>
+  <li>Name</li>
+  <li>Size</li>
+  <li>Section (I'd like to get rid of this)</li>
+  <li>ContentType (currently some of this comes from section)</li>
+  <ul>
+         <li>code</li>
+         <li>stub</li>
+         <li>data</li>
+         <li>zeroFill</li>
+         <li>initializerPointer</li>
+         <li>objc1Class</li>
+         <li>objc2Class</li>
+         <li>objcClassPointer</li>
+         <li>objc2CategoryList</li>
+         <li>non-lazy-pointer</li>
+         <li>lazy-pointer</li>
+         <li>constant</li>
+         <li>literal4</li>
+         <li>literal8</li>
+         <li>literal16</li>
+         <li>cstring</li>
+         <li>cstringPointer</li>
+         <li>utf16string</li>
+         <li>CFString</li>
+         <li>CFI</li>
+         <li>LSDA</li>
+         </ul>
+  </li>
+  <li>Scope
+  <ul>
+         <li>translationUnit  (static functions)</li>
+         <li>linkageUnit      (visibility hidden)</li>
+         <li>global</li>
+         </ul>
+  </li>
+  <li>DefinitionKind
+  <ul>
+         <li>regular</li>
+         <li>tentative         (ANSI C feature)</li>
+         <li>absolute          (assembly code feature)</li>
+         <li>proxy             (stand-in for dynamic library symbol)</li>
+  </ul>
+  </li>
+  <li>Combine
+  <ul>
+         <li>never</li>
+         <li>byName          (weak symbols)</li>
+         <li>byContent       (simple constants)</li>
+         <li>byContentAndReferences (complex constants)</li>
+  </ul>
+  </li>
+  <li>SymbolTableStatus
+  <ul>
+         <li>In</li>
+         <li>notIn              (anonymous)</li>
+         <li>inAsAbsolute       (assembly code feature)</li>
+         <li>inAndNeverStrip    (tell strip tool to leave)</li>
+         <li>inWithRandomName   (mach-o .o feature)</li>
+  </ul>
+  <li>Alignment
+  <ul>
+         <li>powerOfTwo</li>
+         <li>modulus</li>
+  </ul>
+  <li>NeverDeadStrip (boolean)</li>
+  <li>IsThumb (ARM specific)</li>
+</ul>
+<p>Where does dllexport fit in here?  Where does visibility protected and 
+internal fit?  Protected seems like scope=global plus the rule to not 
+indirect references to it.  Internal is like hidden plus enables some
+compiler optimizations.  I'm not sure the linker needs to know about internal.
+</p>
+
+</body>
+</html>
+