doc/design/linker.html

<html>
<head>
  <title>Linker</title>
</head>
<body>


<h1>
  Inside the Linker
</h1>
<div class="doc_author">
  <p>Written by <a href="mailto:kledzik@apple.com">Nick Kledzik</a></p>
</div>


<h2>
  <a name="introduction">Introduction</a>
</h2>

<p>The Darwin linker is a new generation of linker.  It is not "section" based
like traditional linkers, which mostly just interlace sections from multiple
object files into the output file.  The Darwin linker is based on "Atoms".
Traditional section-based linking works well for simple linking, but its model
makes advanced linking features difficult to implement.  Features like dead code
stripping, reordering functions for locality, and C++ coalescing require the
linker to work at a finer grain.
</p>

<p>An atom is an indivisible chunk of code or data.  An atom has a set of
attributes, such as name, scope, content-type, and alignment.  An atom also
has a list of Fixups.  A Fixup contains a kind, an optional offset, an optional
addend, and an optional target atom.</p>

<p>The Atom model allows the linker to use standard graph theory models for
linking data structures.  Each atom is a node, and each Fixup is an edge.
Dead code stripping is implemented by following edges to mark
all live atoms and then deleting the non-live atoms.</p>
<br>
<h2>
  <a name="Atom model">Atom model</a>
</h2>

<p>An atom is an indivisible chunk of code or data.  Typically each
user-written function or global variable is an atom.  In addition, the compiler may
emit other atoms, such as for literal c-strings or floating point constants, or
for runtime data structures like dwarf unwind info or pointers to initializers.
</p>

<p>A simple "hello world" object file would be modeled like this:</p>
<img src="hello.png" alt="hello world graphic"/>
<p>There are two atoms: main and an anonymous atom containing the c-string
literal "hello world".  The Atom "main" has two fixups.  One is the call site
for the call to printf, and the other is a fixup for the instruction that loads
the address of the c-string literal.</p>
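<p>The node-and-edge model described above can be sketched in a few lines of
Python.  This is only an illustration: the class names, field names, and default
values are assumptions made for the example and are not the Darwin linker's C++
object model.  The printf atom is shown as an ordinary atom for simplicity.</p>

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Fixup:
    kind: str                          # e.g. "call32" or "pcrel32"
    offset: int = 0                    # byte offset within the owning atom
    addend: int = 0
    target: Optional["Atom"] = None    # the edge to another node, if any


@dataclass
class Atom:
    name: Optional[str] = None         # None models an anonymous atom
    scope: str = "global"              # hypothetical default
    content_type: str = "code"         # hypothetical default
    alignment: int = 1
    fixups: List[Fixup] = field(default_factory=list)

    def successors(self) -> List["Atom"]:
        """Atoms reachable over one edge (Fixup) from this node."""
        return [f.target for f in self.fixups if f.target is not None]


# The "hello world" shape: main refers to a c-string literal and to printf.
cstring = Atom(content_type="c-string")
printf = Atom(name="printf")
main = Atom(name="main", fixups=[
    Fixup(kind="pcrel32", offset=0x07, target=cstring),
    Fixup(kind="call32", offset=0x0E, target=printf),
])
```

<p>With this shape, any graph algorithm only needs <code>successors()</code> to
walk from one atom to the atoms it references.</p>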

<br>
<h2>
  <a name="File model">File model</a>
</h2>

<p>The linker views the input files as basically containers of Atoms and Fixups,
with just a few attributes of their own.  The linker works with three kinds
of files: object files, static libraries, and dynamic libraries.  Each kind
of file has a reader object which presents the file in the model expected by
the linker.</p>
<h4> <a>Object File</a>
</h4>
<p>An object file is just a container of atoms.  When linking with
an object file, all atoms are added to the initial graph of atoms.</p>

<h4> <a>Static Library (Archive)</a>
</h4>
<p>This is the traditional unix static archive, which is just a collection of
object files with a "table of contents". When linking with a static library,
by default nothing is added to the initial graph of atoms. Instead, if there
are unresolved references (dangling edges) in the master graph of all atoms,
and the table of contents for a static library says that one of the object files
in the library defines one of the missing symbols,
the set of atoms from that object file is added
to the master graph of atoms.</p>

<h4> <a>Dynamic Library (Shared Object)</a>
</h4>
<p>Dynamic libraries are unique in that they don't directly add any atoms.
Their purpose is to check at build time that all references are resolved and
to provide a list of dynamic libraries (DT_NEEDED entries, in ELF terms) that
will be needed at runtime.
The way this is modeled in the linker is that a dynamic library contributes
no atoms to the initial graph of atoms.  Instead (as with static libraries), if
there are unresolved references (dangling edges) in the master graph of all atoms
and a dynamic library exports a required symbol, then a "proxy" atom is
instantiated by the linker.  The proxy atom allows the master atom graph to have
all edges resolved and also records from which dynamic library a symbol came.</p>
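<p>The library behavior described above can be sketched as a worklist over the
undefined symbols.  This is a simplified illustration: the function name and the
shapes of its arguments are hypothetical, a single archive is assumed, and the
archive is consulted before the dynamic libraries, which is an assumption rather
than a statement about the real search order.</p>

```python
def resolve_from_libraries(undefined, archive_toc, archive_members, dylib_exports):
    """Resolve dangling references against one archive and a set of dylibs.

    archive_toc:     symbol name -> archive member name
    archive_members: member name -> (defined symbols, undefined symbols)
    dylib_exports:   dylib name  -> set of exported symbols
    Returns (defined, unresolved); `defined` maps each symbol to where it came from.
    """
    defined = {}
    loaded = set()
    unresolved = []
    worklist = list(undefined)
    while worklist:
        sym = worklist.pop()
        if sym in defined:
            continue
        member = archive_toc.get(sym)
        if member is not None:
            if member not in loaded:
                # Pull in every atom of the member, not just the one we wanted.
                loaded.add(member)
                defs, undefs = archive_members[member]
                for d in defs:
                    defined.setdefault(d, ("archive", member))
                worklist.extend(undefs)   # the member may add new dangling edges
            continue
        for dylib, exports in dylib_exports.items():
            if sym in exports:
                defined[sym] = ("proxy", dylib)   # stand-in atom records the dylib
                break
        else:
            unresolved.append(sym)
    return defined, unresolved
```

<p>Note that loading an archive member can itself introduce new undefined
symbols, which is why the loop keeps running until the worklist is empty.</p>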

<br>
<h2>
  <a name="Linking Steps">Linking Steps</a>
</h2>
<p>Through the use of abstract Atoms, the core of linking is architecture
independent and file format independent.  All command line parsing is factored
out into a separate "options" abstraction which enables the linker to be driven
with different command line sets.</p>
<p>The overall steps in linking are:</p>
<ol>
  <li>Command line processing</li>
  <li>Parsing input files</li>
  <li>Resolving</li>
  <li>Passes/Optimizations</li>
  <li>Generate output file</li>
</ol>

<p>The Resolving and Passes steps are done purely on the master graph of atoms,
so they have no notion of file formats such as mach-o or ELF.</p>

<h4> <a>Resolving</a>
</h4>
<p>The resolving step takes the atom graphs from all object files and
combines them into one master object graph.  Unfortunately, it is not as simple
as appending the atom list from each file into one big list.  There are many
cases where atoms need to be coalesced.  That is, two or more atoms need to
be merged into one atom.  This is necessary to support C language
"tentative definitions", C++ weak symbols for templates and inlines defined
in headers, and merging of copies of constants like c-strings and floating
point constants.</p>

<p>The linker supports coalescing by-name and by-content. By-name is used for
tentative definitions and weak symbols.  By-content is used for constant data
that can be merged.</p>
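<p>The two coalescing modes can be sketched as two lookup tables, one keyed by
name and one keyed by content.  This is a minimal illustration: atoms are plain
dictionaries, the key names are hypothetical, and the "first atom seen wins"
rule is an assumption made to keep the example short (a real linker needs
richer rules, for example when a regular definition meets a tentative one).</p>

```python
def coalesce(atoms):
    """Return the atoms that survive by-name and by-content coalescing.

    Each atom is a dict with optional keys "name", "type", "content", and
    "merge" ("never", "by-name", or "by-content").
    """
    by_name = {}       # name -> chosen atom
    by_content = {}    # (type, content) -> chosen atom, i.e. one map per kind
    survivors = []
    for atom in atoms:
        merge = atom.get("merge", "never")
        if merge == "by-name":
            table, key = by_name, atom["name"]
        elif merge == "by-content":
            table, key = by_content, (atom["type"], atom["content"])
        else:
            survivors.append(atom)     # never coalesced
            continue
        if key not in table:           # first one seen wins (assumption)
            table[key] = atom
            survivors.append(atom)
    return survivors
```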

<p>When one atom has a reference (Fixup) to another atom, there is also a binding
type: by-name, direct, or indirect. A Fixup contains a tagged union: if
the binding type is by-name, the union field is a pointer to a c-string.  If
the binding type is direct, the union is a pointer to an Atom.  If the binding
type is indirect, the union is an index into a table of pointers to Atoms. Below
is a graphical representation of the binding types:</p>
<img src="bindings.png" alt="binding types graphic"/>

<p>Input file Atoms contain only direct and by-name references.  Direct
references are used for atoms defined in the same object file for which the
target atom is either unnamed or cannot change.  For instance, calling
a static function in a translation unit will result in a direct reference
to the static function's atom.  Also, the FDE (dwarf unwind info) for a function
has a direct reference to its function.  On the other hand, references to
global symbols (e.g. call to printf) use by-name binding in object files.
</p>

<p>The resolving process maintains some global linking "state", including
a "symbol table" which is a map from c-string to Atom*, an indirect symbol
table which is a growable array of Atom*, and, for each kind of coalescable
constant, a content to Atom* map.  With these data structures,
the linker walks all atoms in all input files. For each
atom, it checks if the atom should be in the symbol table or in one of the
coalescing tables.  If so, it attempts to add the atom.  If there already is
a matching atom in that table, that means the current atom needs to be
coalesced with the found atom.
</p>

<p>To support coalescing, all references to coalescable atoms are changed to
indirect binding and an entry is added to the indirect table which points
to the currently chosen atom.  When all input atoms have been processed by
the resolver, there should be only direct and indirect bindings left.  If
there are any NULL entries in the indirect table, that means there are
undefined references.  The linker then looks to the supplied libraries (both
static and dynamic) to resolve those references.
</p>
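<p>The conversion from by-name to indirect binding can be sketched as follows.
This is a simplification for illustration only: the class and method names are
hypothetical, and the symbol table here maps a name to an index in the indirect
table rather than directly to an Atom*, so that one lookup serves both tables.
The content-keyed coalescing tables are left out.</p>

```python
class Resolver:
    """Toy resolver state: a symbol table plus a growable indirect table."""

    def __init__(self):
        self.symbol_table = {}      # name -> index into indirect_table
        self.indirect_table = []    # index -> chosen atom, or None if undefined

    def _slot(self, name):
        # Allocate an indirect-table entry the first time a name is seen.
        if name not in self.symbol_table:
            self.symbol_table[name] = len(self.indirect_table)
            self.indirect_table.append(None)
        return self.symbol_table[name]

    def add_definition(self, name, atom):
        slot = self._slot(name)
        if self.indirect_table[slot] is None:
            self.indirect_table[slot] = atom   # first definition wins (assumption)
        return slot

    def bind_by_name(self, name):
        """Rewrite a by-name reference as an indirect one."""
        return ("indirect", self._slot(name))

    def undefined(self):
        """Names whose indirect-table entry is still empty."""
        return sorted(n for n, i in self.symbol_table.items()
                      if self.indirect_table[i] is None)
```

<p>After all input atoms are processed, whatever <code>undefined()</code> returns
is the set of names the linker would go on to look up in the supplied
libraries.</p>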

<p>Dead code stripping (if requested) is done at the end of resolving.  The
linker does a simple mark-and-sweep. It starts with "root" atoms (like "main"
in a main executable), follows each reference, and marks each Atom that
it visits as "live".  When done, all atoms not marked "live" are removed.
</p>
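<p>The mark-and-sweep described above can be sketched as a plain graph
traversal.  In this illustration atoms are identified by name and the edges are
given as a dictionary; both choices, and the function name, are assumptions made
for the example.</p>

```python
def dead_strip(atoms, edges, roots):
    """Return the atoms reachable from the roots, in their original order.

    atoms: list of atom names
    edges: atom name -> list of atom names it references
    roots: atom names that are live by definition (e.g. "main")
    """
    live = set()
    stack = list(roots)
    while stack:                       # mark phase
        atom = stack.pop()
        if atom in live:
            continue
        live.add(atom)
        stack.extend(edges.get(atom, ()))
    return [a for a in atoms if a in live]   # sweep phase
```

<p>An atom that is referenced only by other dead atoms is removed as well,
because the traversal never reaches it.</p>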

<h4> <a>Passes</a>
</h4>
<p>The Passes step
is an open-ended set of routines that each get a chance to modify or enhance
the master graph of atoms. Passes are only run if the master graph of
atoms is completely resolved (no dangling edges).
The current set of Passes in the Darwin linker is:</p>
<ul>
  <li>Objective-C optimizations (Apple)</li>
  <li>stub (PLT) generation</li>
  <li>GOT instantiation</li>
  <li>TLV instantiation (Apple)</li>
  <li>order_file optimization</li>
  <li>branch island generation</li>
  <li>branch shim generation</li>
  <li>dtrace probe processing (Apple)</li>
  <li>compact unwind encoding (Apple)</li>
</ul>
<p>Some of these passes are specific to Apple's runtime environments.  But many
of the passes are applicable to any OS (such as generating branch islands for
out-of-range branch instructions).</p>

<p>The general structure of a pass is to walk the master graph, inspecting each
atom and doing something.  For instance, the stub pass walks the graph looking
for atoms with call sites to proxy atoms (e.g. call to printf).  It then
instantiates a "stub" atom (PLT entry) and a "lazy pointer" atom for each
proxy atom needed, and these new atoms are added to the master graph.  Next,
all the noted call sites to proxy atoms are replaced with calls to the
corresponding stub atom.</p>
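<p>The stub pass can be sketched as below.  This is an illustration under
several assumptions: atoms are dictionaries keyed by name, the
<code>$stub</code> and <code>$lazy_ptr</code> suffixes are invented for the
example, and the call sites are rewritten in the same walk instead of being
noted first and replaced afterwards.</p>

```python
def stub_pass(atoms):
    """Route every call to a proxy atom through a stub atom.

    atoms: name -> {"kind": str, "calls": [target names]}; modified in place.
    """
    stubs = {}                                   # proxy name -> stub name
    for name, atom in list(atoms.items()):       # snapshot: we add atoms below
        calls = atom.get("calls", [])
        for i, target in enumerate(calls):
            if atoms[target]["kind"] != "proxy":
                continue
            if target not in stubs:
                # One stub and one lazy pointer per proxy, however many callers.
                lazy = target + "$lazy_ptr"      # hypothetical naming
                stub = target + "$stub"          # hypothetical naming
                atoms[lazy] = {"kind": "lazy-pointer", "refs": [target]}
                atoms[stub] = {"kind": "stub", "refs": [lazy]}
                stubs[target] = stub
            calls[i] = stubs[target]             # retarget the call site
    return atoms
```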

<h4><a>Generate Output File</a>
</h4>
<p>Once the passes are done, the output file generator is given a sorted list
of atoms.  Its job is to create the executable content file wrapper and place
the content of the atoms into it.
</p>


<h2>
  <a name="Future Directions">Future Directions</a>
</h2>

<h4><a>Sections</a>
</h4>
<p>The current use of sections in mach-o .o files over-constrains the linker.
By default, the linker should preserve the section an atom is in.  But since
each section must be contiguous in the output, that limits the ability of
the linker to order atoms for locality.  It would be helpful to enrich the
object file with the reason an atom is in its section.  For instance,
is the section found at runtime? Or was the use of a section just a quick
way to group some content together?
</p>
<p>The ELF model for sections is a little better than mach-o because ELF
sections have write and execute bits, whereas mach-o sections must be in some
segment and the segment has the write and execute bits.
</p>

<h4><a>Mach-o Object File Format</a>
</h4>
<p>
The messiest part of the linker is the mach-o parser. This is because mach-o
is a traditional section and symbols based file format.  The parser must infer
atom boundaries using two approaches.  The first is that some section types have
well defined content which the linker can parse into atoms (e.g.  __cstring,
__eh_frame). The other approach is a naming convention (which the compiler follows)
by which the linker breaks sections into atoms at any non-local (not starting
with 'L') symbol. The processing the linker has to do to parse mach-o .o files is a
significant part of the link time.
</p>
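<p>The second approach, breaking a section at every non-local symbol, can be
sketched as follows.  The function name and the tuple layout are assumptions
for illustration; the sketch also ignores any bytes that precede the first
non-local symbol, which a real parser would have to account for.</p>

```python
def split_section(section_size, symbols):
    """Split one section into atoms at each non-local symbol.

    symbols: list of (name, offset) pairs within the section.
    Returns a list of (name, start, end) with end exclusive.
    """
    # Local labels (names starting with 'L') do not start a new atom.
    starts = sorted((off, name) for name, off in symbols
                    if not name.startswith("L"))
    atoms = []
    for i, (off, name) in enumerate(starts):
        # An atom runs until the next non-local symbol or the section end.
        end = starts[i + 1][0] if i + 1 < len(starts) else section_size
        atoms.append((name, off, end))
    return atoms
```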

<p>Given that the assembler writes object files once, whereas the linker reads
them many times (during development), it would make sense to optimize the object
file format to be something the linker can read/parse efficiently.</p>

<h4><a>New Object File Model</a>
</h4>
<p>LLVM has a nice model for its IR.  There are three representations:
the binary bit code file, the in-memory object model, and a textual
representation.  LLVM contains utility code for converting between these
representations.  The same model makes sense for atoms too.  There should be
three representations for atoms: binary file, in-memory, and textual. The Darwin
linker already has an in-memory C++ object model for Atoms.  All we need is a
textual representation and a binary file format.
</p>
<p>Note: in the Darwin linker the binary format for input object files is
independent of the output executable format.  That is, we could have one
universal object file format which the linker could use as input to produce
mach-o, ELF, or PE executables.</p>
<p>
The object file binary format should be designed to instantiate into atoms
as fast as possible.  The obvious way to do that is for the
file format to be an array of atoms.  The linker just mmaps in the file,
looks at the header to see how many atoms there are, and instantiates that many atoms
with the atom attribute information coming from that array.  The trick is
designing this in a way that can be extended as the Atom model evolves and new
attributes are added.
</p>
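<p>One way to picture the "array of atoms" idea is a small header followed by
fixed-size records.  The layout below is entirely hypothetical (the magic value,
the field order, and the field widths are invented for this sketch), and it reads
from an in-memory buffer rather than an mmapped file.  Storing the record size
in the header is one possible answer to the extensibility question: a reader
steps by the size found in the file, so newer files with longer records remain
readable.</p>

```python
import struct

# Hypothetical layout, little-endian.
HEADER = struct.Struct("<4sII")   # magic, atom count, bytes per atom record
ATOM = struct.Struct("<IIBB2x")   # name offset, size, scope, content type, padding


def write_atoms(records):
    """Serialize (name_offset, size, scope, content_type) tuples."""
    out = bytearray(HEADER.pack(b"ATOM", len(records), ATOM.size))
    for rec in records:
        out += ATOM.pack(*rec)
    return bytes(out)


def read_atoms(buf):
    """Instantiate atom tuples straight from the record array."""
    magic, count, record_size = HEADER.unpack_from(buf, 0)
    if magic != b"ATOM":
        raise ValueError("not an atom file")
    if record_size < ATOM.size:
        raise ValueError("atom records are shorter than this reader expects")
    atoms = []
    for i in range(count):
        # Step by the size stored in the file, so longer (newer) records
        # are read correctly and their extra fields are skipped.
        atoms.append(ATOM.unpack_from(buf, HEADER.size + i * record_size))
    return atoms
```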
<p>
In designing a textual format we want something easy for humans to read and
easy for the linker to parse.  Since an atom has lots of attributes, most of
which are usually just the default, we should define default values for
every attribute so that those can be omitted from the text representation.
One possible format is YAML.  Here are the atoms for a simple hello world
program expressed in YAML.
</p>
<pre>
---
target-triple:   x86_64-apple-darwin11
source:

atoms:
    - name:    _main
      scope:   linkage-unit
      type:    code
      alignment:
          power: 4
      content: [ 55, 48, 89, e5, 48, 8d, 3d, 00, 00, 00, 00, 30, c0, e8, 00, 00,
                 00, 00, 31, c0, 5d, c3 ]
      fixups:
      - offset: 07
        kind:   pcrel32
        target: 2
      - offset: 0E
        kind:   call32
        target: _fprintf

    - type:    c-string
      merge:   by-content
      content: [ 73, 5A, 00 ]

...
</pre>
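<p>The idea of omitting default-valued attributes from the text form can be
sketched as below.  The default values chosen here are assumptions for the
example (the document does not define them), and the output is a simple
YAML-like list item rather than the output of a real YAML library.</p>

```python
# Hypothetical defaults; the real set would cover every atom attribute.
DEFAULTS = {"scope": "translation-unit", "type": "data", "merge": "never"}


def to_text(atom):
    """Write one atom as a YAML-like list item, skipping default values."""
    lines = []
    for key, value in atom.items():
        if DEFAULTS.get(key) == value:
            continue                      # reader will fill this back in
        prefix = "- " if not lines else "  "
        lines.append(f"{prefix}{key}: {value}")
    return "\n".join(lines)


def with_defaults(fields):
    """Reader side: start from the defaults, then apply what the text gave."""
    return {**DEFAULTS, **fields}
```

<p>Because the reader fills the defaults back in, writing and re-reading an atom
yields the same attribute set even though the text mentions only the
non-default values.</p>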

<p>One big use for the textual format will be writing test cases. The Darwin
linker test suite is written mostly in C/C++, plus a few assembly
files.  The use of C means the same test case can be compiled for different
architectures.  But writing test cases in C is problematic because the compiler
may vary its output over time for its own optimization reasons, which may
inadvertently disable or break the linker feature being tested. By
writing test cases in the linker's own textual format, we can exactly specify
every attribute of every atom and thus target specific linker logic.
</p>

<h4><a>Debug Info</a>
</h4>
<p>Around 2005, when Apple switched from using STABS to using DWARF for debug
information, we made a design decision to have the linker ignore DWARF in
.o files.  This improves linking performance because the linker is not
copying tons of debug info.  Instead, the linker adds "debug notes" into the
output binary that contain the paths of the original .o files. During development
the Darwin debugger will notice the debug notes and then load the dwarf
debug information from the original object files.  For release builds,
a tool named dsymutil is run on the program.  It finds the debug notes and
then the original object files, then reads, merges and optimizes all the dwarf
debug information into one .dSYM file which can be loaded by the debugger
if needed.</p>

<p>The current way DWARF is generated is that all debug information for all
functions in a translation unit is merged and optimized into sections based
on debug info kind.  For instance, the mapping of instructions to source line
numbers for all functions is compressed and put in one section. This does not
play well with an Atom based file format.  One idea is to have the compiler
emit some intermediate representation of debug information (one which is
partitioned per atom) into the Atom based file format.  The linker could
then have code to convert that intermediate debug info to final dwarf.
This is still an open question.</p>

<h4><a>Extending Atom attributes to ELF and XCOFF</a>
</h4>
<p>The current set of attributes defined for Atoms in the Darwin linker
was chosen to meet the requirements of developing code to run on iOS and
Mac OS X.  Below is a list of the attributes and their possible values.
Supporting ELF and XCOFF may just require adding more values.  Or
new attributes may need to be added to capture new functionality.
</p>
<ul>
  <li>Name</li>
  <li>Size</li>
  <li>Section (I'd like to get rid of this)</li>
  <li>ContentType (currently some of this comes from section)
  <ul>
          <li>code</li>
          <li>stub</li>
          <li>data</li>
          <li>zeroFill</li>
          <li>initializerPointer</li>
          <li>objc1Class</li>
          <li>objc2Class</li>
          <li>objcClassPointer</li>
          <li>objc2CategoryList</li>
          <li>non-lazy-pointer</li>
          <li>lazy-pointer</li>
          <li>constant</li>
          <li>literal4</li>
          <li>literal8</li>
          <li>literal16</li>
          <li>cstring</li>
          <li>cstringPointer</li>
          <li>utf16string</li>
          <li>CFString</li>
          <li>CFI</li>
          <li>LSDA</li>
  </ul>
  </li>
  <li>Scope
  <ul>
          <li>translationUnit  (static functions)</li>
          <li>linkageUnit      (visibility hidden)</li>
          <li>global</li>
  </ul>
  </li>
  <li>DefinitionKind
  <ul>
          <li>regular</li>
          <li>tentative         (ANSI C feature)</li>
          <li>absolute          (assembly code feature)</li>
          <li>proxy             (stand-in for dynamic library symbol)</li>
  </ul>
  </li>
  <li>Combine
  <ul>
          <li>never</li>
          <li>byName          (weak symbols)</li>
          <li>byContent       (simple constants)</li>
          <li>byContentAndReferences (complex constants)</li>
  </ul>
  </li>
  <li>SymbolTableStatus
  <ul>
          <li>in</li>
          <li>notIn              (anonymous)</li>
          <li>inAsAbsolute       (assembly code feature)</li>
          <li>inAndNeverStrip    (tell strip tool to leave)</li>
          <li>inWithRandomName   (mach-o .o feature)</li>
  </ul>
  </li>
  <li>Alignment
  <ul>
          <li>powerOfTwo</li>
          <li>modulus</li>
  </ul>
  </li>
  <li>NeverDeadStrip (boolean)</li>
  <li>IsThumb (ARM specific)</li>
</ul>
<p>Where does dllexport fit in here?  Where do protected and
internal visibility fit?  Protected seems like scope=global plus a rule that
references to the atom are never made indirect.  Internal is like hidden, plus it enables some
compiler optimizations.  I'm not sure the linker needs to know about internal.
</p>

</body>
</html>