]> git.saurik.com Git - apple/ld64.git/blame - doc/design/linker.html
ld64-133.3.tar.gz
[apple/ld64.git] / doc / design / linker.html
CommitLineData
ebf6f434
A
1<html>
2<head>
3 <title>Linker</title>
4</head>
5<body>
6
7
8<h1>
9 Inside the Linker
10</h1>
11<div class="doc_author">
12 <p>Written by <a href="mailto:kledzik@apple.com">Nick Kledzik</a></p>
13</div>
14
15
16<h2>
17 <a name="introduction">Introduction</a>
18</h2>
19
20<p>The Darwin linker is a new generation of linker. It is not "section" based
21like traditional linkers which mostly just interlace sections from multiple
22object files into the output file. The Darwin linker is based on "Atoms".
23Traditional section based linking work well for simple linking, but their model
24makes advanced linking features difficult to implement. Features like dead code
25stripping, reordering functions for locality, and C++ coalescing require the
26linker to work at a finer grain.
27</p>
28
29<p>An atom is an indivisible chunk of code or data. An atom has a set of
30attributes, such as: name, scope, content-type, alignment, etc. An atom also
31has a list of Fixups. A Fixup contains: a kind, an optional offset, an optional
32addend, and an optional target atom.</p>
33
34<p>The Atom model allows the linker to use standard graph theory models for
35linking data structures. Each atom is a node, and each Fixup is an edge.
36The feature of dead code stripping is implemented by following edges to mark
37all live atoms, and then delete the non-live atoms.</p>
38<br>
39<h2>
40 <a name="Atom model">Atom model</a>
41</h2>
42
43<p>An atom is an indivisible chuck of code or data. Typically each user
44written function or global variable is an atom. In addition, the compiler may
45emit other atoms, such as for literal c-strings or floating point constants, or
46for runtime data structures like dwarf unwind info or pointers to initializers.
47</p>
48
49<p>A simple "hello world" object file would be modeled like this:</p>
50<img src="hello.png" alt="hello world graphic"/>
51<p>There are two atoms: main and an anonymous atom containing the c-string
52literal "hello world". The Atom "main" has two fixups. One is the call site
53for the call to printf, and the other is a fixup for the instruction that loads
54the address of the c-string literal. </p>
55
56<br>
57<h2>
58 <a name="File model">File model</a>
59</h2>
60
61<p>The linker views the input files as basically containers of Atoms and Fixups,
62 and just a few attributes of their own. The linker works with three kinds
63of files: object files, static libraries, and dynamic libraries. Each kind
64of file has reader object which presents the file in the model expected by
65the linker.</p>
66<h4> <a>Object File</a>
67</h4>
68An object file is just a container of atoms. When linking with
69an object file, all atoms are added to the initial graph of atoms.
70
71<h4> <a>Static Library (Archive)</a>
72</h4>
73This is the traditional unix static archive which is just a collection of
74object files with a "table of contents". When linking with a static library,
75by default nothing is added to the initial graph of atoms. Instead, if there
76are unresolved references (dangling edges) in the master graph of all atoms,
77and the table of contents for a static library says that one of the object files
78in the library defines one of the missing symbols (dangling edge),
79the set of atoms from the specified object file in the static library is added
80to the master graph of atoms.
81
82<h4> <a>Dynamic Library (Shared Object)</a>
83</h4>
84Dynamic libraries are unique in that the don't directly add add any atoms.
85Their purpose is to check at build time that all references are resolved and
86provide a list of dynamic libraries (SO_NEEDED) that will be needed at runtime.
87The way this is modeled in the linker is that a dynamic library contributes
88no atoms to the initial graph of atoms. Instead, (like static libraries) if
89there are unresolved references (dangling edges) in the master graph of all atoms,
90if a dynamic library exports a required symbol, then a "proxy" atom is
91instantiated by the linker. The proxy atom allows the master atom graph to have
92all edges resolved and also records from which dynamic library a symbol came.</p>
93
94<br>
95<h2>
96 <a name="Linking Steps">Linking Steps</a>
97</h2>
98<p>Through the use of abstract Atoms, the core of linking is architecture
99independent and file format independent. All command line parsing is factored
100out into a separate "options" abstraction which enables the linker to be driven
101with different command line sets.</p>
102<p>The overall steps in linking are:<p>
103<ol>
104 <li>Command line processing</li>
105 <li>Parsing input files</li>
106 <li>Resolving</li>
107 <li>Passes/Optimizations</li>
108 <li>Generate output file</li>
109</ol>
110
111<p>The Resolving and Passes steps are done purely on the master graph of atoms,
112so they have no notion of file formats such as mach-o or ELF.</p>
113
114<h4> <a>Resolving</a>
115</h4>
116<p>The resolving step takes all the atoms graphs from each object file and
117combines them into one master object graph. Unfortunately, it is not as simple
118as appending the atom list from each file into one big list. There are many
119cases where atoms need to be coalesced. That is, two or more atoms need to
120be coalesced into one atom. This is necessary to support: C language
121 "tentative definitions", C++ weak symbols for templates and inlines defined
122in headers, and for merging copies of constants like c-strings and floating
123point constants.</p>
124
125<p>The linker support coalescing by-name and by-content. By-name is used for
126tentative definitions and weak symbols. By-content is used for constant data
127that can be merged. </p>
128
129<p>When one atom has a reference (FixUp) to another atom, there is also a binding
130type: by-name, direct, or indirect. A Fixup contains a tagged union that if
131the binding type is by-name, the union field is a pointer to a c-string. If
132the binding type is direct, the union is a pointer to an Atom. If the binding
133type is indirect, the union is a index into a table of pointers to Atoms. Below
134is a graphical representation of the binding types:</p>
135<img src="bindings.png" alt="binding types graphic"/>
136
137<p>Input file Atoms contain only direct and by-name references. Direct
138references are used for atoms defined in the same object file for which the
139target atom is either unnamed or cannot change. For instance, calling
140a static function in a translation unit will result in a direct reference
141to the static functions's atom. Also the FDE (dwarf unwind info) for a function
142has a direct reference to its function. On the other hand references to
143global symbols (e.g. call to printf) use by-name binding in object files.
144</p>
145
146<p>The resolving process maintains some global linking "state", including:
147a "symbol table" which is a map from c-string to Atom*, an indirect symbol
148table which is a growable array of Atom*, and for each kind of coalesable
149constants there is a content to Atom* map. With these data structures,
150the linker walks all atoms in all input files. For each
151atom, it checks if the atom should be in one symbol table or one of the
152coalescing tables. If so, it attempts to add the atom. If there already is
153a matching atom in that table, that means the current atom needs to be
154coalesced with the found atom.
155</p>
156
157<p>To support coalescing, all references to coalesable atoms are changed to
158indirect binding and an entry is added to the indirect table which points
159to the current chosen atom. When all input atoms have been processed by
160the resolver, there should be only direct and indirect bindings left. If
161there are any NULL entries in the indirect table, that means there are
162undefined references. The linker then looks to the supplied libraries (both
163static and dynamic) to resolve those references.
164</p>
165
166<p>Dead code stripping (if requested) is done at the end of resolving. The
167linker does a simple mark-and-sweep. It starts with "root" atoms (like "main"
168in a main executable) and follows each references and marks each Atom that
169it visits as "live". When done, all atoms not marked "live" are removed.
170</p>
171
172<h4> <a>Passes</a>
173</h4>
174<p>The Passes step
175is an open ended set of routines that each get a change to modify or enhance
176the master graph of atoms. Passes are only run if the master graph of
177atoms is completely resolved (no dangling edges).
178The current set of Passes in the Darwin linker are:</p>
179<ul>
180 <li>Objective-C optimizations (Apple)</li>
181 <li>stub (PLT) generation</li>
182 <li>GOT instantiation</li>
183 <li>TLV instantiation (Apple)</li>
184 <li>order_file optimization</li>
185 <li>branch island generation</li>
186 <li>branch shim generation</li>
187 <li>dtrace probe processing (Apple)</li>
188 <li>compact unwind encoding (Apple)</li>
189</ul>
190<p>Some of these passes are specific to Apple's runtime environments. But many
191of the passes are applicable to any OS (such as generating branch island for
192out of range branch instructions).</p>
193
194<p>The general structure of a pass is to walk the master graph inspecting each
195atom and doing something. For instance, the stub pass, walks the graph looking
196for atoms with call sites to proxy atoms (e.g. call to printf). It then
197instantiates a "stub" atom (PLT entry) and a "lazy pointer" atom for each
198proxy atom needed, and these new atoms are added to the master graph. Next
199all the noted call sites to proxy atoms are replaced with calls to the
200corresponding stub atom.</p>
201
202<h4><a>Generate Output File</a>
203</h4>
204<p>Once the passes are done, the output file generator is given a sorted list
205of atoms. Its job is to create the executable content file wrapper and place
206the content of the atoms into it.
207</p>
208
209
210<h2>
211 <a name="Future Directions">Future Directions</a>
212</h2>
213
214<h4><a>Sections</a>
215</h4>
216<p>The current use of sections in mach-o .o files over-constrains the linker.
217By default, the linker should preserve the section an atom is in. But since
218all sections must be contiguous in the output, that limits the ability of
219the linker to order atoms for locality. It would be helpful to enrich the
220object file with with reason something is in the section it is. For instance,
221is the section found at runtime? Or was the use of a section just a quick
222way to group some content together?
223</p>
224<p>The ELF model for sections is a little better than mach-o because ELF
225sections have write and execute bits, whereas mach-o sections must be in some
226segment and the segment has the write and execute bits.
227</p>
228
229<h4><a>Mach-o Object File Format</a>
230</h4>
231<p>
232The messiest part of the linker is the mach-o parser. This is because mach-o
233is a traditional section and symbols based file format. The parser must infer
234atom boundaries using two approaches. The first is that some section types have
235well defined content which the linker can parse into atoms (e.g. __cstring,
236__eh_frame). The other approach is a naming convention (which the compiler follows)
237by which the linker breaks sections into atoms at any non-local (not starting
238with 'L') symbol. The processing the linker has to do parse mach-o .o files is a
239significant part of the link time.
240</p>
241
242<p>Given that the assembler writes object files once, whereas the linker reads
243them many times (during development), it would make sense to optimize the object
244file format to be something the linker can read/parse efficiently.</p>
245
246<h4><a>New Object File Model</a>
247</h4>
248<p>LLVM has a nice model for its IR. There are three representations:
249the binary bit code file, the in-memory object model, and a textual
250representation. LLVM contains utility possible code for converting between these
251representations. The same model makes sense for atoms too. There should be
252three representations for atoms: binary file, in-memory, and textual. The Darwin
253linker already has an in-memory C++ object model for Atoms. All we need is a
254textual representation and binary file format.
255</p>
256<p>Note: in the darwin linker the binary format for input object files is
257independent of the output executable format. That is, we could have one
258universal object file format which the linker could use as input to produce
259mach-o, ELF, or PE executables.</p>
260<p>
261The object file binary format should be designed to instantiate into atoms
262as fast as possible. The obvious way to do that is that the
263file format would be an array of atoms. The linker just mmaps in the file and
264looks at the header to see how many atoms there and instantiate that many atoms
265with the atom attribute information coming from that array. The trick is
266designing this in a way that can be extended as the Atom mode evolves and new
267attributes are added.
268</p>
269<p>
270In designing a textual format we want something easy for humans to read and
271easy for the linker to parse. Since an atom has lots of attributes most of
272which are usually just the default, we should define default values for
273every attribute so that those can be omitted from the text representation.
274One possile format is YAML. Here is the atoms for a simple hello world
275program expressed in YAML.
276</p>
277<pre>
278---
279target-triple: x86_64-apple-darwin11
280source:
281
282atoms:
283 - name: _main
284 scope: linkage-unit
285 type: code
286 alignment:
287 power: 4
288 content: [ 55, 48, 89, e5, 48, 8d, 3d, 00, 00, 00, 00, 30, c0, e8, 00, 00,
289 00, 00, 31, c0, 5d, c3 ]
290 fixups:
291 - offset: 07
292 kind: pcrel32
293 target: 2
294 - offset: 0E
295 kind: call32
296 target: _fprintf
297
298 - type: c-string
299 merge: by-content
300 content: [ 73, 5A, 00 ]
301
302...
303</pre>
304
305<p>One big use for the textual format will be writing test cases. The Darwin
306linker test suite test cases are written mostly in C/C++ and a few assembly
307files. The use of C means the same test case can be compiled for different
308architectures. But writing test cases in C is problematic because the compiler
309may vary its output over time for its own optimization reasons which my
310inadvertently disable or break the linker feature trying to be tested. By
311writing test cases in the linkers own textual format, we can exactly specify
312every attribute of every atom and thus target specific linker logic.
313</p>
314
315<h4><a>Debug Info</a>
316</h4>
317<p>Around 2005 when Apple switched from using STABS to using DWARF for debug
318information, we made a design decision to have the linker ignore DWARF in
319.o files. This improves linking performance because the linker is not
320copying tons of debug info. Instead, the linker adds "debug notes" into
321output binary that contain the paths of the original .o files. During development
322the Darwin debugger will notice the debug notes and the load the dwarf
323debug information from the original object files. For release builds,
324a tool named dsymutil is run on the program. It finds the debug notes and
325then the original object files, then reads, merges and optimizes all the dwarf
326debug information into one .dSYM file which can be loaded by the debugger
327if needed.</p>
328
329<p>The current way DWARF is generated is that all debug information for all
330functions in a translation unit are merged and optimized into sections based
331on debug info kind. For instance the mapping of instructions to source line
332numbers for all functions is compressed and put in one section. This does not
333play well in an Atom based file format. One idea is to have the compiler
334emit some intermediate representation debug information (one which is
335partitioned per atom) into the Atom based file format. The linker could
336then have code to convert that intermediate debug into to final dwarf.
337This is still an open question.</p>
338
339<h4><a>Extending Atom attributes to ELF and XCOFF</a>
340</h4>
341<p>The current set of attributes defined for Atoms in the darwin linker
342were chosen to meet the requirements of developing code to run on iOS and
343Mac OS X. Below is a list of the attributes and their possible values.
344It may just require adding more values to support ELF and XCOFF. Or there
345may need to be new attributes added to capture new functionality.
346</p>
347<ul>
348 <li>Name</li>
349 <li>Size</li>
350 <li>Section (I'd like to get rid of this)</li>
351 <li>ContentType (currently some of this comes from section)</li>
352 <ul>
353 <li>code</li>
354 <li>stub</li>
355 <li>data</li>
356 <li>zeroFill</li>
357 <li>initializerPointer</li>
358 <li>objc1Class</li>
359 <li>objc2Class</li>
360 <li>objcClassPointer</li>
361 <li>objc2CategoryList</li>
362 <li>non-lazy-pointer</li>
363 <li>lazy-pointer</li>
364 <li>constant</li>
365 <li>literal4</li>
366 <li>literal8</li>
367 <li>literal16</li>
368 <li>cstring</li>
369 <li>cstringPointer</li>
370 <li>utf16string</li>
371 <li>CFString</li>
372 <li>CFI</li>
373 <li>LSDA</li>
374 </ul>
375 </li>
376 <li>Scope
377 <ul>
378 <li>translationUnit (static functions)</li>
379 <li>linkageUnit (visibility hidden)</li>
380 <li>global</li>
381 </ul>
382 </li>
383 <li>DefinitionKind
384 <ul>
385 <li>regular</li>
386 <li>tentative (ANSI C feature)</li>
387 <li>absolute (assembly code feature)</li>
388 <li>proxy (stand-in for dynamic library symbol)</li>
389 </ul>
390 </li>
391 <li>Combine
392 <ul>
393 <li>never</li>
394 <li>byName (weak symbols)</li>
395 <li>byContent (simple constants)</li>
396 <li>byContentAndReferences (complex constants)</li>
397 </ul>
398 </li>
399 <li>SymbolTableStatus
400 <ul>
401 <li>In</li>
402 <li>notIn (anonymous)</li>
403 <li>inAsAbsolute (assembly code feature)</li>
404 <li>inAndNeverStrip (tell strip tool to leave)</li>
405 <li>inWithRandomName (mach-o .o feature)</li>
406 </ul>
407 <li>Alignment
408 <ul>
409 <li>powerOfTwo</li>
410 <li>modulus</li>
411 </ul>
412 <li>NeverDeadStrip (boolean)</li>
413 <li>IsThumb (ARM specific)</li>
414</ul>
415<p>Where does dllexport fit in here? Where does visibility protected and
416internal fit? Protected seems like scope=global plus the rule to not
417indirect references to it. Internal is like hidden plus enables some
418compiler optimizations. I'm not sure the linker needs to know about internal.
419</p>
420
421</body>
422</html>
423