]> git.saurik.com Git - apple/ld64.git/blob - doc/design/linker.html
ld64-242.2.tar.gz
[apple/ld64.git] / doc / design / linker.html
1 <html>
2 <head>
3 <title>Linker</title>
4 </head>
5 <body>
6
7
8 <h1>
9 Inside the Linker
10 </h1>
11 <div class="doc_author">
12 <p>Written by <a href="mailto:kledzik@apple.com">Nick Kledzik</a></p>
13 </div>
14
15
16 <h2>
17 <a name="introduction">Introduction</a>
18 </h2>
19
20 <p>The Darwin linker is a new generation of linker. It is not "section" based
21 like traditional linkers which mostly just interlace sections from multiple
22 object files into the output file. The Darwin linker is based on "Atoms".
23 Traditional section based linking work well for simple linking, but their model
24 makes advanced linking features difficult to implement. Features like dead code
25 stripping, reordering functions for locality, and C++ coalescing require the
26 linker to work at a finer grain.
27 </p>
28
29 <p>An atom is an indivisible chunk of code or data. An atom has a set of
30 attributes, such as: name, scope, content-type, alignment, etc. An atom also
31 has a list of Fixups. A Fixup contains: a kind, an optional offset, an optional
32 addend, and an optional target atom.</p>
33
34 <p>The Atom model allows the linker to use standard graph theory models for
35 linking data structures. Each atom is a node, and each Fixup is an edge.
36 The feature of dead code stripping is implemented by following edges to mark
37 all live atoms, and then delete the non-live atoms.</p>
38 <br>
39 <h2>
40 <a name="Atom model">Atom model</a>
41 </h2>
42
43 <p>An atom is an indivisible chuck of code or data. Typically each user
44 written function or global variable is an atom. In addition, the compiler may
45 emit other atoms, such as for literal c-strings or floating point constants, or
46 for runtime data structures like dwarf unwind info or pointers to initializers.
47 </p>
48
49 <p>A simple "hello world" object file would be modeled like this:</p>
50 <img src="hello.png" alt="hello world graphic"/>
51 <p>There are two atoms: main and an anonymous atom containing the c-string
52 literal "hello world". The Atom "main" has two fixups. One is the call site
53 for the call to printf, and the other is a fixup for the instruction that loads
54 the address of the c-string literal. </p>
55
56 <br>
57 <h2>
58 <a name="File model">File model</a>
59 </h2>
60
61 <p>The linker views the input files as basically containers of Atoms and Fixups,
62 and just a few attributes of their own. The linker works with three kinds
63 of files: object files, static libraries, and dynamic libraries. Each kind
64 of file has reader object which presents the file in the model expected by
65 the linker.</p>
66 <h4> <a>Object File</a>
67 </h4>
68 An object file is just a container of atoms. When linking with
69 an object file, all atoms are added to the initial graph of atoms.
70
71 <h4> <a>Static Library (Archive)</a>
72 </h4>
73 This is the traditional unix static archive which is just a collection of
74 object files with a "table of contents". When linking with a static library,
75 by default nothing is added to the initial graph of atoms. Instead, if there
76 are unresolved references (dangling edges) in the master graph of all atoms,
77 and the table of contents for a static library says that one of the object files
78 in the library defines one of the missing symbols (dangling edge),
79 the set of atoms from the specified object file in the static library is added
80 to the master graph of atoms.
81
82 <h4> <a>Dynamic Library (Shared Object)</a>
83 </h4>
84 Dynamic libraries are unique in that the don't directly add add any atoms.
85 Their purpose is to check at build time that all references are resolved and
86 provide a list of dynamic libraries (SO_NEEDED) that will be needed at runtime.
87 The way this is modeled in the linker is that a dynamic library contributes
88 no atoms to the initial graph of atoms. Instead, (like static libraries) if
89 there are unresolved references (dangling edges) in the master graph of all atoms,
90 if a dynamic library exports a required symbol, then a "proxy" atom is
91 instantiated by the linker. The proxy atom allows the master atom graph to have
92 all edges resolved and also records from which dynamic library a symbol came.</p>
93
94 <br>
95 <h2>
96 <a name="Linking Steps">Linking Steps</a>
97 </h2>
98 <p>Through the use of abstract Atoms, the core of linking is architecture
99 independent and file format independent. All command line parsing is factored
100 out into a separate "options" abstraction which enables the linker to be driven
101 with different command line sets.</p>
102 <p>The overall steps in linking are:<p>
103 <ol>
104 <li>Command line processing</li>
105 <li>Parsing input files</li>
106 <li>Resolving</li>
107 <li>Passes/Optimizations</li>
108 <li>Generate output file</li>
109 </ol>
110
111 <p>The Resolving and Passes steps are done purely on the master graph of atoms,
112 so they have no notion of file formats such as mach-o or ELF.</p>
113
114 <h4> <a>Resolving</a>
115 </h4>
116 <p>The resolving step takes all the atoms graphs from each object file and
117 combines them into one master object graph. Unfortunately, it is not as simple
118 as appending the atom list from each file into one big list. There are many
119 cases where atoms need to be coalesced. That is, two or more atoms need to
120 be coalesced into one atom. This is necessary to support: C language
121 "tentative definitions", C++ weak symbols for templates and inlines defined
122 in headers, and for merging copies of constants like c-strings and floating
123 point constants.</p>
124
125 <p>The linker support coalescing by-name and by-content. By-name is used for
126 tentative definitions and weak symbols. By-content is used for constant data
127 that can be merged. </p>
128
129 <p>When one atom has a reference (FixUp) to another atom, there is also a binding
130 type: by-name, direct, or indirect. A Fixup contains a tagged union that if
131 the binding type is by-name, the union field is a pointer to a c-string. If
132 the binding type is direct, the union is a pointer to an Atom. If the binding
133 type is indirect, the union is a index into a table of pointers to Atoms. Below
134 is a graphical representation of the binding types:</p>
135 <img src="bindings.png" alt="binding types graphic"/>
136
137 <p>Input file Atoms contain only direct and by-name references. Direct
138 references are used for atoms defined in the same object file for which the
139 target atom is either unnamed or cannot change. For instance, calling
140 a static function in a translation unit will result in a direct reference
141 to the static functions's atom. Also the FDE (dwarf unwind info) for a function
142 has a direct reference to its function. On the other hand references to
143 global symbols (e.g. call to printf) use by-name binding in object files.
144 </p>
145
146 <p>The resolving process maintains some global linking "state", including:
147 a "symbol table" which is a map from c-string to Atom*, an indirect symbol
148 table which is a growable array of Atom*, and for each kind of coalesable
149 constants there is a content to Atom* map. With these data structures,
150 the linker walks all atoms in all input files. For each
151 atom, it checks if the atom should be in one symbol table or one of the
152 coalescing tables. If so, it attempts to add the atom. If there already is
153 a matching atom in that table, that means the current atom needs to be
154 coalesced with the found atom.
155 </p>
156
157 <p>To support coalescing, all references to coalesable atoms are changed to
158 indirect binding and an entry is added to the indirect table which points
159 to the current chosen atom. When all input atoms have been processed by
160 the resolver, there should be only direct and indirect bindings left. If
161 there are any NULL entries in the indirect table, that means there are
162 undefined references. The linker then looks to the supplied libraries (both
163 static and dynamic) to resolve those references.
164 </p>
165
166 <p>Dead code stripping (if requested) is done at the end of resolving. The
167 linker does a simple mark-and-sweep. It starts with "root" atoms (like "main"
168 in a main executable) and follows each references and marks each Atom that
169 it visits as "live". When done, all atoms not marked "live" are removed.
170 </p>
171
172 <h4> <a>Passes</a>
173 </h4>
174 <p>The Passes step
175 is an open ended set of routines that each get a change to modify or enhance
176 the master graph of atoms. Passes are only run if the master graph of
177 atoms is completely resolved (no dangling edges).
178 The current set of Passes in the Darwin linker are:</p>
179 <ul>
180 <li>Objective-C optimizations (Apple)</li>
181 <li>stub (PLT) generation</li>
182 <li>GOT instantiation</li>
183 <li>TLV instantiation (Apple)</li>
184 <li>order_file optimization</li>
185 <li>branch island generation</li>
186 <li>branch shim generation</li>
187 <li>dtrace probe processing (Apple)</li>
188 <li>compact unwind encoding (Apple)</li>
189 </ul>
190 <p>Some of these passes are specific to Apple's runtime environments. But many
191 of the passes are applicable to any OS (such as generating branch island for
192 out of range branch instructions).</p>
193
194 <p>The general structure of a pass is to walk the master graph inspecting each
195 atom and doing something. For instance, the stub pass, walks the graph looking
196 for atoms with call sites to proxy atoms (e.g. call to printf). It then
197 instantiates a "stub" atom (PLT entry) and a "lazy pointer" atom for each
198 proxy atom needed, and these new atoms are added to the master graph. Next
199 all the noted call sites to proxy atoms are replaced with calls to the
200 corresponding stub atom.</p>
201
202 <h4><a>Generate Output File</a>
203 </h4>
204 <p>Once the passes are done, the output file generator is given a sorted list
205 of atoms. Its job is to create the executable content file wrapper and place
206 the content of the atoms into it.
207 </p>
208
209
210 <h2>
211 <a name="Future Directions">Future Directions</a>
212 </h2>
213
214 <h4><a>Sections</a>
215 </h4>
216 <p>The current use of sections in mach-o .o files over-constrains the linker.
217 By default, the linker should preserve the section an atom is in. But since
218 all sections must be contiguous in the output, that limits the ability of
219 the linker to order atoms for locality. It would be helpful to enrich the
220 object file with with reason something is in the section it is. For instance,
221 is the section found at runtime? Or was the use of a section just a quick
222 way to group some content together?
223 </p>
224 <p>The ELF model for sections is a little better than mach-o because ELF
225 sections have write and execute bits, whereas mach-o sections must be in some
226 segment and the segment has the write and execute bits.
227 </p>
228
229 <h4><a>Mach-o Object File Format</a>
230 </h4>
231 <p>
232 The messiest part of the linker is the mach-o parser. This is because mach-o
233 is a traditional section and symbols based file format. The parser must infer
234 atom boundaries using two approaches. The first is that some section types have
235 well defined content which the linker can parse into atoms (e.g. __cstring,
236 __eh_frame). The other approach is a naming convention (which the compiler follows)
237 by which the linker breaks sections into atoms at any non-local (not starting
238 with 'L') symbol. The processing the linker has to do parse mach-o .o files is a
239 significant part of the link time.
240 </p>
241
242 <p>Given that the assembler writes object files once, whereas the linker reads
243 them many times (during development), it would make sense to optimize the object
244 file format to be something the linker can read/parse efficiently.</p>
245
246 <h4><a>New Object File Model</a>
247 </h4>
248 <p>LLVM has a nice model for its IR. There are three representations:
249 the binary bit code file, the in-memory object model, and a textual
250 representation. LLVM contains utility possible code for converting between these
251 representations. The same model makes sense for atoms too. There should be
252 three representations for atoms: binary file, in-memory, and textual. The Darwin
253 linker already has an in-memory C++ object model for Atoms. All we need is a
254 textual representation and binary file format.
255 </p>
256 <p>Note: in the darwin linker the binary format for input object files is
257 independent of the output executable format. That is, we could have one
258 universal object file format which the linker could use as input to produce
259 mach-o, ELF, or PE executables.</p>
260 <p>
261 The object file binary format should be designed to instantiate into atoms
262 as fast as possible. The obvious way to do that is that the
263 file format would be an array of atoms. The linker just mmaps in the file and
264 looks at the header to see how many atoms there and instantiate that many atoms
265 with the atom attribute information coming from that array. The trick is
266 designing this in a way that can be extended as the Atom mode evolves and new
267 attributes are added.
268 </p>
269 <p>
270 In designing a textual format we want something easy for humans to read and
271 easy for the linker to parse. Since an atom has lots of attributes most of
272 which are usually just the default, we should define default values for
273 every attribute so that those can be omitted from the text representation.
274 One possile format is YAML. Here is the atoms for a simple hello world
275 program expressed in YAML.
276 </p>
277 <pre>
278 ---
279 target-triple: x86_64-apple-darwin11
280 source:
281
282 atoms:
283 - name: _main
284 scope: linkage-unit
285 type: code
286 alignment:
287 power: 4
288 content: [ 55, 48, 89, e5, 48, 8d, 3d, 00, 00, 00, 00, 30, c0, e8, 00, 00,
289 00, 00, 31, c0, 5d, c3 ]
290 fixups:
291 - offset: 07
292 kind: pcrel32
293 target: 2
294 - offset: 0E
295 kind: call32
296 target: _fprintf
297
298 - type: c-string
299 merge: by-content
300 content: [ 73, 5A, 00 ]
301
302 ...
303 </pre>
304
305 <p>One big use for the textual format will be writing test cases. The Darwin
306 linker test suite test cases are written mostly in C/C++ and a few assembly
307 files. The use of C means the same test case can be compiled for different
308 architectures. But writing test cases in C is problematic because the compiler
309 may vary its output over time for its own optimization reasons which my
310 inadvertently disable or break the linker feature trying to be tested. By
311 writing test cases in the linkers own textual format, we can exactly specify
312 every attribute of every atom and thus target specific linker logic.
313 </p>
314
315 <h4><a>Debug Info</a>
316 </h4>
317 <p>Around 2005 when Apple switched from using STABS to using DWARF for debug
318 information, we made a design decision to have the linker ignore DWARF in
319 .o files. This improves linking performance because the linker is not
320 copying tons of debug info. Instead, the linker adds "debug notes" into
321 output binary that contain the paths of the original .o files. During development
322 the Darwin debugger will notice the debug notes and the load the dwarf
323 debug information from the original object files. For release builds,
324 a tool named dsymutil is run on the program. It finds the debug notes and
325 then the original object files, then reads, merges and optimizes all the dwarf
326 debug information into one .dSYM file which can be loaded by the debugger
327 if needed.</p>
328
329 <p>The current way DWARF is generated is that all debug information for all
330 functions in a translation unit are merged and optimized into sections based
331 on debug info kind. For instance the mapping of instructions to source line
332 numbers for all functions is compressed and put in one section. This does not
333 play well in an Atom based file format. One idea is to have the compiler
334 emit some intermediate representation debug information (one which is
335 partitioned per atom) into the Atom based file format. The linker could
336 then have code to convert that intermediate debug into to final dwarf.
337 This is still an open question.</p>
338
339 <h4><a>Extending Atom attributes to ELF and XCOFF</a>
340 </h4>
341 <p>The current set of attributes defined for Atoms in the darwin linker
342 were chosen to meet the requirements of developing code to run on iOS and
343 Mac OS X. Below is a list of the attributes and their possible values.
344 It may just require adding more values to support ELF and XCOFF. Or there
345 may need to be new attributes added to capture new functionality.
346 </p>
347 <ul>
348 <li>Name</li>
349 <li>Size</li>
350 <li>Section (I'd like to get rid of this)</li>
351 <li>ContentType (currently some of this comes from section)</li>
352 <ul>
353 <li>code</li>
354 <li>stub</li>
355 <li>data</li>
356 <li>zeroFill</li>
357 <li>initializerPointer</li>
358 <li>objc1Class</li>
359 <li>objc2Class</li>
360 <li>objcClassPointer</li>
361 <li>objc2CategoryList</li>
362 <li>non-lazy-pointer</li>
363 <li>lazy-pointer</li>
364 <li>constant</li>
365 <li>literal4</li>
366 <li>literal8</li>
367 <li>literal16</li>
368 <li>cstring</li>
369 <li>cstringPointer</li>
370 <li>utf16string</li>
371 <li>CFString</li>
372 <li>CFI</li>
373 <li>LSDA</li>
374 </ul>
375 </li>
376 <li>Scope
377 <ul>
378 <li>translationUnit (static functions)</li>
379 <li>linkageUnit (visibility hidden)</li>
380 <li>global</li>
381 </ul>
382 </li>
383 <li>DefinitionKind
384 <ul>
385 <li>regular</li>
386 <li>tentative (ANSI C feature)</li>
387 <li>absolute (assembly code feature)</li>
388 <li>proxy (stand-in for dynamic library symbol)</li>
389 </ul>
390 </li>
391 <li>Combine
392 <ul>
393 <li>never</li>
394 <li>byName (weak symbols)</li>
395 <li>byContent (simple constants)</li>
396 <li>byContentAndReferences (complex constants)</li>
397 </ul>
398 </li>
399 <li>SymbolTableStatus
400 <ul>
401 <li>In</li>
402 <li>notIn (anonymous)</li>
403 <li>inAsAbsolute (assembly code feature)</li>
404 <li>inAndNeverStrip (tell strip tool to leave)</li>
405 <li>inWithRandomName (mach-o .o feature)</li>
406 </ul>
407 <li>Alignment
408 <ul>
409 <li>powerOfTwo</li>
410 <li>modulus</li>
411 </ul>
412 <li>NeverDeadStrip (boolean)</li>
413 <li>IsThumb (ARM specific)</li>
414 </ul>
415 <p>Where does dllexport fit in here? Where does visibility protected and
416 internal fit? Protected seems like scope=global plus the rule to not
417 indirect references to it. Internal is like hidden plus enables some
418 compiler optimizations. I'm not sure the linker needs to know about internal.
419 </p>
420
421 </body>
422 </html>
423