]>
Commit | Line | Data |
---|---|---|
ebf6f434 A |
1 | <html> |
2 | <head> | |
3 | <title>Linker</title> | |
4 | </head> | |
5 | <body> | |
6 | ||
7 | ||
8 | <h1> | |
9 | Inside the Linker | |
10 | </h1> | |
11 | <div class="doc_author"> | |
12 | <p>Written by <a href="mailto:kledzik@apple.com">Nick Kledzik</a></p> | |
13 | </div> | |
14 | ||
15 | ||
16 | <h2> | |
17 | <a name="introduction">Introduction</a> | |
18 | </h2> | |
19 | ||
20 | <p>The Darwin linker is a new generation of linker. It is not "section" based | |
21 | like traditional linkers which mostly just interlace sections from multiple | |
22 | object files into the output file. The Darwin linker is based on "Atoms". | |
23 | Traditional section based linking work well for simple linking, but their model | |
24 | makes advanced linking features difficult to implement. Features like dead code | |
25 | stripping, reordering functions for locality, and C++ coalescing require the | |
26 | linker to work at a finer grain. | |
27 | </p> | |
28 | ||
29 | <p>An atom is an indivisible chunk of code or data. An atom has a set of | |
30 | attributes, such as: name, scope, content-type, alignment, etc. An atom also | |
31 | has a list of Fixups. A Fixup contains: a kind, an optional offset, an optional | |
32 | addend, and an optional target atom.</p> | |
33 | ||
34 | <p>The Atom model allows the linker to use standard graph theory models for | |
35 | linking data structures. Each atom is a node, and each Fixup is an edge. | |
36 | The feature of dead code stripping is implemented by following edges to mark | |
37 | all live atoms, and then delete the non-live atoms.</p> | |
38 | <br> | |
39 | <h2> | |
40 | <a name="Atom model">Atom model</a> | |
41 | </h2> | |
42 | ||
43 | <p>An atom is an indivisible chuck of code or data. Typically each user | |
44 | written function or global variable is an atom. In addition, the compiler may | |
45 | emit other atoms, such as for literal c-strings or floating point constants, or | |
46 | for runtime data structures like dwarf unwind info or pointers to initializers. | |
47 | </p> | |
48 | ||
49 | <p>A simple "hello world" object file would be modeled like this:</p> | |
50 | <img src="hello.png" alt="hello world graphic"/> | |
51 | <p>There are two atoms: main and an anonymous atom containing the c-string | |
52 | literal "hello world". The Atom "main" has two fixups. One is the call site | |
53 | for the call to printf, and the other is a fixup for the instruction that loads | |
54 | the address of the c-string literal. </p> | |
55 | ||
56 | <br> | |
57 | <h2> | |
58 | <a name="File model">File model</a> | |
59 | </h2> | |
60 | ||
61 | <p>The linker views the input files as basically containers of Atoms and Fixups, | |
62 | and just a few attributes of their own. The linker works with three kinds | |
63 | of files: object files, static libraries, and dynamic libraries. Each kind | |
64 | of file has reader object which presents the file in the model expected by | |
65 | the linker.</p> | |
66 | <h4> <a>Object File</a> | |
67 | </h4> | |
68 | An object file is just a container of atoms. When linking with | |
69 | an object file, all atoms are added to the initial graph of atoms. | |
70 | ||
71 | <h4> <a>Static Library (Archive)</a> | |
72 | </h4> | |
73 | This is the traditional unix static archive which is just a collection of | |
74 | object files with a "table of contents". When linking with a static library, | |
75 | by default nothing is added to the initial graph of atoms. Instead, if there | |
76 | are unresolved references (dangling edges) in the master graph of all atoms, | |
77 | and the table of contents for a static library says that one of the object files | |
78 | in the library defines one of the missing symbols (dangling edge), | |
79 | the set of atoms from the specified object file in the static library is added | |
80 | to the master graph of atoms. | |
81 | ||
82 | <h4> <a>Dynamic Library (Shared Object)</a> | |
83 | </h4> | |
84 | Dynamic libraries are unique in that the don't directly add add any atoms. | |
85 | Their purpose is to check at build time that all references are resolved and | |
86 | provide a list of dynamic libraries (SO_NEEDED) that will be needed at runtime. | |
87 | The way this is modeled in the linker is that a dynamic library contributes | |
88 | no atoms to the initial graph of atoms. Instead, (like static libraries) if | |
89 | there are unresolved references (dangling edges) in the master graph of all atoms, | |
90 | if a dynamic library exports a required symbol, then a "proxy" atom is | |
91 | instantiated by the linker. The proxy atom allows the master atom graph to have | |
92 | all edges resolved and also records from which dynamic library a symbol came.</p> | |
93 | ||
94 | <br> | |
95 | <h2> | |
96 | <a name="Linking Steps">Linking Steps</a> | |
97 | </h2> | |
98 | <p>Through the use of abstract Atoms, the core of linking is architecture | |
99 | independent and file format independent. All command line parsing is factored | |
100 | out into a separate "options" abstraction which enables the linker to be driven | |
101 | with different command line sets.</p> | |
102 | <p>The overall steps in linking are:<p> | |
103 | <ol> | |
104 | <li>Command line processing</li> | |
105 | <li>Parsing input files</li> | |
106 | <li>Resolving</li> | |
107 | <li>Passes/Optimizations</li> | |
108 | <li>Generate output file</li> | |
109 | </ol> | |
110 | ||
111 | <p>The Resolving and Passes steps are done purely on the master graph of atoms, | |
112 | so they have no notion of file formats such as mach-o or ELF.</p> | |
113 | ||
114 | <h4> <a>Resolving</a> | |
115 | </h4> | |
116 | <p>The resolving step takes all the atoms graphs from each object file and | |
117 | combines them into one master object graph. Unfortunately, it is not as simple | |
118 | as appending the atom list from each file into one big list. There are many | |
119 | cases where atoms need to be coalesced. That is, two or more atoms need to | |
120 | be coalesced into one atom. This is necessary to support: C language | |
121 | "tentative definitions", C++ weak symbols for templates and inlines defined | |
122 | in headers, and for merging copies of constants like c-strings and floating | |
123 | point constants.</p> | |
124 | ||
125 | <p>The linker support coalescing by-name and by-content. By-name is used for | |
126 | tentative definitions and weak symbols. By-content is used for constant data | |
127 | that can be merged. </p> | |
128 | ||
129 | <p>When one atom has a reference (FixUp) to another atom, there is also a binding | |
130 | type: by-name, direct, or indirect. A Fixup contains a tagged union that if | |
131 | the binding type is by-name, the union field is a pointer to a c-string. If | |
132 | the binding type is direct, the union is a pointer to an Atom. If the binding | |
133 | type is indirect, the union is a index into a table of pointers to Atoms. Below | |
134 | is a graphical representation of the binding types:</p> | |
135 | <img src="bindings.png" alt="binding types graphic"/> | |
136 | ||
137 | <p>Input file Atoms contain only direct and by-name references. Direct | |
138 | references are used for atoms defined in the same object file for which the | |
139 | target atom is either unnamed or cannot change. For instance, calling | |
140 | a static function in a translation unit will result in a direct reference | |
141 | to the static functions's atom. Also the FDE (dwarf unwind info) for a function | |
142 | has a direct reference to its function. On the other hand references to | |
143 | global symbols (e.g. call to printf) use by-name binding in object files. | |
144 | </p> | |
145 | ||
146 | <p>The resolving process maintains some global linking "state", including: | |
147 | a "symbol table" which is a map from c-string to Atom*, an indirect symbol | |
148 | table which is a growable array of Atom*, and for each kind of coalesable | |
149 | constants there is a content to Atom* map. With these data structures, | |
150 | the linker walks all atoms in all input files. For each | |
151 | atom, it checks if the atom should be in one symbol table or one of the | |
152 | coalescing tables. If so, it attempts to add the atom. If there already is | |
153 | a matching atom in that table, that means the current atom needs to be | |
154 | coalesced with the found atom. | |
155 | </p> | |
156 | ||
157 | <p>To support coalescing, all references to coalesable atoms are changed to | |
158 | indirect binding and an entry is added to the indirect table which points | |
159 | to the current chosen atom. When all input atoms have been processed by | |
160 | the resolver, there should be only direct and indirect bindings left. If | |
161 | there are any NULL entries in the indirect table, that means there are | |
162 | undefined references. The linker then looks to the supplied libraries (both | |
163 | static and dynamic) to resolve those references. | |
164 | </p> | |
165 | ||
166 | <p>Dead code stripping (if requested) is done at the end of resolving. The | |
167 | linker does a simple mark-and-sweep. It starts with "root" atoms (like "main" | |
168 | in a main executable) and follows each references and marks each Atom that | |
169 | it visits as "live". When done, all atoms not marked "live" are removed. | |
170 | </p> | |
171 | ||
172 | <h4> <a>Passes</a> | |
173 | </h4> | |
174 | <p>The Passes step | |
175 | is an open ended set of routines that each get a change to modify or enhance | |
176 | the master graph of atoms. Passes are only run if the master graph of | |
177 | atoms is completely resolved (no dangling edges). | |
178 | The current set of Passes in the Darwin linker are:</p> | |
179 | <ul> | |
180 | <li>Objective-C optimizations (Apple)</li> | |
181 | <li>stub (PLT) generation</li> | |
182 | <li>GOT instantiation</li> | |
183 | <li>TLV instantiation (Apple)</li> | |
184 | <li>order_file optimization</li> | |
185 | <li>branch island generation</li> | |
186 | <li>branch shim generation</li> | |
187 | <li>dtrace probe processing (Apple)</li> | |
188 | <li>compact unwind encoding (Apple)</li> | |
189 | </ul> | |
190 | <p>Some of these passes are specific to Apple's runtime environments. But many | |
191 | of the passes are applicable to any OS (such as generating branch island for | |
192 | out of range branch instructions).</p> | |
193 | ||
194 | <p>The general structure of a pass is to walk the master graph inspecting each | |
195 | atom and doing something. For instance, the stub pass, walks the graph looking | |
196 | for atoms with call sites to proxy atoms (e.g. call to printf). It then | |
197 | instantiates a "stub" atom (PLT entry) and a "lazy pointer" atom for each | |
198 | proxy atom needed, and these new atoms are added to the master graph. Next | |
199 | all the noted call sites to proxy atoms are replaced with calls to the | |
200 | corresponding stub atom.</p> | |
201 | ||
202 | <h4><a>Generate Output File</a> | |
203 | </h4> | |
204 | <p>Once the passes are done, the output file generator is given a sorted list | |
205 | of atoms. Its job is to create the executable content file wrapper and place | |
206 | the content of the atoms into it. | |
207 | </p> | |
208 | ||
209 | ||
210 | <h2> | |
211 | <a name="Future Directions">Future Directions</a> | |
212 | </h2> | |
213 | ||
214 | <h4><a>Sections</a> | |
215 | </h4> | |
216 | <p>The current use of sections in mach-o .o files over-constrains the linker. | |
217 | By default, the linker should preserve the section an atom is in. But since | |
218 | all sections must be contiguous in the output, that limits the ability of | |
219 | the linker to order atoms for locality. It would be helpful to enrich the | |
220 | object file with with reason something is in the section it is. For instance, | |
221 | is the section found at runtime? Or was the use of a section just a quick | |
222 | way to group some content together? | |
223 | </p> | |
224 | <p>The ELF model for sections is a little better than mach-o because ELF | |
225 | sections have write and execute bits, whereas mach-o sections must be in some | |
226 | segment and the segment has the write and execute bits. | |
227 | </p> | |
228 | ||
229 | <h4><a>Mach-o Object File Format</a> | |
230 | </h4> | |
231 | <p> | |
232 | The messiest part of the linker is the mach-o parser. This is because mach-o | |
233 | is a traditional section and symbols based file format. The parser must infer | |
234 | atom boundaries using two approaches. The first is that some section types have | |
235 | well defined content which the linker can parse into atoms (e.g. __cstring, | |
236 | __eh_frame). The other approach is a naming convention (which the compiler follows) | |
237 | by which the linker breaks sections into atoms at any non-local (not starting | |
238 | with 'L') symbol. The processing the linker has to do parse mach-o .o files is a | |
239 | significant part of the link time. | |
240 | </p> | |
241 | ||
242 | <p>Given that the assembler writes object files once, whereas the linker reads | |
243 | them many times (during development), it would make sense to optimize the object | |
244 | file format to be something the linker can read/parse efficiently.</p> | |
245 | ||
246 | <h4><a>New Object File Model</a> | |
247 | </h4> | |
248 | <p>LLVM has a nice model for its IR. There are three representations: | |
249 | the binary bit code file, the in-memory object model, and a textual | |
250 | representation. LLVM contains utility possible code for converting between these | |
251 | representations. The same model makes sense for atoms too. There should be | |
252 | three representations for atoms: binary file, in-memory, and textual. The Darwin | |
253 | linker already has an in-memory C++ object model for Atoms. All we need is a | |
254 | textual representation and binary file format. | |
255 | </p> | |
256 | <p>Note: in the darwin linker the binary format for input object files is | |
257 | independent of the output executable format. That is, we could have one | |
258 | universal object file format which the linker could use as input to produce | |
259 | mach-o, ELF, or PE executables.</p> | |
260 | <p> | |
261 | The object file binary format should be designed to instantiate into atoms | |
262 | as fast as possible. The obvious way to do that is that the | |
263 | file format would be an array of atoms. The linker just mmaps in the file and | |
264 | looks at the header to see how many atoms there and instantiate that many atoms | |
265 | with the atom attribute information coming from that array. The trick is | |
266 | designing this in a way that can be extended as the Atom mode evolves and new | |
267 | attributes are added. | |
268 | </p> | |
269 | <p> | |
270 | In designing a textual format we want something easy for humans to read and | |
271 | easy for the linker to parse. Since an atom has lots of attributes most of | |
272 | which are usually just the default, we should define default values for | |
273 | every attribute so that those can be omitted from the text representation. | |
274 | One possile format is YAML. Here is the atoms for a simple hello world | |
275 | program expressed in YAML. | |
276 | </p> | |
277 | <pre> | |
278 | --- | |
279 | target-triple: x86_64-apple-darwin11 | |
280 | source: | |
281 | ||
282 | atoms: | |
283 | - name: _main | |
284 | scope: linkage-unit | |
285 | type: code | |
286 | alignment: | |
287 | power: 4 | |
288 | content: [ 55, 48, 89, e5, 48, 8d, 3d, 00, 00, 00, 00, 30, c0, e8, 00, 00, | |
289 | 00, 00, 31, c0, 5d, c3 ] | |
290 | fixups: | |
291 | - offset: 07 | |
292 | kind: pcrel32 | |
293 | target: 2 | |
294 | - offset: 0E | |
295 | kind: call32 | |
296 | target: _fprintf | |
297 | ||
298 | - type: c-string | |
299 | merge: by-content | |
300 | content: [ 73, 5A, 00 ] | |
301 | ||
302 | ... | |
303 | </pre> | |
304 | ||
305 | <p>One big use for the textual format will be writing test cases. The Darwin | |
306 | linker test suite test cases are written mostly in C/C++ and a few assembly | |
307 | files. The use of C means the same test case can be compiled for different | |
308 | architectures. But writing test cases in C is problematic because the compiler | |
309 | may vary its output over time for its own optimization reasons which my | |
310 | inadvertently disable or break the linker feature trying to be tested. By | |
311 | writing test cases in the linkers own textual format, we can exactly specify | |
312 | every attribute of every atom and thus target specific linker logic. | |
313 | </p> | |
314 | ||
315 | <h4><a>Debug Info</a> | |
316 | </h4> | |
317 | <p>Around 2005 when Apple switched from using STABS to using DWARF for debug | |
318 | information, we made a design decision to have the linker ignore DWARF in | |
319 | .o files. This improves linking performance because the linker is not | |
320 | copying tons of debug info. Instead, the linker adds "debug notes" into | |
321 | output binary that contain the paths of the original .o files. During development | |
322 | the Darwin debugger will notice the debug notes and the load the dwarf | |
323 | debug information from the original object files. For release builds, | |
324 | a tool named dsymutil is run on the program. It finds the debug notes and | |
325 | then the original object files, then reads, merges and optimizes all the dwarf | |
326 | debug information into one .dSYM file which can be loaded by the debugger | |
327 | if needed.</p> | |
328 | ||
329 | <p>The current way DWARF is generated is that all debug information for all | |
330 | functions in a translation unit are merged and optimized into sections based | |
331 | on debug info kind. For instance the mapping of instructions to source line | |
332 | numbers for all functions is compressed and put in one section. This does not | |
333 | play well in an Atom based file format. One idea is to have the compiler | |
334 | emit some intermediate representation debug information (one which is | |
335 | partitioned per atom) into the Atom based file format. The linker could | |
336 | then have code to convert that intermediate debug into to final dwarf. | |
337 | This is still an open question.</p> | |
338 | ||
339 | <h4><a>Extending Atom attributes to ELF and XCOFF</a> | |
340 | </h4> | |
341 | <p>The current set of attributes defined for Atoms in the darwin linker | |
342 | were chosen to meet the requirements of developing code to run on iOS and | |
343 | Mac OS X. Below is a list of the attributes and their possible values. | |
344 | It may just require adding more values to support ELF and XCOFF. Or there | |
345 | may need to be new attributes added to capture new functionality. | |
346 | </p> | |
347 | <ul> | |
348 | <li>Name</li> | |
349 | <li>Size</li> | |
350 | <li>Section (I'd like to get rid of this)</li> | |
351 | <li>ContentType (currently some of this comes from section)</li> | |
352 | <ul> | |
353 | <li>code</li> | |
354 | <li>stub</li> | |
355 | <li>data</li> | |
356 | <li>zeroFill</li> | |
357 | <li>initializerPointer</li> | |
358 | <li>objc1Class</li> | |
359 | <li>objc2Class</li> | |
360 | <li>objcClassPointer</li> | |
361 | <li>objc2CategoryList</li> | |
362 | <li>non-lazy-pointer</li> | |
363 | <li>lazy-pointer</li> | |
364 | <li>constant</li> | |
365 | <li>literal4</li> | |
366 | <li>literal8</li> | |
367 | <li>literal16</li> | |
368 | <li>cstring</li> | |
369 | <li>cstringPointer</li> | |
370 | <li>utf16string</li> | |
371 | <li>CFString</li> | |
372 | <li>CFI</li> | |
373 | <li>LSDA</li> | |
374 | </ul> | |
375 | </li> | |
376 | <li>Scope | |
377 | <ul> | |
378 | <li>translationUnit (static functions)</li> | |
379 | <li>linkageUnit (visibility hidden)</li> | |
380 | <li>global</li> | |
381 | </ul> | |
382 | </li> | |
383 | <li>DefinitionKind | |
384 | <ul> | |
385 | <li>regular</li> | |
386 | <li>tentative (ANSI C feature)</li> | |
387 | <li>absolute (assembly code feature)</li> | |
388 | <li>proxy (stand-in for dynamic library symbol)</li> | |
389 | </ul> | |
390 | </li> | |
391 | <li>Combine | |
392 | <ul> | |
393 | <li>never</li> | |
394 | <li>byName (weak symbols)</li> | |
395 | <li>byContent (simple constants)</li> | |
396 | <li>byContentAndReferences (complex constants)</li> | |
397 | </ul> | |
398 | </li> | |
399 | <li>SymbolTableStatus | |
400 | <ul> | |
401 | <li>In</li> | |
402 | <li>notIn (anonymous)</li> | |
403 | <li>inAsAbsolute (assembly code feature)</li> | |
404 | <li>inAndNeverStrip (tell strip tool to leave)</li> | |
405 | <li>inWithRandomName (mach-o .o feature)</li> | |
406 | </ul> | |
407 | <li>Alignment | |
408 | <ul> | |
409 | <li>powerOfTwo</li> | |
410 | <li>modulus</li> | |
411 | </ul> | |
412 | <li>NeverDeadStrip (boolean)</li> | |
413 | <li>IsThumb (ARM specific)</li> | |
414 | </ul> | |
415 | <p>Where does dllexport fit in here? Where does visibility protected and | |
416 | internal fit? Protected seems like scope=global plus the rule to not | |
417 | indirect references to it. Internal is like hidden plus enables some | |
418 | compiler optimizations. I'm not sure the linker needs to know about internal. | |
419 | </p> | |
420 | ||
421 | </body> | |
422 | </html> | |
423 |