# IMP caches generation
When you call an Objective-C method, the runtime looks up the actual IMP (function pointer) given the class and the selector. It then
stores this [selector, IMP] pair in a hash table kept in malloced memory. These hash tables are:
* imperfect: the hash function is just a mask applied to the selector's
address, so if two selectors have the same low bits, you will get a collision.
Collisions are resolved through linear probing.
* expensive: the footprint of these hash tables is often around 70MB total on a carry device.
We want to replace this with static hash tables, inside the shared cache, where you have:
* a perfect hashing function
* and the tables are in clean memory
Because the hash function is executed as part of `objc_msgSend`, it has to be blazingly fast. The algorithm below explains how we can
make a hash function of the form `h(x) = (x >> shift) & mask` work, where `shift` and `mask` are two class-specific parameters.
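
To make the shape of this concrete, here is a minimal sketch (not the actual `objc_msgSend` code; the struct and the names are made up) of how such a class-specific hash would be evaluated, and of how two classes can read different bit stripes of the same selector address:

```cpp
#include <cstdint>
#include <cstdio>

// Hypothetical, simplified parameters of one class's static IMP cache.
// The real data lives in the shared cache / objc runtime, not in this struct.
struct PreoptCacheParams {
    uint32_t shift;  // class-specific
    uint32_t mask;   // class-specific; table size is mask + 1
};

// h(x) = (x >> shift) & mask: the whole hash is one shift and one mask,
// cheap enough to run on every objc_msgSend.
static inline uint32_t bucketIndex(uint64_t selectorAddress, PreoptCacheParams p) {
    return (uint32_t)((selectorAddress >> p.shift) & p.mask);
}

int main() {
    uint64_t sel = 0x00007fff2a4c3d80;  // made-up selector address

    // Two classes can hash the same selector with different parameters,
    // i.e. they read different "bit stripes" of its address.
    PreoptCacheParams classA = {7, 0x3f};   // 64-entry table, uses bits [7, 13)
    PreoptCacheParams classB = {13, 0x1f};  // 32-entry table, uses bits [13, 18)

    printf("index in A's table: %u\n", bucketIndex(sel, classA));
    printf("index in B's table: %u\n", bucketIndex(sel, classB));
    return 0;
}
```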
Note that we cannot use traditional perfect hash table techniques as:
* they require at least [1.44 bits of information per key](https://arxiv.org/pdf/1910.06416.pdf).
* they often require multiple lookups (two-level hashing).
The idea is that because the input of the hash function is a selector's address, we have some control over the input... because the
dyld shared cache builder is the one placing all the selectors.
So now the problem becomes: given a set of classes and the selectors they implement, how do you place the selectors, and for each
class, how do you find a shift and a mask, so that the hash table generated by `(selector >> shift) & mask` is perfect?
(Note that the shift + mask idiom lets us use various "bit stripes" of the selector's address for various hash tables.)
There are basically two steps to the main algorithm:
### Finding shifts and masks
Assign some of the high bits of the selector's address, and find a shift and a mask for each class. This is a backtracking
algorithm which goes through each class, one after another. As it goes through classes, it finds a shift and a mask compatible
with the bit ranges that are already set on each selector's address, and assigns the corresponding bits of the selector's address.
Note that because addresses are only partially assigned at this point (some of the bits are unset), it's very difficult to check for
collisions between selectors, so some selectors will end up at the same address; this is dealt with in the second step.
At each step of the algorithm, we go through a set of possible shifts and masks until we find one which works. If none work, we let
the hash table grow by one more bit to make our job slightly easier. In practice, few hash tables need this extra bit (82 out of ~18k).
If we cannot find a suitable shift and mask after backtracking a few times, we also allow ourselves to drop a class from the set
and not generate an IMP cache for it. This happens for a dozen classes or so with the current data set.
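
Here is a toy model of that search, under heavily simplified assumptions (the real logic is in `IMPCachesBuilder::findShiftsAndMasks`; the `Sel`/`Cls` types and the greedy slot assignment below are inventions for illustration):

```cpp
#include <cstdint>
#include <cstdio>
#include <utility>
#include <vector>

// A selector whose address is only partially assigned so far.
struct Sel {
    uint64_t addr = 0;       // partially-assigned address (really an offset)
    uint64_t fixedBits = 0;  // which bits of 'addr' previous classes already fixed
};

// A class and the selectors it implements.
struct Cls {
    std::vector<int> selectors;  // indices into the global selector table
    int shift = 0, bits = 0;     // result; mask == (1 << bits) - 1
};

// Try to place every selector of 'c' in a distinct slot of a table indexed by
// bits [shift, shift+bits) of its address, honoring already-fixed bits.
static bool tryShift(Cls& c, std::vector<Sel>& sels, int shift, int bits) {
    const uint64_t stripe = ((1ull << bits) - 1) << shift;
    std::vector<bool> used(1ull << bits, false);
    std::vector<std::pair<int, uint64_t>> chosen;  // (selector index, stripe value)

    for (int s : c.selectors) {
        const uint64_t fixed = sels[s].fixedBits & stripe;
        bool placed = false;
        for (uint64_t slot = 0; slot < used.size(); slot++) {
            const uint64_t value = slot << shift;
            if (used[slot]) continue;
            if ((value & fixed) != (sels[s].addr & fixed)) continue;  // contradicts fixed bits
            used[slot] = true;
            chosen.push_back({s, value});
            placed = true;
            break;
        }
        if (!placed) return false;  // unavoidable collision with this (shift, bits)
    }
    for (auto [s, value] : chosen) {  // commit: fix this stripe on each selector
        sels[s].addr = (sels[s].addr & ~stripe) | value;
        sels[s].fixedBits |= stripe;
    }
    c.shift = shift;
    c.bits = bits;
    return true;
}

// Backtracking over classes: if no (shift, bits) works for class i given what
// earlier classes fixed, grow the table by one bit; the real builder can also
// drop the class entirely as a last resort.
static bool solve(std::vector<Cls>& classes, std::vector<Sel>& sels, size_t i) {
    if (i == classes.size()) return true;
    int neededBits = 1;
    while ((1u << neededBits) < classes[i].selectors.size()) neededBits++;

    for (int extra = 0; extra <= 1; extra++) {
        for (int shift = 0; shift + neededBits + extra <= 24; shift++) {
            std::vector<Sel> snapshot = sels;  // crude undo log for the sketch
            if (tryShift(classes[i], sels, shift, neededBits + extra) &&
                solve(classes, sels, i + 1))
                return true;
            sels = snapshot;  // backtrack
        }
    }
    return false;
}

int main() {
    std::vector<Sel> sels(4);
    std::vector<Cls> classes = {
        {{0, 1, 2}, 0, 0},  // a class using selectors 0, 1, 2
        {{1, 2, 3}, 0, 0},  // a class sharing selectors 1 and 2 with the first
    };
    if (solve(classes, sels, 0))
        for (const Cls& c : classes)
            printf("shift=%d mask=0x%x\n", c.shift, (unsigned)((1u << c.bits) - 1));
    return 0;
}
```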
Next we have to deal with collisions, and with constraints on the addresses themselves: each selector is a `char*`, which has
a length, and two selectors cannot overlap. The idea here is to get rid of this constraint by assigning in step 1 only the address
of a "bucket", which is 128 bytes long (7 bits of the address). Step 1 thus assigns the high bits of the address
(which form the address of the bucket), and we can then place the selectors linearly inside the buckets.
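
In other words, a selector's offset can be thought of as decomposing like this (illustrative only):

```cpp
#include <cstdint>

// Illustrative decomposition of a selector's offset in the selectors section.
// Step 1 only decides the bucket index (the high bits); the low 7 bits are the
// position inside a 128-byte bucket and are only filled in at the very end.
constexpr uint64_t kBucketSize = 128;  // 1 << 7

constexpr uint64_t makeOffset(uint64_t bucketIndex, uint64_t offsetInBucket) {
    return bucketIndex * kBucketSize + offsetInBucket;  // offsetInBucket < 128
}

constexpr uint64_t bucketOf(uint64_t offset)         { return offset >> 7; }    // chosen in step 1
constexpr uint64_t offsetInBucketOf(uint64_t offset) { return offset & 0x7f; }  // chosen in step 2
```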
### Shuffling selectors around
Then you have to deal with address collisions. If the lengths of the selectors in a given bucket add up to more than 128 bytes, you have
to move some selectors out. So step 2 goes through all the selectors, checks whether each one fits within the bucket it's supposed to be in,
and if it doesn't, finds another suitable bucket for it.
To do so, it iterates through all the classes the selector is used in, and builds a "constraint set" applying to that selector's
bucket address (basically by looking at which slots in each class's hash table are free and combining all of these constraints, each of
which impacts a different "stripe" of the address due to the shifts and masks). Once we have the constraint set, we can find a different
bucket for the selector without changing the shifts and masks. (If there is no other possible bucket, we allow ourselves to drop the
classes involving this selector, too.)
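
A sketch of what such a constraint could look like (the real type is `IMPCaches::Constraint`; the shape below is an assumption made for illustration): each class the selector appears in restricts one stripe of the bucket address to that class's free slots, and a candidate bucket has to satisfy all of them.

```cpp
#include <cstdint>
#include <set>
#include <vector>

// Simplified stand-in for IMPCaches::Constraint: "bits selected by (shift, mask)
// of the bucket address must take one of these values".
struct BitConstraint {
    int shift = 0;
    uint64_t mask = 0;
    std::set<uint64_t> allowedValues;  // free slots in that class's hash table

    bool allows(uint64_t bucketAddress) const {
        return allowedValues.count((bucketAddress >> shift) & mask) != 0;
    }
};

// A bucket is acceptable for a selector if it satisfies the constraint coming
// from every class that selector appears in.
static bool bucketWorks(uint64_t bucketAddress, const std::vector<BitConstraint>& constraints) {
    for (const auto& c : constraints)
        if (!c.allows(bucketAddress))
            return false;
    return true;
}
```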
Once each selector has a valid bucket, you can simply assign the low 7 bits of each address by looking at which selectors are in
each bucket and at their lengths.
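
For instance, deriving those low 7 bits is just a running offset over each bucket (hypothetical helper, not taken from the builder):

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Lay selectors out back to back inside their 128-byte bucket and derive each
// selector's final offset; the low 7 bits come from 'cursor'. Purely illustrative.
static std::vector<uint64_t> layoutBucket(uint64_t bucketIndex,
                                          const std::vector<std::string>& selectors) {
    std::vector<uint64_t> offsets;
    uint64_t cursor = 0;
    for (const std::string& name : selectors) {
        offsets.push_back(bucketIndex * 128 + cursor);
        cursor += name.size() + 1;  // selector string + trailing NUL
    }
    assert(cursor <= 128);  // step 2 guaranteed the bucket does not overflow
    return offsets;
}
```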
The problem is hard but not that hard given the number of classes we are targeting with this optimization:
we roughly target ~20k classes out of 120k, and ~200k selectors out of 900k, so we have lots of "free space".
Note that any holes we leave in the bucketization of the selectors can be filled later, by placing inside them all the selectors not targeted
by the optimization, as well as any selectors from OS executables.
## Shared cache builder setup
The optimization is guided by a file dropped by the OrderFiles project (`/AppleInternal/OrderFiles/shared-cache-objc-optimizations.json`).
The performance team is responsible for updating this file by looking at the caches on live or StressCycler devices with `objcdt dump-imp-caches`.
This file specifies a list of classes and metaclasses the algorithm targets, and a list of flattening roots.
We haven't explained flattening yet. When a class D(aughter) inherits from a class M(other), we should in theory add all of M's methods
to D's IMP cache. However:
* This constrains the problem too much. The solver's job would be too difficult if the same selectors were in thousands of classes.
* This makes the IMP cache sizes blow up.
So our scheme is to target with this optimization only classes which are leaves of the inheritance tree. For any selector that comes from
a parent class, the cache lookup will fail and fall back to looking for the method in `super`. Because `super` is not a leaf class,
it will have a dynamically allocated IMP cache and can cache there any selector that comes from the parents.
However, this means that some classes which are very interesting from a memory standpoint (because we find their caches in many processes)
get excluded because they have child classes. A solution to this is to turn on selector inheritance (adding Mother's selectors
to Daughter's cache) starting at some flattening root. The IMP cache then has a "fallback class", which is the superclass
of the flattening root, and `objc_msgSend` will fall back to that class if it cannot find the selector in the child's cache,
skipping over all the classes up to the flattening root (because we know all the selectors in that chain are already present
in the child's cache). For example, if Daughter inherits from Mother, which inherits from Grandmother, and Mother is a flattening root,
then Daughter's cache also contains Mother's selectors and its fallback class is Grandmother.
Very early in the shared cache builder's life, the algorithm described above runs. To do so, it parses all of the source dylibs
to find out which methods will end up in which class's cache (see "What ends up in each cache?" below).
The output of the algorithm is (sketched in code after the list):
* a list of methods that will end up in the cache, with (source install name, source class name, source category name)
* for each selector: an offset at which it needs to be placed relative to the beginning of the selectors section
* a list of holes in the selector address space (or more accurately offset space) that can be used to add additional selectors
not targeted by the algorithm
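
Conceptually, this output looks something like the following (a simplified sketch; the real types live in `IMPCaches.hpp` and are shaped differently):

```cpp
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Where a cached method comes from; the IMP itself is not known yet at this point.
struct CachedMethodSource {
    std::string installName;   // dylib the implementation comes from
    std::string className;
    std::string categoryName;  // empty if the method is not from a category
};

// The plan for one class's static IMP cache.
struct ClassCachePlan {
    std::string className;
    uint32_t shift = 0;
    uint32_t mask = 0;
    std::vector<CachedMethodSource> methods;  // one per selector in the cache
};

struct IMPCachesResult {
    std::vector<ClassCachePlan> caches;
    // selector name -> offset from the start of the selectors section
    std::vector<std::pair<std::string, uint64_t>> selectorOffsets;
    // [offset, size] holes usable for selectors the algorithm didn't target
    std::vector<std::pair<uint64_t, uint64_t>> holes;
};
```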
Then, we do all the selector deduping work, and when we get to the ObjC optimizer:
- we go through all the classes again to build a map from (source install name, source class name, source category name) to the
actual IMP's address (which is not known before the shared cache is laid out)
- we go through all the IMP caches returned by the algorithm, find the corresponding IMP for each entry, and emit the actual IMP cache in the
ObjC optimizer's RO region.
## Some code pointers
Most of the logic lives in the single C++ file `IMPCaches.cpp` and the two headers `IMPCaches.hpp` and `IMPCachesBuilder.hpp`. A tour:
`IMPCaches::Selector` : A unique selector. It has a name, a list of classes it's used in, and an address (which is actually an offset from
the beginning of the selectors section). Due to how the placement algorithm works, it also has a current
partial address and the corresponding bitmask of fixed bits. The algorithm adds bits to the partial address
as it makes progress and updates the mask accordingly.
`IMPCaches::AddressSpace` : A representation of the address space available in the selectors section, so that step 2 of the algorithm
can check whether selector buckets are overflowing.
`IMPCaches::HoleMap` : Represents the holes left by the selector placement algorithm, to be filled later with other selectors we did not target.
`IMPCaches::Constraint` : Represents a constraint on some of the bits of an address. It stores a set of allowed values for a given range of bits (shift and mask).
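
As an illustration of the partial-address bookkeeping described above, a simplified, hypothetical shape for such a selector could be (the actual fields in `IMPCaches.hpp` differ):

```cpp
#include <cstdint>
#include <string>

// Hypothetical, simplified shape of IMPCaches::Selector; this only illustrates
// how the partial address and the mask of fixed bits evolve together.
struct SelectorSketch {
    std::string name;
    uint64_t partialAddress = 0;  // bits fixed so far (offset into the selectors section)
    uint64_t fixedBitsMask = 0;   // which bits of partialAddress are final

    // Called whenever a class's (shift, mask) decision pins down a new stripe.
    void fixBits(uint64_t stripeMask, uint64_t stripeValue) {
        partialAddress = (partialAddress & ~stripeMask) | (stripeValue & stripeMask);
        fixedBitsMask |= stripeMask;
    }

    // The number of bytes this selector occupies (name + trailing NUL),
    // which is what makes buckets overflow in step 2.
    uint64_t size() const { return name.size() + 1; }
};
```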
`IMPCachesBuilder` : What actually computes the caches (this is the placement part of the algorithm; what actually lays down the caches lives in the
ObjC optimizer, see below). Its main entry points are:
* `IMPCachesBuilder::parseDylibs` : parses the source dylibs to figure out which method ends up in which cache.
* `IMPCachesBuilder::findShiftsAndMasks` : finds the shift and mask for each class (see "Finding shifts and masks" above).
* `IMPCachesBuilder::solveGivenShiftsAndMasks` : moves selectors around to make sure each bucket only contains 128 bytes worth of selectors.
Then in `OptimizerObjC`:
* `IMPMapBuilder` : builds a map from (install name, class name, method name) to actual IMPs.
* `IMPCachesEmitter` : emits the actual IMP caches in the shared cache.
### What ends up in each cache?
`IMPCachesBuilder::parseDylibs` goes through all the classes and categories to decide, for each (class, selector) pair, which implementation we will actually call at runtime (a sketch of this logic follows the list below). To do so, we:
* Get all the methods from the method lists.
* Attach any methods from non-cross-image categories (when we have cross-image categories, we prevent the class from getting an IMP cache). If there is any collision at this point, we'll
use the implementation from the main class (or from the first category we found).
* Then we may inline some of the implementations from superclasses; see the explanation of flattening above.
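
A rough sketch of that decision logic (heavily simplified; the real `IMPCachesBuilder::parseDylibs` also deals with cross-image categories, method list metadata, and so on; all types below are made up):

```cpp
#include <map>
#include <string>
#include <utility>
#include <vector>

using SelectorName = std::string;
using Impl = std::string;  // stand-in for "which implementation we'll call"

// Toy view of one class as seen while parsing a source dylib.
struct ClassInfo {
    std::vector<std::pair<SelectorName, Impl>> methodList;
    // method lists of same-image categories, in discovery order
    std::vector<std::vector<std::pair<SelectorName, Impl>>> categories;
    // superclasses whose methods get inlined, empty unless flattening applies
    std::vector<const ClassInfo*> flattenedSuperclasses;
};

// For one class, decide which implementation each selector should resolve to.
static std::map<SelectorName, Impl> cacheContents(const ClassInfo& cls) {
    std::map<SelectorName, Impl> result;
    // 1. Methods from the class's own method lists win.
    for (const auto& [sel, imp] : cls.methodList)
        result.emplace(sel, imp);  // emplace: the first occurrence wins
    // 2. Attach non-cross-image categories; on collision keep what we already
    //    have (the main class, or the first category seen).
    for (const auto& category : cls.categories)
        for (const auto& [sel, imp] : category)
            result.emplace(sel, imp);
    // 3. Optionally inline superclass implementations (flattening); the
    //    subclass's own entries still take precedence.
    for (const ClassInfo* super : cls.flattenedSuperclasses)
        for (const auto& [sel, imp] : super->methodList)
            result.emplace(sel, imp);
    return result;
}
```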