ARMv8.3 Pointer Authentication in xnu
=====================================

Introduction
------------

This document describes xnu's use of the ARMv8.3-PAuth extension. Specifically,
xnu uses ARMv8.3-PAuth to protect against Return-Oriented-Programming (ROP)
and Jump-Oriented-Programming (JOP) attacks, which attempt to gain control flow
over a victim program by overwriting return addresses or function pointers
stored in memory.

It is assumed the reader is already familiar with the basic concepts behind
ARMv8.3-PAuth and what its instructions do. The "ARMv8.3-A Pointer
Authentication" section of Google Project Zero's ["Examining Pointer
Authentication on the iPhone
XS"](https://googleprojectzero.blogspot.com/2019/02/examining-pointer-authentication-on.html)
provides a good introduction to ARMv8.3-PAuth. The reader may find more
comprehensive background material in:

* The "Pointer authentication in AArch64 state" section of the [ARMv8
  ARM](https://developer.arm.com/docs/ddi0487/latest/arm-architecture-reference-manual-armv8-for-armv8-a-architecture-profile)
  describes the new instructions and registers associated with ARMv8.3-PAuth.

* [LLVM's Pointer Authentication
  documentation](https://github.com/apple/llvm-project/blob/apple/master/clang/docs/PointerAuthentication.rst)
  outlines how clang uses ARMv8.3-PAuth instructions to harden key C, C++,
  Swift, and Objective-C language constructs.

### Threat model

Pointer authentication's threat model assumes that an attacker has found a gadget
to read and write arbitrary memory belonging to a victim process, which may
include the kernel. The attacker does *not* have the ability to execute
arbitrary code in that process's context. Pointer authentication aims to
prevent the attacker from gaining control flow over the victim process by
overwriting sensitive pointers in its address space (e.g., return addresses
stored on the stack).

Following this threat model, xnu takes a two-pronged approach to prevent the
attacker from gaining control flow over the victim process:

1. Both xnu and first-party binaries are built with LLVM's `-arch arm64e` flag,
   which generates pointer-signing and authentication instructions to protect
   addresses stored in memory (including ones pushed to the stack). This
   process is generally transparent to xnu, with exceptions discussed below.

2. On exception entry, xnu hashes critical register state before it is spilled
   to memory. On exception return, the reloaded state is validated against this
   hash.

The ["xnu PAC infrastructure"](#xnu-pac-infrastructure) section discusses how
these hardening techniques are implemented in xnu in more detail.


Key generation on Apple CPUs
----------------------------

ARMv8.3-PAuth implementations may use an <span style="font-variant:
small-caps">implementation defined</span> cipher. Apple CPUs implement an
optional custom cipher with two key-generation changes relevant to xnu.


### Per-boot diversifier

Apple's optional cipher adds a per-boot diversifier. In effect, even if xnu
initializes the "ARM key" registers (`APIAKey`, `APGAKey`, etc.) with constants,
signing a given value will still produce different signatures from boot to boot.
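
The following user-level sketch illustrates the effect, assuming the clang
`<ptrauth.h>` intrinsics and an arm64e target; it is not xnu code, and the
variable names are hypothetical. For a given raw address and key programming,
the signature bits produced would differ from one boot to the next.

```c
#include <ptrauth.h>
#include <stdio.h>

int
main(void)
{
	int datum = 42;

	/*
	 * Sign the address of `datum` with the IA key and a zero
	 * discriminator.  With Apple's optional cipher, the signature
	 * bits placed in the pointer's upper bits change from boot to
	 * boot because of the per-boot diversifier, even though the key
	 * registers were programmed with the same values.
	 */
	void *signed_ptr = ptrauth_sign_unauthenticated(&datum,
	    ptrauth_key_asia, 0);

	/* Stripping removes the signature; the raw address is unchanged. */
	void *raw_ptr = ptrauth_strip(signed_ptr, ptrauth_key_asia);

	printf("signed %p, stripped %p\n", signed_ptr, raw_ptr);
	return 0;
}
```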


### Kernel/userspace diversifier

Apple CPUs also contain a second diversifier known as `KERNKey`. `KERNKey` is
automatically mixed into the final signing key (or not) based on the CPU's
exception level. When xnu needs to sign or authenticate userspace-signed
pointers, it uses the `ml_enable_user_jop_key` and `ml_disable_user_jop_key`
routines to manually enable or disable `KERNKey`. `KERNKey` allows the CPU to
effectively use different signing keys for userspace and kernel, without needing
to explicitly reprogram the generic ARM keys on every kernel entry and exit.
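
A hedged sketch of the enable/disable pattern follows. The exact signatures of
`ml_enable_user_jop_key` and `ml_disable_user_jop_key`, the `jop_pid` field
access, and the helper `auth_user_fn_ptr` are assumptions for illustration
only, not xnu's actual definitions.

```c
/*
 * Sketch: authenticating a userspace-signed pointer on behalf of a task.
 * The signatures of ml_enable_user_jop_key()/ml_disable_user_jop_key()
 * and the helper auth_user_fn_ptr() are assumed here.
 */
static void *
auth_pointer_for_task(task_t task, void *user_signed_ptr)
{
	uint64_t saved_jop_state;
	void    *authed;

	/* Temporarily switch the effective keys to the task's userspace keys. */
	saved_jop_state = ml_enable_user_jop_key(task->jop_pid);

	/* Authenticate (or sign) the pointer under those keys. */
	authed = auth_user_fn_ptr(user_signed_ptr);

	/* Restore the kernel's own key configuration. */
	ml_disable_user_jop_key(task->jop_pid, saved_jop_state);

	return authed;
}
```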


xnu PAC infrastructure
----------------------

For historical reasons, the xnu codebase collectively refers to xnu + iOS's
pointer authentication infrastructure as Pointer Authentication Codes (PAC). The
remainder of this document will follow this terminology for consistency with
xnu.

### arm64e binary "slice"

Binaries with PAC instructions are not fully backwards-compatible with non-PAC
CPUs. Hence LLVM/iOS treat PAC-enabled binaries as a distinct ABI "slice" named
arm64e. xnu enforces this distinction by disabling the PAC keys when returning
to non-arm64e userspace, effectively turning ARMv8.3-PAuth auth and sign
instructions into no-ops (see the ["SCTLR_EL1"](#sctlr-el1) heading below for
more details).

### Kernel pointer signing

xnu is built with `-arch arm64e`, which causes LLVM to automatically sign and
authenticate function pointers and return addresses spilled onto the stack. This
process is largely transparent to software, with some exceptions:

- During early boot, xnu rebases and signs the pointers stored in its own
  `__thread_starts` section (see `rebase_threaded_starts` in
  `osfmk/arm/arm_init.c`).

- As parts of the userspace shared region are paged in, the page-in handler must
  also slide and re-sign any signed pointers stored in it. The ["Signed
  pointers in shared regions"](#signed-pointers-in-shared-regions) section
  discusses this in further detail.

- Assembly routines must manually sign the return address with `pacibsp` before
  pushing it onto the stack, and use an authenticating `retab` instruction in
  place of `ret`. xnu provides assembly macros `ARM64_STACK_PROLOG` and
  `ARM64_STACK_EPILOG` which emit the appropriate instructions for both arm64
  and arm64e targets.

  Likewise, branches in assembly to signed C function pointers must use the
  authenticating `blraa` instruction in place of `blr`.

- Signed pointers must be stripped with `ptrauth_strip` before they can be
  compared against compile-time constants like `VM_MIN_KERNEL_ADDRESS`, as
  illustrated in the sketch below.
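
The following is a minimal sketch of that last point, assuming the clang
`<ptrauth.h>` intrinsics; the function and variable names are hypothetical, and
only `VM_MIN_KERNEL_ADDRESS` comes from xnu.

```c
#include <ptrauth.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Sketch: decide whether a signed function pointer targets the kernel
 * address range.  is_kernel_address() and `handler` are hypothetical.
 */
static bool
is_kernel_address(void (*handler)(void))
{
	/*
	 * Strip the signature first: a signed pointer carries its PAC in
	 * the upper bits, so comparing it directly against a compile-time
	 * constant such as VM_MIN_KERNEL_ADDRESS would spuriously fail.
	 */
	uintptr_t raw = (uintptr_t)ptrauth_strip(handler,
	    ptrauth_key_function_pointer);

	return raw >= VM_MIN_KERNEL_ADDRESS;
}
```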

### Testing data pointer signing

xnu contains tests for each manually qualified data pointer; these tests should
be updated as new pointers are qualified. The tests allocate a structure
containing a `__ptrauth`-qualified member, and write a pointer to that member.
We can then compare the stored value, which should be signed, with a manually
constructed signature. See `ALLOC_VALIDATE_DATA_PTR`.
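
A hedged sketch of a `__ptrauth`-qualified member follows, to show the shape of
what the tests exercise; the struct, field name, key choice, and constant
discriminator are hypothetical, not xnu's actual test structures.

```c
#include <ptrauth.h>

struct demo {
	/*
	 * Stores to `target` are signed with a data key, diversified by the
	 * field's storage address (the `1`) blended with the constant
	 * discriminator 0x1234; loads are authenticated under the same schema.
	 */
	void *__ptrauth(ptrauth_key_process_independent_data, 1, 0x1234) target;
};

static int global;

static void
demo_write(struct demo *d)
{
	/*
	 * The compiler emits the signing instruction as part of this store,
	 * so the raw memory behind `target` now holds a signed pointer --
	 * which is what the tests compare against a manually constructed
	 * signature.
	 */
	d->target = &global;
}
```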

Tests are triggered by setting the `kern.run_ptrauth_data_tests` sysctl. The
sysctl is implemented, and BSD structures are tested, in
`bsd/tests/ptrauth_data_tests_sysctl.c`. Mach structures are tested in
`osfmk/tests/ptrauth_data_tests.c`.

### Managing PAC register state

xnu generally tries to avoid reprogramming the CPU's PAC-related registers on
kernel entry and exit, since this could add significant overhead to a hot
codepath. Instead, xnu uses the following strategies to manage the PAC register
state.

#### A keys

Userspace processes' A keys (`AP{IA,DA,GA}Key`) are derived from the field
`jop_pid` inside `struct task`. For implementation reasons, an exact duplicate
of this field is cached in the corresponding `struct machine_thread`.

A keys are randomly generated at shared region initialization time (see ["Signed
pointers in shared regions"](#signed-pointers-in-shared-regions) below) and
copied into `jop_pid` during process activation. This shared region, and hence
the associated A keys, may be shared among arm64e processes under specific
circumstances:

1. "System processes" (i.e., processes launched from first-party signed binaries
   on the iOS system image) generally use a common shared region with a default
   `jop_pid` value, separate from non-system processes.

   If a system process wishes to isolate its A keys even from other system
   processes, it may opt into a custom shared region using an entitlement in
   the form `com.apple.pac.shared_region_id=[...]`. That is, two processes with
   the entitlement `com.apple.pac.shared_region_id=foo` would share A keys and
   shared regions with each other, but not with other system processes.

2. Other arm64e processes automatically use the same shared region/A keys if
   their respective binaries are signed with the same team-identifier strings.

3. `posix_spawnattr_set_ptrauth_task_port_np()` allows explicit "inheriting" of
   A keys during `posix_spawn()`, using a supplied mach task port. This API is
   intended to support debugging tools that may need to auth or sign pointers
   using the target process's keys.

#### B keys

Each process is assigned a random set of "B keys" (`AP{IB,DB}Key`) on process
creation. As a special exception, processes which inherit their parents' memory
address space (e.g., during `fork`) will also inherit their parents' B keys.
These keys are stored as the field `rop_pid` inside `struct task`, with an exact
duplicate in `struct machine_thread` for implementation reasons.

xnu reprograms the ARM B-key registers during context switch, via the macro
`set_process_dependent_keys_and_sync_context` in `cswitch.s`.

xnu uses the B keys internally to sign pointers pushed onto the kernel stack,
such as stashed LR values. Note that xnu does *not* need to explicitly switch
to a dedicated set of "kernel B keys" to do this:

1. The `KERNKey` diversifier already ensures that the actual signing keys are
   different between xnu and userspace.

2. Although reprogramming the ARM B-key registers will affect xnu's signing keys
   as well, pointers pushed onto the stack are inherently short-lived.
   Specifically, there will never be a situation where a stack pointer value is
   signed while one `current_task()` is active, but needs to be authenticated
   under a different active `current_task()`.

#### SCTLR_EL1

As discussed above, xnu disables the ARM keys when returning to non-arm64e
userspace processes. This is implemented by manipulating the `EnIA`, `EnIB`,
`EnDA`, and `EnDB` bits in the ARM `SCTLR_EL1` system register. When these
bits are cleared, auth or sign instructions using the respective keys will
simply pass through their inputs unmodified.
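
The sketch below shows the register fields involved; the bit positions follow
the ARMv8 ARM, but the macro and function names are hypothetical rather than
xnu's actual definitions.

```c
#include <stdint.h>

#define SCTLR_EnIA	(1ULL << 31)	/* enable auth/sign with the IA key */
#define SCTLR_EnIB	(1ULL << 30)	/* enable auth/sign with the IB key */
#define SCTLR_EnDA	(1ULL << 27)	/* enable auth/sign with the DA key */
#define SCTLR_EnDB	(1ULL << 13)	/* enable auth/sign with the DB key */

#define SCTLR_PAC_KEYS	(SCTLR_EnIA | SCTLR_EnIB | SCTLR_EnDA | SCTLR_EnDB)

/* Clear the key-enable bits so PAC sign/auth instructions become no-ops. */
static void
disable_pac_keys(void)
{
	uint64_t sctlr;

	__asm__ volatile ("mrs %0, SCTLR_EL1" : "=r"(sctlr));
	sctlr &= ~SCTLR_PAC_KEYS;
	__asm__ volatile ("msr SCTLR_EL1, %0" :: "r"(sctlr));
	__asm__ volatile ("isb");
}
```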

Initially, xnu cleared these bits during every `exception_return` to a
non-arm64e process. Since xnu itself uses these keys, the exception vector
needs to restore the same bits on every exception entry (implemented in the
`EL0_64_VECTOR` macro).

Apple A13 CPUs now have controls that allow xnu to keep the PAC keys enabled at
EL1, independent of `SCTLR_EL1` settings. On these CPUs, xnu only needs to
reconfigure `SCTLR_EL1` when context-switching from a "vanilla" arm64 process to
an arm64e process, or vice-versa (`pmap_switch_user_ttb_internal`).

### Signed pointers in shared regions

Each userspace process has a *shared region* mapped into its address space,
consisting of code and data shared across all processes of the same processor
type, bitness, root directory, and (for arm64e processes) team ID. Comments at
the top of `osfmk/vm/vm_shared_region.c` discuss this region, and the process of
populating it, in more detail.

As the VM layer pages in parts of the shared region, any embedded pointers must
be rebased. Although this process is not new, PAC adds a new step: these
embedded pointers may be signed, and must be re-signed after they are rebased.
This process is implemented as `vm_shared_region_slide_page_v3` in
`osfmk/vm/vm_shared_region.c`.

xnu signs these embedded pointers using a shared-region-specific A key
(`sr_jop_key`), which is randomly generated when the shared region is created.
Since these pointers will be consumed by userspace processes, xnu temporarily
switches to the userspace A keys when re-signing them.

### Signing spilled register state

xnu saves register state into kernel memory when taking exceptions, and reloads
this state on exception return. If an attacker has write access to kernel
memory, they can modify this saved state and effectively gain control over a
victim thread's control flow.

xnu hardens against this attack by calling `ml_sign_thread_state` on exception
entry to hash certain registers before they're saved to memory. On exception
return, it calls the complementary `ml_check_signed_state` function to ensure
that the reloaded values still match this hash. `ml_sign_thread_state` hashes a
handful of particularly sensitive registers:

* `pc, lr`: directly affect control flow
* `cpsr`: controls the process's exception level
* `x16, x17`: used by LLVM to temporarily store unauthenticated addresses

`ml_sign_thread_state` also uses the address of the thread's `arm_saved_state_t`
as a diversifier. This step keeps attackers from using `ml_sign_thread_state`
as a signing oracle: an attacker might create a sacrificial thread, set this
thread to some desired state, and use kernel memory access gadgets to
transplant the xnu-signed state onto a victim thread. Because the victim
thread has a different `arm_saved_state_t` address as a diversifier,
`ml_check_signed_state` will detect a hash mismatch in the victim thread.

Apart from exception entry and return, xnu calls `ml_check_signed_state` and
`ml_sign_thread_state` whenever it needs to mutate one of these sensitive
registers (e.g., advancing the PC to the next instruction). This process looks
like:

1. Disable interrupts.
2. Load the `pc`, `lr`, `cpsr`, `x16`, and `x17` values and the hash from the
   thread's `arm_saved_state_t` into registers.
3. Call `ml_check_signed_state` to ensure the values have not been tampered with.
4. Mutate one or more of these values using *only* register-to-register
   instructions.
5. Call `ml_sign_thread_state` to re-hash the mutated thread state.
6. Store the mutated values and new hash back into the thread's
   `arm_saved_state_t`.
7. Restore the old interrupt state.
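
The following C-level sketch shows the shape of this sequence. The argument
lists of `ml_check_signed_state`/`ml_sign_thread_state` and the direct saved
state field accesses used here are assumptions; the real code is the assembly
generated by `MANIPULATE_SIGNED_THREAD_STATE()`, which guarantees that none of
these values ever touches memory between steps 1 and 7.

```c
/*
 * Conceptual sketch only: advancing the saved PC by one instruction.
 * Treat each local as a register; the function name, argument order,
 * and field accesses are assumptions, not xnu's definitions.
 */
static void
advance_saved_pc(arm_saved_state_t *ss)
{
	boolean_t intr = ml_set_interrupts_enabled(FALSE);	/* step 1 */

	/* Step 2: load the sensitive values and hash "into registers". */
	uint64_t pc   = ss->pc;
	uint64_t lr   = ss->lr;
	uint32_t cpsr = ss->cpsr;
	uint64_t x16  = ss->x[16];
	uint64_t x17  = ss->x[17];

	/* Step 3: verify the existing hash before trusting the values. */
	ml_check_signed_state(ss, pc, cpsr, lr, x16, x17);

	/* Step 4: register-to-register mutation only. */
	pc += 4;

	/* Step 5: re-hash the mutated thread state. */
	ml_sign_thread_state(ss, pc, cpsr, lr, x16, x17);

	/* Step 6: write the mutated value (and new hash) back to memory. */
	ss->pc = pc;

	(void)ml_set_interrupts_enabled(intr);			/* step 7 */
}
```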

Critically, none of the sensitive register values can be spilled to memory
between steps 1 and 7. Otherwise an attacker with kernel memory access could
modify one of these values and use step 5 as a signing oracle. xnu implements
these routines entirely in assembly to ensure full control over register use,
using a macro `MANIPULATE_SIGNED_THREAD_STATE()` to generate boilerplate
instructions.

Interrupts must be disabled whenever `ml_check_signed_state` or
`ml_sign_thread_state` are called, starting *before* their inputs (`x0`--`x5`)
are populated. To understand why, consider what would happen if the CPU could
be interrupted just before step 5 above. xnu's exception handler would spill
the entire register state to memory. If an attacker has kernel memory access,
they could attempt to replace the spilled `x0`--`x5` values. These modified
values would then be reloaded into the CPU during exception return, and
`ml_sign_thread_state` would be called with new, attacker-controlled inputs.

### thread_set_state

The `thread_set_state` call lets userspace modify the register state of a target
thread. Signed userspace state adds a wrinkle to this process, since the
incoming FP, LR, SP, and PC values are signed using the *userspace process's*
key.

xnu handles this in two steps. First, `machine_thread_state_convert_from_user`
converts the userspace thread state representation into an in-kernel
representation. Signed values are authenticated using `pmap_auth_user_ptr`,
which involves temporarily switching to the userspace keys.

Second, `thread_state64_to_saved_state` applies this converted state to the
target thread. Whenever `thread_state64_to_saved_state` modifies a register
that makes up part of the thread state hash, it uses
`MANIPULATE_SIGNED_THREAD_STATE()` as described above to update this hash.

### Signing arbitrary data blobs

xnu provides `ptrauth_utils_sign_blob_generic` and `ptrauth_utils_auth_blob_generic`
to sign and authenticate arbitrary blobs of data. Callers are responsible for
storing the pointer-sized signature returned. The signature is a rolling MAC
of the data, using the `pacga` instruction, mixed with a provided salt and
optionally further diversified by the storage address.

Use of these functions is inherently racy. The data must be read from memory
before each pointer-sized block can be added to the signature. In normal
operation, standard thread-safety semantics protect against corruption; in the
malicious case, however, an attacker may be able to time an overwrite of the
buffer so that it lands before signing or after authentication.

Callers of these functions must take care to minimize these race windows by
using them immediately preceding/following a write/read of the blob's data.
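
A hedged sketch of that pattern (sign immediately after a write, authenticate
immediately before a read) follows. The parameter lists shown for
`ptrauth_utils_sign_blob_generic` and `ptrauth_utils_auth_blob_generic` are
assumptions based on the description above, as are the struct, salt, and flags
values.

```c
#include <stdint.h>
#include <string.h>

struct protected_blob {
	uint8_t  data[64];
	uint64_t sig;		/* caller-owned storage for the rolling MAC */
};

#define BLOB_SALT	0x5a17ULL	/* hypothetical per-use salt */

static void
blob_update(struct protected_blob *b, const uint8_t *src, size_t len)
{
	memcpy(b->data, src, len);

	/* Sign immediately after the write to minimize the race window. */
	b->sig = ptrauth_utils_sign_blob_generic(b->data, sizeof(b->data),
	    BLOB_SALT, 0);
}

static void
blob_consume(struct protected_blob *b)
{
	/* Authenticate immediately before reading the data. */
	ptrauth_utils_auth_blob_generic(b->data, sizeof(b->data),
	    BLOB_SALT, 0, b->sig);

	/* ... use b->data ... */
}
```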