ARMv8.3 Pointer Authentication in xnu
=====================================

This document describes xnu's use of the ARMv8.3-PAuth extension. Specifically,
xnu uses ARMv8.3-PAuth to protect against Return-Oriented-Programming (ROP)
and Jump-Oriented-Programming (JOP) attacks, which attempt to gain control flow
over a victim program by overwriting return addresses or function pointers
stored in memory.

It is assumed the reader is already familiar with the basic concepts behind
ARMv8.3-PAuth and what its instructions do. The "ARMv8.3-A Pointer
Authentication" section of Google Project Zero's ["Examining Pointer
Authentication on the iPhone
XS"](https://googleprojectzero.blogspot.com/2019/02/examining-pointer-authentication-on.html)
provides a good introduction to ARMv8.3-PAuth. The reader may find more
comprehensive background material in:

* The "Pointer authentication in AArch64 state" section of the [ARMv8
  ARM](https://developer.arm.com/docs/ddi0487/latest/arm-architecture-reference-manual-armv8-for-armv8-a-architecture-profile)
  describes the new instructions and registers associated with ARMv8.3-PAuth.

* [LLVM's Pointer Authentication
  documentation](https://github.com/apple/llvm-project/blob/apple/master/clang/docs/PointerAuthentication.rst)
  outlines how clang uses ARMv8.3-PAuth instructions to harden key C, C++,
  Swift, and Objective-C language constructs.

Threat model
------------

Pointer authentication's threat model assumes that an attacker has found a
gadget to read and write arbitrary memory belonging to a victim process, which
may include the kernel. The attacker does *not* have the ability to execute
arbitrary code in that process's context. Pointer authentication aims to
prevent the attacker from gaining control flow over the victim process by
overwriting sensitive pointers in its address space (e.g., return addresses
and function pointers).

Following this threat model, xnu takes a two-pronged approach to prevent the
attacker from gaining control flow over the victim process:

1. Both xnu and first-party binaries are built with LLVM's `-arch arm64e` flag,
   which generates pointer-signing and authentication instructions to protect
   addresses stored in memory (including ones pushed to the stack). This
   process is generally transparent to xnu, with exceptions discussed below.

2. On exception entry, xnu hashes critical register state before it is spilled
   to memory. On exception return, the reloaded state is validated against
   this hash.

The ["xnu PAC infrastructure"](#xnu-pac-infrastructure) section discusses how
these hardening techniques are implemented in xnu in more detail.

Key generation on Apple CPUs
----------------------------

ARMv8.3-PAuth implementations may use an <span style="font-variant:
small-caps">implementation defined</span> cipher. Apple CPUs implement an
optional custom cipher with two key-generation changes relevant to xnu.

### Per-boot diversifier

Apple's optional cipher adds a per-boot diversifier. In effect, even if xnu
initializes the "ARM key" registers (`APIAKey`, `APGAKey`, etc.) with constants,
signing a given value will still produce different signatures from boot to boot.

### Kernel/userspace diversifier

Apple CPUs also contain a second diversifier known as `KERNKey`. `KERNKey` is
automatically mixed into the final signing key (or not) based on the CPU's
exception level. When xnu needs to sign or authenticate userspace-signed
pointers, it uses the `ml_enable_user_jop_key` and `ml_disable_user_jop_key`
routines to manually enable or disable `KERNKey`. `KERNKey` allows the CPU to
effectively use different signing keys for userspace and kernel, without needing
to explicitly reprogram the generic ARM keys on every kernel entry and exit.

xnu PAC infrastructure
----------------------

For historical reasons, the xnu codebase collectively refers to xnu + iOS's
pointer authentication infrastructure as Pointer Authentication Codes (PAC).
The remainder of this document follows this terminology for consistency with
the codebase.

### arm64e binary "slice"

Binaries with PAC instructions are not fully backwards-compatible with non-PAC
CPUs. Hence LLVM/iOS treat PAC-enabled binaries as a distinct ABI "slice" named
arm64e. xnu enforces this distinction by disabling the PAC keys when returning
to non-arm64e userspace, effectively turning ARMv8.3-PAuth auth and sign
instructions into no-ops (see the ["SCTLR_EL1"](#sctlr-el1) heading below for
further discussion).

### Kernel pointer signing

xnu is built with `-arch arm64e`, which causes LLVM to automatically sign and
authenticate function pointers and return addresses spilled onto the stack. This
process is largely transparent to software, with some exceptions:

- During early boot, xnu rebases and signs the pointers stored in its own
  `__thread_starts` section (see `rebase_threaded_starts` in
  `osfmk/arm/arm_init.c`).

- As parts of the userspace shared region are paged in, the page-in handler must
  also slide and re-sign any signed pointers stored in it. The ["Signed
  pointers in shared regions"](#signed-pointers-in-shared-regions) section
  discusses this in further detail.

- Assembly routines must manually sign the return address with `pacibsp` before
  pushing it onto the stack, and use an authenticating `retab` instruction in
  place of `ret`. xnu provides assembly macros `ARM64_STACK_PROLOG` and
  `ARM64_STACK_EPILOG` which emit the appropriate instructions for both arm64
  and arm64e targets.

  Likewise, branches in assembly to signed C function pointers must use the
  authenticating `blraa` instruction in place of `blr`.

- Signed pointers must be stripped with `ptrauth_strip` before they can be
  compared against compile-time constants like `VM_MIN_KERNEL_ADDRESS`.

### Testing data pointer signing

xnu contains tests for each manually qualified data pointer, which should be
updated as new pointers are qualified. The tests allocate a structure
containing a `__ptrauth`-qualified member, and write a pointer to that member.
We can then compare the stored value, which should be signed, with a manually
constructed signature. See `ALLOC_VALIDATE_DATA_PTR`.

Tests are triggered by setting the `kern.run_ptrauth_data_tests` sysctl. The
sysctl is implemented, and BSD structures are tested, in
`bsd/tests/ptrauth_data_tests_sysctl.c`. Mach structures are tested in
`osfmk/tests/ptrauth_data_tests.c`.

### Managing PAC register state

xnu generally tries to avoid reprogramming the CPU's PAC-related registers on
kernel entry and exit, since this could add significant overhead to a hot
codepath. Instead, xnu uses the following strategies to manage the PAC register
state:

#### A keys

Userspace processes' A keys (`AP{IA,DA,GA}Key`) are derived from the field
`jop_pid` inside `struct task`. For implementation reasons, an exact duplicate
of this field is cached in the corresponding `struct machine_thread`.

A keys are randomly generated at shared region initialization time (see ["Signed
pointers in shared regions"](#signed-pointers-in-shared-regions) below) and
copied into `jop_pid` during process activation. This shared region, and hence
the associated A keys, may be shared among arm64e processes under specific
circumstances:

1. "System processes" (i.e., processes launched from first-party signed binaries
   on the iOS system image) generally use a common shared region with a default
   `jop_pid` value, separate from non-system processes.

   If a system process wishes to isolate its A keys even from other system
   processes, it may opt into a custom shared region using an entitlement in
   the form `com.apple.pac.shared_region_id=[...]`. That is, two processes with
   the entitlement `com.apple.pac.shared_region_id=foo` would share A keys and
   shared regions with each other, but not with other system processes.

2. Other arm64e processes automatically use the same shared region/A keys if
   their respective binaries are signed with the same team-identifier strings.

3. `posix_spawnattr_set_ptrauth_task_port_np()` allows explicit "inheriting" of
   A keys during `posix_spawn()`, using a supplied mach task port. This API is
   intended to support debugging tools that may need to auth or sign pointers
   using the target process's keys.

#### B keys

Each process is assigned a random set of "B keys" (`AP{IB,DB}Key`) on process
creation. As a special exception, processes which inherit their parents' memory
address space (e.g., during `fork`) will also inherit their parents' B keys.
These keys are stored as the field `rop_pid` inside `struct task`, with an exact
duplicate in `struct machine_thread` for implementation reasons.

xnu reprograms the ARM B-key registers during context switch, via the macro
`set_process_dependent_keys_and_sync_context` in `cswitch.s`.

xnu uses the B keys internally to sign pointers pushed onto the kernel stack,
such as stashed LR values. Note that xnu does *not* need to explicitly switch
to a dedicated set of "kernel B keys" to do this:

1. The `KERNKey` diversifier already ensures that the actual signing keys are
   different between xnu and userspace.

2. Although reprogramming the ARM B-key registers will affect xnu's signing keys
   as well, pointers pushed onto the stack are inherently short-lived.
   Specifically, there will never be a situation where a stack pointer value is
   signed with one `current_task()`, but needs to be authed under a different
   active `current_task()`.

#### SCTLR_EL1

As discussed above, xnu disables the ARM keys when returning to non-arm64e
userspace processes. This is implemented by manipulating the `EnIA`, `EnIB`,
`EnDA`, and `EnDB` bits in the ARM `SCTLR_EL1` system register. When these
bits are cleared, auth or sign instructions using the respective keys simply
pass through their inputs unmodified.

Initially, xnu cleared these bits during every `exception_return` to a
non-arm64e process. Since xnu itself uses these keys, the exception vector
needs to restore the same bits on every exception entry (implemented in the
`EL0_64_VECTOR` macro).

Apple A13 CPUs now have controls that allow xnu to keep the PAC keys enabled at
EL1, independent of `SCTLR_EL1` settings. On these CPUs, xnu only needs to
reconfigure `SCTLR_EL1` when context-switching from a "vanilla" arm64 process to
an arm64e process, or vice-versa (`pmap_switch_user_ttb_internal`).

### Signed pointers in shared regions

Each userspace process has a *shared region* mapped into its address space,
consisting of code and data shared across all processes of the same processor
type, bitness, root directory, and (for arm64e processes) team ID. Comments at
the top of `osfmk/vm/vm_shared_region.c` discuss this region, and the process of
populating it, in more detail.

As the VM layer pages in parts of the shared region, any embedded pointers must
be rebased. Although this process is not new, PAC adds a new step: these
embedded pointers may be signed, and must be re-signed after they are rebased.
This process is implemented as `vm_shared_region_slide_page_v3` in
`osfmk/vm/vm_shared_region.c`.

xnu signs these embedded pointers using a shared-region-specific A key
(`sr_jop_key`), which is randomly generated when the shared region is created.
Since these pointers will be consumed by userspace processes, xnu temporarily
switches to the userspace A keys when re-signing them.

### Signing spilled register state

xnu saves register state into kernel memory when taking exceptions, and reloads
this state on exception return. If an attacker has write access to kernel
memory, they can modify this saved state and effectively take control of a
victim thread's control flow.

xnu hardens against this attack by calling `ml_sign_thread_state` on exception
entry to hash certain registers before they're saved to memory. On exception
return, it calls the complementary `ml_check_signed_state` function to ensure
that the reloaded values still match this hash. `ml_sign_thread_state` hashes a
handful of particularly sensitive registers:

* `pc, lr`: directly affect control flow
* `cpsr`: controls the process's exception level
* `x16, x17`: used by LLVM to temporarily store unauthenticated addresses

`ml_sign_thread_state` also uses the address of the thread's `arm_saved_state_t`
as a diversifier. This step keeps attackers from using `ml_sign_thread_state`
as a signing oracle: an attacker might create a sacrificial thread, set this
thread to some desired state, and use kernel memory access gadgets to
transplant the xnu-signed state onto a victim thread. Because the victim
thread's `arm_saved_state_t` has a different address, and hence a different
diversifier, `ml_check_signed_state` will detect a hash mismatch in the victim
thread.

Apart from exception entry and return, xnu calls `ml_check_signed_state` and
`ml_sign_thread_state` whenever it needs to mutate one of these sensitive
registers (e.g., advancing the PC to the next instruction). This process looks
like:

1. Disable interrupts.
2. Load the `pc`, `lr`, `cpsr`, `x16`, and `x17` values and the hash from the
   thread's `arm_saved_state_t` into registers.
3. Call `ml_check_signed_state` to ensure the values have not been tampered
   with.
4. Mutate one or more of these values using *only* register-to-register
   operations.
5. Call `ml_sign_thread_state` to re-hash the mutated thread state.
6. Store the mutated values and new hash back into the thread's
   `arm_saved_state_t`.
7. Restore the old interrupt state.

Critically, none of the sensitive register values can be spilled to memory
between steps 1 and 7. Otherwise an attacker with kernel memory access could
modify one of these values and use step 5 as a signing oracle. xnu implements
these routines entirely in assembly to ensure full control over register use,
using the macro `MANIPULATE_SIGNED_THREAD_STATE()` to generate boilerplate
instructions.

Interrupts must be disabled whenever `ml_check_signed_state` or
`ml_sign_thread_state` are called, starting *before* their inputs (`x0`--`x5`)
are populated. To understand why, consider what would happen if the CPU could
be interrupted just before step 5 above. xnu's exception handler would spill
the entire register state to memory. An attacker with kernel memory access
could then replace the spilled `x0`--`x5` values. These modified values would
be reloaded into the CPU during exception return, and `ml_sign_thread_state`
would be called with new, attacker-controlled inputs.

### thread_set_state

The `thread_set_state` call lets userspace modify the register state of a target
thread. Signed userspace state adds a wrinkle to this process, since the
incoming FP, LR, SP, and PC values are signed using the *userspace process's*
keys.

xnu handles this in two steps. First, `machine_thread_state_convert_from_user`
converts the userspace thread state representation into an in-kernel
representation. Signed values are authenticated using `pmap_auth_user_ptr`,
which involves temporarily switching to the userspace keys.

Second, `thread_state64_to_saved_state` applies this converted state to the
target thread. Whenever `thread_state64_to_saved_state` modifies a register
that makes up part of the thread state hash, it uses
`MANIPULATE_SIGNED_THREAD_STATE()` as described above to update this hash.

### Signing arbitrary data blobs

xnu provides `ptrauth_utils_sign_blob_generic` and `ptrauth_utils_auth_blob_generic`
to sign and authenticate arbitrary blobs of data. Callers are responsible for
storing the pointer-sized signature that is returned. The signature is a rolling
MAC of the data, computed with the `pacga` instruction, mixed with a provided
salt and optionally further diversified by the storage address.

Use of these functions is inherently racy. The data must be read from memory
before each pointer-sized block can be added to the signature. In normal
operation, standard thread-safety semantics protect against corruption, but in
the malicious case it may be possible to time an overwrite of the buffer to
just before signing or just after authentication.

Callers of these functions must take care to minimise these race windows by
using them immediately preceding/following a write/read of the blob's data.