- * Set the floating-point state for a thread based
- * on the FXSave formatted data. This is basically
- * the same as fpu_set_state except it uses the
- * expanded data structure.
- * If the thread is not the current thread, it is
- * not running (held). Locking needed against
- * concurrent fpu_set_state or fpu_get_state.
+ * Set the floating-point state for a thread based on the FXSave formatted data.
+ * This is basically the same as fpu_set_state except it uses the expanded data
+ * structure.
+ * If the thread is not the current thread, it is not running (held). Locking
+ * needed against concurrent fpu_set_state or fpu_get_state.
+ *
+ * While translating between XNU FP state structures and the CPU-native XSAVE area,
+ * if we detect state components that are all zeroes, we clear the corresponding
+ * xstate_bv bit in the XSAVE area, because that allows the corresponding state to
+ * be initialized to a "clean" state. That's most important when clearing the YMM
+ * bit, since an initialized "upper clean" state results in a massive performance
+ * improvement due to elimination of false dependencies between the XMMs and the
+ * upper bits of the YMMs.