+ *
+ * Note that some versions of the atomic functions incorporate memory barriers,
+ * and some do not. Barriers strictly order memory access on a weakly-ordered
+ * architecture such as PPC. All loads and stores executed in sequential program
+ * order before the barrier will complete before any load or store executed after
+ * the barrier. On a uniprocessor, the barrier operation is typically a nop.
+ * On a multiprocessor, the barrier can be quite expensive.
+ *
+ * Most code will want to use the barrier functions to insure that memory shared
+ * between threads is properly synchronized. For example, if you want to initialize
+ * a shared data structure and then atomically increment a variable to indicate
+ * that the initialization is complete, then you MUST use OSAtomicIncrement32Barrier()
+ * to ensure that the stores to your data structure complete before the atomic add.
+ * Likewise, the consumer of that data structure MUST use OSAtomicDecrement32Barrier(),
+ * in order to ensure that their loads of the structure are not executed before
+ * the atomic decrement. On the other hand, if you are simply incrementing a global
+ * counter, then it is safe and potentially faster to use OSAtomicIncrement32().
+ *
+ * If you are unsure which version to use, prefer the barrier variants as they are
+ * safer.
+ *
+ * The spinlock and queue operations always incorporate a barrier.