+ // we rotate the image in 21-pixel (63-byte) wide strips
+ // to make better use of transfers between memory and the cpu cache
+ // (note: while much better than single-pixel "strips",
+ // our vertical strips will still generally straddle cache lines)
+ for (long ii = 0; ii < width; )