The code uses memmove() to move parts of the buffer to the
front when the buffer is only partially read. By simply
reading one page at a time, the maximum size of bytes that
must be moved has a hard limit, and performance improves:
In one test case, consisting of a 430 MB Contents file,
and a 75K PDiff, applying the PDiff previously took about
48 seconds and now completes in 2 seconds.
Further speed up can be achieved by buffering writes, they
account for about 60% of the run-time now.