	@(#)README	8.1 (Berkeley) 6/9/93
  $FreeBSD: src/usr.bin/compress/doc/README,v 1.3 2002/12/30 21:18:11 schweikh Exp $

Compress version 4.0 improvements over 3.0:
	o compress() speedup (10-50%) by changing division hash to xor
	o decompress() speedup (5-10%)
	o Memory requirements reduced (3-30%)
	o Stack requirements reduced to less than 4kb
	o Removed 'Big+Fast' compress code (FBITS) because of compress speedup
	o Portability mods for Z8000 and PC/XT (but not zeus 3.2)
	o Default to 'quiet' mode
	o Unification of 'force' flags
	o Manual page overhaul
	o Portability enhancement for M_XENIX
	o Removed text on #else and #endif
	o Added "-V" switch to print version and options
	o Added #defines for SIGNED_COMPARE_SLOW
	o Added Makefile and "usermem" program
	o Removed all floating point computations
	o New programs: [deleted]

The "usermem" script attempts to determine the maximum process size.  Some
editing of the script may be necessary (see the comments).  [It should work
fine on 4.3 BSD.] If you can't get it to work at all, just create file
"USERMEM" containing the maximum process size in decimal.

The following preprocessor symbols control the compilation of "compress.c":

	o USERMEM		Maximum process memory on the system
	o SACREDMEM		Amount to reserve for other processes
	o SIGNED_COMPARE_SLOW	Unsigned compare instructions are faster
	o NO_UCHAR		Don't use "unsigned char" types
	o BITS			Overrules default set by USERMEM-SACREDMEM
	o vax			Generate inline assembler
	o interdata		Defines SIGNED_COMPARE_SLOW
	o M_XENIX		Makes arrays < 65536 bytes each
	o pdp11			BITS=12, NO_UCHAR
	o z8000			BITS=12
	o pcxt			BITS=12
	o BSD4_2		Allow long filenames ( > 14 characters) &
				Call setlinebuf(stderr)

The difference "usermem-sacredmem" determines the maximum BITS that can be
specified with the "-b" flag.

memory: at least		BITS
------  -- -----                ----
     433,484			 16
     229,600			 15
     127,536			 14
      73,464			 13
           0			 12

The default is BITS=16.

The maximum bits can be overruled by specifying "-DBITS=bits" at
compilation time.
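
The table above amounts to a threshold lookup.  What follows is only a
sketch of that reading, not the code in "compress.c" (which makes this
choice at compile time).  The function name "max_bits_for_memory" is
hypothetical; the thresholds are copied from the table.

```c
#include <assert.h>

/*
 * Hypothetical helper, not part of compress.c: returns the largest BITS
 * value the 4.0 table permits for usermem - sacredmem bytes of memory.
 */
int max_bits_for_memory(long usermem, long sacredmem)
{
    long avail;

    assert(sacredmem >= 0);
    avail = usermem - sacredmem;

    if (avail >= 433484L) return 16;
    if (avail >= 229600L) return 15;
    if (avail >= 127536L) return 14;
    if (avail >= 73464L)  return 13;
    return 12;  /* floor of the table: anything smaller still gets 12 */
}
```

For instance, with usermem=300000 and sacredmem=100000 the difference is
200,000 bytes, which falls between the 14-bit and 15-bit rows, so the sketch
returns 14.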

WARNING: files compressed on a large machine with more bits than allowed by 
a version of compress on a smaller machine cannot be decompressed!  Use the
"-b12" flag to generate a file on a large machine that can be uncompressed 
on a 16-bit machine.
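
A receiving site could check for this problem before attempting to
decompress.  The sketch below is illustrative only.  It relies on the magic
number (\037\235) described in the 2.0 notes further down, and it ASSUMES
that the low five bits of the third header byte hold the BITS value; verify
that against your copy of "compress.c" before depending on it.  The function
names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: returns the maximum code size recorded in the
 * header, or -1 if the buffer is too short or the magic number is absent.
 * ASSUMPTION: the low five bits of the third byte hold the BITS value.
 */
int header_max_bits(const unsigned char *buf, size_t len)
{
    assert(buf != NULL);
    if (len < 3 || buf[0] != 0037 || buf[1] != 0235)
        return -1;
    return buf[2] & 0x1f;
}

/*
 * Returns 1 if a decompressor limited to local_bits can handle the file,
 * 0 if the file was written with more bits, -1 if the header is unknown.
 */
int can_decompress(const unsigned char *buf, size_t len, int local_bits)
{
    int file_bits = header_max_bits(buf, len);

    if (file_bits < 0)
        return -1;
    return file_bits <= local_bits ? 1 : 0;
}
```

Under that assumption, a file written with 16 bits is rejected by a
decompressor limited to 12 bits, while a file written with "-b12" is
accepted.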

The output of compress 4.0 is fully compatible with that of compress 3.0.
In other words, the output of compress 4.0 may be fed into uncompress 3.0 or
the output of compress 3.0 may be fed into uncompress 4.0.

The output of compress 4.0 is not compatible with that of
compress 2.0.  However, compress 4.0 still accepts the output of
compress 2.0.  To generate output that is compatible with compress
2.0, use the undocumented "-C" flag.

	-from mod.sources, submitted by vax135!petsd!joe (Joe Orost), 8/1/85
--------------------------------

Enclosed is compress version 3.0 with the following changes:

1.	"Block" compression is performed.  After the BITS run out, the
	compression ratio is checked every so often.  If it is decreasing,
	the table is cleared and a new set of substrings is generated.

	This makes the output of compress 3.0 not compatible with that of
	compress 2.0.  However, compress 3.0 still accepts the output of
	compress 2.0.  To generate output that is compatible with compress
	2.0, use the undocumented "-C" flag.

2.	A quiet "-q" flag has been added for use by the news system.

3.	The character chaining has been deleted and the program now uses
	hashing.  This improves the speed of the program, especially
	during decompression.  Other speed improvements have been made,
	such as using putc() instead of fwrite().

4.	A large table is used on large machines when a relatively small
	number of bits is specified.  This saves much time when compressing
	for a 16-bit machine on a 32-bit virtual machine.  Note that the
	speed improvement only occurs when the input file is > 30000
	characters, and the -b BITS is less than or equal to the cutoff
	described below.

Most of these changes were made by James A. Woods (ames!jaw).  Thank you
James!

To compile compress:

	cc -O -DUSERMEM=usermem -o compress compress.c

Where "usermem" is the amount of physical user memory available (in bytes).  
If any physical memory is to be reserved for other processes, put in 
"-DSACREDMEM=sacredmem", where "sacredmem" is the amount to be reserved.

The difference "usermem-sacredmem" determines the maximum BITS that can be
specified, and the cutoff bits where the large+fast table is used.

memory: at least		BITS		cutoff
------  -- -----                ----            ------
   4,718,592 			 16		  13
   2,621,440 			 16		  12
   1,572,864			 16		  11
   1,048,576			 16		  10
     631,808			 16               --
     329,728			 15               --
     178,176			 14		  --
      99,328			 13		  --
           0			 12		  --

The default memory size is 750,000 which gives a maximum BITS=16 and no
large+fast table.

The maximum bits can be overruled by specifying "-DBITS=bits" at
compilation time.

If your machine doesn't support unsigned characters, define "NO_UCHAR" 
when compiling.

If "int" is 16 bits wide on your machine, define "SHORT_INT" when compiling.

After compilation, move "compress" to a standard executable location, such 
as /usr/local.  Then:
	cd /usr/local
	ln compress uncompress
	ln compress zcat
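
The links work because one binary can pick its behavior from the name it
was invoked under.  A minimal sketch of that idea follows; the enum and the
function "mode_from_argv0" are hypothetical and are not lifted from
"compress.c".

```c
#include <assert.h>
#include <string.h>

enum mode { MODE_COMPRESS, MODE_UNCOMPRESS, MODE_ZCAT };

/*
 * Hypothetical helper: chooses the operating mode from argv[0], ignoring
 * any leading directory components.
 */
enum mode mode_from_argv0(const char *argv0)
{
    const char *base;

    assert(argv0 != NULL);
    base = strrchr(argv0, '/');
    base = (base != NULL) ? base + 1 : argv0;

    if (strcmp(base, "uncompress") == 0)
        return MODE_UNCOMPRESS;
    if (strcmp(base, "zcat") == 0)
        return MODE_ZCAT;
    return MODE_COMPRESS;  /* any other name behaves as compress */
}
```

A main() would call mode_from_argv0(argv[0]) once at startup and branch on
the result.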

On machines that have a fixed stack size (such as Perkin-Elmer), set the
stack to at least 12kb.  ("setstack compress 12" on Perkin-Elmer).

Next, install the manual (compress.l).
	cp compress.l /usr/man/manl
	cd /usr/man/manl
	ln compress.l uncompress.l
	ln compress.l zcat.l

		- or -

	cp compress.l /usr/man/man1/compress.1
	cd /usr/man/man1
	ln compress.1 uncompress.1
	ln compress.1 zcat.1

					regards,
					petsd!joe

Here is a note from the net:

>From hplabs!pesnta!amd!turtlevax!ken Sat Jan  5 03:35:20 1985
Path: ames!hplabs!pesnta!amd!turtlevax!ken
From: ken@turtlevax.UUCP (Ken Turkowski)
Newsgroups: net.sources
Subject: Re: Compress release 3.0 : sample Makefile
Organization: CADLINC, Inc. @ Menlo Park, CA

In the compress 3.0 source recently posted to mod.sources, there is a
#define variable which can be set for optimum performance on a machine
with a large amount of memory.  A program (usermem) to calculate the
usable amount of physical user memory is enclosed, as well as a sample
4.2BSD Vax Makefile for compress.

Here is the README file from the previous version of compress (2.0):

>Enclosed is compress.c version 2.0 with the following bugs fixed:
>
>1.	The packed files produced by compress are different on different
>	machines and dependent on the vax sysgen option.
>		The bug was in the different byte/bit ordering on the
>		various machines.  This has been fixed.
>
>		This version is NOT compatible with the original vax posting
>		unless the '-DCOMPATIBLE' option is specified to the C
>		compiler.  The original posting has a bug which I fixed, 
>		causing incompatible files.  I recommend you NOT to use this
>		option unless you already have a lot of packed files from
>		the original posting by Thomas.
>2.	The exit status is not well defined (on some machines) causing the
>	scripts to fail.
>		The exit status is now 0,1 or 2 and is documented in
>		compress.l.
>3.	The function getopt() is not available in all C libraries.
>		The function getopt() is no longer referenced by the
>		program.
>4.	Error status is not being checked on the fwrite() and fflush() calls.
>		Fixed.
>
>The following enhancements have been made:
>
>1.	Added facilities of "compact" into the compress program.  "Pack",
>	"Unpack", and "Pcat" are no longer required (no longer supplied).
>2.	Installed work around for C compiler bug with "-O".
>3.	Added a magic number header (\037\235).  Put the bits specified
>	in the file.
>4.	Added "-f" flag to force overwrite of output file.
>5.	Added "-c" flag and "zcat" program.  'ln compress zcat' after you
>	compile.
>6.	The 'uncompress' script has been deleted; simply 
>	'ln compress uncompress' after you compile and it will work.
>7.	Removed extra bit masking for machines that support unsigned
>	characters.  If your machine doesn't support unsigned characters,
>	define "NO_UCHAR" when compiling.
>
>Compile "compress.c" with "-O -o compress" flags.  Move "compress" to a
>standard executable location, such as /usr/local.  Then:
>	cd /usr/local
>	ln compress uncompress
>	ln compress zcat
>
>On machines that have a fixed stack size (such as Perkin-Elmer), set the
>stack to at least 12kb.  ("setstack compress 12" on Perkin-Elmer).
>
>Next, install the manual (compress.l).
>	cp compress.l /usr/man/manl		- or -
>	cp compress.l /usr/man/man1/compress.1
>
>Here is the README that I sent with my first posting:
>
>>Enclosed is a modified version of compress.c, along with scripts to make it
>>run identically to pack(1), unpack(1), and pcat(1).  Here is what I
>>(petsd!joe) and a colleague (petsd!peora!srd) did:
>>
>>1. Removed VAX dependencies.
>>2. Changed the struct to separate arrays; saves mucho memory.
>>3. Did comparisons in unsigned, where possible.  (Faster on Perkin-Elmer.)
>>4. Sorted the character next chain and changed the search to stop
>>prematurely.  This saves a lot on the execution time when compressing.
>>
>>This version is totally compatible with the original version.  Even though
>>lint(1) -p has no complaints about compress.c, it won't run on a 16-bit
>>machine, due to the size of the arrays.
>>
>>Here is the README file from the original author:
>> 
>>>Well, with all this discussion about file compression (for news batching
>>>in particular) going around, I decided to implement the text compression
>>>algorithm described in the June Computer magazine.  The author claimed
>>>blinding speed and good compression ratios.  It's certainly faster than
>>>compact (but, then, what wouldn't be), but it's also the same speed as
>>>pack, and gets better compression than both of them.  On 350K bytes of
>>>Unix-wizards, compact took about 8 minutes of CPU, pack took about 80
>>>seconds, and compress (herein) also took 80 seconds.  But, compact and
>>>pack got about 30% compression, whereas compress got over 50%.  So, I
>>>decided I had something, and that others might be interested, too.
>>>
>>>As is probably true of compact and pack (although I haven't checked),
>>>the byte order within a word is probably relevant here, but as long as
>>>you stay on a single machine type, you should be ok.  (Can anybody
>>>elucidate on this?)  There are a couple of asm's in the code (extv and
>>>insv instructions), so anyone porting it to another machine will have to
>>>deal with this anyway (and could probably make it compatible with Vax
>>>byte order at the same time).  Anyway, I've linted the code (both with
>>>and without -p), so it should run elsewhere.  Note the longs in the
>>>code, you can take these out if you reduce BITS to <= 15.
>>>
>>>Have fun, and as always, if you make good enhancements, or bug fixes,
>>>I'd like to see them.
>>>
>>>=Spencer (thomas@utah-20, {harpo,hplabs,arizona}!utah-cs!thomas)
>>
>>					regards,
>>					joe
>>
>>--
>>Full-Name:  Joseph M. Orost
>>UUCP:       ..!{decvax,ucbvax,ihnp4}!vax135!petsd!joe
>>US Mail:    MS 313; Perkin-Elmer; 106 Apple St; Tinton Falls, NJ 07724
>>Phone:      (201) 870-5844