/////////////////////////////////////////////////////////////////////////////
// Name:        archive.h
// Purpose:     topic overview
// Author:      wxWidgets team
// Licence:     wxWindows licence
/////////////////////////////////////////////////////////////////////////////

/**

@page overview_archive Archive Formats

@tableofcontents

The archive classes handle archive formats such as zip, tar, rar and cab.
Currently wxZip, wxTar and wxZlib classes are included.

For each archive type, there are the following classes (using zip here as an
example):

@li wxZipInputStream: Input stream
@li wxZipOutputStream: Output stream
@li wxZipEntry: Holds meta-data for an entry (e.g. filename, timestamp)

There are also abstract wxArchive classes that can be used to write code that
can handle any of the archive types, see @ref overview_archive_generic.

Also see wxFileSystem for a higher level interface that can handle archive
files in a generic way.

The classes are designed to handle archives on both seekable streams, such as
disk files, and non-seekable streams, such as pipes and sockets (see
@ref overview_archive_noseek).



@section overview_archive_create Creating an Archive

Call wxArchiveOutputStream::PutNextEntry() to create each new entry in the
archive, then write the entry's data. Another call to PutNextEntry() closes the
current entry and begins the next. For example:

@code
wxFFileOutputStream out(wxT("test.zip"));
wxZipOutputStream zip(out);
wxTextOutputStream txt(zip);
wxString sep(wxFileName::GetPathSeparator());

// create the first entry and write its data
zip.PutNextEntry(wxT("entry1.txt"));
txt << wxT("Some text for entry1.txt\n");

// this closes entry1.txt and begins a second entry in a subdirectory
zip.PutNextEntry(wxT("subdir") + sep + wxT("entry2.txt"));
txt << wxT("Some text for subdir/entry2.txt\n");
@endcode

The name of each entry can be a full path, which makes it possible to store
entries in subdirectories.


@section overview_archive_extract Extracting an Archive

wxArchiveInputStream::GetNextEntry() returns a pointer to an entry object
containing the meta-data for the next entry in the archive, and transfers
ownership of that object to the caller.

Reading from the input stream then returns the entry's data. Eof() becomes
@true after an attempt has been made to read past the end of the entry's data.

When there are no more entries, GetNextEntry() returns @NULL and sets Eof().

@code
std::unique_ptr<wxZipEntry> entry;

wxFFileInputStream in(wxT("test.zip"));
wxZipInputStream zip(in);

while (entry.reset(zip.GetNextEntry()), entry.get() != NULL)
{
    // access meta-data
    wxString name = entry->GetName();
    // read 'zip' to access the entry's data
}
@endcode



@section overview_archive_modify Modifying an Archive

To modify an existing archive, write a new copy of the archive to a new file,
making any necessary changes along the way and transferring any unchanged
entries using wxArchiveOutputStream::CopyEntry().

For archive types which compress entry data, CopyEntry() is likely to be much
more efficient than transferring the data using Read() and Write(), since it
copies the data without decompressing and recompressing it.

In general, modifications are not possible without rewriting the archive,
though they may be possible in some limited cases. Even then, rewriting the
archive is usually a better choice, since a failure can be handled without
losing the whole archive. wxTempFileOutputStream is helpful for this.

For example, to delete all entries matching the pattern "*.txt":

@code
std::unique_ptr<wxFFileInputStream> in(new wxFFileInputStream(wxT("test.zip")));
wxTempFileOutputStream out(wxT("test.zip"));

wxZipInputStream inzip(*in);
wxZipOutputStream outzip(out);

std::unique_ptr<wxZipEntry> entry;

// transfer any meta-data for the archive as a whole (the zip comment
// in the case of zip)
outzip.CopyArchiveMetaData(inzip);

// call CopyEntry for each entry except those matching the pattern
while (entry.reset(inzip.GetNextEntry()), entry.get() != NULL)
{
    if (!entry->GetName().Matches(wxT("*.txt")))
    {
        if (!outzip.CopyEntry(entry.release(), inzip))
            break;
    }
}

// close the input stream by releasing the pointer to it, do this
// before closing the output stream so that the file can be replaced
in.reset();

// you can check for success as follows
bool success = inzip.Eof() && outzip.Close() && out.Commit();
@endcode



@section overview_archive_byname Looking Up an Archive Entry by Name

Also see wxFileSystem for a higher level interface that is more convenient for
accessing archive entries by name.

To open just one entry in an archive, the most efficient way is to simply
search for it linearly by calling wxArchiveInputStream::GetNextEntry() until
the required entry is found. This works for archives on both seekable and
non-seekable streams.

The format of filenames in the archive is likely to be different from the local
filename format. For example, zips and tars use Unix-style names, with forward
slashes as the path separator, and absolute paths are not allowed. So if on
Windows the file "C:\MYDIR\MYFILE.TXT" is stored, then when reading the entry
back wxArchiveEntry::GetName() will return "MYDIR\MYFILE.TXT". The conversion
into the internal format and back has lost some information.
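
The loss of information can be illustrated with a minimal sketch in standard
C++. The helper ToInternalName() below is hypothetical and is not part of
wxWidgets; it only approximates the kind of mapping an archive class performs,
assuming Windows-style input paths:

```cpp
#include <string>

// Hypothetical helper, not part of wxWidgets: approximates how a local
// Windows path might be mapped to a zip/tar-style internal name.
std::string ToInternalName(std::string path)
{
    // drop a drive specifier such as "C:"
    if (path.size() >= 2 && path[1] == ':')
        path.erase(0, 2);

    // use forward slashes as the path separator
    for (char& c : path)
    {
        if (c == '\\')
            c = '/';
    }

    // absolute paths are not allowed, so strip any leading slashes
    path.erase(0, path.find_first_not_of('/'));
    return path;
}
```

In this sketch several different local names map to the same internal name,
so the drive and the leading separator cannot be recovered when converting
back.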

So to avoid ambiguity when searching for an entry matching a local name, it is
better to convert the local name to the archive's internal format and search
for that:

@code
std::unique_ptr<wxZipEntry> entry;

// convert the local name we are looking for into the internal format
wxString name = wxZipEntry::GetInternalName(localname);

// open the zip
wxFFileInputStream in(wxT("test.zip"));
wxZipInputStream zip(in);

// call GetNextEntry() until the required internal name is found
do
{
    entry.reset(zip.GetNextEntry());
}
while (entry.get() != NULL && entry->GetInternalName() != name);

if (entry.get() != NULL)
{
    // read the entry's data...
}
@endcode

To access several entries randomly, it is most efficient to transfer the entire
catalogue of entries to a container such as a std::map or a wxHashMap. Entries
looked up by name can then be opened using the
wxArchiveInputStream::OpenEntry() method.

@code
WX_DECLARE_STRING_HASH_MAP(wxZipEntry*, ZipCatalog);
ZipCatalog::iterator it;
wxZipEntry *entry;
ZipCatalog cat;

// open the zip
wxFFileInputStream in(wxT("test.zip"));
wxZipInputStream zip(in);

// load the zip catalog; the catalog now owns the entries, so delete them
// when the catalog is no longer needed
while ((entry = zip.GetNextEntry()) != NULL)
{
    wxZipEntry*& current = cat[entry->GetInternalName()];
    // some archive formats can have multiple entries with the same name
    // (e.g. tar) though it is an error in the case of zip
    delete current;
    current = entry;
}

// open an entry by name
if ((it = cat.find(wxZipEntry::GetInternalName(localname))) != cat.end())
{
    zip.OpenEntry(*it->second);
    // ... now read entry's data
}
@endcode

To open more than one entry simultaneously you need more than one underlying
stream on the same archive:

@code
// opening another entry without closing the first requires another
// input stream for the same file
wxFFileInputStream in2(wxT("test.zip"));
wxZipInputStream zip2(in2);
if ((it = cat.find(wxZipEntry::GetInternalName(local2))) != cat.end())
    zip2.OpenEntry(*it->second);
@endcode



@section overview_archive_generic Generic Archive Programming

Also see wxFileSystem for a higher level interface that can handle archive
files in a generic way.

The specific archive classes, such as the wxZip classes, inherit from the
following abstract classes which can be used to write code that can handle any
of the archive types:

@li wxArchiveInputStream: Input stream
@li wxArchiveOutputStream: Output stream
@li wxArchiveEntry: Holds the meta-data for an entry (e.g. filename)

In order to be able to write generic code, it's necessary to be able to create
instances of the classes without knowing which archive type is being used.

To allow this there is a class factory for each archive type, derived from
wxArchiveClassFactory, that can create the other classes.

For example, given wxArchiveClassFactory* factory, streams and entries can be
created like this:

@code
// create streams without knowing their type
std::unique_ptr<wxArchiveInputStream> inarc(factory->NewStream(in));
std::unique_ptr<wxArchiveOutputStream> outarc(factory->NewStream(out));

// create an empty entry object
std::unique_ptr<wxArchiveEntry> entry(factory->NewEntry());
@endcode

For the factory itself, the static member wxArchiveClassFactory::Find() can be
used to find a class factory that can handle a given file extension or mime
type. For example, given @e filename:

@code
const wxArchiveClassFactory *factory;
factory = wxArchiveClassFactory::Find(filename, wxSTREAM_FILEEXT);

wxArchiveInputStream *stream = NULL;
if (factory)
    stream = factory->NewStream(new wxFFileInputStream(filename));
@endcode

@e Find() does not give away ownership of the returned pointer, so the caller
should not delete it.
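
The ownership rule can be pictured with a small model in standard C++. The
Registry and Factory types below are hypothetical stand-ins, not the wxWidgets
implementation: the registry owns its factories, and its Find() hands out a
non-owning pointer, or a null pointer when no registered extension matches.

```cpp
#include <map>
#include <memory>
#include <string>

// Hypothetical stand-in for a class factory; not a wxWidgets type.
struct Factory
{
    std::string protocol;
};

// Hypothetical registry that owns one factory per file extension.
class Registry
{
public:
    void Register(const std::string& ext, const std::string& protocol)
    {
        m_byExt[ext].reset(new Factory{protocol});
    }

    // Returns a non-owning pointer, or nullptr if no extension matches.
    const Factory* Find(const std::string& filename) const
    {
        for (const auto& item : m_byExt)
        {
            const std::string& ext = item.first;
            if (filename.size() >= ext.size() &&
                filename.compare(filename.size() - ext.size(),
                                 ext.size(), ext) == 0)
                return item.second.get();
        }
        return nullptr;
    }

private:
    std::map<std::string, std::unique_ptr<Factory>> m_byExt;
};
```

Because the registry keeps ownership, callers of this model only test the
returned pointer and use it; they never delete it.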

There are similar class factories for the filter streams that handle the
compression and decompression of a single stream, such as wxGzipInputStream.
These can be found using wxFilterClassFactory::Find().

For example, to list the contents of archive @e filename:

@code
std::unique_ptr<wxInputStream> in(new wxFFileInputStream(filename));

if (in->IsOk())
{
    // look for a filter handler, e.g. for '.gz'
    const wxFilterClassFactory *fcf;
    fcf = wxFilterClassFactory::Find(filename, wxSTREAM_FILEEXT);
    if (fcf)
    {
        in.reset(fcf->NewStream(in.release()));
        // pop the extension, so if it was '.tar.gz' it is now just '.tar'
        filename = fcf->PopExtension(filename);
    }

    // look for an archive handler, e.g. for '.zip' or '.tar'
    const wxArchiveClassFactory *acf;
    acf = wxArchiveClassFactory::Find(filename, wxSTREAM_FILEEXT);
    if (acf)
    {
        std::unique_ptr<wxArchiveInputStream> arc(acf->NewStream(in.release()));
        std::unique_ptr<wxArchiveEntry> entry;

        // list the contents of the archive
        while ((entry.reset(arc->GetNextEntry())), entry.get() != NULL)
            std::wcout << entry->GetName().c_str() << "\n";
    }
    else
    {
        wxLogError(wxT("can't handle '%s'"), filename.c_str());
    }
}
@endcode



@section overview_archive_noseek Archives on Non-Seekable Streams

In general, handling archives on non-seekable streams is done in the same way
as for seekable streams, with a few caveats.

The main limitation is that accessing entries randomly using
wxArchiveInputStream::OpenEntry() is not possible; the entries can only be
accessed sequentially in the order they are stored within the archive.

Each archive type also has other limitations, which depend on the order in
which the entries' meta-data is stored within the archive. These are not too
difficult to deal with, and are outlined below.

@subsection overview_archive_noseek_entrysize PutNextEntry and the Entry Size

When writing archives, some archive formats store the entry size before the
entry's data (tar has this limitation, zip doesn't). In this case the entry's
size must be passed to wxArchiveOutputStream::PutNextEntry() or an error
occurs.

This is only an issue on non-seekable streams, since otherwise the archive
output stream can seek back and fix up the header once the size of the entry is
known.

For generic programming, one way to handle this is to supply the size whenever
it is known, and rely on the error message from the output stream when the
operation is not supported.
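
The reason for this limitation can be seen in a small model written in
standard C++. The SizeFirstWriter class below is hypothetical and does not
reproduce any real archive format or wxWidgets class; it assumes a format with
a fixed-width size field in front of each entry's data, and a flag saying
whether the output can be revisited:

```cpp
#include <cstddef>
#include <string>

// Hypothetical model, not wxWidgets code: an archive format that stores an
// 8-digit size field before each entry's data (sizes below 10^8 assumed).
const std::size_t kWidth = 8;
const std::size_t kUnknownSize = static_cast<std::size_t>(-1);

class SizeFirstWriter
{
public:
    explicit SizeFirstWriter(bool seekable)
        : m_seekable(seekable), m_open(false), m_patch(false),
          m_headerPos(0), m_written(0)
    {
    }

    // Begins a new entry. Fails when the size is unknown and the output
    // cannot be revisited later to fix up the header.
    bool PutNextEntry(std::size_t size = kUnknownSize)
    {
        CloseEntry();
        if (size == kUnknownSize && !m_seekable)
            return false;

        m_open = true;
        m_patch = (size == kUnknownSize);
        m_headerPos = m_out.size();
        m_written = 0;
        m_out += Field(m_patch ? 0 : size);   // placeholder if unknown
        return true;
    }

    // Appends data to the current entry; ignored if no entry is open.
    void Write(const std::string& data)
    {
        if (!m_open)
            return;
        m_out += data;
        m_written += data.size();
    }

    // Closes the entry, going back to patch the size field if necessary.
    void CloseEntry()
    {
        if (m_open && m_patch)
            m_out.replace(m_headerPos, kWidth, Field(m_written));
        m_open = false;
        m_patch = false;
    }

    const std::string& Output() const { return m_out; }

private:
    // Formats n as a zero-padded field of kWidth digits.
    static std::string Field(std::size_t n)
    {
        const std::string digits = std::to_string(n);
        return std::string(kWidth - digits.size(), '0') + digits;
    }

    bool m_seekable, m_open, m_patch;
    std::size_t m_headerPos, m_written;
    std::string m_out;
};
```

In this model a seekable writer accepts PutNextEntry() without a size and
patches the header in CloseEntry(), while a non-seekable writer rejects the
call unless the size is supplied up front.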

@subsection overview_archive_noseek_weak GetNextEntry and the Weak Reference Mechanism

Some archive formats do not store all of an entry's meta-data before the
entry's data (zip is an example). In this case, when reading from a
non-seekable stream, wxArchiveInputStream::GetNextEntry() can only return a
partially populated wxArchiveEntry object: not all the fields are set.

The input stream then keeps a weak reference to the entry object and updates it
when more meta-data becomes available. A weak reference is one that does not
prevent you from deleting the wxArchiveEntry object; the input stream only
attempts to update it if it is still around.
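
As an analogy only, the idea can be sketched with std::weak_ptr from the
standard library. The WeakEntry and ModelStream types below are hypothetical,
and wxWidgets uses its own mechanism rather than std::shared_ptr and
std::weak_ptr; the sketch only shows that the stream updates the entry when it
still exists and does nothing otherwise:

```cpp
#include <memory>

// Hypothetical stand-ins for illustration only; not the wxWidgets classes.
struct WeakEntry
{
    long size = -1;   // -1 means "not known yet"
};

class ModelStream
{
public:
    // Hands out a partially populated entry and remembers it weakly.
    std::shared_ptr<WeakEntry> GetNextEntry()
    {
        auto entry = std::make_shared<WeakEntry>();
        m_last = entry;   // weak: does not keep the entry alive
        return entry;
    }

    // Called once trailing meta-data has been read from the stream.
    // Returns true if the entry still existed and was updated.
    bool OnSizeKnown(long size)
    {
        if (auto entry = m_last.lock())
        {
            entry->size = size;
            return true;
        }
        return false;   // the caller already destroyed the entry
    }

private:
    std::weak_ptr<WeakEntry> m_last;
};
```

If the caller keeps the entry, its size field is filled in later; if the caller
has already released it, the update is silently skipped.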

The documentation for each archive entry type gives the details of what
meta-data becomes available and when. For generic programming, when the worst
case must be assumed, you can rely on all the fields of wxArchiveEntry being
fully populated when GetNextEntry() returns, with the following exceptions:

@li wxArchiveEntry::GetSize(): Guaranteed to be available after the entry has
    been read to wxInputStream::Eof(), or wxArchiveInputStream::CloseEntry()
    has been called.
@li wxArchiveEntry::IsReadOnly(): Guaranteed to be available after the end of
    the archive has been reached, i.e. after GetNextEntry() returns @NULL and
    Eof() is @true.

This mechanism allows wxArchiveOutputStream::CopyEntry() to always fully
preserve entries' meta-data. No matter in what order the meta-data occurs
within the archive, the input stream will always have read it before the output
stream must write it.

@subsection overview_archive_noseek_notifier wxArchiveNotifier

Notifier objects can be used to get a notification whenever an input stream
updates a wxArchiveEntry object's data via the weak reference mechanism.

Consider the following code which renames an entry in an archive. This is the
usual way to modify an entry's meta-data: simply set the required field before
writing it with wxArchiveOutputStream::CopyEntry():

@code
std::unique_ptr<wxArchiveInputStream> arc(factory->NewStream(in));
std::unique_ptr<wxArchiveOutputStream> outarc(factory->NewStream(out));
std::unique_ptr<wxArchiveEntry> entry;

outarc->CopyArchiveMetaData(*arc);

while (entry.reset(arc->GetNextEntry()), entry.get() != NULL)
{
    if (entry->GetName() == from)
        entry->SetName(to);
    if (!outarc->CopyEntry(entry.release(), *arc))
        break;
}

bool success = arc->Eof() && outarc->Close();
@endcode

However, for non-seekable streams, this technique cannot be used for fields
such as wxArchiveEntry::IsReadOnly(), which are not necessarily set when
wxArchiveInputStream::GetNextEntry() returns.

In this case a wxArchiveNotifier can be used:

@code
class MyNotifier : public wxArchiveNotifier
{
public:
    void OnEntryUpdated(wxArchiveEntry& entry) { entry.SetIsReadOnly(false); }
};
@endcode

The meta-data changes are made in your notifier's
wxArchiveNotifier::OnEntryUpdated() method, and wxArchiveEntry::SetNotifier()
is called before CopyEntry():

@code
std::unique_ptr<wxArchiveInputStream> arc(factory->NewStream(in));
std::unique_ptr<wxArchiveOutputStream> outarc(factory->NewStream(out));
std::unique_ptr<wxArchiveEntry> entry;
MyNotifier notifier;

outarc->CopyArchiveMetaData(*arc);

while (entry.reset(arc->GetNextEntry()), entry.get() != NULL)
{
    entry->SetNotifier(notifier);
    if (!outarc->CopyEntry(entry.release(), *arc))
        break;
}

bool success = arc->Eof() && outarc->Close();
@endcode

SetNotifier() calls OnEntryUpdated() immediately, then the input stream calls
it again whenever it sets more fields in the entry. Since OnEntryUpdated() will
be called at least once, this technique always works even when it is not
strictly necessary to use it. For example, changing the entry name can be done
this way too, and it works on seekable streams as well as non-seekable ones.
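
The calling sequence described above can be modelled in standard C++. The
ModelNotifier, NotifiedEntry and ClearReadOnly types are hypothetical and only
mirror the behaviour stated in this section (one immediate call from
SetNotifier(), then one call per later update); they are not the wxWidgets
classes:

```cpp
struct NotifiedEntry;

// Hypothetical notifier interface modelled on the description above.
struct ModelNotifier
{
    virtual ~ModelNotifier() {}
    virtual void OnEntryUpdated(NotifiedEntry& entry) = 0;
};

// Hypothetical entry that forwards updates to its notifier.
struct NotifiedEntry
{
    bool readOnly = true;
    ModelNotifier* notifier = nullptr;

    // Registers the notifier and calls it straight away.
    void SetNotifier(ModelNotifier& n)
    {
        notifier = &n;
        n.OnEntryUpdated(*this);
    }

    // What an input stream would do when late meta-data arrives.
    void UpdateFromStream(bool isReadOnly)
    {
        readOnly = isReadOnly;
        if (notifier)
            notifier->OnEntryUpdated(*this);
    }
};

// Clears the read-only flag every time the entry is updated.
struct ClearReadOnly : ModelNotifier
{
    int calls = 0;

    void OnEntryUpdated(NotifiedEntry& entry) override
    {
        ++calls;
        entry.readOnly = false;
    }
};
```

Because the change is reapplied on every update, the flag stays cleared even
when the stream later overwrites it with the value stored in the archive.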

*/
