1<!doctype debiandoc system>
2<!-- -*- mode: sgml; mode: fold -*- -->
3<book>
4<title>APT Cache File Format</title>
5
6<author>Jason Gunthorpe <email>jgg@debian.org</email></author>
7<version>$Id: cache.sgml,v 1.1 1998/07/02 02:58:12 jgg Exp $</version>
8
9<abstract>
This document describes the complete implementation and format of the APT
Cache file. The APT Cache file is a way for APT to parse and store a
large number of package files for display in the UI. Its primary design
goal is to make display of a single package in the tree very fast by
pre-linking important things like dependencies and provides.
15
16The specification doubles as documentation for one of the in-memory
17structures used by the package library and the APT GUI.
18
19</abstract>
20
21<copyright>
22Copyright &copy; Jason Gunthorpe, 1997.
23<p>
24APT and this document are free software; you can redistribute them and/or
25modify them under the terms of the GNU General Public License as published
26by the Free Software Foundation; either version 2 of the License, or (at your
27option) any later version.
28
29<p>
30For more details, on Debian GNU/Linux systems, see the file
31/usr/doc/copyright/GPL for the full license.
32</copyright>
33
34<toc sect>
35
36<chapt>Introduction
37<!-- Purpose {{{ -->
38<!-- ===================================================================== -->
39<sect>Purpose
40
41<p>
This document describes the implementation of an architecture
dependent binary cache file. The goal of this cache file is twofold:
first, to speed loading and processing of the package file array, and
second, to reduce memory consumption of the package file array.
46
47<p>
The implementation is aimed at an environment with many primary package
files, for instance someone who has a Package file for their CD-ROM, a
Package file for the latest version of the distribution on the CD-ROM and a
Package file for the development version. Always present is the information
contained in the status file, which might be considered a separate package
file.
54
55<p>
Please understand, this is designed as a -CACHE FILE-; it is not meant to be
used on any system other than the one it was created for. It is not meant to
be authoritative either, i.e. if a system crash or software failure occurs it
must be perfectly acceptable for the cache file to be in an inconsistent
state. Furthermore, at any time the cache file may be erased without losing
any information.
62
63<p>
Also, the structures and storage layout are optimized for use by the APT
GUI and may not be suitable for all purposes. However, it should be possible
to extend it with associated cache files that contain other information.
67
68<p>
To keep memory use down the cache file only contains often used fields and
fields that are inexpensive to store; the Package file has a full list of
fields. Also, the client may assume that all items are perfectly valid and
need not perform checks against their correctness. Removal of information
from the cache is possible, but blanks will be left in the file, and
unused strings will also be present. The recommended implementation is to
simply rebuild the cache each time any of the data files change. It is
possible to add a new package file to the cache without any negative side
effects.
78
79<sect1>Note on Pointer access
80<p>
Every item in every structure is stored as the index to that structure.
What this means is that once the file is mmaped every data access has to
go through a fixup stage to get a real memory pointer. This is done
by taking the index, multiplying it by the type size and then adding
it to the start address of the memory block. This sounds complex, but
in C it is a single array dereference. Because all items are aligned to
their size and indexes are stored as multiples of the size of the structure
the format is immediately portable to all possible architectures - BUT the
generated files are -NOT-.
90
91<p>
92This scheme allows code like this to be written:
93<example>
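   void *Map = mmap(...);
   // Three typed views of the same mapped block
   Package *PkgList = (Package *)Map;
   Header *Head = (Header *)Map;
   char *Strings = (char *)Map;
   // PkgList[...] yields a Package object, so its member is reached with '.'
   cout << (Strings + PkgList[Head->HashTable[0]].Name) << endl;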
99</example>
100<p>
101Notice the lack of casting or multiplication. The net result is to return
102the name of the first package in the first hash bucket, without error
103checks.
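<p>
The fixup stage can be sketched in isolation as follows. This is only an
illustration under stated assumptions: MiniPackage, Fixup, BuildDemoMap and
PackageName are hypothetical names that do not appear in APT, the record is
cut down to two fields, and a std::vector stands in for the mmaped file.

```cpp
#include <cassert>
#include <cstring>
#include <vector>

// Hypothetical, cut-down record; the real Package has many more fields.
struct MiniPackage
{
   unsigned long Name;        // byte offset of a null terminated string
   unsigned long NextPackage; // index of the next record, 0 = end of list
};

// Fixup: turn a stored index into a real pointer.
// Pointer arithmetic on T* performs the "index * sizeof(T)" step.
template <typename T>
T *Fixup(void *Base, unsigned long Index)
{
   return static_cast<T *>(Base) + Index;
}

// Build a tiny stand-in for the mapped file: slot 0 is reserved,
// slot 1 holds a package, and the bytes of slot 2 hold its name.
std::vector<MiniPackage> BuildDemoMap()
{
   std::vector<MiniPackage> Map(4); // value-initialized, so all zero
   const char Name[] = "apt";
   unsigned long NameOffset = 2 * sizeof(MiniPackage);
   std::memcpy(reinterpret_cast<char *>(Map.data()) + NameOffset,
               Name, sizeof(Name));
   Map[1].Name = NameOffset;
   Map[1].NextPackage = 0;
   return Map;
}

// Resolve a package index to its name; strings use plain byte offsets.
const char *PackageName(void *Base, unsigned long Index)
{
   MiniPackage *Pkg = Fixup<MiniPackage>(Base, Index);
   return static_cast<char *>(Base) + Pkg->Name;
}
```

<p>
Calling PackageName(Map.data(), 1) on the vector returned by BuildDemoMap
walks the same two steps as the example above: one array dereference for the
structure, then one byte offset for the string.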
104
105<p>
The generator uses allocation pools to group similarly sized structures in
large blocks to eliminate any alignment overhead. The generator also
ensures that no structures overlap and all indexes are unique. Although
at first glance it may seem like there is the potential for two structures
to exist at the same point, the generator never allows this to happen.
(See the discussion of free space pools.)
112 <!-- }}} -->
113
114<chapt>Structures
115<!-- Header {{{ -->
116<!-- ===================================================================== -->
117<sect>Header
118<p>
119This is the first item in the file.
120<example>
121 struct Header
122 {
123 // Signature information
124 unsigned long Signature;
125 short MajorVersion;
126 short MinorVersion;
127 bool Dirty;
128
129 // Size of structure values
130 unsigned short HeaderSz;
131 unsigned short PackageSz;
132 unsigned short PackageFileSz;
133 unsigned short VersionSz;
134 unsigned short DependencySz;
135 unsigned short ProvidesSz;
136
137 // Structure counts
138 unsigned long PackageCount;
139 unsigned long VersionCount;
140 unsigned long DependsCount;
141 unsigned long PackageFileCount;
142
143 // Offsets
144 unsigned long FileList; // PackageFile
145 unsigned long StringList; // StringItem
146
147 // Pool structures
148 unsigned long PoolStart[6];
149 unsigned long PoolSize[6];
150 unsigned long PoolAln[6];
151
152 // Package name lookup
153 unsigned long HashTable[512]; // Package
154 };
155</example>
156<taglist>
157<tag>Signature<item>
This must contain the hex value 0x98FE76DC, which is designed to verify
that the system loading the image has the same byte order and byte size as
the system saving the image.
161
162<tag>MajorVersion
163<tag>MinorVersion<item>
164These contain the version of the cache file, currently 0.2.
165
166<tag>Dirty<item>
Dirty is true if the cache file was opened by a client that expected to
write to it and has not fully synced it. The file should be erased and
rebuilt if it is true.
170
171<tag>HeaderSz
172<tag>PackageSz
173<tag>PackageFileSz
174<tag>VersionSz
175<tag>DependencySz
176<tag>ProvidesSz<item>
*Sz contains the sizeof() of that particular structure. It is used as an
extra consistency check on the structure of the file.

If any of the size values do not exactly match what the client expects then
the client should refuse to load the file.
182
183<tag>PackageCount
184<tag>VersionCount
185<tag>DependsCount
186<tag>PackageFileCount<item>
These indicate the number of each structure contained in the cache.
PackageCount is especially useful for generating user state structures.
See Package::ID for more info.
190
191<tag>FileList<item>
This contains the index of the first PackageFile structure. The PackageFile
structures are singly linked lists that represent all package files that
have been merged into the cache.
195
196<tag>StringList<item>
197This contains a list of all the unique strings (string item type strings) in
198the cache. The parser reads this list into memory so it can match strings
199against it.
200
201<tag>PoolStart
202<tag>PoolSize
203<tag>PoolAln<item>
204The Pool structures manage the allocation pools that the generator uses.
205Start indicates the first byte of the pool, Size is the number of bytes
206remaining in the pool and Aln (alignment) is the structure size of the pool.
207An Aln of 0 indicates the slot is empty. There should be the same number of
208slots as there are structure types. The generator stores this information
209so future additions can make use of any unused pool blocks.
210
211<tag>HashTable<item>
HashTable is a hash table that provides indexing for all of the packages.
Each package name is inserted into the hash table using the following hash
function:
215<example>
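   unsigned long Hash(const string &Str)
   {
      unsigned long Hash = 0;
      // Weight each character by its distance from the end of the string
      for (string::const_iterator I = Str.begin(); I != Str.end(); I++)
         Hash += *I * ((Str.end() - I + 1));
      // _count() is the number of elements in the array
      return Hash % _count(Head.HashTable);
   }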
223</example>
224<p>
By iterating over each entry in the hash table it is possible to iterate over
the entire list of packages. Hash collisions are handled with a singly linked
list of packages based at the hash item. The linked list contains only
packages that match the hashing function.
229
230</taglist>
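<p>
The checks a client might run before trusting a cache file, as described for
Signature, the version fields, Dirty and the *Sz fields, can be sketched like
this. MiniHeader, HeaderUsable and MakeValidHeader are hypothetical names and
the header is cut down to the fields being checked; a real client would
compare every *Sz field against its own sizeof() values.

```cpp
#include <cassert>

// Hypothetical, cut-down header holding only the fields checked below.
struct MiniHeader
{
   unsigned long Signature;
   short MajorVersion;
   short MinorVersion;
   bool Dirty;
   unsigned short HeaderSz;
};

// Signature value and version number as given in the text.
const unsigned long ExpectedSignature = 0x98FE76DC;
const short ExpectedMajor = 0;
const short ExpectedMinor = 2;

// Returns true only if the file may be used as-is.
bool HeaderUsable(const MiniHeader &Head)
{
   if (Head.Signature != ExpectedSignature)
      return false; // different byte order or word size
   if (Head.MajorVersion != ExpectedMajor ||
       Head.MinorVersion != ExpectedMinor)
      return false; // different format version
   if (Head.Dirty)
      return false; // not fully synced; erase and rebuild
   if (Head.HeaderSz != sizeof(MiniHeader))
      return false; // structure layout differs from this client
   return true;
}

// Helper that fills in a header which passes every check.
MiniHeader MakeValidHeader()
{
   MiniHeader Head = MiniHeader();
   Head.Signature = ExpectedSignature;
   Head.MajorVersion = ExpectedMajor;
   Head.MinorVersion = ExpectedMinor;
   Head.Dirty = false;
   Head.HeaderSz = static_cast<unsigned short>(sizeof(MiniHeader));
   return Head;
}
```

<p>
Since the cache is never authoritative, a failed check needs no recovery
logic; the client can simply discard the file and regenerate it.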
231 <!-- }}} -->
232<!-- Package {{{ -->
233<!-- ===================================================================== -->
234<sect>Package
235<p>
This contains information for a single unique package. There can be any
number of versions of a given package. Package exists in a singly
linked list of package records starting at the hash index of the name in
the Header->HashTable.
240<example>
 struct Package
242 {
243 // Pointers
244 unsigned long Name; // Stringtable
245 unsigned long VersionList; // Version
246 unsigned long TargetVer; // Version
247 unsigned long CurrentVer; // Version
248 unsigned long TargetDist; // StringTable (StringItem)
249 unsigned long Section; // StringTable (StringItem)
250
251 // Linked lists
252 unsigned long NextPackage; // Package
253 unsigned long RevDepends; // Dependency
254 unsigned long ProvidesList; // Provides
255
256 // Install/Remove/Purge etc
257 unsigned char SelectedState; // What
258 unsigned char InstState; // Flags
259 unsigned char CurrentState; // State
260
261 // Unique ID for this pkg
262 unsigned short ID;
263 unsigned short Flags;
264 };
265</example>
266
267<taglist>
268<tag>Name<item>
269Name of the package.
270
271<tag>VersionList<item>
Base of a singly linked list of version structures. Each structure
represents a unique version of the package. The version structures
contain links into PackageFile and the original text file as well as
detailed information about the size and dependencies of the specific
package. In this way multiple versions of a package can be cleanly handled
by the system. Furthermore, this linked list is guaranteed to be sorted
from highest version to lowest version with no duplicate entries.
279
280<tag>TargetVer
281<tag>CurrentVer<item>
TargetVer is an index (pointer) to the version that is being targeted for
upgrading. CurrentVer is an index to the installed version. Either can be
0.
285
286<tag>TargetDist<item>
287This indicates the target distribution. Automatic upgrades should not go
288outside of the specified dist. If it is 0 then the global target dist should
289be used. The string should be contained in the StringItem list.
290
291<tag>Section<item>
292This indicates the deduced section. It should be "Unknown" or the section
293of the last parsed item.
294
295<tag>NextPackage<item>
296Next link in this hash item. This linked list is based at Header.HashTable
297and contains only packages with the same hash value.
298
299<tag>RevDepends<item>
300Reverse Depends is a linked list of all dependencies linked to this package.
301
302<tag>ProvidesList<item>
303This is a linked list of all provides for this package name.
304
305<tag>SelectedState
306<tag>InstState
307<tag>CurrentState<item>
These correspond to the 3 items in the Status field found in the status
file. See the section on defines for the possible values.
310<p>
311SelectedState is the state that the user wishes the package to be
312in.
313<p>
314InstState is the installation state of the package. This normally
315should be Ok, but if the installation had an accident it may be otherwise.
316<p>
317CurrentState indicates if the package is installed, partially installed or
318not installed.
319
320<tag>ID<item>
321ID is a value from 0 to Header->PackageCount. It is a unique value assigned
322by the generator. This allows clients to create an array of size PackageCount
323and use it to store state information for the package map. For instance the
324status file emitter uses this to track which packages have been emitted
325already.
326
327<tag>Flags<item>
Flags are some useful indicators of the package's state.
329
330</taglist>
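<p>
Looking a package up by name combines Header.HashTable with the NextPackage
chain. The sketch below is a simplified model, not the library's code:
MiniPackage, MiniCache, AddPackage and FindPackage are hypothetical names,
the name is held as a std::string rather than a string offset, and a
std::vector replaces the mapped file. Index 0 is treated as "no package",
matching the use of 0 as a null index elsewhere in this document.

```cpp
#include <cassert>
#include <string>
#include <vector>

const unsigned long HashSize = 512; // size of Header.HashTable

struct MiniPackage
{
   std::string Name;          // the real structure stores a string offset
   unsigned long NextPackage; // next package in the same bucket, 0 = end
};

struct MiniCache
{
   unsigned long HashTable[HashSize]; // first package of each bucket
   std::vector<MiniPackage> Packages; // slot 0 is reserved
   MiniCache() : HashTable(), Packages(1) {}
};

// The hash function from the Header section.
unsigned long Hash(const std::string &Str)
{
   unsigned long Result = 0;
   for (std::string::const_iterator I = Str.begin(); I != Str.end(); ++I)
      Result += *I * ((Str.end() - I + 1));
   return Result % HashSize;
}

// Insert at the head of the bucket's list and return the new index.
unsigned long AddPackage(MiniCache &Cache, const std::string &Name)
{
   unsigned long Bucket = Hash(Name);
   MiniPackage Pkg;
   Pkg.Name = Name;
   Pkg.NextPackage = Cache.HashTable[Bucket];
   Cache.Packages.push_back(Pkg);
   Cache.HashTable[Bucket] = Cache.Packages.size() - 1;
   return Cache.HashTable[Bucket];
}

// Walk the bucket's chain; returns 0 if the name is not present.
unsigned long FindPackage(const MiniCache &Cache, const std::string &Name)
{
   for (unsigned long I = Cache.HashTable[Hash(Name)]; I != 0;
        I = Cache.Packages[I].NextPackage)
      if (Cache.Packages[I].Name == Name)
         return I;
   return 0;
}
```

<p>
Because a chain only ever holds packages with the same hash value, the name
comparison inside FindPackage is what resolves collisions.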
331
332 <!-- }}} -->
333<!-- PackageFile {{{ -->
334<!-- ===================================================================== -->
335<sect>PackageFile
336<p>
This contains information for a single package file. Package files are
referenced by Version structures. This is a singly linked list based from
Header.FileList.
340<example>
341 struct PackageFile
342 {
343 // Names
344 unsigned long FileName; // Stringtable
345 unsigned long Version; // Stringtable
346 unsigned long Distribution; // Stringtable
347 unsigned long Size;
348
349 // Linked list
350 unsigned long NextFile; // PackageFile
351 unsigned short ID;
352 unsigned short Flags;
353 time_t mtime; // Modification time
354 };
355</example>
356<taglist>
357
358<tag>FileName<item>
Refers to the physical disk file that this PackageFile represents.
360
361<tag>Version<item>
Version is the given version, e.g. 1.3.1, 2.4_revision_1 etc.
363
364<tag>Distribution<item>
Distribution is the symbolic name for this PackageFile: hamm, bo, rexx etc.
366
367<tag>Size<item>
368Size is provided as a simple check to ensure that the package file has not
369been altered.
370
371<tag>ID<item>
372See Package::ID.
373
374<tag>Flags<item>
375Provides some flags for the PackageFile, see the section on defines.
376
377<tag>mtime<item>
378Modification time for the file at time of cache generation.
379
380</taglist>
381
382 <!-- }}} -->
383<!-- Version {{{ -->
384<!-- ===================================================================== -->
385<sect>Version
386<p>
This contains the information for a single version of a package. This is a
singly linked list based from Package.VersionList.
389
390<p>
391The version list is always sorted from highest version to lowest version by
392the generator. Also there may not be any duplicate entries in the list (same
393VerStr).
394
395<example>
396 struct Version
397 {
398 unsigned long VerStr; // Stringtable
399 unsigned long File; // PackageFile
400 unsigned long Section; // StringTable (StringItem)
401
402 // Lists
403 unsigned long NextVer; // Version
404 unsigned long DependsList; // Dependency
405 unsigned long ParentPkg; // Package
406 unsigned long ProvidesList; // Provides
407
408 unsigned long Offset;
409 unsigned long Size;
410 unsigned long InstalledSize;
411 unsigned short ID;
412 unsigned char Priority;
413 };
414</example>
415<taglist>
416
417<tag>VerStr<item>
418This is the complete version string.
419
420<tag>File<item>
References the PackageFile that this version came out of. File can be used
to determine what distribution the Version applies to. If File is 0 then
this is a blank version. The structure should also have a 0 in all other
fields excluding VerStr and possibly NextVer.
425
426<tag>Section<item>
427This string indicates which section it is part of. The string should be
428contained in the StringItem list.
429
430<tag>NextVer<item>
431Next step in the linked list.
432
433<tag>DependsList<item>
434This is the base of the dependency list.
435
436<tag>ParentPkg<item>
437This links the version to the owning package, allowing reverse dependencies
438to determine the package.
439
440<tag>ProvidesList<item>
441Head of the linked list of Provides::NextPkgProv, forward provides.
442
443<tag>Offset<item>
444The byte offset of the first line of this item in the specified
445PackageFile
446
447<tag>Size
448<tag>InstalledSize<item>
The archive size for this version. For Debian this is the size of the .deb
file. InstalledSize is the uncompressed size for this version.
451
452<tag>ID<item>
453See Package::ID.
454
455<tag>Priority<item>
456This is the parsed priority value of the package.
457</taglist>
458
459 <!-- }}} -->
460<!-- Dependency {{{ -->
461<!-- ===================================================================== -->
462<sect>Dependency
463<p>
Dependency contains the information for a single dependency record. The records
are split up like this to ease processing by the client. The base of the
linked list is Version.DependsList. All forms of dependencies are recorded
here including Conflicts, Suggests and Recommends.
468
469<p>
470Multiple depends on the same package must be grouped together in
471the Dependency lists. Clients should assume this is always true.
472
473<example>
474 struct Dependency
475 {
476 unsigned long Version; // Stringtable
477 unsigned long Package; // Package
478 unsigned long NextDepends; // Dependency
479 unsigned long NextRevDepends; // Reverse dependency linking
480 unsigned long ParentVer; // Upwards parent version link
481
482 // Specific types of depends
483 unsigned char Type;
484 unsigned char CompareOp;
485 unsigned short ID;
486 };
487</example>
488<taglist>
489<tag>Version<item>
490The string form of the version that the dependency is applied against.
491
492<tag>Package<item>
The index of the package this dependency applies to. If the package
does not already exist when the dependency is inserted, a blank one (no
version records) should be created.
496
497<tag>NextDepends<item>
498Linked list based off a Version structure of all the dependencies in that
499version.
500
501<tag>NextRevDepends<item>
502Reverse dependency linking, based off a Package structure. This linked list
503is a list of all packages that have a depends line for a given package.
504
505<tag>ParentVer<item>
Parent version linking allows the reverse dependency list to link
back to the version and package that the dependency is for.
508
509<tag>Type<item>
Describes whether it is depends, predepends, recommends, suggests, etc.
511
512<tag>CompareOp<item>
513Describes the comparison operator specified on the depends line. If the high
514bit is set then it is a logical or with the previous record.
515
516<tag>ID<item>
517See Package::ID.
518
519</taglist>
520
521 <!-- }}} -->
522<!-- Provides {{{ -->
523<!-- ===================================================================== -->
524<sect>Provides
525<p>
Provides handles virtual packages. When a Provides: line is encountered
a new provides record is added associating the package with a virtual
package name. The provides structures are linked off the package structures.
This simplifies the analysis of dependencies and other aspects. A provides
refers to a specific version of a specific package; not all versions need to
provide that provides.
532
533<p>
534There is a linked list of provided package names started from each
535version that provides packages. This is the forwards provides mechanism.
536<example>
537 struct Provides
538 {
539 unsigned long ParentPkg; // Package
540 unsigned long Version; // Version
541 unsigned long ProvideVersion; // Stringtable
542 unsigned long NextProvides; // Provides
543 unsigned long NextPkgProv; // Provides
544 };
545</example>
546<taglist>
547<tag>ParentPkg<item>
The index of the package in which the head of this linked list resides.
ParentPkg->Name is the name of the provides.
550
551<tag>Version<item>
552The index of the version this provide line applies to.
553
554<tag>ProvideVersion<item>
555Each provides can specify a version in the provides line. This version allows
556dependencies to depend on specific versions of a Provides, as well as allowing
557Provides to override existing packages. This is experimental.
558
559<tag>NextProvides<item>
560Next link in the singly linked list of provides (based off package)
561
562<tag>NextPkgProv<item>
563Next link in the singly linked list of provides for 'Version'.
564
565</taglist>
566
567 <!-- }}} -->
568<!-- StringItem {{{ -->
569<!-- ===================================================================== -->
570<sect>StringItem
571<p>
StringItem is used for generating single instances of strings. Some things
like Section Name are useful to have as unique tags. It is part of
a linked list based at Header::StringList.
575<example>
576 struct StringItem
577 {
578 unsigned long String; // Stringtable
579 unsigned long NextItem; // StringItem
580 };
581</example>
582<taglist>
583<tag>String<item>
584The string this refers to.
585
586<tag>NextItem<item>
587Next link in the chain.
588</taglist>
589 <!-- }}} -->
590<!-- StringTable {{{ -->
591<!-- ===================================================================== -->
592<sect>StringTable
593<p>
All strings are simply inlined any place in the file that is natural for the
writer. The client should make no assumptions about the positioning of
strings. All stringtable values are byte offsets from the start of the
file at which a null terminated string begins.
598 <!-- }}} -->
599<!-- Defines {{{ -->
600<!-- ===================================================================== -->
601<sect>Defines
602<p>
603Several structures use variables to indicate things. Here is a list of all
604of them.
605
606<sect1>Definitions for Dependency::Type
607<p>
608<example>
609#define pkgDEP_Depends 1
610#define pkgDEP_PreDepends 2
611#define pkgDEP_Suggests 3
612#define pkgDEP_Recommends 4
613#define pkgDEP_Conflicts 5
614#define pkgDEP_Replaces 6
615</example>
616</sect1>
617
618<sect1>Definitions for Dependency::CompareOp
619<p>
620<example>
621#define pkgOP_OR 0x10
622#define pkgOP_LESSEQ 0x1
623#define pkgOP_GREATEREQ 0x2
624#define pkgOP_LESS 0x3
625#define pkgOP_GREATER 0x4
626#define pkgOP_EQUALS 0x5
627</example>
628The lower 4 bits are used to indicate what operator is being specified and
629the upper 4 bits are flags. pkgOP_OR indicates that the next package is
630or'd with the current package.
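<p>
Splitting a CompareOp byte into its operator and flag parts might look like
the sketch below. The pkgOP_* values are the ones listed above;
CompareOperator, HasOrFlag and OperatorName are hypothetical helper names,
and the returned operator names are chosen only for illustration.

```cpp
#include <cassert>
#include <string>

#define pkgOP_OR        0x10
#define pkgOP_LESSEQ    0x1
#define pkgOP_GREATEREQ 0x2
#define pkgOP_LESS      0x3
#define pkgOP_GREATER   0x4
#define pkgOP_EQUALS    0x5

// The lower 4 bits select the comparison operator.
unsigned char CompareOperator(unsigned char CompareOp)
{
   return CompareOp & 0x0F;
}

// The upper 4 bits are flags; pkgOP_OR is the only one listed.
bool HasOrFlag(unsigned char CompareOp)
{
   return (CompareOp & pkgOP_OR) != 0;
}

// Illustrative names only; an unlisted value yields "None".
std::string OperatorName(unsigned char CompareOp)
{
   switch (CompareOperator(CompareOp))
   {
      case pkgOP_LESSEQ:    return "LessEq";
      case pkgOP_GREATEREQ: return "GreaterEq";
      case pkgOP_LESS:      return "Less";
      case pkgOP_GREATER:   return "Greater";
      case pkgOP_EQUALS:    return "Equals";
      default:              return "None";
   }
}
```

<p>
Masking before the switch keeps the flag bits from hiding the operator, so a
record carrying pkgOP_OR | pkgOP_GREATEREQ still decodes as GreaterEq.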
631</sect1>
632
633<sect1>Definitions for Package::SelectedState
634<p>
635<example>
#define pkgSTATE_Unknown 0
637#define pkgSTATE_Install 1
638#define pkgSTATE_Hold 2
639#define pkgSTATE_DeInstall 3
640#define pkgSTATE_Purge 4
641</example>
642</sect1>
643
644<sect1>Definitions for Package::InstState
645<p>
646<example>
647#define pkgSTATE_Ok 0
648#define pkgSTATE_ReInstReq 1
649#define pkgSTATE_Hold 2
650#define pkgSTATE_HoldReInstReq 3
651</example>
652</sect1>
653
654<sect1>Definitions for Package::CurrentState
655<p>
656<example>
657#define pkgSTATE_NotInstalled 0
658#define pkgSTATE_UnPacked 1
659#define pkgSTATE_HalfConfigured 2
660#define pkgSTATE_UnInstalled 3
661#define pkgSTATE_HalfInstalled 4
662#define pkgSTATE_ConfigFiles 5
663#define pkgSTATE_Installed 6
664</example>
665</sect1>
666
667<sect1>Definitions for Package::Flags
668<p>
669<example>
670#define pkgFLAG_Auto (1 << 0)
671#define pkgFLAG_New (1 << 1)
672#define pkgFLAG_Obsolete (1 << 2)
673#define pkgFLAG_Essential (1 << 3)
674#define pkgFLAG_ImmediateConf (1 << 4)
675</example>
676</sect1>
677
678<sect1>Definitions for Version::Priority
679<p>
680Zero is used for unparsable or absent Priority fields.
681<example>
682#define pkgPRIO_Important 1
683#define pkgPRIO_Required 2
684#define pkgPRIO_Standard 3
685#define pkgPRIO_Optional 4
686#define pkgPRIO_Extra 5
687</example>
688</sect1>
689
690<sect1>Definitions for PackageFile::Flags
691<p>
692<example>
693#define pkgFLAG_NotSource (1 << 0)
694</example>
695</sect1>
696
697 <!-- }}} -->
698
699<chapt>Notes on the Generator
700<!-- Notes on the Generator {{{ -->
701<!-- ===================================================================== -->
702<p>
703The pkgCache::MergePackageFile function is currently the only generator of
704the cache file. It implements a conversion from the normal textual package
705file into the cache file.
706
707<p>
The generator assumes any package declaration with a
Status: line is a 'Status of the package' type of package declaration.
A Package with a Target-Version field should also really have a status field.
The processing of a Target-Version field can create a place-holder Version
structure that is empty to refer to the specified version (see Version
for info on what an empty Version looks like). The Target-Version syntax
allows the specification of a specific version and a target distribution.
715
716<p>
Different section names on different versions are supported, but I
do not expect to use them. To simplify the GUI it will merely use the section
in the Package structure. This should be okay as I hope sections do not change
much.
721
722<p>
723The generator goes through a number of post processing steps after producing
724a disk file. It sorts all of the version lists to be in descending order
725and then generates the reverse dependency lists for all of the packages.
726ID numbers and count values are also generated in the post processing step.
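<p>
The reverse dependency step can be pictured with the following sketch. It is
not the generator's actual code: MiniPackage, MiniDependency, BuildRevDepends
and RevDependsOf are hypothetical names, only the fields involved in the
linking are kept, and vectors indexed from 1 stand in for the cache file
(index 0 means "none").

```cpp
#include <cassert>
#include <vector>

struct MiniPackage
{
   unsigned long RevDepends; // head of the reverse dependency list
};

struct MiniDependency
{
   unsigned long Package;        // package this dependency points at
   unsigned long NextRevDepends; // next entry in that package's list
};

// Thread every dependency onto the RevDepends list of its target package.
void BuildRevDepends(std::vector<MiniPackage> &Pkgs,
                     std::vector<MiniDependency> &Deps)
{
   for (unsigned long P = 0; P < Pkgs.size(); ++P)
      Pkgs[P].RevDepends = 0;
   for (unsigned long D = 1; D < Deps.size(); ++D)
   {
      MiniPackage &Target = Pkgs[Deps[D].Package];
      Deps[D].NextRevDepends = Target.RevDepends; // push onto the front
      Target.RevDepends = D;
   }
}

// Collect the dependency indexes that refer to package Pkg.
std::vector<unsigned long> RevDependsOf(const std::vector<MiniPackage> &Pkgs,
                                        const std::vector<MiniDependency> &Deps,
                                        unsigned long Pkg)
{
   std::vector<unsigned long> Result;
   for (unsigned long D = Pkgs[Pkg].RevDepends; D != 0;
        D = Deps[D].NextRevDepends)
      Result.push_back(D);
   return Result;
}
```

<p>
Pushing onto the front of each list keeps the pass linear in the number of
dependency records, at the cost of returning them in reverse insertion order.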
727
728<p>
729It is possible to extend many of the structures in the cache with extra data.
730This is done by using the ID member. ID will be a unique number from 0 to
731Header->??Count. For example
732<example>
struct MyPkgData {int Item;};   // must be a complete type for new[]
MyPkgData *Data = new MyPkgData[Header->PackageCount];
Data[Package->ID].Item = 0;     // Data[...] is an object, not a pointer
736</example>
737This provides a one way reference between package structures and user data. To
738get a two way reference would require a member inside the MyPkgData structure.
739
740<p>
The generator's use of free space pools tends to make the cache file quite
large, and quite full of blank space. This could be fixed with sparse files.
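<p>
Why the pools leave blank space can be seen from a sketch of pool allocation.
This is a guess at the mechanism based on the PoolStart, PoolSize and PoolAln
descriptions in the Header section, not the generator's code: Pool and
Allocate are hypothetical names, Start is assumed to advance as items are
handed out, and a new block of BlockItems structures is assumed to be
reserved at the end of the file whenever a pool runs dry.

```cpp
#include <cassert>

struct Pool
{
   unsigned long Start; // byte offset of the next free item
   unsigned long Size;  // bytes remaining in the pool
   unsigned long Aln;   // structure size; 0 would mean an empty slot
};

// Hand out one structure and return its index (byte offset / Aln).
// FileEnd is the current length of the file and grows with each new block.
unsigned long Allocate(Pool &P, unsigned long &FileEnd,
                       unsigned long BlockItems)
{
   assert(P.Aln != 0 && BlockItems != 0);
   if (P.Size < P.Aln)
   {
      // Round the end of the file up to a multiple of the structure size
      // so that the resulting byte offset divides evenly into an index.
      unsigned long Aligned = (FileEnd + P.Aln - 1) / P.Aln * P.Aln;
      P.Start = Aligned;
      P.Size = BlockItems * P.Aln;
      FileEnd = Aligned + P.Size; // the whole block is reserved up front
   }
   unsigned long Offset = P.Start;
   P.Start += P.Aln;
   P.Size -= P.Aln;
   return Offset / P.Aln;
}
```

<p>
Under these assumptions each pool reserves a full block even when only one
structure is ever stored in it, and the unused tail of the block is the blank
space mentioned above.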
743
744 <!-- }}} -->
745
746<chapt>Future Directions
747<!-- Future Directions {{{ -->
748<!-- ===================================================================== -->
749<p>
One good direction to take the cache file is into a cache directory that
contains many associated caches that cache other important bits of
information. (/var/cache/apt, FHS2)
753
754<p>
Caching of the info/*.list is an excellent place to start. By generating all
the list files into a tree structure and reverse linking them to the package
structures in the main cache file, major speed gains in dpkg might be achieved.
758
759 <!-- }}} -->
760
761</book>