1 <!doctype debiandoc system>
2 <!-- -*- mode: sgml; mode: fold -*- -->
4 <title>APT Cache File Format</title>
6 <author>Jason Gunthorpe <email>jgg@debian.org</email></author>
7 <version>$Id: cache.sgml,v 1.7 1999/05/23 22:55:55 jgg Exp $</version>
10 This document describes the complete implementation and format of the APT
11 Cache file. The APT Cache file is a way for APT to parse and store a
12 large number of package files for display in the UI. Its primary design
13 goal is to make display of a single package in the tree very fast by
14 pre-linking important things like dependencies and provides.
16 The specification doubles as documentation for one of the in-memory
17 structures used by the package library and the APT GUI.
22 Copyright © Jason Gunthorpe, 1997-1998.
24 APT and this document are free software; you can redistribute them and/or
25 modify them under the terms of the GNU General Public License as published
26 by the Free Software Foundation; either version 2 of the License, or (at your
27 option) any later version.
30 On Debian GNU/Linux systems, see the file
31 /usr/doc/copyright/GPL for the full license.
38 <!-- ===================================================================== -->
42 This document describes the implementation of an architecture-dependent
43 binary cache file. The goal of this cache file is twofold: firstly, to
44 speed loading and processing of the package file array and, secondly, to
45 reduce memory consumption of the package file array.
48 The implementation is aimed at an environment with many primary package
49 files, for instance someone who has a Package file for their CD-ROM, a
50 Package file for the latest version of the distribution on the CD-ROM and a
51 Package file for the development version. Always present is the information
52 contained in the status file, which might be considered a separate package
56 Please understand, this is designed as a -CACHE FILE-; it is not meant to be
57 used on any system other than the one it was created for. It is not meant to
58 be authoritative either, i.e. if a system crash or software failure occurs it
59 must be perfectly acceptable for the cache file to be in an inconsistent
60 state. Furthermore, at any time the cache file may be erased without losing
64 Also, the structures and storage layout are optimized for use by the APT
65 GUI and may not be suitable for all purposes. However, it should be possible
66 to extend it with associated cache files that contain other information.
69 To keep memory use down, the cache file only contains often-used fields and
70 fields that are inexpensive to store; the Package file has the full list of
71 fields. Also, the client may assume that all items are perfectly valid and
72 need not perform checks against their correctness. Removal of information
73 from the cache is possible, but blanks will be left in the file, and
74 unused strings will also be present. The recommended implementation is to
75 simply rebuild the cache each time any of the data files change. It is
76 possible to add a new package file to the cache without any negative side
79 <sect1>Note on Pointer access
81 Every reference in every structure is stored as an index to the referenced structure.
82 What this means is that once the file is mmapped, every data access has to
83 go through a fixup stage to get a real memory pointer. This is done
84 by taking the index, multiplying it by the type size and then adding
85 it to the start address of the memory block. This sounds complex, but
86 in C it is a single array dereference. Because all items are aligned to
87 their size and indexes are stored as multiples of the size of the structure,
88 the format is immediately portable to all possible architectures, BUT the
89 generated files are -NOT-.
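To make the claim about a single array dereference concrete, here is a minimal C++ sketch (not taken from the APT sources). The two-field Package record and the function names FixupExplicit and FixupArray are hypothetical; the point is only that index * sizeof(type) + base and an ordinary subscript produce the same address.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical, simplified record; the real Package structure has more fields.
struct Package
{
   unsigned long Name;         // byte offset into the string table
   unsigned long NextPackage;  // index of the next Package in the hash chain
};

// The explicit fixup described in the text: index * sizeof(type) + base address.
const Package *FixupExplicit(const void *Map, unsigned long Index)
{
   const char *Base = static_cast<const char *>(Map);
   return reinterpret_cast<const Package *>(Base + Index * sizeof(Package));
}

// The same fixup expressed as a single array dereference.
const Package *FixupArray(const void *Map, unsigned long Index)
{
   const Package *PkgList = static_cast<const Package *>(Map);
   return &PkgList[Index];
}
```

Since the index already counts whole structures, no byte arithmetic appears at the call site; this is also why every structure must be aligned to its own size.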
92 This scheme allows code like this to be written:
94 void *Map = mmap(...);
95 Package *PkgList = (Package *)Map;
96 Header *Head = (Header *)Map;
97 char *Strings = (char *)Map;
98 cout << (Strings + PkgList[Head->HashTable[0]].Name) << endl;
101 Notice the lack of casting or multiplication in the final line. The net result is to return
102 the name of the first package in the first hash bucket, without error
106 The generator uses allocation pools to group similarly sized structures in
107 large blocks to eliminate any alignment overhead. The generator also
108 ensures that no structures overlap and all indexes are unique. Although
109 at first glance it may seem like there is the potential for two structures
110 to exist at the same point, the generator never allows this to happen.
111 (See the discussion of free space pools.)
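The pool scheme can be sketched as follows. This is an illustration under stated assumptions, not the generator's code: a std::vector<char> stands in for the mapped file, the refill size BlockItems and the class name PoolAllocator are hypothetical, and the Pool fields only mirror the Start, Count and ItemSize description given for the Header below. Each refill starts at the current end of the file, rounded up to the pool's ItemSize, so two pools can never hand out overlapping slots.

```cpp
#include <cassert>
#include <vector>

// Hypothetical pool record mirroring the Header's description:
// Start is the next free byte, Count the free slots left, ItemSize the
// structure size (alignment factor). ItemSize must be nonzero before use.
struct Pool
{
   unsigned long Start = 0;
   unsigned long Count = 0;
   unsigned long ItemSize = 0;
};

// Hypothetical allocator; a std::vector<char> stands in for the mapped file.
class PoolAllocator
{
   std::vector<char> &File;
   unsigned long BlockItems;   // slots reserved whenever a pool runs dry

   public:
   PoolAllocator(std::vector<char> &F, unsigned long Block) : File(F), BlockItems(Block) {}

   // Returns an index in units of P.ItemSize, so Base[Index] addresses the slot.
   unsigned long Allocate(Pool &P)
   {
      if (P.Count == 0)
      {
         // Grow the file by a whole block, starting at a multiple of ItemSize
         // so every slot in the block is aligned to its own size.
         unsigned long End = File.size();
         unsigned long Aligned = (End + P.ItemSize - 1) / P.ItemSize * P.ItemSize;
         File.resize(Aligned + BlockItems * P.ItemSize);
         P.Start = Aligned;
         P.Count = BlockItems;
      }
      unsigned long Index = P.Start / P.ItemSize;
      P.Start += P.ItemSize;
      P.Count--;
      return Index;
   }
};
```

Because callers only ever hold indexes, the buffer may be reallocated while it grows without invalidating anything already stored.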
116 <!-- ===================================================================== -->
119 This is the first item in the file.
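Since the header is the first item in the file, a loader can vet it before following any index. The following C++ sketch, built on trimmed-down stand-in structures, checks the Signature, Dirty and *Sz fields described below. Two details are assumptions rather than part of the format: Signature is declared as a 32-bit uint32_t (the listing uses unsigned long) so the byte-swapped comparison behaves the same everywhere, which means this sketch detects only byte-order mismatches and leaves word-size mismatches to the *Sz comparisons; and the type of Dirty is guessed as bool.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical, trimmed-down records standing in for the real structures.
struct Package { unsigned long Name, VersionList, NextPackage; };
struct Version { unsigned long VerStr, NextVer, ParentPkg; };

// Subset of the Header. The uint32_t Signature and bool Dirty are assumptions.
struct Header
{
   uint32_t Signature;
   bool Dirty;
   unsigned short HeaderSz, PackageSz, VersionSz;
};

const uint32_t CacheSignature = 0x98FE76DC;

enum class LoadResult { Ok, WrongByteOrder, NotACache, Dirty, LayoutMismatch };

// Reverses the four bytes of a 32 bit value.
uint32_t ByteSwap32(uint32_t V)
{
   return (V >> 24) | ((V >> 8) & 0x0000FF00u) |
          ((V << 8) & 0x00FF0000u) | (V << 24);
}

// Decides whether a mapped header may be used; anything but Ok means the
// client should discard the file and rebuild the cache.
LoadResult CheckHeader(const Header &H)
{
   if (H.Signature == ByteSwap32(CacheSignature))
      return LoadResult::WrongByteOrder;   // written with the other endianness
   if (H.Signature != CacheSignature)
      return LoadResult::NotACache;
   if (H.Dirty)
      return LoadResult::Dirty;            // a writer never finished syncing
   if (H.HeaderSz != sizeof(Header) || H.PackageSz != sizeof(Package) ||
       H.VersionSz != sizeof(Version))
      return LoadResult::LayoutMismatch;   // built with different structures
   return LoadResult::Ok;
}
```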
123 // Signature information
124 unsigned long Signature;
129 // Size of structure values
130 unsigned short HeaderSz;
131 unsigned short PackageSz;
132 unsigned short PackageFileSz;
133 unsigned short VersionSz;
134 unsigned short DependencySz;
135 unsigned short ProvidesSz;
136 unsigned short VerFileSz;
139 unsigned long PackageCount;
140 unsigned long VersionCount;
141 unsigned long DependsCount;
142 unsigned long PackageFileCount;
143 unsigned long MaxVerFileSize;
146 unsigned long FileList; // PackageFile
147 unsigned long StringList; // StringItem
152 unsigned long ItemSize;
157 // Package name lookup
158 unsigned long HashTable[512]; // Package
163 This must contain the hex value 0x98FE76DC which is designed to verify
164 that the system loading the image has the same byte order and byte size as
165 the system saving the image.
168 <tag>MinorVersion<item>
169 These contain the version of the cache file, currently 0.2.
172 Dirty is true if the cache file was opened by a client that expects to
173 write to it and has not yet fully synced it. The file should be erased and
174 rebuilt if it is true.
182 <tag>ProvidesSz<item>
183 Each *Sz field contains the sizeof() of that particular structure. It is used
184 as an extra consistency check on the structure of the file.
186 If any of the size values do not exactly match what the client expects, then
187 the client should refuse to load the file.
192 <tag>PackageFileCount<item>
193 These indicate the number of each structure contained in the cache.
194 PackageCount is especially useful for generating user state structures.
195 See Package::Id for more info.
197 <tag>MaxVerFileSize<item>
198 The maximum size of a raw entry from the original Package file
199 (i.e. VerFile::Size) is stored here.
202 This contains the index of the first PackageFile structure. The PackageFile
203 structures form a singly linked list that represents all package files that
204 have been merged into the cache.
206 <tag>StringList<item>
207 This contains a list of all the unique strings (string item type strings) in
208 the cache. The parser reads this list into memory so it can match strings
212 The Pool structures manage the allocation pools that the generator uses.
213 Start indicates the first byte of the pool, Count is the number of objects
214 remaining in the pool and ItemSize is the structure size (alignment factor)
215 of the pool. An ItemSize of 0 indicates the pool is empty. There should be
216 the same number of pools as there are structure types. The generator
217 stores this information so future additions can make use of any unused pool
221 HashTable is a hash table that provides indexing for all of the packages.
222 Each package name is inserted into the hash table using the following hash
225 unsigned long Hash(const string &Str)
226 {
227 unsigned long Hash = 0;
228 for (string::const_iterator I = Str.begin(); I != Str.end(); I++)
229 Hash += *I * ((Str.end() - I + 1));
230 return Hash % _count(Head.HashTable);
231 }
234 By iterating over each entry in the hash table, it is possible to iterate over
235 the entire list of packages. Hash collisions are handled with a singly linked
236 list of packages based at the hash item. The linked list contains only
237 packages whose names hash to that bucket.
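A runnable C++ version of the hash function, together with a lookup that walks a bucket's NextPackage chain, might look like this. It is a sketch under assumptions: the Package record is cut down to Name and NextPackage, HashSize replaces _count(Head.HashTable), FindPackage is a hypothetical helper, and index 0 is taken to end a chain (offset 0 holds the Header, so no Package can live there).

```cpp
#include <cassert>
#include <string>

// Hypothetical in-memory stand-in; only the fields needed for lookup are kept.
struct Package { unsigned long Name; unsigned long NextPackage; };
const unsigned long HashSize = 512;          // matches HashTable[512]

// The hash function from the text, with its loop written over const iterators.
unsigned long Hash(const std::string &Str)
{
   unsigned long Hash = 0;
   for (std::string::const_iterator I = Str.begin(); I != Str.end(); I++)
      Hash += *I * ((Str.end() - I + 1));
   return Hash % HashSize;
}

// Walks the NextPackage chain of the name's bucket and compares names.
// Index 0 terminates a chain, so slot 0 of PkgList is never a real package.
const Package *FindPackage(const unsigned long *HashTable, const Package *PkgList,
                           const char *Strings, const std::string &Name)
{
   for (unsigned long I = HashTable[Hash(Name)]; I != 0; I = PkgList[I].NextPackage)
      if (Name == Strings + PkgList[I].Name)
         return &PkgList[I];
   return nullptr;
}
```

For example, Hash("apt") is 97*4 + 112*3 + 116*2 = 956, which lands in bucket 956 % 512 = 444.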
242 <!-- ===================================================================== -->
245 This contains information for a single unique package. There can be any
246 number of versions of a given package. Each Package exists in a singly
247 linked list of package records starting at the hash index of the name in
248 the Header->HashTable.
253 unsigned long Name; // Stringtable
254 unsigned long VersionList; // Version
255 unsigned long TargetVer; // Version
256 unsigned long CurrentVer; // Version
257 unsigned long TargetDist; // StringTable (StringItem)
258 unsigned long Section; // StringTable (StringItem)
261 unsigned long NextPackage; // Package
262 unsigned long RevDepends; // Dependency
263 unsigned long ProvidesList; // Provides
265 // Install/Remove/Purge etc
266 unsigned char SelectedState; // What
267 unsigned char InstState; // Flags
268 unsigned char CurrentState; // State
270 // Unique ID for this pkg
280 <tag>VersionList<item>
281 Base of a singly linked list of version structures. Each structure
282 represents a unique version of the package. The version structures
283 contain links into PackageFile and the original text file as well as
284 detailed information about the size and dependencies of the specific
285 package. In this way multiple versions of a package can be cleanly handled
286 by the system. Furthermore, this linked list is guaranteed to be sorted
287 from highest version to lowest version with no duplicate entries.
290 <tag>CurrentVer<item>
291 TargetVer is an index (pointer) to the sub-version that is being targeted for
292 upgrading. CurrentVer is an index to the installed version; either can be
295 <tag>TargetDist<item>
296 This indicates the target distribution. Automatic upgrades should not go
297 outside of the specified dist. If it is 0 then the global target dist should
298 be used. The string should be contained in the StringItem list.
301 This indicates the deduced section. It should be "Unknown" or the section
302 of the last parsed item.
304 <tag>NextPackage<item>
305 Next link in this hash item. This linked list is based at Header.HashTable
306 and contains only packages with the same hash value.
308 <tag>RevDepends<item>
309 Reverse Depends is a linked list of all dependencies linked to this package.
311 <tag>ProvidesList<item>
312 This is a linked list of all provides for this package name.
316 <tag>CurrentState<item>
317 These correspond to the three items in the Status field found in the status
318 file. See the section on defines for the possible values.
320 SelectedState is the state that the user wishes the package to be
323 InstState is the installation state of the package. This normally
324 should be Ok, but if the installation had an accident it may be otherwise.
326 CurrentState indicates if the package is installed, partially installed or
330 ID is a value from 0 to Header->PackageCount - 1. It is a unique value assigned
331 by the generator. This allows clients to create an array of size PackageCount
332 and use it to store state information for the package map. For instance the
333 status file emitter uses this to track which packages have been emitted
337 Flags are some useful indicators of the package's state.
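The ordering guarantee on VersionList (highest to lowest, no duplicates) can be kept by inserting each new version at its sorted position. The C++ sketch below assumes VerStr has been reduced to a plain int so that > and == stand in for version comparison; real Debian version comparison is considerably more involved and is not shown. InsertVersion is a hypothetical name, and index 0 is assumed to terminate the list.

```cpp
#include <cassert>
#include <vector>

// Hypothetical trimmed Version record; VerStr is simplified to an integer.
struct Version { int VerStr; unsigned long NextVer; };

// Inserts Versions[New] into the list headed by Head, keeping it sorted from
// highest to lowest. Returns false (list unchanged) if the version is a duplicate.
bool InsertVersion(std::vector<Version> &Versions, unsigned long &Head, unsigned long New)
{
   unsigned long *Link = &Head;              // the link we may have to rewrite
   while (*Link != 0 && Versions[*Link].VerStr > Versions[New].VerStr)
      Link = &Versions[*Link].NextVer;
   if (*Link != 0 && Versions[*Link].VerStr == Versions[New].VerStr)
      return false;
   Versions[New].NextVer = *Link;
   *Link = New;
   return true;
}
```

Walking the list through a pointer to the current link lets the same code update either the list head or a NextVer field.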
342 <!-- PackageFile {{{ -->
343 <!-- ===================================================================== -->
346 This contains information for a single package file. Package files are
347 referenced by Version structures. This is a singly linked list based from
353 unsigned long FileName; // Stringtable
354 unsigned long Archive; // Stringtable
355 unsigned long Component; // Stringtable
356 unsigned long Version; // Stringtable
357 unsigned long Origin; // Stringtable
358 unsigned long Label; // Stringtable
359 unsigned long Architecture; // Stringtable
363 unsigned long NextFile; // PackageFile
366 time_t mtime; // Modification time
372 Refers to the physical disk file that this PackageFile represents.
380 <tag>NotAutomatic<item>
381 This is the release information. Please see the files document for a
382 description of what the release information means.
385 Size is provided as a simple check to ensure that the package file has not
392 Provides some flags for the PackageFile, see the section on defines.
395 Modification time for the file at time of cache generation.
401 <!-- ===================================================================== -->
404 This contains the information for a single version of a package. This is a
405 singly linked list based from Package.VersionList.
408 The version list is always sorted from highest version to lowest version by
409 the generator. Also there may not be any duplicate entries in the list (same
415 unsigned long VerStr; // Stringtable
416 unsigned long Section; // StringTable (StringItem)
417 unsigned long Arch; // StringTable
420 unsigned long FileList; // VerFile
421 unsigned long NextVer; // Version
422 unsigned long DependsList; // Dependency
423 unsigned long ParentPkg; // Package
424 unsigned long ProvidesList; // Provides
427 unsigned long InstalledSize;
430 unsigned char Priority;
436 This is the complete version string.
439 References all the PackageFiles (via VerFile records) that this version
440 came out of. FileList can be used to determine what distribution(s) the
441 Version applies to. If FileList is 0 then this is a blank version. The
442 structure should also have a 0 in all other fields excluding VerStr and possibly NextVer.
445 This string indicates which section it is part of. The string should be
446 contained in the StringItem list.
449 Architecture the package was compiled for.
452 Next step in the linked list.
454 <tag>DependsList<item>
455 This is the base of the dependency list.
458 This links the version to the owning package, allowing reverse dependencies
459 to determine the package.
461 <tag>ProvidesList<item>
462 Head of the linked list of Provides::NextPkgProv, forward provides.
465 <tag>InstalledSize<item>
466 The archive size for this version. For Debian this is the size of the .deb
467 file. Installed size is the uncompressed size for this version.
470 This is a characteristic value representing this package. No two packages
471 in existence should have the same VerStr and Hash with different contents.
477 This is the parsed priority value of the package.
481 <!-- Dependency {{{ -->
482 <!-- ===================================================================== -->
485 Dependency contains the information for a single dependency record. The records
486 are split up like this to ease processing by the client. The base of the
487 linked list is Version.DependsList. All forms of dependencies are recorded
488 here including Conflicts, Suggests and Recommends.
491 Multiple depends on the same package must be grouped together in
492 the Dependency lists. Clients should assume this is always true.
497 unsigned long Version; // Stringtable
498 unsigned long Package; // Package
499 unsigned long NextDepends; // Dependency
500 unsigned long NextRevDepends; // Reverse dependency linking
501 unsigned long ParentVer; // Upwards parent version link
503 // Specific types of depends
505 unsigned char CompareOp;
511 The string form of the version that the dependency is applied against.
514 The index of the package this dependency applies to. If the package
515 does not already exist when the dependency is inserted, a blank one (with no
516 version records) should be created.
518 <tag>NextDepends<item>
519 Linked list based off a Version structure of all the dependencies in that
522 <tag>NextRevDepends<item>
523 Reverse dependency linking, based off a Package structure. This linked list
524 is a list of all packages that have a depends line for a given package.
527 Parent version linking, allows the reverse dependency list to link
528 back to the version and package that the dependency is for.
531 Describes whether it is depends, predepends, recommends, suggests, etc.
534 Describes the comparison operator specified on the depends line. If the
535 pkgOP_OR flag (0x10) is set then it is a logical or with the next record.
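Collecting the reverse dependencies of a package follows directly from the fields above: start at Package.RevDepends, follow NextRevDepends, and climb from each record through ParentVer to Version.ParentPkg. The C++ sketch below uses hypothetical trimmed records and a hypothetical helper name, ReverseDependers, and assumes index 0 terminates a list.

```cpp
#include <cassert>
#include <vector>

// Hypothetical trimmed records; field names follow the structure listings.
struct Package    { unsigned long RevDepends; };
struct Version    { unsigned long ParentPkg; };
struct Dependency { unsigned long Package, NextRevDepends, ParentVer; };

// Returns the indexes of the packages whose versions depend on Target.
std::vector<unsigned long> ReverseDependers(const Package *Pkgs, const Version *Vers,
                                            const Dependency *Deps, unsigned long Target)
{
   std::vector<unsigned long> Out;
   for (unsigned long D = Pkgs[Target].RevDepends; D != 0; D = Deps[D].NextRevDepends)
      Out.push_back(Vers[Deps[D].ParentVer].ParentPkg);   // dependency -> version -> package
   return Out;
}
```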
543 <!-- Provides {{{ -->
544 <!-- ===================================================================== -->
547 Provides handles virtual packages. When a Provides: line is encountered
548 a new provides record is added associating the package with a virtual
549 package name. The provides structures are linked off the package structures.
550 This simplifies the analysis of dependencies and other aspects. A provides
551 refers to a specific version of a specific package; not all versions of a
552 package need to provide the same virtual package.
555 There is a linked list of provided package names started from each
556 version that provides packages. This is the forwards provides mechanism.
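Recording one Provides line therefore touches two lists: the record is pushed onto the provided package's ProvidesList (chained through NextProvides) and onto the providing version's ProvidesList (chained through NextPkgProv). A minimal C++ sketch, assuming std::vector storage in place of the mapped file, slot 0 left unused as the list terminator, and a hypothetical AddProvides helper:

```cpp
#include <cassert>
#include <vector>

// Hypothetical trimmed records; names follow the structure listings.
struct Package  { unsigned long ProvidesList; };                 // reverse list head
struct Version  { unsigned long ParentPkg, ProvidesList; };      // forward list head
struct Provides { unsigned long ParentPkg, Version, NextProvides, NextPkgProv; };

// Adds a record saying version Ver provides package Virtual, pushing it onto
// the front of both lists. Returns the index of the new record.
unsigned long AddProvides(std::vector<Package> &Pkgs, std::vector<Version> &Vers,
                          std::vector<Provides> &Provs, unsigned long Virtual, unsigned long Ver)
{
   Provides P;
   P.ParentPkg = Virtual;
   P.Version = Ver;
   P.NextProvides = Pkgs[Virtual].ProvidesList;   // link off the provided package
   P.NextPkgProv = Vers[Ver].ProvidesList;        // link off the providing version
   Provs.push_back(P);
   unsigned long Index = Provs.size() - 1;
   Pkgs[Virtual].ProvidesList = Index;
   Vers[Ver].ProvidesList = Index;
   return Index;
}
```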
560 unsigned long ParentPkg; // Package
561 unsigned long Version; // Version
562 unsigned long ProvideVersion; // Stringtable
563 unsigned long NextProvides; // Provides
564 unsigned long NextPkgProv; // Provides
569 The index of the package whose ProvidesList heads this linked list. ParentPkg->Name
570 is the name of the provided (virtual) package.
573 The index of the version this provide line applies to.
575 <tag>ProvideVersion<item>
576 Each provides can specify a version in the provides line. This version allows
577 dependencies to depend on specific versions of a Provides, as well as allowing
578 Provides to override existing packages. This is experimental.
580 <tag>NextProvides<item>
581 Next link in the singly linked list of provides (based off package).
583 <tag>NextPkgProv<item>
584 Next link in the singly linked list of provides for 'Version'.
590 <!-- ===================================================================== -->
593 VerFile associates a version with a PackageFile; this allows a full
594 description of all Versions in all files (and hence all sources) under
598 struct pkgCache::VerFile
600 unsigned long File; // PackageFile
601 unsigned long NextFile; // PkgVerFile
602 unsigned long Offset;
608 The index of the package file that this version was found in.
611 The next step in the linked list.
615 These describe the exact position in the package file for the section from
620 <!-- StringItem {{{ -->
621 <!-- ===================================================================== -->
624 StringItem is used for generating single instances of strings. Some things
625 like Section names are useful to have as unique tags. It is part of
626 a linked list based at Header::StringList.
630 unsigned long String; // Stringtable
631 unsigned long NextItem; // StringItem
636 The string this refers to.
639 Next link in the chain.
642 <!-- StringTable {{{ -->
643 <!-- ===================================================================== -->
646 All strings are simply inlined any place in the file that is natural for the
647 writer. The client should make no assumptions about the positioning of
648 strings. All stringtable values are byte offsets from the start of the
649 file at which a null-terminated string begins.
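A minimal C++ sketch of the two string mechanisms, assuming a std::vector<char> in place of the file and keeping the StringItem records in a separate vector for simplicity (in the real cache they live in the file too): WriteString appends a null-terminated string and returns its byte offset, and UniqueString, a hypothetical helper, consults the StringItem chain before writing so that tags such as section names are stored once.

```cpp
#include <cassert>
#include <cstring>
#include <vector>

// StringItem as in the listing: String is a byte offset, NextItem an index.
struct StringItem { unsigned long String, NextItem; };

// Appends Str plus its terminator and returns the byte offset it starts at.
unsigned long WriteString(std::vector<char> &File, const char *Str)
{
   unsigned long Offset = File.size();
   File.insert(File.end(), Str, Str + std::strlen(Str) + 1);
   return Offset;
}

// Returns the offset of an existing copy of Str found through the chain headed
// by Head, or writes a new copy and pushes a StringItem for it onto the chain.
unsigned long UniqueString(std::vector<char> &File, std::vector<StringItem> &Items,
                           unsigned long &Head, const char *Str)
{
   for (unsigned long I = Head; I != 0; I = Items[I].NextItem)
      if (std::strcmp(&File[Items[I].String], Str) == 0)
         return Items[I].String;
   StringItem New = {WriteString(File, Str), Head};
   Items.push_back(New);
   Head = Items.size() - 1;
   return New.String;
}
```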
652 <!-- ===================================================================== -->
655 Several structures use variables to indicate things. Here is a list of all
658 <sect1>Definitions for Dependency::Type
661 #define pkgDEP_Depends 1
662 #define pkgDEP_PreDepends 2
663 #define pkgDEP_Suggests 3
664 #define pkgDEP_Recommends 4
665 #define pkgDEP_Conflicts 5
666 #define pkgDEP_Replaces 6
670 <sect1>Definitions for Dependency::CompareOp
673 #define pkgOP_OR 0x10
674 #define pkgOP_LESSEQ 0x1
675 #define pkgOP_GREATEREQ 0x2
676 #define pkgOP_LESS 0x3
677 #define pkgOP_GREATER 0x4
678 #define pkgOP_EQUALS 0x5
680 The lower 4 bits are used to indicate what operator is being specified and
681 the upper 4 bits are flags. pkgOP_OR indicates that the next package is
682 or'd with the current package.
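Following the rule just stated, that pkgOP_OR joins a record with the next one, a client can split a version's dependency list into or-groups. In this C++ sketch the CompareOp bytes are passed in list order, and a value of 0 is assumed to mean a dependency with no version constraint (the definitions above do not assign 0). OrGroupSizes and CompareOperator are hypothetical names.

```cpp
#include <cassert>
#include <vector>

// Values from the definitions above.
#define pkgOP_OR 0x10
#define pkgOP_LESSEQ 0x1
#define pkgOP_GREATEREQ 0x2
#define pkgOP_LESS 0x3
#define pkgOP_GREATER 0x4
#define pkgOP_EQUALS 0x5

// Operator part of CompareOp: the lower 4 bits.
unsigned char CompareOperator(unsigned char CompareOp) { return CompareOp & 0x0F; }

// Returns the size of each or-group; a record without pkgOP_OR closes its group.
std::vector<unsigned> OrGroupSizes(const std::vector<unsigned char> &Ops)
{
   std::vector<unsigned> Groups;
   unsigned Current = 0;
   for (unsigned char Op : Ops)
   {
      Current++;
      if ((Op & pkgOP_OR) == 0)
      {
         Groups.push_back(Current);
         Current = 0;
      }
   }
   if (Current != 0)                 // malformed trailing OR with nothing after it
      Groups.push_back(Current);
   return Groups;
}
```

For instance, "a (>= 1) | b, c (= 2)" would be three records forming two groups.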
685 <sect1>Definitions for Package::SelectedState
688 #define pkgSTATE_Unkown 0
689 #define pkgSTATE_Install 1
690 #define pkgSTATE_Hold 2
691 #define pkgSTATE_DeInstall 3
692 #define pkgSTATE_Purge 4
696 <sect1>Definitions for Package::InstState
699 #define pkgSTATE_Ok 0
700 #define pkgSTATE_ReInstReq 1
701 #define pkgSTATE_Hold 2
702 #define pkgSTATE_HoldReInstReq 3
706 <sect1>Definitions for Package::CurrentState
709 #define pkgSTATE_NotInstalled 0
710 #define pkgSTATE_UnPacked 1
711 #define pkgSTATE_HalfConfigured 2
712 #define pkgSTATE_UnInstalled 3
713 #define pkgSTATE_HalfInstalled 4
714 #define pkgSTATE_ConfigFiles 5
715 #define pkgSTATE_Installed 6
719 <sect1>Definitions for Package::Flags
722 #define pkgFLAG_Auto (1 << 0)
723 #define pkgFLAG_New (1 << 1)
724 #define pkgFLAG_Obsolete (1 << 2)
725 #define pkgFLAG_Essential (1 << 3)
726 #define pkgFLAG_ImmediateConf (1 << 4)
730 <sect1>Definitions for Version::Priority
732 Zero is used for unparsable or absent Priority fields.
734 #define pkgPRIO_Important 1
735 #define pkgPRIO_Required 2
736 #define pkgPRIO_Standard 3
737 #define pkgPRIO_Optional 4
738 #define pkgPRIO_Extra 5
742 <sect1>Definitions for PackageFile::Flags
745 #define pkgFLAG_NotSource (1 << 0)
746 #define pkgFLAG_NotAutomatic (1 << 1)
752 <chapt>Notes on the Generator
753 <!-- Notes on the Generator {{{ -->
754 <!-- ===================================================================== -->
756 The pkgCache::MergePackageFile function is currently the only generator of
757 the cache file. It implements a conversion from the normal textual package
758 file into the cache file.
761 The generator assumes any package declaration with a
762 Status: line is a 'Status of the package' type of package declaration.
763 A Package with a Target-Version field should also really have a status field.
764 The processing of a Target-Version field can create a place-holder Version
765 structure that is empty to refer to the specified version (See Version
766 for info on what an empty Version looks like). The Target-Version syntax
767 allows the specification of a specific version and a target distribution.
770 Different section names on different versions are supported, but I
771 do not expect to use it. To simplify the GUI, it will merely use the section
772 in the Package structure. This should be okay as I hope sections do not change
776 The generator goes through a number of post-processing steps after producing
777 a disk file. It sorts all of the version lists to be in descending order
778 and then generates the reverse dependency lists for all of the packages.
779 ID numbers and count values are also generated in the post-processing step.
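The ID step could be done by visiting every package once through the hash table, as described in the Header section. This is a C++ sketch with a hypothetical AssignIds helper and trimmed Package records, not the generator's actual code; it assumes index 0 terminates a chain and yields IDs from 0 to PackageCount - 1.

```cpp
#include <cassert>
#include <vector>

// Hypothetical trimmed records; NextPackage and ID as described earlier.
struct Package { unsigned long NextPackage; unsigned long ID; };

// Walks every hash bucket's NextPackage chain, numbering packages as it goes.
// Returns the resulting PackageCount.
unsigned long AssignIds(const std::vector<unsigned long> &HashTable, std::vector<Package> &Pkgs)
{
   unsigned long Count = 0;
   for (unsigned long Bucket : HashTable)
      for (unsigned long I = Bucket; I != 0; I = Pkgs[I].NextPackage)
         Pkgs[I].ID = Count++;
   return Count;
}
```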
782 It is possible to extend many of the structures in the cache with extra data.
783 This is done by using the ID member. ID will be a unique number from 0 to
784 Header->??Count - 1. For example:
787 MyPkgData *Data = new MyPkgData[Header->PackageCount];
788 Data[Package->ID].Item = 0;
790 This provides a one-way reference between package structures and user data. A
791 two-way reference would require a member inside the MyPkgData structure.
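The two-way variant can be sketched like this, with a hypothetical Pkg member added to MyPkgData to hold the package's cache index. The sketch uses std::vector instead of new[] so the table frees itself, trims Package to Name and ID, and assumes slot 0 of the package array is unused.

```cpp
#include <cassert>
#include <vector>

// Hypothetical trimmed Package record with the generator-assigned ID.
struct Package { unsigned long Name; unsigned long ID; };

// Hypothetical per-package user data; Pkg points back at the owning package.
struct MyPkgData
{
   unsigned long Pkg = 0;    // cache index of the owning Package
   int Item = 0;             // example payload
};

// Data[Pkgs[I].ID] maps package -> data; Data[ID].Pkg maps data -> package.
std::vector<MyPkgData> BuildSideTable(const std::vector<Package> &Pkgs, unsigned long PackageCount)
{
   std::vector<MyPkgData> Data(PackageCount);
   for (unsigned long I = 1; I < Pkgs.size(); I++)   // slot 0 is unused
      Data[Pkgs[I].ID].Pkg = I;
   return Data;
}
```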
794 The generator's use of free space pools tends to make the cache file quite
795 large, and quite full of blank space. This could be fixed with sparse files.
799 <chapt>Future Directions
800 <!-- Future Directions {{{ -->
801 <!-- ===================================================================== -->
803 One good direction is to turn the cache file into a cache directory that
804 contains many associated caches that cache other important bits of
805 information. (/var/cache/apt, FHS2)
808 Caching of the info/*.list files is an excellent place to start: by generating all
809 the list files into a tree structure and reverse linking them to the package
810 structures in the main cache file, major speed gains in dpkg might be achieved.