<!doctype debiandoc system>
<!-- -*- mode: sgml; mode: fold -*- -->
<book>
<title>APT Cache File Format</title>

<author>Jason Gunthorpe <email>jgg@debian.org</email></author>
<version>$Id: cache.sgml,v 1.2 1998/07/05 05:43:09 jgg Exp $</version>

<abstract>
This document describes the complete implementation and format of the APT
Cache file. The APT Cache file is a way for APT to parse and store a
large number of package files for display in the UI. Its primary design
goal is to make display of a single package in the tree very fast by
pre-linking important things like dependencies and provides.

The specification doubles as documentation for one of the in-memory
structures used by the package library and the APT GUI.

</abstract>

<copyright>
Copyright &copy; Jason Gunthorpe, 1997-1998.
<p>
APT and this document are free software; you can redistribute them and/or
modify them under the terms of the GNU General Public License as published
by the Free Software Foundation; either version 2 of the License, or (at your
option) any later version.

<p>
For more details, on Debian GNU/Linux systems, see the file
/usr/doc/copyright/GPL for the full license.
</copyright>

<toc sect>

<chapt>Introduction
<!-- Purpose {{{ -->
<!-- ===================================================================== -->
<sect>Purpose

<p>
This document describes the implementation of an architecture
dependent binary cache file. The goal of this cache file is twofold:
firstly to speed loading and processing of the package file array and
secondly to reduce memory consumption of the package file array.

<p>
The implementation is aimed at an environment with many primary package
files, for instance someone that has a Package file for their CD-ROM, a
Package file for the latest version of the distribution on the CD-ROM and a
package file for the development version. Always present is the information
contained in the status file which might be considered a separate package
file.

<p>
Please understand, this is designed as a -CACHE FILE-; it is not meant to be
used on any system other than the one it was created for. It is not meant to
be authoritative either, i.e. if a system crash or software failure occurs it
must be perfectly acceptable for the cache file to be in an inconsistent
state. Furthermore at any time the cache file may be erased without losing
any information.

<p>
Also the structures and storage layout are optimized for use by the APT
GUI and may not be suitable for all purposes. However it should be possible
to extend it with associated cache files that contain other information.

<p>
To keep memory use down the cache file only contains often used fields and
fields that are inexpensive to store; the Package file has a full list of
fields. Also the client may assume that all items are perfectly valid and
need not perform checks against their correctness. Removal of information
from the cache is possible, but blanks will be left in the file, and
unused strings will also be present. The recommended implementation is to
simply rebuild the cache each time any of the data files change. It is
possible to add a new package file to the cache without any negative side
effects.

<sect1>Note on Pointer access
<p>
Every item in every structure is stored as the index to that structure.
What this means is that once the file is mmapped every data access has to
go through a fixup stage to get a real memory pointer. This is done
by taking the index, multiplying it by the type size and then adding
it to the start address of the memory block. This sounds complex, but
in C it is a single array dereference. Because all items are aligned to
their size and indexes are stored as multiples of the size of the structure
the format is immediately portable to all possible architectures - BUT the
generated files are -NOT-.

<p>
This scheme allows code like this to be written:
<example>
   void *Map = mmap(...);
   Package *PkgList = (Package *)Map;
   Header *Head = (Header *)Map;
   char *Strings = (char *)Map;
   cout << (Strings + PkgList[Head->HashTable[0]].Name) << endl;
</example>
<p>
Notice that the access itself needs no casting or multiplication. The net
result is to return the name of the first package in the first hash bucket,
without error checks.
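<p>
The fixup stage can also be wrapped in a small helper. The following is a
minimal sketch, not code from APT: the names At and StrAt are hypothetical,
and treating index 0 as "no item" is an assumption based on the header
occupying the start of the file.

```cpp
#include <cassert>

// Hypothetical helper, not part of APT: turn a stored index into a pointer
// by viewing the mapped file as an array of T.
template <typename T>
const T *At(const void *Map, unsigned long Index)
{
   assert(Map != nullptr);
   // Assumption: index 0 means "no item", since the header sits at offset 0.
   if (Index == 0)
      return nullptr;
   return static_cast<const T *>(Map) + Index;
}

// Strings are plain byte offsets, so they resolve against a char view.
inline const char *StrAt(const void *Map, unsigned long Offset)
{
   assert(Map != nullptr);
   return Offset == 0 ? nullptr : static_cast<const char *>(Map) + Offset;
}
```

With such helpers a client could write At<Package>(Map, Index) wherever it
wants a null check in addition to the plain array dereference.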

<p>
The generator uses allocation pools to group similarly sized structures in
large blocks to eliminate any alignment overhead. The generator also
ensures that no structures overlap and all indexes are unique. Although
at first glance it may seem like there is the potential for two structures
to exist at the same point the generator never allows this to happen.
(See the discussion of free space pools)
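<p>
As an illustration of how a pool can hand out non-overlapping indexes, here
is a rough sketch. It is not the generator's actual code: the function
Allocate, the FileEnd parameter and the block size of 64 items are all
assumptions made for the example. The Pool fields follow the Header
structure described in the next chapter.

```cpp
#include <cassert>
#include <cstddef>

struct Pool
{
   unsigned long ItemSize;   // 0 marks an unused pool slot
   unsigned long Start;      // byte offset of the next free item
   unsigned long Count;      // items remaining in the pool
};

// Hypothetical allocator: returns the index (byte offset / ItemSize) of a
// fresh item, or 0 on failure. FileEnd is the current end of the file in
// bytes and grows whenever a new block has to be carved out.
unsigned long Allocate(Pool *Pools, std::size_t PoolCount,
                       unsigned long ItemSize, unsigned long &FileEnd,
                       unsigned long BlockItems = 64)
{
   assert(Pools != nullptr);
   if (ItemSize == 0)
      return 0;

   // Look for the pool of this item size, remembering a spare slot.
   Pool *Match = nullptr;
   Pool *Empty = nullptr;
   for (std::size_t I = 0; I != PoolCount; ++I)
   {
      if (Pools[I].ItemSize == ItemSize)
      {
         Match = &Pools[I];
         break;
      }
      if (Pools[I].ItemSize == 0 && Empty == nullptr)
         Empty = &Pools[I];
   }
   if (Match == nullptr)
   {
      if (Empty == nullptr)
         return 0;
      Match = Empty;
      Match->ItemSize = ItemSize;
      Match->Count = 0;
   }

   // Refill from the end of the file, rounded up to a multiple of ItemSize
   // so that the byte offset divides evenly into an index.
   if (Match->Count == 0)
   {
      unsigned long Aligned = (FileEnd + ItemSize - 1) / ItemSize * ItemSize;
      Match->Start = Aligned;
      Match->Count = BlockItems;
      FileEnd = Aligned + BlockItems * ItemSize;
   }

   unsigned long Index = Match->Start / ItemSize;
   Match->Start += ItemSize;
   --Match->Count;
   return Index;
}
```

Because every block belongs to exactly one item size and blocks are only
ever taken from the end of the file, two structures cannot land on the same
bytes in this sketch.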
 <!-- }}} -->

<chapt>Structures
<!-- Header {{{ -->
<!-- ===================================================================== -->
<sect>Header
<p>
This is the first item in the file.
<example>
   struct Header
   {
      // Signature information
      unsigned long Signature;
      short MajorVersion;
      short MinorVersion;
      bool Dirty;

      // Size of structure values
      unsigned short HeaderSz;
      unsigned short PackageSz;
      unsigned short PackageFileSz;
      unsigned short VersionSz;
      unsigned short DependencySz;
      unsigned short ProvidesSz;
      unsigned short VerFileSz;

      // Structure counts
      unsigned long PackageCount;
      unsigned long VersionCount;
      unsigned long DependsCount;
      unsigned long PackageFileCount;

      // Offsets
      unsigned long FileList;              // PackageFile
      unsigned long StringList;            // StringItem

      // Allocation pools
      struct
      {
         unsigned long ItemSize;
         unsigned long Start;
         unsigned long Count;
      } Pools[7];

      // Package name lookup
      unsigned long HashTable[512];        // Package
   };
</example>
<taglist>
<tag>Signature<item>
This must contain the hex value 0x98FE76DC which is designed to verify
that the system loading the image has the same byte order and byte size as
the system saving the image.

<tag>MajorVersion
<tag>MinorVersion<item>
These contain the version of the cache file, currently 0.2.

<tag>Dirty<item>
Dirty is true if the cache file was opened for reading, the client expects
to have written things to it and has not fully synced it. The file should
be erased and rebuilt if it is true.

<tag>HeaderSz
<tag>PackageSz
<tag>PackageFileSz
<tag>VersionSz
<tag>DependencySz
<tag>VerFileSz
<tag>ProvidesSz<item>
*Sz contains the sizeof() of that particular structure. It is used as an
extra consistency check on the structure of the file.

If any of the size values do not exactly match what the client expects then
the client should refuse to load the file.

<tag>PackageCount
<tag>VersionCount
<tag>DependsCount
<tag>PackageFileCount<item>
These indicate the number of each structure contained in the cache.
PackageCount is especially useful for generating user state structures.
See Package::ID for more info.

<tag>FileList<item>
This contains the index of the first PackageFile structure. The PackageFile
structures are singly linked lists that represent all package files that
have been merged into the cache.

<tag>StringList<item>
This contains a list of all the unique strings (string item type strings) in
the cache. The parser reads this list into memory so it can match strings
against it.

<tag>Pools<item>
The Pool structures manage the allocation pools that the generator uses.
Start indicates the first byte of the pool, Count is the number of objects
remaining in the pool and ItemSize is the structure size (alignment factor)
of the pool. An ItemSize of 0 indicates the pool is empty. There should be
the same number of pools as there are structure types. The generator
stores this information so future additions can make use of any unused pool
blocks.

<tag>HashTable<item>
HashTable is a hash table that provides indexing for all of the packages.
Each package name is inserted into the hash table using the following hash
function:
<example>
   unsigned long Hash(string Str)
   {
      unsigned long Hash = 0;
      const char *End = Str.data() + Str.size();
      // Each character is weighted by its distance from the end, plus one
      for (const char *I = Str.data(); I != End; I++)
         Hash += *I * ((End - I + 1));
      // _count gives the number of elements in the array
      return Hash % _count(Head.HashTable);
   }
</example>
<p>
By iterating over each entry in the hash table it is possible to iterate over
the entire list of packages. Hash collisions are handled with a singly linked
list of packages based at the hash item. The linked list contains only
packages that match the hashing function.

</taglist>
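<p>
The checks described for Signature, the version fields, Dirty and the *Sz
values could be combined along these lines. This is only a sketch: the
Header and Package structures are abbreviated stand-ins, and HeaderStatus
and CheckHeader are hypothetical names that do not come from APT.

```cpp
#include <cassert>

struct Header            // abbreviated: only the fields the check reads
{
   unsigned long Signature;
   short MajorVersion;
   short MinorVersion;
   bool Dirty;
   unsigned short HeaderSz;
   unsigned short PackageSz;
};

struct Package           // stand-in for the real Package structure
{
   unsigned long Name;
   unsigned long NextPackage;
};

enum class HeaderStatus { Ok, BadSignature, BadVersion, Dirty, SizeMismatch };

// Decide whether a mapped cache file may be used at all.
HeaderStatus CheckHeader(const Header &H)
{
   // A wrong value here suggests a different byte order or word size.
   if (H.Signature != 0x98FE76DCUL)
      return HeaderStatus::BadSignature;
   // The format version documented here is 0.2.
   if (H.MajorVersion != 0 || H.MinorVersion != 2)
      return HeaderStatus::BadVersion;
   // A dirty file was not fully synced and should be rebuilt.
   if (H.Dirty)
      return HeaderStatus::Dirty;
   // Every recorded size must match what this client was compiled with;
   // a full check would compare all seven *Sz fields.
   if (H.HeaderSz != sizeof(Header) || H.PackageSz != sizeof(Package))
      return HeaderStatus::SizeMismatch;
   return HeaderStatus::Ok;
}
```

Any result other than Ok would lead the client to erase and rebuild the
cache, which is always safe because the file holds no authoritative data.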
 <!-- }}} -->
<!-- Package {{{ -->
<!-- ===================================================================== -->
<sect>Package
<p>
This contains information for a single unique package. There can be any
number of versions of a given package. Package exists in a singly
linked list of package records starting at the hash index of the name in
the Header->HashTable.
<example>
   struct Package
   {
      // Pointers
      unsigned long Name;              // Stringtable
      unsigned long VersionList;       // Version
      unsigned long TargetVer;         // Version
      unsigned long CurrentVer;        // Version
      unsigned long TargetDist;        // StringTable (StringItem)
      unsigned long Section;           // StringTable (StringItem)

      // Linked lists
      unsigned long NextPackage;       // Package
      unsigned long RevDepends;        // Dependency
      unsigned long ProvidesList;      // Provides

      // Install/Remove/Purge etc
      unsigned char SelectedState;     // What
      unsigned char InstState;         // Flags
      unsigned char CurrentState;      // State

      // Unique ID for this pkg
      unsigned short ID;
      unsigned long Flags;
   };
</example>

<taglist>
<tag>Name<item>
Name of the package.

<tag>VersionList<item>
Base of a singly linked list of version structures. Each structure
represents a unique version of the package. The version structures
contain links into PackageFile and the original text file as well as
detailed information about the size and dependencies of the specific
package. In this way multiple versions of a package can be cleanly handled
by the system. Furthermore, this linked list is guaranteed to be sorted
from highest version to lowest version with no duplicate entries.

<tag>TargetVer
<tag>CurrentVer<item>
TargetVer is an index (pointer) to the sub version that is being targeted
for upgrading. CurrentVer is an index to the installed version. Either can
be 0.

<tag>TargetDist<item>
This indicates the target distribution. Automatic upgrades should not go
outside of the specified dist. If it is 0 then the global target dist should
be used. The string should be contained in the StringItem list.

<tag>Section<item>
This indicates the deduced section. It should be "Unknown" or the section
of the last parsed item.

<tag>NextPackage<item>
Next link in this hash item. This linked list is based at Header.HashTable
and contains only packages with the same hash value.

<tag>RevDepends<item>
Reverse Depends is a linked list of all dependencies linked to this package.

<tag>ProvidesList<item>
This is a linked list of all provides for this package name.

<tag>SelectedState
<tag>InstState
<tag>CurrentState<item>
These correspond to the 3 items in the Status field found in the status
file. See the section on defines for the possible values.
<p>
SelectedState is the state that the user wishes the package to be
in.
<p>
InstState is the installation state of the package. This normally
should be Ok, but if the installation had an accident it may be otherwise.
<p>
CurrentState indicates if the package is installed, partially installed or
not installed.

<tag>ID<item>
ID is a value from 0 to Header->PackageCount. It is a unique value assigned
by the generator. This allows clients to create an array of size PackageCount
and use it to store state information for the package map. For instance the
status file emitter uses this to track which packages have been emitted
already.

<tag>Flags<item>
Flags are some useful indicators of the package's state.

</taglist>
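<p>
Looking a package up by name combines the hash function from the Header
section with the NextPackage chain. The sketch below assumes that index 0
terminates a chain and uses an abbreviated Package structure. FindPackage is
a hypothetical name. In the real file HashTable, Pkgs and Strings would all
be views of the same mapped block.

```cpp
#include <cassert>
#include <string>

const unsigned long HashSize = 512;   // number of entries in HashTable

struct Package                        // abbreviated to the fields used here
{
   unsigned long Name;                // byte offset of the name string
   unsigned long NextPackage;         // next package in the same bucket
};

// The hash function given in the Header section, written with indexes.
unsigned long Hash(const std::string &Str)
{
   unsigned long H = 0;
   for (std::string::size_type I = 0; I != Str.size(); ++I)
      H += static_cast<unsigned long>(Str[I]) * (Str.size() - I + 1);
   return H % HashSize;
}

// Walk one hash bucket and compare names until a match is found.
const Package *FindPackage(const unsigned long *HashTable,
                           const Package *Pkgs, const char *Strings,
                           const std::string &Name)
{
   assert(HashTable != nullptr && Pkgs != nullptr && Strings != nullptr);
   for (unsigned long I = HashTable[Hash(Name)]; I != 0;
        I = Pkgs[I].NextPackage)
   {
      if (Name == Strings + Pkgs[I].Name)
         return &Pkgs[I];
   }
   return nullptr;
}
```

Iterating over all 512 buckets in the same way visits every package once,
which is how a client can enumerate the whole cache.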

 <!-- }}} -->
<!-- PackageFile {{{ -->
<!-- ===================================================================== -->
<sect>PackageFile
<p>
This contains information for a single package file. Package files are
referenced by Version structures. This is a singly linked list based from
Header.FileList.
<example>
   struct PackageFile
   {
      // Names
      unsigned long FileName;        // Stringtable
      unsigned long Version;         // Stringtable
      unsigned long Distribution;    // Stringtable
      unsigned long Size;

      // Linked list
      unsigned long NextFile;        // PackageFile
      unsigned short ID;
      unsigned long Flags;
      time_t mtime;                  // Modification time
   };
</example>
<taglist>

<tag>FileName<item>
Refers to the physical disk file that this PackageFile represents.

<tag>Version<item>
Version is the given version, e.g. 1.3.1, 2.4_revision_1 etc.

<tag>Distribution<item>
Distribution is the symbolic name for this PackageFile: hamm, bo, rex etc.

<tag>Size<item>
Size is provided as a simple check to ensure that the package file has not
been altered.

<tag>ID<item>
See Package::ID.

<tag>Flags<item>
Provides some flags for the PackageFile, see the section on defines.

<tag>mtime<item>
Modification time for the file at time of cache generation.

</taglist>

 <!-- }}} -->
<!-- Version {{{ -->
<!-- ===================================================================== -->
<sect>Version
<p>
This contains the information for a single version of a package. This is a
singly linked list based from Package.VersionList.

<p>
The version list is always sorted from highest version to lowest version by
the generator. Also there may not be any duplicate entries in the list (same
VerStr).

<example>
   struct Version
   {
      unsigned long VerStr;            // Stringtable
      unsigned long Section;           // StringTable (StringItem)

      // Lists
      unsigned long FileList;          // VerFile
      unsigned long NextVer;           // Version
      unsigned long DependsList;       // Dependency
      unsigned long ParentPkg;         // Package
      unsigned long ProvidesList;      // Provides

      unsigned long Size;
      unsigned long InstalledSize;
      unsigned short ID;
      unsigned char Priority;
   };
</example>
<taglist>

<tag>VerStr<item>
This is the complete version string.

<tag>FileList<item>
References all the PackageFiles that this version came out of. FileList
can be used to determine what distribution(s) the Version applies to. If
FileList is 0 then this is a blank version. The structure should also have
a 0 in all other fields excluding VerStr and possibly NextVer.

<tag>Section<item>
This string indicates which section it is part of. The string should be
contained in the StringItem list.

<tag>NextVer<item>
Next step in the linked list.

<tag>DependsList<item>
This is the base of the dependency list.

<tag>ParentPkg<item>
This links the version to the owning package, allowing reverse dependencies
to determine the package.

<tag>ProvidesList<item>
Head of the linked list of Provides::NextPkgProv, forward provides.

<tag>Size
<tag>InstalledSize<item>
Size is the archive size for this version. For Debian this is the size of
the .deb file. InstalledSize is the uncompressed size for this version.

<tag>ID<item>
See Package::ID.

<tag>Priority<item>
This is the parsed priority value of the package.
</taglist>
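<p>
Walking a version list is a plain linked-list traversal over NextVer. The
following sketch collects the version strings of one package and can skip
blank versions (FileList of 0). ListVersions is a hypothetical name, the
Version structure is abbreviated, and index 0 is assumed to end the list.

```cpp
#include <cassert>
#include <string>
#include <vector>

struct Version                   // abbreviated to the fields used here
{
   unsigned long VerStr;         // byte offset of the version string
   unsigned long FileList;       // 0 marks a blank (place-holder) version
   unsigned long NextVer;        // next lower version of the same package
};

// Collect the version strings reachable from Package.VersionList. The
// generator keeps the list sorted, so the result runs highest to lowest.
std::vector<std::string> ListVersions(const Version *Vers,
                                      const char *Strings,
                                      unsigned long VersionList,
                                      bool SkipBlank)
{
   assert(Vers != nullptr && Strings != nullptr);
   std::vector<std::string> Out;
   for (unsigned long I = VersionList; I != 0; I = Vers[I].NextVer)
   {
      if (SkipBlank && Vers[I].FileList == 0)
         continue;
      Out.push_back(Strings + Vers[I].VerStr);
   }
   return Out;
}
```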

 <!-- }}} -->
<!-- Dependency {{{ -->
<!-- ===================================================================== -->
<sect>Dependency
<p>
Dependency contains the information for a single dependency record. The records
are split up like this to ease processing by the client. The base of the
linked list is Version.DependsList. All forms of dependencies are recorded
here including Conflicts, Suggests and Recommends.

<p>
Multiple depends on the same package must be grouped together in
the Dependency lists. Clients should assume this is always true.

<example>
   struct Dependency
   {
      unsigned long Version;         // Stringtable
      unsigned long Package;         // Package
      unsigned long NextDepends;     // Dependency
      unsigned long NextRevDepends;  // Reverse dependency linking
      unsigned long ParentVer;       // Upwards parent version link

      // Specific types of depends
      unsigned char Type;
      unsigned char CompareOp;
      unsigned short ID;
   };
</example>
<taglist>
<tag>Version<item>
The string form of the version that the dependency is applied against.

<tag>Package<item>
The index of the package this depends applies to. If the package
does not already exist when the dependency is inserted a blank one (no
version records) should be created.

<tag>NextDepends<item>
Linked list based off a Version structure of all the dependencies in that
version.

<tag>NextRevDepends<item>
Reverse dependency linking, based off a Package structure. This linked list
is a list of all packages that have a depends line for a given package.

<tag>ParentVer<item>
Parent version linking, allows the reverse dependency list to link
back to the version and package that the dependency is for.

<tag>Type<item>
Describes whether it is depends, predepends, recommends, suggests, etc.

<tag>CompareOp<item>
Describes the comparison operator specified on the depends line. If the high
bit is set then it is a logical or with the previous record.

<tag>ID<item>
See Package::ID.

</taglist>
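<p>
The reverse dependency links make it cheap to answer "which packages depend
on this one". A sketch of that walk follows. WhoDependsOn is a hypothetical
name, the structures are abbreviated to the fields involved, and index 0 is
assumed to end the list.

```cpp
#include <cassert>
#include <string>
#include <vector>

struct Package    { unsigned long Name; unsigned long RevDepends; };
struct Version    { unsigned long ParentPkg; };
struct Dependency { unsigned long NextRevDepends; unsigned long ParentVer; };

// Follow Package.RevDepends, and for each record climb through ParentVer
// and ParentPkg to the name of the package that declared the dependency.
std::vector<std::string> WhoDependsOn(const Package *Pkgs,
                                      const Version *Vers,
                                      const Dependency *Deps,
                                      const char *Strings,
                                      unsigned long Target)
{
   assert(Pkgs != nullptr && Vers != nullptr && Deps != nullptr);
   std::vector<std::string> Out;
   for (unsigned long D = Pkgs[Target].RevDepends; D != 0;
        D = Deps[D].NextRevDepends)
   {
      const Version &V = Vers[Deps[D].ParentVer];
      Out.push_back(Strings + Pkgs[V.ParentPkg].Name);
   }
   return Out;
}
```

A package with several versions that each carry the dependency would appear
once per version in this sketch; a caller could de-duplicate the result.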

 <!-- }}} -->
<!-- Provides {{{ -->
<!-- ===================================================================== -->
<sect>Provides
<p>
Provides handles virtual packages. When a Provides: line is encountered
a new provides record is added associating the package with a virtual
package name. The provides structures are linked off the package structures.
This simplifies the analysis of dependencies and other aspects. A provides
refers to a specific version of a specific package, not all versions need to
provide that provides.

<p>
There is a linked list of provided package names started from each
version that provides packages. This is the forwards provides mechanism.
<example>
   struct Provides
   {
      unsigned long ParentPkg;        // Package
      unsigned long Version;          // Version
      unsigned long ProvideVersion;   // Stringtable
      unsigned long NextProvides;     // Provides
      unsigned long NextPkgProv;      // Provides
   };
</example>
<taglist>
<tag>ParentPkg<item>
The index of the package that the head of this linked list is in.
ParentPkg->Name is the name of the provides.

<tag>Version<item>
The index of the version this provide line applies to.

<tag>ProvideVersion<item>
Each provides can specify a version in the provides line. This version allows
dependencies to depend on specific versions of a Provides, as well as allowing
Provides to override existing packages. This is experimental.

<tag>NextProvides<item>
Next link in the singly linked list of provides (based off package).

<tag>NextPkgProv<item>
Next link in the singly linked list of provides for 'Version'.

</taglist>
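<p>
To resolve a virtual package a client can walk the list based at
Package.ProvidesList and report the version behind each record. The sketch
below is one reading of the links described above, not APT code. ProvidersOf
is a hypothetical name and the structures are abbreviated.

```cpp
#include <cassert>
#include <string>
#include <vector>

struct Package  { unsigned long Name; unsigned long ProvidesList; };
struct Version  { unsigned long VerStr; unsigned long ParentPkg; };
struct Provides { unsigned long Version; unsigned long NextProvides; };

// For a (possibly virtual) package, list "name version" of every concrete
// package version that provides it.
std::vector<std::string> ProvidersOf(const Package *Pkgs,
                                     const Version *Vers,
                                     const Provides *Provs,
                                     const char *Strings,
                                     unsigned long VirtualPkg)
{
   assert(Pkgs != nullptr && Vers != nullptr && Provs != nullptr);
   std::vector<std::string> Out;
   for (unsigned long P = Pkgs[VirtualPkg].ProvidesList; P != 0;
        P = Provs[P].NextProvides)
   {
      // Each record points at the providing version, whose ParentPkg is
      // the real package.
      const Version &V = Vers[Provs[P].Version];
      Out.push_back(std::string(Strings + Pkgs[V.ParentPkg].Name) + " " +
                    (Strings + V.VerStr));
   }
   return Out;
}
```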

 <!-- }}} -->
<!-- VerFile {{{ -->
<!-- ===================================================================== -->
<sect>VerFile
<p>
VerFile associates a version with a PackageFile. This allows a full
description of all Versions in all files (and hence all sources) under
consideration.

<example>
   struct pkgCache::VerFile
   {
      unsigned long File;           // PackageFile
      unsigned long NextFile;       // VerFile
      unsigned long Offset;
      unsigned short Size;
   };
</example>
<taglist>
<tag>File<item>
The index of the package file that this version was found in.

<tag>NextFile<item>
The next step in the linked list.

<tag>Offset
<tag>Size<item>
These describe the exact position in the package file for the section from
this version.
</taglist>
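<p>
Offset and Size let a client fetch the full text of a record without
reparsing the package file. The sketch below assumes the package file has
already been read into memory and that Offset is counted in bytes from the
start of the file. ExtractRecord is a hypothetical name.

```cpp
#include <cassert>
#include <string>

struct VerFile
{
   unsigned long File;           // PackageFile
   unsigned long NextFile;       // VerFile
   unsigned long Offset;         // start of the record in the package file
   unsigned short Size;          // length of the record in bytes
};

// Copy the record that VF describes out of the package file contents.
// Returns false when the range does not fit, for instance because the
// package file changed after the cache was generated.
bool ExtractRecord(const std::string &FileData, const VerFile &VF,
                   std::string &Out)
{
   if (VF.Offset > FileData.size() ||
       VF.Size > FileData.size() - VF.Offset)
      return false;
   Out.assign(FileData, VF.Offset, VF.Size);
   return true;
}
```

The bounds check matters because the cache is not authoritative: a stale
VerFile must lead to a rebuild rather than to a read past the end.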

 <!-- }}} -->
<!-- StringItem {{{ -->
<!-- ===================================================================== -->
<sect>StringItem
<p>
StringItem is used for generating single instances of strings. Some things
like Section Name are useful to have as unique tags. It is part of
a linked list based at Header::StringList.
<example>
   struct StringItem
   {
      unsigned long String;        // Stringtable
      unsigned long NextItem;      // StringItem
   };
</example>
<taglist>
<tag>String<item>
The string this refers to.

<tag>NextItem<item>
Next link in the chain.
</taglist>
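<p>
Since each distinct string has exactly one StringItem, two fields hold the
same string exactly when they refer to the same item. Finding the item for a
given text is a linear walk, sketched here with the hypothetical name
FindStringItem and the assumption that index 0 ends the chain.

```cpp
#include <cassert>
#include <cstring>

struct StringItem
{
   unsigned long String;        // byte offset of the string
   unsigned long NextItem;      // next StringItem in the list
};

// Return the index of the StringItem holding Wanted, or 0 if there is none.
unsigned long FindStringItem(const StringItem *Items, const char *Strings,
                             unsigned long StringList, const char *Wanted)
{
   assert(Items != nullptr && Strings != nullptr && Wanted != nullptr);
   for (unsigned long I = StringList; I != 0; I = Items[I].NextItem)
   {
      if (std::strcmp(Strings + Items[I].String, Wanted) == 0)
         return I;
   }
   return 0;
}
```

A generator would call something like this before adding a string and only
allocate a new StringItem when the result is 0.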
 <!-- }}} -->
<!-- StringTable {{{ -->
<!-- ===================================================================== -->
<sect>StringTable
<p>
All strings are simply inlined any place in the file that is natural for the
writer. The client should make no assumptions about the positioning of
strings. All stringtable values are byte offsets from the start of the
file at which a null terminated string begins.
 <!-- }}} -->
<!-- Defines {{{ -->
<!-- ===================================================================== -->
<sect>Defines
<p>
Several structures use variables to indicate things. Here is a list of all
of them.

<sect1>Definitions for Dependency::Type
<p>
<example>
#define pkgDEP_Depends 1
#define pkgDEP_PreDepends 2
#define pkgDEP_Suggests 3
#define pkgDEP_Recommends 4
#define pkgDEP_Conflicts 5
#define pkgDEP_Replaces 6
</example>
</sect1>

<sect1>Definitions for Dependency::CompareOp
<p>
<example>
#define pkgOP_OR 0x10
#define pkgOP_LESSEQ 0x1
#define pkgOP_GREATEREQ 0x2
#define pkgOP_LESS 0x3
#define pkgOP_GREATER 0x4
#define pkgOP_EQUALS 0x5
</example>
The lower 4 bits are used to indicate what operator is being specified and
the upper 4 bits are flags. pkgOP_OR indicates that the next package is
or'd with the current package.
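<p>
Splitting a CompareOp byte into its operator and its or flag might look like
the following sketch. It follows the values listed in this section.
DecodedOp, DecodeOp and OpSymbol are hypothetical names, and the symbols
returned are the ones used in Debian control files.

```cpp
#include <cassert>
#include <string>

const unsigned char pkgOP_OR = 0x10;      // value from the list above

struct DecodedOp
{
   unsigned char Op;     // lower 4 bits: the comparison operator
   bool Or;              // set when pkgOP_OR is present
};

DecodedOp DecodeOp(unsigned char CompareOp)
{
   DecodedOp D;
   D.Op = static_cast<unsigned char>(CompareOp & 0x0F);
   D.Or = (CompareOp & pkgOP_OR) != 0;
   return D;
}

// Map an operator value to the text found on a depends line.
std::string OpSymbol(unsigned char Op)
{
   switch (Op)
   {
      case 0x1: return "<=";     // pkgOP_LESSEQ
      case 0x2: return ">=";     // pkgOP_GREATEREQ
      case 0x3: return "<<";     // pkgOP_LESS
      case 0x4: return ">>";     // pkgOP_GREATER
      case 0x5: return "=";      // pkgOP_EQUALS
      default:  return "";       // no version restriction
   }
}
```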
</sect1>

<sect1>Definitions for Package::SelectedState
<p>
<example>
#define pkgSTATE_Unkown 0
#define pkgSTATE_Install 1
#define pkgSTATE_Hold 2
#define pkgSTATE_DeInstall 3
#define pkgSTATE_Purge 4
</example>
</sect1>

<sect1>Definitions for Package::InstState
<p>
<example>
#define pkgSTATE_Ok 0
#define pkgSTATE_ReInstReq 1
#define pkgSTATE_Hold 2
#define pkgSTATE_HoldReInstReq 3
</example>
</sect1>

<sect1>Definitions for Package::CurrentState
<p>
<example>
#define pkgSTATE_NotInstalled 0
#define pkgSTATE_UnPacked 1
#define pkgSTATE_HalfConfigured 2
#define pkgSTATE_UnInstalled 3
#define pkgSTATE_HalfInstalled 4
#define pkgSTATE_ConfigFiles 5
#define pkgSTATE_Installed 6
</example>
</sect1>

<sect1>Definitions for Package::Flags
<p>
<example>
#define pkgFLAG_Auto (1 << 0)
#define pkgFLAG_New (1 << 1)
#define pkgFLAG_Obsolete (1 << 2)
#define pkgFLAG_Essential (1 << 3)
#define pkgFLAG_ImmediateConf (1 << 4)
</example>
</sect1>

<sect1>Definitions for Version::Priority
<p>
Zero is used for unparsable or absent Priority fields.
<example>
#define pkgPRIO_Important 1
#define pkgPRIO_Required 2
#define pkgPRIO_Standard 3
#define pkgPRIO_Optional 4
#define pkgPRIO_Extra 5
</example>
</sect1>

<sect1>Definitions for PackageFile::Flags
<p>
<example>
#define pkgFLAG_NotSource (1 << 0)
</example>
</sect1>

 <!-- }}} -->

<chapt>Notes on the Generator
<!-- Notes on the Generator {{{ -->
<!-- ===================================================================== -->
<p>
The pkgCache::MergePackageFile function is currently the only generator of
the cache file. It implements a conversion from the normal textual package
file into the cache file.

<p>
The generator assumes any package declaration with a
Status: line is a 'Status of the package' type of package declaration.
A Package with a Target-Version field should also really have a status field.
The processing of a Target-Version field can create a place-holder Version
structure that is empty to refer to the specified version (see Version
for info on what an empty Version looks like). The Target-Version syntax
allows the specification of a specific version and a target distribution.

<p>
Different section names on different versions are supported, but I
do not expect to use them. To simplify the GUI it will merely use the section
in the Package structure. This should be okay as I hope sections do not change
much.

<p>
The generator goes through a number of post processing steps after producing
a disk file. It sorts all of the version lists to be in descending order
and then generates the reverse dependency lists for all of the packages.
ID numbers and count values are also generated in the post processing step.
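<p>
The reverse dependency step can be pictured as one pass over all Dependency
records, pushing each onto the list of the package it names. This is a
sketch of the idea rather than the generator's actual code.
LinkReverseDepends is a hypothetical name, the structures are abbreviated,
and slot 0 of each array is assumed to be reserved as the null index.

```cpp
#include <cassert>
#include <cstddef>

struct Package    { unsigned long RevDepends; };
struct Dependency { unsigned long Package; unsigned long NextRevDepends; };

// Rebuild every Package.RevDepends list from scratch. PkgCount and
// DepCount are array lengths including the reserved slot 0.
void LinkReverseDepends(Package *Pkgs, std::size_t PkgCount,
                        Dependency *Deps, std::size_t DepCount)
{
   assert(Pkgs != nullptr && Deps != nullptr);
   for (std::size_t I = 0; I != PkgCount; ++I)
      Pkgs[I].RevDepends = 0;

   for (std::size_t D = 1; D < DepCount; ++D)
   {
      unsigned long Target = Deps[D].Package;
      if (Target == 0 || Target >= PkgCount)
         continue;               // not linked to any package
      // Push the record on the front of the target's list.
      Deps[D].NextRevDepends = Pkgs[Target].RevDepends;
      Pkgs[Target].RevDepends = static_cast<unsigned long>(D);
   }
}
```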

<p>
It is possible to extend many of the structures in the cache with extra data.
This is done by using the ID member. ID will be a unique number from 0 to
Header->??Count. For example:
<example>
struct MyPkgData {int Item;};
MyPkgData *Data = new MyPkgData[Header->PackageCount];
Data[Package->ID].Item = 0;
</example>
This provides a one way reference between package structures and user data. To
get a two way reference would require a member inside the MyPkgData structure.

<p>
The generator's use of free space pools tends to make the cache file quite
large, and quite full of blank space. This could be fixed with sparse files.

 <!-- }}} -->

<chapt>Future Directions
<!-- Future Directions {{{ -->
<!-- ===================================================================== -->
<p>
One good direction to take the cache file is into a cache directory that
contains many associated caches that cache other important bits of
information. (/var/cache/apt, FHS2)

<p>
Caching of the info/*.list is an excellent place to start. By generating all
the list files into a tree structure and reverse linking them to the package
structures in the main cache file, major speed gains in dpkg might be achieved.

 <!-- }}} -->

</book>