1<!doctype debiandoc PUBLIC "-//DebianDoc//DTD DebianDoc//EN">
2<book>
3<title>dpkg technical manual</title>
4
5<author>Tom Lees <email>tom@lpsg.demon.co.uk</email></author>
6<version>$Id: dpkg-tech.sgml,v 1.3 2003/02/12 15:05:45 doogie Exp $</version>
7
8<abstract>
9This document describes the minimum necessary workings for the APT dselect
10replacement. It gives an overall specification of what its external interface
11must look like for compatibility, and also gives details of some internal
12quirks.
13</abstract>
14
15<copyright>
16Copyright &copy; Tom Lees, 1997.
17<p>
18APT and this document are free software; you can redistribute them and/or
19modify them under the terms of the GNU General Public License as published
20by the Free Software Foundation; either version 2 of the License, or (at your
21option) any later version.
22
23<p>
24For more details, on Debian GNU/Linux systems, see the file
25/usr/share/common-licenses/GPL for the full license.
26</copyright>
27
28<toc sect>
29
30<chapt>Quick summary of dpkg's external interface
31<sect id="control">Control files
32
33<p>
34The basic dpkg package control file supports the following major features:-
35
36<list>
37<item>5 types of dependencies:-
38 <list>
39 <item>Pre-Depends, which must be satisfied before a package may be
40 unpacked
41 <item>Depends, which must be satisfied before a package may be
42 configured
43 <item>Recommends, to specify a package which if not installed may
44 severely limit the usefulness of the package
45 <item>Suggests, to specify a package which may increase the
46 productivity of the package
47 <item>Conflicts, to specify a package which must NOT be installed
48 in order for the package to be configured
49 </list>
50Each of these dependencies can specify a version and a dependency on that
51version, for example "<= 0.5-1", "== 2.7.2-1", etc. The comparators available
52are:-
53 <list>
54 <item>"&lt;&lt;" - less than
55 <item>"&lt;=" - less than or equal to
56 <item>"&gt;&gt;" - greater than
57 <item>"&gt;=" - greater than or equal to
58 <item>"==" - equal to
59 </list>
60<item>The concept of "virtual packages", which many other packages may provide,
61using the Provides mechanism. An example of this is the "httpd" virtual package,
62which all web servers should provide. Virtual package names may be used in
63dependency headers. However, current policy is that virtual packages do not
64support version numbers, so dependencies on virtual packages with versions
65will always fail.
66<item>Several other control fields, such as Package, Version, Description,
67Section, Priority, etc., which are mainly for classification purposes. The
68package name must consist entirely of lowercase characters, plus the characters
69'+', '-', and '.'. Fields can extend across multiple lines - on the second
70and subsequent lines, there is a space at the beginning instead of a field
71name and a ':'. Empty lines must consist of the text " .", which will be
72ignored, as will the initial space for other continuation lines. This feature
73is usually only used in the Description field.
74</list>
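
<p>
As a rough illustration of the continuation-line rule described above, here is
a minimal Python sketch of a parser for a single control stanza. The function
name <tt>parse_control_stanza</tt> is hypothetical and is not part of dpkg; the
sketch assumes one field per line, continuation lines that begin with
whitespace, and " ." standing for an empty line.

```python
def parse_control_stanza(text):
    """Parse one control stanza into a dict mapping field name to value.

    Illustrative only: this follows the rules described in this section,
    not dpkg's actual parser.
    """
    fields = {}
    current = None
    for line in text.splitlines():
        if not line.strip():
            break  # a completely blank line ends the stanza
        if line[0] in " \t":
            if current is None:
                raise ValueError("continuation line before any field")
            body = line[1:]  # the initial space is ignored
            if body.rstrip() == ".":
                body = ""  # " ." stands for an empty line
            fields[current] += "\n" + body
        else:
            name, sep, value = line.partition(":")
            if not sep:
                raise ValueError("missing ':' in %r" % line)
            current = name
            fields[current] = value.strip()
    return fields
```

Multi-line values are joined with newlines here, which is a design choice of
the sketch rather than something dpkg mandates.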
75
76<sect>The dpkg status area
77
78<p>
79The "dpkg status area" is the term used to refer to the directory where dpkg
80keeps its various status files (GNU would have you call it the dpkg shared
81state directory). This is always, on Debian systems, /var/lib/dpkg. However,
82the default directory name should not be hard-coded, but #define'd, so that
83alteration is possible (it is available via configure in dpkg 1.4.0.9 and
84above). Of course, in a library, code should be allowed to override the
85default directory, but the default should be part of the library (so that
86the user may change the dpkg admin dir simply by replacing the library).
87
88<p>
89Dpkg keeps a variety of files in its status area. These are discussed later
90on in this document, but a quick summary of the files is here:-
91
92<list>
93<item>available - this file contains a concatenation of control information
94from all the packages which dpkg knows about. This is updated using the dpkg
95commands "--update-avail &lt;file&gt;", "--merge-avail &lt;file&gt;", and
96"--clear-avail".
97<item>status - this file contains information on the following things for
98every package:-
99 <list>
100 <item>Whether it is installed, not installed, unpacked, removed,
101 failed configuration, or half-installed (deconfigured in
102 favour of another package).
103 <item>Whether it is selected as install, hold, remove, or purge.
104 <item>If it is "ok" (no installation problems), or "not-ok".
105 <item>It usually also contains the section and priority (so that
106 dselect may classify packages not in available)
107 <item>For packages which did not initially appear in the "available"
108 file when they were installed, the other control information
109 for them.
110 </list>
111 <p>
112 The exact format for the "Status:" field is:
113 <example>
114 Status: Want Flag Status
115 </example>
116 Where <var>Want</> may be one of <em>unknown</>, <em>install</>,
117 <em>hold</>, <em>deinstall</>, <em>purge</>. <var>Flag</>
118 may be one of <em>ok</>, <em>reinstreq</>, <em>hold</>,
119 <em>hold-reinstreq</>.
120 <var>Status</> may be one of <em>not-installed</>, <em>unpacked</>,
121 <em>half-configured</>, <em>installed</>, <em>half-installed</>,
122 <em>config-files</>, <em>post-inst-failed</>, <em>removal-failed</>.
123 The states are as follows:-
124 <taglist>
125 <tag>not-installed
126 <item>No files are installed from the package, it has no config files
127 left, it uninstalled cleanly if it ever was installed.
128 <tag>unpacked
129 <item>The basic files have been unpacked (and are listed in
130 /var/lib/dpkg/info/[package].list). There are config files present,
131 but the postinst script has _NOT_ been run.
132 <tag>half-configured
133 <item>The package was installed and unpacked, but the postinst script
134 failed in some way.
135 <tag>installed
136 <item>All files for the package are installed, and the configuration
137 was also successful.
138 <tag>half-installed
139 <item>An attempt was made to remove the package, but there was a failure
140 in the prerm script.
141 <tag>config-files
142 <item>The package was "removed", not "purged". The config files are left,
143 but nothing else.
144 <tag>post-inst-failed
145 <item>Old name for half-configured. Do not use.
146 <tag>removal-failed
147 <item>Old name for half-installed. Do not use.
148 </taglist>
149 The two last items are only left in dpkg for compatibility - they are
150 understood by it, but never written out in this form.
151
152 <p>
153 Please see the dpkg source code, <tt>lib/parshelp.c</tt>,
154 <em>statusinfos</>, <em>eflaginfos</> and <em>wantinfos</> for more
155 details.
156
157<item>info - this directory contains files from the control archive of every
158package currently installed. They are installed with a prefix of "&lt;packagename&gt;.".
159In addition to this, it also contains a file called &lt;package&gt;.list for every
160package, which contains a list of files. Note also that the control file is
161not copied into here; it is instead found as part of status or available.
162<item>methods - this directory is reserved for "method"-specific files - each
163"method" has a subdirectory underneath this directory (or at least, it can
164have). In addition, there is another subdirectory "mnt", where misc.
165filesystems (floppies, CDROMs, etc.) are mounted.
166<item>alternatives - directory used by the "update-alternatives" program. It
167contains one file for each "alternatives" interface, which contains information
168about all the needed symlinked files for each alternative.
169<item>diversions - file used by the "dpkg-divert" program. Each diversion takes
170three lines. The first is the package name (or ":" for user diversion), the
171second the original filename, and the third the diverted filename.
172<item>updates - directory used internally by dpkg. This is discussed later,
173in the section <ref id="updates">.
174<item>parts - temporary directory used by dpkg-split
175</list>
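
<p>
The "Status:" field format given above (Want, Flag, Status) can be checked
with a few lines of code. The following Python sketch is only an illustration:
the names <tt>parse_status</tt>, <tt>WANTS</tt>, <tt>FLAGS</tt>,
<tt>STATES</tt> and <tt>OLD_NAMES</tt> are hypothetical, and the sketch assumes
the three words are separated by whitespace. The old state names are mapped to
their current equivalents on input, as described above.

```python
WANTS = {"unknown", "install", "hold", "deinstall", "purge"}
FLAGS = {"ok", "reinstreq", "hold", "hold-reinstreq"}
STATES = {
    "not-installed", "unpacked", "half-configured", "installed",
    "half-installed", "config-files",
}
# Old names are understood on input but never written out again.
OLD_NAMES = {
    "post-inst-failed": "half-configured",
    "removal-failed": "half-installed",
}


def parse_status(value):
    """Split the value of a Status: field into (want, flag, state)."""
    parts = value.split()
    if len(parts) != 3:
        raise ValueError("expected three words, got %r" % value)
    want, flag, state = parts
    state = OLD_NAMES.get(state, state)
    if want not in WANTS:
        raise ValueError("unknown want value %r" % want)
    if flag not in FLAGS:
        raise ValueError("unknown flag value %r" % flag)
    if state not in STATES:
        raise ValueError("unknown status value %r" % state)
    return want, flag, state
```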
176
177<sect>The dpkg library files
178
179<p>
180These files are installed under /usr/lib/dpkg (usually), but
181/usr/local/lib/dpkg is also a possibility (as Debian policy dictates). Under
182this directory, there is a "methods" subdirectory. The methods subdirectory
183in turn contains any number of subdirectories for each general method
184processor (note that one set of method scripts can be, and is, used for more than
185one of the methods listed under dselect).
186
187<p>
188The following files may be found in each of these subdirectories:-
189
190<list>
191<item>names - One line per method, two-digit priority to appear on menu
192at beginning, followed by a space, the name, and then another space and the
193short description.
194<item>desc.&lt;name&gt; - Contains the long description displayed by dselect
195when the cursor is put over the &lt;name&gt; method.
196<item>setup - Script or program which sets up the initial values to be used
197by this method. Called with first argument as the status area directory
198(/var/lib/dpkg), second argument as the name of the method (as in the directory
199name), and the third argument as the option (as in the names file).
200<item>install - Script/program called when the "install" option of dselect is
201run with this method. Same arguments as for setup.
202<item>update - Script/program called when the "update" option of dselect is
203run. Same arguments as for setup/install.
204</list>
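
<p>
To make the layout of the "names" file concrete, here is a small Python sketch
that splits each line into priority, name and short description. It assumes
exactly the layout described above (two digits, a space, a name without
spaces, a space, then the description); <tt>parse_names</tt> is a hypothetical
helper, and any method names used to try it out would be made up.

```python
import re

# Two-digit priority, a space, the option name, a space, the description.
_NAMES_LINE = re.compile(r"^([0-9]{2}) (\S+) (.*)$")


def parse_names(text):
    """Return a list of (priority, name, description) in file order."""
    entries = []
    for line in text.splitlines():
        if not line.strip():
            continue  # tolerate blank lines (an assumption of this sketch)
        match = _NAMES_LINE.match(line)
        if match is None:
            raise ValueError("malformed names line: %r" % line)
        entries.append((int(match.group(1)), match.group(2), match.group(3)))
    return entries
```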
205
206<sect>The "dpkg" command-line utility
207
208<sect1>"Documented" command-line interfaces
209
210<p>
211As yet unwritten. You can refer to the other manuals for now. See
212<manref name="dpkg" section="8">.
213
214<sect1>Environment variables which dpkg responds to
215
216<p>
217<list>
218<item>DPKG_NO_TSTP - if set to a non-null value, this variable causes dpkg to
219run a child shell process instead of sending itself a SIGTSTP, when the user
220selects to background the dpkg process when it asks about conffiles.
221<item>SHELL - used to determine which shell to run in the case when
222DPKG_NO_TSTP is set.
223<item>CC - used as the C compiler to call to determine the target architecture.
224The default is "gcc".
225<item>PATH - dpkg checks that it can find at least the following files in the
226path when it wants to run package installation scripts, and gives an error if
227it cannot find all of them:-
228 <list>
229 <item>ldconfig
230 <item>start-stop-daemon
231 <item>install-info
232 <item>update-rc.d
233 </list>
234</list>
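
<p>
A front-end that wants to anticipate the PATH check described above could
perform a similar test itself. The Python sketch below is an approximation
under that assumption, not a copy of dpkg's own logic; the function name
<tt>missing_programs</tt> is hypothetical.

```python
import os
import shutil

# The programs dpkg expects to find, as listed above.
REQUIRED_PROGRAMS = (
    "ldconfig",
    "start-stop-daemon",
    "install-info",
    "update-rc.d",
)


def missing_programs(path=None, required=REQUIRED_PROGRAMS):
    """Return the required programs that cannot be found on the path."""
    if path is None:
        path = os.environ.get("PATH", "")
    return [name for name in required
            if shutil.which(name, path=path) is None]
```

An empty result means all four programs were found.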
235
236<sect1>Assertions
237
238<p>
239The dpkg utility itself is required for quite a number of packages, even if
240they have been installed with a tool totally separate from dpkg. The reason for
241this is that some packages, in their pre-installation scripts, check that your
242version of dpkg supports certain features. This was broken from the start, and
243it should have actually been a control file header "Dpkg-requires", or similar.
244What happens is that the configuration scripts will abort or continue according
245to the exit code of a call to dpkg, which will stop them from being wrongly
246configured.
247
248<p>
249These special command-line options, which simply return true or false, are
250all prefixed with "--assert-". Here is a list of them (without the prefix):-
251
252<list>
253<item>support-predepends - Returns success or failure according to whether
254a version of dpkg which supports predepends properly (1.1.0 or above) is
255installed, according to the database.
256<item>working-epoch - Returns success or failure according to whether a version
257of dpkg which supports epochs in versions properly (1.4.0.7 or above) is
258installed, according to the database.
259</list>
260
261<p>
262Both these options check the status database to see what version of the "dpkg"
263package is installed, and check it against a known working version.
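
<p>
From a script's point of view, an assertion is just an exit code. The Python
sketch below shows one way a caller might wrap this; <tt>dpkg_supports</tt> is
a hypothetical name, and the <tt>run</tt> parameter exists only so that the
call can be replaced when dpkg is not available.

```python
import subprocess


def dpkg_supports(feature, run=subprocess.call):
    """Return True if 'dpkg --assert-<feature>' exits with status 0.

    'feature' is the option name without its prefix, for example
    "support-predepends" or "working-epoch".
    """
    return run(["dpkg", "--assert-" + feature]) == 0
```

A pre-installation script would typically abort when this returns false.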
264
265<sect1>--predep-package
266
267<p>
268This strange option is described as follows in the source code:
269
270<example>
271/* Print a single package which:
272 * (a) is the target of one or more relevant predependencies.
273 * (b) has itself no unsatisfied pre-dependencies.
274 * If such a package is present output is the Packages file entry,
275 * which can be massaged as appropriate.
276 * Exit status:
277 * 0 = a package printed, OK
278 * 1 = no suitable package available
279 * 2 = error
280 */
281</example>
282
283<p>
284On further inspection of the source code, it appears that what it does is
285this:-
286
287<list>
288<item>Looks at the packages in the database which are selected as "install",
289and are installed.
290<item>It then looks at the Pre-Depends information for each of these packages
291from the available file. When it finds a package for which any of the
292pre-dependencies are not satisfied, it breaks from the loop through the packages.
293<item>It then looks through the unsatisfied pre-dependencies, and looks for
294packages which would satisfy this pre-dependency, stopping on the first it
295finds. If it finds none, it bombs out with an error.
296<item>It then continues this for every dependency of the initial package.
297</list>
298
299Eventually, it writes out the record of all the packages to satisfy the
300pre-dependencies. This is used by the disk method to make sure that its
301dependency ordering is correct. What happens is that all pre-depending
302packages are first installed, then it runs dpkg -iGROEB on the directory,
303which installs in the order package files are found. Since pre-dependencies
304mean that a package may not even be unpacked unless they are satisfied, it is
305necessary to do this (usually, since all the package files are unpacked in one
306phase, then configured in another, this is not needed).
307
308<chapt>dpkg-deb and .deb file internals
309
310<p>
311This chapter describes the internals of the "dpkg-deb" tool, which is used
312by "dpkg" as a back-end. dpkg-deb has its own tar extraction functions, which
313are the source of many problems, as they do not support long filenames that use
314extension blocks.
315
316<sect>The .deb archive format
317
318<p>
319The main principle of the new-format Debian archive (I won't describe the old
320format - for that have a look at deb-old.5) is that the archive really is
321an archive - as used by "ar" and friends. However, dpkg-deb uses this format
322internally, rather than calling "ar". Inside this archive, there are usually
323the following members:-
324
325<list>
326<item>debian-binary
327<item>control.tar.gz
328<item>data.tar.gz
329</list>
330
331<p>
332The debian-binary member consists simply of the string "2.0", indicating the
333format version. control.tar.gz contains the control files (and scripts), and
334the data.tar.gz contains the actual files to populate the filesystem with.
335Both tarfiles extract straight into the current directory. Information on the
336tar formats can be found in the GNU tar info page. Since dpkg-deb calls
337"tar -cf" to build packages, the Debian packages use the GNU extensions.
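
<p>
Because the outer container is a plain "ar" archive, its members can be listed
without any Debian-specific code. The Python sketch below walks the common ar
layout (an 8-byte global magic, then a 60-byte header per member, with member
data padded to an even length). The function name <tt>list_ar_members</tt> is
hypothetical, and long-name extensions are deliberately not handled.

```python
AR_MAGIC = b"!<arch>\n"


def list_ar_members(data):
    """Return [(name, payload)] for the members of an ar archive.

    Header fields (fixed width, space padded): name 16, mtime 12,
    uid 6, gid 6, mode 8, size 10, then the two-byte terminator.
    """
    if data[:8] != AR_MAGIC:
        raise ValueError("not an ar archive")
    members = []
    pos = 8
    while pos + 60 <= len(data):
        header = data[pos:pos + 60]
        if header[58:60] != b"`\n":
            raise ValueError("bad member header at offset %d" % pos)
        name = header[0:16].decode("ascii").rstrip(" ")
        if name.endswith("/"):
            name = name[:-1]  # some ar implementations terminate names with "/"
        size = int(header[48:58].decode("ascii"))
        start = pos + 60
        members.append((name, data[start:start + size]))
        pos = start + size + (size % 2)  # member data is padded to even length
    return members
```

For a new-format package, the names returned would normally be the three
members listed above.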
338
339<sect>The dpkg-deb command-line
340
341<p>
342dpkg-deb documents itself thoroughly with its '--help' command-line option.
343However, I am including a reference to these for completeness. dpkg-deb
344supports the following options:-
345
346<list>
347<item>--build (-b) &lt;dir&gt; - builds a .deb archive, takes a directory which
348contains all the files as an argument. Note that the directory
349&lt;dir&gt;/DEBIAN will be packed separately into the control archive.
350<item>--contents (-c) &lt;debfile&gt; - Lists the contents of the "data.tar.gz"
351member.
352<item>--control (-e) &lt;debfile&gt; - Extracts the control archive into a
353directory called DEBIAN. Alternatively, with another argument, it will extract
354it into a different directory.
355<item>--info (-I) &lt;debfile&gt; - Prints the contents of the "control" file
356in the control archive to stdout. Alternatively, giving it other arguments will
357cause it to print the contents of those files instead.
358<item>--field (-f) &lt;debfile&gt; &lt;field&gt; ... - Prints any number of
359fields from the "control" file. Giving it extra arguments limits the fields it
360prints to only those specified. With no command-line arguments other than a
361filename, it is equivalent to -I and just the .deb filename.
362<item>--extract (-x) &lt;debfile&gt; &lt;dir&gt; - Extracts the data archive
363of a debian package under the directory &lt;dir&gt;.
364<item>--vextract (-X) &lt;debfile&gt; &lt;dir&gt; - Same as --extract, except
365it is the equivalent of giving tar the '-v' option - it prints the filenames as
366it extracts them.
367<item>--fsys-tarfile &lt;debfile&gt; - This option outputs a gunzip'd version
368of data.tar.gz to stdout.
369<item>--new - sets the archive format to be used to the new Debian format
370<item>--old - sets the archive format to be used to the old Debian format
371<item>--debug - Tells dpkg-deb to produce debugging output
372<item>--nocheck - Tells dpkg-deb not to check the sanity of the control file
373<item>--help (-h) - Gives a help message
374<item>--version - Shows the version number
375<item>--licence/--license (UK/US spellings) - Shows a brief outline of the GPL
376</list>
377
378<sect1>Internal checks used by dpkg-deb when building packages
379
380<p>
381Here is a list of the internal checks used by dpkg-deb when building packages.
382It is in the order they are done.
383
384<list>
385<item>First, the output Debian archive argument, if it is given, is checked
386using stat. If it is a directory, an internal flag is set. This check is only
387made if the archive name is specified explicitly on the command-line. If the
388argument was not given, the default is the directory name, with ".deb"
389appended.
390<item>Next, the control file is checked, unless the --nocheck flag was
391specified on the command-line. dpkg-deb will bomb out if the second argument
392to --build was a directory, and --nocheck was specified. Note that dpkg-deb
393will not be able to determine the name of the package in this case. In the
394control file, the following things are checked:-
395 <list>
396 <item>The package name is checked to see if it contains any invalid
397 characters (see <ref id="control"> for this).
398 <item>The priority field is checked to see if it uses standard values,
399 and user-defined values are warned against. However, note that this
400 check is now redundant, since the control file no longer contains
401 the priority - the changes file now does this.
402 <item>The control file fields are then checked against the standard
403 list of fields which appear in control files, and any "user-defined"
404 fields are reported as warnings.
405 <item>dpkg-deb then checks that the control file contains a valid
406 version number.
407 </list>
408<item>After this, in the case where a directory was specified to build the
409.deb file in, the filename is created as "directory/pkg_ver.deb" or
410"directory/pkg_ver_arch.deb", depending on whether the control file contains
411an architecture field.
412<item>Next, dpkg-deb checks for the &lt;dir&gt;/DEBIAN directory. It complains
413if it doesn't exist, or if it has permissions &lt; 0755, or &gt; 0775.
414<item>It then checks that all the files in this subdir are either symlinks
415or plain files, and have permissions between 0555 and 0775.
416<item>The conffiles file is then checked to see if the filenames are too
417long. Warnings are produced for each that is. After this, it checks that
418the package provides initial copies of each of these conffiles, and that
419they are all plain files.
420</list>
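
<p>
Two of the checks above are simple enough to sketch. The Python fragment below
illustrates the permission test on the DEBIAN directory and the construction
of the default output filename. Both function names are hypothetical, and the
permission test assumes a plain numeric comparison of the mode bits, as the
wording above suggests.

```python
def control_dir_mode_ok(mode):
    """True if the permission bits lie between 0755 and 0775 inclusive."""
    perms = mode & 0o7777  # drop the file-type bits returned by stat
    return 0o755 <= perms <= 0o775


def default_deb_name(directory, package, version, arch=None):
    """Build "directory/pkg_ver.deb" or "directory/pkg_ver_arch.deb"."""
    parts = [package, version]
    if arch:
        parts.append(arch)  # only present if the control file names one
    return "%s/%s.deb" % (directory, "_".join(parts))
```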
421
422<chapt>dpkg internals
423
424<p>
425This chapter describes the internals of dpkg itself. Although the low-level
426formats are quite simple, what dpkg does in certain cases often does not
427make sense.
428
429<sect id="updates">Updates
430
431<p>
432This describes the /var/lib/dpkg/updates directory. The function of this
433directory is somewhat strange, and seems only to be used internally. A function
434called cleanupdates is called whenever the database is scanned. This function
435in turn uses <manref name="scandir" section="3">, to sort the files in this
436directory. Files whose names do not consist entirely of digits are discarded.
437dpkg also causes a fatal error if any of the filenames are different lengths.
438
439<p>
440After having scanned the directory, dpkg in turn parses each file the same way
441it parses the status file (they are sorted by the scandir to be in numerical
442order). After having done this, it then writes the status information back
443to the "status" file, and removes all the "updates" files.
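
<p>
The scanning rules above can be summarised in a short Python sketch. This is
an approximation of the behaviour described here, not a translation of
cleanupdates; the name <tt>scan_updates</tt> is hypothetical.

```python
import os
import re

_ALL_DIGITS = re.compile(r"[0-9]+")


def scan_updates(updates_dir):
    """Return the update file names in the order they should be applied."""
    # Names that do not consist entirely of digits are discarded.
    names = [n for n in os.listdir(updates_dir) if _ALL_DIGITS.fullmatch(n)]
    # Names of differing lengths are treated as a fatal error.
    if len({len(n) for n in names}) > 1:
        raise RuntimeError("update files have names of differing lengths")
    # With equal-length digit strings, lexical order is numerical order.
    return sorted(names)
```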
444
445<p>
446These files are created internally by dpkg's "checkpoint" function, and are
447cleaned up when dpkg exits cleanly.
448
449<p>
450Judging by the use of the updates directory, I would call it a journal. In order
451to efficiently ensure the complete integrity of the status file, dpkg will
452"checkpoint" or journal all of its activities in the updates directory. By
453merging the contents of the updates directory (in order!!) against the
454original status file it can get the precise current state of the system,
455even in the event of a system failure while dpkg is running.
456
457<p>
458The other option would be to sync-rewrite the status file after each
459operation, which would kill performance.
460
461<p>
462It is very important that any program that uses the status file abort if
463the updates directory is not empty! The user should be informed to run dpkg
464manually (what options though??) to correct the situation.
465
466<sect>What happens when dpkg reads the database
467
468<p>
469First, the status file is read. This gives dpkg an initial idea of the packages
470that are there. Next, the updates files are read in, overriding the status
471file, and if necessary, the status file is re-written, and updates files are
472removed. Finally, the available file is read. It is read
473with flags which preclude dpkg from updating any status information from it
474(installed version, etc.), and which tell it to record that the packages
475it reads this time are available, not installed.
476
477<p>
478More information on updates is given above.
479
480<sect>How dpkg compares version numbers
481
482<p>
483Version numbers consist of three parts: the epoch, the upstream version, and
484the Debian revision. Dpkg compares these parts in that order. If the epochs
485are different, it returns immediately, and so on.
486
487<p>
488However, the important part is how it compares the versions which are
489essentially stored as just strings. These are compared in two distinct parts:
490those consisting of numerical characters (which are evaluated, and then
491compared), and those consisting of other characters. When comparing
492non-numerical parts, they are compared as the character values (ASCII), but
493non-alphabetical characters are considered "greater than" alphabetical ones.
494Also note that longer strings (after excluding differences where numerical
495values are equal) are considered "greater than" shorter ones.
496
497<p>
498Here are a few examples of how these rules apply:-
499
500<example>
50115 > 10
5020010 == 10
503
504d.r > dsr
50532.d.r == 0032.d.r
506d.rnr < d.rnrn
507</example>
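
<p>
The comparison rules above can be written down compactly. The Python sketch
below follows the description in this section (digit runs compared by value,
other characters by ASCII with non-letters sorting after letters, longer
strings winning) and reproduces the examples given. It is an illustration of
these rules, not dpkg's actual code, and all function names are hypothetical.

```python
import re


def _order(ch):
    # Letters keep their ASCII value; anything else sorts after all letters.
    return ord(ch) if ch.isalpha() else ord(ch) + 256


def compare_fragment(a, b):
    """Compare two version strings; return -1, 0 or 1."""
    while a or b:
        # First compare the leading non-digit runs, character by character.
        na = re.match(r"\D*", a).group()
        nb = re.match(r"\D*", b).group()
        a, b = a[len(na):], b[len(nb):]
        for ca, cb in zip(na, nb):
            if ca != cb:
                return -1 if _order(ca) < _order(cb) else 1
        if len(na) != len(nb):
            return -1 if len(na) < len(nb) else 1  # the longer run is greater
        # Then compare the following digit runs by numerical value.
        da = re.match(r"\d*", a).group()
        db = re.match(r"\d*", b).group()
        a, b = a[len(da):], b[len(db):]
        ia, ib = int(da or "0"), int(db or "0")
        if ia != ib:
            return -1 if ia < ib else 1
    return 0


def split_version(version):
    """Split "epoch:upstream-revision" into (epoch, upstream, revision)."""
    epoch, sep, rest = version.partition(":")
    if not sep:
        epoch, rest = "0", version
    upstream, sep, revision = rest.rpartition("-")
    if not sep:
        upstream, revision = rest, ""
    return int(epoch), upstream, revision


def compare_versions(a, b):
    """Compare epoch, then upstream version, then Debian revision."""
    ea, ua, ra = split_version(a)
    eb, ub, rb = split_version(b)
    if ea != eb:
        return -1 if ea < eb else 1
    return compare_fragment(ua, ub) or compare_fragment(ra, rb)
```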
508
509</book>