It still doesn't reflect the size the cache has on the disk compared to
what is given as total size (90 vs 103 MB), but by counting all structs
in we are at least a bit closer to the reality.
A version belongs to a section and has hence a section member of its
own. A package on the other hand can have multiple versions from
different sections. This was "solved" by using the section which was
parsed first as order of sources.list defines, but that is obviously a
horribly unpredictable thing.
We therefore directly remove this struct member to free some space and
mark the access method as deprecated, which is told to return the
section of the 'newest' known version, which is at least predictable,
but possible not what it returned before – but nobody knows.
Users are way better of with the Section() as returned by the version
they are dealing with. It is likely the same for all versions of a
package, but in the few cases it isn't, it is important (like packages
moving from main/* to contrib/* or into oldlibs …).
We had a wild mixture of (unsigned) int, long and long long here without
much sense, so this commit adds a few typedefs to get some sense in the
typesystem and ensures that a ID isn't sometimes computed as int, stored
as long and compared with a long long… as this could potentially bite us
later on as the size of the archive only increases over time.
increase hashtable size for packages/groups by factor 5
It also makes the size configureable, so it can be adapted in the future
without the need for an abi break - and even by users…
The increase was long overdue as it gives a >10% decrease in runtime of
e.g. 'apt-get check -s'. Some (useless) benchmark with 69933 groups and
187796 packages without a pre-built cache:
time apt-get check -so APT::Cache-HashTableSize=1 → 20m
time apt-get check -so APT::Cache-HashTableSize=1000 → 6,41s
time apt-get check -so APT::Cache-HashTableSize=2000 → 5,64s (old)
time apt-get check -so APT::Cache-HashTableSize=3000 → 5,30s
time apt-get check -so APT::Cache-HashTableSize=5000 → 5,08s
time apt-get check -so APT::Cache-HashTableSize=6000 → 5,05s
time apt-get check -so APT::Cache-HashTableSize=7000 → 5,02s
time apt-get check -so APT::Cache-HashTableSize=8000 → 5,00s
time apt-get check -so APT::Cache-HashTableSize=9000 → 4,98s
time apt-get check -so APT::Cache-HashTableSize=10000 → 4,96s (new)
time apt-get check -so APT::Cache-HashTableSize=15000 → 4,90s
time apt-get check -so APT::Cache-HashTableSize=20000 → 4,86s
time apt-get check -so APT::Cache-HashTableSize=30000 → 4,77s
time apt-get check -so APT::Cache-HashTableSize=40000 → 4,74s
time apt-get check -so APT::Cache-HashTableSize=50000 → 4,73s
time apt-get check -so APT::Cache-HashTableSize=60000 → 4,71s
The gap increases further for operations which have more package
lookups. Factor 5 was chosen as higher values do not provide any
really significant timing advantage anymore compared to the memory
increase in my testing and there is always the possibility to increase
it now if that changes. (also most users will not have 3 releases and
4 architectures in the cache, so theirs will be much smaller and faster).
The name suggests that it is supposed to substitute a variable with a
value, but we tend to use it in a more liberal replace_all() fashion,
but this breaks if either of the parameters is empty or more importantly
if two "variable" occurrences follow each other directly.
don't send pkg from an unknown architecture via EDSP
APT's cache can include packages from architectures dpkg has no
knowledge about and can therefore not be installed for e.g. to allow
easy lookups. There is no point in telling external solvers about them
though and some of them might even be really talkative about ignoring
them if we do.
In commit 21b3eac8 I promoted the check for installable dependencies to
a pre-install check, which also reverts to a known good candidate (the
installed version) if it fails. This revert was done even for user
requested candidate switches which disabled our Broken detection so that
install requests which are impossible to satisfy do not fail anymore,
but print an (incomplete) solution proposal and then exit successfully.
Michael Vogt [Fri, 30 May 2014 12:47:56 +0000 (14:47 +0200)]
Show unauthenticated warning for source packages as well
This will show the same unauthenticated warning for source packages
as for binary packages and will not download a source package if
it is unauthenticated. This can be overridden with
--allow-unauthenticated
EDSP code uses pipes opened via an FD as sources and later for those
files modification times and filesize are read - but never really used
again. The result we get from FileFd is probably wrong, but as we don't
use it anyway, we just don't fallback if we have nothing to fallback to
Solvers are supposed to exit successfully even if they haven't found a
solution, but a solver which fails drastically (like e.g. segfaults)
should be detected and dealt with accordingly instead of ignored.
if Resolver fails, do not continue even if not broken
This can happen if the request is already a well-formed request all by
itself (e.g. the package has no dependencies), but the resolver found
a reason to not accept it as solution. Our edsp 'dump' solver e.g.
shouldn't be able to trigger install, which it does otherwise.
As outlined in #748355 apt segfaulted if it encountered a loop between a
package pre-depending on a package conflicting with the previous as it
ended up in an endless loop trying to unpack 'the other package'.
In this specific case as an essential package is involved a lot of force
needs to be applied, but can also be caused by 'normal' tight loops and
highlights a problem in how we handle breaks which we want to avoid.
The fix comes in multiple entangled changes:
1. All Smart* calls are guarded with loop detection. Some already had it,
some had parts of it, some did it incorrect, and some didn't even try.
2. temporary removes to avoid a loop (which is done if a loop is
detected) prevent the unpack of this looping package (we tried to unpack
it to avoid the conflict/breaks, but due to a loop we couldn't, so we
remove/deconfigure it instead which means we can't unpack it now)
3. handle conflicts and breaks very similar instead of duplicating most
of the code. The only remaining difference is, as it should:
deconfigure is enough for breaks, for conflicts we need the big hammer
consistently fail if Smart* packagemanager actions fail
These failure conditions come with an error message attached and the
conditions aren't workaroundable (otherwise this would have been done
instead of returning failure), so not erroring out here means that we
execute dpkg later on with a known not-working ordering adding insult
(our own error messages at the end) to injury (dpkg failure).
Michael Vogt [Thu, 22 May 2014 08:49:35 +0000 (10:49 +0200)]
Implement simple by-hash for apt update
This implements a apt update schema that get the indexfiles by the
hash instead of the name. The rational is that updates to the archive
servers/mirrors are not atomic so the client may have the previous
version of the Release file when the server updates to a new
Release file and new Packages/Sources/Translations indexes. By
keeping the files around by their hash we can still get the previous
indexfile without a hashsum mismatch.
add an additional test for arch specific conflicts
In bugreport #747261 I confirmed with this testcase that apt actually
supports the requested architecture-specific conflicts already since
2012 with commit cef094c2ec8214b2783a2ac3aa70cf835381eae1.
The old test only does simulations which are handy to check apt,
this one builds 'real' packages to see if dpkg agrees with us.
Michael Vogt [Thu, 15 May 2014 12:37:33 +0000 (14:37 +0200)]
Never parse Version/Architecture tags in a Translation-$lang file
Version/Architecture information in a Translation-$lang file is
not allowed, so don't try to parse it. This is a fix for a bugreport
where a Translation-en file contained the content of the regular
Packages file (probably due to local FS corruption). This lead to
strange error messages on file download.
add an additional test for arch specific conflicts
In bugreport #747261 I confirmed with this testcase that apt actually
supports the requested architecture-specific conflicts already since
2012 with commit cef094c2ec8214b2783a2ac3aa70cf835381eae1.
The old test only does simulations which are handy to check apt,
this one builds 'real' packages to see if dpkg agrees with us.
The cache heavily depends on the architecture(s) it is build for,
especially if you move from single- to multiarch. Adding a new
architecture to dpkg therefore has to be detected and must invalidate
the cache so that we don't operate on incorrect data.
The incorrect data will prevent us from doing otherwise sensible
actions (it doesn't allow bad things to happen) and the recovery is
simple and automatic in most cases, so this hides pretty well and is
also not as serious as it might sound at first.
Removes the 256 fields limit, deals consistently with spaces littered
all over the place and is even a tiny bit faster than before.
Even comes with a bunch of new tests to validate these claims.
parse and retrieve multiple Descriptions in one record
It seems unlikely for now that proper archives will carry multiple
Description-* stanzas in the Packages (or Translation-*) file, but
sometimes apt eats its own output as shown by the usage of the CD team
and it would be interesting to let apt output multiple translations
e.g. in 'apt-cache show'.
reenable pipelining via hashsum reordering support
Now that methods have the expected hashes available they can check if
the response from the server is what they expected. Pipelining is one of
those areas in which servers can mess up by not supporting it properly,
which forced us to disable it for the time being. Now, we check if
we got a response out of order, which we can not only use to disable
pipelining automatically for the next requests, but we can fix it up
just like the server responded in proper order for the current requests.
To ensure that this little trick works pipelining is only attempt if we
have hashsums for all the files in the chain which in theory reduces the
use of pipelining usage even on the many servers which work properly,
but in practice only the InRelease file (or similar such) will be
requested without a hashsum – and as it is the only file requested in
that stage it can't be pipelined even if we wanted to.
Some minor annoyances remain: The display of the progress we have
doesn't reflect this change, so it looks like the same package gets
downloaded multiple times while others aren't at all. Further more,
partial files are not supported in this recovery as the received data
was appended to the wrong file, so the hashsum doesn't match.
Both seem to be minor enough to reenable pipelining by default until
further notice through to test if it really solves the problem.
This therefore reverts commit 8221431757c775ee875a061b184b5f6f2330f928.
Now that we have all hashes in the acquire system, pass the info down to
the methods, so that it can use it in the request and/or to precheck the
response.