We had a wild mixture of (unsigned) int, long and long long here without
much sense, so this commit adds a few typedefs to get some sense in the
typesystem and ensures that a ID isn't sometimes computed as int, stored
as long and compared with a long long… as this could potentially bite us
later on as the size of the archive only increases over time.
increase hashtable size for packages/groups by factor 5
It also makes the size configureable, so it can be adapted in the future
without the need for an abi break - and even by users…
The increase was long overdue as it gives a >10% decrease in runtime of
e.g. 'apt-get check -s'. Some (useless) benchmark with 69933 groups and
187796 packages without a pre-built cache:
time apt-get check -so APT::Cache-HashTableSize=1 → 20m
time apt-get check -so APT::Cache-HashTableSize=1000 → 6,41s
time apt-get check -so APT::Cache-HashTableSize=2000 → 5,64s (old)
time apt-get check -so APT::Cache-HashTableSize=3000 → 5,30s
time apt-get check -so APT::Cache-HashTableSize=5000 → 5,08s
time apt-get check -so APT::Cache-HashTableSize=6000 → 5,05s
time apt-get check -so APT::Cache-HashTableSize=7000 → 5,02s
time apt-get check -so APT::Cache-HashTableSize=8000 → 5,00s
time apt-get check -so APT::Cache-HashTableSize=9000 → 4,98s
time apt-get check -so APT::Cache-HashTableSize=10000 → 4,96s (new)
time apt-get check -so APT::Cache-HashTableSize=15000 → 4,90s
time apt-get check -so APT::Cache-HashTableSize=20000 → 4,86s
time apt-get check -so APT::Cache-HashTableSize=30000 → 4,77s
time apt-get check -so APT::Cache-HashTableSize=40000 → 4,74s
time apt-get check -so APT::Cache-HashTableSize=50000 → 4,73s
time apt-get check -so APT::Cache-HashTableSize=60000 → 4,71s
The gap increases further for operations which have more package
lookups. Factor 5 was chosen as higher values do not provide any
really significant timing advantage anymore compared to the memory
increase in my testing and there is always the possibility to increase
it now if that changes. (also most users will not have 3 releases and
4 architectures in the cache, so theirs will be much smaller and faster).
The name suggests that it is supposed to substitute a variable with a
value, but we tend to use it in a more liberal replace_all() fashion,
but this breaks if either of the parameters is empty or more importantly
if two "variable" occurrences follow each other directly.
don't send pkg from an unknown architecture via EDSP
APT's cache can include packages from architectures dpkg has no
knowledge about and can therefore not be installed for e.g. to allow
easy lookups. There is no point in telling external solvers about them
though and some of them might even be really talkative about ignoring
them if we do.
In commit 21b3eac8 I promoted the check for installable dependencies to
a pre-install check, which also reverts to a known good candidate (the
installed version) if it fails. This revert was done even for user
requested candidate switches which disabled our Broken detection so that
install requests which are impossible to satisfy do not fail anymore,
but print an (incomplete) solution proposal and then exit successfully.
Michael Vogt [Fri, 30 May 2014 12:47:56 +0000 (14:47 +0200)]
Show unauthenticated warning for source packages as well
This will show the same unauthenticated warning for source packages
as for binary packages and will not download a source package if
it is unauthenticated. This can be overridden with
--allow-unauthenticated
EDSP code uses pipes opened via an FD as sources and later for those
files modification times and filesize are read - but never really used
again. The result we get from FileFd is probably wrong, but as we don't
use it anyway, we just don't fallback if we have nothing to fallback to
Solvers are supposed to exit successfully even if they haven't found a
solution, but a solver which fails drastically (like e.g. segfaults)
should be detected and dealt with accordingly instead of ignored.
if Resolver fails, do not continue even if not broken
This can happen if the request is already a well-formed request all by
itself (e.g. the package has no dependencies), but the resolver found
a reason to not accept it as solution. Our edsp 'dump' solver e.g.
shouldn't be able to trigger install, which it does otherwise.
As outlined in #748355 apt segfaulted if it encountered a loop between a
package pre-depending on a package conflicting with the previous as it
ended up in an endless loop trying to unpack 'the other package'.
In this specific case as an essential package is involved a lot of force
needs to be applied, but can also be caused by 'normal' tight loops and
highlights a problem in how we handle breaks which we want to avoid.
The fix comes in multiple entangled changes:
1. All Smart* calls are guarded with loop detection. Some already had it,
some had parts of it, some did it incorrect, and some didn't even try.
2. temporary removes to avoid a loop (which is done if a loop is
detected) prevent the unpack of this looping package (we tried to unpack
it to avoid the conflict/breaks, but due to a loop we couldn't, so we
remove/deconfigure it instead which means we can't unpack it now)
3. handle conflicts and breaks very similar instead of duplicating most
of the code. The only remaining difference is, as it should:
deconfigure is enough for breaks, for conflicts we need the big hammer
consistently fail if Smart* packagemanager actions fail
These failure conditions come with an error message attached and the
conditions aren't workaroundable (otherwise this would have been done
instead of returning failure), so not erroring out here means that we
execute dpkg later on with a known not-working ordering adding insult
(our own error messages at the end) to injury (dpkg failure).
Michael Vogt [Thu, 22 May 2014 08:49:35 +0000 (10:49 +0200)]
Implement simple by-hash for apt update
This implements a apt update schema that get the indexfiles by the
hash instead of the name. The rational is that updates to the archive
servers/mirrors are not atomic so the client may have the previous
version of the Release file when the server updates to a new
Release file and new Packages/Sources/Translations indexes. By
keeping the files around by their hash we can still get the previous
indexfile without a hashsum mismatch.
add an additional test for arch specific conflicts
In bugreport #747261 I confirmed with this testcase that apt actually
supports the requested architecture-specific conflicts already since
2012 with commit cef094c2ec8214b2783a2ac3aa70cf835381eae1.
The old test only does simulations which are handy to check apt,
this one builds 'real' packages to see if dpkg agrees with us.
Michael Vogt [Thu, 15 May 2014 12:37:33 +0000 (14:37 +0200)]
Never parse Version/Architecture tags in a Translation-$lang file
Version/Architecture information in a Translation-$lang file is
not allowed, so don't try to parse it. This is a fix for a bugreport
where a Translation-en file contained the content of the regular
Packages file (probably due to local FS corruption). This lead to
strange error messages on file download.
add an additional test for arch specific conflicts
In bugreport #747261 I confirmed with this testcase that apt actually
supports the requested architecture-specific conflicts already since
2012 with commit cef094c2ec8214b2783a2ac3aa70cf835381eae1.
The old test only does simulations which are handy to check apt,
this one builds 'real' packages to see if dpkg agrees with us.
The cache heavily depends on the architecture(s) it is build for,
especially if you move from single- to multiarch. Adding a new
architecture to dpkg therefore has to be detected and must invalidate
the cache so that we don't operate on incorrect data.
The incorrect data will prevent us from doing otherwise sensible
actions (it doesn't allow bad things to happen) and the recovery is
simple and automatic in most cases, so this hides pretty well and is
also not as serious as it might sound at first.
Removes the 256 fields limit, deals consistently with spaces littered
all over the place and is even a tiny bit faster than before.
Even comes with a bunch of new tests to validate these claims.
parse and retrieve multiple Descriptions in one record
It seems unlikely for now that proper archives will carry multiple
Description-* stanzas in the Packages (or Translation-*) file, but
sometimes apt eats its own output as shown by the usage of the CD team
and it would be interesting to let apt output multiple translations
e.g. in 'apt-cache show'.
reenable pipelining via hashsum reordering support
Now that methods have the expected hashes available they can check if
the response from the server is what they expected. Pipelining is one of
those areas in which servers can mess up by not supporting it properly,
which forced us to disable it for the time being. Now, we check if
we got a response out of order, which we can not only use to disable
pipelining automatically for the next requests, but we can fix it up
just like the server responded in proper order for the current requests.
To ensure that this little trick works pipelining is only attempt if we
have hashsums for all the files in the chain which in theory reduces the
use of pipelining usage even on the many servers which work properly,
but in practice only the InRelease file (or similar such) will be
requested without a hashsum – and as it is the only file requested in
that stage it can't be pipelined even if we wanted to.
Some minor annoyances remain: The display of the progress we have
doesn't reflect this change, so it looks like the same package gets
downloaded multiple times while others aren't at all. Further more,
partial files are not supported in this recovery as the received data
was appended to the wrong file, so the hashsum doesn't match.
Both seem to be minor enough to reenable pipelining by default until
further notice through to test if it really solves the problem.
This therefore reverts commit 8221431757c775ee875a061b184b5f6f2330f928.
Now that we have all hashes in the acquire system, pass the info down to
the methods, so that it can use it in the request and/or to precheck the
response.
deal with hashes in ftparchive more dynamic as well
Now that libapts acquire system happily passes around hashes and can be
made to support new ones without an ABI break in the future, we can
free ftparchive from all the deprecation warnings the last commit
introduced for it.
The goal here isn't to preserve ABI as we have none to keep here, but to
help avoiding introduction problems of 'new' hashes later as bugs creep
into the copy&paste parts, so short/less of them is good.
It is not very extensible to have the supported Hashes hardcoded
everywhere and especially if it is part of virtual method names.
It is also possible that a method does not support the 'best' hash
(yet), so we might end up not being able to verify a file even though we
have a common subset of supported hashes. And those are just two of the
cases in which it is handy to have a more dynamic selection.
The downside is that this is a MAJOR API break, but the HashStringList
has a string constructor for compatibility, so with a bit of luck the
few frontends playing with the acquire system directly are okay.
Collect all hashes we can get from the source record and put them into a
HashStringList so that 'apt-get source' can use it instead of using
always the MD5sum.
We therefore also deprecate the MD5 struct member in favor of the list.
While at it, the parsing of the Files is enhanced so that records which
miss "Files" (aka MD5 checksums) are still searched for other checksums
as they include just as much data, just not with a nice and catchy name.
APT supports more than just one HashString and even allows to enforce
the usage of a specific hash. This class is intended to help with
storage and passing around of the HashStrings.