Now (55153bf94ff28a23318e79aa48242244c4d82b3c) that pkgTagFile can be
told to deal with all sorts of comments we can use this mode to parse
dsc (as by catch) and debian/control files properly even in the wake of
multiline fields spliced with comments like Build-Depends.
APT usually deals with perfectly formatted files generated automatically
be other programs – and as it has to parse multiple MBs of such files it
tries to be fast rather than forgiving.
This was always a problem if we reused this parser for files with a
deb822 syntax which are mostly written by hand however, like
apt_preferences or the deb822-style sources as these can include stray
newlines and more importantly comments all over the place.
As a stopgap we had pkgUserTagSection which deals at least with comments
before and after a given stanza, but comments in between weren't really
supported and now that we support parsing debian/control for e.g.
build-dep we face the full comment problem e.g. with comments inbetween
multi-line fields (like Build-Depends).
We can't easily deal with this on the pkgTagSection level as the interface
gives access to 'raw' char-pointers for performance reasons so we would
need to optionally add a buffer here on which we could remove comments
to hand out pointers into this buffer instead. The interface is quite
large already and supports writing stanzas as well, which does not
support comments at all either. So while in future it might make sense
to have a parser setup which deals with and keeps comments in this
commit we opt for the simpler solution for now: We officially declare
that pkgTagSection does not support comments and instead expect the
caller to deal with them, which in our case is pkgTagFile:
pkgTagFile is extended with an additional mode which can deal with
comments by dropping them from the buffer which will later form the
input of pkgTagSection. The actual implementation is slightly more
complex than this sentence suggests at first on one hand to have good
performance and on the other to allow jumping directly to stanzas with
offsets collected in a previous run (like our cache generation does it
for example).
BufferedFileFdPrivate: Make InternalFlush() save against errors
Previously, if flush errored inside the loop, data could have
already been written to the wrapped descriptor without having
been removed from the buffer.
Also try to work around EINTR here. A better solution might be
to have the individual privates detect an interrupt and return
0 in such a case, instead of relying on errno being untouched
in between the syscall and the return from InternalWrite.
A default of -9 is too much for them, this will use 674 MB,
according to the xz manual page. Level -6 on the other hand
only needs 94 MB memory for compression.
This causes autopkgtest failures in the test-compressed-indexes
test, as not enough memory exists to proceed.
Change the other compression levels to 6 as well: The gzip
and bzip2 FileFd backends do not read them, and use their
code's default level which is 6, so do the same for external
methods.
Regression introduced in 8710a36a01c0cb1648926792c2ad05185535558e,
but such fields are unlikely in practice as it is just as simple to not
have a field at all with the same result of not having a value.
allow repositories to forbid arch:all for specific index targets
Debian has a Packages file for arch:all already, but the arch:any files
contain arch:all packages as well, so downloading it would be a total
waste of resources. Getting this solved is on the list of things to do,
but it is also the hardest part – for index targets like Contents the
situation is much easier and less server/client implementations are
involved so we might not want to stall them.
A repository can now declare via:
No-Support-for-Architecture-all: Packages
that even if an arch:all Packages exists, it shouldn't be downloaded, so
that support for Contents files can be added now.
The field uses the name of the target from the apt configuration for
simplicity and is negative by design as this field is intended to be
supported/needed only for a "short" time (one or two Debian releases).
While this commit theoretically supports any target, its expected to
only see "Packages" as a value in reality.
debListParser: ParseDepends: Only query native arch if needed
This makes the code parsing architecture lists slower, but on
the other hand, improves the more generic case of reading
dependencies from Packages files.
Change InternalReadLine to always use buffer.read() return value
This is mostly a documentation issue, as the size we want to
read is always less than or equal to the size of the buffer,
so the return value will be the same as the size argument.
Nonetheless, people wondered about it, and it seems clearer
to just always use the return value.
This further improves our performance, and rred on uncompressed
files now spents 78% of its time in writing. Which means that
we should really look at buffering those.
Use a hardcoded buffer size of 4096 to fix performance
The code uses memmove() to move parts of the buffer to the
front when the buffer is only partially read. By simply
reading one page at a time, the maximum size of bytes that
must be moved has a hard limit, and performance improves:
In one test case, consisting of a 430 MB Contents file,
and a 75K PDiff, applying the PDiff previously took about
48 seconds and now completes in 2 seconds.
Further speed up can be achieved by buffering writes, they
account for about 60% of the run-time now.
We don't need the buffer that often - only for ReadLine - as it is only
occasionally used, so it is actually more efficient to allocate it if
needed instead of statically by default. It also allows the caller to
influence the buffer size instead of hardcoding it.
The default implementation of ReadLine was very naive by just reading
each character one-by-one. That is kinda okay for libraries implementing
compression as they have internal buffers (but still not great), but
while working with files directly or via a pipe as there is no buffer
there so all those reads are in fact system calls.
This commit introduces an internal buffer in the FileFd implementation
which is only used by ReadLine. The more low-level Read and all other
actions remain unbuffered – they just changed to deal with potential
"left-overs" in the buffer correctly.
If we use the library to compress xz, still try to understand and pick
up the arguments we would have used to call xz to figure out which level
the user wants us to use instead of defaulting to level 6 (which is the
default level of xz).
follow dpkg and xz and use CRC64 for xz compression
dpkg switched from CRC32 to CRC64 in 777915108d9d36d022dc4fc4151a615fc95e5032 with the message:
| This is the default CRC used by the xz command-line tool, align with
| it and switch from CRC32 to CRC64. It should provide slightly better
| detection against damaged data, at a negligible speed difference.
shuffle compressor-specific code into private subclasses
This isn't implementing any new features, it is "just" moving code
around from FileFd methods which decided on each call how to handle the
request by including all logic for all possible compressor backends in
the method body to a model in which backend-specifics are implemented in
a FileFdPrivate subclass. This avoids a big chunk of #ifdef's and should
make it a tiny bit more obvious which backend uses which code.
The execution of the idea is slightly uglified by the need to preserve
ABI and API which causes liberal befriending.
The output changes slightly between different versions, which we already
dealt with in the main testcase for apt-key, but there are two more
which do not test both versions explicitly and so still had gpg1 output
to check against as this is the default at the moment.
The presents (even of an empty) secring.gpg is indication enough for
gpg2 to tigger the migration code which not only produces a bunch of
output on each apt-key call, but also takes a while to complete as an
agent needs to be started and all that.
We workaround the first part by forcing the migration to happen always
in a call we forced into silence, but that leaves us with an agent to
start all the time – with a bit of reordering we can make it so that we
do not explicitly create the secring, but let gpg create it if needed,
which prevents the migration from being triggered and we have at least a
bit less of a need for an agent. Changes - even to public only keyrings
- still require one, but such actions are infrequent in comparison to
verification calls, so that should be a net improvement.
apt-key creates internally a script (since ~1.1) which it will call to
avoid dealing with an array of different options in the code itself, but
while writing this script it wraps the values in "", which will cause
the shell to evaluate its content upon execution.
To make 'use' of this either set a absolute gpg command or TMPDIR to
something as interesting as:
"/tmp/This is fü\$\$ing cràzy, \$(man man | head -n1 | cut -d' ' -f1)\$!"
If such paths can be encountered in reality is a different question…
Pino Toscano [Sat, 19 Dec 2015 11:06:53 +0000 (12:06 +0100)]
CopyFile: avoid failing on EOF on some systems
On EOF, ToRead will be 0, which might trigger on some systems (e.g.
on the Hurd) an error due to the invalid byte count passed to write().
The whole loop already checks for ToRead != 0, so perform the writing
step only when there was actual data read.
Pino Toscano [Sat, 19 Dec 2015 11:00:43 +0000 (12:00 +0100)]
CopyFile: fix BufSize to a sane value
Commit e977b8b9234ac5db32f2f0ad7e183139b988340d tries to make BufSize
calculated based on the size of the buffer; the problem is that
std::unique_ptr::size() returns a pointer to the data, so sizeof()
equals to the size of a pointer (later divided by sizeof(char), which
is 1). The result is that the CopyFile copies in chunks of 8 bytes,
which is not exactly ideal...
As solution, declare BufSize in advance, and use its value to allocate
the Buf array.
Pino Toscano [Sat, 19 Dec 2015 11:09:18 +0000 (12:09 +0100)]
Fix FileUtlTest.GetTempDir failure when run as root
Testing /usr as TMPDIR assumes that GetTempDir() cannot use it
because it cannot write to it; this is true for non-root users, but
not so much for root.
Since root can access everything, perform this particular test case
only when not running as root.
support regex and co in 'apt-cache policy $pkg' again
Regression of 1e064088bf7b3e29cd36d30760fb3e4143a1a49a (1.1~exp4) which
moved code around and renamed methods heavily ending up calling the
wrong method matching packagenames only instead of calling the full
array. Most commands work with versions, so this managed to fly under
the radar for quite a while.
show a more descriptive error for weak Release files
If we can't work with the hashes we parsed from the Release file we
display now an error message if the Release file includes only weak
hashes instead of downloading the indexes and failing to verify them
with "Hash Sum mismatch" even through the hashes didn't mismatch (they
were just weak).
If for some (unlikely) reason we have got weak hashes only for
individual targets we will show a warning to this effect (again, befor
downloading and failing the index itself).
redirect which stderr to /dev/null for consistency
The "standard" which (debianutils) has no output whatsoever on stderr,
bash and dash which use this implementation therefore haven't either.
In zsh 'which' is a shell built-in – and has no stderr output either, it
does print an error message on stdout…
So, realistically, a redirection isn't needed at all, but it also can't
hurt (<- I have said that before in this context ->) so why not for
consistency with… well, not with "command -v" as that hasn't an error
message either. Lets say for consistency with my mental image of shell,
as I am still a bit puzzled by zsh's which and now could imagine even
more strange things in other shells.
Reversing the parsing order ensures that we parse weaker hashes (like
SHA1) before we touch newer/stronger hashes (like SHA256) as the weaker
ones will usually be there for a longer time already with data already
present, which we would discard if we start with the strong one first.
The discarding is visible in the debug logs:
File X wasn't in the list for the first parsed hash! (history)
File X wasn't in the list for the first parsed hash! (patches)
which if file X is part of the patch-path means apt will not find a path and
fallback to acquire the whole file instead needlessly.
If file X isn't part of the patch-path that is no problem, so that
effects only the update-call which updates with patches coming from
before and after the addition of a new hash.
With the package names now normalized to lower case, the caches
of affected systems need to be rebuild. Adjust the minor version
to trigger such a rebuild.
Convert package names from Packages files to lower case
dpkg does that when reading package files, so we should do
the same. This only deals with parsing names from binary
package paragraphs, it does not look at source package names
and/or the list of binaries in a dsc file.
In e75e5879 'replace "which" with "command -v" for portability' I missed
that command -v isn't actually required to be available in debian, so
for the 5 files we are using it:
Two (abicheck/run_abi_test & test/integration/framework) are called in
environments were I believe sh is at least dash or 'better' as the first
one is "interactive" for apt developers and the later is sourced by ~200
tests in the same directory run by hand and ci-services – for the later
we have pulled some uglier hacks for worser things already, so if there
should actually end up needing something more compatible we will notice
eventually (and the later actually had a command -v call for some time
already and nobody came running).
debian/rules and debian/apt.cron.daily I switched back to which as that
is more or less debian-specific or at least highly non-critical.
That leaves cmdline/apt-key.in with a bunch of calls where I will
implement that functionality in shell as this is relatively short-lived
as it is used to detect wget (for net-update, which Michael wants to
revive and in that process will properly use apt-helper instead of wget)
and to detect gpg vs. gpg2 systems, where the earlier is supposed to go
away in the longrun (or the later, but by replacing the earlier…).
[and this gpg/gpg2 detection is new in sid, so I have some sympathy for
that being a problem now.]