more explicit MarkRequired algorithm code (part 2)
As the previous commit, this shouldn't change behavior at all, but
beside being more explicit and perhaps faster its also considerably
shorter (granted, mostly by if0-block elimination).
Piling everything in a single if statement always made my head wobble,
but it hasn't even a benefit as the most common case of a package which
isn't installed passes all of the old if and lands in the non-existent
else-part of the inner if. So beside a subjective cleanup of what goes
on this implementation should also be a bit faster.
write auto-bits before calling dpkg & again after if needed
Writing first means that even in the event of a power-failure the
autobit is saved for future processing instead of "forgotten" so that
the package is treated as manually installed.
In some cases we have to re-run the writing after dpkg is done through
as dpkg can let packages disappear and in such cases apt will move
autobits around (or in that case non-autobits) which we need to store.
if reading of autobit state failed, let write fail
If we can't read the old file we can't just move forward as that would
discard potentially discard old data (especially other fields). We let
it fail only after we are done writing the new file so a user has the
chance to look into and merge the new data (which is otherwise
discarded).
We deploy atomic renames for some files, but these renames also happen
if something about the file failed which isn't really the point of the
exercise…
Avoiding the use of GCC >= 5 stuff lets use go back to 4.8 simplifying
the travis setup again as well as reducing the backport requirements in
general.
if conf unset, don't read / as conf/pref/sources dir
Usually these config options are set to sensible values, but if init
isn't run or the user interferes with configuration clearing or similar
the options could indeed carry an empty value, which will result in
FindDir returning a '/'. That feels kinda wrong, but as a public
interface there isn't much we can do about it and instead make it so
that we get the special file /dev/null back we know how to deal with in
such cases.
Julian noticed on IRC that I fall victim to a lovely false friend by
calling referring to a 'planer' all the time even through these are
machines to e.g. remove splinters from woodwork ("make stuff plane").
The term I meant is written in german in this way (= with a single n)
but in english there are two, aka: 'planner'.
As that is unreleased code switching all instances without any
transitional provisions. Also the reason why its skipped in changelog.
Fix buffer overflow in debListParser::VersionHash()
If a package file is formatted in a way that that no space
follows a deprecated "<", we would reformat it to "<=" and
increase the length of the output by 1, which can break.
Under normal circumstances with "<=" this should not be an
issue.
In 385d9f2f23057bc5808b5e013e77ba16d1c94da4 I implemented the storage of
scenario files based on enabling this by default for EIPP, but I
implemented it first optionally for EDSP to have it independent.
The reasons mentioned in the earlier commit (debugging and bugreports)
obviously apply here, especially as EIPP solutions aren't user approved,
nearly impossible to verify before starting the execution and at the
time of error the scenario has changed already, so that reproducing the
issue becomes hard(er).
Freeing 'Install' for future use as an interface for "dpkg --install",
which is currently not used by any existent planer, so the
implementation of it itself will be delayed until then.
A rather special need option, but the internal planer supports this and
we have a testcase for it & sometimes it is hit (as a bug through). The
option itself mostly serves as a reminder for implementors that they
should be careful with removes and especially temporary removes if they
perform any.
APT has 3 modes: no immediate configuration, all packages are configured
immediately and its default mode of configuring essentials and
pseudo-essentials immediately only. While this seems like a job of
different planers at first, it might be handy to have it as an option,
too, in case a planer (like apts internal one) supports different modes
where the introduction of individual planers would be counter intuitive.
For the order there is no inherent difference between delete or purge,
so we don't tell the planer about this and instead decide in apt if a
package should be purged or not which also allows us to not tell the
planer about rc-only purges as we can trivially do this on our on as
there is no need to plan such purges.
eipp: provide the internal planer as an external one
Testing the current implementation can benefit from being able to be
feed an EIPP request and produce a fully compliant response. It is also
a great test for EIPP in general.
We can trim generation time and size of the EIPP scenario considerable
if we we avoid telling the planers about "uninteresting" packages.
This is one of the simpler but already very effective reductions:
Do not tell planers about versions which are neither installed nor are
to be installed as they have no effect on the plan we don't need to tell
the planer about them. EDSP solvers need to know about all versions for
better choice and error messages, but planers really don't.
The very first step in introducing the "external installation planer
protocol" (short: EIPP) as part of my GSoC2016 project.
The description reads: APT-based tools like apt-get, aptitude, synaptic,
… work with the user to figure out how their system should look like
after they are done installing/removing packages and their dependencies.
The actual installation/removal of packages is done by dpkg with the
constrain that dependencies must be fulfilled at any point in time (e.g.
to run maintainer scripts).
Historically APT has a super micro-management approach to this task
which hasn't aged that well over the years mostly ignoring changes in
dpkg and growing into an unmaintainable mess hardly anyone can debug and
everyone fears to touch – especially as more and more requirements are
tacked onto it like handling cycles and triggers, dealing with
"important" packages first, package sources on removable media, touch
minimal groups to be able to interrupt the process if needed (e.g.
unattended-upgrades) which not only sky-rocket complexity but also can
be mutually exclusive as you e.g. can't have minimal groups and minimal
trigger executions at the same time.
Seen in #828011 if we fail to parse a header field like Last-Modified we
end up interpreting the data as response header for coming requests in
case we don't rotate to a new server in DNS rotation.
In 3bdff17c894d0c3d0f813d358fc45d7a263f3552 we did it for the datetime
parsing, but we use the same style in the parsing for pdiff (where the
size of the file is in the middle of the three fields) so imbueing here
as well is a good idea.
wu-ftpd sends the response without parens, whereas we expect
them.
I did not test the patch, but it should work. I added another
return true if Pos is still npos after the second find to make
sure we don't add npos to the string.
Thanks: Lukasz Stelmach for the initial patch Closes: #420940
Rewritten in 9febc2b238e1e322dce1f94ecbed46d595893b52 for c++ locales
usage and rewritten again in 1d742e01470bba27715a8191c50adde4b39c2f19 to
avoid a currently present stdlibc++6 bug in the std::get_time
implementation. The later implementation uses still stringstreams for
parsing, but forgot to explicitly reset the locale to something sane
(for parsing english dates that is), so date and especially the parsing
of a number is depending on the locale. Turns out, the French (among
others) format their numbers with space as thousand separator so for
some reason the stdlibc++6 thinks its a good idea to interpret the
entire datetime string as a single number instead of realizing that in
"25 Jun …" the later parts can't reasonably be part of that number even
through there are spaces there…
add insecure (and weak) allow-options for sources.list
Weak had no dedicated option before and Insecure and Downgrade were both
global options, which given the effect they all have on security is
rather bad. Setting them for individual repositories only isn't great
but at least slightly better and also more consistent with other
settings for repositories.
ensure filesize of deb is included in the hashes list
Filesize is a silly hash all by itself, but in combination with others
it can be a strong opponent, so ensuring that it is in the list of
hashes and hence checked by the normal course of action the acquire
process takes is a good thing.
add [weak] tag to hash errors to indicate insufficiency
For "Hash Sum mismatch" that info doesn't make a whole lot of
difference, but for the new insufficient info message an indicator that
while this hashes are there and even match, they aren't enough from a
security standpoint.
Downloading and saying "Hash Sum mismatch" isn't very friendly from a
user POV, so with this change we try to detect such cases early on and
report it, preferably before download even started.
forbid insecure repositories by default expect in apt-get
With this commit all APT-based clients default to refusing to work with
unsigned or otherwise insufficently secured repositories. In terms of
apt and apt-get this changes nothing, but it effects all tools using
libapt like aptitude, synaptic or packagekit.
The exception remains apt-get for stretch for now as this might break
too many scripts/usecases too quickly.
The documentation is updated and extended to reflect how to opt out or
in on this behaviour change.
Handling the extra check (and force requirement) for downgrades in
security in our AllowInsecureRepositories checker helps in having this
check everywhere instead of just in the most common place and requiring
a little extra force in such cases is always good.
handle weak-security repositories as unauthenticated
APT can be forced to deal with repositories which have no security
features whatsoever, so just giving up on repositories which "just" fail
our current criteria of good security features is the wrong incentive.
Of course, repositories are better of fixing their setup to provide the
minimum of security features, but sometimes this isn't possible:
Historic repositories for example which do not change (anymore).
That also fixes problem with repositories which are marked as trusted,
but are providing only weak security features which would fail the
parsing of the Release file.
run update post-invokes even on (partial) failures
Unsecure repositories result in error messages by default which causes
the acquire run to fail hard, but non-failing repositories are still
updated just like in the slightly less hard-failures which got this
behaviour in 35664152e47a1d4d712fd52e0f0a2dc8ed359d32.
Dominic Benson [Mon, 20 Jun 2016 12:47:46 +0000 (13:47 +0100)]
Check for cached hash entries to determine which (if any) hash types
need to be generated for the current file.
In 1.0.9, each hash type was handled by a separate method, each of
which checked the cache. It looks like when these code paths were
unified (in a311fb96b84757ef8628e6a754232614a53b7891) the cache
checks were not incorporated into the new method.
implement and document DIRECT for auto-detect-proxy
There is a subtile difference between an empty setting and "DIRECT" in
the configuration as the later overrides the generic settings while the
earlier does not. Also, non-zero exitcodes should really be reported as
an error rather than silently discarded.
Commands are probably better of always having output through as the
fall through to the generic proxy settings is likely not intended. As
documenting and implementing this more consistently is kind of a
regression through, it is split off into the next commit.
avoid std::get_time usage to sidestep libstdc++6 bug
As reported upstream in
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71556
the implementation of std::get_time is currently not as accepting as
strptime is, especially in how hours should be formatted.
Just reverting 9febc2b238e1e322dce1f94ecbed46d595893b52 would be
possible, but then we would reopen the problems fixed by it, so instead
I opted here for a rewrite of the parsing logic which makes this method
a lot longer, but at least it provides the same benefits as the rewrite
in std::get_time was intended to give us and decouples us from the fix
of the issue in the standard library implementation of GCC.
merge sources.list lines based on Release filename
Merging by URI means that having sources lines with different URI
methods results in 'strange' warning and error messages, which aren't
very friendly from a user point of view as not encoding the method in
the filename is effectivly an implementation detail.
Merging by filename removes these messages and makes everything "work"
even if it isn't working the way it is configured as the indexes aren't
acquired over the method given, but over the first method for this
release file (which argueably is an implementation detail stemming from
the filename encoding, too).
So either direction isn't perfectly "right", but personally I prefer
"magic" over strange error messages (and doing a full-circle detection
of this with its own messages which would need to be translated feels
like way too much effort for dubious gain).
http: don't hang on redirect with length + connection close
Most servers who close the connection do not send a content-length as
this is redundant information usually, but some might and while testing
with our server and with 'aptwebserver::response-header::Connection' set
to 'close' I noticed that http hangs after a redirect in such cases, so
if we have the information, just use it instead of discarding it.
We usually use absolute paths to specific the location of dpkg, apt-key
and the like, but there is nothing wrong with using just the command
name and instead let exec(3) make the lookup in PATH.
We had a wild mixture before, so opting for the more accepting option
out of the two seems about right especially as it makes no difference in
the default case as apt uses absolute paths.
We had an old FIXME saying that it is probably pointless to do this if
we limit by length of the commandline already and I completely agree.
The splitting is bad enough if it must be done, so we should only do it
if we have to (as in absolute length of commandline) and, but that is
just a remark, it is unlikely that we ever have/had a call triggering
this as the default value was ~32000 items…
don't explicitly configure the last round of packages
We end our operation by calling "dpkg --configure -a", so instead of
running a (big) configure run with all packages mentioned explicitly
before this, we simply skip them and let them be handled by this call
implicitly.
There isn't really an observeable gain to be had here from a speed
point, but it helps in avoiding an (uncommon) problem of having a too
long commandline passed to dpkg, which we would split up (probably
incorrectly).
Most (if not all) solvers should be able to run perfectly fine without
root privileges as they get the entire state they are supposed to work
on via stdin and do not perform any action directly, but just pass
suggestions on via stdout.
The new default is to run them all as _apt hence, but each solver can
configure another user if it chooses/must. The security benefits are
minimal at best, but it helps preventing silly mistakes (see 35f3ed061f10a25a3fb28bc988fddbb976344c4d) and that is always good.
Note that our 'apt' and 'dump' solver already dropped privileges if they
had them.
It wasn't noticeable before, but now with the (optional) logging it can
be observed that we decide in the internal path two times if an internal
or external solver should be used (and hence with logging, it is
attempted twice), so if we are in the internal path call the internal
resolver directly, which means those internal methods need to be public
– but we can hide them based on the symbol at least.
edsp: optionally store a compressed copy of the last scenario
For bugreports and co it could be handy to have the scenario and all the
settings used in it around later for inspection for EDSP like protocols.
EDSP might not be the most interesting as the user can still interrupt
the process before the solution is applied and users tend to have an
opinion on the "rightness" of a solution, so it is disabled by default.
EDSP(-like) protocols are one-shot processes working on data which
exists only as long as they run (as they get feed via a pipe), so trying
to write a cache for it is pretty pointless, especially as it will
usually fail as the cache files tend to be owned by root, but the
process is run as a unpriviledged user (either _apt if called by root,
otherwise the user of the caller).
So this was in fact only observeable with our testcases which run as
non-root and the worst which happens is that a valid cache is overridden
with an invalid one which the next run will detect and not use.
With have better ways to compare, manipulate and work with strings, so
use it instead of counting string length by hand with is a wonder it
hasn't failed yet. Ignoreable from a changelog perspective as there is
no behaviour change.
The classes are all marked as hidden, so changing them is no problem ABI
wise and will help with introducing protocols similar to EDSP.
The change has no observeable behavior difference, its just code
juggling.
The script takes the version from the changelog, but if it lacks behind
and the symbols file already includes symbols tagged for the next
version the helper prints incorrect lines as NEW for these symbols, but
ideally it shouldn't print them at all as the symbol is already dealt
with.
edsp: use a stanza based interface for solution writing
EDSP had a WriteSolution method to write out the entire solution based
on the inspection of a given pkgDepCache, but that is rather inflexible
both for EDSP itself and for other EDSP like-protocols. It seems better
to use a smaller scope in printing just a single stanza based on a given
version as there is more reuse potential.
Currently an EDSP solver gets send basically all versions which means
the absolute count is the same, but that might not be true forever (and
with the skipping of rc-only versions it kinda is already) and even if
it were true, segfaulting on bad input seems wrong.
apt-key: change to / before find to satisfy its CWD needs
First seen on hurd, but easily reproducible on all systems by removing
the 'execution' bit from the current working directory and watching some
tests (mostly the no-output expecting tests) fail due to find printing:
"find: Failed to restore initial working directory: …"
Samuel Thibault says in the bugreport:
| To do its work, find first records the $PWD, then goes to
| /etc/apt/trusted.gpg.d/ to find the files, and then goes back to $PWD.
|
| On Linux, getting $PWD from the 700 directory happens to work by luck
| (POSIX says that getcwd can return [EACCES]: Search permission was denied
| for the current directory, or read or search permission was denied for a
| directory above the current directory in the file hierarchy). And going
| back to $PWD fails, and thus find returns 1, but at least it emitted its
| output.
|
| On Hurd, getting $PWD from the 700 directory fails, and find thus aborts
| immediately, without emitting any output, and thus no keyring is found.
|
| So, to summarize, the issue is that since apt-get update runs find as a
| non-root user, running it from a 700 directory breaks find.
Solved as suggested by changing to '/' before running find, with some
paranoia extra care taking to ensure the paths we give to find are really
absolute paths first (they really should, but TMPDIR=. or a similar
Dir::Etc::trustedparts setting could exist somewhere in the wild).
The commit takes also the opportunity to make these lines slightly less
error ignoring and the two find calls using (mostly) the same parameters.
Thanks: Samuel Thibault for 'finding' the culprit! Closes: 826043
ignore std::locale exeception on non-existent "" locale
In 8b79c94af7f7cf2e5e5342294bc6e5a908cacabf changing to usage of C++ way
of setting the locale causes us to be terminated in case of usage of an
ungenerated locale as LC_ALL (or similar) – but we don't want to fail
here, we just want to carry on as before with setlocale which we call in
that case just for good measure.
This reduces the number of symbols by about 10%. Unfortunately,
it does not seem to cover all the weird std::vector and friend
template expansions.
ABI should not brake due to that change: It was never specified
before whether an inline symbol was exported or not; so no library
could rely on its presence. Instead, the symbols were exported in
each library/program needing it and and then merged into a common
one by the dynamic linker.
Also update the symbol files to account for the removed symbols.
try to detect sudo spawned root-shell in prefixing
It is a try as the we need to inspect SUDO_COMMAND which could be
anything – apt, apt-get, in /usr/bin, in a $DPKG_ROOT "chroot", build
from source, aliases, …
The best we can do is look if the SHELL variable is equal to the
SUDO_COMMAND which would mean a shell was invoked. That isn't fail-safe
if different shells are involved as sub-shells have the tendency of not
overriding the SHELL so a bash started from within zsh can happily
pretend to be still zsh, so we could have a look at /etc/shells for a
list, but oh well, we have to stop somewhere I guess.
This sudo-prefixing feature is a gimmick after all.
The std::put_time and std::get_time introduced in 9febc2b238e1e322dce1f94ecbed46d595893b52 are part of C++11, but not
implemented in GCC until version 5. std::put_time could actually be
worked around via using the facets put() directly, but get() isn't
implemented so that doesn't really help.
We require various tools from wily (which also means we can't build apt
on Debian stable) already, so requiring gcc-5 is just one more instead
of a big step [and an ignoreable change for changelog anyhow].
It also helps in testing what will actually be used (in terms of the
c++11 std ABI) instead of the old ABI.
We use a wild mixture of C and C++ ways of generating output, so having
a consistent world-view in both styles sounds like a good idea and
should help in preventing regressions.
accept only the expected UTC timezones in date parsing
HTTP/1.1 hardcodes GMT (RFC 7231 §7.1.1.1) and what is good enough for the
internet must be good enough for us™ as we reuse the implementation
internally to parse (most) dates we encounter in various places like the
Release files with their Date and Valid-Until header fields.
Implementing a fully timezone aware parser just feels too hard for no
effective benefit as it would take 5+ years (= until LTS's are out of
fashion) until a repository could use non-UTC dates and expect it to
work. Not counting non-apt implementations which might or might not
only want to encounter UTC here as well.
As a bonus, this eliminates the use of an instance of setlocale in
libapt.
Setting the C++ locale via std::locale::global(std::locale("")); which
would otherwise default to the default C locale (aka: unaffected by
setlocale) effects the formatting of numeric types in IO streams, which
for output for humans is perfectly sensible, but breaks our many text
interfaces used and parsed by us and others without expecting the
numbers to be formatted.
libapt allows to configure compressors to be used by its system via
configuration implemented in 03bef78461c6f443187b60799402624326843396,
but that was never really documented and also only partly working, which
also explains why the tests weren't using it…
The debian/rules file tries to guess in which directory it is supposed
to be building, but that guess is always ./build – if it wasn't it
would fail later as not all rules take alternatives into acount.
So, as this is clearly not used lets remove this complexity instead of
fixing it up.
The code moving in eb1f04dda07c2b69549ad9fd793cca0e91841b3e
moved the acquire stuff above the simulation exit, so before getting
locks (and creating/chmod directories) we should be checking if we
should actually really do it…