Commit | Line | Data |
---|---|---|
89c4ed63 A |
1 | Requirements for Recursive Caching Resolver |
2 | (a.k.a. Treeshrew, Unbound-C) | |
3 | By W.C.A. Wijngaards, NLnet Labs, October 2006. | |
4 | ||
5 | Contents | |
6 | 1. Introduction | |
7 | 2. History | |
8 | 3. Goals | |
9 | 4. Non-Goals | |
10 | 5. Choices | |
11 | ||
12 | 1. Introduction | |
13 | --------------- | |
14 | This is the requirements document for a DNS name server and aims to | |
15 | document the goals and non-goals of the project. The DNS (the Domain | |
16 | Name System) is a global, replicated database that uses a hierarchical | |
17 | structure for queries. | |
18 | ||
19 | Data in the DNS is stored in Resource Record sets (RR sets), and has a | |
20 | time to live (TTL). During this time the data can be cached. It is | |
21 | thus useful to cache data to speed up future lookups. A server that | |
22 | looks up data in the DNS for clients and caches previous answers to | |
23 | speed up processing is called a caching, recursive nameserver. | |
24 | ||
25 | This project aims to develop such a nameserver in modular components, so | |
26 | that DNSSEC (secure DNS) validation and stub-resolvers (that do not run | |
27 | as a server, but are linked into an application) are also easily possible. | |
28 | ||
29 | The main components are the Validator that validates the security | |
30 | fingerprints on data sets, the Iterator that sends queries to the | |
31 | hierarchical DNS servers that own the data and the Cache that stores | |
32 | data from previous queries. The networking and query management code | |
33 | then interface with the modules to perform the necessary processing. | |
34 | ||
35 | In Section 2 the origins of the Unbound project are documented. Section | |
36 | 3 lists the goals, while Section 4 lists the explicit non-goals of the | |
37 | project. Section 5 discusses choices made during development. | |
38 | ||
39 | ||
40 | 2. History | |
41 | ---------- | |
42 | The Unbound resolver project was started by Bill Manning, David Blacka, and | |
43 | Matt Larson (from the University of California and from Verisign), who | |
44 | created a Java-based prototype resolver called Unbound. The basic | |
45 | design decision of clean modules was carried out in that prototype. | |
46 | ||
47 | The Java prototype worked very well, with contributions from Geoff | |
48 | Sisson and Roy Arends from Nominet. Around 2006 the idea came to create | |
49 | a full-fledged C implementation ready for deployed use. NLnet Labs | |
50 | volunteered to write this implementation. | |
51 | ||
52 | ||
53 | 3. Goals | |
54 | -------- | |
55 | o A validating recursive DNS resolver. | |
56 | o Code diversity in the DNS resolver monoculture. | |
57 | o Drop-in replacement for BIND apart from config. | |
58 | o DNSSEC support. | |
59 | o Fully RFC compliant. | |
60 | o High performance | |
61 | * even with validation. | |
62 | o Used as | |
63 | * stub resolver. | |
64 | * full caching name server. | |
65 | * resolver library. | |
66 | o Elegant design of validator, resolver, cache modules. | |
67 | * provide the ability to pick and choose modules. | |
68 | o Robust. | |
69 | o In C, open source: The BSD license. | |
70 | o Highly portable; targets include modern Unix systems, such as *BSD, | |
71 | Solaris, Linux, and maybe also the Windows platform. | |
72 | o Smallest possible component that does the job. | |
73 | o Stub-zones can be configured (local data or AS112 zones). | |
74 | ||
75 | ||
76 | 4. Non-Goals | |
77 | ------------ | |
78 | o An authoritative name server. | |
79 | o Too many features. | |
80 | ||
81 | ||
82 | 5. Choices | |
83 | ---------- | |
84 | o rfc2181 discourages duplicate RRs in RRsets. unbound does not create | |
85 | duplicates, but when presented with duplicates on the wire from the | |
86 | authoritative servers, it does not perform duplicate removal. | |
87 | It does do some rrsig duplicate removal, in the msgparser, for dnssec qtype | |
88 | rrsig and any, because of special rrsig processing in the msgparser. | |
89 | o The harden-glue feature: when yes, all out-of-zone glue is deleted; when | |
90 | no, out-of-zone glue is used for further resolving. The actual behaviour | |
91 | is more complicated than that, see below. | |
92 | Main points: | |
93 | * rfc2181 trust handling is used. | |
94 | * data is let through only in very specific cases | |
95 | * spoofability remains possible. | |
96 | Not all glue is let through (despite the name of the option). Only glue | |
97 | which is present in a delegation, of type A and AAAA, where the name is | |
98 | present in the NS record in the authority section is let through. | |
99 | The glue that is let through is stored in the cache (marked as 'from the | |
100 | additional section'), and will then be used for sending queries to. It | |
101 | will not be present in the reply to the client (if RD is off). | |
102 | A direct query for that name will attempt to get a msg into the message | |
103 | cache. Since A and AAAA queries are not synthesized by the unbound cache, | |
104 | this query will be (eventually) sent to the authoritative server and its | |
105 | answer will be put in the cache, marked as 'from the answer section', | |
106 | thus replacing the 'from the additional section' data, and this record is | |
107 | returned to the client. | |
108 | The message has a TTL smaller than or equal to the TTL of the answer RR. | |
109 | If the cache memory is low, the answer RR may be dropped, and a glue | |
110 | RR may be inserted, within the message TTL time, so that the spoofed | |
111 | glue is returned to a client. When the message expires, it is refetched and | |
112 | the cached RR is updated with the correct content. | |
113 | The server can be spoofed by getting it to visit a specially prepared | |
114 | domain. This domain then inserts an address for another authoritative | |
115 | server into the cache. When visiting that other domain, this address may | |
116 | then be used to send queries to, and fake answers may be returned. | |
117 | If the other domain is signed by DNSSEC, the fakes will be detected. | |
118 | ||
119 | In summary, the harden glue feature presents a security risk if | |
120 | disabled. Disabling the feature leads to possibly better performance | |
121 | as more glue is present for the recursive service to use. The feature | |
122 | is implemented so as to minimise the security risk, while trying to | |
123 | keep this performance gain. | |
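The glue rule described above (only A and AAAA records whose owner name appears as the target of an NS record in the authority section are let through) can be sketched as follows. This is a minimal illustration in Python, not unbound's C code; the RR tuple and the names norm and allowed_glue are hypothetical and exist only for this example.

```python
from typing import List, NamedTuple


class RR(NamedTuple):
    """Hypothetical, simplified resource record for this sketch."""
    name: str   # owner name
    rtype: str  # record type, e.g. "NS", "A", "AAAA"
    rdata: str  # NS target name, or address text for A/AAAA


def norm(name: str) -> str:
    """Lowercase and drop the trailing dot so names compare equal."""
    return name.rstrip(".").lower()


def allowed_glue(authority: List[RR], additional: List[RR]) -> List[RR]:
    """Keep only A/AAAA records named by an NS record in the authority section."""
    ns_targets = {norm(rr.rdata) for rr in authority if rr.rtype == "NS"}
    return [
        rr for rr in additional
        if rr.rtype in ("A", "AAAA") and norm(rr.name) in ns_targets
    ]
```

Records that pass this filter would still be stored with the lower 'from the additional section' trust, so a later answer-section record for the same name replaces them.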
124 | o The method by which dnssec-lameness is detected is not secure. DNSSEC lame | |
125 | is when a server has the zone in question, but lacks dnssec data, such as | |
126 | signatures. The method to detect dnssec lameness looks at nonvalidated | |
127 | data from the parent of a zone. This can be used, by spoofing the parent, | |
128 | to create a false sense of dnssec-lameness in the child, or a false sense | |
129 | of dnssec-non-lameness in the child. The first results in the server being | |
130 | marked lame, and not used for 900 seconds, and the second will result in a | |
131 | validator failure (SERVFAIL again), when the query is validated later on. | |
132 | ||
133 | Concluding, a spoof of the parent delegation can be used for many cases | |
134 | of denial of service. E.g. a completely different NS set could be returned, | |
135 | or the information withheld. All of these alterations can be caught by | |
136 | the validator if the parent is signed, and result in 900 seconds bogus. | |
137 | The dnssec-lameness detection is used to detect operator failures, | |
138 | before the validator will properly verify the messages. | |
139 | ||
140 | Also for zones for which no chain of trust exists, but a DS is given by the | |
141 | parent, dnssec-lameness detection is enabled. This delivers dnssec to our | |
142 | clients when possible (for client validators). | |
143 | ||
144 | The following issue needs to be resolved: | |
145 | a server that serves both a parent and child zone, where | |
146 | parent is signed, but child is not. The server must not be marked | |
147 | lame for the parent zone, because the child answer is not signed. | |
148 | Instead of a false positive, we want false negatives; failure to | |
149 | detect dnssec-lameness is less of a problem than marking honest | |
150 | servers lame. dnssec-lameness is a config error and deserves the trouble. | |
151 | So, only messages that identify the zone are used to mark the zone | |
152 | lame. The zone is identified by SOA or NS RRsets in the answer/auth. | |
153 | That includes almost all negative responses and also A, AAAA qtypes. | |
154 | That would be most responses from servers. | |
155 | For referrals, delegations that add a single label can be checked to be | |
156 | from their zone, this covers most delegation-centric zones. | |
157 | ||
158 | So possibly, for complicated setups, with multiple (parent-child) zones | |
159 | on a server, dnssec-lameness detection does not work - no dnssec-lameness | |
160 | is detected. Instead the zone that is dnssec-lame becomes bogus. | |
161 | ||
162 | o authority features. | |
163 | This is a recursive server, and authority features are out of scope. | |
164 | However, some authority features are expected in a recursor. Things like | |
165 | localhost, reverse lookup for 127.0.0.1, or blocking AS112 traffic. | |
166 | Also redirection of domain names with fixed data is needed by service | |
167 | providers. Limited support is added specifically to address this. | |
168 | ||
169 | Adding full authority support requires much more code, and more complex | |
170 | maintenance. | |
171 | ||
172 | The limited support allows adding some static data (for localhost and such), | |
173 | and to respond with a fixed rcode (NXDOMAIN) for domains (such as AS112). | |
174 | ||
175 | You can put authority data on a separate server, and set the server in | |
176 | unbound.conf as stub for those zones, this allows clients to access data | |
177 | from the server without making unbound authoritative for the zones. | |
178 | ||
179 | o the access control denies queries before any other processing. | |
180 | This denies queries that are not authoritative, or version.bind, or any. | |
181 | And thus prevents cache-snooping (denied hosts cannot make non-recursive | |
182 | queries and get answers from the cache). | |
183 | ||
184 | o If a client makes a query without RD bit, in the case of a returned | |
185 | message from cache which is: | |
186 | answer section: empty | |
187 | auth section: NS record present, no SOA record, no DS record, | |
188 | maybe NSEC or NSEC3 records present. | |
189 | additional: A records or other relevant records. | |
190 | A SOA record would indicate that this was a NODATA answer. | |
191 | A DS record would indicate a referral. | |
192 | Absence of NS record would indicate a NODATA answer as well. | |
193 | ||
194 | Then the receiver does not know whether this was a referral | |
195 | (with attempt at no-DS proof) or a nodata answer with attempt | |
196 | at no-data proof. It could be determined by attempting to prove | |
197 | either condition; and looking if only one is valid, but both | |
198 | proofs could be valid, or neither could be valid, which creates | |
199 | doubt. This case is validated by unbound as a 'referral' which | |
200 | ascertains that RRSIGs are OK (and not omitted), but does not | |
201 | check NSEC/NSEC3. | |
202 | ||
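The classification rules listed above can be written down as a small decision function. This is only a sketch of the reasoning in the text, assuming the sections are given as lists of record type names; classify_norec_reply is a hypothetical name, and the "ambiguous" result stands for the case that the text says unbound validates as a referral.

```python
def classify_norec_reply(answer_types, authority_types):
    """Classify a cached reply from the record types in its sections."""
    if answer_types:
        return "answer"
    auth = set(authority_types)
    # A SOA record, or the absence of an NS record, indicates NODATA.
    if "SOA" in auth or "NS" not in auth:
        return "nodata"
    # NS together with DS indicates a referral.
    if "DS" in auth:
        return "referral"
    # NS present, no SOA, no DS: cannot tell referral from NODATA.
    return "ambiguous"
```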
203 | o Case preservation | |
204 | Unbound preserves the casing received from authority servers as best | |
205 | as possible. It compresses without case, so case can get lost there. | |
206 | The casing from the query name is used in preference to the casing | |
207 | of the authority server. This is the same as BIND. RFC4343 allows either | |
208 | behaviour. | |
209 | ||
210 | o Denial of service protection | |
211 | If many queries are made, and they are made to names for which the | |
212 | authority servers do not respond, then the requestlist for unbound | |
213 | fills up fast. This results in denial of service for new queries. | |
214 | To combat this the first 50% of the requestlist can run to completion. | |
215 | Queries in the last 50% of the requestlist get at least 200 msec, and | |
216 | once older than that they can be replaced by newer queries (LIFO). | |
217 | When a new query comes in, and a place in the first 50% is available, this | |
218 | is preferred. Otherwise, it can replace older queries out of the last 50%. | |
219 | Thus, even long queries get a 50% chance to be resolved. And many 'short' | |
220 | one or two round-trip resolves can be done in the last 50% of the list. | |
221 | The timeout can be configured. | |
222 | ||
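The admission policy described above can be illustrated with a small model. This is a sketch under stated assumptions, not unbound's implementation: the class RequestList and its method names are hypothetical, time is passed in explicitly as seconds, and the choice to replace the oldest entry of the second half is an assumption made for this example.

```python
class RequestList:
    """Toy model: first half runs to completion, second half is replaceable."""

    def __init__(self, capacity, timeout=0.2):
        self.protected_max = capacity // 2
        self.replaceable_max = capacity - self.protected_max
        self.protected = []    # queries that may run to completion
        self.replaceable = []  # (start_time, name), oldest first
        self.timeout = timeout # minimum lifetime in seconds (200 msec)

    def admit(self, name, now):
        """Try to add a query; return which kind of slot it got."""
        if len(self.protected) < self.protected_max:
            self.protected.append(name)
            return "protected"
        if len(self.replaceable) < self.replaceable_max:
            self.replaceable.append((now, name))
            return "replaceable"
        started, _ = self.replaceable[0]
        if now - started >= self.timeout:
            # The oldest replaceable query has had its minimum time.
            self.replaceable.pop(0)
            self.replaceable.append((now, name))
            return "replaced"
        return "dropped"
```

With a capacity of 4, the first two queries get protected slots, the next two get replaceable slots, and a fifth query is only admitted once the oldest replaceable query is at least 200 msec old.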
223 | o EDNS fallback. Is done according to the EDNS RFC (and update draft-00). | |
224 | Unbound assumes EDNS 0 support for the first query. Then it can detect | |
225 | support (if the server replies) or non-support (on a NOTIMP or FORMERR). | |
226 | Some middleboxes drop EDNS 0 queries, mainly when forwarding, not when | |
227 | routing packets. To detect this, when timeouts keep happening, as the | |
228 | timeout approaches 5-10 seconds, and EDNS status has not been detected yet, | |
229 | a single probe query is sent. This probe has a sub-second timeout, and | |
230 | if the server responds (quickly) without EDNS, this is cached for 15 min. | |
231 | This works very well when detecting an address that you use much - like | |
232 | a forwarder address - which is where the middleboxes need to be detected. | |
233 | Otherwise, it results in a 5 second wait time before EDNS timeout is | |
234 | detected, which is slow but it works at least. | |
235 | It minimizes the chances of a dropped query making a (DNSSEC) EDNS server | |
236 | falsely EDNS-nonsupporting, and thus DNSSEC-bogus, works well with | |
237 | middleboxes, and can detect the occasional authority that drops EDNS. | |
238 | For some boxes it is necessary to probe for every failing query; a | |
239 | reassurance that the DNS server does EDNS does not mean that the path can | |
240 | take large DNS answers. | |
241 | ||
242 | o 0x20 backoff. | |
243 | The draft describes backing off to the next server, and going through all | |
244 | servers several times. Unbound goes on to get the full list of nameserver | |
245 | addresses, and then makes 3 * number of addresses queries. | |
246 | They are sent to a random server, but to no one address more than 4 times. | |
247 | It succeeds if one has 0x20 intact, or else all are equal. | |
248 | Otherwise, servfail is returned to the client. | |
249 | ||
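The 0x20 technique and the success rule above can be sketched as follows. This is an illustration only: encode_0x20 and evaluate_fallback are hypothetical names, replies are modelled as (echoed query name, answer) pairs, and a None result stands for the servfail case.

```python
import random


def encode_0x20(qname, rng):
    """Randomise the case of each ASCII letter in the query name."""
    out = []
    for ch in qname:
        if ch.isascii() and ch.isalpha():
            out.append(ch.upper() if rng.getrandbits(1) else ch.lower())
        else:
            out.append(ch)
    return "".join(out)


def evaluate_fallback(sent_qname, replies):
    """Apply the rule: one reply with 0x20 intact, or else all answers equal."""
    for echoed, answer in replies:
        if echoed == sent_qname:  # exact, case-sensitive match
            return answer
    answers = {answer for _, answer in replies}
    if len(answers) == 1:
        return answers.pop()
    return None  # caller would return servfail to the client
```

A reply that echoes the query name with the exact casing that was sent is accepted at once; if no reply does, the answers are accepted only when every server returned the same data.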
250 | o NXDOMAIN and SOA serial numbers. | |
251 | Unbound keeps TTL values for message formats, and thus rcodes, such | |
252 | as NXDOMAIN. Also it keeps the latest rrsets in the rrset cache. | |
253 | So it will faithfully negative cache for the exact TTL as originally | |
254 | specified for an NXDOMAIN message, but send a newer SOA record if | |
255 | this has been found in the mean time. In principle, this could lead to a | |
256 | negative cached NXDOMAIN reply with a SOA RR where the serial number | |
257 | indicates a zone version where this domain is not any longer NXDOMAIN. | |
258 | These situations become consistent once the original TTL expires. | |
259 | If the domain is DNSSEC signed, by the way, then NSEC records are | |
260 | updated more carefully. If one of the NSEC records in an NXDOMAIN is | |
261 | updated from another query, the NXDOMAIN is dropped from the cache, | |
262 | and queried for again, so that its proof can be checked again. | |
263 | ||
264 | o SOA records in negative cached answers for DS queries. | |
265 | The current unbound code uses a negative cache for queries for type DS. | |
266 | This speeds up building chains of trust, and uses NSEC and NSEC3 | |
267 | (optout) information to speed up lookups. When used internally, | |
268 | the bare NSEC(3) information is sufficient, probably picked up from | |
269 | a referral. When answering to clients, a SOA record is needed for | |
270 | the correct message format. A SOA record is picked from the cache | |
271 | (and may not actually match the serial number of the SOA for which the | |
272 | NSEC and NSEC3 records were obtained) if available; otherwise network | |
273 | queries are performed to get the data. | |
274 | ||
275 | o Parent and child with different nameserver information. | |
276 | A misconfiguration that sometimes happens is where the parent and child | |
277 | have different NS and glue information. The child is authoritative, and | |
278 | unbound will not trust information from the parent nameservers as the | |
279 | final answer. To help lookups, unbound will however use the parent-side | |
280 | version of the glue as a last resort lookup. This resolves lookups for | |
281 | those misconfigured domains where the servers reported by the parent | |
282 | are the only ones working, and servers reported by the child do not. | |
283 | ||
284 | o Failure of validation and probing. | |
285 | Retries on a validation failure are now 5x to a different nameserver IP | |
286 | (if possible), and then it gives up, for one name, type, class entry in | |
287 | the message cache. If a DNSKEY or DS fails in the chain of trust in the | |
288 | key cache additionally, after the probing, a bad key entry is created that | |
289 | makes the entire zone bogus for 900 seconds. This is a fixed value at | |
290 | this time and is conservative in sending probes. It makes the compound | |
291 | effect of many resolvers smaller and easier to handle, but penalizes | |
292 | individual resolvers by having fewer probes and a longer time before fixes | |
293 | are picked up. | |
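The retry and bad-key behaviour above can be modelled in a few lines. This sketch merges the message-level retries and the key-level bad entry into one function for brevity, which is a simplification; BadKeyCache, validate_with_retries and the try_validate callback are hypothetical names, and counting "5x" as five attempts in total is an assumption.

```python
BOGUS_TTL = 900   # seconds a zone stays bogus after a key failure
MAX_RETRIES = 5   # attempts, each to a different address if possible


class BadKeyCache:
    """Remembers zones whose key validation failed, until expiry."""

    def __init__(self):
        self._expiry = {}

    def mark_bad(self, zone, now):
        self._expiry[zone.lower()] = now + BOGUS_TTL

    def is_bogus(self, zone, now):
        expiry = self._expiry.get(zone.lower())
        return expiry is not None and now < expiry


def validate_with_retries(zone, servers, try_validate, cache, now):
    """Return True on success; mark the zone bogus after all attempts fail."""
    if cache.is_bogus(zone, now):
        return False  # no new probes while the bad key entry is live
    attempts = MAX_RETRIES if servers else 0
    for i in range(attempts):
        # Cycle through the addresses so each attempt differs if possible.
        if try_validate(servers[i % len(servers)]):
            return True
    cache.mark_bad(zone, now)
    return False
```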
294 |