Internet DRAFT - draft-pappas-dnsop-long-ttl
draft-pappas-dnsop-long-ttl
Network Working Group V. Pappas, Ed.
Internet-Draft IBM
Intended status: Standards Track E. Osterweil, Ed.
Expires: August 26, 2012 Verisign
February 23, 2012
Improving DNS Service Availability by Using Long TTL Values
draft-pappas-dnsop-long-ttl-04.txt
Abstract
Due to the hierarchical tree structure of the Domain Name System
[RFC1034][RFC1035], losing all of the authoritative servers that
serve a zone can disrupt services to not only that zone but all of
its descendants. This problem is particularly severe if all the
authoritative servers of the root zone, or of a top level domain's
zone, fail. Although proper placement of secondary servers, as
discussed in [RFC2182], can be an effective means against isolated
failures, it is insufficient to protect the DNS service against a
Distributed Denial of Service (DDoS) attack. This document proposes
to reduce the impact of DDoS attacks against top level DNS servers by
setting long TTL values for NS records and their associated A and
AAAA records. Our proposed changes are purely operational and can be
deployed incrementally.
Status of this Memo
This Internet-Draft is submitted in full conformance with the
provisions of BCP 78 and BCP 79.
Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF). Note that other groups may also distribute
working documents as Internet-Drafts. The list of current Internet-
Drafts is at http://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress."
This Internet-Draft will expire on August 26, 2012.
Copyright Notice
Copyright (c) 2012 IETF Trust and the persons identified as the
document authors. All rights reserved.
Pappas & Osterweil Expires August 26, 2012 [Page 1]
Internet-Draft Improving DNS Service Availability February 2012
This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents
(http://trustee.ietf.org/license-info) in effect on the date of
publication of this document. Please review these documents
carefully, as they describe your rights and restrictions with respect
to this document. Code Components extracted from this document must
include Simplified BSD License text as described in Section 4.e of
the Trust Legal Provisions and are provided without warranty as
described in the Simplified BSD License.
Table of Contents
1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.1. Terminology . . . . . . . . . . . . . . . . . . . . . . . 4
1.2. Conventions . . . . . . . . . . . . . . . . . . . . . . . 5
2. Recommendations . . . . . . . . . . . . . . . . . . . . . . . 6
2.1. Examples . . . . . . . . . . . . . . . . . . . . . . . . . 7
3. Operational Issues . . . . . . . . . . . . . . . . . . . . . . 9
3.1. Cache Coherency . . . . . . . . . . . . . . . . . . . . . 9
3.2. Routine Cache Maintenance Issues . . . . . . . . . . . . . 10
3.3. Implementation Issues . . . . . . . . . . . . . . . . . . 10
4. Security Considerations . . . . . . . . . . . . . . . . . . . 11
5. Contributing Authors . . . . . . . . . . . . . . . . . . . . . 12
6. Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . 13
7. References . . . . . . . . . . . . . . . . . . . . . . . . . . 14
7.1. Normative References . . . . . . . . . . . . . . . . . . . 14
7.2. Informative References . . . . . . . . . . . . . . . . . . 14
Appendix A. Measurements . . . . . . . . . . . . . . . . . . . . 15
A.1. Frequency of Infra-RR Changes . . . . . . . . . . . . . . 15
A.2. Effectiveness of Long TTL Values . . . . . . . . . . . . . 15
Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . . 17
Pappas & Osterweil Expires August 26, 2012 [Page 2]
Internet-Draft Improving DNS Service Availability February 2012
1. Introduction
[RFC2182] provides operational guidelines for selecting and operating
authoritative servers to maximize a zone's availability. Proper
placement of authoritative servers can be an effective means to guard
DNS service against unintentional failures or errors, but it is not
well suited to protect DNS services against intentional attacks (like
Distributed Denial of Service attacks). A Distributed Denial of
Service (DDoS) attack could target all of the authoritative servers
for a zone, regardless of where they are placed. By disabling all of
a zone's authoritative servers, an attacker can disrupt service for
that zone and all the zones below it. In particular, attacks against
domains such as the root, generic Top Level Domains (gTLDs), country
code Top Level Domains (ccTLDs), and other zones serving popular DNS
domains (such as .co.uk or .co.jp) could have a severe global impact.
For example, knocking out all of the root zone servers may
effectively render the entire Internet unreachable. Successful
attacks against all authoritative servers for a large generic top
level domain (gTLD) such as ".com" can also impact availability for
over one hundred million DNS zones.
Currently, some of the root and gTLD and ccTLDs servers use shared
unicast addresses [RFC3258] to improve availability during denial of
service attacks. This approach can be effective only when the number
of replicated servers is large and when they are placed in diverse
geographic locations. However the interactions between shared
unicast addresses and BGP routing dynamics are still not fully
understood. Furthermore, the use of shared unicast introduces one
entry in the global BGP routing for every shared unicast enabled
server. Therefore, using a shared unicast address is a solution that
can not be applied easily for every DNS zone. In this document we
propose an alternative and simpler approach for improving
availability in the face of denial of service attacks, by
recommending longer TTL values for certain DNS resource records
(RRs). However, in contrast to shared unicast our recommendation
does not protect the attacked zone, but rather all the child zones
under the attacked zone.
This document proposes a recommendation based on the observation that
DNS caching can effectively help mitigate the impact of denial of
service attacks. A caching resolver only consults an authoritative
server if the requested data is not already present in the cache.
The cache contains both specific records such as www.example.com and
infrastructure records such as the name servers for example.com. In
this document, we focus primarily on the caching of infrastructure
records (defined formally in the next section) and show how setting
long TTL values on these records can help mitigate the impact of DDoS
attacks. For example, consider the case of a successful attack
Pappas & Osterweil Expires August 26, 2012 [Page 3]
Internet-Draft Improving DNS Service Availability February 2012
against all of the DNS root servers and suppose all root servers are
unavailable for some time period P. Despite the attack, resolvers can
still access commonly used gTLDs and ccTLDs as long as the root's NS
records and their corresponding A/AAAA resource record sets (RRsets)
remain in a locally available cache during the period P. Generally
speaking, access to the root servers is only used for looking up top
level domain entries that are not presently available in the cache.
Similar arguments apply to attacks against servers of other top level
domains, or any DNS domain for that matter. If the NS and associated
A/AAAA RRsets for a domain are cached, an attack against higher level
domains will have little or no impact on descendant domains. When
DNSSEC is used, additional RRsets must also be cached in order to
weather the attack.
1.1. Terminology
We use the DNS terminology introduced in [RFC1034], [RFC1035],
[SIG88DNS], [RFC2181] and [RFC4034]. Furthermore, this section
introduces some additional terminology used in this draft:
Application Layer Resource Records (AL-RRs): The set of RRs that can
potentially be queried by a stub-resolver. In other words RRs that
are used by end-host applications. These records include almost all
types of RRs, such as A, AAAA, MX, CNAME, SRV, etc.
Infrastructure Resource Records (Infra-RRs): The set of RRs that are
used only in order to (securely) resolve a zone. NS and DS RRs are
by definition Infra-RRs. The A and AAAA RRs are Infra-RRs if and
only if the name associated with the A or AAAA RR exactly matches a
name in the data portion of some NS RR. An NSEC RR is an Infra-RR if
and only if its owner name is a delegation. A DNSKEY RR is an
Infra-RR if and only if it matches a DS RR or is configured as a
trust anchor in some resolver. An RRSIG RR is an Infra-RR if and
only if it signs an infrastructure RRset. All other resource records
defined at the time of this draft are Application Layer resource
records.
Parent Zone (PZ): The zone defined right above the referenced zone.
Child Zone (CZ): A zone defined right below the referenced zone.
Authoritative Copy (AC): Some RRs (or RRsets) are defined in more
than one zone. For example the NS RRset for a zone is defined both
at the zone and at its parent zone. [RFC2181] describes how caches
determine which copy of an RRset is most trustworthy and
authoritative.
Pappas & Osterweil Expires August 26, 2012 [Page 4]
Internet-Draft Improving DNS Service Availability February 2012
1.2. Conventions
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in [RFC2119].
Pappas & Osterweil Expires August 26, 2012 [Page 5]
Internet-Draft Improving DNS Service Availability February 2012
2. Recommendations
Based on the measured TTL values of RRs [IMW01CACHE], data and
infrastructure RRs are treated almost equally when setting their TTL
values. Measurements show that the TTL values of infrastructure RRs
(Infra-RRs), NS RRsets specifically, range from as short as 0 seconds
(!) to as long as one week, with most TTLs set to 12 or less hours.
Similarly, TTL values for Application Layer RRs (AL-RRs) exhibit
similar value variations, with a smaller mean value though. These
measurement results suggest that many DNS operators do not
distinguish the semantic difference between Infra-RRs and AL-RRs when
setting their TTL values.
In this draft we argue that Infra-RRs and AL-RRs SHOULD be treated
differently when setting their TTL values. The main rationale is the
following: Infra-RRs are mainly used by the DNS system itself and as
such tend to be relatively stable records, while AL-RRs are primarily
used by applications and thus tend to be more dynamic. As such,
Infra-RRs can afford to have longer TTL values. More specifically we
propose that Infra-RRs SHOULD have longer TTL values than those
observed (12 or less hours), and we recommend that their TTL value
SHOULD be in the order of days. This specific value recommendation
is based on a measurement study (Appendix A.2), which shows that TTL
values of 3 to 7 days can considerably improve the overall
availability of the DNS system when faced with denial of service
attacks.
We recommend this range of TTL values both for the authoritative copy
of the Infra-RRs, as well as for the copy of the Infra-RRs stored at
the parent zone. These TTL values SHOULD be applied for both copies
of Infra-RRs for the following reasons: A) If they are applied only
at the parent's copy, then a resolver will always replace the
parent's copy with the authoritative copy (lower TTL values),
whenever it receives a reply that contains or attaches the
authoritative copy. B) If they are applied only at the authoritative
copy, then it is possible that a resolver will only use the parent's
copy. For example most resolvers cache the NS RRs of most TLDs from
replies that usually come from the root zone. These RRs are rarely
replaced by their authoritative copy, given that TLDs usually reply
with referrals for their child zones (and thus they do not attach
their own NS records in these replies). In addition, some name
server implementations offer a configuration option called "minimal
responses." When this option is enabled, name servers will omit the
ADDITIONAL section of DNS responses, and thus not include glue
records.
Pappas & Osterweil Expires August 26, 2012 [Page 6]
Internet-Draft Improving DNS Service Availability February 2012
2.1. Examples
In this section we provide, some example configurations for setting
the TTL value of Infra-RRs. The first example shows the
configuration of a zone (zone1.example.) when all its NS records have
names that belong to the same branch of the DNS tree:
$example.
zone1.example. 432000 NS a.zone1.example.
zone1.example. 432000 NS b.zone1.example.
a.zone1.example. 432000 A 10.1.0.1
b.zone1.example. 432000 A 10.1.0.2
$zone1.example.
zone1.example. 432000 NS a.zone1.example.
zone1.example. 432000 NS b.zone1.example.
a.zone1.example. 432000 A 10.1.0.1
b.zone1.example. 432000 A 10.1.0.2
The following example shows the configuration of a zone
(sub1.zone1.example.), when one of its servers has a name that
belongs to a different branch of the DNS tree:
$zone1.example.
sub1.zone1.example. 432000 NS a.sub1.zone1.example.
sub1.zone1.example. 432000 NS b.sub2.zone2.example.
a.sub1.zone1.example. 432000 A 10.1.1.1
$sub1.zone1.example.
sub1.zone1.example. 432000 NS a.sub1.zone1.example.
sub1.zone1.example. 432000 NS b.sub2.zone2.example.
a.sub1.zone1.example. 432000 A 10.1.1.1
$sub2.zone2.example.
b.sub2.zone2.example. 432000 A 10.2.2.2
Finally the following example shows the configuration for a DNSSEC
enabled zone (zone2.example.):
Pappas & Osterweil Expires August 26, 2012 [Page 7]
Internet-Draft Improving DNS Service Availability February 2012
$example.
zone2.example. 432000 NS a.zone2.example.
zone2.example. 432000 NS b.zone2.example.
zone2.example. 432000 DS ...
zone2.example. 432000 RRSIG ...
zone2.example. 432000 NSEC ...
$zone2.example.
zone2.example. 432000 NS a.zone2.example.
zone2.example. 432000 NS b.zone2.example.
zone2.example. 432000 RRSIG ...
zone2.example. 432000 DNSKEY ...
zone2.example. 432000 RRSIG ...
zone2.example. 432000 NSEC ...
Pappas & Osterweil Expires August 26, 2012 [Page 8]
Internet-Draft Improving DNS Service Availability February 2012
3. Operational Issues
3.1. Cache Coherency
Increasing the TTL value of Infra-RRs can cause some cached Infra-RRs
to be inconsistent with the Infra-RRs provided by the authoritative
servers for longer periods of time, when these Infra-RRs change. We
believe this cache coherency problem is not an issue for most cases,
for the following reasons: First, Infra-RRs tend to be relatively
stable records (Appendix A.1, [HOT05DNS]), and thus this
inconsistency issue is expected to be a rare event. Second, Infra-
RRs inconsistencies for all records except NS and the associated
A/AAAA RRs can be easily identified and corrected by a resolver (by
querying the authoritative zone for these RRs).
Perhaps the most worrisome case occurs when there are inconsistencies
between NS Infra-RRs and their corresponding A/AAAA Infra-RRs. In
such a case, resolvers are still able to correct inconsistencies as
long as the NS and A/AAAA RRs have not changed for at least one name
server. That is, as long as there is at least one of the former NS+
A/AAAA mappings that corresponds to a reachable name server, the
resolver will eventually contact this server and will be able to
replace the new NS and A/AAAA Infra-RRs (assuming that these records
are attached in the reply). In the case that all NS and A/AAAA
Infra-RRs have changed then the resolver may or may not be able to
overcome Infra-RR inconsistency. This depends on the resolvers
implementation. Some resolvers tend to contact the parent zone in
such a case (and thus correct the inconsistency), while others return
an error code (and thus stay inconsistent until the RRs expire).
Given that different implementations behave differently in the case
where all NS and A/AAAA Infra-RRs change at the same time, we
recommend the following for zones that implement longer TTL values
for Infra-RRs. The zone administrator SHOULD gradually move to the
new Infra-RRs, by incrementally changing the NS and A/AAAA RRs (not
all of them at once), or she/he SHOULD maintain that availability of
the old set of servers (which have been removed from the zone) until
all the old Infra-RRs have expired from the caches of any possible
resolver. This can be ensured by keeping servers available for at
least an interval equal to the former Infra-RRs TTL (plus some
jitter).
Another approach that an operator MAY follow is to gradually reduce
the TTL value of Infra-RRs before the change, and restore the longer
TTL values after the change. Clearly a zone administrator MAY follow
any of the above strategies when changing the NS and A/AAAA records
to a completely different set.
Pappas & Osterweil Expires August 26, 2012 [Page 9]
Internet-Draft Improving DNS Service Availability February 2012
3.2. Routine Cache Maintenance Issues
A subtle issue may arise in cases where DNS operators manually (or
periodically) manage their caches' state. For example, some
operators may (as a matter of operational prudence) periodically
flush their resolvers' caches. If, for example, an operator flushes
his/her caches every day, then the long TTLs on Infra-RRs would not
benefit them for any period longer than his/her own inter-flush
period. That is, if caches get flushed every 24 hours, then any TTL
longer than 24 hours would effectively be truncated. Thus, the
operational tradeoffs that prompt this sort of practice may need to
be reconsidered in the face of DDoS threats.
3.3. Implementation Issues
This document does not require or recommend any implementation
changes of either the authoritative server software or the resolver
software. On the other hand, we should point out that some changes
may be beneficial when zones implement longer TTL values for Infra-
RRs.
Pappas & Osterweil Expires August 26, 2012 [Page 10]
Internet-Draft Improving DNS Service Availability February 2012
4. Security Considerations
This document prescribes an operational practice that facilitates DNS
queries during prolonged outages. Such outages may result from
extended DDoS attacks against key servers in the DNS. The use of
long TTL values does not reduce the attack surface of targeted
servers to DDoS attacks. However, the use of long TTL values extends
the amount of time a DDoS attack must be waged before it impacts an
entire DNS subtree, and directly affects the effectiveness of a DDoS
to the global DNS. While a DDoS may disrupt the availability of some
critical authoritative servers, the NS records for the zones that are
delegated by them will be available in remote caches for much longer.
The extended buffer between the start of a DDoS and related outages
from its effects provides a greater window for operators (DNS,
routing, etc.) to collaborate and mitigate it before its effects
become crippling. Therefore, while a DDoS is no less likely, the
operational window for remediating its effects can be dramatically
enhanced.
Pappas & Osterweil Expires August 26, 2012 [Page 11]
Internet-Draft Improving DNS Service Availability February 2012
5. Contributing Authors
The editors wish to express that the thoughts, opinions, text, and
work of this document are due to the following set of authors:
Vasileios Pappas
IBM Research, Watson Research Center
vpappas@us.ibm.com
Eric Osterweil
Verisign, Inc.
eosterweil@verisign.com
Danny McPherson
Verisign, Inc.
dmcpherson@tcb.net
Duane Wessels
Verisign, Inc.
dwessels@verisign.com
Matt Larson
Verisign, Inc.
mlarson@verisign.com
Dan Massey
Colorado State University,
massey@cs.colostate.edu
Lixia Zhang
University of California, Los Angeles
lixia@cs.ucla.edu
Pappas & Osterweil Expires August 26, 2012 [Page 12]
Internet-Draft Improving DNS Service Availability February 2012
6. Acknowledgments
We would like to express our thanks to Greg Minshall for an early
discussion on the feasibility of using long TTLs to improve DNS
availability, to Pete Resnick for his support and the suggestion of
using one week or even longer TTL values, and to Rob Austin and
Patrik Faltstrom who also provided constructive comments to our
proposal.
Pappas & Osterweil Expires August 26, 2012 [Page 13]
Internet-Draft Improving DNS Service Availability February 2012
7. References
7.1. Normative References
[RFC1034] Mockapetris, P., "Domain names - concepts and facilities",
STD 13, RFC 1034, November 1987.
[RFC1035] Mockapetris, P., "Domain names - implementation and
specification", STD 13, RFC 1035, November 1987.
[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
Requirement Levels", BCP 14, RFC 2119, March 1997.
[RFC2181] Elz, R. and R. Bush, "Clarifications to the DNS
Specification", RFC 2181, July 1997.
[RFC2182] Elz, R., Bush, R., Bradner, S., and M. Patton, "Selection
and Operation of Secondary DNS Servers", BCP 16, RFC 2182,
July 1997.
[RFC3258] Hardie, T., "Distributing Authoritative Name Servers via
Shared Unicast Addresses", RFC 3258, April 2002.
[RFC4034] Arends, R., Austein, R., Larson, M., Massey, D., and S.
Rose, "Resource Records for the DNS Security Extensions",
RFC 4034, March 2005.
7.2. Informative References
[HOT05DNS]
Handley, M. and A. Greenhalgh, "The Case for Pushing DNS",
HotNets, 2005.
[IMW01CACHE]
Jung, J., Sit, E., Balakrishnan, H., and R. Morris, "DNS
Performance and the Effectiveness of Caching", IMW, 2001.
[SIG88DNS]
Mockapetris, P. and K. Dunlap, "Development of the Domain
Name System", SIGCOMM, 1988.
Pappas & Osterweil Expires August 26, 2012 [Page 14]
Internet-Draft Improving DNS Service Availability February 2012
Appendix A. Measurements
A.1. Frequency of Infra-RR Changes
To assess the stability of deployed DNS servers, we conducted a
measurement study. From a crawl over 15 million DNS zones (the crawl
was initiated at DMOZ.ORG), we randomly selected 100,000 zones and
measured their infrastructure RRsets over a 4-month period.
During this 4-month period we queried each of the 100,000 zones twice
a day to obtain its infrastructure RRset. Our data shows that 75% of
the measured zones did not change either the NS or corresponding A
RRsets during the entire study period. 11% of the zones showed
changes to their NS RRset during this 4-month period, and 5% of the
zones made the changes in less than 2 months. The A records of all
the measured zone servers had more changes than the NS RRsets: 22% of
the zones had their servers' A records changed within 4 months, and
10% of the zones made servers' A record changes in less than 2
months. All in all, our measurement results show that the DNS
servers, in the majority of the zones, are very stable. Even those
servers that made changes during our measurement period show that
their DNS server changes are rather infrequent.
A.2. Effectiveness of Long TTL Values
In order to gauge the effectiveness of a longer TTL value for the DNS
infrastructure records, we used a real DNS trace that was captured by
a UCLA caching server for 2 weeks. Based on this trace, we simulated
a DoS attack on all root and TLD servers and we measured the
percentage of queries that weren't resolved (excluding negative
answers from the root and TLD zones), in the case of measured TTL
values, and in the case of a hypothetical TTL value of 3, 5, 7, and 9
days for all zones. The attack duration was 3, 6, 12 and 24 hours,
and started at the eighth day (in simulation time). The following
table shows the absolute number as well as the percentage of the
queries that did not resolve for each case of attack duration and TTL
value:
Pappas & Osterweil Expires August 26, 2012 [Page 15]
Internet-Draft Improving DNS Service Availability February 2012
---------------------------------------------------------------------
| || Attack Duration (Hours) |
---------------------------------------------------------------------
| || 3 | 6 | 12 | 24 |
| TTL ||-------------------------------------------------------------
|(day)|| 7776 Queries | 13799 Queries| 23586 Queries| 53636 Queries |
|--------------------------------------------------------------------
| - || 2227 - 28.6% | 3829 - 27.7% | 6807 - 28.8% | 17099 - 31.8% |
| 3 || 1132 - 14.5% | 1884 - 13.6% | 3154 - 13.3% | 7218 - 13.4% |
| 5 || 917 - 11.7% | 1530 - 11.0% | 2562 - 10.8% | 5947 - 11.0% |
| 7 || 767 - 9.8% | 1256 - 9.1% | 2092 - 8.8% | 4766 - 8.8% |
| 9 || 711 - 9.1% | 1165 - 8.4% | 1898 - 8.0% | 4157 - 7.7% |
---------------------------------------------------------------------
Clearly, we see that by using a longer TTL value we can increase the
overall system availability under denial of service attacks. The
table shows that with a TTL value of seven days we can decrease the
impact of such an attack at the root and TLD servers by 70%,
independent of the attack duration. Also the table shows that by
increasing the TTL value, we are more resilient to attacks. Based on
these results we believe that a TTL value of seven days is adequate
enough to considerably improve the resilience of the DNS system
against denial of service attacks.
Pappas & Osterweil Expires August 26, 2012 [Page 16]
Internet-Draft Improving DNS Service Availability February 2012
Authors' Addresses
Vasileios Pappas (editor)
IBM Research, Watson Research Center
19 Skyline Drive
Hawthorne, NY 10532
US
Email: vpappas@us.ibm.com
Eric Osterweil (editor)
Verisign, Inc.
12061 Bluemont Way
Reston, VA 20190
US
Email: eosterweil@verisign.com
Pappas & Osterweil Expires August 26, 2012 [Page 17]