Domain Name System Operations                                A. Dulaunoy
Internet-Draft                                                     CIRCL
Intended status: Informational                                 A. Kaplan
Expires: December 15, 2017                                       CERT.at
                                                                P. Vixie
                                                                H. Stern
                                                 Farsight Security, Inc.
                                                           June 13, 2017
Passive DNS - Common Output Format
draft-dulaunoy-dnsop-passive-dns-cof-03
This document describes a common output format for Passive DNS servers that clients can query. The output format description also includes a common semantic for each Passive DNS system. By having multiple Passive DNS systems adhere to the same output format for queries, users of multiple Passive DNS servers will be able to combine result sets easily.
This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.
Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."
This Internet-Draft will expire on December 15, 2017.
Copyright (c) 2017 IETF Trust and the persons identified as the document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (http://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License.
Passive DNS is a technique described by Florian Weimer in 2005 in "Passive DNS Replication" [WEIMERPDNS], presented at the 17th Annual FIRST Conference on Computer Security. Since then, multiple Passive DNS implementations have been created and have evolved over time. Users of these Passive DNS servers may query a server (often via WHOIS or HTTP REST [REST]), parse the results and process them in other applications.
There are multiple implementations of Passive DNS software. Users of Passive DNS query each implementation and aggregate the results for their search. This document describes the output format of four Passive DNS systems ([DNSDB], [PDNSCERTAT], [PDNSCIRCL] and [PDNSCOF]) which are in use today and which already share a nearly identical output format. As the format and the meaning of the output fields of each Passive DNS need to be consistent, this document proposes a common name for each field, along with its interpretation. The format follows a simple key-value structure in JSON. The benefit of a consistent Passive DNS output format is that clients can query different servers without needing a separate parser for each individual server. For example, passivedns-client [PDNSCLIENT] currently implements multiple parsers due to the lack of standardization. This document describes neither the protocol (e.g. WHOIS, HTTP REST) nor the query format used to query a Passive DNS. Nor does it describe "pre-recursor" Passive DNS systems. Both of these are separate topics and deserve their own RFC documents.
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119.
As Passive DNS servers can include protection mechanisms for their operation, results might differ because of those protection measures. These mechanisms filter out DNS answers that fail certain criteria; for example, the bailiwick algorithm [BAILIWICK] protects the Passive DNS database from cache poisoning attacks [CACHEPOISONING]. Another limitation that clients querying the database need to be aware of is that each query returns only a snapshot of the database at the time of the query. Clients MUST NOT rely on consistent answers, nor may they assume that answers are identical across multiple Passive DNS servers.
The answer is formatted as JSON; more precisely, it uses a subset of the full JSON language. The notable difference is the restricted definition of whitespace ("ws"), which allows only spaces and horizontal tabs, so that an entry never spans more than one line. The order of the fields is not significant for the same resource type.
The intent of this output format is to be easily parsable by scripts. Each JSON object is expressed on a single line to be processed by the client line-by-line. Every implementation MUST support the JSON output format.
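Because every entry sits on its own line, a client can read an answer with a plain line loop and a stock JSON parser. The following Python sketch shows one way to do this; the function name parse_cof_answer is hypothetical, and accepting LF or CRLF in addition to the grammar's CR separator is an assumption made for robustness.

```python
import json
from typing import Iterator


def parse_cof_answer(text: str) -> Iterator[dict]:
    """Yield one dict per entry of a Passive DNS common-output-format answer.

    Each entry is a JSON object on its own line. str.splitlines() is used
    so that CR (the separator in the grammar), LF and CRLF line endings are
    all accepted; blank lines are skipped.
    """
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        entry = json.loads(line)
        if not isinstance(entry, dict):
            raise ValueError(f"expected a JSON object, got: {line!r}")
        yield entry


if __name__ == "__main__":
    answer = (
        '{"count": 59, "time_first": 1384865833, "rrtype": "A", '
        '"rrname": "www.ietf.org", "rdata": "4.31.198.44", '
        '"time_last": 1389022219}\r'
    )
    for entry in parse_cof_answer(answer):
        print(entry["rrname"], entry["rrtype"], entry["rdata"])
```

Since each line is parsed independently, a client can start processing entries before the whole answer has arrived.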
Examples of JSON output are in the appendix.
Formal grammar as defined in ABNF
   answer          = entries
   entries         = *( entry CR )
   entry           = "{" keyvallist "}"
   keyvallist      = [ member *( value-separator member ) ]
   member          = qm field qm name-separator value
   name-separator  = ws %x3A ws      ; ":" colon
   ; value is as defined in the JSON RFC
   value-separator = ws %x2C ws      ; "," comma, as defined in JSON
   field           = "rrname" / "rrtype" / "rdata" / "time_first" /
                     "time_last" / "count" / "bailiwick" / "sensor_id" /
                     "zone_time_first" / "zone_time_last" / "origin" /
                     futureField
   futureField     = string
   CR              = %x0D
   qm              = %x22            ; quotation mark
   ws              = *( %x20 /       ; space
                        %x09 )       ; horizontal tab
Note that value is defined in the JSON specification and is used here with exactly the same definition. The same applies to the definition of string.
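On the server side, a standard JSON encoder already produces output that fits this grammar. The Python sketch below relies on the fact that json.dumps, with its default settings, escapes control characters inside strings and only uses spaces in its separators, then terminates each entry with CR as required by the entries rule. The function name serialize_cof_answer is hypothetical.

```python
import json
from typing import Iterable


def serialize_cof_answer(entries: Iterable[dict]) -> str:
    """Serialize entries as a common-output-format answer.

    json.dumps (without indent) never emits a raw line break, because
    newlines inside strings are escaped, so each object stays on a single
    line. Its default separators (", " and ": ") only contain spaces,
    which matches the restricted ws rule. Every entry is followed by CR
    (%x0D), as in entries = *( entry CR ).
    """
    return "".join(json.dumps(entry) + "\r" for entry in entries)


if __name__ == "__main__":
    print(repr(serialize_cof_answer([
        {"rrname": "ietf.org", "rrtype": "A", "rdata": "64.170.98.32",
         "time_first": 1298495035, "time_last": 1298495035, "count": 4},
    ])))
```

Producers that need a different line terminator for their transport can swap the "\r", but readers that split on any line ending (as most line-oriented tools do) will handle either.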
Implementations MUST support all of the mandatory fields.
Uniqueness property: the tuple (rrname, rrtype, rdata) is always unique within one answer from a given server. While rrname and rrtype are always individual JSON primitive types (strings, numbers, booleans or null), rdata MAY contain multiple resource records or a single record. When multiple resource records are returned, rdata MUST be a JSON array. When a single resource record is returned, rdata MUST be a JSON string.
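The uniqueness property gives clients a natural key for combining result sets, which is the main motivation of this document. The Python sketch below is one possible approach, not a normative algorithm: it normalizes rdata (string or array) into a hashable tuple and merges answers from several servers by (rrname, rrtype, rdata). Treating the order of records inside an rdata array as insignificant, and dropping count when merging, are assumptions; the function names are hypothetical.

```python
from typing import Iterable, Tuple, Union

RData = Union[str, list]


def rdata_key(rdata: RData) -> Tuple[str, ...]:
    """Normalize rdata (JSON string or JSON array) to a hashable tuple."""
    if isinstance(rdata, str):
        return (rdata,)
    # Assumption: record order inside an rdata array is not significant.
    return tuple(sorted(rdata))


def entry_key(entry: dict) -> tuple:
    """The (rrname, rrtype, rdata) tuple that is unique within one answer."""
    return (entry["rrname"], entry["rrtype"], rdata_key(entry["rdata"]))


def merge_answers(answers: Iterable[Iterable[dict]]) -> list:
    """Combine answers from several servers keyed by (rrname, rrtype, rdata).

    time_first keeps the earliest value and time_last the latest one.
    count is deliberately dropped: summing it could double-count
    observations if the servers share collectors (an assumption).
    """
    merged = {}
    for answer in answers:
        for entry in answer:
            key = entry_key(entry)
            if key not in merged:
                merged[key] = {
                    "rrname": entry["rrname"],
                    "rrtype": entry["rrtype"],
                    "rdata": entry["rdata"],
                    "time_first": entry["time_first"],
                    "time_last": entry["time_last"],
                }
            else:
                m = merged[key]
                m["time_first"] = min(m["time_first"], entry["time_first"])
                m["time_last"] = max(m["time_last"], entry["time_last"])
    return list(merged.values())
```

Note that this key does not reconcile a textual rrtype from one server with the numeric form of the same type from another; a client mixing both would need to normalize rrtype first.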
The rrname field returns the name of the queried resource.
The rrtype field returns the resource record type as seen by the Passive DNS. The key is rrtype and the value is the interpreted record type, represented as a JSON string. If the type cannot be interpreted, its decimal value is returned instead, following the principle of transparency described in RFC 3597; in that case the value is represented as a JSON number. The resource record type can be any value described by IANA in the DNS parameters document, in the section 'Resource Record (RR) TYPEs' (http://www.iana.org/assignments/dns-parameters). Currently known and supported textual descriptions of rrtypes are: A, AAAA, CNAME, PTR, SOA, TXT, DNAME, NS, SRV, RP, NAPTR, HINFO, A6. A client MUST be able to understand these textual rrtype values represented as a JSON string. In addition, a client MUST be able to handle a decimal value (as mentioned above) represented as a JSON number.
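Since rrtype may arrive either as a JSON string or as a JSON number, a client typically maps both to one representation. The Python sketch below maps the textual rrtypes listed above to their IANA type codes and passes numeric values through unchanged. The table covers only the types named in this section; the function name rrtype_code is hypothetical, and extending the table from the IANA registry is left to the implementer.

```python
from typing import Union

# IANA "Resource Record (RR) TYPEs" codes for the textual rrtypes
# listed in this section.
KNOWN_RRTYPES = {
    "A": 1, "NS": 2, "CNAME": 5, "SOA": 6, "PTR": 12, "HINFO": 13,
    "TXT": 16, "RP": 17, "AAAA": 28, "SRV": 33, "NAPTR": 35,
    "A6": 38, "DNAME": 39,
}


def rrtype_code(rrtype: Union[str, int]) -> int:
    """Return the numeric RR type for an rrtype value taken from an answer.

    A JSON string is looked up among the textual rrtypes above; a JSON
    number (an uninterpreted type, per RFC 3597 transparency) is used as-is.
    """
    # bool is a subclass of int in Python, so reject JSON true/false first.
    if isinstance(rrtype, bool):
        raise TypeError("rrtype must be a JSON string or number")
    if isinstance(rrtype, int):
        return rrtype
    if isinstance(rrtype, str):
        try:
            return KNOWN_RRTYPES[rrtype.upper()]
        except KeyError:
            raise ValueError(f"unknown textual rrtype: {rrtype!r}") from None
    raise TypeError("rrtype must be a JSON string or number")
```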
The rdata field returns the resource records of the queried resource. When multiple resource records are returned, rdata MUST be a JSON array. When a single resource record is returned, rdata MUST be a JSON string. Each resource record is represented as a JSON string and MUST be escaped as defined in section 2.6 of RFC4627. Depending on the rrtype, this can be an IPv4 or IPv6 address, a domain name (as in the case of CNAMEs), an SPF record, etc. A client MUST be able to interpret any value which is legal as the right-hand side in a DNS master file (RFC 1034, RFC 1035). If the rdata came from an unknown DNS resource record type, the server must follow the transparency principle described in RFC 3597.
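Master-file rdata such as TXT records often contains double quotes and backslashes, which JSON strings must escape. The short Python sketch below, using a hypothetical TXT value, shows that a standard JSON encoder performs this escaping and that decoding restores the original rdata exactly.

```python
import json

# Hypothetical TXT rdata in master-file syntax. It contains double quotes
# and a backslash, both of which must be escaped inside a JSON string.
txt_rdata = '"v=spf1 -all" "a\\"b"'

line = json.dumps({"rrname": "example.com", "rrtype": "TXT", "rdata": txt_rdata})
print(line)

# Decoding the JSON line yields the original master-file text unchanged.
assert json.loads(line)["rdata"] == txt_rdata
```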
The time_first field returns the first time that the unique tuple (rrname, rrtype, rdata) has been seen by the Passive DNS. The date is expressed in seconds (decimal) since 1 January 1970 (Unix timestamp). The time zone MUST be UTC. This field is represented as a JSON number.
The time_last field returns the last time that the unique tuple (rrname, rrtype, rdata) has been seen by the Passive DNS. The date is expressed in seconds (decimal) since 1 January 1970 (Unix timestamp). The time zone MUST be UTC. This field is represented as a JSON number.
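A client will usually check that an entry carries all mandatory fields and convert the timestamps into calendar dates. The Python sketch below does both; requiring time_first not to be later than time_last is an assumption about sane data rather than a rule stated in this document, and the function names are hypothetical.

```python
from datetime import datetime, timezone

MANDATORY_FIELDS = ("rrname", "rrtype", "rdata", "time_first", "time_last")


def check_mandatory(entry: dict) -> None:
    """Raise ValueError if a mandatory field is missing or malformed."""
    missing = [f for f in MANDATORY_FIELDS if f not in entry]
    if missing:
        raise ValueError(f"missing mandatory fields: {missing}")
    for f in ("time_first", "time_last"):
        v = entry[f]
        # Timestamps are JSON numbers: seconds since the Unix epoch, UTC.
        if isinstance(v, bool) or not isinstance(v, (int, float)):
            raise ValueError(f"{f} must be a JSON number")
    # Assumption: a record cannot be last seen before it was first seen.
    if entry["time_first"] > entry["time_last"]:
        raise ValueError("time_first is later than time_last")


def to_utc(timestamp: float) -> datetime:
    """Convert a time_first/time_last value to a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(timestamp, tz=timezone.utc)


if __name__ == "__main__":
    entry = {"count": 102, "time_first": 1298412391, "rrtype": "AAAA",
             "rrname": "www.ietf.org", "rdata": "2001:1890:1112:1::20",
             "time_last": 1302506851}
    check_mandatory(entry)
    print(to_utc(entry["time_first"]).isoformat())
```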
Implementations SHOULD support one or more of the following fields.
The count field specifies how many authoritative DNS answers were received at the Passive DNS server's collectors with exactly the given set of values as answers (i.e. the same data in the answer set; compare with the uniqueness property in "Mandatory Fields"). The count is expressed as a decimal value. This field is represented as a JSON number.
The bailiwick field is the best estimate of the apex of the zone where this data is authoritative.
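One simple use of the bailiwick field on the client side is to check that a record's owner name actually lies at or below the reported zone apex. The Python sketch below performs that label-wise, case-insensitive comparison. It is only an illustrative consistency check under that assumption, not the server-side bailiwick algorithm described in [BAILIWICK], and the function name in_bailiwick is hypothetical.

```python
def in_bailiwick(rrname: str, bailiwick: str) -> bool:
    """Return True if rrname is at or below the zone apex given by bailiwick.

    Names are compared label by label, case-insensitively, ignoring a
    trailing root dot, so "www.ietf.org." is within "IETF.org" while
    "notietf.org" is not within "ietf.org".
    """
    def labels(name: str) -> list:
        name = name.rstrip(".").lower()
        return name.split(".") if name else []

    name_labels, zone_labels = labels(rrname), labels(bailiwick)
    if len(zone_labels) > len(name_labels):
        return False
    # The zone's labels must match the rightmost labels of the name.
    return name_labels[len(name_labels) - len(zone_labels):] == zone_labels
```

Comparing whole labels rather than string suffixes is what keeps "notietf.org" from matching "ietf.org".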
Implementations MAY support the following fields:
The sensor_id field returns information about the sensor where the record was seen. It is represented as a JSON string.
If the data originate from sensors or probes which are part of a publicly known gathering or measurement system (e.g. RIPE Atlas), the sensor_id string SHOULD be prefixed with an identifier of that system.
The zone_time_first field returns the first time that the unique tuple (rrname, rrtype, rdata) has been seen via master file import. The date is expressed in seconds (decimal) since 1 January 1970 (Unix timestamp). The time zone MUST be UTC. This field is represented as a JSON number.
The zone_time_last field returns the last time that the unique tuple (rrname, rrtype, rdata) has been seen via master file import. The date is expressed in seconds (decimal) since 1 January 1970 (Unix timestamp). The time zone MUST be UTC. This field is represented as a JSON number.
The origin field specifies the resource origin of the Passive DNS response. It is represented as a Uniform Resource Identifier (URI).
In accordance with [RFC6648], designers of new Passive DNS applications that need additional fields can request and register new field names at https://github.com/adulau/pdns-qof/wiki/Additional-Fields.
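Because the grammar admits futureField, a client may receive field names it does not know. The Python sketch below separates the fields defined in this document from any additional ones, so that a client can process the former and pass through or ignore the latter instead of rejecting the entry. Keeping unknown fields rather than discarding them is a design assumption; the function name and the field name used in the usage note are hypothetical.

```python
STANDARD_FIELDS = frozenset({
    "rrname", "rrtype", "rdata", "time_first", "time_last",      # mandatory
    "count", "bailiwick",                                        # SHOULD
    "sensor_id", "zone_time_first", "zone_time_last", "origin",  # MAY
})


def split_fields(entry: dict) -> tuple:
    """Split an entry into (standard fields, additional fields).

    Additional fields (futureField in the grammar) are returned separately
    rather than rejected, so the entry remains usable.
    """
    standard = {k: v for k, v in entry.items() if k in STANDARD_FIELDS}
    extra = {k: v for k, v in entry.items() if k not in STANDARD_FIELDS}
    return standard, extra
```

For example, an entry carrying a hypothetical "hypothetical_field" key yields that key in the second dictionary while the standard fields are processed as usual.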
Thanks to the Passive DNS developers who contributed to the document.
This memo includes no request to IANA.
Passive DNS servers capture DNS answers from multiple collecting points ("sensors") located on the Internet-facing side of DNS recursors ("post-recursor Passive DNS"). In this process, they intentionally omit the source IP, source port, destination IP and destination port from the captured packets. Since the data is captured post-recursor, the timing information (who queries what) is lost, because the recursor caches the results. Furthermore, since multiple sensors feed into a Passive DNS server, the resulting data gets mixed together, making it unlikely that a Passive DNS server can learn much about the actual person querying the DNS records or about who actually sent the query. In this sense, Passive DNS servers are similar to an archive of all previous phone books, if public DNS records can be compared to phone numbers, as they often are. Nevertheless, the authors strongly encourage Passive DNS implementors to take special care of privacy issues; bortzmeyer-dnsop-dns-privacy is an excellent starting point for this. Finally, the overall recommendations in RFC6973 should be taken into consideration when designing any application which uses Passive DNS data.
In some cases, Passive DNS output might contain confidential information and access to it might be restricted. When a user queries multiple Passive DNS servers and aggregates the data, the sensitivity of the data must be considered.
[BAILIWICK] "Passive DNS Hardening", 2010.
[CACHEPOISONING] "Black ops 2008: It’s the end of the cache as we know it.", 2008.
[DNSDB] "DNSDB API", 2013.
[PDNSCERTAT] "pDNS presentation at 4th Centr R&D workshop Frankfurt Jun 5th 2012", 2012.
[PDNSCIRCL] "CIRCL Passive DNS", 2012.
[PDNSCLIENT] "Queries 5 major Passive DNS databases: BFK, CERTEE, DNSParse, ISC, and VirusTotal.", 2013.
[PDNSCOF] "Passive DNS server interface using the common output format", 2013.
[REST] "Representational State Transfer (REST)", 2000.
[WEIMERPDNS] "Passive DNS Replication", 2005.
[I-D.narten-iana-considerations-rfc2434bis] Narten, T. and H. Alvestrand, "Guidelines for Writing an IANA Considerations Section in RFCs", Internet-Draft draft-narten-iana-considerations-rfc2434bis-09, March 2008.
[RFC3552] Rescorla, E. and B. Korver, "Guidelines for Writing RFC Text on Security Considerations", BCP 72, RFC 3552, DOI 10.17487/RFC3552, July 2003.
In the examples below, JSON output may be wrapped across multiple lines for readability, but each JSON object should be on a single line.
If you query a Passive DNS server for the rrname www.ietf.org, the output in the common output format could be:
{"count": 102, "time_first": 1298412391, "rrtype": "AAAA", "rrname": "www.ietf.org", "rdata": "2001:1890:1112:1::20", "time_last": 1302506851}
{"count": 59, "time_first": 1384865833, "rrtype": "A", "rrname": "www.ietf.org", "rdata": "4.31.198.44", "time_last": 1389022219}
If you query a Passive DNS server for the rrname ietf.org, the output in the common output format could be:
{"count": 109877, "time_first": 1298398002, "rrtype": "NS", "rrname": "ietf.org", "rdata": "ns1.yyz1.afilias-nst.info", "time_last": 1389095375}
{"count": 4, "time_first": 1298495035, "rrtype": "A", "rrname": "ietf.org", "rdata": "64.170.98.32", "time_last": 1298495035}
{"count": 9, "time_first": 1317037550, "rrtype": "AAAA", "rrname": "ietf.org", "rdata": "2001:1890:123a::1:1e", "time_last": 1330209752}
Please note that in the examples above, any backslashes "\" can be ignored and are an artefact of the tools which produced this document.