Internet Engineering Task Force | S. D'Antonio |
Internet-Draft | CINI Consortium/University of Napoli "Parthenope" |
Intended status: Standards Track | T. Zseby |
Expires: November 24, 2011 | Fraunhofer Institute FOKUS |
C. Henke | |
Technische Universitat Berlin | |
L. Peluso | |
University of Napoli | |
May 23, 2011 |
Flow Selection Techniques
draft-ietf-ipfix-flow-selection-tech-06.txt
Flow selection is the process of selecting a subset of flows from all flows observed at an observation point. Flow selection reduces the effort of post-processing flow data and transferring flow records. This document describes motivations for flow selection and presents flow selection techniques. It provides an information model for configuring flow selection techniques and discusses what information about a flow selection process should be exported.
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119 [RFC2119].
This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.
Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."
This Internet-Draft will expire on November 24, 2011.
Copyright (c) 2011 IETF Trust and the persons identified as the document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (http://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License.
This document may contain material from IETF Documents or IETF Contributions published or made publicly available before November 10, 2008. The person(s) controlling the copyright in some of this material may not have granted the IETF Trust the right to allow modifications of such material outside the IETF Standards Process. Without obtaining an adequate license from the person(s) controlling the copyright in such materials, this document may not be modified outside the IETF Standards Process, and derivative works of it may not be created outside the IETF Standards Process, except to format it for publication as an RFC or to translate it into languages other than English.
This document describes flow selection techniques for traffic measurements. A flow is defined as a set of packets with common properties as described in [RFC5101]. Flow selection can be done to limit the resource demands for capturing, storing, exporting and post-processing of flow records. It can also be used to select a particular set of flows that are of interest to a specific application. This document provides a categorization of flow selection techniques and describes configuration and reporting parameters for them. In order to be compliant with this document, at least one of the proposed flow selection schemes MUST be implemented. That means that the configuration parameters as well as the reporting information elements for this particular scheme MUST be supported.
This document also addresses configuration and reporting parameters for flow-state dependent packet selection as described in [RFC5475], although the technique is categorized as packet selection. The reason is that flow-state dependent packet selection techniques often aim to reduce the resources needed for flow capturing and flow processing. Furthermore, they were only briefly discussed in [RFC5475]. Therefore, configuration and reporting considerations for such techniques are included in this document.
This document is consistent with the terminology introduced in [RFC5101], [RFC5470], [RFC5475] and [RFC3917]. As in [RFC5101] and [RFC5476], the first letter of each IPFIX-specific and PSAMP-specific term is capitalized along with the flow selection specific terms defined here.
* Classification
* Flow Selection Process
* Flow Selection State
* Flow Selector
* Complete Flow
* Flow Filtering
* Flow Sampling
* Aggregation Process
Flow selection differs from packet selection described in [RFC5475]. Packet selection techniques consider packets as the basic element, and the parent population consists of all packets observed at an observation point. In contrast, the basic elements in flow selection are the flows. The parent population consists of all observed flows and the selection process operates on the flows. The major characteristics of flow selection are the following:
Some techniques are difficult to assign unambiguously to one of the categories. We give some guidance here on how to categorize such techniques:
                  Packet(s) coming in to Observation Point(s)
                    |                                     |
                    v                                     v
   +----------------+---------------------------+   +-----+-------+
   |          Metering Process                  |   |             |
   |                                            |   |             |
   |   packet header capturing                  |   |             |
   |        |                                   |...| Metering    |
   |   timestamping                             |   | Process N   |
   |        |                                   |   |             |
   |   packet selection                         |   |             |
   |        |                                   |   |             |
   |   classification                           |   |             |
   |        |                                   |   |             |
   |   flow state dependent packet selection    |   |             |
   |        |                                   |   |             |
   |   flow selection before aggregation (*)    |   |             |
   |        |                                   |   |             |
   |   aggregation                              |   |             |
   |        |                                   |   |             |
   |   flow selection after aggregation (*)     |   |             |
   +--------|-----------------------------------+   +-----|-------+
      Flow Records                                  Flow Records
            |                                             |
            +----------------------+----------------------+
                                   |
            +----------------------|-----------------+
            |    Exporting Process |                 |
            |                      v                 |
            |  flow selection before export(*)       |
            |                      |                 |
            |                      v                 |
            |             flow export                |
            +----------------------+-----------------+
                                   |  IPFIX (Flow Records)
                                   v
         +-------------------------|-----------------------+
         |         IPFIX Mediator  |                       |
         |                         v                       |
         |              Collecting Process(es)             |
         |                         |                       |
         |     Intermediate Flow Selection Process (*)     |
         |                         |                       |
         |              Exporting Process(es)              |
         +-------------------------|-----------------------+
                                   v
                                 IPFIX

   (*) indicates where flow selection can take place.

                                Figure 1
Figure 1 shows the IPFIX reference model as defined in [RFC5470], and extends it by introducing the functional components where flow selection can take place.
In the aggregation process the packet information is used to update the flow entries in the flow cache. Flow selection that is applied before aggregation equals a packet selection process. The flow still consists of individual packets. Those are then selected based on the classification information, i.e. based on the flow they belong to. Flow selection before aggregation can be based on the fields of the flow key (also on a hash value over these fields), but not based on characteristics that are only available after aggregation (e.g. flow size, flow duration). Flow selection before aggregation is applied to reduce resources for all succeeding processes (aggregation, exporting process) or select specific flows of interest in case such flow characteristics are already observable at packet level (e.g. flows to specific IPs). In contrast, flow state dependent packet selection is a packet selection method, because it does not necessarily select Complete Flows. Flow selection before aggregation and flow state dependent packet selection can be applied in arbitrary order.
Flow selection after aggregation is usually applied to reduce the flows to those that are of interest to a particular application and to reduce the load on flow export and flow post-processing. Since the flow cache entries have already been generated by the aggregation process, flow selection after aggregation can also depend on flow characteristics that are only visible after the aggregation of packets, such as flow size and flow duration.
The Exporting Process may implement policies for exporting only a subset of the flow records which have been stored in the system memory. Flow selection in the exporting process may select only the subset of flow records which are of interest to the user's application, or select only as many flow records as can be handled by the available resources (e.g. limited flow cache size and export link capacity).
As shown in Figure 1, flow selection can be performed as an intermediate process within an IPFIX Mediator [RFC6183]. The Intermediate Selection Process takes a flow record stream as its input and returns a record stream. The Intermediate Selection Process can again apply a flow selection technique to obtain flows of interest for the application. Further, the Intermediate Selection Process can base its selection decision on the correlation of data from different observation points, e.g. by only selecting flows that were recorded at two or more observation points.
A flow selection technique selects either all packets of a flow or none of them; otherwise the technique has to be considered as packet selection. We distinguish between Flow Filtering and Flow Sampling.
Flow Filtering is a deterministic function on the IPFIX flow record content. In case that the relevant flow characteristics are already observable at packet level (e.g. flow keys) Flow Filtering can be applied before aggregation at packet level.
Flow Filtering can be done similarly to Property Match Filtering for packet selection described in [RFC5475]. The difference is that, instead of packet fields, flow record fields are here used to derive the selection decision. Property Match Filtering is typically used to select a specific subset of the flows that are of interest to a particular application (e.g. all flows to a specific destination, all large flows, etc.). Properties on which the filtering is based can be for example flow keys, the flow size in bytes, the number of packets in the flow, the observation time of the first or last packet, or the maximum packet length. The selection criteria can be a specific value or an interval. Property match filtering can be applied before aggregation in case the properties are already observable at the packet level (e.g. flow key fields).
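The following is a minimal sketch of Property Match Filtering on flow records, given for illustration only. The FlowRecord type, its field names, and the property_match function are hypothetical and are not defined by this document. The sketch shows one criterion given as a specific value (destination address) and one given as an interval (flow size in bytes).

```python
from dataclasses import dataclass


@dataclass
class FlowRecord:
    """Hypothetical flow record; field names are illustrative only."""
    dst_ip: str
    octets: int
    packets: int


def property_match(record, dst_ip=None, octet_range=None):
    """Return True only if the record matches every configured criterion."""
    # Criterion given as a specific value.
    if dst_ip is not None and record.dst_ip != dst_ip:
        return False
    # Criterion given as a closed interval [low, high].
    if octet_range is not None:
        low, high = octet_range
        if not (low <= record.octets <= high):
            return False
    return True


records = [
    FlowRecord("192.0.2.1", 1500, 3),
    FlowRecord("192.0.2.1", 900000, 700),
    FlowRecord("198.51.100.7", 800000, 650),
]
# Select all large flows to one destination.
selected = [r for r in records
            if property_match(r, dst_ip="192.0.2.1",
                              octet_range=(100000, 10**9))]
```

Because the decision is a deterministic function of the record content, applying the same filter to the same records always yields the same subset.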
There are content-based Property Match Filtering techniques that require a computation on the current flow cache. An example is the selection of the k largest flows or a percentage of flows with the longest lifetime. This type of Property Match Filtering is also used in flow selection techniques that react to external events (e.g. resource constraints). For example, in case the flow cache is full, the flow cache entry with the lowest flow volume per current flow lifetime is deleted.
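Both examples can be sketched as follows. This is an illustration under stated assumptions: the flow cache is modeled as a dictionary from a flow key to a hypothetical entry holding "octets" and "first_seen", and the function names are not defined by this document.

```python
import heapq


def k_largest_flows(flow_cache, k):
    """Return the keys of the k flows with the highest byte count."""
    largest = heapq.nlargest(k, flow_cache.items(),
                             key=lambda item: item[1]["octets"])
    return [key for key, _ in largest]


def evict_lowest_rate(flow_cache, now):
    """Delete and return the entry with the lowest volume per lifetime."""
    def rate(key):
        entry = flow_cache[key]
        lifetime = max(now - entry["first_seen"], 1e-9)  # avoid division by zero
        return entry["octets"] / lifetime

    victim = min(flow_cache, key=rate)
    del flow_cache[victim]
    return victim


cache = {
    "f1": {"octets": 5000, "first_seen": 0.0},
    "f2": {"octets": 200, "first_seen": 5.0},
    "f3": {"octets": 90000, "first_seen": 8.0},
}
top_two = k_largest_flows(cache, 2)
evicted = evict_lowest_rate(cache, now=10.0)
```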
Hash-based Flow Filtering uses a Hash Function h to map the flow key c onto a Hash Range R. A flow is selected if the hash value h(c) is within the Hash Selection Range S, which is a subset of R. Hash-based Flow Filtering can be used to emulate a random sampling process while still enabling the correlation of the selected flow subsets at different observation points. Hash-based Flow Filtering is similar to Hash-based Packet Selection, and in fact is identical when Hash-based Packet Selection uses the flow keys that define the flow as the Hash Input. Nevertheless, there may be an incentive to apply Hash-based Flow Selection after aggregation rather than at the packet level, for example when the size of the Selection Range, and therefore the sampling probability, depends on the number of observed flows.
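A minimal sketch of the selection rule is given here. It assumes CRC-32 (zlib.crc32) as the Hash Function h, so that R = [0, 2^32), and a Selection Range S = [0, fraction * 2^32). The choice of CRC-32, the flow key encoding, and the function names are assumptions made for illustration; this document does not mandate a particular hash function.

```python
import zlib

HASH_RANGE = 2**32  # R = [0, 2^32), the output range of CRC-32


def flow_key_bytes(src, dst, sport, dport, proto):
    """Serialize a 5-tuple flow key c into bytes (illustrative encoding)."""
    return f"{src}|{dst}|{sport}|{dport}|{proto}".encode()


def hash_select(flow_key, fraction):
    """Select the flow if h(c) lies in S = [0, fraction * |R|)."""
    h = zlib.crc32(flow_key)  # unsigned 32-bit value in Python 3
    return h < int(fraction * HASH_RANGE)


key = flow_key_bytes("192.0.2.1", "198.51.100.7", 12345, 80, 6)
decision_here = hash_select(key, 0.25)
decision_elsewhere = hash_select(key, 0.25)  # e.g. at another observation point
```

Two observation points that use the same hash function, hash input and Selection Range reach the same decision for a given flow key, which is what allows the selected subsets to be correlated.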
Flow state dependent filtering does not base the selection decision on fields of the current flow record content but on the flow state, which may be kept additionally for each of the flows. External processes may update counters, bounds and timers for each of the flow records, and the flow selection process utilises this information for the selection decision. A review of flow state dependent filtering techniques that aim at the selection of the most frequent items by keeping additional flow state information can be found in [CoHa08]. Flow state dependent flow filtering can only be applied after aggregation, when a packet has been assigned to a flow cache entry. The selection process then decides, based upon the flow state of each flow, whether it is kept in the flow cache or not. Two flow state dependent flow filtering techniques are described here:
The Frequent Algorithm [KaPS03] is a technique that aims at the selection of all flows that exceed a 1/k fraction of the observed packet stream. The algorithm uses a flow cache of only k-1 entries, and each flow in the cache has an additional counter. The counter is incremented each time a packet belonging to that flow is observed. If the observed packet does not belong to any flow in the cache, all counters are decremented, and if any flow counter reaches zero, that flow is replaced with the new flow.
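The counter handling can be sketched as follows. The sketch uses a common formulation in which a free cache slot is filled directly and an entry whose counter reaches zero is removed, so that its slot is taken by a later new flow. This ordering of steps is an assumption where the description above is not explicit, and the function name is illustrative.

```python
def frequent(packet_flow_keys, k):
    """Track candidate flows exceeding a 1/k share with at most k-1 counters."""
    counters = {}
    for key in packet_flow_keys:
        if key in counters:
            # Packet belongs to a flow already in the cache.
            counters[key] += 1
        elif len(counters) < k - 1:
            # A free slot exists: start tracking the new flow.
            counters[key] = 1
        else:
            # Cache is full: decrement all counters, free the zero entries.
            for other in list(counters):
                counters[other] -= 1
                if counters[other] == 0:
                    del counters[other]
    return counters


# One flow key per observed packet; "a" carries 4 of 7 packets.
stream = ["a", "b", "a", "c", "a", "d", "a"]
result = frequent(stream, k=3)
```

The cache may also hold flows that do not exceed the 1/k fraction; the counters are lower bounds on the true packet counts, not exact values.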
Lossy Counting is a selection technique that identifies all flows whose packet count exceeds a certain percentage of the whole observed packet stream (e.g. 5% of all packets) with a certain estimation error e. Lossy Counting separates the observed packet stream into windows of size N=1/e, where N is a number of consecutive packets. For each observed flow an additional counter is held in the flow state. The counter is incremented each time a packet belonging to the flow is observed. At the end of each window all counters are decremented, and all flows with a counter of zero are removed from the flow cache.
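The windowed procedure described above can be sketched as follows. This follows the simplified description in this section (increment per packet, decrement at each window boundary); other formulations of Lossy Counting may keep an additional per-entry error term, which is not modeled here. The function name is illustrative.

```python
import math


def lossy_counting(packet_flow_keys, error):
    """Count packets per flow, pruning at the end of every window of 1/e packets."""
    window = math.ceil(1 / error)  # N = 1/e consecutive packets
    counters = {}
    for n, key in enumerate(packet_flow_keys, start=1):
        counters[key] = counters.get(key, 0) + 1
        if n % window == 0:
            # End of window: decrement all counters and drop zero entries.
            for other in list(counters):
                counters[other] -= 1
                if counters[other] == 0:
                    del counters[other]
    return counters


# e = 0.25 gives windows of 4 packets; "a" carries 5 of 8 packets.
stream = ["a", "b", "a", "c", "a", "a", "d", "a"]
result = lossy_counting(stream, error=0.25)
```

In this run the counter of flow "a" undercounts its true packet count by two, one per completed window, which equals e times the number of observed packets.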
Flow sampling operates on the flow record sequence or on arrival times. It can use a systematic or a random function for the selection process. Flow sampling usually aims at the selection of a representative subset of all flows in order to estimate characteristics of the whole set (e.g. mean flow size in the network).
Systematic sampling is a deterministic selection function. Systematic sampling may be a periodic selection of the k-th flow record which arrives at the exporting or mediator process. Systematic Sampling can also be applied before aggregation. An example would be to use an additional data structure that saves the flow keys of the flows that were not selected. Then one can create a flow cache entry for the k-th observed packet that has no flow cache entry yet and whose flow key is not in the data structure containing the flows that were not selected.
Systematic sampling can also be time-based. Time-based Systematic Sampling is applied by only creating flows that are observed between time-based start and stop triggers. The time interval may be applied at the packet level or after aggregation, e.g. by selecting, every k seconds, a flow arriving at the export process.
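Both systematic variants, applied to flow records arriving at an exporting or mediator process, can be sketched as follows. The function names and the parameters offset, period and on_time are illustrative assumptions, not configuration parameters defined by this document.

```python
def count_based_systematic(records, k, offset=0):
    """Select every k-th flow record, starting at position `offset`."""
    return [r for i, r in enumerate(records) if i % k == offset]


def time_based_systematic(timed_records, period, on_time):
    """Select records arriving within the first `on_time` seconds of each period.

    timed_records is an iterable of (arrival_time_in_seconds, record).
    """
    return [r for t, r in timed_records if (t % period) < on_time]


by_count = count_based_systematic(list(range(10)), k=3)
by_time = time_based_systematic(
    [(0.5, "a"), (1.5, "b"), (10.2, "c"), (14.0, "d")],
    period=10, on_time=1)
```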
Random flow sampling is based on a random process which requires the calculation of random numbers. One can differentiate between n-out-of-N and probabilistic sampling. The sampling probability of individual flow records MAY be adjusted according to the flow record content or external events like the available export resources. Non-uniform random sampling approaches can be applied similar to the ones defined in [RFC5475]. An example would be to prefer large volume flows over small volume flows. Random flow sampling can also be applied before aggregation when additional flow state about flows that were not selected is kept.
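The two uniform variants can be sketched as follows, using Python's standard random module. The function names are illustrative; a random.Random instance is passed in so that the seed, and therefore the outcome, is under the caller's control.

```python
import random


def n_out_of_N(records, n, rng):
    """Select exactly n of the N given records uniformly at random."""
    chosen = set(rng.sample(range(len(records)), n))
    # Preserve the original record order in the output.
    return [r for i, r in enumerate(records) if i in chosen]


def probabilistic(records, p, rng):
    """Select each record independently with probability p."""
    return [r for r in records if rng.random() < p]


rng = random.Random(1)
population = list(range(100))
fixed_size_sample = n_out_of_N(population, 10, rng)
variable_size_sample = probabilistic(population, 0.1, rng)
```

With n-out-of-N sampling the sample size is fixed and known in advance. With probabilistic sampling the sample size varies from run to run, so the number of selected records has to be counted in order to obtain the attained sampling fraction.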
As explained above, Flow-state Dependent Packet Selection is not a Flow Selection Technique but a packet selection technique. Nevertheless, we describe configuration and reporting parameters for this technique in this document. An example is the "Sample and Hold" algorithm [EsVa01], which tries to prefer large volume flows in the selection. When a packet arrives, it is selected if a flow cache entry for this packet already exists. In case there is no flow cache entry, the packet is selected with a certain probability that depends on the packet size.
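The selection rule can be sketched as follows. The text above only states that the probability depends on the packet size; the specific form used here, p = 1 - (1 - byte_probability)^size, which corresponds to sampling each byte independently, is an assumption made for illustration, as are the function and parameter names.

```python
import random


def sample_and_hold(packets, byte_probability, rng):
    """Return a flow cache (flow key -> counted bytes) for the given packets.

    packets is an iterable of (flow_key, size_in_bytes).
    """
    cache = {}
    for key, size in packets:
        if key in cache:
            # Hold: every packet of an already cached flow is selected.
            cache[key] += size
        else:
            # Sample: create an entry with a size-dependent probability.
            p = 1 - (1 - byte_probability) ** size
            if rng.random() < p:
                cache[key] = size
    return cache


packets = [("a", 100), ("b", 50), ("a", 200)]
all_held = sample_and_hold(packets, 1.0, random.Random(0))
none_held = sample_and_hold(packets, 0.0, random.Random(0))
```

Since packets observed before the flow cache entry was created are not counted, the technique does not necessarily select Complete Flows, which is why it is categorized as packet selection.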
This section describes the configuration parameters of the flow selection techniques presented above. It provides the basis of an information model to be adopted in order to configure the flow selection process within an IPFIX device. The following table gives an overview of the defined selection techniques, where they can be applied and what their input parameters are. Depending on where the flow selection techniques are applied, different input parameters can be configured.
Overview of Flow Selection Techniques:
Location | Selection Method | Selection Input |
---|---|---|
before aggregation | Flow State Dependent Packet Selection | packet sampling probabilities, flow state, packet properties |
before aggregation | Property Match Flow Filtering | flow key fields, filter function |
before aggregation | Hash-Based Flow Filtering | selection range, hash function, flow key |
before aggregation | Time-based Systematic Flow Sampling | flow position (derived from arrival time of packets), flow state |
before aggregation | Sequence-based Systematic Flow Sampling | flow position (derived from packet position), flow state |
before aggregation | Random Flow Sampling | random number generator or list and packet position, flow state |
after aggregation | Property Match Flow Filtering | flow record content, filter function |
after aggregation | Hash-Based Flow Filtering | selection range, hash function, hash input (flow keys and other flow properties) |
after aggregation | Flow State Dependent Flow Selection | flow state parameters |
after aggregation | Time-based Systematic Flow Sampling | flow arrival time, flow state |
after aggregation | Sequence-based Systematic Flow Sampling | flow position, flow state |
after aggregation | Random Flow Sampling | random number generator or list and flow position, flow state |
during Exporting Process or in the Mediator | Property Match Flow Filtering | flow record content, filter function |
during Exporting Process or in the Mediator | Hash-Based Flow Filtering | selection range, hash function, flow key |
during Exporting Process or in the Mediator | Time-based Systematic Flow Sampling | flow record arrival time |
during Exporting Process or in the Mediator | Sequence-based Systematic Flow Sampling | flow record position |
during Exporting Process or in the Mediator | Random Flow Sampling | random number generator or list and flow position |
during Exporting Process or in the Mediator | Flow State Dependent Flow Selection | flow state parameters |
A flow selection configuration consists of FS_SELECTOR_ID, FS_TYPE and FS_SELECTOR_PARAMETERS.
FS_SELECTOR_ID: Unique ID for the flow selector
FS_TYPE: Defines which algorithm is used.
FS_SELECTOR_PARAMETERS: Defines the input parameters for the flow selection method
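As an illustration, the three configuration items could be held in a structure like the one sketched here. The class name, the Python field names and the parameter keys "hash_function" and "selection_fraction" are hypothetical; only FS_SELECTOR_ID, FS_TYPE and FS_SELECTOR_PARAMETERS are taken from this document.

```python
from dataclasses import dataclass, field


@dataclass
class FlowSelectorConfig:
    """Hypothetical container for one flow selection configuration."""
    fs_selector_id: int                      # FS_SELECTOR_ID
    fs_type: str                             # FS_TYPE
    fs_selector_parameters: dict = field(default_factory=dict)  # FS_SELECTOR_PARAMETERS


config = FlowSelectorConfig(
    fs_selector_id=2,
    fs_type="fs_hashing",
    # Parameter names are illustrative assumptions.
    fs_selector_parameters={"hash_function": "crc32",
                            "selection_fraction": 0.1},
)
```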
In this section, we define what elements are needed to describe the most common Flow Filtering techniques.
FS_SELECTOR_ID | FS_TYPE |
---|---|
1 | fs_property_matching |
2 | fs_hashing |
3 | fs_flow_state_dependent_flow_selection |
FS_SELECTOR_PARAMETERS:
case fs_property_matching:
case fs_hashing:
case fs_flow_state_dependent_flow_selection:
The above list of parameters for flow state dependent flow selection techniques is suitable for the presented Frequent and Lossy Counting algorithms. Nevertheless, there exists a variety of techniques with very specific parameters which are not defined here.
In this section, we define what elements are needed to describe the most common Flow Sampling techniques.
FS_SELECTOR_ID | FS_TYPE |
---|---|
4 | fs_systematic_count-based |
5 | fs_systematic_time-based |
6 | fs_n-out-of-N |
7 | fs_probabilistic |
FS_SELECTOR_PARAMETERS:
case fs_systematic_count-based:
case fs_systematic_time-based:
case fs_n-out-of-N:
case fs_probabilistic:
The configuration of flow dependent packet selection has not been described in [RFC5475]; therefore, the parameters are defined here:
SELECTOR_TYPE: flow_dependent_packet_selection
SELECTOR_PARAMETERS:
In this section we describe Information Elements (IEs) that SHOULD be exported by a flow selection process in order to support the interpretation of measurement results from flow measurements where only some flows are selected. The information is mainly used to report how many packets and flows have been observed in total and how many of them were selected. This helps, for instance, to calculate the attained sampling fraction, which is an important parameter for providing an accuracy statement. The IEs can provide reporting information about flow records, flow cache entries, packets or bytes. The reported metrics are the total number and the number of selected elements. From these, the number of dropped elements can be derived. All counters are delta counters and SHOULD be exported and reset when a new measurement interval starts. Additional IEs may be useful for future flow selection techniques. Those can be defined additionally if needed.
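The derivation of the number of dropped elements and the attained sampling fraction from a pair of total and selected counters can be sketched as follows. The function name and the returned dictionary keys are illustrative; the two inputs correspond to a total count and a selected count as reported for one measurement interval.

```python
def selection_report(total_count, selected_count):
    """Derive dropped count and attained sampling fraction from delta counters."""
    if selected_count > total_count:
        raise ValueError("selected count cannot exceed total count")
    dropped = total_count - selected_count
    # The fraction is undefined when nothing was observed in the interval.
    fraction = selected_count / total_count if total_count else None
    return {"dropped": dropped, "attained_fraction": fraction}


# Example: 1000 flow records observed, 50 of them selected.
report = selection_report(total_count=1000, selected_count=50)
```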
List of additional Flow Selection information elements:
ID | Name |
---|---|
TBD1 | fsFlowRecordTotalCount |
TBD2 | fsFlowRecordSelectedCount |
TBD3 | fsCurrentFlowEntries |
TBD4 | fsMaxFlowEntries |
TBD5 | fsFlowEntryTotalCount |
TBD6 | fsFlowEntrySelectedCount |
TBD7 | fsPacketTotalCount |
TBD8 | fsPacketSelectedCount |
TBD9 | fsOctetTotalCount |
TBD10 | fsOctetSelectedCount |
Description:
Abstract Data Type: unsigned64
ElementId: TBD1
Status: Proposed
Units: Flow Records
Description:
Abstract Data Type: unsigned64
ElementId: TBD2
Status: Proposed
Units: Flow Records
Description:
Abstract Data Type: unsigned64
ElementId: TBD3
Status: Proposed
Units: Flow Entries
Description:
Abstract Data Type: unsigned64
ElementId: TBD4
Status: Proposed
Units: Flow Entries
Description:
Abstract Data Type: unsigned64
ElementId: TBD5
Status: Proposed
Units: Flow Entries
Description:
Abstract Data Type: unsigned64
ElementId: TBD6
Status: Proposed
Units: Flow Entries
Description:
Abstract Data Type: unsigned64
ElementId: TBD7
Status: Proposed
Units: Packets
Description:
Abstract Data Type: unsigned64
ElementId: TBD8
Status: Proposed
Units: Packets
Description:
Abstract Data Type: unsigned64
ElementId: TBD9
Status: Proposed
Units: Bytes
Description:
Abstract Data Type: unsigned64
ElementId: TBD10
Status: Proposed
Units: Bytes
This document introduces several new information elements as an extension to the IPFIX information model. Values TBD1-TBD10 in this document should be replaced with the numbers assigned by IANA.
In this section security issues concerning an IPFIX device performing flow selection are pointed out. In case the flow selection function is activated an IPFIX device might be exposed to security threats. Since flow selection implies analysing flow packets, associating them to a specific traffic flow and selecting flow records, a malicious user who was able to gain control of an IPFIX device might access both packet and flow data, thus violating their confidentiality.
Furthermore, the intruder might be attracted by the possibility of altering the flow selection process by modifying the criteria used to select flow records. In this case, the IPFIX device would export flow data which are different from the ones that the Collector expects to receive.
It is apparent that these security threats can be mitigated by authenticating entities that interact with the IPFIX device and keeping information for flow selection configuration confidential.
[RFC2119] | Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, March 1997. |