GEOPRIV | M. Thomson |
Internet-Draft | Mozilla |
Updates: 3693,4119,5491 (if approved) | J. Winterbottom |
Intended status: Standards Track | Unaffiliated |
Expires: March 20, 2015 | September 16, 2014 |
Representation of Uncertainty and Confidence in PIDF-LO
draft-ietf-geopriv-uncertainty-03
The key concepts of uncertainty and confidence as they pertain to location information are defined. Methods for the manipulation of location estimates that include uncertainty information are outlined.
This draft normatively updates the definition of location information representations defined in RFC 4119 and RFC 5491. It also deprecates related terminology defined in RFC 3693.
This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.
Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."
This Internet-Draft will expire on March 20, 2015.
Copyright (c) 2014 IETF Trust and the persons identified as the document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (http://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License.
Location information represents an estimation of the position of a Target [RFC6280]. Under ideal circumstances, a location estimate precisely reflects the actual location of the Target. For automated systems that determine location, there are many factors that introduce errors into the measurements that are used to determine location estimates.
The process by which measurements are combined to generate a location estimate is outside of the scope of work within the IETF. However, the results of such a process are carried in IETF data formats and protocols. This document outlines how uncertainty, and its associated datum, confidence, are expressed and interpreted.
This document provides a common nomenclature for discussing uncertainty and confidence as they relate to location information.
This document also provides guidance on how to manage location information that includes uncertainty. Methods for expanding or reducing uncertainty to obtain a required level of confidence are described. Methods for determining the probability that a Target is within a specified region based on its location estimate are described. These methods are simplified by making certain assumptions about the location estimate and are designed to be applicable to location estimates in a relatively small geographic area.
A confidence extension for the Presence Information Data Format - Location Object (PIDF-LO) [RFC4119] is described.
This document describes methods that can be used in combination with automatically determined location information. These are statistically-based methods.
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in [RFC2119].
This document assumes a basic understanding of the principles of mathematics, particularly statistics and geometry.
Some terminology is borrowed from [RFC3693] and [RFC6280], in particular Target.
Mathematical formulae are presented using the following notation: add +, subtract -, multiply *, divide /, power ^ and absolute value |x|. Precedence is indicated using parentheses. Mathematical functions are represented by common abbreviations: square root sqrt(x), sine sin(x), cosine cos(x), inverse cosine acos(x), tangent tan(x), inverse tangent atan(x), two-argument inverse tangent atan2(y,x), error function erf(x), and inverse error function erfinv(x).
Uncertainty results from the limitations of measurement. In measuring any observable quantity, errors from a range of sources affect the result. Uncertainty is a quantification of what is known about the observed quantity, either through the limitations of measurement or through inherent variability of the quantity.
Uncertainty is most completely described by a probability distribution. A probability distribution assigns a probability to possible values for the quantity.
A probability distribution describing a measured quantity can be arbitrarily complex and so it is desirable to find a simplified model. One approach commonly taken is to reduce the probability distribution to a confidence interval. Many alternative models are used in other areas, but study of those is not the focus of this document.
In addition to the central estimate of the observed quantity, a confidence interval is succinctly described by two values: an error range and a confidence. The error range describes an interval and the confidence describes an estimated upper bound on the probability that a "true" value is found within the extents defined by the error.
In the following example, a measurement result for a length is shown as a nominal value with additional information on error range (0.0043 meters) and confidence (95%).

$$x = 1.00742 \pm 0.0043 \text{ meters at 95\% confidence}$$

This result indicates that the value of x is between 1.00312 and 1.01172 meters with 95% probability. No other assertion is made: in particular, this does not assert that x is 1.00742.
Uncertainty and confidence for location estimates can be derived in a number of ways. This document does not attempt to enumerate the many methods for determining uncertainty. [ISO.GUM] and [NIST.TN1297] provide a set of general guidelines for determining and manipulating measurement uncertainty. This document applies that general guidance for consumers of location information.
As a statistical measure, values determined for uncertainty are determined based on information in the aggregate, across numerous individual estimates. An individual estimate might be determined to be "correct" - by using a survey to validate the result, for example - without invalidating the statistical assertion.
This understanding of estimates in the statistical sense explains why asserting a confidence of 100%, which might seem intuitively correct, is rarely advisable.
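The Probability Density Function (PDF) that is described by uncertainty indicates how likely the "true" value is to lie at or near any one point. The shape of the probability distribution can vary depending on the method that is used to determine the result. The two probability density functions most generally applicable to location information are considered in this document: the normal (or Gaussian) PDF, which applies where a large number of small, independent sources of error contribute to the result; and the rectangular PDF, which applies where all points within a range have roughly equal probability of being the "true" value.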
Each of these probability density functions can be characterized by its center point, or mean, and its width. For a normal distribution, uncertainty and confidence together are related to the standard deviation of the function (see Section 5.4). For a rectangular distribution, the half-width of the distribution is used.
Figure 1 shows a normal and rectangular probability density function with the mean (m) and standard deviation (s) labelled. The half-width (h) of the rectangular distribution is also indicated.
[Figure: a bell-shaped normal PDF (asterisks) and a flat rectangular PDF (dashes), both centered on the mean m; s marks one standard deviation of the normal PDF and h marks the half-width of the rectangular PDF.]
Figure 1: Normal and Rectangular Probability Density Functions
For a given PDF, the value of the PDF at a point describes the probability density of the "true" value at that point. Confidence for any given interval is the total probability of the "true" value being in that range, defined as the integral of the PDF over the interval. Figure 2 shows how confidence is determined for a normal distribution. The area of the shaded region gives the confidence (c) for the interval between m-u and m+u.
[Figure: a normal PDF with the area under the curve between (m-u) and (m+u) shaded and labelled c; m marks the mean.]
Figure 2: Confidence as the Integral of a PDF
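As a minimal, non-normative sketch of the integral shown in Figure 2, the following Python computes the confidence of the interval [m-u, m+u] for a normal PDF with standard deviation s and for a rectangular PDF with half-width h. The function names are illustrative only. The normal case relies on the standard identity that this integral equals erf(u / (s * sqrt(2))); the rectangular case assumes all probability lies within the half-width.

```python
import math


def normal_interval_confidence(u: float, s: float) -> float:
    """Probability that a normally distributed value lies within m +/- u.

    Integrating the normal PDF with standard deviation s from m-u to m+u
    gives erf(u / (s * sqrt(2))).
    """
    return math.erf(u / (s * math.sqrt(2)))


def rectangular_interval_confidence(u: float, h: float) -> float:
    """Probability that a rectangularly distributed value lies within m +/- u.

    The PDF is constant (1 / (2h)) over [m-h, m+h], so the integral grows
    linearly with u and is capped at 1 once u reaches h.
    """
    return min(u / h, 1.0)


print(round(normal_interval_confidence(1.96, 1.0), 3))  # prints 0.95
print(rectangular_interval_confidence(0.5, 1.0))        # prints 0.5
```

The normal result shows why an error range of about 1.96 standard deviations is associated with 95% confidence along a single axis.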
In Section 5.4, methods are described for manipulating uncertainty if the shape of the PDF is known.
The terms Precision and Resolution are defined in RFC 3693 [RFC3693]. These definitions were intended to provide a common nomenclature for discussing uncertainty; however, these particular terms have many different uses in other fields and their definitions are not sufficient to avoid confusion about their meaning. These terms are unsuitable for use in relation to quantitative concepts when discussing uncertainty and confidence in relation to location information.
Uncertainty is a quantitative concept. The term accuracy is useful in describing, qualitatively, the general concepts of location information and the qualitative aspects of location estimates. Accuracy is not a suitable term for use in a quantitative context.
For instance, it could be appropriate to say that a location estimate with uncertainty X is more accurate than a location estimate with uncertainty 2X at the same confidence. It is not appropriate to assign a number to "accuracy", nor is it appropriate to refer to any component of uncertainty or confidence as "accuracy". That is, to say that the "accuracy" for the first location estimate is X would be an erroneous use of this term.
A location estimate is the result of location determination. A location estimate is subject to uncertainty like any other observation. However, unlike a simple measure of a one-dimensional property like length, a location estimate is specified in two or three dimensions.
Uncertainty in two- or three-dimensional locations can be described using confidence intervals. The confidence interval for a location estimate in two- or three-dimensional space is expressed as a subset of that space. This document uses the term region of uncertainty to refer to the area or volume that describes the confidence interval.
Areas or volumes that describe regions of uncertainty can be formed by the combination of two or three one-dimensional ranges, or more complex shapes could be described (for example, the shapes in [RFC5491]).
This document makes a simplifying assumption that the Target of the PIDF-LO occupies just a single point in space. While this is clearly false in virtually all scenarios with any practical application, it is often a reasonable simplifying assumption to make.
To a large extent, whether this simplification is valid depends on the size of the target relative to the size of the uncertainty region. When locating a personal device using contemporary location determination techniques, the space the device occupies relative to the uncertainty is proportionally quite small. Even where that device is used as a proxy for a person, the proportions change little.
This assumption is less useful as uncertainty becomes small relative to the size of the Target of the PIDF-LO (or, conversely, as the Target becomes large relative to the uncertainty). For instance, describing the location of a football stadium or small country would include a region of uncertainty that is only marginally larger than the Target itself. In these cases, much of the guidance in this document is not applicable. Indeed, as the accuracy of location determination technology improves, it could be that the advice this document contains becomes less relevant by the same measure.
A set of shapes suitable for the expression of uncertainty in location estimates in the Presence Information Data Format - Location Object (PIDF-LO) are described in [GeoShape]. These shapes are the recommended form for the representation of uncertainty in PIDF-LO [RFC4119] documents.
The PIDF-LO can contain uncertainty, but does not include an indication of confidence. [RFC5491] defines a fixed value of 95%. Similarly, the PIDF-LO format does not provide an indication of the shape of the PDF. Section 4 defines elements to convey this information in PIDF-LO.
Absence of uncertainty information in a PIDF-LO document does not indicate that there is no uncertainty in the location estimate. Uncertainty might not have been calculated for the estimate, or it may be withheld for privacy purposes.
If the Point shape is used, confidence and uncertainty are unknown; a receiver can either assume a confidence of 0% or infinite uncertainty. The same principle applies on the altitude axis for two-dimensional shapes like the Circle.
Automatically determined civic addresses [RFC5139] inherently include uncertainty, based on the area of the most precise element that is specified. In this case, uncertainty is effectively described by the presence or absence of elements. To the recipient of location information, elements that are not present are uncertain.
To apply the concept of uncertainty to civic addresses, it is helpful to unify the conceptual models of civic address with geodetic location information. This is particularly useful when considering civic addresses that are determined using reverse geocoding (that is, the process of translating geodetic information into civic addresses).
In the unified view, a civic address defines a series of (sometimes non-orthogonal) spatial partitions. The first is the implicit partition that identifies the surface of the earth and the space near the surface. The second is the country. Each label that is included in a civic address provides information about a different set of spatial partitions. Some partitions require slight adjustments from a standard interpretation: for instance, a road includes all properties that adjoin the street. Each label might need to be interpreted with other values to provide context.
As a value at each level is interpreted, one or more spatial partitions at that level are selected, and all other partitions of that type are excluded. For non-orthogonal partitions, only the portion of the partition that fits within the existing space is selected. This is what distinguishes King Street in Sydney from King Street in Melbourne. Each defined element selects a partition of space. The resulting location is the intersection of all selected spaces.
The resulting spatial partition can be considered as a region of uncertainty.
Uncertainty in civic addresses can be increased by removing elements. This does not increase confidence unless additional information is used. Similarly, arbitrarily increasing uncertainty in a geodetic location does not increase confidence.
Location information is often measured in two or three dimensions; expressions of uncertainty in one dimension only are rare. The "resolution" parameters in [RFC6225] provide an indication of how many bits of a number are valid, which could be interpreted as an expression of uncertainty in one dimension.
[RFC6225] defines a means for representing uncertainty, but a value for confidence is not specified. A default value of 95% confidence should be assumed for the combination of the uncertainty on each axis. This is consistent with the transformation of those forms into the uncertainty representations from [RFC5491]. That is, the confidence of the resultant rectangular polygon or prism is assumed to be 95%.
On the whole, a fixed definition for confidence is preferable, primarily because it ensures consistency between implementations. Location generators that are aware of this constraint can generate location information at the required confidence. Location recipients are able to make sensible assumptions about the quality of the information that they receive.
In some circumstances - particularly with pre-existing systems - location generators might be unable to provide location information with consistent confidence. Existing systems sometimes specify confidence at 38%, 67% or 90%. Existing forms of expressing location information, such as that defined in [TS-3GPP-23_032], contain elements that express the confidence in the result.
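The addition of a confidence element provides information that was previously unavailable to recipients of location information. Without this information, a location server or generator that has access to location information with a confidence lower than 95% has two options: it can scale the region of uncertainty in an attempt to reach 95% confidence, even though it might lack the information needed to scale correctly; or it can ignore confidence entirely, which gives the recipient a false impression of the quality of the location information.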
Both of these choices degrade the quality of the information provided.
The addition of a confidence element avoids this problem entirely if a location recipient supports and understands the element. A recipient that does not understand - and hence ignores - the confidence element is in no worse a position than if the location server ignored confidence.
The confidence element MAY be added to the location-info element of the Presence Information Data Format - Location Object (PIDF-LO) [RFC4119] document. This element expresses the confidence in the associated location information as a percentage. A special unknown value is reserved to indicate that confidence is supported, but not known to the Location Generator.
The confidence element optionally includes an attribute that indicates the shape of the probability density function (PDF) of the associated region of uncertainty. Three values are possible: unknown, normal and rectangular.
Indicating a particular PDF only indicates that the distribution approximately fits the given shape based on the methods used to generate the location information. The PDF is normal if there are a large number of small, independent sources of error; rectangular if all points within the area have roughly equal probability of being the actual location of the Target; otherwise, the PDF MUST either be set to unknown or omitted.
If a PIDF-LO does not include the confidence element, the confidence of the location estimate is 95%, as defined in [RFC5491].
A Point shape does not have uncertainty (or it has infinite uncertainty), so confidence is meaningless for a point; therefore, this element MUST be omitted if only a point is provided.
Location generators SHOULD attempt to ensure that confidence is equal in each dimension when generating location information. This restriction, while not always practical, allows for more accurate scaling, if scaling is necessary.
A confidence element MUST be included with all location information that includes uncertainty (that is, all forms other than a point). The special unknown value MAY be used if confidence is not known.
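The rules above can be summarized from the recipient's side with a short, non-normative Python sketch. The function name, the use of plain shape-name strings, and the literal text "unknown" for the reserved unknown value are all illustrative assumptions; the sketch does not parse a PIDF-LO document.

```python
from typing import Optional

# Confidence assumed when no confidence element is present, as in RFC 5491.
DEFAULT_CONFIDENCE = 95.0


def effective_confidence(shape: str, confidence_text: Optional[str]) -> Optional[float]:
    """Hypothetical recipient-side interpretation of a confidence element.

    shape: name of the shape carried in location-info, e.g. "Point", "Circle".
    confidence_text: text content of the confidence element, or None if the
        element is absent. "unknown" stands in for the reserved unknown value
        (its exact spelling is an assumption of this sketch).
    Returns a percentage, or None when confidence is unknown or meaningless.
    """
    if shape == "Point":
        # A point has no (or infinite) uncertainty, so confidence is meaningless.
        return None
    if confidence_text is None:
        return DEFAULT_CONFIDENCE
    text = confidence_text.strip()
    if text == "unknown":
        return None
    value = float(text)
    if not 0.0 <= value <= 100.0:
        raise ValueError(f"confidence must be a percentage, got {value}")
    return value
```

Returning None for both the Point case and the unknown value leaves it to the application to decide how to treat location information of unknown quality.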
The inclusion of confidence that is anything other than 95% presents a potentially difficult usability problem for applications that use location information. Effectively communicating the probability that a location is incorrect to a user can be difficult.
It is inadvisable to simply display locations of any confidence, or to display confidence in a separate or non-obvious fashion. If locations with different confidence levels are displayed such that the distinction is subtle or easy to overlook - such as using fine graduations of color or transparency for graphical uncertainty regions, or displaying uncertainty graphically, but providing confidence as supplementary text - a user could fail to notice a difference in the quality of the location information that might be significant.
Depending on the circumstances, different ways of handling confidence might be appropriate. Section 5 describes techniques that could be appropriate for consumers that use automated processing.
Providing that the full implications of any choice for the application are understood, some amount of automated processing could be appropriate. In a simple example, applications could choose to discard or suppress the display of location information if confidence does not meet a pre-determined threshold.
In settings where there is an opportunity for user training, some of these problems might be mitigated by defining different operational procedures for handling location information at different confidence levels.
This section deals with manipulation of location information that contains uncertainty.
The following rules generally apply when manipulating location information:
Manipulating location estimates that include uncertainty information requires additional complexity in systems. In some cases, systems only operate on definitive values, that is, a single point.
This section describes algorithms for reducing location estimates to a simple form without uncertainty information. Having a consistent means for reducing location estimates allows for interaction between applications that are able to use uncertainty information and those that cannot.
Several different approaches can be taken when reducing a location estimate to a point. Different methods each make a set of assumptions about the properties of the PDF and the selected point; no one method is more "correct" than any other. For any given region of uncertainty, selecting an arbitrary point within the area could be considered valid; however, given the aforementioned problems with point locations, a more rigorous approach is appropriate.
Given a result with a known distribution, selecting the point within the area that has the highest probability is a more rigorous method. Alternatively, a point could be selected that minimizes the overall error; that is, it minimizes the expected value of the difference between the selected point and the "true" value.
If a rectangular distribution is assumed, the centroid of the area or volume minimizes the overall error. Minimizing the error for a normal distribution is mathematically complex. Therefore, this document opts to select the centroid of the region of uncertainty when selecting a point.
For regular shapes, such as Circle, Sphere, Ellipse and Ellipsoid, this approach equates to the center point of the region. For regions of uncertainty that are expressed as regular Polygons and Prisms the center point is also the most appropriate selection.
For the Arc-Band shape and non-regular Polygons and Prisms, selecting the centroid of the area or volume minimizes the overall error. This assumes that the PDF is rectangular.
The centroid of the Arc-Band shape is found along a line that bisects the arc. Assuming an arc-band with an inner radius of r, outer radius R, start angle a, and opening angle o (in radians), the centroid is found at the following distance from the point that defines the arc-band (the center from which r and R are measured):

$$d = \frac{4 \sin(o/2) \, (R^2 + R r + r^2)}{3 \, o \, (R + r)}$$

This point can be found along the line that bisects the arc; that is, the line at an angle of a + (o/2).
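The arc-band centroid calculation can be sketched in Python as follows. This is an illustrative helper rather than a defined interface: it assumes both angles are supplied in degrees and converts the opening angle to radians, because the distance formula divides by o. The returned angle uses whatever convention the start angle uses.

```python
import math
from typing import Tuple


def arc_band_centroid(r: float, R: float, start_deg: float, opening_deg: float) -> Tuple[float, float]:
    """Return (distance, angle_deg) of an arc-band's centroid from its center point.

    r, R: inner and outer radii in the same unit (for example, meters).
    start_deg, opening_deg: start and opening angles in degrees.
    """
    o = math.radians(opening_deg)  # the distance formula requires radians
    d = 4 * math.sin(o / 2) * (R * R + R * r + r * r) / (3 * o * (R + r))
    return d, start_deg + opening_deg / 2
```

As a sanity check, a half-disk (r = 0, opening angle 180 degrees) yields the familiar distance 4R/(3*pi), and a full annulus (opening angle 360 degrees) yields a distance of zero.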
Calculating a centroid for the Polygon and Prism shapes is more complex. Polygons that are specified using geodetic coordinates are not necessarily coplanar. For Polygons that are specified without an altitude, choose a value for altitude before attempting this process; an altitude of 0 is acceptable.
The polygon is translated to a new coordinate system that has an x-y plane roughly parallel to the polygon. This enables the elimination of z-axis values and calculating a centroid can be done using only x and y coordinates. This requires that the upward normal for the polygon is known.
To translate the polygon coordinates, apply the process described in Appendix B to find the normal vector N = [Nx,Ny,Nz]. This value should be made a unit vector to ensure that the transformation matrix is a special orthogonal matrix. From this vector, select two vectors that are perpendicular to this vector and combine these into a transformation matrix.
If Nx and Ny are non-zero, the matrices in Figure 3 can be used, given p = sqrt(Nx^2 + Ny^2). More transformations are provided later in this section for cases where Nx or Ny are zero.
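$$T = \begin{bmatrix} -N_y/p & N_x/p & 0 \\ -N_x N_z/p & -N_y N_z/p & p \\ N_x & N_y & N_z \end{bmatrix} \qquad T' = \begin{bmatrix} -N_y/p & -N_x N_z/p & N_x \\ N_x/p & -N_y N_z/p & N_y \\ 0 & p & N_z \end{bmatrix}$$

T is the transform; T' is the reverse transform.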
Figure 3: Recommended Transformation Matrices
To apply a transform to each point in the polygon, form a matrix from the ECEF coordinates and use matrix multiplication to determine the translated coordinates.
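$$\begin{bmatrix} -N_y/p & N_x/p & 0 \\ -N_x N_z/p & -N_y N_z/p & p \\ N_x & N_y & N_z \end{bmatrix} \begin{bmatrix} x_1 & x_2 & x_3 & \cdots & x_n \\ y_1 & y_2 & y_3 & \cdots & y_n \\ z_1 & z_2 & z_3 & \cdots & z_n \end{bmatrix} = \begin{bmatrix} x'_1 & x'_2 & x'_3 & \cdots & x'_n \\ y'_1 & y'_2 & y'_3 & \cdots & y'_n \\ z'_1 & z'_2 & z'_3 & \cdots & z'_n \end{bmatrix}$$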
Figure 4: Transformation
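Alternatively, direct multiplication can be used to achieve the same result:

$$\begin{aligned} x'_i &= (-N_y x_i + N_x y_i)/p \\ y'_i &= (-N_x N_z x_i - N_y N_z y_i)/p + p \, z_i \\ z'_i &= N_x x_i + N_y y_i + N_z z_i \end{aligned}$$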
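The first and second rows of this matrix (x' and y') contain the values that are used to calculate the centroid of the polygon. To find the centroid of this polygon, first find the area using:

$$A = \frac{1}{2} \sum_{i=1}^{n} \left( x'_i \, y'_{i+1} - x'_{i+1} \, y'_i \right)$$

For these formulae, treat each set of coordinates as circular, that is, x'[0] == x'[n] and x'[n+1] == x'[1] (and likewise for y'). Based on the area, the centroid along each axis can be determined by:

$$C_{x'} = \frac{1}{6A} \sum_{i=1}^{n} (x'_i + x'_{i+1}) \left( x'_i \, y'_{i+1} - x'_{i+1} \, y'_i \right)$$

$$C_{y'} = \frac{1}{6A} \sum_{i=1}^{n} (y'_i + y'_{i+1}) \left( x'_i \, y'_{i+1} - x'_{i+1} \, y'_i \right)$$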
The third row contains a distance from a plane parallel to the polygon. If the polygon is coplanar, then the values for z' are identical; however, the constraints recommended in [RFC5491] mean that this is rarely the case. To determine Cz', average these values:

$$C_{z'} = \frac{1}{n} \sum_{i=1}^{n} z'_i$$
Once the centroid is known in the transformed coordinates, these can be transformed back to the original coordinate system. The reverse transformation is shown in Figure 5.
$$\begin{bmatrix} -N_y/p & -N_x N_z/p & N_x \\ N_x/p & -N_y N_z/p & N_y \\ 0 & p & N_z \end{bmatrix} \begin{bmatrix} C_{x'} \\ C_{y'} \\ C_{z'} \end{bmatrix} = \begin{bmatrix} C_x \\ C_y \\ C_z \end{bmatrix}$$
Figure 5: Reverse Transformation
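The reverse transformation can be applied directly as follows:

$$\begin{aligned} C_x &= (-N_y C_{x'} - N_x N_z C_{y'})/p + N_x C_{z'} \\ C_y &= (N_x C_{x'} - N_y N_z C_{y'})/p + N_y C_{z'} \\ C_z &= p \, C_{y'} + N_z C_{z'} \end{aligned}$$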
The ECEF value [Cx,Cy,Cz] can then be converted back to geodetic coordinates. Given a polygon that is defined with no altitude or equal altitudes for each point, the altitude of the result can either be ignored or reset after converting back to a geodetic value.
The centroid of the Prism shape is found by finding the centroid of the base polygon and raising the point by half the height of the prism. This can be added to the altitude of the final result; alternatively, it can be added to Cz', which ensures that negative height is correctly applied to polygons that are defined in a "clockwise" direction.
The recommended transforms only apply if Nx and Ny are non-zero. If the normal vector is [0,0,1] (that is, along the z-axis), then no transform is necessary. Similarly, if the normal vector is [0,1,0] or [1,0,0], avoid the transformation and use the x and z coordinates or y and z coordinates (respectively) in the centroid calculation phase. If either Nx or Ny is zero, the alternative transform matrices in Figure 6 can be used. The reverse transform is the transpose of this matrix.
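If Nx == 0:

$$T = \begin{bmatrix} 0 & -N_z & N_y \\ 1 & 0 & 0 \\ 0 & N_y & N_z \end{bmatrix} \qquad T' = \begin{bmatrix} 0 & 1 & 0 \\ -N_z & 0 & N_y \\ N_y & 0 & N_z \end{bmatrix}$$

If Ny == 0:

$$T = T' = \begin{bmatrix} -N_z & 0 & N_x \\ 0 & 1 & 0 \\ N_x & 0 & N_z \end{bmatrix}$$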
Figure 6: Alternative Transformation Matrices
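The polygon centroid procedure can be sketched in Python as follows, taking vertices already converted to ECEF coordinates. This is a non-normative illustration with several stated assumptions: Newell's method is used as a stand-in for the normal-vector procedure of Appendix B (which is not reproduced here); the Figure 3 matrices are applied whenever p > 0, since they only divide by p; the transform is skipped when the normal lies along the z-axis; and the Figure 6 alternatives, Prism height, and geodetic/ECEF conversion are left out. All function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def newell_normal(pts: Sequence[Vec3]) -> Vec3:
    """Unit normal of a roughly planar polygon (Newell's method).

    Stand-in for the Appendix B procedure. The normal points "up" when the
    vertices run counter-clockwise as seen from above.
    """
    nx = ny = nz = 0.0
    n = len(pts)
    for i in range(n):
        x1, y1, z1 = pts[i]
        x2, y2, z2 = pts[(i + 1) % n]
        nx += (y1 - y2) * (z1 + z2)
        ny += (z1 - z2) * (x1 + x2)
        nz += (x1 - x2) * (y1 + y2)
    length = math.sqrt(nx * nx + ny * ny + nz * nz)
    return (nx / length, ny / length, nz / length)


def transform_matrix(normal: Vec3) -> List[List[float]]:
    """Forward transform T from Figure 3; the reverse transform is its transpose."""
    nx, ny, nz = normal
    p = math.sqrt(nx * nx + ny * ny)
    if p < 1e-12:
        # Normal lies along the z-axis: the polygon is already parallel to x-y.
        return [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    return [
        [-ny / p, nx / p, 0.0],
        [-nx * nz / p, -ny * nz / p, p],
        [nx, ny, nz],
    ]


def polygon_centroid(pts: Sequence[Vec3]) -> Vec3:
    """Centroid of a polygon given as ECEF (x, y, z) vertices."""
    t = transform_matrix(newell_normal(pts))
    # Forward transform of every vertex (Figure 4).
    moved = [tuple(sum(row[k] * v[k] for k in range(3)) for row in t) for v in pts]
    n = len(moved)
    area = sx = sy = 0.0
    for i in range(n):
        x0, y0, _ = moved[i]
        x1, y1, _ = moved[(i + 1) % n]  # coordinates are treated as circular
        cross = x0 * y1 - x1 * y0
        area += cross / 2
        sx += (x0 + x1) * cross
        sy += (y0 + y1) * cross
    # Cx', Cy' from the area-weighted sums; Cz' as the average of z'.
    c_prime = (sx / (6 * area), sy / (6 * area), sum(v[2] for v in moved) / n)
    # Reverse transform: multiply by the transpose of T (Figure 5).
    return tuple(sum(t[r][k] * c_prime[r] for r in range(3)) for k in range(3))
```

Because both the area and the weighted sums change sign together, the result does not depend on whether the vertices are listed clockwise or counter-clockwise.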
The Circle or Sphere are simple shapes that suit a range of applications. A circle or sphere contains fewer units of data to manipulate, which simplifies operations on location estimates.
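The simplest method for converting a location estimate to a Circle or Sphere shape is to determine the centroid and then find the longest distance from that point to any point in the region of uncertainty. This distance can be determined based on the shape type: for a Circle or Sphere, it is the radius; for an Ellipse or Ellipsoid, it is the longest semi-axis; for a Polygon or Prism, it is the distance to the farthest vertex; and for an Arc-Band, it is the distance to the farthest of its four corners.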
Once the Circle or Sphere shape is found, the associated confidence can be increased if the result is known to follow a normal distribution. However, this is a complicated process and provides limited benefit. In many cases it also violates the constraint that confidence in each dimension be the same. Confidence should be unchanged when performing this conversion.
Two-dimensional shapes are converted to a Circle; three-dimensional shapes are converted to a Sphere.
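For an Arc-Band, the farthest point from the centroid is one of the four corners: for a point on the bisector, distances to each arc grow toward the ends of that arc, and distances along each straight edge are largest at its endpoints. The following Python sketch (hypothetical helper name, angles in degrees) returns the centroid distance from the arc-band's center point and the radius of the resulting Circle; the bearing of the centroid is the start angle plus half the opening angle and is omitted here.

```python
import math
from typing import Tuple


def arc_band_to_circle(r: float, R: float, opening_deg: float) -> Tuple[float, float]:
    """Return (centroid_distance, circle_radius) for an arc-band.

    Coordinates are taken relative to the arc-band's center point, with the
    bisector along the x-axis, so the centroid sits at (d, 0) and the corners
    sit at radius r or R, +/- o/2 from the bisector.
    """
    o = math.radians(opening_deg)
    d = 4 * math.sin(o / 2) * (R * R + R * r + r * r) / (3 * o * (R + r))
    radius = max(
        math.hypot(rho * math.cos(o / 2) - d, rho * math.sin(o / 2)) for rho in (r, R)
    )
    return d, radius
```

For a narrow wedge the inner corners can be farther from the centroid than the outer ones (with r = 0 the apex itself is the farthest point), which is why both radii are checked.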
A three-dimensional shape can be easily converted to a two-dimensional shape by removing the altitude component. A sphere becomes a circle; a prism becomes a polygon; an ellipsoid becomes an ellipse. Each conversion is simple, requiring only the removal of those elements relating to altitude.
The altitude is unspecified for a two-dimensional shape and therefore has unlimited uncertainty along the vertical axis. The confidence for the two-dimensional shape is thus higher than that of the three-dimensional shape. Assuming equal confidence on each axis, the confidence of the circle can be increased using the following approximate formula:

$$C[2d] = C[3d]^{2/3}$$

where C[2d] is the confidence of the two-dimensional shape and C[3d] is the confidence of the three-dimensional shape. For example, a Sphere with a confidence of 95% can be simplified to a Circle of equal radius with confidence of 96.6%.
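The 96.6% figure can be reproduced with a one-line Python sketch. It assumes, as the text does, independent axes with equal per-axis confidence c, so that C[3d] = c^3 and the two remaining horizontal axes give c^2; because a sphere is not a box of per-axis intervals, the result is an approximation. The function name is illustrative.

```python
def sphere_to_circle_confidence(c_3d: float) -> float:
    """Approximate confidence of a Circle obtained by dropping altitude.

    With equal per-axis confidence c, c_3d = c ** 3, and the two horizontal
    axes that remain constrained give c ** 2 = c_3d ** (2 / 3).
    """
    return c_3d ** (2 / 3)


print(f"{sphere_to_circle_confidence(0.95):.3f}")  # prints 0.966
```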
The combination of uncertainty and confidence provide a great deal of information about the nature of the data that is being measured. If uncertainty, confidence and PDF are known, certain information can be extrapolated. In particular, the uncertainty can be scaled to meet a desired confidence or the confidence for a particular region of uncertainty can be found.
In general, confidence decreases as the region of uncertainty decreases in size and confidence increases as the region of uncertainty increases in size. However, this depends on the PDF; expanding the region of uncertainty for a rectangular distribution has no effect on confidence without additional information. If the region of uncertainty is increased during the process of obfuscation (see [RFC6772]), then the confidence cannot be increased.
A region of uncertainty that is reduced in size always has a lower confidence.
A region of uncertainty that has an unknown PDF shape cannot be reduced in size reliably. The region of uncertainty can be expanded, but only if confidence is not increased.
This section makes the simplifying assumption that location information is symmetrically and evenly distributed in each dimension. This is not necessarily true in practice. If better information is available, alternative methods might produce better results.
Uncertainty that follows a rectangular distribution can only be decreased in size. Increasing uncertainty has no value, since it has no effect on confidence. Since the PDF is constant over the region of uncertainty, the resulting confidence is determined by the following formula:

$$C_r = C_o \cdot \frac{U_r}{U_o}$$

where Uo and Ur are the sizes of the original and reduced regions of uncertainty (either the area or the volume of the region), and Co and Cr are the confidence values associated with each region.
Information is lost by decreasing the region of uncertainty for a rectangular distribution. Once reduced in size, the uncertainty region cannot subsequently be increased in size.
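A minimal Python sketch of this reduction follows; the function name is illustrative. It refuses to enlarge the region, reflecting the rule that a rectangular region of uncertainty can only be reduced.

```python
def reduce_rectangular(c_o: float, u_o: float, u_r: float) -> float:
    """Confidence after shrinking a region of uncertainty with a rectangular PDF.

    c_o: original confidence; u_o, u_r: original and reduced area (or volume).
    """
    if not 0 < u_r <= u_o:
        raise ValueError("a rectangular region of uncertainty can only be reduced")
    return c_o * u_r / u_o


# Halving the radius of a 95% Circle quarters its area, giving 23.75%.
cr = reduce_rectangular(0.95, 1.0, 0.25)
```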
Uncertainty and confidence can be both increased and decreased for a normal distribution. This calculation depends on the number of dimensions of the uncertainty region.
For a normal distribution, uncertainty and confidence are related to the standard deviation of the function. The following function defines the relationship between standard deviation, uncertainty, and confidence along a single axis:

$$S[x] = \frac{U[x]}{\sqrt{2} \, \operatorname{erfinv}(C[x])}$$

where S[x] is the standard deviation, U[x] is the uncertainty, and C[x] is the confidence along a single axis. erfinv is the inverse error function.
Scaling a normal distribution in two or three dimensions requires several assumptions. Firstly, it is assumed that the distribution along each axis is independent. Secondly, the confidence for each axis is assumed to be the same. Therefore, the confidence along each axis can be assumed to be:

$$C[x] = C_o^{1/n}$$

where C[x] is the confidence along a single axis, Co is the overall confidence, and n is the number of dimensions in the uncertainty.
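Therefore, to find the uncertainty for each axis at a desired confidence, Cd, apply the following formula:

$$ U_d[x] = U[x] \cdot \frac{\operatorname{erfinv}\left(C_d^{1/n}\right)}{\operatorname{erfinv}\left(C_o^{1/n}\right)} $$

Where Ud[x] is the uncertainty along a single axis at the desired confidence. This follows from applying the single-axis relationship at both confidence levels, since the standard deviation S[x] does not change.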
For regular shapes, this formula can be applied as a scaling factor in each dimension to reach a required confidence.
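The scaling rule for a normal distribution can be sketched in Python. This sketch assumes Python 3.8 or later: the standard library has no inverse error function, so `erfinv` below is derived from `statistics.NormalDist.inv_cdf` through the identity erf(x) = 2 * Phi(x * sqrt(2)) - 1, where Phi is the standard normal CDF (a library routine such as SciPy's `scipy.special.erfinv` could be used instead). The helper names are hypothetical, and confidences are given as fractions.

```python
import math
from statistics import NormalDist


def erfinv(y: float) -> float:
    """Inverse error function for -1 < y < 1, via the standard normal inverse CDF."""
    return NormalDist().inv_cdf((1.0 + y) / 2.0) / math.sqrt(2.0)


def normal_scale_factor(conf_orig: float, conf_desired: float, n: int) -> float:
    """Factor applied to each axis of an n-dimensional normal uncertainty region.

    Axes are assumed independent, each carrying confidence C ** (1 / n).
    """
    per_axis_orig = conf_orig ** (1.0 / n)
    per_axis_desired = conf_desired ** (1.0 / n)
    # S[x] is unchanged, so U[x] scales with erfinv of the per-axis confidence.
    return erfinv(per_axis_desired) / erfinv(per_axis_orig)


# Alice's ellipsoid: 19% confidence in three dimensions, rescaled to 95%.
k = normal_scale_factor(0.19, 0.95, 3)
axes = {"semiMajor": 7.7156, "semiMinor": 3.31, "vertical": 28.7}
# Round up to 0.1 m so that rounding never reduces the uncertainty.
scaled = {name: math.ceil(value * k * 10) / 10 for name, value in axes.items()}
print(round(k, 4), scaled)
```

With Alice's values from the examples, the factor is about 2.994 and the rounded axes come out as 23.1, 10 and 86 meters. Deriving `erfinv` from `NormalDist` avoids a third-party dependency at the cost of one extra function.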
A number of applications require that a judgment be made about whether a Target is within a given region of interest. Given a location estimate with uncertainty, this judgment can be difficult. A location estimate represents a probability distribution, and the true location of the Target cannot be definitively known. Therefore, the judgment relies on determining the probability that the Target is within the region.
The probability that the Target is within a particular region is found by integrating the PDF over the region. For a normal distribution, there are no analytical methods that can be used to determine the integral of the two or three dimensional PDF over an arbitrary region. The complexity of numerical methods is also too great to be useful in many applications; for example, finding the integral of the PDF in two or three dimensions across the overlap between the uncertainty region and the target region. If the PDF is unknown, no determination can be made without a simplifying assumption.
When judging whether a location is within a given region, this document assumes that uncertainties are rectangular. This introduces errors, but simplifies the calculations significantly. Prior to applying this assumption, confidence should be scaled to 95%.
Given the assumption of a rectangular distribution, the probability that a Target is found within a given region is found by first finding the area (or volume) of overlap between the uncertainty region and the region of interest. This is multiplied by the confidence of the location estimate to determine the probability. Figure 7 shows an example of finding the area of overlap between the region of uncertainty and the region of interest.
[Figure: a circular Region of Uncertainty partly overlapping a circular Region of Interest; the shared area is labelled Ao.]
Figure 7: Area of Overlap Between Two Circular Regions
Once the area of overlap, Ao, is known, the probability that the Target is within the region of interest, Pi, is:

$$ P_i = C_o \cdot \frac{A_o}{A_u} $$

Where the area of the region of uncertainty is Au and the confidence is Co.
This probability is often input to a decision process that has a limited set of outcomes; therefore, a threshold value needs to be selected. Depending on the application, different threshold probabilities might be selected. In the absence of specific recommendations, this document suggests that the probability be greater than 50% before a decision is made. If the decision process selects between two or more regions, as is required by [RFC5222], then the region with the highest probability can be selected.
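A minimal sketch of this decision step follows, assuming that the overlap area Ao and the uncertainty area Au have already been computed and that confidence has been scaled to 95% as recommended above. The function names and region labels are hypothetical, and applying the 50% threshold to the best of several regions is an assumption of this sketch rather than a requirement of the text.

```python
from typing import Dict, Optional


def probability_within(confidence: float, overlap_area: float, uncertainty_area: float) -> float:
    """Pi = Co * Ao / Au, treating the region of uncertainty as rectangular."""
    if uncertainty_area <= 0.0:
        raise ValueError("the region of uncertainty must have a positive area")
    return confidence * overlap_area / uncertainty_area


def select_region(probabilities: Dict[str, float], threshold: float = 0.5) -> Optional[str]:
    """Return the most probable region if its probability exceeds the threshold, else None."""
    best = max(probabilities, key=probabilities.get, default=None)
    if best is None or probabilities[best] <= threshold:
        return None
    return best


# Hypothetical inputs: a 10000 m^2 region of uncertainty at 95% confidence that
# overlaps two candidate regions by 6000 m^2 and 3500 m^2.
p_a = probability_within(0.95, 6000.0, 10000.0)
p_b = probability_within(0.95, 3500.0, 10000.0)
print(select_region({"region-a": p_a, "region-b": p_b}))  # region-a
```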
Determining the area of overlap between two arbitrary shapes is a non-trivial process. Reducing areas to circles (see Section 5.2) enables the application of the following process.
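Given the radius of the first circle r, the radius of the second circle R and the distance between their center points d, the following set of formulas provides the area of overlap Ao:

$$ A_o = \begin{cases} 0 & \text{if } d \ge r + R \\ \pi \min(r, R)^2 & \text{if } d \le |R - r| \\ r^2 \arccos\left(\frac{d^2 + r^2 - R^2}{2dr}\right) + R^2 \arccos\left(\frac{d^2 + R^2 - r^2}{2dR}\right) - \frac{1}{2}\sqrt{(-d + r + R)(d + r - R)(d - r + R)(d + r + R)} & \text{otherwise} \end{cases} $$

A value for d can be determined by converting the center points to Cartesian coordinates and calculating the distance between the two center points:

$$ d = \sqrt{(x_1 - x_2)^2 + (y_1 - y_2)^2 + (z_1 - z_2)^2} $$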
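The circle-overlap formulas translate directly into code. The Python sketch below (function name hypothetical) handles the disjoint and fully contained cases separately and clamps the arccosine arguments so that floating-point rounding cannot push them outside [-1, 1].

```python
import math


def circle_overlap_area(r: float, R: float, d: float) -> float:
    """Area of overlap Ao of two circles with radii r and R whose centers are d apart."""
    if d >= r + R:
        return 0.0  # disjoint, or touching at a single point
    if d <= abs(R - r):
        return math.pi * min(r, R) ** 2  # one circle lies entirely inside the other
    # Half-angles subtended at each center by the common chord.
    alpha = math.acos(max(-1.0, min(1.0, (d * d + r * r - R * R) / (2 * d * r))))
    beta = math.acos(max(-1.0, min(1.0, (d * d + R * R - r * r) / (2 * d * R))))
    # Area of the kite formed by the two centers and the two intersection points.
    kite = 0.5 * math.sqrt((-d + r + R) * (d + r - R) * (d - r + R) * (d + r + R))
    return r * r * alpha + R * R * beta - kite


# Bob's converted circle (r = 99.1 m) against a 1920 m region of interest.
print(round(circle_overlap_area(99.1, 1920.0, 1915.26)))
```

For these inputs the result should be close to the 16196 square meters quoted for the 1920 meter case in the examples.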
A calculation of overlap based on polygons can give better results than the circle-based method. However, efficient calculation of overlapping area is non-trivial. Algorithms such as Vatti's clipping algorithm [Vatti92] can be used.
Large polygonal areas might be described using geodesic interpolation. In these cases, altitude is also frequently omitted in describing the polygon. For such shapes, a planar projection can still give a good approximation of the area of overlap if the larger polygon is projected onto the local tangent plane of the smaller one. This is only possible if the only area of interest is that contained within the smaller polygon. Where the entire area of the larger polygon is of interest, geodesic interpolation is necessary.
This section presents some examples of how to apply the methods described in Section 5.
Alice receives a location estimate from her LIS that contains an ellipsoidal region of uncertainty. This information is provided at 19% confidence with a normal PDF. A PIDF-LO extract for this information is shown in Figure 8.
<gp:geopriv>
  <gp:location-info>
    <gs:Ellipsoid srsName="urn:ogc:def:crs:EPSG::4979">
      <gml:pos>-34.407242 150.882518 34</gml:pos>
      <gs:semiMajorAxis uom="urn:ogc:def:uom:EPSG::9001">
        7.7156
      </gs:semiMajorAxis>
      <gs:semiMinorAxis uom="urn:ogc:def:uom:EPSG::9001">
        3.31
      </gs:semiMinorAxis>
      <gs:verticalAxis uom="urn:ogc:def:uom:EPSG::9001">
        28.7
      </gs:verticalAxis>
      <gs:orientation uom="urn:ogc:def:uom:EPSG::9102">
        43
      </gs:orientation>
    </gs:Ellipsoid>
    <con:confidence pdf="normal">19</con:confidence>
  </gp:location-info>
  <gp:usage-rules/>
</gp:geopriv>
Figure 8
This information can be reduced to a point simply by extracting the center point, that is [-34.407242, 150.882518, 34].
If some limited uncertainty were required, the estimate could be converted into a circle or sphere. To convert to a sphere, the radius is the largest of the semi-major, semi-minor and vertical axes; in this case, 28.7 meters.
However, if only a circle is required, the altitude can be dropped as can the altitude uncertainty (the vertical axis of the ellipsoid), resulting in a circle at [-34.407242, 150.882518] of radius 7.7156 meters.
Bob receives a location estimate with a Polygon shape (which roughly corresponds to the location of the Sydney Opera House). This information is shown in Figure 9.
<gml:Polygon srsName="urn:ogc:def:crs:EPSG::4326">
  <gml:exterior>
    <gml:LinearRing>
      <gml:posList>
        -33.856625 151.215906 -33.856299 151.215343
        -33.856326 151.214731 -33.857533 151.214495
        -33.857720 151.214613 -33.857369 151.215375
        -33.856625 151.215906
      </gml:posList>
    </gml:LinearRing>
  </gml:exterior>
</gml:Polygon>
Figure 9
To convert this to a point, each vertex is first assigned an altitude of zero and converted to ECEF coordinates (see Appendix A). Then a normal vector for this polygon is found (see Appendix B). The result of each of these stages is shown in Figure 10. Note that the numbers shown in this document are rounded only for formatting reasons; the actual calculations do not include rounding, which would introduce significant errors into the final values.
Polygon in ECEF coordinate space
    (repeated point omitted and transposed to fit):

            [ -4.6470e+06  2.5530e+06  -3.5333e+06 ]
            [ -4.6470e+06  2.5531e+06  -3.5332e+06 ]
    pecef = [ -4.6470e+06  2.5531e+06  -3.5332e+06 ]
            [ -4.6469e+06  2.5531e+06  -3.5333e+06 ]
            [ -4.6469e+06  2.5531e+06  -3.5334e+06 ]
            [ -4.6469e+06  2.5531e+06  -3.5333e+06 ]

Normal Vector:
    n = [ -0.72782  0.39987  -0.55712 ]

Transformation Matrix:
        [ -0.48152  -0.87643   0.00000 ]
    t = [ -0.48828   0.26827   0.83043 ]
        [ -0.72782   0.39987  -0.55712 ]

Transformed Coordinates:
             [  8.3206e+01  1.9809e+04  6.3715e+06 ]
             [  3.1107e+01  1.9845e+04  6.3715e+06 ]
    pecef' = [ -2.5528e+01  1.9842e+04  6.3715e+06 ]
             [ -4.7367e+01  1.9708e+04  6.3715e+06 ]
             [ -3.6447e+01  1.9687e+04  6.3715e+06 ]
             [  3.4068e+01  1.9726e+04  6.3715e+06 ]

Two dimensional polygon area:
    A = 12600 m^2

Two-dimensional polygon centroid:
    C' = [ 8.8184e+00  1.9775e+04 ]

Average of pecef' z coordinates:
    6.3715e+06

Reverse Transformation Matrix:
         [ -0.48152  -0.48828  -0.72782 ]
    t' = [ -0.87643   0.26827   0.39987 ]
         [  0.00000   0.83043  -0.55712 ]

Polygon centroid (ECEF):
    C = [ -4.6470e+06  2.5531e+06  -3.5333e+06 ]

Polygon centroid (Geo):
    Cg = [ -33.856926  151.215102  -4.9537e-04 ]
Figure 10
The point conversion for the polygon uses the final result, Cg, ignoring the altitude since the original shape did not include altitude.
To convert this to a circle, take the maximum of the distances, in ECEF coordinates, from the center point to each of the polygon's vertices. This results in a radius of 99.1 meters. Confidence is unchanged.
Assume that confidence is known to be 19% for Alice's location information. This is a typical value for a three-dimensional ellipsoidal uncertainty with a normal distribution, where the standard deviation is used directly as the uncertainty in each dimension. The confidence associated with Alice's location estimate is quite low for many applications. Since the estimate is known to follow a normal distribution, the method in Section 5.4.2 can be used. Each axis can be scaled by:

$$ \frac{\operatorname{erfinv}\left(0.95^{1/3}\right)}{\operatorname{erfinv}\left(0.19^{1/3}\right)} \approx 2.994 $$
Ensuring that rounding always increases uncertainty, the location estimate at 95% confidence has a semi-major axis of 23.1 meters, a semi-minor axis of 10 meters and a vertical axis of 86 meters.
Bob's location estimate (from the previous example) covers an area of approximately 12600 square meters. If the estimate follows a rectangular distribution, the region of uncertainty can be reduced in size. Here we find the confidence that Bob is within the smaller area of the Concert Hall. For the Concert Hall, the polygon [-33.856473, 151.215257; -33.856322, 151.214973; -33.856424, 151.21471; -33.857248, 151.214753; -33.857413, 151.214941; -33.857311, 151.215128] is used. To use this new region of uncertainty, find its area using the same translation method described in Section 5.1.1.2, which produces 4566.2 square meters. Given that the Concert Hall is entirely within Bob's original location estimate, the confidence associated with the smaller area is therefore 95% * 4566.2 / 12600 = 34%.
Suppose that a circular area is defined centered at [-33.872754, 151.20683] with a radius of 1950 meters. To determine whether Bob is found within this area, given that the circle converted from Bob's location estimate is centered at [-33.856926, 151.215102] with an uncertainty radius of 99.1 meters, we apply the method in Section 5.5. Using the converted Circle shape for Bob's location, the distance between these points is found to be 1915.26 meters. The area of overlap between Bob's location estimate and the region of interest is therefore approximately 22030 square meters and the area of Bob's location estimate is 30853 square meters. This gives the estimated probability that Bob is less than 1950 meters from the selected point as 95% * 22030 / 30853 = 67.8%.
Note that if the radius of the region of interest were 1920 meters instead, the area of overlap would be only 16196 square meters and the probability would be 49.8%. Therefore, it is marginally more likely that Bob is outside the region of interest, despite the center point of his location estimate being within the region.
The PIDF-LO document in Figure 11 includes a representation of uncertainty as a circular area. The confidence element (on the line marked with a comment) indicates that the confidence is 67% and that it follows a normal distribution.
<pidf:presence
    xmlns:pidf="urn:ietf:params:xml:ns:pidf"
    xmlns:dm="urn:ietf:params:xml:ns:pidf:data-model"
    xmlns:gp="urn:ietf:params:xml:ns:pidf:geopriv10"
    xmlns:gs="http://www.opengis.net/pidflo/1.0"
    xmlns:gml="http://www.opengis.net/gml"
    xmlns:con="urn:ietf:params:xml:ns:geopriv:conf"
    entity="pres:alice@example.com">
  <dm:device id="sg89ab">
    <pidf:status>
      <gp:geopriv>
        <gp:location-info>
          <gs:Circle srsName="urn:ogc:def:crs:EPSG::4326">
            <gml:pos>42.5463 -73.2512</gml:pos>
            <gs:radius uom="urn:ogc:def:uom:EPSG::9001">
              850.24
            </gs:radius>
          </gs:Circle>
          <!-- c -->
          <con:confidence pdf="normal">67</con:confidence>
        </gp:location-info>
        <gp:usage-rules/>
      </gp:geopriv>
    </pidf:status>
    <dm:deviceID>mac:010203040506</dm:deviceID>
  </dm:device>
</pidf:presence>
Figure 11: Example PIDF-LO with Confidence
<?xml version="1.0"?>
<xs:schema
    xmlns:conf="urn:ietf:params:xml:ns:geopriv:conf"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    targetNamespace="urn:ietf:params:xml:ns:geopriv:conf"
    elementFormDefault="qualified"
    attributeFormDefault="unqualified">
  <xs:annotation>
    <xs:appinfo
        source="urn:ietf:params:xml:schema:geopriv:conf">
      PIDF-LO Confidence
    </xs:appinfo>
    <xs:documentation
        source="http://www.ietf.org/rfc/rfcXXXX.txt">
      <!-- [[NOTE TO RFC-EDITOR: Please replace above URL with
           URL of published RFC and remove this note.]] -->
      This schema defines an element that is used for indicating
      confidence in PIDF-LO documents.
    </xs:documentation>
  </xs:annotation>
  <xs:element name="confidence" type="conf:confidenceType"/>
  <xs:complexType name="confidenceType">
    <xs:simpleContent>
      <xs:extension base="conf:confidenceBase">
        <xs:attribute name="pdf" type="conf:pdfType"
                      default="unknown"/>
      </xs:extension>
    </xs:simpleContent>
  </xs:complexType>
  <xs:simpleType name="confidenceBase">
    <xs:union>
      <xs:simpleType>
        <xs:restriction base="xs:decimal">
          <xs:minExclusive value="0.0"/>
          <xs:maxExclusive value="100.0"/>
        </xs:restriction>
      </xs:simpleType>
      <xs:simpleType>
        <xs:restriction base="xs:token">
          <xs:enumeration value="unknown"/>
        </xs:restriction>
      </xs:simpleType>
    </xs:union>
  </xs:simpleType>
  <xs:simpleType name="pdfType">
    <xs:restriction base="xs:token">
      <xs:enumeration value="unknown"/>
      <xs:enumeration value="normal"/>
      <xs:enumeration value="rectangular"/>
    </xs:restriction>
  </xs:simpleType>
</xs:schema>
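Recipients that do understand the element need to handle both a numeric value and the token "unknown", as well as the schema default for the pdf attribute. The following Python sketch shows one way to read the element with the standard library's ElementTree; the function name `read_confidence` is hypothetical, and returning None for a missing element (leaving its interpretation to the caller) is a choice of this sketch.

```python
import xml.etree.ElementTree as ET
from typing import Optional, Tuple

# Namespace of the confidence element defined by the schema above.
CONF_NS = "urn:ietf:params:xml:ns:geopriv:conf"


def read_confidence(location_info: ET.Element) -> Tuple[Optional[float], str]:
    """Return (confidence, pdf) for the confidence child of a location-info element.

    confidence is a percentage, or None when the element is absent or its value
    is "unknown"; pdf falls back to the schema default "unknown".
    """
    elem = location_info.find(f"{{{CONF_NS}}}confidence")
    if elem is None:
        # How a missing element is interpreted is left to the caller.
        return None, "unknown"
    pdf = (elem.get("pdf") or "unknown").strip()
    text = (elem.text or "").strip()
    if text == "unknown":
        return None, pdf
    value = float(text)
    # The schema restricts numeric values to the open interval (0, 100).
    if not 0.0 < value < 100.0:
        raise ValueError(f"confidence {value} is outside (0, 100)")
    return value, pdf


sample = """<gp:location-info xmlns:gp="urn:ietf:params:xml:ns:pidf:geopriv10"
    xmlns:con="urn:ietf:params:xml:ns:geopriv:conf">
  <con:confidence pdf="normal">67</con:confidence>
</gp:location-info>"""
print(read_confidence(ET.fromstring(sample)))  # (67.0, 'normal')
```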
This section registers a new XML namespace, urn:ietf:params:xml:ns:geopriv:conf, as per the guidelines in [RFC3688].
BEGIN
<?xml version="1.0"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en">
  <head>
    <title>PIDF-LO Confidence Attribute</title>
  </head>
  <body>
    <h1>Namespace for PIDF-LO Confidence Attribute</h1>
    <h2>urn:ietf:params:xml:ns:geopriv:conf</h2>
[[NOTE TO IANA/RFC-EDITOR: Please update RFC URL and replace XXXX
    with the RFC number for this specification.]]
    <p>See <a href="[[RFC URL]]">RFCXXXX</a>.</p>
  </body>
</html>
END
This section registers an XML schema as per the guidelines in [RFC3688].
This document describes methods for managing and manipulating uncertainty in location. No specific security concerns arise from most of the information provided. The considerations of [RFC4119] all apply.
Providing uncertainty and confidence information can reveal information about the process by which location information is generated. For instance, it might reveal information that could be used to infer that a user is using a mobile device with a GPS, or that a user is acquiring location information from a particular network-based service. A Rule Maker might choose to remove uncertainty-related fields from a location object in order to protect this information; though it is noted that this information might not be perfectly protected due to difficulties associated with location obfuscation, as described in Section 13.5 of [RFC6772].
Adding confidence to location information risks misinterpretation by consumers of location information that do not understand the element. This could be exploited, particularly when reducing confidence, since the resulting uncertainty region might include locations that are less likely to contain the Target than the recipient expects. Since this sort of error is always a possibility, the impact of this is low.
Peter Rhodes provided assistance with some of the mathematical groundwork on this document. Dan Cornford provided a detailed review and many terminology corrections.
The process of conversion from geodetic (latitude, longitude and altitude) to earth-centered, earth-fixed (ECEF) Cartesian coordinates is relatively simple.
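In this section, the following constants and derived values are used from the definition of WGS84 [WGS84]:

$$ \begin{aligned} a &= 6378137\ \text{m} && \text{(semi-major axis, or equatorial radius)} \\ f &= 1/298.257223563 && \text{(flattening)} \\ b &= a(1 - f) && \text{(semi-minor axis)} \\ e^2 &= f(2 - f) && \text{(first eccentricity squared)} \\ e'^2 &= \frac{e^2}{1 - e^2} && \text{(second eccentricity squared)} \end{aligned} $$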
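To convert geodetic coordinates (latitude lat, longitude lng, altitude h) to ECEF coordinates (X, Y, Z), use the following relationships:

$$ \begin{aligned} N &= \frac{a}{\sqrt{1 - e^2 \sin^2(\mathit{lat})}} \\ X &= (N + h)\cos(\mathit{lat})\cos(\mathit{lng}) \\ Y &= (N + h)\cos(\mathit{lat})\sin(\mathit{lng}) \\ Z &= \left(N(1 - e^2) + h\right)\sin(\mathit{lat}) \end{aligned} $$

Where a is the WGS84 semi-major axis, e^2 is the first eccentricity squared and N is the prime vertical radius of curvature at the given latitude.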
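The forward conversion can be sketched in Python as follows, using the WGS84 values for a and f. The function name is hypothetical, angles are taken in degrees, and altitude defaults to zero as in the polygon examples.

```python
import math

# WGS84 defining constants.
A = 6378137.0                 # semi-major axis (equatorial radius), meters
F = 1 / 298.257223563         # flattening
E2 = F * (2 - F)              # first eccentricity squared


def geodetic_to_ecef(lat_deg: float, lng_deg: float, alt: float = 0.0) -> tuple:
    """Convert WGS84 latitude and longitude (degrees) and altitude (meters) to ECEF meters."""
    lat = math.radians(lat_deg)
    lng = math.radians(lng_deg)
    # Prime vertical radius of curvature at this latitude.
    n = A / math.sqrt(1 - E2 * math.sin(lat) ** 2)
    x = (n + alt) * math.cos(lat) * math.cos(lng)
    y = (n + alt) * math.cos(lat) * math.sin(lng)
    z = (n * (1 - E2) + alt) * math.sin(lat)
    return x, y, z


# First vertex of Bob's polygon (Figure 9), with the altitude assumed to be zero.
print(["%.4e" % c for c in geodetic_to_ecef(-33.856625, 151.215906)])
```

For that vertex, the result should agree with the first row of pecef in Figure 10 to the precision shown there.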
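The reverse conversion requires more complex computation and most methods introduce some error in latitude and altitude. A range of techniques is described in [Convert]. A variant on the method originally proposed by Bowring, which results in an acceptably small error, is described by the following:

$$ \begin{aligned} p &= \sqrt{X^2 + Y^2} \\ r &= \sqrt{X^2 + Y^2 + Z^2} \\ u &= \arctan\left(\frac{(1 - f)\,Z}{p}\left(1 + \frac{e'^2 b}{r}\right)\right) \\ \mathit{lat} &= \arctan\left(\frac{Z + e'^2 b \sin^3 u}{p - e^2 a \cos^3 u}\right) \\ \mathit{lng} &= \operatorname{atan2}(Y, X) \\ h &= p\cos(\mathit{lat}) + Z\sin(\mathit{lat}) - a\sqrt{1 - e^2\sin^2(\mathit{lat})} \end{aligned} $$

Where a is the semi-major axis, f is the flattening, b = a(1 - f) is the semi-minor axis, and e^2 and e'^2 are the first and second eccentricities squared of the WGS84 ellipsoid.

If the point is near the poles, that is p < 1 meter, the value for altitude that this method produces is unstable. A simpler method for determining the altitude of a point near the poles is:

$$ h = |Z| - b $$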
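A Python sketch of this reverse conversion follows. It evaluates the formulas once without iteration, uses atan2 rather than arctan so that the signs of Z and p are handled without special cases, and switches to the polar altitude expression when p is below 1 meter. Constant and function names are assumptions of this sketch, and the Earth's center itself is not a valid input.

```python
import math

# WGS84 constants and derived values.
A = 6378137.0                 # semi-major axis, meters
F = 1 / 298.257223563         # flattening
B = A * (1 - F)               # semi-minor axis, meters
E2 = F * (2 - F)              # first eccentricity squared
EP2 = E2 / (1 - E2)           # second eccentricity squared


def ecef_to_geodetic(x: float, y: float, z: float) -> tuple:
    """Convert ECEF meters to (latitude deg, longitude deg, altitude m), Bowring-style."""
    p = math.hypot(x, y)                  # distance from the rotation axis
    r = math.sqrt(p * p + z * z)          # distance from the Earth's center
    lng = math.atan2(y, x)
    # Parametric latitude estimate; atan2 keeps the sign of z and tolerates p == 0.
    u = math.atan2((1 - F) * z * (1 + EP2 * B / r), p)
    lat = math.atan2(z + EP2 * B * math.sin(u) ** 3, p - E2 * A * math.cos(u) ** 3)
    if p < 1.0:
        # Near the poles the general altitude expression is unstable.
        alt = abs(z) - B
    else:
        alt = p * math.cos(lat) + z * math.sin(lat) - A * math.sqrt(1 - E2 * math.sin(lat) ** 2)
    return math.degrees(lat), math.degrees(lng), alt


print(ecef_to_geodetic(A + 1000.0, 0.0, 0.0))  # (0.0, 0.0, 1000.0)
```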
For a polygon that is guaranteed to be convex and coplanar, the upward normal can be found by finding the vector cross product of adjacent edges.
For more general cases the Newell method of approximation described in [Sunday02] may be applied. In particular, this method can be used if the points are only approximately coplanar, and for non-convex polygons.
This process requires a Cartesian coordinate system. Therefore, convert the geodetic coordinates of the polygon to Cartesian ECEF coordinates (see Appendix A). If no altitude is specified, assume an altitude of zero.
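This method can be condensed to the following set of equations:

$$ \begin{aligned} N_x &= \sum_{i=1}^{n} (y[i] - y[i+1])\,(z[i] + z[i+1]) \\ N_y &= \sum_{i=1}^{n} (z[i] - z[i+1])\,(x[i] + x[i+1]) \\ N_z &= \sum_{i=1}^{n} (x[i] - x[i+1])\,(y[i] + y[i+1]) \end{aligned} $$

For these formulae, the polygon is made of points (x[1], y[1], z[1]) through (x[n], y[n], z[n]). Each array is treated as circular, that is, x[0] == x[n] and x[n+1] == x[1].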
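To translate this into a unit vector, divide each component of the normal vector (Nx, Ny, Nz) by the length of the vector:

$$ \hat{n} = \frac{(N_x, N_y, N_z)}{\sqrt{N_x^2 + N_y^2 + N_z^2}} $$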
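A minimal Python sketch of the Newell method, including the normalization step, is shown below; the function name is hypothetical. Each vertex is given once (the closing point is not repeated), and the modulo index implements the circular treatment of the arrays.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float, float]


def newell_unit_normal(points: Sequence[Point]) -> Point:
    """Unit normal of a polygon given as Cartesian (for example ECEF) points.

    Each vertex appears once; the index wraps so that the point after the last
    vertex is the first one, matching x[n+1] == x[1].
    """
    nx = ny = nz = 0.0
    count = len(points)
    for i in range(count):
        x1, y1, z1 = points[i]
        x2, y2, z2 = points[(i + 1) % count]
        nx += (y1 - y2) * (z1 + z2)
        ny += (z1 - z2) * (x1 + x2)
        nz += (x1 - x2) * (y1 + y2)
    length = math.sqrt(nx * nx + ny * ny + nz * nz)
    if length == 0.0:
        raise ValueError("degenerate polygon: the normal has zero length")
    # Divide each component by the vector length to obtain a unit vector.
    return nx / length, ny / length, nz / length


square = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 1.0, 0.0), (0.0, 1.0, 0.0)]
print(newell_unit_normal(square))        # (0.0, 0.0, 1.0)
print(newell_unit_normal(square[::-1]))  # (0.0, 0.0, -1.0)
```

An anti-clockwise square in the x-y plane yields a normal along +z; reversing the order of its points flips the normal, which is the kind of accidental inversion that an orientation check has to catch.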
RFC 5491 [RFC5491] stipulates that polygons be presented in anti-clockwise direction so that the upward normal is in an upward direction. Accidental reversal of points can invert this vector. This error can be hard to detect just by looking at the series of coordinates that form the polygon.
Calculate the dot product of the upward normal of the polygon (see Appendix B) and any vector that points away from the center of the Earth from the location of the polygon. If this product is positive, then the polygon upward normal also points away from the center of the Earth.
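A unit vector for the upward direction at any point can be found based on the latitude (lat) and longitude (lng) of the point, as follows:

$$ \mathit{up} = \left(\cos(\mathit{lat})\cos(\mathit{lng}),\ \cos(\mathit{lat})\sin(\mathit{lng}),\ \sin(\mathit{lat})\right) $$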
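The orientation check can be sketched in Python as follows; the function names are hypothetical. The up vector is computed from latitude and longitude as above, and the check simply tests the sign of its dot product with the polygon normal.

```python
import math
from typing import Tuple

Vector = Tuple[float, float, float]


def up_vector(lat_deg: float, lng_deg: float) -> Vector:
    """Unit vector, in ECEF axes, pointing away from the Earth at (lat, lng) in degrees."""
    lat = math.radians(lat_deg)
    lng = math.radians(lng_deg)
    return (math.cos(lat) * math.cos(lng), math.cos(lat) * math.sin(lng), math.sin(lat))


def normal_points_up(normal: Vector, lat_deg: float, lng_deg: float) -> bool:
    """True if the polygon normal has a positive dot product with the local up vector."""
    up = up_vector(lat_deg, lng_deg)
    return sum(n * u for n, u in zip(normal, up)) > 0.0


# Normal n and centroid Cg from Figure 10.
n = (-0.72782, 0.39987, -0.55712)
print(normal_points_up(n, -33.856926, 151.215102))                     # True
print(normal_points_up(tuple(-c for c in n), -33.856926, 151.215102))  # False
```

Negating the normal, as an accidental reversal of the polygon's points would, makes the check fail.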
For polygons that span less than half the globe, any point in the polygon - including the centroid - can be selected to generate an approximate up vector for comparison with the upward normal.