Mean absolute error

In statistics, mean absolute error (MAE) is a measure of the difference between two continuous variables. Assume X and Y are variables of paired observations that express the same phenomenon. Examples of Y versus X include comparisons of predicted versus observed, subsequent time versus initial time, and one technique of measurement versus an alternative technique of measurement. Consider a scatter plot of n points, where point i has coordinates $(x_i, y_i)$. MAE is the average vertical distance between each point and the identity line $y = x$. MAE is also the average horizontal distance between each point and the identity line.

The mean absolute error is given by:[1]

$$\mathrm{MAE} = \frac{\sum_{i=1}^{n} \left| y_i - x_i \right|}{n}$$
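As a minimal sketch, the formula translates directly into Python. The helper name `mean_absolute_error` and the sample data are illustrative assumptions, not part of the source, and the helper is unrelated to any library function of the same name. Each term $|y_i - x_i|$ is the vertical distance from point $(x_i, y_i)$ to the line $y = x$.

```python
def mean_absolute_error(x, y):
    """Return MAE = sum(|y_i - x_i|) / n for paired sequences x and y."""
    if len(x) != len(y) or len(x) == 0:
        raise ValueError("x and y must be non-empty and of equal length")
    # |y_i - x_i| is the vertical (or horizontal) distance to the line y = x.
    return sum(abs(yi - xi) for xi, yi in zip(x, y)) / len(x)


observed = [2.0, 4.0, 6.0, 8.0]   # x_i (hypothetical data)
predicted = [3.0, 3.0, 7.0, 8.0]  # y_i (hypothetical data)
print(mean_absolute_error(observed, predicted))  # 0.75
```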

It is possible to express MAE as the sum of two components: Quantity Disagreement and Allocation Disagreement. Quantity Disagreement is the absolute value of the Mean Error (ME). Allocation Disagreement is MAE minus Quantity Disagreement. The Mean Error is given by:[2]

$$\mathrm{ME} = \frac{\sum_{i=1}^{n} \left( y_i - x_i \right)}{n}$$
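The decomposition can be sketched the same way, assuming plain Python lists of paired values; `mae_components` is a hypothetical helper. It computes ME, takes its absolute value as Quantity Disagreement, and assigns the remainder of MAE to Allocation Disagreement.

```python
def mae_components(x, y):
    """Return (MAE, ME, quantity disagreement, allocation disagreement)."""
    n = len(x)
    diffs = [yi - xi for xi, yi in zip(x, y)]
    mae = sum(abs(d) for d in diffs) / n
    me = sum(diffs) / n
    quantity = abs(me)           # Quantity Disagreement = |ME|
    allocation = mae - quantity  # Allocation Disagreement = MAE - |ME|
    return mae, me, quantity, allocation


x = [1.0, 2.0, 3.0, 4.0]
y = [2.0, 1.0, 5.0, 4.0]
print(mae_components(x, y))  # (1.0, 0.5, 0.5, 0.5)
```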

It is also possible to identify the types of difference by looking at an $(x, y)$ scatter plot. Allocation difference exists if and only if points reside on both sides of the identity line. Quantity difference exists when the average of the X values does not equal the average of the Y values.[2][3]
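The "if and only if" claim follows from the triangle inequality: $\left|\sum_i (y_i - x_i)\right| = \sum_i |y_i - x_i|$ exactly when all nonzero differences share a sign, that is, when no points lie strictly on opposite sides of $y = x$. A small sketch, using the hypothetical helper `disagreement_types` and made-up data, checks this on two toy data sets.

```python
def disagreement_types(x, y):
    """Return (allocation, quantity, points_on_both_sides) for paired data."""
    diffs = [yi - xi for xi, yi in zip(x, y)]
    n = len(diffs)
    mae = sum(abs(d) for d in diffs) / n
    quantity = abs(sum(diffs) / n)     # |ME| = |mean(Y) - mean(X)|
    allocation = mae - quantity
    above = any(d > 0 for d in diffs)  # some point above y = x
    below = any(d < 0 for d in diffs)  # some point below y = x
    return allocation, quantity, above and below


# All points on or above the identity line: no allocation difference.
print(disagreement_types([1.0, 2.0, 3.0], [2.0, 3.0, 5.0]))
# Points on both sides: allocation difference, zero quantity difference.
print(disagreement_types([1.0, 2.0, 3.0], [2.0, 1.0, 3.0]))
```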

MAE has a clear interpretation as the average absolute difference between $y_i$ and $x_i$. Many researchers want to know this average difference because its interpretation is clear, but researchers frequently compute and misinterpret the root mean squared error (RMSE), which is not the average absolute error.[1][3]

MAE vs. RMSE

As the name suggests, the mean absolute error is an average of the absolute errors $|e_i| = |y_i - x_i|$, where $y_i$ is the prediction and $x_i$ the true value. Note that alternative formulations may include relative frequencies as weight factors. The mean absolute error uses the same scale as the data being measured. It is therefore a scale-dependent accuracy measure and cannot be used to make comparisons between series that use different scales.[4] The mean absolute error is a common measure of forecast error in time series analysis,[5] where the term "mean absolute deviation" is sometimes used in confusion with the more standard definition of mean absolute deviation. The same confusion exists more generally.
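A short sketch illustrates two points from this paragraph under stated assumptions: rescaling the data (here, metres to millimetres) rescales MAE by the same factor, and one plausible reading of "relative frequencies as weight factors" is the weighted average $\sum_i w_i |e_i| / \sum_i w_i$, which reproduces the unweighted MAE of the expanded data. The helper names and data are hypothetical.

```python
def mae(errors):
    return sum(abs(e) for e in errors) / len(errors)


def weighted_mae(errors, weights):
    """Weighted MAE; treating weights as relative frequencies is an assumption."""
    return sum(w * abs(e) for e, w in zip(errors, weights)) / sum(weights)


# Scale dependence: the same errors expressed in metres and in millimetres.
errors_m = [0.5, -1.0, 2.0]
errors_mm = [1000 * e for e in errors_m]
print(mae(errors_m), mae(errors_mm))  # second value is 1000 times the first (up to rounding)

# Relative frequencies: error 1 occurs 75% of the time, error 3 occurs 25%.
print(weighted_mae([1.0, 3.0], [0.75, 0.25]), mae([1.0, 1.0, 1.0, 3.0]))  # 1.5 1.5
```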

The mean absolute error is one of a number of ways of comparing forecasts with their eventual outcomes. Well-established alternatives are the mean absolute scaled error (MASE) and the mean squared error. These all summarize performance in ways that disregard the direction of over- or under-prediction; a measure that does place emphasis on this is the mean signed difference.
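For comparison, a sketch of MASE and the mean signed difference in Python. The MASE denominator follows the usual non-seasonal definition by Hyndman and Koehler,[5] the in-sample MAE of the one-step naive forecast (each value predicted by its predecessor); treat this as an assumption about which variant is meant. The sign convention for the mean signed difference (forecast minus actual) and all names and data are illustrative.

```python
def mae(actual, forecast):
    return sum(abs(f - a) for a, f in zip(actual, forecast)) / len(actual)


def mean_signed_difference(actual, forecast):
    """Positive: forecasts too high on average; negative: too low."""
    return sum(f - a for a, f in zip(actual, forecast)) / len(actual)


def mase(actual, forecast, train):
    """MAE scaled by the in-sample MAE of the one-step naive forecast."""
    naive_mae = sum(abs(train[t] - train[t - 1])
                    for t in range(1, len(train))) / (len(train) - 1)
    return mae(actual, forecast) / naive_mae


train = [10.0, 12.0, 11.0, 13.0, 12.0]  # history used only for scaling
actual = [14.0, 13.0]
forecast = [13.0, 14.0]
print(mae(actual, forecast))                     # 1.0
print(mean_signed_difference(actual, forecast))  # 0.0
print(mase(actual, forecast, train))             # about 0.667
```

Here the two errors (+1 and -1) cancel, so the mean signed difference is 0 while MAE is 1, which is exactly the directional information the absolute-value measures discard.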

Where a prediction model is to be fitted by optimizing a selected performance measure, least absolute deviations plays the same role for the mean absolute error that the least squares approach plays for the mean squared error.
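To make the correspondence concrete, a small sketch fits a straight line to data with one outlier by both criteria. The LAD fit uses an exhaustive search over lines through pairs of points, which is exact for simple linear regression (given at least two distinct x values) because some optimal LAD line interpolates at least two data points; practical solvers typically use linear programming instead. Helper names and data are hypothetical.

```python
from itertools import combinations


def sum_abs_residuals(a, b, xs, ys):
    return sum(abs(y - (a + b * x)) for x, y in zip(xs, ys))


def lad_line(xs, ys):
    """Least-absolute-deviations line y = a + b*x by checking all point pairs.
    O(n^3): fine for illustration, not for large data sets."""
    best = None
    for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2):
        if x1 == x2:
            continue  # vertical line cannot be written as y = a + b*x
        b = (y2 - y1) / (x2 - x1)
        a = y1 - b * x1
        cost = sum_abs_residuals(a, b, xs, ys)
        if best is None or cost < best[0]:
            best = (cost, a, b)
    return best[1], best[2]


def least_squares_line(xs, ys):
    """Ordinary least-squares line y = a + b*x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


xs = [0, 1, 2, 3, 4, 5]
ys = [1, 3, 5, 7, 9, 40]  # y = 1 + 2x except an outlier at x = 5
print(lad_line(xs, ys))            # (1.0, 2.0)
print(least_squares_line(xs, ys))  # slope pulled up by the outlier
```

The LAD line recovers $y = 1 + 2x$ exactly, while the least-squares slope is pulled to about 6.14 by the single outlier.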

MAE is not identical to RMSE, but some researchers report and interpret RMSE as if it reflected the measurement that MAE gives. MAE is conceptually simpler and more interpretable than RMSE: it does not require the use of squares or square roots, whereas the use of squared distances hinders the interpretation of RMSE. MAE is simply the average absolute vertical or horizontal distance between each point in a scatter plot and the Y = X line. In other words, MAE is the average absolute difference between X and Y, which is fundamentally easier to understand than the square root of the average of squared deviations. Furthermore, each error contributes to MAE in proportion to the absolute value of the error, which is not true for RMSE, where squaring gives large errors disproportionate weight.[2] See the example above for an illustration of these differences.
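A minimal numerical illustration with made-up error vectors: both sets below have the same total absolute error, and therefore the same MAE, but concentrating that error in a single point doubles RMSE because squaring weights large errors disproportionately.

```python
import math


def mae(errors):
    return sum(abs(e) for e in errors) / len(errors)


def rmse(errors):
    return math.sqrt(sum(e * e for e in errors) / len(errors))


even = [2.0, 2.0, 2.0, 2.0]          # error spread evenly over four points
concentrated = [0.0, 0.0, 0.0, 8.0]  # same total absolute error in one point
for errors in (even, concentrated):
    print(f"MAE={mae(errors):.2f}  RMSE={rmse(errors):.2f}")
# MAE=2.00  RMSE=2.00
# MAE=2.00  RMSE=4.00
```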

Optimality property

The mean absolute error of a real variable c with respect to the random variable X is

$$\operatorname{E}\left( \left| X - c \right| \right)$$

Provided that the probability distribution of X is such that the above expectation exists, then m is a median of X if and only if m is a minimizer of the mean absolute error with respect to X.[6] In particular, m is a sample median if and only if m minimizes the arithmetic mean of the absolute deviations.
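The sample version of this property can be checked numerically. The sketch below uses a coarse grid search as an illustrative (not efficient) minimizer, and the data are made up. With an odd number of values the minimizer is the unique middle value, while with an even number every point between the two middle values minimizes, which is why the text says "a median".

```python
import statistics


def mean_abs_deviation(c, data):
    """Arithmetic mean of |v - c| over the sample."""
    return sum(abs(v - c) for v in data) / len(data)


# Odd sample size: the minimizer is the unique middle value.
data = [1.0, 2.0, 3.0, 4.0, 100.0]
grid = [i / 10 for i in range(1001)]  # candidate c = 0.0, 0.1, ..., 100.0
best_c = min(grid, key=lambda c: mean_abs_deviation(c, data))
print(best_c, statistics.median(data))  # 3.0 3.0
print(mean_abs_deviation(best_c, data),
      mean_abs_deviation(statistics.mean(data), data))  # median beats the mean

# Even sample size: every c between the two middle values is a minimizer.
data2 = [1.0, 2.0, 3.0, 4.0]
print([mean_abs_deviation(c, data2) for c in (1.5, 2.0, 2.5, 3.0, 3.5)])
# [1.25, 1.0, 1.0, 1.0, 1.25]
```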

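More generally, a median is defined as a minimum of

$$\operatorname{E}\left( \left| X - c \right| - \left| X \right| \right)$$

as discussed at Multivariate median (and specifically at Spatial median). Subtracting $|X|$ does not change the minimizer, but it keeps the expectation finite even when $\operatorname{E}(|X|)$ is not, since $\big| |X - c| - |X| \big| \le |c|$.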
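For points in the plane, the spatial median minimizes the sum of Euclidean distances; for an empirical distribution this has the same minimizers as the expression above, since subtracting $\sum_i |x_i|$ only shifts the objective by a constant. A common way to approximate it is Weiszfeld's algorithm, sketched here as an illustrative choice rather than the method of the linked articles. The sketch assumes no iterate lands exactly on a data point (the code simply stops if one does); the function name and data are hypothetical.

```python
import math


def spatial_median(points, max_iter=10_000, tol=1e-12):
    """Approximate the spatial (geometric) median of 2-D points with
    Weiszfeld's iteration: repeatedly take the mean of the points weighted
    by 1 / (distance to the current estimate)."""
    cx = sum(p[0] for p in points) / len(points)  # start at the centroid
    cy = sum(p[1] for p in points) / len(points)
    for _ in range(max_iter):
        num_x = num_y = denom = 0.0
        for px, py in points:
            d = math.hypot(px - cx, py - cy)
            if d < 1e-15:
                # Simplification: stop if the estimate hits a data point
                # (a full implementation would test whether it is optimal).
                return cx, cy
            num_x += px / d
            num_y += py / d
            denom += 1.0 / d
        nx, ny = num_x / denom, num_y / denom
        if math.hypot(nx - cx, ny - cy) < tol:
            return nx, ny
        cx, cy = nx, ny
    return cx, cy


pts = [(0.0, 0.0), (4.0, 0.0), (5.0, 5.0), (0.0, 3.0)]
mx, my = spatial_median(pts)
print(f"{mx:.6f} {my:.6f}")  # 1.714286 1.714286, i.e. (12/7, 12/7)
```

For these four points, which form a convex quadrilateral, the exact spatial median is the intersection of the diagonals, $(12/7, 12/7)$, which differs from the centroid $(2.25, 2)$.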

This optimization-based definition of the median is useful in statistical data-analysis, for example, in k-medians clustering.
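As an illustration of the clustering use, a bare-bones k-medians loop is sketched below under simplifying assumptions: caller-supplied initial centers, Manhattan (L1) distance, and a fixed iteration cap. Because the L1 objective separates by coordinate, the coordinate-wise median minimizes each cluster's total distance, by the optimality property applied to each coordinate. All names and data are hypothetical.

```python
import statistics


def manhattan(p, q):
    """L1 distance between two points given as tuples."""
    return sum(abs(a - b) for a, b in zip(p, q))


def k_medians(points, centers, max_iter=100):
    """Lloyd-style k-medians with caller-supplied initial centers."""
    centers = [tuple(c) for c in centers]
    clusters = []
    for _ in range(max_iter):
        # Assignment step: nearest center under the L1 distance.
        clusters = [[] for _ in centers]
        for p in points:
            k = min(range(len(centers)), key=lambda j: manhattan(p, centers[j]))
            clusters[k].append(p)
        # Update step: coordinate-wise median of each cluster
        # (an empty cluster keeps its previous center).
        new_centers = [
            tuple(statistics.median(coord) for coord in zip(*members)) if members else c
            for c, members in zip(centers, clusters)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    return centers, clusters


pts = [(0, 0), (1, 0), (0, 1), (10, 10), (11, 10), (10, 11)]
centers, clusters = k_medians(pts, [(1, 1), (9, 9)])
print(centers)  # [(0, 0), (10, 10)]
```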

References

  1. Willmott, Cort J.; Matsuura, Kenji (December 19, 2005). "Advantages of the mean absolute error (MAE) over the root mean square error (RMSE) in assessing average model performance". Climate Research. 30: 79–82. doi:10.3354/cr030079.
  2. Pontius Jr., Robert Gilmore; Thontteh, Olufunmilayo; Chen, Hao (2008). "Components of information for multiple resolution comparison between maps that share a real variable". Environmental and Ecological Statistics. 15: 111–142. doi:10.1007/s10651-007-0043-y.
  3. Willmott, C. J.; Matsuura, K. (January 2006). "On the use of dimensioned measures of error to evaluate the performance of spatial interpolators". International Journal of Geographical Information Science. 20: 89–102. doi:10.1080/13658810500286976.
  4. "2.5 Evaluating forecast accuracy | OTexts". www.otexts.org. Retrieved 2016-05-18.
  5. Hyndman, R.; Koehler, A. (2005). "Another look at measures of forecast accuracy".
  6. Stroock, Daniel (2011). Probability Theory. Cambridge University Press. p. 43. ISBN 978-0-521-13250-3.
This article is issued from Wikipedia. The text is licensed under Creative Commons - Attribution - Sharealike. Additional terms may apply for the media files.