Horvitz–Thompson estimator

In statistics, the Horvitz–Thompson estimator, named after Daniel G. Horvitz and Donovan J. Thompson,^[1] is a method for estimating the total^[2] and mean of a superpopulation in a stratified sample. Inverse probability weighting is applied to account for different proportions of observations within strata in a target population. The Horvitz–Thompson estimator is frequently applied in survey analyses and can be used to account for missing data.

The method

Formally, let $Y_{i}, i = 1, 2, \ldots, n$ be an independent sample from $n$ of $N \geq n$ distinct strata with a common mean $\mu$. Suppose further that $\pi_{i}$ is the inclusion probability that a randomly sampled individual in a superpopulation belongs to the $i$th stratum. The Horvitz–Thompson estimate of the total is given by:

\hat{Y}_{HT} = \sum_{i=1}^{n} \pi_{i}^{-1} Y_{i},

and the estimate of the mean is given by:

\hat{\mu}_{HT} = N^{-1} \hat{Y}_{HT} = N^{-1} \sum_{i=1}^{n} \pi_{i}^{-1} Y_{i}.
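The two formulas translate directly into code. The following Python sketch is a minimal illustration, assuming the observed values `y`, their inclusion probabilities `pi`, and the number `N` are already known; the function names `ht_total` and `ht_mean` and the example data are hypothetical.

```python
from typing import Sequence


def ht_total(y: Sequence[float], pi: Sequence[float]) -> float:
    """Horvitz–Thompson estimate of the total: the sum of Y_i / pi_i."""
    if len(y) != len(pi):
        raise ValueError("y and pi must have the same length")
    if any(p <= 0 or p > 1 for p in pi):
        raise ValueError("inclusion probabilities must lie in (0, 1]")
    # Each observation is weighted by the inverse of its inclusion probability.
    return sum(yi / p for yi, p in zip(y, pi))


def ht_mean(y: Sequence[float], pi: Sequence[float], N: int) -> float:
    """Horvitz–Thompson estimate of the mean: the estimated total divided by N."""
    return ht_total(y, pi) / N


if __name__ == "__main__":
    # Hypothetical data: three observations with unequal inclusion probabilities.
    y = [10.0, 20.0, 30.0]
    pi = [0.5, 0.25, 0.125]
    print(ht_total(y, pi))        # 20 + 80 + 240 = 340.0
    print(ht_mean(y, pi, N=100))  # 340 / 100 = 3.4
```

An observation with a small inclusion probability (here the third, with $\pi_{3} = 0.125$) receives a large weight, which is how the estimator compensates for units that are under-represented in the sample.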

In a Bayesian probabilistic framework, $\pi_{i}$ is considered the proportion of individuals in a target population belonging to the $i$th stratum. Hence, $\pi_{i}^{-1}Y_{i}$ can be thought of as an estimate for the complete set of persons within the $i$th stratum. The Horvitz–Thompson estimator can also be expressed as the limit of a weighted bootstrap resampling estimate of the mean. It can also be viewed as a special case of multiple imputation approaches.^[3]

For post-stratified study designs, $\pi$ and $\mu$ are estimated in distinct steps. In such cases, computing the variance of $\hat{\mu}_{HT}$ is not straightforward. Resampling techniques such as the bootstrap or the jackknife can be applied to obtain consistent estimates of the variance of the Horvitz–Thompson estimator.^[4] The "survey" package for R conducts analyses for post-stratified data using the Horvitz–Thompson estimator.^[5]
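As one illustration of the resampling approach, the sketch below computes a delete-one jackknife variance estimate for the Horvitz–Thompson mean. It is a simplified sketch and not the method of the cited paper or of the "survey" package: it assumes a single-stage design whose units can be treated as independent draws (a with-replacement approximation), and it ignores post-stratification and finite-population corrections. The function name `jackknife_var_ht_mean` is hypothetical.

```python
from typing import Sequence


def jackknife_var_ht_mean(y: Sequence[float], pi: Sequence[float], N: int) -> float:
    """Delete-one jackknife variance of the Horvitz–Thompson mean.

    Assumes units behave as independent draws (with-replacement
    approximation); post-stratification and finite-population
    corrections are deliberately ignored.
    """
    n = len(y)
    if n < 2:
        raise ValueError("need at least two observations")
    # Inverse-probability-weighted values z_i = Y_i / pi_i.
    z = [yi / p for yi, p in zip(y, pi)]
    total = sum(z)
    # Replicate j drops unit j and rescales by n / (n - 1) so that each
    # replicate still estimates the full population total; dividing by N
    # turns it into an estimate of the mean.
    replicates = [n / (n - 1) * (total - zj) / N for zj in z]
    mean_rep = sum(replicates) / n
    # Standard jackknife variance formula.
    return (n - 1) / n * sum((r - mean_rep) ** 2 for r in replicates)
```

For this particular estimator the jackknife reduces algebraically to the with-replacement variance formula $\frac{n}{n-1}\sum_{i=1}^{n}(z_{i} - \bar{z})^{2} / N^{2}$ with $z_{i} = Y_{i}/\pi_{i}$, which gives a convenient check on the implementation.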

Proof of Horvitz–Thompson unbiased estimation of the mean

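The Horvitz–Thompson estimator of the mean is unbiased. Let the finite population consist of $N$ units with values $X_{1}, \ldots, X_{N}$, let $I_{1}, \ldots, I_{n}$ denote the indices of the $n$ sampled units, and let $\pi_{i}$ be the inclusion probability of unit $i$. Unbiasedness follows by evaluating the expectation of the estimator, $\mathbf{E}\,\bar{X}_{n}^{HT}$, over all possible samples: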

\mathbf{E}\,\bar{X}_{n}^{HT} = \mathbf{E}\,\frac{1}{N}\sum_{i=1}^{n}\frac{X_{I_{i}}}{\pi_{I_{i}}}

=\mathbf {E} {\frac {1}{N}}\sum _{i=1}^{N}{\frac {X_{i}}{\pi _{i}}}1_{i\in D_{n}}

=\sum _{b=1}^{B}P(D_{n}^{(b)})[{\frac {1}{N}}\sum _{i=1}^{N}{\frac {X_{i}}{\pi _{i}}}1_{i\in D_{n}^{(b)}}]

={\frac {1}{N}}\sum _{i=1}^{N}{\frac {X_{i}}{\pi _{i}}}\sum _{b=1}^{B}1_{i\in D_{n}^{(b)}}P(D_{n}^{(b)})

={\frac {1}{N}}\sum _{i=1}^{N}({\frac {X_{i}}{\pi _{i}}})\pi _{i}

={\frac {1}{N}}\sum _{i=1}^{N}X_{i}

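where $D_{n} = \{I_{1}, I_{2}, \ldots, I_{n}\}$ is the random set of sampled units, $D_{n}^{(1)}, \ldots, D_{n}^{(B)}$ are the $B$ possible samples under the design, and the inner sum over $b$ is evaluated using $\sum_{b=1}^{B} 1_{i \in D_{n}^{(b)}} P(D_{n}^{(b)}) = P(i \in D_{n}) = \pi_{i}$.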
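The argument can be checked on a small example by enumerating every possible sample, exactly as in the third line of the derivation. The sketch below is an illustration under stated assumptions: the population values and the design probabilities $P(D_{n}^{(b)})$ are made up, and exact rational arithmetic (`fractions.Fraction`) is used so that the expectation matches the population mean exactly rather than up to rounding error.

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical finite population of N = 4 units with values X_1..X_4
# (stored 0-based here).
X = [Fraction(3), Fraction(7), Fraction(1), Fraction(9)]
N = len(X)
n = 2

# Hypothetical design: each of the B = 6 possible samples of size n gets an
# explicit probability P(D_n^(b)); the probabilities must sum to 1.
samples = list(combinations(range(N), n))
probs = [Fraction(k, 21) for k in (1, 2, 3, 4, 5, 6)]
assert sum(probs) == 1

# Inclusion probabilities: pi_i = sum over b of 1{i in D^(b)} * P(D^(b)).
pi = [sum(p for s, p in zip(samples, probs) if i in s) for i in range(N)]


def ht_mean(sample):
    """Horvitz–Thompson estimate of the mean for one realised sample."""
    return sum(X[i] / pi[i] for i in sample) / N


# Expectation over the design: sum over b of P(D^(b)) * estimate(D^(b)).
expected = sum(p * ht_mean(s) for s, p in zip(samples, probs))
population_mean = sum(X) / N
print(expected, population_mean)  # prints 5 5
```

Because every sample has probability greater than zero, every unit has $\pi_{i} > 0$, which the proof implicitly requires; a design that never selects some unit would leave its $X_{i}$ out of the expectation and break unbiasedness.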

References

[1] Horvitz, D. G.; Thompson, D. J. (1952). "A generalization of sampling without replacement from a finite universe". Journal of the American Statistical Association. 47: 663–685. JSTOR 2280784.
[2] Cochran, William G. (1977). Sampling Techniques, 3rd ed. Wiley. ISBN 0-471-16240-X.
[3] Little, Roderick J. A.; Rubin, Donald B. (2002). Statistical Analysis with Missing Data, 2nd ed. Wiley. ISBN 0-471-18386-5.
[4] Quatember, A. (2014). "The Finite Population Bootstrap - from the Maximum Likelihood to the Horvitz-Thompson Approach". Austrian Journal of Statistics. 43: 93–102.
[5] "survey" package for R. https://cran.r-project.org/web/packages/survey/

External links

Survey Package Website for R

This article is issued from Wikipedia. The text is licensed under Creative Commons - Attribution - Sharealike. Additional terms may apply for the media files.
