Empirical measure

In probability theory, an empirical measure is a random measure arising from a particular realization of a (usually finite) sequence of random variables. The precise definition is found below. Empirical measures are relevant to mathematical statistics.

The motivation for studying empirical measures is that it is often impossible to know the true underlying probability measure P. We collect observations X_1, X_2, \dots , X_n and compute relative frequencies. We can estimate P, or a related distribution function F, by means of the empirical measure or the empirical distribution function, respectively. These are uniformly good estimates under certain conditions. Theorems in the area of empirical processes provide rates of this convergence.

Contents

  • Definition
  • Empirical distribution function
  • References
  • Further reading

Definition

Let X_1, X_2, \dots be a sequence of independent, identically distributed random variables taking values in a state space S, with common probability measure P.

Definition

The empirical measure P_n is defined for measurable subsets A of S and given by
P_n(A) = \frac{1}{n} \sum_{i=1}^n I_A(X_i) = \frac{1}{n}\sum_{i=1}^n \delta_{X_i}(A),
where I_A is the indicator function of A and \delta_X is the Dirac measure at X.

For a fixed measurable set A, nP_n(A) is a binomial random variable with mean nP(A) and variance nP(A)(1 − P(A)). In particular, P_n(A) is an unbiased estimator of P(A).
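
As an illustration, here is a minimal Python sketch (not part of the original article; the set A = [0, 0.3], the Uniform(0, 1) sampling distribution, and the sample size are arbitrary choices made for the example). It computes P_n(A) as the relative frequency of observations falling in A, which by the remark above is an unbiased estimate of P(A):

    import random

    def empirical_measure(sample, indicator):
        """P_n(A) = (1/n) * sum of 1_A(X_i) over the sample."""
        return sum(indicator(x) for x in sample) / len(sample)

    # Illustrative choices, not from the article: X_i ~ Uniform(0, 1), A = [0, 0.3].
    n = 10_000
    sample = [random.random() for _ in range(n)]
    in_A = lambda x: 1 if x <= 0.3 else 0      # indicator function I_A

    print(empirical_measure(sample, in_A))     # close to P(A) = 0.3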

Definition

\bigl(P_n(c)\bigr)_{c\in\mathcal{C}} is the empirical measure indexed by \mathcal{C}, a collection of measurable subsets of S.

To generalize this notion further, observe that the empirical measure P_n maps measurable functions f:S\to \mathbb{R} to their empirical mean,

f\mapsto P_n f=\int_S f \, dP_n=\frac{1}{n}\sum_{i=1}^n f(X_i)

In particular, the empirical measure of A is simply the empirical mean of the indicator function, P_n(A) = P_n I_A.

For a fixed measurable function f, P_nf is a random variable with mean \mathbb{E}f and variance \frac{1}{n}\mathbb{E}(f -\mathbb{E} f)^2.
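
Continuing the same illustrative sketch (Uniform(0, 1) draws; the test function f(x) = x^2 is again an arbitrary choice, with \mathbb{E} f = 1/3), P_n f is simply the sample average of the values f(X_i):

    import random

    def empirical_mean(sample, f):
        """P_n f = (1/n) * sum of f(X_i) over the sample."""
        return sum(f(x) for x in sample) / len(sample)

    # Illustrative choices, not from the article: X_i ~ Uniform(0, 1), f(x) = x^2.
    sample = [random.random() for _ in range(10_000)]
    print(empirical_mean(sample, lambda x: x * x))   # close to E f = 1/3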

By the strong law of large numbers, P_n(A) converges to P(A) almost surely for fixed A. Similarly, P_nf converges to \mathbb{E} f almost surely for a fixed measurable function f. The problem of uniform convergence of P_n to P was open until Vapnik and Chervonenkis solved it in 1968.[1]

If the class \mathcal{C} (or \mathcal{F}) is Glivenko–Cantelli with respect to P, then P_n converges to P uniformly over c\in\mathcal{C} (or f\in \mathcal{F}). In other words, with probability 1 we have

\|P_n-P\|_\mathcal{C}=\sup_{c\in\mathcal{C}}|P_n(c)-P(c)|\to 0,
\|P_n-P\|_\mathcal{F}=\sup_{f\in\mathcal{F}}|P_nf-\mathbb{E}f|\to 0.

Empirical distribution function

The empirical distribution function provides an example of empirical measures. For real-valued iid random variables X_1,\dots,X_n it is given by

F_n(x)=P_n((-\infty,x])=P_nI_{(-\infty,x]}.

In this case the empirical measure is indexed by the class \mathcal{C}=\{(-\infty,x]:x\in\mathbb{R}\}. It has been shown that \mathcal{C} is a uniform Glivenko–Cantelli class; in particular,

\|F_n-F\|_\infty=\sup_{x\in\mathbb{R}}|F_n(x)-F(x)|\to 0

with probability 1, and the convergence holds uniformly over the underlying distribution function F.
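
A rough numerical illustration of this convergence (again not from the article; the Uniform(0, 1) distribution and the sample sizes are arbitrary choices): because F_n is piecewise constant with jumps at the observations, the supremum \|F_n-F\|_\infty is attained just before or just after an order statistic, so it can be computed exactly.

    import random

    def sup_distance_to_uniform_cdf(sample):
        """sup_x |F_n(x) - x| for the Uniform(0, 1) CDF F(x) = x.
        F_n jumps only at the order statistics, so the supremum is the largest
        deviation just before or just after one of those jumps."""
        xs = sorted(sample)
        n = len(xs)
        return max(max(abs((i + 1) / n - x), abs(i / n - x))
                   for i, x in enumerate(xs))

    # Illustrative sample sizes; the distance should shrink toward 0 as n grows.
    for n in (100, 1_000, 10_000):
        sample = [random.random() for _ in range(n)]
        print(n, sup_distance_to_uniform_cdf(sample))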

References

  1. ^ Vapnik, V.; Chervonenkis, A. (1968). "Uniform convergence of frequencies of occurrence of events to their probabilities". Dokl. Akad. Nauk SSSR 181.

Further reading

  • Billingsley, P. (1995). Probability and Measure (Third ed.). New York: John Wiley and Sons.  
  • Donsker, M. D. (1952). "Justification and Extension of Doob's Heuristic Approach to the Kolmogorov–Smirnov Theorems". Annals of Mathematical Statistics 23 (2): 277–281.
  • Dudley, R. M. (1978). "Central limit theorems for empirical measures". Annals of Probability 6 (6): 899–929.
  • Dudley, R. M. (1999). Uniform Central Limit Theorems. Cambridge Studies in Advanced Mathematics 63. Cambridge, UK: Cambridge University Press.  
  • Wolfowitz, J. (1954). "Generalization of the theorem of Glivenko–Cantelli". Annals of Mathematical Statistics 25 (1): 131–138.  