Nonparametric Definition of the Representativeness of a Sample - with Tables

01 January 1958


The Bell System Technical Journal, January 1958

This paper deals with the problem of determining how large a random sample is needed in order to guarantee with preassigned probability P* that the sample will have a specified amount (or a specified degree) of representativeness of the true, unknown (cumulative) distribution F under study. No a priori information is given about F and no assumptions are made about the form of F. The solution given is nonparametric (i.e., distribution-free) so that the results obtained and the tables and graphs constructed are valid for any true underlying distribution. The case of a finite population as well as that of an infinite population is considered; in the latter case it is assumed only for ease of exposition that those percentiles of F which enter the discussion are uniquely defined and have probability zero under F. (This will, in particular, be the case when F has a density function without zero-stretches between points having positive density.) A definition of representativeness (and also a degree of representativeness) is given with respect to those parts of F which are between certain percentiles which we denote by $F^{-1}(p_i)$, the values of $p_i$ being preassigned. The intervals between these percentiles will be called cells and we shall only consider collections of pairwise disjoint cells. For example, the experimenter may want to guarantee with probability at least P* = 0.90 that between 40 per cent and 60 per cent of his sample will lie on each side of the population median.
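The median example can be explored numerically. The following is a minimal sketch, not the paper's own procedure or tables: it assumes an infinite population in which the median has probability zero, so that each observation independently falls below the median with probability 1/2 and the count below the median is Binomial(n, 1/2). The function names `coverage_probability` and `smallest_sample_size`, and the search limit `n_max`, are hypothetical and introduced only for illustration.

```python
from fractions import Fraction
from math import ceil, comb, floor


def coverage_probability(n, lo=Fraction(2, 5), hi=Fraction(3, 5)):
    """P(lo*n <= X <= hi*n) for X ~ Binomial(n, 1/2).

    X is the number of sample points below the population median.
    Because the median is assumed to have probability zero, the
    fraction above the median is 1 - X/n, so with the symmetric
    default limits this one condition covers both sides.
    """
    k_min = ceil(lo * n)    # smallest admissible count below the median
    k_max = floor(hi * n)   # largest admissible count below the median
    favourable = sum(comb(n, k) for k in range(k_min, k_max + 1))
    return favourable / 2 ** n


def smallest_sample_size(p_star=0.90, n_max=2000):
    """First n (searching upward from 1) whose coverage reaches p_star.

    Returns None if no n up to n_max (an arbitrary search limit)
    qualifies.
    """
    for n in range(1, n_max + 1):
        if coverage_probability(n) >= p_star:
            return n
    return None


if __name__ == "__main__":
    n = smallest_sample_size(0.90)
    print(n, coverage_probability(n))
```

Exact rational limits (`Fraction`) are used so that the cell boundaries 0.4n and 0.6n are not disturbed by floating-point rounding. Because the count is discrete, the coverage probability is not monotone in n; the first n that reaches P* may therefore differ from a sample size chosen so that every larger n also qualifies, and it should not be read as a value from the paper's tables.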