Clusters of Languages: Checking the Variance

When languages are hierarchically clustered using lexicostatistical similarity percentages, the resulting tree may fit the data far better than in most other applications of hierarchical clustering. In particular, the percentages between most-remote pairs of languages beneath any single node may vary surprisingly little. It then becomes appropriate to test whether the sample variance of these percentages has a reasonable value, as a further check on the model. We present full results of such data analysis for a tree of 84 Indo-European languages and dialects.
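As a rough illustration of the quantity being checked, the sketch below computes the sample variance of similarity percentages for one node. It rests on assumptions not stated in the abstract: that the "most-remote pairs" beneath a node are the pairs with one language in each of the node's two child subtrees, and that the variance uses the usual n - 1 denominator. The function name `node_variance`, the language labels, and the percentages are hypothetical toy values, not data from the paper.

```python
from itertools import product
from statistics import variance


def node_variance(sim, left, right):
    """Sample variance (n - 1 denominator) of the similarity percentages
    between every language in `left` and every language in `right`.

    ASSUMPTION: `left` and `right` are the leaf sets of the two child
    subtrees of one node, so these cross pairs are taken to be the
    most-remote pairs beneath that node.
    """
    values = [sim[frozenset((a, b))] for a, b in product(left, right)]
    # statistics.variance needs at least two values.
    return variance(values)


# Hypothetical toy percentages for a node with children {A, B} and {C, D}.
sim = {
    frozenset(("A", "C")): 30.0,
    frozenset(("A", "D")): 32.0,
    frozenset(("B", "C")): 31.0,
    frozenset(("B", "D")): 29.0,
}

v = node_variance(sim, ["A", "B"], ["C", "D"])
print(round(v, 4))
```

Whether a given variance counts as "reasonable" depends on the statistical model behind the percentages, which the abstract does not specify, so the sketch stops at computing the variance itself.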