Statistical Synopses for Graph-Structured XML Databases

01 January 2002

New Image

In this paper, we introduce a novel approach to buiding and using statistical summaries of large XML data graphs for effective path-expression selectivity estimation. Our proposed graph-synopsis model (termed XSketch) exploits localized graph stability to accurately approximate (in limited space) the path and branching distribution in the data graph. To estimate the selectivities of complex path expressions over concise XSketch synposes, we develop an estimation framework that relies on appropriate statistical (uniformity and independence) assumptions to compensate for the lack of detailed distribution information.