Image Sequence Coding by Octrees
This work addresses the problem of representing an image sequence as a set of octrees. The purpose is to generate a flexible data structure for modeling video signals in applications such as motion estimation, video coding, and analysis. An image sequence can be represented as a three-dimensional causal signal f(x, y, t_n), with n = 0, 1, ...; x ∈ X; y ∈ Y, where X and Y are finite domains. After digitization, the corresponding digital sequence is composed of N frames of size K x L, defining a 3D array of size K x L x N. To track long-term spatio-temporal correlation, a series of octree structures may be embedded in this 3D array. Each octree covers a subset of the data in the spatio-temporal space. At the lowest level (the leaves of the octree), adjacent pixels of neighboring frames are captured. A combination of these is represented at the parent level of each group of 8 children. This combination may yield a more compact representation of the information in these pixels (coding application) or a local estimate of some feature of interest (e.g., velocity, classification, object boundary). The combination can be iterated bottom-up to obtain a hierarchical description of the image sequence characteristics.
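The bottom-up combination described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a cubic spatio-temporal block whose side is a power of two, stored as nested lists indexed [t][y][x], and it uses the plain mean of the 8 children as the combination, which is only one possible choice. The names reduce_level, build_pyramid, and mean are hypothetical.

```python
def reduce_level(vol, combine):
    """Merge each 2x2x2 group of children in vol (indexed [t][y][x]) into one parent."""
    n, k, l = len(vol), len(vol[0]), len(vol[0][0])
    parent = []
    for t in range(0, n, 2):
        plane = []
        for y in range(0, k, 2):
            row = []
            for x in range(0, l, 2):
                # The 8 children: 2x2 adjacent pixels taken from 2 neighboring frames.
                children = [vol[t + dt][y + dy][x + dx]
                            for dt in (0, 1) for dy in (0, 1) for dx in (0, 1)]
                row.append(combine(children))
            plane.append(row)
        parent.append(plane)
    return parent


def build_pyramid(vol, combine):
    """Iterate the reduction bottom-up; levels[0] holds the leaves, levels[-1] the root.

    Assumes vol is a cube whose side is a power of two (an assumption of this sketch).
    """
    levels = [vol]
    while len(levels[-1]) > 1:
        levels.append(reduce_level(levels[-1], combine))
    return levels


def mean(values):
    """Illustrative combination rule: the average of the 8 children."""
    return sum(values) / len(values)


# Toy 4 x 4 x 4 block in which every pixel of frame t has intensity t.
block = [[[float(t) for _x in range(4)] for _y in range(4)] for t in range(4)]
pyramid = build_pyramid(block, mean)
```

Passing a different combine function (for example, a local velocity estimate or a class label vote) would turn the same traversal into a feature hierarchy rather than a coding pyramid.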