Efficient architecture and design of an embedded video coding engine
01 September 2001
By doubling the accuracy of motion compensation from integer-pel to half-pel, we can significantly improve the coding gain. In this paper, we therefore propose a novel COordinate Rotation DIgital Computer (CORDIC) architecture for the combined design of the discrete cosine transform (DCT) and half-pel motion estimation. Unlike conventional block matching approaches based on interpolated images, our CORDIC design directly extracts motion vectors at half-pel accuracy in the transform domain, without interpolation. Compared with conventional block matching methods that use interpolation, our multiplier-free design achieves significant hardware savings and requires far less data flow. Our emphasis in this paper is on achieving an efficient design of the video coding engine by minimizing the computational units along the data path. Furthermore, we implement the embedded design on a dedicated single chip to demonstrate its performance. The DCT and motion estimation units are the two most important components of many multimedia standards and together consume more than 80% of the computing power of a video coder; the DCT-based nature of our design enables us to combine both efficiently into one single component. As a result, we can provide a single-chip solution for the video coding engine, whereas many conventional designs may require multiple chips. In addition, all multiply-and-add (MAC) operations in plane rotations are replaced by CORDIC processors using simple shift-and-add operations, which are simple and compact to realize while being no slower than the bit-serial multipliers widely proposed for VLSI array structures. Based on the test results, our chip can operate at 20 MHz in 0.8-μm CMOS technology. Overall, we provide a low-complexity, high-throughput solution for MPEG-1, MPEG-2, and H.263 compatible video codec design.
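The replacement of multiply-and-add plane rotations by shift-and-add CORDIC iterations can be illustrated with a minimal sketch. The Python code below is the textbook rotation-mode CORDIC in fixed-point arithmetic, not the circuit described in the paper; the names cordic_rotate, FRAC_BITS, ITERATIONS, ATAN_TABLE and GAIN, as well as the 16-bit fractional word length and 16 iterations, are assumptions chosen only for illustration.

```python
import math

FRAC_BITS = 16    # assumed number of fixed-point fractional bits
ITERATIONS = 16   # assumed number of CORDIC micro-rotations

# Elementary rotation angles atan(2^-i), stored as fixed-point integers.
ATAN_TABLE = [round(math.atan(2.0 ** -i) * (1 << FRAC_BITS))
              for i in range(ITERATIONS)]

# Constant CORDIC gain accumulated over all micro-rotations (about 1.6468).
GAIN = math.prod(math.sqrt(1.0 + 2.0 ** (-2 * i)) for i in range(ITERATIONS))


def cordic_rotate(x, y, angle):
    """Rotate the vector (x, y) by `angle` radians (|angle| below about 1.74).

    The loop uses only integer shifts, additions, subtractions and a table
    lookup; no multiplication is performed inside it.
    """
    xi = round(x * (1 << FRAC_BITS))
    yi = round(y * (1 << FRAC_BITS))
    zi = round(angle * (1 << FRAC_BITS))  # residual angle still to rotate

    for i in range(ITERATIONS):
        if zi >= 0:
            # Rotate counter-clockwise by atan(2^-i).
            xi, yi = xi - (yi >> i), yi + (xi >> i)
            zi -= ATAN_TABLE[i]
        else:
            # Rotate clockwise by atan(2^-i).
            xi, yi = xi + (yi >> i), yi - (xi >> i)
            zi += ATAN_TABLE[i]

    # Undo the constant gain and convert back to floating point. This final
    # scaling is done in floating point here purely for readability.
    scale = GAIN * (1 << FRAC_BITS)
    return xi / scale, yi / scale


if __name__ == "__main__":
    print(cordic_rotate(1.0, 0.0, math.pi / 4))
```

Each iteration costs two shifts, three additions or subtractions, and one table lookup, which is why a plane rotation built this way needs no multiplier. The gain is a constant that depends only on the iteration count, so a hardware design would typically fold it into a fixed scaling elsewhere in the data path rather than divide at the end as this sketch does.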