Enumeration of RNA Secondary Structures: A Constrained Coding Approach

01 January 2006

We consider the problem of enumerating and generating predefined RNA secondary structures in terms of classical constrained coding techniques and new grammar-based extensions thereof. First, we define a class of constraints, termed stem-loop constraints, that restrict the separation length of a phrase and its reverse-complement in both binary sequences and DNA/RNA sequences. For a simple subclass of this constraint, we evaluate the underlying channel capacity. Then we proceed to analyze stem-loop constraints for RNA secondary structures represented by context-free languages. The derived results provide a means for studying the shape diversity of pools of RNA strands involved in the process of aptamer design and identification.
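
As a rough illustration of grammar-based enumeration, the sketch below counts the secondary structures (shapes in dot-bracket notation, ignoring the base sequence) of a strand of length n. It follows the context-free decomposition S → ε | . S | ( S ) S, with the added assumption that every hairpin loop encloses at least `min_loop` unpaired bases. The function names, the `min_loop` parameter, and the bits-per-base growth estimate are choices made for this example. They are not the stem-loop constraints or the capacity computation of the paper itself.

```python
import math


def count_structures(n, min_loop=3):
    """Count dot-bracket structures on n bases.

    Assumption for this sketch: a base pair must enclose at least
    `min_loop` positions, so hairpin loops have >= min_loop unpaired bases.
    """
    counts = [0] * (n + 1)
    counts[0] = 1  # the empty structure
    for length in range(1, n + 1):
        # Case 1: the first base is unpaired.
        total = counts[length - 1]
        # Case 2: the first base pairs with a partner that encloses
        # `inner` bases; the remaining bases form an independent structure.
        for inner in range(min_loop, length - 1):
            total += counts[inner] * counts[length - 2 - inner]
        counts[length] = total
    return counts[n]


def bits_per_base(n, min_loop=3):
    """Finite-length growth estimate log2(count) / n (illustrative only)."""
    return math.log2(count_structures(n, min_loop)) / n


if __name__ == "__main__":
    for n in (10, 20, 40, 80):
        print(n, count_structures(n), round(bits_per_base(n), 4))
```

With `min_loop=0` the counts reduce to the Motzkin numbers, and with `min_loop=1` they give the classical sequence 1, 1, 1, 2, 4, 8, 17, ... for RNA secondary structures. The quantity log2(count)/n is only a finite-length proxy for an asymptotic growth rate, in the spirit of a capacity calculation.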