Joint Sequence Complexity Analysis: Application to Social Networks Information Flow

01 March 2014

In this paper we study joint sequence complexity and its application to finding similarities between sequences, up to discriminating between their sources. The complexity of a sequence is defined as the number of distinct subsequences it contains. Sequences with many common parts have a higher joint complexity. A sequence is decomposed into its subcomponents with suffix trees, a simple, fast, low-complexity structure for storing them in memory and retrieving them, especially for short sequences. Joint complexity is used to evaluate the similarity between sequences generated by different Markov sources. Markov models describe the generation of natural text well, and their performance can be predicted via linear algebra, combinatorics and asymptotic analysis. We evaluate the approach on datasets in different natural languages, on both short and long sequences, with very promising results. The goal is automated on-line sequence analysis of information streams in social networks such as Twitter.
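
To make the definitions above concrete, here is a minimal Python sketch. It rests on two assumptions that the abstract does not spell out: that "subsequences" means contiguous substrings, and that joint complexity is the number of distinct substrings two sequences have in common. It also swaps the suffix tree for a brute-force set enumeration, which is quadratic in the sequence length and only practical for short inputs. The function names `factors`, `complexity` and `joint_complexity` are illustrative, not taken from the paper.

```python
def factors(seq: str) -> set[str]:
    """Return the distinct non-empty contiguous substrings of seq.

    Brute-force stand-in for a suffix tree (assumption: 'subsequence'
    in the abstract means a contiguous substring).
    """
    n = len(seq)
    return {seq[i:j] for i in range(n) for j in range(i + 1, n + 1)}


def complexity(seq: str) -> int:
    """Number of distinct substrings of a single sequence."""
    return len(factors(seq))


def joint_complexity(a: str, b: str) -> int:
    """Number of distinct substrings shared by both sequences
    (assumed definition of joint complexity)."""
    return len(factors(a) & factors(b))


if __name__ == "__main__":
    print(complexity("aba"))               # substrings: a, b, ab, ba, aba
    print(joint_complexity("aba", "bab"))  # shared: a, b, ab, ba
    print(joint_complexity("abc", "xyz"))  # nothing shared
```

Under these assumptions, two texts that share many common parts score higher than texts with little overlap, which is the property the abstract relies on to compare sources.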