Semantic search over encrypted data
01 January 2013
In this article we present an original solution for semantic search over encrypted data. The idea builds on existing work on search over plaintext data, where many studies have sought to improve retrieval accuracy through semantic search, notably by using stemming algorithms. A stemming algorithm finds the root (stem) of each word and associates that root with all the meaningfully related words that share it. This association forms a semantic cluster, so a user who searches for any keyword in the cluster is implicitly searching for all the related keywords at the same time. The technique has proven effective on plaintext data, and we aim to use it in an encrypted context.
In the literature, many schemes aim to search securely for keywords in encrypted documents; we introduced them earlier under the name of searchable encryption. Among these schemes we retain the one introduced by Curtmola, which offers one of the most efficient searches over encrypted data. This scheme lets the user store his encrypted documents together with an associated secure index that contains all the keywords appearing in those documents. The search phase then proceeds as follows: the user sends the server an encrypted query containing the keyword, and the output is the full set of encrypted documents containing that exact keyword. Our contribution lies here: why can we not use this scheme to perform semantic search over encrypted data? The basic scheme associates each keyword with the documents that contain it; to allow semantic search, we instead have to associate related keywords with the same set of documents. Related keywords share the same root, so it suffices to store only that root in the secure index.
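To make the clustering idea concrete, the following is a minimal plaintext sketch of an index keyed by root rather than by exact keyword. The stemmer `toy_stem`, its suffix list, and the sample documents are assumptions made for illustration only; the article does not specify which stemming algorithm is used, and a real one would be considerably more careful.

```python
from collections import defaultdict

# Hypothetical toy stemmer: strips a few common English suffixes.
# It only stands in for a real stemming algorithm, which is not specified here.
SUFFIXES = ("ing", "ed", "es", "s")

def toy_stem(word):
    """Return a crude root for `word` by removing one known suffix."""
    word = word.lower()
    for suffix in SUFFIXES:
        # Keep at least three characters so short words are left intact.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def build_stem_index(docs):
    """Map each root to the set of document ids containing a related keyword."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.split():
            index[toy_stem(word)].add(doc_id)
    return dict(index)

# Illustrative documents (assumed data).
docs = {
    "d1": "encrypted search",
    "d2": "searching documents",
    "d3": "searches fail",
}
index = build_stem_index(docs)
```

Here "search", "searching" and "searches" all collapse to the single entry `search`, so one index entry covers the whole cluster of related keywords.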
In the search phase, the user enters the keyword he wants to search for; the stemming algorithm then implicitly computes the root of this keyword, and the root is sent in the query instead of the keyword. As a result, the user retrieves all encrypted documents that contain any of the related keywords. The solution thus combines one of the most efficient searchable encryption schemes with a specific stemming algorithm. The latter is used both to create the stems that the user stores in the secure index and to compute the stem sent in the search phase.
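The search flow described above can be sketched as follows. This is not Curtmola's construction: as a simplifying assumption, the secure index is modelled as a table keyed by an HMAC-SHA256 token of each stem, and document identifiers stand in for the encrypted documents. The names `toy_stem`, `trapdoor`, `build_secure_index` and `server_search`, as well as the key and sample data, are hypothetical.

```python
import hashlib
import hmac

# Hypothetical toy stemmer standing in for the real stemming algorithm.
SUFFIXES = ("ing", "ed", "es", "s")

def toy_stem(word):
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def trapdoor(key, keyword):
    """Client side: stem the keyword, then derive a keyed token from the stem."""
    stem = toy_stem(keyword)
    return hmac.new(key, stem.encode("utf-8"), hashlib.sha256).hexdigest()

def build_secure_index(key, docs):
    """Client side: index documents under stem tokens, never under plaintext."""
    index = {}
    for doc_id, text in docs.items():
        for word in text.split():
            index.setdefault(trapdoor(key, word), set()).add(doc_id)
    return index

def server_search(index, token):
    """Server side: look up the token without learning the stem or keyword."""
    return index.get(token, set())

# Illustrative key and documents (assumed data, not suitable for real use).
key = b"demo-key-not-for-production"
docs = {
    "d1": "encrypted search",
    "d2": "searching documents",
    "d3": "searches fail",
}
secure_index = build_secure_index(key, docs)

# The user types "searched"; the query actually carries the token for "search".
result = server_search(secure_index, trapdoor(key, "searched"))
```

In this sketch a query for "searched" returns the identifiers of all three documents, although none of them contains that exact word, because the index and the query are both built from the stem. A deterministic token of this kind reveals when the same stem is queried twice, which a full searchable encryption scheme has to account for.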