One Size Fits All: Towards Domain-Independent Automated Processing of Free-form Text

01 January 2019

Free-form, unstructured and semi-structured textual data has become increasingly prevalent among the data owned by service providers across business domains. Typical examples include text from customer care tickets, surveys, social media, machine logs, alarm and alerting systems, and diagnostics. There is a growing business need to rapidly and automatically understand the key topics and categories underlying these bulk collections of text. In this paper, we propose a domain-agnostic, unsupervised approach that deploys a multi-stage text processing pipeline to automatically discover the key topics and categories in free-form text documents. For each input document, the pipeline assigns either a 'hard' hierarchical category or a 'soft' topical alignment identified by a set of topical keywords.