Companies cannot ignore the huge quantity of information available on the web in the form of talks, posts, and papers. Being aware in near-real time of hot topics and of opinions about a product or a topic is strategic for making better decisions. Unfortunately, most of this information is totally or partially unstructured, so it is difficult to exploit with traditional database technology. Similarly, a relevant portion of the information stored in enterprise information systems is unstructured (e.g., emails, document repositories, CRM conversations) and currently underexploited.
Companies are asking for tools that can handle a large quantity of unstructured data (the so-called Big Data) to identify, extract, and synthesize relevant information through a semantic analysis of the text. Semantic search engines are not sufficient, since they simply retrieve information related to a set of keywords. Instead, a complex suite of tools is necessary to carry out an in-depth analysis of the text, to store data efficiently, and to enable powerful, real-time analyses. The information extracted from the texts can be statistical in nature (e.g., the words most used in a given domain, or the words used to qualify a given topic) or semantic (e.g., the opinion related to a given topic). Practitioners often refer to this family of tools as Opinion Mining software, Sentiment Analysis software, or Brand Reputation software. In practice, the set of functionalities they provide is large and heterogeneous, and it is obtained by applying several different techniques from the area of Text Analytics, ranging from Text Mining to Natural Language Processing to Information Retrieval.
Although the competitive advantage deriving from the use of such techniques is apparent in all decision-making processes, commercial tools are not yet mature enough and many research issues remain open: