Companies cannot ignore the huge quantity of information, talks, posts, and papers available on the web. Being aware in near-real time of hot topics and opinions about a product or a topic is strategic for making better decisions. Unfortunately, most of this information is totally or partially unstructured, so it is difficult to exploit with traditional database technology. Similarly, a relevant portion of the information stored in enterprise information systems is unstructured (e.g., emails, document repositories, CRM conversations) and currently underexploited.
Companies are asking for tools that can handle a large quantity of unstructured data (the so-called Big Data) to identify, extract, and synthesize relevant information through a semantic analysis of the text. Semantic search engines are not sufficient, since they simply retrieve information related to a set of keywords. Instead, a complex suite of tools is necessary to carry out an in-depth analysis of the text, to store data efficiently, and to enable powerful, real-time analyses. The information extracted from the texts can be statistical in nature (e.g., the most frequently used words in a given domain, the words used to qualify a given topic) or semantic (e.g., the opinion related to a given topic). Practitioners often refer to this family of tools as Opinion Mining software, Sentiment Analysis software, or Brand Reputation software. In practice, the set of functionalities made available is large and heterogeneous, and it is obtained by applying several different techniques in the area of Text Analytics, ranging from Text Mining, to Natural Language Processing, to Information Retrieval.
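As an illustration of the statistical kind of information mentioned above, the following minimal Python sketch counts the most frequent words in a small set of posts and the words that appear next to a given topic term. It is not the technique used by any specific tool: the sample posts, the stop-word list, and the function names (`tokenize`, `top_words`, `qualifiers`) are all assumptions made for the example, and a real system would rely on proper linguistic processing.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list (an assumption for this sketch).
STOP_WORDS = {"the", "a", "is", "and", "but", "of", "to", "i"}


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens that are not stop words."""
    return [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOP_WORDS]


def top_words(docs, n=3):
    """Return the n most frequent words across all documents."""
    counts = Counter()
    for doc in docs:
        counts.update(tokenize(doc))
    return counts.most_common(n)


def qualifiers(docs, topic, window=2):
    """Count the words found within `window` tokens of each occurrence of `topic`.

    The window is measured on the token list after stop-word removal.
    """
    counts = Counter()
    for doc in docs:
        tokens = tokenize(doc)
        for i, tok in enumerate(tokens):
            if tok == topic:
                start = max(0, i - window)
                counts.update(tokens[start:i] + tokens[i + 1:i + 1 + window])
    return counts


# Made-up sample posts, used only to exercise the functions.
posts = [
    "The battery is great and the screen is great",
    "Terrible battery, but a great camera",
    "I love the camera",
]

if __name__ == "__main__":
    print(top_words(posts))
    print(qualifiers(posts, "battery", window=1))
```

Counts of this kind say nothing about whether an opinion is positive or negative; the semantic information mentioned above requires deeper analysis of the text.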
Although the competitive advantage deriving from the use of such techniques is apparent in all decision-making processes, commercial tools are not mature enough and many research issues remain open:
- The real effectiveness of opinion mining tools has not been independently benchmarked, and it strictly depends on the data sources (e.g., social networks, blogs, on-line newspapers) and on the language.
- The web coverage of different web monitoring services is also not clear.
- The costs and the resources needed to carry out a Brand Reputation project can be reduced by the adoption of a correct methodology for tuning and domain verticalization.
- Most of the commercial solutions are "closed" applications, and most of the services are one-shot projects rather than stable monitoring systems. Many companies would prefer a solution that could be integrated into the enterprise information systems and treated as yet another data flow to be included in the Business Intelligence platform and queried with the traditional tools that users already know well.
Building on our BI and database background, we are addressing all of the issues above. In particular, we have prototyped a Social Business Intelligence (SBI) platform that enables more powerful and flexible analyses than those currently offered by commercial tools. The prototype has been used in a FIRB project funded by MIUR, "WebPolEU: Comparing Social Media and Political Participation across EU", aimed at studying the connection between politics and social media. In the project, SBI serves as an enabling technology for analyzing the user-generated content (UGC) produced in Germany, Italy, and the UK from March 2014 to May 2014 (the 2014 European Parliament election was held on May 22-25, 2014).