Discovering the Truth on the Web Data: One Facet of Data Forensics
- Publisher: Hamad bin Khalifa University Press (HBKU Press)
- Source: Qatar Foundation Annual Research Conference Proceedings, Volume 2016, Issue 1, Mar 2016, ICTPP3179
Abstract
Data Forensics with Analytics, or DAFNA for short, is an ambitious project initiated by the Data Analytics Research Group at the Qatar Computing Research Institute, Hamad Bin Khalifa University. Its main goal is to provide effective algorithms and tools for determining the veracity of structured information that originates from multiple sources. Efficiently estimating the veracity of data, along with the reliability of the information sources, is a challenging problem with many real-world use cases (e.g., data fusion, social data analytics, and rumour detection) in which users rely on a semi-automated data extraction and integration process to consume high-quality information for personal or business purposes. DAFNA's vision is to provide a suite of tools for data forensics and to investigate research topics such as fact-checking and truth discovery, together with their practical applicability. We will present our ongoing development (dafna.qcri.org): an extensive comparison of state-of-the-art truth discovery algorithms, the release of a new system and the first REST API for truth discovery, and the design of a novel hybrid truth discovery approach using active ensembling. Finally, we will briefly present real-world applications of truth discovery from Web data.
Efficient Truth Discovery. Truth discovery is a hard problem because there is no a priori knowledge about the veracity of the provided information or the reliability of online sources. This raises many questions about how well state-of-the-art truth discovery algorithms are understood and whether they are applicable to actionable truth discovery. A new truth discovery approach is needed, and it should be comprehensible and domain-independent. In addition, it should take advantage of the benefits of existing solutions while being built on realistic assumptions, so that it is easy to use in real-world applications. In this context, we propose an approach that addresses open truth discovery challenges and consists of the following contributions: (i) a thorough comparative study of existing truth discovery algorithms; (ii) the design and release of the first online truth discovery system and the first REST API for truth discovery, available at dafna.qcri.org; (iii) a hybrid truth discovery method using active ensembling; and (iv) an application to query answering related to Qatar, in which the veracity of information provided by multiple Web sources is estimated.
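To make the problem setting concrete, the following is a minimal sketch of a generic iterative weighted-voting baseline, in which value confidence and source trust are estimated jointly with no prior knowledge of either. It is not the DAFNA system, its REST API, or the hybrid active-ensembling method; the function name `discover_truth`, the `(source, item, value)` claim format, and the toy claims are assumptions made purely for illustration.

```python
from collections import defaultdict


def discover_truth(claims, iterations=20):
    """Jointly estimate value confidence and source trust.

    claims: list of (source, item, value) triples (hypothetical format).
    Returns (truths, trust): the most confident value per item and a
    trust score per source.
    """
    sources = {s for s, _, _ in claims}
    trust = {s: 1.0 for s in sources}  # no prior knowledge: uniform trust
    confidence = {}

    for _ in range(iterations):
        # Value confidence: trust-weighted votes, normalized per item.
        votes = defaultdict(float)
        totals = defaultdict(float)
        for s, item, value in claims:
            votes[(item, value)] += trust[s]
            totals[item] += trust[s]
        confidence = {
            (item, value): v / totals[item]
            for (item, value), v in votes.items()
        }

        # Source trust: mean confidence of the values the source claims.
        sums = defaultdict(float)
        counts = defaultdict(int)
        for s, item, value in claims:
            sums[s] += confidence[(item, value)]
            counts[s] += 1
        trust = {s: sums[s] / counts[s] for s in sources}

    # Keep the highest-confidence value for each item.
    truths = {}
    for (item, value), c in confidence.items():
        if item not in truths or c > confidence[(item, truths[item])]:
            truths[item] = value
    return truths, trust


# Toy claims (invented for illustration): s1 and s3 conflict on "z",
# where plain majority voting would be tied.
claims = [
    ("s1", "x", "a"), ("s2", "x", "a"), ("s3", "x", "b"),
    ("s1", "y", "c"), ("s2", "y", "c"), ("s3", "y", "d"),
    ("s1", "z", "p"), ("s3", "z", "q"),
]
truths, trust = discover_truth(claims)
```

In this toy input, the tie on item "z" is broken by the trust that s1 accumulates from agreeing with s2 on the other items, which is the basic intuition behind estimating veracity and source reliability together. Published truth discovery algorithms differ in how they define and update these two quantities.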