An unsupervised data-driven method to discover equivalent relations in large linked datasets

Zhang, Z ORCID: https://orcid.org/0000-0002-8587-8618, Gentile, AL, Blomqvist, E, Augenstein, I and Ciravegna, F, 2017. An unsupervised data-driven method to discover equivalent relations in large linked datasets. Semantic Web, 8 (3), pp. 437-452. ISSN 1570-0844

Text
PubSub6046_Zhang.pdf - Post-print

Official URL: http://doi.org/10.3233/SW-160213

Abstract

This article addresses a number of limitations of state-of-the-art methods of Ontology Alignment: 1) they primarily address concepts and entities, while relations are less well studied; 2) many build on the assumption of the ‘well-formedness’ of ontologies, which is not necessarily true in the domain of Linked Open Data; 3) few have looked at schema heterogeneity within a single source, which is also a common issue, particularly in very large Linked Datasets created automatically from heterogeneous resources or integrated from multiple datasets. We propose a domain- and language-independent, completely unsupervised method to align equivalent relations across schemata based on their shared instances. We introduce a novel similarity measure able to cope with unbalanced population of schema elements, an unsupervised technique to automatically determine the similarity threshold for asserting equivalence between a pair of relations, and an unsupervised clustering process to discover groups of equivalent relations across different schemata. Although the method is designed for aligning relations within a single dataset, it can also be adapted for cross-dataset alignment where sameAs links between datasets have been established. Using three gold standards created based on DBpedia, we obtain encouraging results from a thorough evaluation involving four baseline similarity measures and over 15 comparative models based on variants of the proposed method. The proposed method achieves significant improvements over baseline models in terms of F1 measure (mostly between 7% and 40%); it always scores the highest precision and is also among the top performers in terms of recall. We also make public the datasets used in this work, which we believe form the largest collection of gold standards for evaluating relation alignment in the LOD context.
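
The general idea of aligning relations through their shared instances can be illustrated with a minimal Python sketch. It is not the article's method: the overlap coefficient stands in for the novel similarity measure, the threshold is a fixed parameter rather than one determined automatically, and the grouping is a plain connected-components pass. The function names (`pair_sets`, `overlap_similarity`, `cluster_relations`) and the default threshold are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def pair_sets(triples):
    """Group the (subject, object) pairs of a triple list by relation."""
    by_relation = defaultdict(set)
    for subject, relation, obj in triples:
        by_relation[relation].add((subject, obj))
    return by_relation


def overlap_similarity(pairs_a, pairs_b):
    """Shared pairs divided by the size of the smaller set.

    Assumption: an illustrative stand-in, not the measure from the article.
    """
    if not pairs_a or not pairs_b:
        return 0.0
    return len(pairs_a & pairs_b) / min(len(pairs_a), len(pairs_b))


def cluster_relations(triples, threshold=0.5):
    """Link relations whose similarity meets the threshold and return the groups.

    Assumption: the threshold is fixed here; the article determines it
    without supervision.
    """
    by_relation = pair_sets(triples)
    parent = {relation: relation for relation in by_relation}

    def find(relation):
        # Union-find lookup with path halving.
        while parent[relation] != relation:
            parent[relation] = parent[parent[relation]]
            relation = parent[relation]
        return relation

    for a, b in combinations(sorted(by_relation), 2):
        if overlap_similarity(by_relation[a], by_relation[b]) >= threshold:
            parent[find(a)] = find(b)

    groups = defaultdict(set)
    for relation in by_relation:
        groups[find(relation)].add(relation)
    return sorted(sorted(group) for group in groups.values())
```

Given triples in which two differently named relations connect the same subject-object pairs, `cluster_relations` places those two relations in one group and leaves unrelated relations in groups of their own.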

Item Type:	Journal article
Publication Title:	Semantic Web
Creators:	Zhang, Z., Gentile, A.L., Blomqvist, E., Augenstein, I. and Ciravegna, F.
Publisher:	IOS Press
Date:	2017
Volume:	8
Number:	3
ISSN:	1570-0844
Identifiers:	DOI: 10.3233/SW-160213
Divisions:	Schools > School of Science and Technology
Record created by:	Jonathan Gallacher
Date Added:	13 Sep 2016 08:43
Last Modified:	20 Oct 2017 09:58
URI:	https://irep.ntu.ac.uk/id/eprint/28467