A knowledge-based framework for information extraction and exploration

Aljamel, A

Aljamel, A, 2018. A knowledge-based framework for information extraction and exploration. PhD, Nottingham Trent University.

AbdulademAljamel2018.pdf - Published version

Abstract

Harnessing insights from the colossal amount of online information requires the computerised processing of unstructured text in order to satisfy the information needs of particular applications such as recommender systems and sentiment analysis. The increasing availability of online documents that describe domain-specific information provides an opportunity to employ a knowledge-based approach to extracting information from Web data.

In this thesis, a novel comprehensive knowledge-based framework is proposed to construct and exploit a domain-specific semantic knowledgebase. The proposed framework introduces a methodology for linking several components of different techniques and tools. It focuses on providing reusable and configurable data and application templates, which allow developers to apply it in a diversity of domains. The objectives of this framework are: extracting information from unstructured data, constructing a semantic knowledgebase from the extracted information, enriching the resultant semantic knowledgebase by sourcing appropriate semi-structured and structured datasets, and consuming the resultant semantic knowledgebase to facilitate the intelligent exploration and search of information. To investigate the challenges of extracting and modelling information in a specific domain, the financial domain was employed as a use-case in the context of a stock investment motivating scenario.

The developed knowledge-based approach exploits the semantic and syntactic characteristics of the problem domain knowledge in implementing a hybrid approach of rule-based and Machine Learning (ML) based relation classification. The rule-based approach is adopted in the Natural Language Processing tasks associated with linguistic and structural features, Named Entity Recognition, instance labelling and feature generation. The results of these tasks are used to classify the relations between the named entities by employing the ML based relation classification. In addition, the domain knowledge is analysed to benefit knowledge modelling by translating the key domain concepts into a formal ontology. This ontology is employed in constructing a semantic knowledgebase from unstructured online data of a specific domain, enriching the resulting semantic knowledgebase by sourcing semi-structured and structured online data sources, and applying advanced classification and inference technologies to infer new and interesting facts that improve decision-making and intelligent exploration activities. However, because of the specific characteristics of the problem domain knowledge, most of its relations are non-binary; hence, an appropriate N-ary relation pattern technique was adopted and investigated.
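As a rough illustration of the rule-based stage described above, the following minimal Python sketch tags entities with a gazetteer and a regular expression, then generates features for each candidate entity pair. The gazetteer entries, the money pattern, the feature names and the example sentence are all hypothetical; the abstract does not specify the actual rules or features used in the thesis, and the downstream ML classifier that would consume these features is not shown.

```python
import re
from itertools import combinations

# Hypothetical gazetteer and pattern; the thesis's actual rules are not
# given in the abstract.
GAZETTEER = {"Acme Corp": "COMPANY", "Globex": "COMPANY"}
MONEY = re.compile(r"\$\d+(?:\.\d+)?\s?(?:million|billion)?")


def tag_entities(sentence):
    """Rule-based NER: gazetteer lookup plus a regex for money amounts.

    Returns (start, end, text, label) tuples sorted by position.
    """
    entities = []
    for name, label in GAZETTEER.items():
        for m in re.finditer(re.escape(name), sentence):
            entities.append((m.start(), m.end(), m.group(), label))
    for m in MONEY.finditer(sentence):
        entities.append((m.start(), m.end(), m.group(), "MONEY"))
    return sorted(entities)


def pair_features(sentence, e1, e2):
    """Generate illustrative features for one ordered entity pair."""
    between = sentence[e1[1]:e2[0]].lower().split()
    return {
        "type_pair": f"{e1[3]}-{e2[3]}",
        "token_distance": len(between),
        "has_acquire_verb": any(w.startswith("acquir") for w in between),
    }


def candidate_instances(sentence):
    """Pair up tagged entities; each pair is one instance for a classifier."""
    ents = tag_entities(sentence)
    return [
        ((a[2], b[2]), pair_features(sentence, a, b))
        for a, b in combinations(ents, 2)
    ]


for pair, feats in candidate_instances("Acme Corp acquired Globex for $3 billion."):
    print(pair, feats)
```

Each feature dictionary would then be labelled and passed to a relation classifier; this sketch assumes non-overlapping entity mentions.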

A series of novel experiments was conducted to implement and configure a Machine Learning (ML) based relation classification. The experimental evaluation showed that the developed knowledge-assisted ML relation classification model, further boosted by our implementation of Genetic Algorithms (GAs) to reduce the feature space, resulted in a significant improvement in relation extraction. The experimental results also indicate that, amongst the implemented ML algorithms, the Support Vector Machine (SVM) exhibited the best relation classification accuracy in the majority of the training datasets, while retaining acceptable levels of accuracy in the remaining training datasets.
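The idea of using a GA to reduce the feature space can be sketched as a wrapper search over binary feature masks. The Python sketch below is only an assumption about the general technique, not the thesis's configuration: the toy dataset, the size penalty, the population settings and the function names are hypothetical, and a simple nearest-centroid classifier stands in for the SVM so that the example needs only the standard library.

```python
import random

# Toy data (hypothetical): feature 0 separates the classes, the rest are noise.
TOY_X = [[0.0, 5, 1, 7], [0.1, 1, 9, 2], [0.2, 8, 3, 5],
         [1.0, 2, 8, 6], [1.1, 7, 2, 1], [0.9, 4, 6, 9]]
TOY_Y = [0, 0, 0, 1, 1, 1]


def centroid_accuracy(X, y, mask):
    """Leave-one-out accuracy of a nearest-centroid classifier
    restricted to the features selected by the 0/1 mask."""
    cols = [j for j, bit in enumerate(mask) if bit]
    if not cols:
        return 0.0
    correct = 0
    for i in range(len(X)):
        sums, counts = {}, {}
        for k, (row, label) in enumerate(zip(X, y)):
            if k == i:
                continue  # hold out instance i
            acc = sums.setdefault(label, [0.0] * len(cols))
            for c, j in enumerate(cols):
                acc[c] += row[j]
            counts[label] = counts.get(label, 0) + 1

        def dist(label):
            return sum((X[i][j] - sums[label][c] / counts[label]) ** 2
                       for c, j in enumerate(cols))

        if min(sums, key=dist) == y[i]:
            correct += 1
    return correct / len(X)


def fitness(X, y, mask, penalty=0.01):
    """Accuracy minus a small cost per selected feature."""
    return centroid_accuracy(X, y, mask) - penalty * sum(mask)


def ga_select(X, y, generations=30, pop_size=20, mutation_rate=0.1, seed=0):
    """Search for a good feature mask with elitism, tournament
    selection, one-point crossover and bit-flip mutation."""
    rng = random.Random(seed)
    n = len(X[0])
    pop = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    pop[0] = [1] * n  # include the full feature set as a baseline individual

    def score(mask):
        return fitness(X, y, mask)

    for _ in range(generations):
        pop.sort(key=score, reverse=True)
        next_pop = pop[:2]  # elitism: the two best masks survive unchanged
        while len(next_pop) < pop_size:
            p1 = max(rng.sample(pop, 3), key=score)
            p2 = max(rng.sample(pop, 3), key=score)
            cut = rng.randrange(1, n)
            child = p1[:cut] + p2[cut:]
            child = [1 - b if rng.random() < mutation_rate else b for b in child]
            next_pop.append(child)
        pop = next_pop
    return max(pop, key=score)


best = ga_select(TOY_X, TOY_Y)
print(best, fitness(TOY_X, TOY_Y, best))
```

Because the best individuals are carried over unchanged in every generation, the returned mask never scores below the full feature set under this fitness function.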

Web Ontology Language (OWL) reasoning and rule-based reasoning were applied to the resultant semantic knowledgebase to derive stock investment specific recommendations. In addition, the SPARQL query language was employed to explore the semantic knowledgebase. Moreover, taking into consideration the problem domain's requirements for modelling non-binary relations, a relation-as-class N-ary relation pattern was implemented, and the reasoning axioms and queries were adjusted to accommodate the intermediate resources that the N-ary relations require.
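The relation-as-class pattern mentioned above can be illustrated with a small Python sketch in which each non-binary relation is an intermediate resource whose properties point to the participants. The identifiers (ex:Acquisition, ex:acquirer and so on), the sample facts and the tiny pattern matcher are hypothetical stand-ins; in the thesis this role is played by an OWL ontology queried with SPARQL, whose actual vocabulary is not given in the abstract.

```python
# Triples as (subject, predicate, object). Each acquisition is an
# intermediate resource (ex:acq1, ex:acq2) that ties together more than
# two participants. All names and values here are hypothetical.
TRIPLES = [
    ("ex:acq1", "rdf:type", "ex:Acquisition"),
    ("ex:acq1", "ex:acquirer", "ex:AcmeCorp"),
    ("ex:acq1", "ex:target", "ex:Globex"),
    ("ex:acq1", "ex:price", "3000000000"),
    ("ex:acq2", "rdf:type", "ex:Acquisition"),
    ("ex:acq2", "ex:acquirer", "ex:Initech"),
    ("ex:acq2", "ex:target", "ex:Hooli"),
    ("ex:acq2", "ex:price", "500000000"),
]

# A query must join through the intermediate resource ?rel, much as a
# SPARQL basic graph pattern would.
PATTERNS = [
    ("?rel", "rdf:type", "ex:Acquisition"),
    ("?rel", "ex:acquirer", "?buyer"),
    ("?rel", "ex:target", "?target"),
    ("?rel", "ex:price", "?price"),
]


def unify(pattern, triple, binding):
    """Extend the binding so that pattern matches triple, or return None."""
    new = dict(binding)
    for p, t in zip(pattern, triple):
        if p.startswith("?"):
            if new.setdefault(p, t) != t:
                return None  # variable already bound to a different value
        elif p != t:
            return None  # constant does not match
    return new


def query(patterns, triples, binding=None):
    """Return every variable binding that satisfies all patterns."""
    binding = binding or {}
    if not patterns:
        return [binding]
    results = []
    for triple in triples:
        b = unify(patterns[0], triple, binding)
        if b is not None:
            results.extend(query(patterns[1:], triples, b))
    return results


for row in query(PATTERNS, TRIPLES):
    print(row["?buyer"], "acquired", row["?target"], "for", row["?price"])
```

The extra hop through ?rel is what the abstract refers to when it says that axioms and queries had to be adjusted for the intermediate resources: a binary property between buyer and target alone could not carry the price.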

This thesis also summarises the experience gained in addressing the challenges of implementing the proposed knowledge-based framework for constructing and exploiting a semantic knowledgebase. Domain experts and knowledge engineers can treat this account of the challenges as a novel methodology for employing Semantic Web Technologies so that knowledge users can intelligently exploit knowledge in similar problem domains.

The evaluation of knowledge accessibility through Semantic Web Technologies in the developed application covers the ability to retrieve either all or some portion of the data from the semantic knowledgebase for a particular use-case scenario. Investigating the tasks of reasoning over, accessing and querying the semantic knowledgebase shows that Semantic Web Technologies can provide an accurate and complex knowledge representation for sharing knowledge from a diversity of data sources, and can improve the decision-making process and the intelligent exploration of the semantic knowledgebase.

Item Type:	Thesis
Creators:	Aljamel, A.
Date:	January 2018
Rights:	This work is the intellectual property of the author. You may copy up to 5% of this work for private study, or personal, non-commercial research. Any re-use of the information contained within this document should be fully referenced, quoting the author, title, university, degree level and pagination. Queries or requests for any other use, or if a more substantial copy is required, should be directed to the owner of the Intellectual Property Rights.
Divisions:	Schools > School of Science and Technology
Record created by:	Linda Sullivan
Date Added:	31 May 2018 08:13
Last Modified:	31 May 2018 08:14
URI:	https://irep.ntu.ac.uk/id/eprint/33759