Automated knowledge extraction from text.

Bowden, PR, 1999. Automated knowledge extraction from text. PhD, Nottingham Trent University.

Full text: 10183048.pdf - Published version (38MB)

Abstract

Knowledge Extraction (KE) is the automated extraction of facts from machine-readable text. KE is a branch of Natural Language Processing (NLP). Within NLP, processing techniques may be deep or shallow. Deep techniques are the traditional methods of NLP and computational linguistics, and are aimed at language understanding. They are mostly domain-independent techniques. Shallow techniques are currently a focus of interest and may be defined as methods which achieve NLP goals without attempting to understand the input text fully. These are mostly domain-specific techniques.

Deep processing approaches are considered with respect to the problems they entail. These problems can be both theoretical and practical. These and other difficulties are used to justify shallow attempts at NLP tasks. After a review of several existing KE and similar systems, this work describes the Knowledge Extraction Program (KEP) developed by the author. KEP aims to be shallow and non-domain-specific, and extracts factual knowledge from explanatory texts. A pattern-matching approach is used which cuts fact-bearing sentences into fragments so that concepts and the facts relating to them can be extracted. Various conceptual relations are searched for, including at present definitions (definitions of concepts), hypernyms (parent classes of concepts), exemplifications (examples of concepts) and partitions (lists of the component parts of a concept).
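
The abstract does not give KEP's actual patterns, so the following is only a minimal Python sketch of what shallow, pattern-based relation extraction of this kind might look like. The cue phrases, the `PATTERNS` table and the `extract_facts` function are all hypothetical illustrations and are not taken from the thesis.

```python
import re

# Hypothetical cue patterns, one per relation type named in the abstract.
# These are illustrative assumptions, not KEP's real pattern set.
PATTERNS = {
    "definition": re.compile(
        r"^(?P<concept>.+?) (?:is|are) defined as (?P<fact>.+?)\.?$", re.I),
    "hypernym": re.compile(
        r"^(?P<concept>.+?) (?:is|are) (?:a|an) (?:kind|type|branch) of (?P<fact>.+?)\.?$", re.I),
    "exemplification": re.compile(
        r"^(?P<fact>.+?) (?:is|are) (?:an example|examples) of (?P<concept>.+?)\.?$", re.I),
    "partition": re.compile(
        r"^(?P<concept>.+?) (?:consists of|comprises|is made up of) (?P<fact>.+?)\.?$", re.I),
}


def extract_facts(sentence):
    """Cut a sentence into (relation, concept, fact) fragments using cue patterns."""
    sentence = sentence.strip()
    facts = []
    for relation, pattern in PATTERNS.items():
        match = pattern.match(sentence)
        if match:
            facts.append((relation, match.group("concept"), match.group("fact")))
    return facts


if __name__ == "__main__":
    for text in [
        "KE is a branch of Natural Language Processing.",
        "A sentence consists of a subject and a predicate.",
    ]:
        print(extract_facts(text))
```

No parsing or semantic analysis is involved: each sentence is matched against surface cues only, which is what makes the approach shallow, and none of the cues refers to a particular subject area.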

One of the motivating factors for this research was the desire to answer the question: how useful can a specific set of shallow techniques be in a non-domain-specific NLP application? This is an important question at a time when shallow techniques are viewed favourably by the NLP community. To this end, the performance of KEP has been evaluated using the recall and precision measures. As a final demonstration of the program's abilities, KEP has also been run on a large part of the text from this work to produce a first-cut glossary for that text. This glossary successfully captures the main concepts from the text and provides useful explanations of them in many cases.
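
For reference, recall and precision are conventionally computed by comparing the extracted facts with a hand-built reference set. The sketch below uses those standard definitions; the `precision_recall` function and the sample fact tuples are hypothetical, and the abstract does not say how KEP's output was actually scored.

```python
def precision_recall(extracted, reference):
    """Return (precision, recall) for extracted facts against a reference set.

    precision = correct extracted facts / all extracted facts
    recall    = correct extracted facts / all reference facts
    """
    extracted, reference = set(extracted), set(reference)
    correct = len(extracted & reference)
    precision = correct / len(extracted) if extracted else 0.0
    recall = correct / len(reference) if reference else 0.0
    return precision, recall


if __name__ == "__main__":
    # Hypothetical facts, for illustration only.
    extracted = {("hypernym", "KE", "NLP"), ("definition", "KE", "fact extraction"),
                 ("partition", "NLP", "syntax"), ("hypernym", "KEP", "program")}
    reference = {("hypernym", "KE", "NLP"), ("definition", "KE", "fact extraction"),
                 ("hypernym", "NLP", "AI")}
    print(precision_recall(extracted, reference))
```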

It is concluded that KEP is a working program which demonstrates the usefulness of shallow, non-domain-specific methods, and which has opened up several new research directions, including automatic index creation, student assignment marking, and information retrieval from the Internet for the automatic construction of semantic-net knowledge bases.

Item Type: Thesis
Creators: Bowden, P.R.
Date: 1999
ISBN: 9781369313406
Identifiers: PQ10183048 (Type: Other)
Divisions: Schools > School of Science and Technology
Record created by: Jeremy Silvester
Date Added: 03 Sep 2020 14:56
Last Modified: 22 Jun 2023 09:50
URI: https://irep.ntu.ac.uk/id/eprint/40620
