Large vocabulary semantic analysis for text recognition

Rose, TG, 1993. Large vocabulary semantic analysis for text recognition. PhD, Nottingham Trent University.

Text: 10290211.pdf - Published version (32MB)

Abstract

This thesis describes research work undertaken by the author from October 1988 to September 1992 concerning the automatic recognition of text (either handwritten or typescript) by computer. In particular, it details the use of semantic information (using lexical co-occurrence and collocational models rather than compositional theories) to improve the performance of a computerised handwriting recognition system. An important part of this work has been the systematic empirical testing and validation of the techniques so developed.

Such is the visual ambiguity of handwriting that a number of possible interpretations may be made for any written word. Indeed, this is true of any text, but especially handwritten text since the segmentation between the individual characters is particularly indistinct. Human readers cope with this by making selective use of visual cues and using an understanding of the text to compensate for any degradation or ambiguity within the visual stimulus. Word images occur within a meaningful context, and human readers are able to exploit the syntactic and semantic constraints of the textual material. Analogously, computerised text recognition systems would be enhanced by using higher level knowledge. Character recognition techniques alone are insufficient to unambiguously identify the input, particularly that of handwritten data.

Ideally, this higher-level knowledge would be acquired by the creation of a lexical database that contains all the relevant information. However, to create a semantic lexicon by hand for a large vocabulary is a considerable task - which is a major reason why so many semantic theories fail to "scale up" from the small, artificial domains in which they were developed. An alternative approach is to exploit existing sources of semantic information, such as machine-readable dictionaries and text corpora. This thesis describes the acquisition of semantic knowledge from such sources and its use in computerised text recognition systems.

Item Type: Thesis
Creators: Rose, T.G.
Date: 1993
ISBN: 9781369324600
Identifiers: PQ10290211 (Other)
Rights: This copy has been supplied on the understanding that it is copyright material and that no quotation from the thesis may be published without proper acknowledgement.
Divisions: Schools > School of Science and Technology
Record created by: Linda Sullivan
Date Added: 16 Jun 2021 15:44
Last Modified: 17 Oct 2023 14:38
URI: https://irep.ntu.ac.uk/id/eprint/43101
