Extracting Arabic composite names using genitive principles of Arabic grammar

Khalil, H, Osman, T (ORCID: https://orcid.org/0000-0001-8781-2658) and Miltan, M, 2020. Extracting Arabic composite names using genitive principles of Arabic grammar. ACM Transactions on Asian and Low-Resource Language Information Processing, 19 (4): 57. ISSN 2375-4699

Text: 1334483_Osman.pdf - Post-print (832kB)

Abstract

Named Entity Recognition (NER) is a basic prerequisite for using Natural Language Processing (NLP) in information retrieval. Arabic NER is especially challenging because the language is morphologically rich, has short vowels, and has no capitalisation convention. This article presents a novel rule-based approach that uses linguistic, grammar-based techniques to extract Arabic composite names from Arabic text. Our approach uniquely exploits Arabic genitive grammar rules, in particular the rules for identifying definite nouns (معرفة) and indefinite nouns (نكرة), to support the extraction of composite names. Based on domain knowledge and Arabic Genitive Rules (AGR), the approach formalises a set of syntactical rules and linguistic patterns that first use genitive patterns to classify definiteness within phrases and then extract proper composite names from the unstructured text. The approach places no constraint on the length of an Arabic composite name, and our initial experimentation demonstrated high recall and precision when the NER algorithm was applied to a financial domain corpus.
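
As a rough illustration of the idea described in the abstract, the sketch below shows how definiteness alone can delimit a genitive chain of arbitrary length. It is not the authors' rule set: the prefix test for the definite article (ال), the function names, and the `breakers` set of non-noun tokens are all hypothetical simplifications introduced here, and the input is assumed to be already tokenised.

```python
# Toy sketch (an assumption, not the published AGR rules): treat a run of
# indefinite tokens closed by a definite token as one genitive chain.

DEFINITE_ARTICLE = "ال"  # Arabic definite article, written as a prefix


def is_definite(token: str) -> bool:
    """Crude check: the token starts with the definite article.

    Real text needs morphological analysis; some words merely begin
    with these two letters without carrying the article.
    """
    return token.startswith(DEFINITE_ARTICLE) and len(token) > len(DEFINITE_ARTICLE)


def extract_genitive_chains(tokens, breakers=frozenset()):
    """Return candidate chains: indefinite token(s) followed by a definite one.

    `breakers` is a hypothetical set of non-noun tokens (verbs, prepositions)
    that interrupt a chain. No limit is placed on chain length.
    """
    chains = []
    run = []  # indefinite tokens collected so far
    for tok in tokens:
        if tok in breakers:
            run = []  # a non-noun token discards the pending run
        elif is_definite(tok):
            if run:
                chains.append(run + [tok])  # the definite token closes the chain
            run = []
        else:
            run.append(tok)
    return chains


if __name__ == "__main__":
    # "announced / bank / Kuwait / about / board / management / the-company"
    sentence = ["أعلن", "بنك", "الكويت", "عن", "مجلس", "إدارة", "الشركة"]
    for chain in extract_genitive_chains(sentence, breakers={"أعلن", "عن"}):
        print(" ".join(chain))
```

On the sample sentence this yields one two-token and one three-token candidate. A practical system would still need part-of-speech information and further rules to decide which candidates are proper names.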

Item Type: Journal article
Publication Title: ACM Transactions on Asian and Low-Resource Language Information Processing
Creators: Khalil, H., Osman, T. and Miltan, M.
Publisher: Association for Computing Machinery (ACM)
Date: June 2020
Volume: 19
Number: 4
ISSN: 2375-4699
Identifiers:
DOI: 10.1145/3382187
Other: 1334483
Divisions: Schools > School of Science and Technology
Record created by: Linda Sullivan
Date Added: 23 Jun 2020 07:51
Last Modified: 23 Jun 2020 08:43
URI: https://irep.ntu.ac.uk/id/eprint/40076
