Khalil, H., Osman, T. ORCID: 0000-0001-8781-2658 and Miltan, M., 2020. Extracting Arabic composite names using genitive principles of Arabic grammar. ACM Transactions on Asian and Low-Resource Language Information Processing, 19 (4): 57. ISSN 2375-4699
Text: 1334483_Osman.pdf - Post-print (832kB)
Abstract
Named Entity Recognition (NER) is a basic prerequisite of using Natural Language Processing (NLP) for information retrieval. Arabic NER is especially challenging as the language is morphologically rich and has short vowels with no capitalisation convention. This article presents a novel rule-based approach that uses linguistic grammar-based techniques to extract Arabic composite names from Arabic text. Our approach uniquely exploits the genitive Arabic grammar rules; in particular, the rules regarding the identification of definite nouns (معرفة) and indefinite nouns (نكرة) to support the process of extracting composite names. Based on domain knowledge and Arabic Genitive Rules (AGR), the developed approach formalises a set of syntactical rules and linguistic patterns that initially use genitive patterns to classify definiteness within phrases and then extracts proper composite names from the unstructured text. The developed novel approach does not place any constraints on the length of the Arabic composite name and our initial experimentation demonstrated high recall and precision results when the NER algorithm was applied to a financial domain corpus.
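The genitive idea in the abstract can be illustrated with a toy sketch. In an Arabic genitive construct, the head noun carries no definite article and takes its definiteness from the term that follows it. The code below is not the authors' rule set: the `HEAD_NOUNS` lexicon, the function names, and the prefix test for the article ال are all assumptions made for illustration. The prefix test is naive, because some words begin with the letters ال as part of their root.

```python
# Toy illustration only; this is NOT the rule set from the article.
AL = "\u0627\u0644"  # the Arabic definite article "ال" (al-)

# Hypothetical trigger nouns (bank, company, ministry). A real system would
# rely on a lexicon or a part-of-speech tagger instead of this small set.
HEAD_NOUNS = {"بنك", "شركة", "وزارة"}


def has_article(token: str) -> bool:
    """Naive definiteness test: the token starts with the article 'ال'."""
    return len(token) > len(AL) and token.startswith(AL)


def extract_candidates(text: str) -> list[str]:
    """Return phrases made of an article-less head noun followed by one or
    more tokens that carry the definite article."""
    tokens = text.split()
    found = []
    for i, token in enumerate(tokens):
        if token not in HEAD_NOUNS:
            continue
        # The head of a genitive construct has no article; the phrase becomes
        # definite through the definite term(s) that follow it.
        j = i + 1
        while j < len(tokens) and has_article(tokens[j]):
            j += 1
        if j > i + 1:  # at least one definite term followed the head noun
            found.append(" ".join(tokens[i:j]))
    return found


if __name__ == "__main__":
    print(extract_candidates("أعلن بنك القاهرة عن أرباح"))
```

Because the loop keeps consuming definite tokens, the sketch puts no fixed limit on the length of a candidate name, which mirrors the property claimed in the abstract without reproducing the published rules.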
Item Type: | Journal article |
---|---|
Publication Title: | ACM Transactions on Asian and Low-Resource Language Information Processing |
Creators: | Khalil, H., Osman, T. and Miltan, M. |
Publisher: | Association for Computing Machinery (ACM) |
Date: | June 2020 |
Volume: | 19 |
Number: | 4 |
ISSN: | 2375-4699 |
Identifiers: | |
Divisions: | Schools > School of Science and Technology |
Record created by: | Linda Sullivan |
Date Added: | 23 Jun 2020 07:51 |
Last Modified: | 23 Jun 2020 08:43 |
URI: | https://irep.ntu.ac.uk/id/eprint/40076 |