Exploiting functional discourse grammar to enhance complex Arabic relation extraction using a hybrid semantic knowledge base - machine learning approach

Osman, T (ORCID: https://orcid.org/0000-0001-8781-2658), Khalil, H, Miltan, M, Shaalan, K and Alfrjani, R (ORCID: https://orcid.org/0009-0007-4624-4245), 2023. Exploiting functional discourse grammar to enhance complex Arabic relation extraction using a hybrid semantic knowledge base - machine learning approach. ACM Transactions on Asian and Low-Resource Language Information Processing, 22 (8): 214. ISSN 2375-4699

2412478_Osman.pdf - Post-print

Abstract

Relation extraction from unstructured Arabic text is especially challenging due to the complex morphology of the Arabic language and the variation in word semantics and lexical categories. The research documented in this paper presents a hybrid Semantic Knowledge base - Machine Learning (SKML) approach for extracting complex Arabic relations from unstructured Arabic documents. The proposed approach exploits the principles of Functional Discourse Grammar (FDG) to emphasise the semantic and pragmatic properties of the language and to facilitate the identification of relation elements. In its initial phase, the novel FDG-SKML relation extraction approach deploys a lexical-based mechanism that utilises a purpose-built, domain-specific Semantic Knowledge base to encode the semantic association between the elements of the identified relations. The evaluation of this initial stage evidenced improved accuracy for extracting most complex Arabic relations. The initial relation extraction mechanism was then extended by integrating its output into a Machine Learning classifier, which facilitated the extraction of especially complex relations with significant disparity in the presence, order, and correlation of the relation elements. Using Economics as the problem domain, experimental evaluation evidenced the high accuracy of our FDG-SKML approach in the complex Arabic relation extraction task and demonstrated its further improvement upon integration with machine learning classifiers.

Item Type: Journal article
Publication Title: ACM Transactions on Asian and Low-Resource Language Information Processing
Creators: Osman, T., Khalil, H., Miltan, M., Shaalan, K. and Alfrjani, R.
Publisher: Association for Computing Machinery (ACM)
Date: August 2023
Volume: 22
Number: 8
ISSN: 2375-4699
Identifiers:
DOI: 10.1145/3610581
Other: 2412478
Rights: © ACM 2023. This is the author's version of the work. It is posted here for your personal use. Not for redistribution. The definitive Version of Record was published in ACM Transactions on Asian and Low-Resource Language Information Processing, http://dx.doi.org/10.1145/3615980
Divisions: Schools > School of Science and Technology
Record created by: Laura Borcherds
Date Added: 21 Mar 2025 10:13
Last Modified: 21 Mar 2025 10:13
URI: https://irep.ntu.ac.uk/id/eprint/53276
