Challenges and solutions in transliterating 19th century Romanian texts from the transitional to the Latin script

Frincu, M (ORCID: https://orcid.org/0000-0003-1034-8409), Frincu, S and Penteliuc, M, 2023. Challenges and solutions in transliterating 19th century Romanian texts from the transitional to the Latin script. In: Carvalho, S, Khan, AF, Anic, AO, Spahiu, B, Gracia, J, McCrae, JP, Gromann, D, Heinisch, B and Salgado, AC, eds., Language, Data and Knowledge 2023 (LDK 2023): proceedings of the 4th Conference on Language, Data and Knowledge. Lisbon, Portugal: Universidade Nova de Lisboa, pp. 226-231. ISBN 9789895408153

Full text not available from this repository.

Abstract

During the 19th century, the Romanian script underwent a massive yet uneven transition from the Cyrillic to the current Latin alphabet. The volume of surviving literature written in the transitional script, together with the difficulties that script poses for OCR and transliteration engines, makes the task highly challenging from a Big Data perspective. In this paper, we discuss these issues, then propose and test a machine-learning solution trained on small datasets, either with transfer learning from Latin/Cyrillic or from scratch.

Item Type: Chapter in book
Creators: Frincu, M., Frincu, S. and Penteliuc, M.
Publisher: Universidade Nova de Lisboa
Place of Publication: Lisbon, Portugal
Date: August 2023
ISBN: 9789895408153
Identifiers: 10.34619/srmk-injj (DOI); 1805644 (Other)
Rights: Open access: Attribution 4.0 International (CC BY 4.0)
Divisions: Schools > School of Science and Technology
Record created by: Jonathan Gallacher
Date Added: 18 Sep 2023 14:45
Last Modified: 18 Sep 2023 14:45
URI: https://irep.ntu.ac.uk/id/eprint/49732
