Frincu, M ORCID: https://orcid.org/0000-0003-1034-8409, Frincu, S and Penteliuc, M, 2023. Challenges and solutions in transliterating 19th century Romanian texts from the transitional to the Latin script. In: Carvalho, S, Khan, AF, Anic, AO, Spahiu, B, Gracia, J, McCrae, JP, Gromann, D, Heinisch, B and Salgado, AC, eds., Language, Data and Knowledge 2023 (LDK 2023): proceedings of the 4th Conference on Language, Data and Knowledge. Lisbon, Portugal: Universidade Nova de Lisboa, pp. 226-231. ISBN 9789895408153
Full text not available from this repository.

Abstract
During the 19th century, the Romanian script underwent a massive yet uneven transition from the Cyrillic to the current Latin alphabet. The amount of existing literature written in that script as well as the problems it poses for OCR and transliteration engines make the problem highly challenging from a Big Data perspective. In this paper, we discuss the issues and propose and test a machine-learning solution trained on small datasets using either transfer learning from Latin/Cyrillic or from scratch.
Item Type: | Chapter in book |
---|---|
Creators: | Frincu, M., Frincu, S. and Penteliuc, M. |
Publisher: | Universidade Nova de Lisboa |
Place of Publication: | Lisbon, Portugal |
Date: | August 2023 |
ISBN: | 9789895408153 |
Identifiers: | DOI: 10.34619/srmk-injj; Other: 1805644 |
Rights: | Open access: Attribution 4.0 International (CC BY 4.0) |
Divisions: | Schools > School of Science and Technology |
Record created by: | Jonathan Gallacher |
Date Added: | 18 Sep 2023 14:45 |
Last Modified: | 18 Sep 2023 14:45 |
URI: | https://irep.ntu.ac.uk/id/eprint/49732 |