Developing an extended online grooming dataset to evaluate the robustness of context determination using BERT

Street, J ORCID: https://orcid.org/0000-0002-9305-8468, Ihianle, I ORCID: https://orcid.org/0000-0001-7445-8573 and Lotfi, A ORCID: https://orcid.org/0000-0002-5139-6565, 2026. Developing an extended online grooming dataset to evaluate the robustness of context determination using BERT. In: ICAAI '25: Proceedings of the 2025 9th International Conference on Advances in Artificial Intelligence. New York: Association for Computing Machinery, pp. 220-224. ISBN 9798400721045

Full text not available from this repository.

Abstract

Online Grooming (OG) is a pertinent threat to children online with limited real-world solutions. OG is defined as the deceptive practice of targeting children for sexual exploitation. Owing to the ethical and privacy concerns involved, data scarcity is a recurring problem in this research area, with most datasets not made publicly available. This investigation introduces a new dataset to the research community, Discord PJ, formed of 137 transcripts for Online Grooming classification, linguistic, and psychological research. Analysis of this dataset shows that its message line frequencies and Linguistic Inquiry and Word Count (LIWC) category proportions are proportionally consistent with those of other OG datasets. In further analysis, this dataset was used to determine the robustness of the ‘Context Determination’ approach, which determines whether an Adult and a Child are communicating in a transcript. Discord PJ outperformed the original dataset BERT model on the Message Level Analysis True Positive metric. However, when considering transcript-level Context Determination F1 scores, it underperformed by 0.11-0.18 in comparison to the Context Determination of PAN12.

Item Type: Chapter in book
Creators: Street, J., Ihianle, I. and Lotfi, A.
Publisher: Association for Computing Machinery
Place of Publication: New York
Date: 25 April 2026
ISBN: 9798400721045
Identifiers:
DOI: 10.1145/3787279.3787315
Other: 2681231
Rights: © 2025 Copyright held by the owner/author(s). This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License (https://creativecommons.org/licenses/by-nc-nd/4.0/).
Divisions: Schools > School of Science and Technology
Record created by: Jonathan Gallacher
Date Added: 29 Apr 2026 12:24
Last Modified: 29 Apr 2026 12:24
URI: https://irep.ntu.ac.uk/id/eprint/55624
