Street, J (ORCID: https://orcid.org/0000-0002-9305-8468), Ihianle, I (ORCID: https://orcid.org/0000-0001-7445-8573) and Lotfi, A (ORCID: https://orcid.org/0000-0002-5139-6565), 2026.
Developing an extended online grooming dataset to evaluate the robustness of context determination using BERT.
In:
ICAAI '25: Proceedings of the 2025 9th International Conference on Advances in Artificial Intelligence.
New York: Association for Computing Machinery, pp. 220-224.
ISBN 9798400721045
Abstract
Online Grooming (OG) is a pertinent threat to children online, with limited real-world solutions. OG is defined as the deceptive practice of targeting children for sexual exploitation. Owing to the obvious ethical and privacy concerns, data scarcity is a recurring problem in this research area, and most datasets are not made publicly available. This investigation introduces a new dataset to the research community, Discord PJ, comprising 137 transcripts for Online Grooming classification, linguistic, and psychological research. Analysis of this dataset shows that its message line frequencies and Linguistic Inquiry and Word Count (LIWC) category proportions are consistently proportional to those of other OG datasets. In further analysis, the dataset was used to evaluate the robustness of the ‘Context Determination’ approach, which determines whether an Adult and a Child are communicating in a transcript. On the Message Level Analysis True Positive metric, Discord PJ outperformed the original-dataset BERT model. However, at transcript level, Context Determination F1 scores on Discord PJ were 0.11-0.18 lower than those of Context Determination on PAN12.
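The abstract does not say how message-level predictions are turned into a transcript-level Context Determination decision. As a rough illustration only, the sketch below assumes each message already carries a predicted "adult" or "child" label from some message-level classifier, assigns each author the majority label over their messages, and flags a transcript as adult-child context when both labels appear among its authors; it then scores those decisions with the standard binary F1. The majority-vote rule and the names `author_labels`, `adult_child_context` and `f1_score` are hypothetical, not the authors' method.

```python
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical input: a transcript is a list of (author_id, predicted_label)
# pairs, where predicted_label is "adult" or "child" from an assumed
# message-level classifier (e.g. a fine-tuned BERT model).
Transcript = List[Tuple[str, str]]


def author_labels(transcript: Transcript) -> Dict[str, str]:
    """Assign each author the majority label over their messages (assumed rule).

    Ties resolve to the label first seen for that author.
    """
    votes: Dict[str, Counter] = {}
    for author, label in transcript:
        votes.setdefault(author, Counter())[label] += 1
    return {author: counts.most_common(1)[0][0] for author, counts in votes.items()}


def adult_child_context(transcript: Transcript) -> bool:
    """True if at least one author is labelled adult and at least one child."""
    return {"adult", "child"} <= set(author_labels(transcript).values())


def f1_score(y_true: List[bool], y_pred: List[bool]) -> float:
    """Binary F1 = 2PR / (P + R); returns 0.0 when there are no true positives."""
    tp = sum(t and p for t, p in zip(y_true, y_pred))
    fp = sum((not t) and p for t, p in zip(y_true, y_pred))
    fn = sum(t and (not p) for t, p in zip(y_true, y_pred))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy usage with made-up transcripts and gold labels.
transcripts = [
    [("a", "adult"), ("b", "child"), ("a", "adult")],  # adult + child
    [("c", "adult"), ("d", "adult")],                  # two adults
    [("e", "child"), ("f", "child")],                  # classifier sees two children
]
gold = [True, False, True]
predicted = [adult_child_context(t) for t in transcripts]
print(round(f1_score(gold, predicted), 3))  # 0.667
```

Aggregating per author before deciding keeps a single misclassified message from flipping the transcript-level decision, which is one plausible reason transcript-level and message-level metrics can diverge.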
| Item Type: | Chapter in book |
|---|---|
| Creators: | Street, J., Ihianle, I. and Lotfi, A. |
| Publisher: | Association for Computing Machinery |
| Place of Publication: | New York |
| Date: | 25 April 2026 |
| ISBN: | 9798400721045 |
| Identifiers: | 10.1145/3787279.3787315 (DOI); 2681231 (Other) |
| Rights: | © 2025 Copyright held by the owner/author(s). This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License (https://creativecommons.org/licenses/by-nc-nd/4.0/). |
| Divisions: | Schools > School of Science and Technology |
| Record created by: | Jonathan Gallacher |
| Date Added: | 29 Apr 2026 12:24 |
| Last Modified: | 29 Apr 2026 12:24 |
| URI: | https://irep.ntu.ac.uk/id/eprint/55624 |