CLASSIC Utterance Boundary: a chunking-based model of early naturalistic word segmentation

Cabiddu, F, Bott, L, Jones, G (ORCID: https://orcid.org/0000-0003-3867-9947) and Gambi, C, 2023. CLASSIC Utterance Boundary: a chunking-based model of early naturalistic word segmentation. Language Learning, 73 (3), pp. 942-975. ISSN 0023-8333

Text: 1629413_Jones.pdf - Post-print (1MB)
Text: 1629413_Jones_Supp_1.pdf - Supplemental Material (1MB)

Abstract

Word segmentation is a crucial step in children’s vocabulary learning. While computational models of word segmentation can capture infants’ performance in small-scale artificial tasks, the examination of early word segmentation in naturalistic settings has been limited by the lack of measures that can relate models’ performance to developmental data. Here, we extended CLASSIC (Jones et al., 2021) - a corpus-trained chunking model that can simulate several memory, phonological and vocabulary learning phenomena - to allow it to perform word segmentation using utterance boundary information (henceforth CLASSIC-UB). Further, we compared our model to children on a wide range of new measures, capitalizing on the link between word segmentation and vocabulary learning abilities. We show that the combination of chunking and utterance-boundary information used by CLASSIC-UB allows a better prediction of English-learning children’s output vocabulary than other models.

Item Type: Journal article
Alternative Title: CLASSIC-Utterance-Boundary chunking-based model
Publication Title: Language Learning
Creators: Cabiddu, F., Bott, L., Jones, G. and Gambi, C.
Publisher: Wiley
Date: September 2023
Volume: 73
Number: 3
ISSN: 0023-8333
Identifiers:
DOI: 10.1111/lang.12559
Other: 1629413
Rights: This is the peer reviewed version of the following article: Cabiddu, F., Bott, L., Jones, G., & Gambi, C. (2023). CLASSIC Utterance Boundary: a chunking-based model of early naturalistic word segmentation. Language Learning, 73(3), 942-975, which has been published in final form at https://doi.org/10.1111/lang.12559. This article may be used for non-commercial purposes in accordance with Wiley Terms and Conditions for Use of Self-Archived Versions. This article may not be enhanced, enriched or otherwise transformed into a derivative work, without express permission from Wiley or by statutory rights under applicable legislation. Copyright notices must not be removed, obscured or modified. The article must be linked to Wiley’s version of record on Wiley Online Library and any embedding, framing or otherwise making available the article or pages thereof by third parties from platforms, services and websites other than Wiley Online Library must be prohibited.
Divisions: Schools > School of Social Sciences
Record created by: Laura Ward
Date Added: 20 Dec 2022 15:31
Last Modified: 02 Feb 2024 03:04
URI: https://irep.ntu.ac.uk/id/eprint/47685
