Multimodal Multisensor attention modelling

Taheri, M.H. ORCID: 0000-0001-7594-4530, 2020. Multimodal Multisensor attention modelling. PhD, Nottingham Trent University.

Introduction: Sustaining attention is one of the most important factors in determining successful outcomes and deep learning in students. Existing approaches to tracking student engagement rely on periodic human observations, which are subject to variation in inter-rater reliability. Our solution uses real-time Multimodal Multisensor data, labeled by objective performance outcomes, to track the attention of students.

Method: The study involved four students with a combined diagnosis of cerebral palsy and a learning disability, who took part in a 3-month trial over 59 sessions. Multimodal Multisensor data were collected while they participated in a Continuous Performance Test (CPT). Eye-gaze, electroencephalogram, body pose, and interaction data were used to create a model of student attention through objective labeling from the CPT outcomes. To achieve this, a type of continuous performance test, the Seek-X type, is introduced. Nine features were extracted, including High-Level handpicked Compound Features (HLCF). A series of different machine learning approaches was evaluated using leave-one-out cross-validation.
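The leave-one-out evaluation described above can be sketched as follows. This is a minimal illustration, not the thesis pipeline: the abstract does not state whether the held-out unit is a sample, a session, or a participant, so the sketch holds out one sample at a time. The nearest-centroid classifier is a simple stand-in for the models that were compared, and the feature values and labels are hypothetical toy data.

```python
from statistics import mean


def nearest_centroid_predict(train_X, train_y, x):
    """Stand-in classifier: pick the class whose mean feature vector is closest to x."""
    centroids = {}
    for label in set(train_y):
        rows = [f for f, y in zip(train_X, train_y) if y == label]
        centroids[label] = [mean(col) for col in zip(*rows)]

    def sq_dist(a, b):
        return sum((p - q) ** 2 for p, q in zip(a, b))

    # sorted() makes tie-breaking deterministic
    return min(sorted(centroids), key=lambda label: sq_dist(centroids[label], x))


def leave_one_out_accuracy(X, y, predict):
    """Hold out each sample once, train on the rest, and score the held-out prediction."""
    correct = 0
    for i in range(len(X)):
        train_X = X[:i] + X[i + 1:]
        train_y = y[:i] + y[i + 1:]
        if predict(train_X, train_y, X[i]) == y[i]:
            correct += 1
    return correct / len(X)


# Hypothetical toy data: two made-up features per time window, labeled by CPT outcome.
X = [[0.9, 0.1], [0.8, 0.2], [0.85, 0.15], [0.2, 0.7], [0.3, 0.8], [0.25, 0.9]]
y = ["attentive", "attentive", "attentive",
     "inattentive", "inattentive", "inattentive"]

loo_accuracy = leave_one_out_accuracy(X, y, nearest_centroid_predict)
print(f"leave-one-out accuracy: {loo_accuracy:.2f}")
```

Any classifier with the same `predict(train_X, train_y, x)` shape could be swapped in. Holding out whole sessions or participants instead of single samples would only change how the training and test indices are chosen.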

Research questions:

RQ1: Can we create a model of attention for PMLD/CP students using the CPT?

RQ2: What are the main correlations between the CPT outcomes and the Multimodal Multisensor data?

Results: Overall, the random forest classification approach achieved the best classification results. Using random forest, 84.8% classification accuracy for attention and 65.4% accuracy for inattention were achieved. We compared these results to outcomes from different models: AdaBoost, decision tree, k-Nearest Neighbor, naïve Bayes, neural network, and support vector machine. We showed that using a multisensor approach achieved higher accuracy than using features from any reduced set of sensors. Incorporating person-specific data improved the classification outcome compared to a participant-neutral model. We found that using High-Level handpicked Compound Features (HLCF) can improve the classification accuracy in every sensor mode. Our approach is robust to both sensor fallout and occlusions. The single most important sensor feature for the classification of attention and inattention was shown to be eye-gaze. We have shown that we can accurately predict the level of attention of students with learning disabilities in a real-time approach that is not subject to inter-rater reliability or human observation and does not rely on a single mode of sensor input.

In total, 2475 separate correlation tests were carried out over 55 data points using Pearson’s correlation coefficient. Data points from Signal Detection Theory (SDT), CPT outcome measures, Multimodal Multisensor features, and participant characteristics were assessed longitudinally for cross-correlation significance. A strong positive correlation was found between participants’ ability to maintain sustained and selective attention in the CPT (d′) and their academic progress in school, P < .01. Participants who showed more inhibition in tests had progressed further in their academic assessments, P < .01. The Seek-X type CPT also showed specific physiological characteristics, including body movement range and eye-gaze, that were significant in P scales such as ‘Reading’ and ‘Listening’, P < .05. We found that participant bias was overall liberal (B″D < 0). Participants showed no significant bias change during the sessions, and we found no significant correlation between bias (B″D) and sensitivity (d′).
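To make the sensitivity (d′) and bias (B″D) measures concrete, here is a minimal sketch of how they are commonly computed from hit and false-alarm rates in Signal Detection Theory. The abstract does not give the formulas used in the thesis, so the definitions below are assumptions: the standard d′ = z(H) − z(F) and Donaldson's B″D. The example rates are hypothetical.

```python
from statistics import NormalDist


def d_prime(hit_rate, false_alarm_rate):
    """Sensitivity: distance between z-transformed hit and false-alarm rates.

    Rates must lie strictly between 0 and 1; rates of exactly 0 or 1 need
    a correction first, which is not shown here.
    """
    z = NormalDist().inv_cdf
    return z(hit_rate) - z(false_alarm_rate)


def b_double_prime_d(hit_rate, false_alarm_rate):
    """Response bias B''D: negative values indicate a liberal bias
    (a tendency to respond 'target'), positive values a conservative one."""
    miss_times_cr = (1 - hit_rate) * (1 - false_alarm_rate)
    hit_times_fa = hit_rate * false_alarm_rate
    return (miss_times_cr - hit_times_fa) / (miss_times_cr + hit_times_fa)


# Hypothetical session: 90% of targets hit, 30% false alarms on non-targets.
sensitivity = d_prime(0.9, 0.3)
bias = b_double_prime_d(0.9, 0.3)
print(f"d' = {sensitivity:.2f}, B''D = {bias:.2f}")
```

With these hypothetical rates the bias comes out negative, which is the liberal direction reported in the results above.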

Conclusion: An approach to labeling Multimodal Multisensor data to train machine-learning algorithms to track the attention of students with profound and multiple disabilities has been presented. We posit that this approach can overcome the variation in observer inter-rater reliability when using standardized scales in tracking the emotional expression of students with such profound disabilities. The accuracy of our approach increases with multiple modes of sensor input, and our method is robust to sensor occlusion and fall-out. Multiple sources of sensor input are provided, to accommodate a wide variety of users and their needs. Our model can reliably track the attention of students with profound disabilities, regardless of the sensors available. A system incorporating this model can help teachers design personalized interventions for a very heterogeneous group of students, where teachers cannot possibly attend to each of their individual needs. This approach could be used to identify those with the greatest learning challenges, to guarantee that all students are supported to reach their full potential.

Keywords—Affective computing in education, affect detection, attention, continuous performance test, engagement, flow, HCI, interaction, learning disabilities, machine learning, multimodal, multisensor, physiological sensors, Signal Detection Theory, selective attention, sustained attention, student engagement.

Item Type: Thesis
Creators: Taheri, M.H.
Date: May 2020
Divisions: Schools > School of Science and Technology
Record created by: Jeremy Silvester
Date Added: 27 Nov 2020 15:46
Last Modified: 31 May 2021 15:12
