Synthetic biological signals machine-generated by GPT-2 improve the classification of EEG and EMG through data augmentation

Bird, JJ (ORCID: https://orcid.org/0000-0002-9858-1231), Pritchard, M, Fratini, A, Ekart, A and Faria, DR, 2021. Synthetic biological signals machine-generated by GPT-2 improve the classification of EEG and EMG through data augmentation. IEEE Robotics and Automation Letters, 6 (2), pp. 3498-3504. ISSN 2377-3766

Text: 1640826_Bird.pdf - Post-print (1MB)

Abstract

Synthetic data augmentation is of paramount importance for machine learning classification, particularly for biological data, which tend to be high-dimensional with scarce training samples. The applications of robotic control and augmentation in disabled and able-bodied subjects still rely mainly on subject-specific analyses. These can rarely be generalised to the whole population and appear to overcomplicate simple action recognition such as grasp and release (standard actions in robotic prosthetics and manipulators). We show for the first time that multiple GPT-2 models can machine-generate synthetic biological signals (EMG and EEG) and improve real data classification. Models trained solely on GPT-2 generated EEG data can classify a real EEG dataset at 74.71% accuracy, and models trained on GPT-2 EMG data can classify real EMG data at 78.24% accuracy. Synthetic and calibration data are then introduced within each cross-validation fold when benchmarking EEG and EMG models. Results show that algorithms improve when either or both kinds of additional data are used. A Random Forest achieves a mean 95.81% (1.46) classification accuracy on EEG data, which increases to 96.69% (1.12) when synthetic GPT-2 EEG signals are introduced during training. Similarly, the accuracy of the Random Forest classifying EMG data increases from 93.62% (0.8) to 93.9% (0.59) when the training data are augmented by synthetic EMG signals. Additionally, as predicted, augmentation with synthetic biological signals also increases the classification accuracy of data from new subjects that were not observed during training. Finally, a Robotiq 2F-85 Gripper was used for real-time gesture-based control, with synthetic EMG data augmentation improving gesture recognition accuracy from 68.29% to 89.5%.
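
The abstract states that synthetic data are introduced within each cross-validation fold. One way this could look is sketched below in Python, assuming that synthetic rows are appended only to the training split so that every fold is still scored on real data alone. The nearest-centroid classifier is a simple stand-in for the Random Forest named in the abstract, the fold count of 5 and the toy Gaussian data are arbitrary choices, and all function names are hypothetical; none of this is taken from the authors' code.

```python
import random
from statistics import mean


def k_fold_indices(n, k, seed=0):
    """Shuffle indices 0..n-1 and split them into k nearly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def fit_centroids(X, y):
    """Compute one mean vector per class (stand-in for a real classifier)."""
    groups = {}
    for row, label in zip(X, y):
        groups.setdefault(label, []).append(row)
    return {label: [mean(col) for col in zip(*rows)]
            for label, rows in groups.items()}


def predict(centroids, row):
    """Return the label whose centroid is nearest in squared Euclidean distance."""
    return min(
        centroids,
        key=lambda c: sum((a - b) ** 2 for a, b in zip(row, centroids[c])),
    )


def cv_accuracy(X_real, y_real, X_syn=(), y_syn=(), k=5, seed=0):
    """Mean accuracy over k folds; synthetic rows join the training split only."""
    scores = []
    for test_idx in k_fold_indices(len(X_real), k, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(X_real)) if i not in held_out]
        # Assumption: augmentation never touches the held-out (real) test rows.
        X_train = [X_real[i] for i in train_idx] + list(X_syn)
        y_train = [y_real[i] for i in train_idx] + list(y_syn)
        model = fit_centroids(X_train, y_train)
        hits = sum(predict(model, X_real[i]) == y_real[i] for i in test_idx)
        scores.append(hits / len(test_idx))
    return mean(scores)


def toy_rows(rng, n, centre):
    """Toy 2-D feature vectors; placeholders for real or generated signal features."""
    return [[rng.gauss(centre, 1.0), rng.gauss(centre, 1.0)] for _ in range(n)]


rng = random.Random(1)
X_real = toy_rows(rng, 20, 0.0) + toy_rows(rng, 20, 5.0)
y_real = [0] * 20 + [1] * 20
X_syn = toy_rows(rng, 10, 0.0) + toy_rows(rng, 10, 5.0)  # stands in for GPT-2 output
y_syn = [0] * 10 + [1] * 10

baseline = cv_accuracy(X_real, y_real, k=5)
augmented = cv_accuracy(X_real, y_real, X_syn, y_syn, k=5)
print(f"real only: {baseline:.3f}, real + synthetic: {augmented:.3f}")
```

Keeping the synthetic rows out of the test split means any change between the two scores reflects how well the model classifies real signals, which is the comparison the abstract reports.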

Item Type: Journal article
Publication Title: IEEE Robotics and Automation Letters
Creators: Bird, J.J., Pritchard, M., Fratini, A., Ekart, A. and Faria, D.R.
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Date: April 2021
Volume: 6
Number: 2
ISSN: 2377-3766
Identifiers:
DOI: 10.1109/lra.2021.3056355
Other: 1640826
Rights: © 2021 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.
Divisions: Schools > School of Science and Technology
Record created by: Jeremy Silvester
Date Added: 27 Jan 2023 16:39
Last Modified: 27 Jan 2023 16:39
URI: https://irep.ntu.ac.uk/id/eprint/48103
