Multimodal fusion towards crime prevention on the edge

Anwar, A. ORCID: 0000-0001-5347-4996, 2023. Multimodal fusion towards crime prevention on the edge. PhD, Nottingham Trent University.

Amna_Anwar_2023.pdf - Published version



Detecting violent language is a complex problem in preventing crime and harmful content. Violent language detection in real-time conversations remains a novel problem in computer science, with most current solutions focusing on either text or audio alone. Such single-modality solutions often miss the wider context: without audio it is difficult to extract auditory features, and without text it is difficult to understand the language used. In addition, there has been growing interest in the use of edge computing technologies to prevent crime. Edge computing is the processing of data at or close to the edge of the network, as opposed to sending it to a centralised data centre. Faster response times, lower bandwidth needs, and improved data security are among the benefits of this strategy for crime prevention; combined with a multimodal dataset, it could improve detection while preserving user privacy.

This thesis investigates the practical application of multimodal data fusion and edge computing for crime prevention, specifically focussing on the detection of violent language in conversations from text and audio data. A fusion algorithm was developed that combines the natural language processing (NLP) techniques of Bidirectional Encoder Representations from Transformers (BERT) and Linguistic Inquiry and Word Count (LIWC) with Mel-frequency cepstral coefficients (MFCC) and time-frequency domain features. The resulting F1 score of 0.85 demonstrates the effectiveness of the algorithm, when compared to single-modality results, in identifying potential instances of violent conversations related to domestic violence or public safety. However, the initial iteration of the algorithm required substantial computational resources, so it was compressed using model reduction for deployment on edge devices such as mobile phones and smart home devices.
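As a rough illustration of the idea of combining text and audio features, the following minimal sketch shows one common form of feature-level fusion: each modality's feature vector is normalised and the two are concatenated before classification. It also shows how an F1 score is computed from binary predictions. The record does not state how the thesis fuses its features, so the concatenation strategy, the function names (`l2_normalise`, `fuse`, `f1_score`), and the toy numbers are all assumptions made for illustration, not the author's method or data.

```python
from math import sqrt


def l2_normalise(vec):
    """Scale a feature vector to unit Euclidean length.

    A zero vector is returned unchanged to avoid division by zero.
    """
    norm = sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def fuse(text_features, audio_features):
    """Hypothetical feature-level fusion: normalise each modality, then concatenate.

    In a real system text_features might come from BERT/LIWC and
    audio_features from MFCCs; here they are plain lists of floats.
    """
    return l2_normalise(text_features) + l2_normalise(audio_features)


def f1_score(y_true, y_pred):
    """F1 = harmonic mean of precision and recall for binary labels (1 = violent)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Toy vectors standing in for extracted text and audio features.
    fused = fuse([3.0, 4.0], [0.0, 5.0])
    print(len(fused))  # fused length is the sum of both modality lengths
    print(f1_score([1, 1, 0, 0], [1, 0, 1, 0]))
```

Normalising each modality before concatenation keeps one feature set from dominating the other purely because of its numeric scale; other fusion strategies (for example, combining per-modality classifier outputs) are equally plausible readings of the abstract.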

To facilitate real-time detection, a mobile application and a cost-effective smart home device were developed, utilising a model reduction approach. The mobile application enables timely identification of violent conversations, while the smart home device serves as an alternative for people without access to mobile phones. The approach considers contextual factors such as microphone quality and device positioning, which influence the algorithm’s adaptability to different scenarios. Future research aims to enhance the accuracy of the model, improve the realism of training data, and explore innovative approaches for contextual analysis and result normalisation. This thesis contributes to the advancement of multimodal technologies for crime prevention, highlighting the importance of data fusion and edge devices in this domain.

Item Type: Thesis
Creators: Anwar, A.
Contributors (Thesis): Kanjo, E.; Sanei, S.; Oikonomou, A.
Date: July 2023
Rights: The copyright in this work is held by the author. You may copy up to 5% of this work for private study, or personal, non-commercial research. Any re-use of the information contained within this document should be fully referenced, quoting the author, title, university, degree level and pagination. Queries or requests for any other use, or if a more substantial copy is required, should be directed to the author.
Divisions: Schools > School of Science and Technology
Record created by: Jeremy Silvester
Date Added: 26 Jun 2024 11:17
Last Modified: 26 Jun 2024 11:17
