Publications

Looking At The Body: Automatic Analysis of Body Gestures and Self-Adaptors in Psychological Distress

Published in IEEE Transactions on Affective Computing, 2021

Psychological distress is a significant and growing issue in society. Automatic detection, assessment, and analysis of such distress is an active area of research. Compared to modalities such as face, head, and vocal, research investigating the use of the body modality for these tasks is relatively sparse. This is, in part, due to the limited available datasets and difficulty in automatically extracting useful body features. Recent advances in pose estimation and deep learning have enabled new approaches to this modality and domain. To enable this research, we have collected and analyzed a new dataset containing full-body videos of short interviews and self-reported distress labels. We propose a novel method to automatically detect self-adaptors and fidgeting, a subset of self-adaptors that has been shown to be correlated with psychological distress. We perform analysis on statistical body gestures and fidgeting features to explore how distress levels affect participants' behaviors. We then propose a multi-modal approach that combines different feature representations using Multi-modal Deep Denoising Auto-Encoders and Improved Fisher Vector Encoding. We demonstrate that our proposed model, combining audio-visual features with automatically detected fidgeting behavioral cues, can successfully predict distress levels in a dataset labeled with self-reported anxiety and depression levels.

Recommended citation: Lin, W., Orton, I., Li, Q., Pavarini, G., & Mahmoud, M. (2020). Looking At The Body: Automatic Analysis of Body Gestures and Self-Adaptors in Psychological Distress. arXiv preprint arXiv:2007.15815. https://arxiv.org/abs/2007.15815

Learning Similarity between Movie Characters and Its Potential Implications on Understanding Human Experiences

Published in Proceedings of the Third Workshop on Narrative Understanding (NAACL 2021), 2021

While many different aspects of human experience have been studied by the NLP community, none has captured its full richness. We propose a new task to capture this richness based on an unlikely setting: movie characters. We sought to capture theme-level similarities between movie characters that were community-curated into 20,000 themes. By introducing a two-step approach that balances performance and efficiency, we managed to achieve a 9-27% improvement over recent paragraph-embedding based methods. Finally, we demonstrate how the thematic information learnt from movie characters can potentially be used to understand themes in the experience of people, as reflected in Reddit posts.

Recommended citation: Wang, Z., Lin, W., & Wu, X. (2021, June). Learning Similarity between Movie Characters and Its Potential Implications on Understanding Human Experiences. In Proceedings of the Third Workshop on Narrative Understanding (pp. 24-35). https://aclanthology.org/2021.nuse-1.3/

Message-Aware Graph Attention Networks for Large-Scale Multi-Robot Path Planning

Published in IEEE Robotics and Automation Letters, 2021

The domains of transport and logistics are increasingly relying on autonomous mobile robots for the handling and distribution of passengers or resources. At large system scales, finding decentralized path planning and coordination solutions is key to efficient system performance. Recently, Graph Neural Networks (GNNs) have become popular due to their ability to learn communication policies in decentralized multi-agent systems. Yet, vanilla GNNs rely on simplistic message aggregation mechanisms that prevent agents from prioritizing important information. To tackle this challenge, in this letter, we extend our previous work that utilizes GNNs in multi-agent path planning by incorporating a novel mechanism to allow for message-dependent attention. Our Message-Aware Graph Attention neTwork (MAGAT) is based on a key-query-like mechanism that determines the relative importance of features in the messages received from various neighboring robots. We show that MAGAT is able to achieve a performance close to that of a coupled centralized expert algorithm. Further, ablation studies and comparisons to several benchmark models show that our attention mechanism is very effective across different robot densities and performs stably under different communication bandwidth constraints. Experiments demonstrate that our model is able to generalize well in previously unseen problem instances, and that it achieves a 47% improvement over the benchmark success rate, even in very large-scale instances that are ×100 larger than the training instances.
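To make the key-query-like mechanism concrete, here is a minimal sketch of attention-weighted message aggregation for one robot. All names, dimensions, and the scaled dot-product scoring are illustrative assumptions, not the paper's actual MAGAT implementation.

```python
import numpy as np

def key_query_attention(own_feat, neighbor_msgs, W_q, W_k):
    """Score each neighbor's message against the robot's own query,
    then aggregate the messages weighted by those scores (softmax)."""
    q = W_q @ own_feat                       # query from the robot's own features
    keys = neighbor_msgs @ W_k.T             # one key per neighboring robot
    scores = keys @ q / np.sqrt(q.size)      # scaled dot-product relevance
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                 # relative importance of each neighbor
    return weights @ neighbor_msgs, weights  # attention-weighted aggregation

rng = np.random.default_rng(0)
d = 8
own = rng.normal(size=d)
msgs = rng.normal(size=(3, d))               # messages from three neighbors
W_q, W_k = rng.normal(size=(d, d)), rng.normal(size=(d, d))
agg, weights = key_query_attention(own, msgs, W_q, W_k)
```

Unlike uniform (mean or sum) aggregation in vanilla GNNs, the learned weights let a robot down-weight neighbors whose messages are less relevant to its own state.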

Recommended citation: Q. Li, W. Lin, Z. Liu and A. Prorok, "Message-Aware Graph Attention Networks for Large-Scale Multi-Robot Path Planning," in IEEE Robotics and Automation Letters, vol. 6, no. 3, pp. 5533-5540, July 2021, doi: 10.1109/LRA.2021.3077863. https://ieeexplore.ieee.org/abstract/document/9424371/

Common pitfalls and recommendations for using machine learning to detect and prognosticate for COVID-19 using chest radiographs and CT scans [as collaborative authors]

Published in Nature Machine Intelligence volume 3, pages 199–217 (2021), 2021

Machine learning methods offer great promise for fast and accurate detection and prognostication of coronavirus disease 2019 (COVID-19) from standard-of-care chest radiographs (CXR) and chest computed tomography (CT) images. Many articles have been published in 2020 describing new machine learning-based models for both of these tasks, but it is unclear which are of potential clinical utility. In this systematic review, we consider all published papers and preprints, for the period from 1 January 2020 to 3 October 2020, which describe new machine learning models for the diagnosis or prognosis of COVID-19 from CXR or CT images. All manuscripts uploaded to bioRxiv, medRxiv and arXiv along with all entries in EMBASE and MEDLINE in this timeframe are considered. Our search identified 2,212 studies, of which 415 were included after initial screening and, after quality screening, 62 studies were included in this systematic review. Our review finds that none of the models identified are of potential clinical use due to methodological flaws and/or underlying biases. This is a major weakness, given the urgency with which validated COVID-19 models are needed. To address this, we give many recommendations which, if followed, will solve these issues and lead to higher-quality model development and well-documented manuscripts.

Recommended citation: Roberts, M., Driggs, D., Thorpe, M. et al. Common pitfalls and recommendations for using machine learning to detect and prognosticate for COVID-19 using chest radiographs and CT scans. Nat Mach Intell 3, 199–217 (2021). https://doi.org/10.1038/s42256-021-00307-0 https://www.nature.com/articles/s42256-021-00307-0

Multimodal Deep Learning Framework for Mental Disorder Recognition

Published in 2020 15th IEEE International Conference on Automatic Face and Gesture Recognition (FG 2020), 2020

Current methods for mental disorder recognition mostly depend on clinical interviews and self-reported scores that can be highly subjective. Building an automatic recognition system can help in early detection of symptoms and providing insights into the biological markers for diagnosis. It is, however, a challenging task as it requires taking into account indicators from different modalities, such as facial expressions, gestures, acoustic features and verbal content. To address this issue, we propose a general-purpose multimodal deep learning framework, in which multiple modalities - including acoustic, visual and textual features - are processed individually with the cross-modality correlation considered. Specifically, a Multimodal Deep Denoising Autoencoder (multi-DDAE) is designed to obtain multimodal representations of audio-visual features followed by the Fisher Vector encoding which produces session-level descriptors. For textual modality, a Paragraph Vector (PV) is proposed to embed the transcripts of interview sessions into document representations capturing cues related to mental disorders. Following an early fusion strategy, both audio-visual and textual features are then fused prior to feeding them to a Multitask Deep Neural Network (DNN) as the final classifier. Our framework is evaluated on the automatic detection of two mental disorders: bipolar disorder (BD) and depression, using two datasets: Bipolar Disorder Corpus (BDC) and the Extended Distress Analysis Interview Corpus (E-DAIC), respectively. Our experimental evaluation results showed comparable performance to the state-of-the-art in BD and depression detection, thus demonstrating the effective multimodal representation learning and the capability to generalise across different mental disorders.
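The early-fusion-plus-multitask step can be sketched as follows: fused session descriptors pass through a shared trunk, and separate heads produce predictions for each disorder. Dimensions, layer sizes, and class counts here are hypothetical placeholders, not the framework's actual architecture.

```python
import numpy as np

rng = np.random.default_rng(1)

def relu(x):
    return np.maximum(x, 0.0)

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

# Hypothetical dimensions: fused descriptor -> shared trunk -> two task heads.
d_av, d_text, d_hidden = 24, 8, 16
W_shared = rng.normal(size=(d_hidden, d_av + d_text)) * 0.1
W_bd = rng.normal(size=(3, d_hidden)) * 0.1    # e.g. three BD severity classes
W_dep = rng.normal(size=(2, d_hidden)) * 0.1   # e.g. depressed / not depressed

def multitask_forward(audio_visual, textual):
    fused = np.concatenate([audio_visual, textual])  # early fusion of modalities
    h = relu(W_shared @ fused)                       # shared representation
    return softmax(W_bd @ h), softmax(W_dep @ h)     # one prediction per task

p_bd, p_dep = multitask_forward(rng.normal(size=d_av), rng.normal(size=d_text))
```

The shared trunk is what lets the two detection tasks regularize each other, which is the usual motivation for a multitask head over two independent classifiers.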

Recommended citation: Z. Zhang, W. Lin, M. Liu and M. Mahmoud, "Multimodal Deep Learning Framework for Mental Disorder Recognition," 2020 15th IEEE International Conference on Automatic Face and Gesture Recognition (FG 2020), 2020, pp. 344-350, doi: 10.1109/FG47880.2020.00033. https://ieeexplore.ieee.org/abstract/document/9320154/

Automatic detection of self-adaptors for psychological distress

Published in 2020 15th IEEE International Conference on Automatic Face and Gesture Recognition (FG 2020), 2020

Psychological distress is a significant and growing issue in society. Automatic detection, assessment, and analysis of such distress is an active area of research. Compared to modalities such as face, head, and vocal, research investigating the use of the body modality for these tasks is relatively sparse. This is, in part, due to the lack of available datasets and difficulty in automatically extracting useful body features. Recent advances in pose estimation and deep learning have enabled new approaches to this modality and domain. We propose a novel method to automatically detect self-adaptors and fidgeting, a subset of self-adaptors that has been shown to be correlated with psychological distress. We also propose a multi-modal approach that combines different feature representations using Multi-modal Deep Denoising Auto-Encoders and Improved Fisher Vector encoding. We demonstrate that our proposed model, combining audio-visual features with automatically detected fidgeting behavioral cues, can successfully predict distress levels in a dataset labeled with self-reported anxiety and depression levels. To enable this research, we introduce a new dataset containing full-body videos of short interviews and self-reported distress labels.

Recommended citation: W. Lin, I. Orton, M. Liu and M. Mahmoud, "Automatic Detection of Self-Adaptors for Psychological Distress," 2020 15th IEEE International Conference on Automatic Face and Gesture Recognition (FG 2020), 2020, pp. 371-378, doi: 10.1109/FG47880.2020.00032. https://ieeexplore.ieee.org/abstract/document/9320202

Author2Vec: A Novel Framework for Generating User Embedding

Published in arXiv, 2020

Online forums and social media platforms provide noisy but valuable data every day. In this paper, we propose a novel end-to-end neural network-based user embedding system, Author2Vec. The model incorporates sentence representations generated by BERT (Bidirectional Encoder Representations from Transformers) with a novel unsupervised pre-training objective, authorship classification, to produce better user embeddings that encode useful user-intrinsic properties. This user embedding system was pre-trained on post data of 10k Reddit users and was analyzed and evaluated on two user classification benchmarks: depression detection and personality classification, in which the model proved to outperform traditional count-based and prediction-based methods. We substantiate that Author2Vec successfully encodes useful user attributes and that the generated user embeddings perform well in downstream classification tasks without further fine-tuning.

Recommended citation: Wu, X., Lin, W., Wang, Z., & Rastorgueva, E. (2020). Author2vec: A framework for generating user embedding. arXiv preprint arXiv:2003.11627. https://arxiv.org/abs/2003.11627

No, you are not alone: A better way to find people with similar experiences on Reddit

Published in Proceedings of the 5th Workshop on Noisy User-generated Text (W-NUT 2019), 2019

We present a probabilistic clustering algorithm that can help Reddit users to find posts that discuss experiences similar to their own. This model is built upon the BERT Next Sentence Prediction model and reduces the time complexity for clustering all posts in a corpus from O(n²) to O(n) with respect to the number of posts. We demonstrate that such probabilistic clustering can yield a performance better than baseline clustering methods based on Latent Dirichlet Allocation (Blei et al., 2003) and Word2Vec (Mikolov et al., 2013). Furthermore, there is a high degree of coherence between our probabilistic clustering and the exhaustive-comparison O(n²) algorithm in which the similarity between every pair of posts is found. This makes the use of the BERT Next Sentence Prediction model more practical for unsupervised clustering tasks due to the high runtime overhead of each BERT computation.
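The O(n²) → O(n) idea can be illustrated with a minimal incremental clustering sketch: each post is scored only against one representative per existing cluster (k clusters, k ≪ n) instead of against every other post. Cosine similarity over toy vectors stands in for the BERT Next Sentence Prediction score here; the function names and threshold are illustrative assumptions, not the paper's algorithm.

```python
import numpy as np

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def incremental_cluster(vecs, threshold=0.9):
    """Assign each post to the best-matching existing cluster,
    or start a new one if no cluster representative scores above
    the threshold. Similarity calls grow linearly in the number
    of posts, unlike exhaustive all-pairs comparison."""
    reps, labels = [], []
    for v in vecs:
        scores = [cosine(v, r) for r in reps]
        if scores and max(scores) >= threshold:
            labels.append(int(np.argmax(scores)))
        else:
            reps.append(v)            # this post seeds a new cluster
            labels.append(len(reps) - 1)
    return labels

posts = np.array([[1.0, 0.0], [0.99, 0.1], [0.0, 1.0], [0.1, 0.95]])
labels = incremental_cluster(posts, threshold=0.9)
# the first two posts share a cluster, as do the last two
```

Keeping the number of per-post comparisons bounded is what makes an expensive pairwise scorer like BERT NSP practical at corpus scale.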

Recommended citation: Wang, Z., Rastorgueva, E., Lin, W., & Wu, X. (2019, November). No, you are not alone: A better way to find people with similar experiences on Reddit. In Proceedings of the 5th Workshop on Noisy User-generated Text (W-NUT 2019) (pp. 307-315). https://www.aclweb.org/anthology/D19-5540/

Detecting personal attributes through analyzing online forums

Published in Cambridge Language Sciences Early Careers Researchers Symposium, 2019

This presentation was given at the Cambridge Language Sciences Early Careers Researchers Symposium.

Recommended citation: Zhilin Wang, Xiaodong Wu, Weizhe Lin and Elena Rastorgueva. Detecting personal attributes through analyzing online forums. 2019. In Cambridge Language Sciences Early Careers Researchers Symposium. https://github.com/Zhilin123/Publications/blob/master/Cambridge%20Language%20Sciences%20ECR.pdf