WIKINDX Resources

Bajaj, G., Nguyen, V., Wijesiriwardene, T., Yip, H. Y., Javangula, V., Parthasarathy, S., Sheth, A., & Bodenreider, O. Evaluating Deep Learning Models for Vocabulary Alignment at Scale in the UMLS Metathesaurus.
Resource type: Journal Article
BibTeX citation key: anonk
Categories: General
Creators: Bajaj, Bodenreider, Javangula, Nguyen, Parthasarathy, Sheth, Wijesiriwardene, Yip
Abstract
The current UMLS (Unified Medical Language System) Metathesaurus construction process for integrating over 200 biomedical source vocabularies is expensive and error-prone, as it relies on lexical algorithms and human editors to decide whether two biomedical terms are synonymous. Recent work has aimed to improve the Metathesaurus construction process using a deep learning approach, namely a Siamese Network initialized with BioWordVec embeddings for predicting synonymy among biomedical terms. Recent advances in Natural Language Processing, such as Transformer models, and Graph Neural Networks (GNNs), such as Graph Attention Networks (GATs), have achieved state-of-the-art (SOTA) results on many downstream tasks, making these techniques logical candidates for the synonymy prediction task as well. In this paper, we evaluate different approaches to employing biomedical BERT-based Transformer models and Graph Attention Networks for synonymy prediction in the UMLS Metathesaurus, aiming to validate whether BERT models or GNNs can actually outperform the existing approaches. We employ BERT models in two architectures: (1) a Siamese Network and (2) a Transformer. In the existing Siamese Networks with LSTM and BioWordVec embeddings, we replace the BioWordVec embeddings with biomedical BERT embeddings extracted from each BERT model in different ways. For the Transformer architecture, we evaluate biomedical BERT models that have been pre-trained on different datasets and tasks. For the GNN architecture, we formulate synonymy prediction as a link prediction task and use a graph neural network with a graph attention layer to predict whether two terms in the UMLS Metathesaurus are synonymous. Given the SOTA performance of these BERT models on other downstream tasks, our experiments yield surprisingly interesting results: (1) employing these biomedical BERT-based models does not outperform the existing approach using a Siamese Network with BioWordVec embeddings for the UMLS synonymy prediction task, (2) the original BioBERT large model, which has not been pre-trained with the UMLS, outperforms the SapBERT models, which have been pre-trained with the UMLS, and (3) using the Siamese Networks yields …
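
To make the first architecture concrete, the following Python sketch shows a Siamese synonymy classifier with a shared biomedical BERT encoder, using PyTorch and Hugging Face Transformers. This is not the authors' code: the model name, the pair representation (concatenation plus absolute difference), and the layer sizes are illustrative assumptions.

# A minimal sketch, assuming PyTorch and Hugging Face Transformers are
# installed; model name and classifier shape are assumptions, not the paper's.
import torch
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer

MODEL_NAME = "dmis-lab/biobert-base-cased-v1.1"  # assumed biomedical BERT

class SiameseSynonymy(nn.Module):
    """Shared BERT encoder + small classifier over a term-pair representation."""

    def __init__(self, encoder_name: str = MODEL_NAME, hidden: int = 256):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(encoder_name)
        dim = self.encoder.config.hidden_size
        # Pair representation: both term embeddings plus their absolute
        # difference (a common Siamese choice, assumed here).
        self.classifier = nn.Sequential(
            nn.Linear(3 * dim, hidden), nn.ReLU(), nn.Linear(hidden, 1)
        )

    def encode(self, batch):
        # [CLS] token embedding as the term representation; one of several
        # possible extraction strategies the abstract alludes to.
        return self.encoder(**batch).last_hidden_state[:, 0]

    def forward(self, left, right):
        a, b = self.encode(left), self.encode(right)
        pair = torch.cat([a, b, torch.abs(a - b)], dim=-1)
        return self.classifier(pair).squeeze(-1)  # synonymy logit

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = SiameseSynonymy()
left = tokenizer(["myocardial infarction"], return_tensors="pt")
right = tokenizer(["heart attack"], return_tensors="pt")
print(torch.sigmoid(model(left, right)))  # untrained, so near 0.5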
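
The GNN formulation can be sketched the same way: terms are graph nodes, known synonymy pairs are edges, a graph-attention encoder produces node embeddings, and candidate pairs are scored by a decoder. The sketch below uses PyTorch Geometric; the toy graph, feature dimensions, and dot-product decoder are assumptions, not the paper's setup.

# A hedged sketch of synonymy prediction as link prediction, assuming
# PyTorch Geometric; the graph and decoder here are toy assumptions.
import torch
import torch.nn.functional as F
from torch_geometric.nn import GATConv

class GATLinkPredictor(torch.nn.Module):
    def __init__(self, in_dim: int, hid_dim: int = 64, heads: int = 4):
        super().__init__()
        self.gat1 = GATConv(in_dim, hid_dim, heads=heads)
        self.gat2 = GATConv(hid_dim * heads, hid_dim, heads=1)

    def encode(self, x, edge_index):
        # Two graph-attention layers produce one embedding per term node.
        h = F.elu(self.gat1(x, edge_index))
        return self.gat2(h, edge_index)

    def score(self, z, pairs):
        # Dot-product decoder: a high score predicts a synonymy edge.
        return (z[pairs[0]] * z[pairs[1]]).sum(dim=-1)

# Toy graph: 4 term nodes with random features and two known synonym pairs
# (0-1 and 2-3), stored as directed edges in both directions.
x = torch.randn(4, 128)
edge_index = torch.tensor([[0, 1, 2, 3], [1, 0, 3, 2]])
model = GATLinkPredictor(in_dim=128)
z = model.encode(x, edge_index)
candidate = torch.tensor([[0], [3]])  # is term 0 synonymous with term 3?
print(torch.sigmoid(model.score(z, candidate)))  # untrained score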
  
Notes
[Online; accessed 31 May 2024]
  