WIKINDX Resources

Boyapati, M., & Aygun, R. S. Semanformer: Semantics-aware Embedding Dimensionality Reduction Using Transformer-Based Models. 
Resource type: Journal Article
BibTeX citation key: anonp
View all bibliographic details
Categories: General
Creators: Aygun, Boyapati
Attachments   URLs   https://www.semant ... tm_medium=31101740
Abstract
We propose Semanformer, a semantics-aware encoder-decoder dimensionality reduction method that leverages a transformer-based encoder-decoder architecture to reduce the dimensionality of BERT embeddings for a corpus while preserving crucial semantic information. In recent years, transformer-based models, particularly BERT (Bidirectional Encoder Representations from Transformers), have revolutionized natural language processing (NLP), achieving state-of-the-art performance across many domains. In NLP and linguistics, understanding the semantic aspects of text is crucial for tasks such as information retrieval, sentiment analysis, and machine translation. However, the high dimensionality of BERT embeddings poses challenges in real-world applications due to increased memory and computational requirements, and reducing it would benefit many downstream tasks. Although widely used dimensionality reduction methods produce feature representations with lower dimensions, applying them to NLP tasks may not yield semantically correct results. To evaluate the effectiveness of our approach, we conduct a comprehensive use-case evaluation on diverse text datasets via sentence reconstruction. Our experiments show that the proposed method achieves a sentence reconstruction accuracy (SRA) above 83%, compared with traditional dimensionality reduction methods such as PCA (SRA < 66%) and t-SNE (SRA < 9%).
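As context for the PCA baseline the abstract compares against, the following is a minimal sketch of classical dimensionality reduction applied to embedding vectors; it is not the authors' Semanformer model, and the embedding matrix, dimensions (768-d inputs reduced to 64-d), and data are illustrative placeholders.

```python
import numpy as np

# Toy stand-in for BERT sentence embeddings: 200 vectors of dimension 768.
# (Real BERT embeddings would come from a pretrained model; these are random.)
rng = np.random.default_rng(0)
embeddings = rng.standard_normal((200, 768))

def pca_reduce(X, k):
    """Project X onto its top-k principal components; return the reduced
    codes and a function that maps codes back to the original space."""
    mean = X.mean(axis=0)
    Xc = X - mean
    # SVD of the centered data gives the principal directions in Vt.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:k]                 # shape (k, d)
    codes = Xc @ components.T           # shape (n, k): the reduced embeddings
    reconstruct = lambda Z: Z @ components + mean
    return codes, reconstruct

codes, reconstruct = pca_reduce(embeddings, k=64)
recon = reconstruct(codes)
rel_err = np.linalg.norm(embeddings - recon) / np.linalg.norm(embeddings)
print(codes.shape)        # reduced to (200, 64)
print(rel_err < 1.0)      # some reconstruction error remains
```

Such linear projections minimize geometric reconstruction error but carry no notion of sentence semantics, which is the gap the abstract's encoder-decoder approach is aimed at.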
  
WIKINDX 6.11.0 | Total resources: 209 | Username: -- | Bibliography: WIKINDX Master Bibliography | Style: American Psychological Association (APA)