Morazzoni, I., Scotti, V., & Tedesco, R. Def2Vec: Extensible Word Embeddings from Dictionary Definitions.
Abstract
Def2Vec introduces a novel paradigm for word embeddings, leveraging dictionary definitions to learn semantic representations. By constructing term-document matrices from definitions and applying Latent Semantic Analysis (LSA), Def2Vec generates embeddings that offer both strong performance and extensibility. In evaluations encompassing Part-of-Speech tagging, Named Entity Recognition, chunking, and semantic similarity, Def2Vec often matches or surpasses state-of-the-art models such as Word2Vec, GloVe, and FastText. The second factorised matrix produced by LSA enables efficient embedding extension for out-of-vocabulary words. By reconciling the advantages of dictionary definitions with LSA-based embeddings, Def2Vec yields informative semantic representations, especially considering its reduced data requirements. This paper advances the understanding of word embedding generation by incorporating structured lexical information and efficient embedding extension.
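
The pipeline summarised above (a term-document matrix built from definitions, LSA, then extension to unseen words) can be illustrated with a minimal NumPy sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the four-entry toy dictionary, whitespace tokenisation, raw term counts, the rank k = 2, and the use of standard LSA fold-in through the term-side factor. The weighting scheme, dimensionality, and exact role of each factor matrix in Def2Vec itself may differ.

```python
import numpy as np

# Toy dictionary: headword -> definition (hypothetical data, illustration only).
definitions = {
    "cat": "small domesticated feline animal",
    "dog": "domesticated canine animal kept as pet",
    "car": "road vehicle with engine and wheels",
    "truck": "large road vehicle for carrying goods",
}

def tokenize(text):
    # Assumption: plain lowercase whitespace tokenisation.
    return text.lower().split()

headwords = list(definitions)
vocab = sorted({tok for d in definitions.values() for tok in tokenize(d)})
term_index = {term: i for i, term in enumerate(vocab)}

def count_vector(text):
    # Raw term counts over the known vocabulary; unknown tokens are ignored.
    vec = np.zeros(len(vocab))
    for tok in tokenize(text):
        if tok in term_index:
            vec[term_index[tok]] += 1.0
    return vec

# Term-document matrix: one row per term, one column per definition.
X = np.stack([count_vector(definitions[w]) for w in headwords], axis=1)

# LSA: truncated SVD, X ~= U_k @ diag(s_k) @ Vt_k.
k = 2
U, s, Vt = np.linalg.svd(X, full_matrices=False)
U_k, s_k, Vt_k = U[:, :k], s[:k], Vt[:k, :]

# One k-dimensional embedding per headword (document-side factor, scaled).
embeddings = Vt_k.T * s_k  # shape: (number of headwords, k)

def embed_definition(text):
    # Fold-in: project a new definition with the term-side factor only,
    # so no re-factorisation is needed for an out-of-vocabulary headword.
    return count_vector(text) @ U_k

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical out-of-vocabulary headword with its definition.
kitten = embed_definition("small young feline animal")
sim_cat = cosine(kitten, embeddings[headwords.index("cat")])
sim_car = cosine(kitten, embeddings[headwords.index("car")])
```

In this sketch, folding in a definition that is already in the matrix reproduces its stored embedding, because the columns of `U_k` are orthonormal. That property is what makes the extension cheap: a new word costs one matrix-vector product.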