WIKINDX Resources

Tanaka, H., & Shinnou, H. Vocabulary expansion of compound words for domain adaptation of BERT. 
Resource type: Journal Article
BibTeX citation key: anon.163
Categories: General
Creators: Shinnou, Tanaka
URLs: https://www.semant ... 4425dad210bc2d71c8
Abstract
Pretraining models such as BERT have achieved high accuracy on various natural language processing tasks by pretraining on a large corpus and fine-tuning on downstream task data. However, BERT is trained at the token level, which makes it difficult to learn unknown or compound words that are split by byte-pair encoding. In this paper, we propose an effective method for constructing word representations when expanding the vocabulary with such compound words. The proposed method assumes domain adaptation by additional pretraining and expands the vocabulary by using the embedding of a synonym as an approximate embedding for each added word. We conducted experiments with each vocabulary expansion method and evaluated their accuracy in predicting the added vocabulary with the masked language model.
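The abstract describes expanding a BERT vocabulary and initializing each added compound word's embedding from a synonym before additional pretraining. The following is a minimal sketch of that general idea using the Hugging Face transformers API, not the authors' implementation; the base model name, the compound word, and the synonym are illustrative assumptions.

    # Sketch: add a compound word to the vocabulary and initialize its
    # embedding from an in-vocabulary synonym, then continue MLM pretraining.
    import torch
    from transformers import AutoModelForMaskedLM, AutoTokenizer

    model_name = "bert-base-uncased"      # assumed base model
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForMaskedLM.from_pretrained(model_name)

    compound = "pretrainingmodel"         # hypothetical domain compound word
    synonym = "model"                     # hypothetical in-vocabulary synonym

    # Register the compound word as a new token and grow the embedding matrix.
    num_added = tokenizer.add_tokens([compound])
    model.resize_token_embeddings(len(tokenizer))

    if num_added > 0:
        new_id = tokenizer.convert_tokens_to_ids(compound)
        syn_id = tokenizer.convert_tokens_to_ids(synonym)
        with torch.no_grad():
            # Use the synonym's embedding as an approximation for the new word.
            emb = model.get_input_embeddings().weight
            emb[new_id] = emb[syn_id].clone()

    # The model would then undergo additional (domain-adaptive) masked
    # language model pretraining on domain text, and the added word could be
    # evaluated by how well the MLM predicts it when masked.

The key design point reflected here is that a newly added word would otherwise start from a random embedding; seeding it with a synonym's embedding gives the additional pretraining a meaningful starting point.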
  