Zheng, Z., Fang, L., & Cao, Y. VART: Vocabulary Adapted BERT Model for Multi-label Document Classification.
Abstract
Large-scale pre-trained language models (PTLMs) such as BERT have been widely used in various natural language processing (NLP) tasks, since PTLMs greatly improve downstream task performance when their parameters are fine-tuned on the target task datasets. However, in many NLP tasks, such as document classification, the task datasets often contain numerous domain-specific words that are not included in the vocabulary of the original PTLM. These out-of-vocabulary (OOV) words tend to carry useful domain knowledge for the downstream tasks, and the domain gap they cause may limit the effectiveness of the PTLM. In this paper, we present VART, a concise pre-training method that adapts the BERT model by learning OOV word representations for the multi-label document classification (MLDC) task. VART employs an extended embedding layer to learn the OOV word representations; this extended layer can be pre-trained on the task datasets with high efficiency and low computational cost. Experiments on the MLDC task across three datasets from different domains and of different sizes demonstrate that VART consistently outperforms conventional PTLM adaptation methods such as fine-tuning, task adaptation, and other pre-trained model adaptation methods.
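The core mechanism described in the abstract, an embedding table extended with trainable rows for domain-specific OOV words while the original BERT embeddings stay fixed, can be illustrated with a minimal PyTorch sketch. This is an assumption-based illustration of the general idea rather than the authors' implementation; the class name ExtendedEmbedding, the vocabulary sizes, and the id-routing logic below are hypothetical.

```python
# Minimal sketch (NOT the authors' code) of an embedding layer extended with
# trainable rows for out-of-vocabulary (OOV) words, while the original
# pre-trained rows stay frozen. All names and sizes here are illustrative.
import torch
import torch.nn as nn


class ExtendedEmbedding(nn.Module):
    """Frozen pre-trained embedding table plus a small trainable OOV table."""

    def __init__(self, pretrained_weight: torch.Tensor, num_oov: int):
        super().__init__()
        vocab_size, hidden = pretrained_weight.shape
        # Rows copied from the pre-trained model; frozen during adaptation.
        self.base = nn.Embedding.from_pretrained(pretrained_weight, freeze=True)
        # One trainable row per OOV word added to the vocabulary.
        self.extra = nn.Embedding(num_oov, hidden)
        self.vocab_size = vocab_size

    def forward(self, input_ids: torch.Tensor) -> torch.Tensor:
        # Ids below vocab_size use the frozen table; ids at or above it are
        # shifted down and looked up in the trainable OOV table.
        is_oov = input_ids >= self.vocab_size
        zeros = torch.zeros_like(input_ids)
        base_out = self.base(torch.where(is_oov, zeros, input_ids))
        extra_out = self.extra(torch.where(is_oov, input_ids - self.vocab_size, zeros))
        return torch.where(is_oov.unsqueeze(-1), extra_out, base_out)


if __name__ == "__main__":
    # Toy check: a 30k-row "pre-trained" table plus 100 OOV rows.
    emb = ExtendedEmbedding(torch.randn(30000, 768), num_oov=100)
    ids = torch.tensor([[5, 42, 30050]])  # the last id falls in the OOV range
    print(emb(ids).shape)  # torch.Size([1, 3, 768])
```

Keeping the original rows frozen is what would make such an adaptation cheap: only the small OOV table receives gradients during the task-level pre-training step the abstract describes, which is consistent with its claim of high efficiency and low computational cost.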