Zheng, Z., Fang, L., & Cao, Y. VART: Vocabulary Adapted BERT Model for Multi-label Document Classification.
Abstract
Large-scale pre-trained language models (PTLMs) such as BERT have been widely used in various natural language processing (NLP) tasks, since PTLMs greatly improve downstream task performance when their parameters are fine-tuned on the target task datasets. However, in many NLP tasks, such as document classification, the task datasets often contain numerous domain-specific words that are not included in the vocabulary of the original PTLM. These out-of-vocabulary (OOV) words tend to carry useful domain knowledge for the downstream tasks, and the domain gap they cause may limit the effectiveness of the PTLM. In this paper, we present VART, a concise pre-training method that adapts the BERT model by learning OOV word representations for the multi-label document classification (MLDC) task. VART employs an extended embedding layer to learn the OOV word representations; this extended layer can be pre-trained on the task datasets with high efficiency and low computational cost. Experiments on the MLDC task across three datasets from different domains and of different sizes demonstrate that VART consistently outperforms conventional PTLM adaptation methods such as fine-tuning, task adaptation, and other pre-trained model adaptation methods.
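The core mechanism described in the abstract, an embedding table extended with trainable rows for domain-specific OOV words while the original BERT embeddings stay fixed, can be illustrated with a minimal PyTorch sketch. This is an assumption-based illustration of the general idea rather than the authors' implementation; the class name ExtendedEmbedding, the vocabulary sizes, and the id-routing logic below are hypothetical.

```python
# Minimal sketch (NOT the authors' code) of an embedding layer extended with
# trainable rows for out-of-vocabulary (OOV) words, while the original
# pre-trained rows stay frozen. All names and sizes here are illustrative.
import torch
import torch.nn as nn


class ExtendedEmbedding(nn.Module):
    """Frozen pre-trained embedding table plus a small trainable OOV table."""

    def __init__(self, pretrained_weight: torch.Tensor, num_oov: int):
        super().__init__()
        vocab_size, hidden = pretrained_weight.shape
        # Rows copied from the pre-trained model; frozen during adaptation.
        self.base = nn.Embedding.from_pretrained(pretrained_weight, freeze=True)
        # One trainable row per OOV word added to the vocabulary.
        self.extra = nn.Embedding(num_oov, hidden)
        self.vocab_size = vocab_size

    def forward(self, input_ids: torch.Tensor) -> torch.Tensor:
        # Ids below vocab_size use the frozen table; ids at or above it are
        # shifted down and looked up in the trainable OOV table.
        is_oov = input_ids >= self.vocab_size
        zeros = torch.zeros_like(input_ids)
        base_out = self.base(torch.where(is_oov, zeros, input_ids))
        extra_out = self.extra(torch.where(is_oov, input_ids - self.vocab_size, zeros))
        return torch.where(is_oov.unsqueeze(-1), extra_out, base_out)


if __name__ == "__main__":
    # Toy check: a 30k-row "pre-trained" table plus 100 OOV rows.
    emb = ExtendedEmbedding(torch.randn(30000, 768), num_oov=100)
    ids = torch.tensor([[5, 42, 30050]])  # the last id falls in the OOV range
    print(emb(ids).shape)  # torch.Size([1, 3, 768])
```

Keeping the original rows frozen is what would make such an adaptation cheap: only the small OOV table receives gradients during the task-level pre-training step the abstract describes, which is consistent with its claim of high efficiency and low computational cost.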