Mesbahi, Y. E., Mahmud, A., Ghaddar, A., Rezagholizadeh, M., Langlais, P., & Parthasarathi, P. On the utility of enhancing BERT syntactic bias with Token Reordering Pretraining.
Abstract
Self-supervised Language Modelling (LM) objectives, such as BERT's masked LM (MLM), have become the default choice for pretraining language models. TOken Reordering (TOR) pretraining objectives, which go beyond token prediction, have not yet been extensively studied. In this work, we explore the challenges that underlie the development of such objectives and their usefulness on downstream language tasks. In particular, we design a novel TOR pretraining objective that predicts whether two tokens are adjacent given a partial bag-of-tokens input. In addition, we investigate the usefulness of a Graph Isomorphism Network (GIN), placed on top of the BERT encoder, in enhancing the model's ability to leverage topological signal from the encoded representations. We compare the language understanding abilities of TOR with those of MLM on word-order-sensitive (e.g., dependency parsing) and word-order-insensitive (e.g., text classification) tasks in both full-training and few-shot settings. Our results indicate that TOR is competitive with MLM on the GLUE language understanding benchmark and slightly superior on syntax-dependent datasets, especially in the few-shot setting.
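To make the two components concrete, the sketch below illustrates a pairwise-adjacency TOR head combined with a GIN-style layer on top of an encoder. It is a minimal, hypothetical implementation, not the paper's code: the BERT encoder is stood in for by a small `nn.TransformerEncoder` without positional embeddings (to mimic the bag-of-tokens input), the GIN update is applied over a fully connected token graph (the paper's graph construction is not specified in the abstract), and all module names and hyperparameters are illustrative.

```python
import torch
import torch.nn as nn


class GINLayer(nn.Module):
    """One GIN-style update on a fully connected token graph (self-loops kept
    for simplicity): h_i' = MLP((1 + eps) * h_i + sum_j h_j)."""
    def __init__(self, dim: int):
        super().__init__()
        self.eps = nn.Parameter(torch.zeros(1))
        self.mlp = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))

    def forward(self, h: torch.Tensor) -> torch.Tensor:   # h: (batch, seq, dim)
        neighbor_sum = h.sum(dim=1, keepdim=True)          # aggregate over all tokens
        return self.mlp((1 + self.eps) * h + neighbor_sum)


class TORModel(nn.Module):
    """Encoder + GIN layer + head that scores whether token j follows token i."""
    def __init__(self, vocab_size: int = 30522, dim: int = 128):
        super().__init__()
        # No positional embeddings: the model only sees a bag of tokens.
        self.embed = nn.Embedding(vocab_size, dim)
        layer = nn.TransformerEncoderLayer(dim, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)  # stand-in for BERT
        self.gin = GINLayer(dim)
        self.pair_scorer = nn.Sequential(                  # scores concat(h_i, h_j)
            nn.Linear(2 * dim, dim), nn.ReLU(), nn.Linear(dim, 1))

    def forward(self, token_ids: torch.Tensor, pairs: torch.Tensor) -> torch.Tensor:
        """token_ids: (batch, seq); pairs: (batch, num_pairs, 2) with (i, j) indices.
        Returns adjacency logits of shape (batch, num_pairs)."""
        h = self.gin(self.encoder(self.embed(token_ids)))
        b = torch.arange(h.size(0)).unsqueeze(1)
        h_i, h_j = h[b, pairs[..., 0]], h[b, pairs[..., 1]]
        return self.pair_scorer(torch.cat([h_i, h_j], dim=-1)).squeeze(-1)


# Toy usage: score candidate token pairs for a shuffled input;
# train against binary adjacency labels with nn.BCEWithLogitsLoss.
model = TORModel()
token_ids = torch.randint(0, 30522, (2, 8))
pairs = torch.tensor([[[0, 1], [2, 5]], [[3, 4], [6, 7]]])
logits = model(token_ids, pairs)
print(logits.shape)                                        # torch.Size([2, 2])
```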