Mesbahi, Y. E., Mahmud, A., Ghaddar, A., Rezagholizadeh, M., Langlais, P., & Parthasarathi, P. On the utility of enhancing BERT syntactic bias with Token Reordering Pretraining.
Abstract
Self-supervised Language Modelling (LM) objectives, such as BERT's masked LM (MLM), have become the default choice for pretraining language models. TOken Reordering (TOR) pretraining objectives, which go beyond token prediction, have not yet been extensively studied. In this work, we explore the challenges that underlie the development of such objectives and their usefulness on downstream language tasks. In particular, we design a novel TOR pretraining objective that predicts whether two tokens are adjacent given a partial bag-of-tokens input. In addition, we investigate the usefulness of a Graph Isomorphism Network (GIN), placed on top of the BERT encoder, in enhancing the model's ability to leverage topological signal from the encoded representations. We compare the language understanding abilities of TOR with those of MLM on word-order-sensitive (e.g., dependency parsing) and word-order-insensitive (e.g., text classification) tasks in both full-training and few-shot settings. Our results indicate that TOR is competitive with MLM on the GLUE language understanding benchmark and slightly superior on syntax-dependent datasets, especially in the few-shot setting.
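To make the two components concrete, the sketch below illustrates a pairwise-adjacency TOR head combined with a GIN-style layer on top of an encoder. It is a minimal, hypothetical implementation, not the paper's code: the BERT encoder is stood in for by a small `nn.TransformerEncoder` without positional embeddings (to mimic the bag-of-tokens input), the GIN update is applied over a fully connected token graph (the paper's graph construction is not specified in the abstract), and all module names and hyperparameters are illustrative.

```python
import torch
import torch.nn as nn


class GINLayer(nn.Module):
    """One GIN-style update on a fully connected token graph (self-loops kept
    for simplicity): h_i' = MLP((1 + eps) * h_i + sum_j h_j)."""
    def __init__(self, dim: int):
        super().__init__()
        self.eps = nn.Parameter(torch.zeros(1))
        self.mlp = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))

    def forward(self, h: torch.Tensor) -> torch.Tensor:   # h: (batch, seq, dim)
        neighbor_sum = h.sum(dim=1, keepdim=True)          # aggregate over all tokens
        return self.mlp((1 + self.eps) * h + neighbor_sum)


class TORModel(nn.Module):
    """Encoder + GIN layer + head that scores whether token j follows token i."""
    def __init__(self, vocab_size: int = 30522, dim: int = 128):
        super().__init__()
        # No positional embeddings: the model only sees a bag of tokens.
        self.embed = nn.Embedding(vocab_size, dim)
        layer = nn.TransformerEncoderLayer(dim, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)  # stand-in for BERT
        self.gin = GINLayer(dim)
        self.pair_scorer = nn.Sequential(                  # scores concat(h_i, h_j)
            nn.Linear(2 * dim, dim), nn.ReLU(), nn.Linear(dim, 1))

    def forward(self, token_ids: torch.Tensor, pairs: torch.Tensor) -> torch.Tensor:
        """token_ids: (batch, seq); pairs: (batch, num_pairs, 2) with (i, j) indices.
        Returns adjacency logits of shape (batch, num_pairs)."""
        h = self.gin(self.encoder(self.embed(token_ids)))
        b = torch.arange(h.size(0)).unsqueeze(1)
        h_i, h_j = h[b, pairs[..., 0]], h[b, pairs[..., 1]]
        return self.pair_scorer(torch.cat([h_i, h_j], dim=-1)).squeeze(-1)


# Toy usage: score candidate token pairs for a shuffled input;
# train against binary adjacency labels with nn.BCEWithLogitsLoss.
model = TORModel()
token_ids = torch.randint(0, 30522, (2, 8))
pairs = torch.tensor([[[0, 1], [2, 5]], [[3, 4], [6, 7]]])
logits = model(token_ids, pairs)
print(logits.shape)                                        # torch.Size([2, 2])
```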