Jayasinghe, S., Rambukkanage, L., Silva, A., & Silva. Learning Sentence Embeddings In The Legal Domain with Low Resource Settings.
Abstract
As Natural Language Processing evolves rapidly, it is increasingly used to analyze large domain-specific text corpora. Applying Natural Language Processing in a domain with uncommon vocabulary and unique semantics requires techniques designed specifically for that domain, and the legal domain is such an area, with its own vocabulary and semantic interpretations. In this paper we conduct research to develop sentence embeddings specifically for the legal domain, to address these domain needs. We carry out this research under two approaches. First, exploiting the availability of a large corpus of raw court case documents, an Auto-Encoder model that reconstructs the input sentence is trained in a self-supervised manner; word embeddings pre-trained on general corpora and word embeddings trained specifically on legal corpora are incorporated within the Auto-Encoder. Second, we design a multitask model with noise discrimination and Semantic Textual Similarity tasks. We expect these embeddings and the gained insights to help vectorize legal domain corpora, enabling further application of Machine Learning in the legal domain.
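As a rough illustration of the first approach described above (not the authors' implementation), the sketch below shows a self-supervised LSTM auto-encoder that reconstructs its input sentence and exposes the encoder's final hidden state as the sentence embedding; the names `SentenceAutoEncoder`, `pretrained_vectors`, and the dimension choices are illustrative assumptions.

```python
# Minimal sketch, assuming a PyTorch setup: an auto-encoder trained to reconstruct
# the input sentence, with the encoder's final hidden state used as the sentence
# embedding. Pre-trained word vectors (general or legal-domain) may initialise the
# embedding layer, as the abstract describes.
import torch
import torch.nn as nn


class SentenceAutoEncoder(nn.Module):
    def __init__(self, vocab_size, emb_dim=300, hidden_dim=512, pretrained_vectors=None):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, emb_dim, padding_idx=0)
        if pretrained_vectors is not None:
            # Initialise from word embeddings pre-trained on general or legal corpora.
            self.embedding.weight.data.copy_(pretrained_vectors)
        self.encoder = nn.LSTM(emb_dim, hidden_dim, batch_first=True)
        self.decoder = nn.LSTM(emb_dim, hidden_dim, batch_first=True)
        self.output = nn.Linear(hidden_dim, vocab_size)

    def encode(self, token_ids):
        # The encoder's final hidden state serves as the sentence embedding.
        _, (h_n, _) = self.encoder(self.embedding(token_ids))
        return h_n[-1]                                  # (batch, hidden_dim)

    def forward(self, token_ids):
        sent_emb = self.encode(token_ids)
        # Condition the decoder on the sentence embedding and reconstruct the
        # original token sequence (the self-supervised objective).
        h0 = sent_emb.unsqueeze(0)
        c0 = torch.zeros_like(h0)
        dec_out, _ = self.decoder(self.embedding(token_ids), (h0, c0))
        return self.output(dec_out)                     # (batch, seq_len, vocab_size)


# Example training step: reconstruction loss on a stand-in batch of tokenised sentences.
model = SentenceAutoEncoder(vocab_size=30000)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
tokens = torch.randint(1, 30000, (8, 20))
logits = model(tokens)
loss = nn.functional.cross_entropy(logits.view(-1, 30000), tokens.view(-1))
loss.backward()
optimizer.step()
```

The same encoder could, in principle, be shared by the second approach, with separate heads for the noise-discrimination and Semantic Textual Similarity objectives; that multitask wiring is not shown here.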