Phan, D., & Le, H. T. Utilize Pre-Trained PhoBERT to Compute Text Similarity and Rerank Documents for Question-Answering Task.
Abstract
Open-domain Question Answering (QA) is a crucial task in natural language processing. QA systems typically follow two main steps: (i) identifying relevant passages and (ii) generating answer sentences from these passages. Of the two, identifying relevant passages is the more challenging and leaves more room for improvement. In this paper, we introduce two novel strategies to improve the performance of this step: (i) a new method for computing the similarity between questions and text passages, and (ii) the integration of pretrained and fine-tuned models. Empirical evaluations on the Zalo 2022 dataset demonstrate the efficacy of the proposed methods, with a 10% increase in recall compared to using the BM25 method alone and a 6% increase in recall compared to relying solely on a fine-tuned cross-encoder model.
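The abstract does not spell out the similarity measure or how the pretrained and fine-tuned models are integrated, so the following is only a minimal sketch of a generic retrieve-then-rerank pipeline evaluated with recall, not the authors' method. It assumes BM25 as the first-stage retriever and mixes the normalized BM25 score with a second similarity score. The names `similarity`, `alpha`, `top_n`, and the toy `jaccard` function are hypothetical; in practice `similarity` would be replaced by a score from a PhoBERT-based model.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Okapi BM25 score of each tokenized document against a tokenized query."""
    n = len(docs)
    avgdl = sum(len(d) for d in docs) / n
    # Document frequency: number of documents containing each term.
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        s = 0.0
        for term in query:
            if term not in tf:
                continue
            idf = math.log((n - df[term] + 0.5) / (df[term] + 0.5) + 1.0)
            norm = tf[term] + k1 * (1 - b + b * len(d) / avgdl)
            s += idf * tf[term] * (k1 + 1) / norm
        scores.append(s)
    return scores


def min_max(xs):
    """Rescale scores to [0, 1] so different scorers can be mixed."""
    lo, hi = min(xs), max(xs)
    return [0.0 if hi == lo else (x - lo) / (hi - lo) for x in xs]


def rerank(query, docs, similarity, top_n=3, alpha=0.5):
    """Keep the top_n BM25 candidates, then reorder them by a weighted mix.

    `similarity` is a placeholder (assumption) for a model-based scorer.
    """
    lexical = bm25_scores(query, docs)
    candidates = sorted(range(len(docs)), key=lambda i: -lexical[i])[:top_n]
    lex = min_max([lexical[i] for i in candidates])
    sem = min_max([similarity(query, docs[i]) for i in candidates])
    mixed = {i: alpha * l + (1 - alpha) * s
             for i, l, s in zip(candidates, lex, sem)}
    return sorted(candidates, key=lambda i: -mixed[i])


def recall_at_k(ranked, relevant, k):
    """Fraction of relevant passage ids found in the top k of the ranking."""
    return len(set(ranked[:k]) & set(relevant)) / len(relevant)


def jaccard(q, d):
    """Toy token-overlap similarity, standing in for a neural scorer."""
    a, b = set(q), set(d)
    return len(a & b) / len(a | b)


# Toy corpus (illustrative only, not from the Zalo 2022 dataset).
docs = [
    "hanoi is the capital of vietnam".split(),
    "pho is a vietnamese noodle soup".split(),
    "the capital of france is paris".split(),
    "bm25 is a ranking function".split(),
]
query = "capital of vietnam".split()
ranked = rerank(query, docs, jaccard, top_n=3, alpha=0.5)
print(ranked, recall_at_k(ranked, relevant=[0], k=1))
```

Min-max normalization is used here because BM25 scores are unbounded while most similarity scores lie in a fixed range; other fusion schemes, such as rank-based fusion, would be equally plausible choices.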