Rath, M., Banerjee, S., & Swain, T. Fine Tuning Auto Regressive LLMs for Long Document Abstractive Summarization.
Abstract
Generating a short summary from a long document is a challenging task, for which new language models are still being designed and trained on the available data. Because deep learning models underpin NLP and NLG applications, training them demands high computational power. Moreover, fine-tuning the weights for a given context is an important task that requires additional computation space and time. In this paper, we use Cerebras’ wafer-scale cluster, which aims to provide an efficient software and hardware infrastructure that enhances the capabilities of pre-existing models and empowers them to handle lengthy documents well. In addition to analyzing common models along with their pros and cons, we also analyze factors such as context length and model size so as to accommodate documents that are as long as possible.