Semantic Embeddings for Arabic Retrieval Augmented Generation (ARAG)

Abdelazim, Hazem; Tharwat, Mohamed; Mohamed, Ammar

Abdelazim, H., Tharwat, M., & Mohamed, A. Semantic Embeddings for Arabic Retrieval Augmented Generation (ARAG).

Resource type: Journal Article
BibTeX citation key: anona

Categories: General
Creators: Abdelazim, Mohamed, Tharwat

URLs https://www.semant ... bb916c860f77bc68b6

Abstract

In recent times, Retrieval Augmented Generation (RAG) models have garnered considerable attention, primarily due to the impressive capabilities exhibited by Large Language Models (LLMs). Nevertheless, the Arabic language, despite its significance and widespread use, has received relatively little research attention in this field. A critical element of RAG systems is the Information Retrieval component, and at its core lies the vector embedding process commonly referred to as “semantic embedding”. This study examines an array of multilingual semantic embedding models, with the aim of enhancing the model’s ability to comprehend and generate Arabic text effectively. We conducted an extensive evaluation of ten cutting-edge multilingual semantic embedding models, using the publicly available ARCD dataset as a benchmark and the average Recall@k metric as the measure of performance. The results showed that the Microsoft E5 sentence embedding model outperformed all other models on the ARCD dataset, with Recall@10 exceeding 90%.
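
For readers unfamiliar with the metric, the following is a minimal sketch of how average Recall@k can be computed for an embedding-based retriever. It is not the authors' evaluation code. It assumes that each question has exactly one gold passage, that questions and passages have already been embedded by some model, and that ranking uses cosine similarity; the function name `recall_at_k` and its parameters are hypothetical.

```python
import numpy as np

def recall_at_k(query_vecs, passage_vecs, gold_idx, k):
    """Fraction of queries whose gold passage appears in the top-k results.

    Assumptions (not taken from the paper): one gold passage per query,
    cosine similarity as the ranking score.
    """
    query_vecs = np.asarray(query_vecs, dtype=float)
    passage_vecs = np.asarray(passage_vecs, dtype=float)
    # L2-normalise so that the dot product equals cosine similarity.
    q = query_vecs / np.linalg.norm(query_vecs, axis=1, keepdims=True)
    p = passage_vecs / np.linalg.norm(passage_vecs, axis=1, keepdims=True)
    scores = q @ p.T                                # (n_queries, n_passages)
    # Indices of the k highest-scoring passages for each query.
    top_k = np.argsort(-scores, axis=1)[:, :k]
    # A query is a hit if its gold passage index is among its top-k indices.
    hits = (top_k == np.asarray(gold_idx)[:, None]).any(axis=1)
    return float(hits.mean())
```

With real data, `query_vecs` and `passage_vecs` would hold the embeddings of the ARCD questions and paragraphs produced by the model under test, and `gold_idx[i]` would be the index of the paragraph that answers question `i`.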
