Add new Arabic benchmarks (5) and enhance existing tasks (#372)
* Update arabic_evals.py
Add new Arabic benchmarks and update existing tasks
- Renamed `arabic_mmlu` to `arabic_mmlu_mt` to highlight its machine-translated origin.
- Added new benchmarks: `arabic_mmlu` (ArabicMMLU, https://arxiv.org/abs/2402.12840), `arabic_mmlu_ht` (human-translated), and `MadinahQA` from MBZUAI. Also added `arabic_mmmlu` (OpenAI MMMLU) and `AraTrust`, a trustworthiness benchmark for Arabic LLMs (https://arxiv.org/abs/2403.09017).
- Enhanced prompt functions for better flexibility in answer options.
* Update and rename OALL_tasks.txt to OALL_v1_tasks.txt
Rename file to reflect that it contains the v1 leaderboard tasks
* Create OALL_v2_tasks.txt
Tasks for v2 of OALL
* Update all_arabic_tasks.txt
Add new and renamed tasks
* Update arabic_evals.py
Fix formatting issues for
* Update all_arabic_tasks.txt
Add missing task: OpenAI's MMMLU Arabic subset
* Update all_arabic_tasks.txt
Correct order
* Update arabic_evals.py
Remove the OpenAI MMMLU task following the discussion in #372
* Update all_arabic_tasks.txt
Remove the OpenAI MMMLU task following the discussion in #372
* Update tasks.py
Add a templated version of Arabic MMLU at @hynky1999's request in PR #372
* Update tasks.py
Remove `arabic_mmlu_templated_tasks`
---------
Co-authored-by: Clémentine Fourrier <[email protected]>
Co-authored-by: Nathan Habib <[email protected]>