👉🏻 OmniGIRL 👈🏻

A GitHub issue resolution benchmark with multi-aspect diversity in programming languages, repository domains, and modality of input information. (ISSTA'25)

🌐 Website • 🤗 Hugging Face • 🐋 Env Docker Image • 📃 arXiv Paper • 📓 ISSTA 2025

✨ Key Features

  • 🚀 Convenient, Standardized Evaluation Environment

    Provides pre-built Docker images that significantly simplify environment setup and guarantee consistent, reproducible evaluations.

  • 🕸 Extensive Programming Language Coverage

    Supports Python, Java, JavaScript, and TypeScript, enabling evaluation across four major programming language ecosystems.

  • 🗂️ Rich Multimodal Input Data

    Integrates diverse input modalities (text, web content, and images), requiring evaluated models to understand and combine information from all sources to resolve issues effectively.

  • 🛠️ Automatic Environment Setup & Dataset Construction Tool

    We introduce SWE-Factory, an automatic issue-resolution benchmark construction pipeline based on a multi-agent framework. For more information and the full source code, visit: SWE-Factory.


📦 Environment Setup

To get started, run the bash script below to set up the environment:

bash setup.sh
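
Because the harness runs evaluations inside the pre-built Docker images mentioned above, it can save time to confirm that Docker is installed and its daemon is reachable before running the setup script. A minimal sketch, assuming the Docker CLI is on your PATH and setup.sh sits in the repository root:

import shutil
import subprocess

# The OmniGIRL harness evaluates patches inside pre-built Docker images,
# so the Docker CLI and daemon must be available on this machine.
if shutil.which("docker") is None:
    raise SystemExit("Docker CLI not found; please install Docker first.")

# `docker info` exits non-zero if the daemon is not running.
subprocess.run(["docker", "info"], check=True, capture_output=True)

# Then run the provided setup script from the repository root.
subprocess.run(["bash", "setup.sh"], check=True)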

🚀 Running Evaluations

After setting up the environment, do the following to run an evaluation:

  1. Prepare a prediction file: predicted patches in JSONL format, with each entry containing the fields below (an end-to-end sketch follows this list):

    • model_name_or_path: model name
    • instance_id: task instance ID
    • model_patch: predicted patch content

    Example:

    {
        "model_name_or_path": "agentless-v1",
        "instance_id": "prettier__prettier-12260",
        "model_patch": "diff --git ...."
    }
  2. Change into omnigirl/harness, then run the evaluation with the following command:

    # run from the harness directory (required)
    cd omnigirl/harness
    
    python run_evaluation.py --predictions_path <path to your prediction file> \
                             --max_workers <number of workers> \
                             --run_id <unique identifier for this evaluation run>
  3. By default, your evaluation results will be generated in omnigirl/harness/reports.

  4. For a detailed tutorial on evaluation, please refer to the omnigirl/harness directory.

  5. We recommend running evaluations on machines with the amd64 architecture, consistent with the evaluation environment used in the paper.
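
Putting steps 1–3 together, here is a minimal end-to-end sketch in Python. The field names (model_name_or_path, instance_id, model_patch), the CLI flags, and the report location follow the steps above; the patch content, output path, worker count, and run id are placeholder values for illustration.

import json
import subprocess
from pathlib import Path

# Step 1: write predictions as JSONL, one JSON object per line.
# The patch below is a placeholder; your own system produces the real diffs.
predictions = [
    {
        "model_name_or_path": "agentless-v1",
        "instance_id": "prettier__prettier-12260",
        "model_patch": "diff --git a/src/example.js b/src/example.js\n...",
    },
]

predictions_path = Path("predictions.jsonl").resolve()
with predictions_path.open("w") as f:
    for item in predictions:
        f.write(json.dumps(item) + "\n")

# Step 2: run the harness from omnigirl/harness.
subprocess.run(
    [
        "python", "run_evaluation.py",
        "--predictions_path", str(predictions_path),
        "--max_workers", "4",
        "--run_id", "example-run",
    ],
    cwd="omnigirl/harness",
    check=True,
)

# Step 3: by default, reports are written to omnigirl/harness/reports.

Launching the harness through subprocess is only a convenience here; invoking run_evaluation.py directly from the shell, as shown in step 2 of the list above, works the same way.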

📖 Citation

If you find OmniGIRL useful for your research and applications, feel free to give us a star ⭐ or cite us using:

@inproceedings{guo2025omnigirl,
  title={OmniGIRL: A Multilingual and Multimodal Benchmark for GitHub Issue Resolution},
  author={Guo, Lianghong and Tao, Wei and Jiang, Runhan and Wang, Yanlin and Chen, Jiachi and Liu, Xilin and Ma, Yuchi and Mao, Mingzhi and Zhang, Hongyu and Zheng, Zibin},
  booktitle={Proceedings of the 34th ACM SIGSOFT International Symposium on Software Testing and Analysis},
  year={2025},
  publisher={{ACM}},
}

🙏 Acknowledgements

  • We build on prior work — SWE-bench, Agentless, and AutoCodeRover — which laid the groundwork for this study.
  • We thank the EvalPlus leaderboard team for releasing the elegant page template that inspired this site.
  • Finally, we are grateful to the open-source developer community for their invaluable contributions.
