This repository contains code to evaluate submissions to the SciFact leaderboard, hosted at https://leaderboard.allenai.org. SciFact data and modeling code can be found at https://github.com/allenai/scifact. Descriptions of the files and directories follow.
- `evaluator/`: Contains evaluation code and environment.
  - `eval.py`: Evaluation script to be invoked by the leaderboard. In all leaderboard code, it is invoked with the `--verbose` flag, which reports P, R, and F1 (instead of just F1). An illustrative invocation follows this list.
  - `Dockerfile`: Specifies the Docker environment to be used when running `eval.py`.
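A minimal sketch of how the evaluator might be invoked from the repository root is shown below. The argument names `--labels_file` and `--preds_file` are assumptions based on the fixture descriptions in this README; check `eval.py` for the exact interface.

```bash
# Illustrative invocation (argument names are assumptions; see eval.py for the actual interface).
# --verbose reports precision, recall, and F1 rather than just F1.
python evaluator/eval.py \
    --labels_file gold.jsonl \
    --preds_file predictions.jsonl \
    --verbose
```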
- `fixture/`: Contains test fixtures.
  - `predictions_dummy.jsonl`: "Dummy" prediction file for all 300 (hidden) test instances that can be submitted to the leaderboard as a test. This submission should not be publicly displayed on the leaderboard.
  - `expected_metrics_dummy.json`: Metrics for `predictions_dummy.jsonl`.
  - `gold_small.jsonl`: Gold labels for the first 10 dev set instances.
  - `predictions_small.jsonl`: VeriSci predictions on the first 10 dev set instances. To be used as a test to confirm the correctness of the evaluation code.
  - `expected_metrics_small.json`: Expected results of running `python evaluator/eval.py`, using `gold_small.jsonl` as the `labels_file` and `predictions_small.jsonl` as the `preds_file` (see the sanity-check sketch after this list).
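As a sanity check, the small fixtures can be evaluated and the result compared against `expected_metrics_small.json`. The sketch below assumes the same argument names as above and that `eval.py` prints its metrics as JSON to stdout; the script may instead take an output-file argument.

```bash
# Run the evaluator on the 10-instance dev fixtures
# (argument names and stdout output are assumptions).
python evaluator/eval.py \
    --labels_file fixture/gold_small.jsonl \
    --preds_file fixture/predictions_small.jsonl \
    --verbose > metrics_small.json

# The computed metrics should match the checked-in expected values.
diff <(python -m json.tool metrics_small.json) \
     <(python -m json.tool fixture/expected_metrics_small.json)
```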
- `test.sh`: Test that checks the correctness of the evaluator on `predictions_small.jsonl`.
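Assuming the script takes no arguments, the check can be run from the repository root:

```bash
bash test.sh
```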