Home
This project reproduces Lucene-based information retrieval methods for code search. The methods focus on query expansion and on modifying the similarity calculation to improve conventional code search performance.
Methods include:
Tested in Ubuntu 16.04
- Lucene Core 7.4.0
- main/java/CS/evalution/: Definitions and implementations of the evaluation metrics, including MAP, MRR, NDCG, Precision, Recall, F-score, and First Position, among others.
- main/java/CS/methods/: Implementations of the information retrieval methods above, together with the prerequisites each of them requires.
- main/java/CS/model/: Data models needed for serialization, plus other low-level supporting models.
- main/java/CS/similarity/: Modifications to the similarity computation used in the comparison process.
- main/java/CS/Util/: Utilities that preprocess the data, including static configuration, loading of the training and test datasets, evaluation, and dataset format conversion.
- test/java/CS/index/: Implementations of the indexing process.
- test/java/CS/search/: Implementations of the search and evaluation process.
- test/java/CS/Evaluate: Implementations of the evaluation-only process.
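The metrics named for the evaluation module can be illustrated for a single query with a minimal sketch. This is not the project's own implementation: the class name `RankingMetrics`, the method names, and the use of string IDs for results are all assumptions made for illustration, and "First Position" is interpreted here as the 1-based rank of the first relevant result. MRR and MAP are then the means of the reciprocal rank and the average precision over all queries.

```java
import java.util.List;
import java.util.Set;

/** Hypothetical per-query ranking metrics; assumes the ranked list has no duplicates. */
final class RankingMetrics {

    /** Number of relevant results among the first k ranked results. */
    static int hitsAtK(List<String> ranked, Set<String> relevant, int k) {
        int limit = Math.min(k, ranked.size());
        int hits = 0;
        for (int i = 0; i < limit; i++) {
            if (relevant.contains(ranked.get(i))) hits++;
        }
        return hits;
    }

    /** Precision@k: relevant results in the top k, divided by k. */
    static double precisionAtK(List<String> ranked, Set<String> relevant, int k) {
        return k <= 0 ? 0.0 : (double) hitsAtK(ranked, relevant, k) / k;
    }

    /** Recall@k: relevant results in the top k, divided by all relevant results. */
    static double recallAtK(List<String> ranked, Set<String> relevant, int k) {
        return relevant.isEmpty() ? 0.0 : (double) hitsAtK(ranked, relevant, k) / relevant.size();
    }

    /** F1 score: harmonic mean of precision and recall. */
    static double f1(double precision, double recall) {
        double sum = precision + recall;
        return sum == 0.0 ? 0.0 : 2.0 * precision * recall / sum;
    }

    /** 1-based rank of the first relevant result, or -1 if none was retrieved. */
    static int firstPosition(List<String> ranked, Set<String> relevant) {
        for (int i = 0; i < ranked.size(); i++) {
            if (relevant.contains(ranked.get(i))) return i + 1;
        }
        return -1;
    }

    /** Reciprocal rank: 1 / firstPosition, or 0 if no relevant result was retrieved. */
    static double reciprocalRank(List<String> ranked, Set<String> relevant) {
        int pos = firstPosition(ranked, relevant);
        return pos < 0 ? 0.0 : 1.0 / pos;
    }

    /** Average precision: sum of precision at each relevant rank, divided by all relevant results. */
    static double averagePrecision(List<String> ranked, Set<String> relevant) {
        if (relevant.isEmpty()) return 0.0;
        int hits = 0;
        double sum = 0.0;
        for (int i = 0; i < ranked.size(); i++) {
            if (relevant.contains(ranked.get(i))) {
                hits++;
                sum += (double) hits / (i + 1);
            }
        }
        return sum / relevant.size();
    }
}
```

Calling `RankingMetrics.reciprocalRank(ranked, relevant)` for each query and averaging the results gives MRR; averaging `averagePrecision` in the same way gives MAP.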
To build the corresponding indexes and run the search process, first download the Cosbench dataset.
Edit hyper-parameters and settings in java/CS/Util/ConfigUtil.java
All search and evaluation steps are run programmatically rather than through a command-line interface. Edit the corresponding files in test/java/CS/index and test/java/CS/search.
For convenience of evaluation, we provide the Top-20 search results returned by the 6 code search methods in the test/resources/searchResult folder. You can obtain the evaluation results in the test/resources/evaluateResult folder by modifying the test/java/CS/Evaluate/EvaluateResult file.
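As an illustration of scoring a saved Top-20 list, the following is a minimal sketch of NDCG@k. It is not taken from the project: the class name `NdcgSketch` and the array-based inputs are assumptions, and the format of the files in the searchResult folder is not modelled here. It uses the common formulation in which the gain is the relevance grade and the discount is log2(rank + 1); other formulations (for example, a gain of 2^rel - 1) give different values.

```java
import java.util.Arrays;

/** Hypothetical NDCG@k computation for one query. */
final class NdcgSketch {

    /** DCG@k over gains given in rank order: sum of gain / log2(rank + 1). */
    static double dcg(int[] gains, int k) {
        double sum = 0.0;
        int limit = Math.min(k, gains.length);
        for (int i = 0; i < limit; i++) {
            // Rank is i + 1, so the discount is log2(i + 2).
            sum += gains[i] / (Math.log(i + 2) / Math.log(2));
        }
        return sum;
    }

    /**
     * NDCG@k: DCG of the returned ranking divided by the DCG of the ideal ranking.
     * rankedGains holds the relevance grade of each returned result, in rank order.
     * judgedGains holds the relevance grades of all judged results for the query.
     */
    static double ndcgAtK(int[] rankedGains, int[] judgedGains, int k) {
        int[] ideal = judgedGains.clone();
        Arrays.sort(ideal); // ascending order
        // Reverse in place to get the ideal (descending) order.
        for (int i = 0, j = ideal.length - 1; i < j; i++, j--) {
            int tmp = ideal[i];
            ideal[i] = ideal[j];
            ideal[j] = tmp;
        }
        double idcg = dcg(ideal, k);
        return idcg == 0.0 ? 0.0 : dcg(rankedGains, k) / idcg;
    }
}
```

For the provided lists, `k` would be 20, with `rankedGains` built by looking up each of the 20 returned results in the ground truth for that query.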
Note that the Top-20 search results you obtain from a search method may differ from ours because of various factors, such as different versions of the libraries used or of the external dataset. We provide only the Top-20 search results used in our experiments.