Home

This project reproduces Lucene-based information retrieval methods for code search. The methods focus on query expansion and on modifying the similarity calculation to improve conventional code search performance.
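As a rough illustration of what query expansion means here, the sketch below adds related terms to a query before it is handed to a search engine. It is not taken from the project's code: the class name `QueryExpansionSketch`, the method `expand`, and the `relatedTerms` lookup table are all hypothetical, and the reproduced methods may derive their expansion terms in quite different ways.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

/** Hypothetical illustration of query expansion; not the project's implementation. */
final class QueryExpansionSketch {

    /**
     * Returns the lower-cased query terms followed by any related terms found in
     * the (assumed) lookup table, without duplicates and in a stable order.
     */
    static List<String> expand(String query, Map<String, List<String>> relatedTerms) {
        Set<String> expanded = new LinkedHashSet<>();
        String[] tokens = query.toLowerCase(Locale.ROOT).trim().split("\\s+");

        // Keep the original terms first so they stay at the front of the query.
        for (String token : tokens) {
            if (!token.isEmpty()) {
                expanded.add(token);
            }
        }
        // Append related terms for each original term, skipping duplicates.
        for (String token : tokens) {
            expanded.addAll(relatedTerms.getOrDefault(token, Collections.<String>emptyList()));
        }
        return new ArrayList<>(expanded);
    }
}
```

With a table that maps `read` to `load` and `parse`, the query `Read file` would expand to the terms `read`, `file`, `load`, `parse`, which could then be joined into a single search query.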

Methods include:

Dependency

Tested on Ubuntu 16.04

Lucene Core 7.4.0

Code Structures

main/java/CS/evalution/: This module contains the definitions and implementations of several evaluation metrics, including MAP, MRR, NDCG, Precision, Recall, F-score, and First Position, among others.
main/java/CS/methods/: This module contains the implementations of the information retrieval methods listed above, together with the prerequisites they require.
main/java/CS/model/: This module contains the data models needed for serialization and other low-level supporting models.
main/java/CS/similarity/: This module contains the modifications to the similarity calculation used in the comparison process.
main/java/CS/Util/: This module contains utilities for preprocessing the data, including static configuration, loading the training and test datasets, evaluation, and dataset format conversion.
test/java/CS/index/: This module contains the implementations of the indexing process.
test/java/CS/search/: This module contains the implementations of the search and evaluation process.
test/java/CS/Evaluate: This module contains the implementations of the evaluation-only process.
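For readers unfamiliar with the ranking metrics named above, here is a minimal sketch of how MRR and MAP are commonly computed from a ranked result list. It uses the textbook definitions and is not copied from the evaluation module; the class `RankingMetricsSketch` and its method names are hypothetical. It assumes results are identified by string IDs and that a ranked list contains no duplicates.

```java
import java.util.List;
import java.util.Set;

/** Hypothetical sketch of textbook MRR and MAP building blocks. */
final class RankingMetricsSketch {

    /** Reciprocal of the 1-based rank of the first relevant result, or 0.0 if none is relevant. */
    static double reciprocalRank(List<String> ranked, Set<String> relevant) {
        for (int i = 0; i < ranked.size(); i++) {
            if (relevant.contains(ranked.get(i))) {
                return 1.0 / (i + 1);
            }
        }
        return 0.0;
    }

    /**
     * Average precision for one query: the sum of precision@k over every rank k
     * that holds a relevant result, divided by the total number of relevant results.
     */
    static double averagePrecision(List<String> ranked, Set<String> relevant) {
        if (relevant.isEmpty()) {
            return 0.0;
        }
        int hits = 0;
        double sum = 0.0;
        for (int i = 0; i < ranked.size(); i++) {
            if (relevant.contains(ranked.get(i))) {
                hits++;
                sum += (double) hits / (i + 1);
            }
        }
        return sum / relevant.size();
    }

    /** Mean over per-query scores: yields MRR or MAP depending on the scores passed in. */
    static double mean(double[] perQueryScores) {
        if (perQueryScores.length == 0) {
            return 0.0;
        }
        double total = 0.0;
        for (double score : perQueryScores) {
            total += score;
        }
        return total / perQueryScores.length;
    }
}
```

MRR is then the `mean` of `reciprocalRank` over all queries, and MAP is the `mean` of `averagePrecision` over all queries.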

Usage

Data Preparation

To build the indexes and run the search process, first download the Cosbench dataset.

Configuration

Edit the hyper-parameters and settings in java/CS/Util/ConfigUtil.java.

Search and Evaluation

The whole search and evaluation process is driven programmatically.

Edit the corresponding files in test/java/CS/index and test/java/CS/search.

Evaluation Only

For convenient evaluation, we provide the Top-20 search answers returned by the 6 code search methods in the test/resources/searchResult folder. You can obtain the evaluation results in the test/resources/evaluateResult folder by modifying the test/java/CS/Evaluate/EvaluateResult file.

Note that, due to various factors (for example, different versions of the libraries used or of the external dataset), the Top-20 search answers produced by a search method may differ from ours. We provide only the Top-20 search answers used in our experiments.
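To make the evaluation of a fixed Top-K answer list more concrete, the following sketch computes Precision, Recall, and F-score at a cutoff K for one query. It is an illustration under assumptions, not the project's evaluation code: the class `TopKEvaluationSketch` and its methods are hypothetical, answers are assumed to be identified by string IDs, and precision is divided by K itself rather than by the number of answers actually returned. The format of the files in test/resources/searchResult is not described here, so loading them is left out.

```java
import java.util.List;
import java.util.Set;

/** Hypothetical sketch of set-based metrics over the first K ranked answers. */
final class TopKEvaluationSketch {

    /** Counts relevant answers among the first k entries of the ranked list. */
    static int hitsAtK(List<String> ranked, Set<String> relevant, int k) {
        int limit = Math.min(k, ranked.size());
        int hits = 0;
        for (int i = 0; i < limit; i++) {
            if (relevant.contains(ranked.get(i))) {
                hits++;
            }
        }
        return hits;
    }

    /** Fraction of the top-k slots filled with relevant answers (denominator is k). */
    static double precisionAtK(List<String> ranked, Set<String> relevant, int k) {
        return k <= 0 ? 0.0 : (double) hitsAtK(ranked, relevant, k) / k;
    }

    /** Fraction of all relevant answers that appear within the top k. */
    static double recallAtK(List<String> ranked, Set<String> relevant, int k) {
        return relevant.isEmpty() ? 0.0 : (double) hitsAtK(ranked, relevant, k) / relevant.size();
    }

    /** Harmonic mean of precision and recall at k (F1); 0.0 when both are zero. */
    static double fScoreAtK(List<String> ranked, Set<String> relevant, int k) {
        double p = precisionAtK(ranked, relevant, k);
        double r = recallAtK(ranked, relevant, k);
        return (p + r) == 0.0 ? 0.0 : 2.0 * p * r / (p + r);
    }
}
```

For the Top-20 answers described above, each method would be called with `k = 20` for every query, and the per-query values averaged across the query set.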
