
Base Lucene

BASE-LAB edited this page Feb 13, 2020 · 3 revisions

We take a basic Lucene implementation as the baseline. Its default BM25 similarity and inverted index perform well when query words co-occur frequently with the text of the documents, and multiple variations stem from this basic design.
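To make the baseline concrete, here is a minimal plain-Java sketch of an inverted index scored with one common form of BM25. It is not Lucene's implementation and not the code in this repository: the class name `Bm25Sketch`, the simple tokenizer, and the sample documents are assumptions made for illustration. The constants k1 = 1.2 and b = 0.75 are the usual defaults, which Lucene's `BM25Similarity` also uses.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy inverted index with BM25 scoring; illustrative only, not Lucene's implementation. */
final class Bm25Sketch {
    // Common BM25 defaults.
    static final double K1 = 1.2;
    static final double B = 0.75;

    // term -> (docId -> term frequency in that document)
    private final Map<String, Map<Integer, Integer>> postings = new HashMap<>();
    private final List<Integer> docLengths = new ArrayList<>();
    private long totalLength = 0;

    /** Lowercases and splits on non-alphanumeric characters (a stand-in for a real analyzer). */
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase().split("[^a-z0-9]+")) {
            if (!t.isEmpty()) tokens.add(t);
        }
        return tokens;
    }

    /** Adds a document to the index and returns its id. */
    int add(String text) {
        int docId = docLengths.size();
        List<String> tokens = tokenize(text);
        for (String t : tokens) {
            postings.computeIfAbsent(t, k -> new HashMap<>()).merge(docId, 1, Integer::sum);
        }
        docLengths.add(tokens.size());
        totalLength += tokens.size();
        return docId;
    }

    /** Returns docId -> BM25 score for every document sharing at least one term with the query. */
    Map<Integer, Double> score(String query) {
        Map<Integer, Double> scores = new HashMap<>();
        int n = docLengths.size();
        if (n == 0) return scores;
        double avgLen = (double) totalLength / n;
        for (String term : tokenize(query)) {
            Map<Integer, Integer> docs = postings.get(term);
            if (docs == null) continue; // term occurs in no document
            // Rare terms get a higher inverse document frequency.
            double idf = Math.log(1.0 + (n - docs.size() + 0.5) / (docs.size() + 0.5));
            for (Map.Entry<Integer, Integer> e : docs.entrySet()) {
                double tf = e.getValue();
                // Length normalization: long documents need more matches for the same score.
                double norm = K1 * (1.0 - B + B * docLengths.get(e.getKey()) / avgLen);
                scores.merge(e.getKey(), idf * tf * (K1 + 1.0) / (tf + norm), Double::sum);
            }
        }
        return scores;
    }

    public static void main(String[] args) {
        Bm25Sketch index = new Bm25Sketch();
        index.add("read a file line by line");
        index.add("open a socket connection");
        index.add("read file read file into a string buffer");
        System.out.println(index.score("read file"));
    }
}
```

Only documents that share a term with the query receive a score, which is why this baseline depends on word overlap between queries and document text.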

Code Structure

The relevant code consists of:

  • test/java/CS/index/baseIndex
  • test/java/CS/search/baseSearch

To learn how the base Lucene engine works, view this.

Usage

Data Preparation

  • Download the CosBench dataset.
  • Set Config.Util/codebaseOrigin to the path of the CosBench codebase.

Index

  • Set Config.Util/codebaseIndex to the output path for the codebase index.
  • Run test/java/CS/index/baseIndex.

Search

  • Run test/java/CS/search/baseSearch/search with a query.

Evaluation

  • Set Config.Util/QASet to the path of the CosBench QA set.
  • Set Config.Util/baseResult to the output path for the evaluation results.
  • Run test/java/CS/search/baseSearch/evaluation.
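This page does not say which metric the evaluation step reports. As a hedged illustration only, the sketch below computes mean reciprocal rank (MRR), a common choice for question-answer style benchmarks. The class name `EvalSketch`, the method names, and the use of string identifiers for results are all assumptions, not the repository's actual code.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical evaluation helper; the metric used by the real evaluation code may differ. */
final class EvalSketch {
    /** Returns 1/rank of the relevant item in the ranked list, or 0 if it is absent. */
    static double reciprocalRank(List<String> ranked, String relevant) {
        int idx = ranked.indexOf(relevant);
        return idx < 0 ? 0.0 : 1.0 / (idx + 1);
    }

    /** Averages the reciprocal rank over all queries; rankings.get(i) answers query i. */
    static double meanReciprocalRank(List<List<String>> rankings, List<String> answers) {
        if (rankings.size() != answers.size()) {
            throw new IllegalArgumentException("one ranking is required per answer");
        }
        if (rankings.isEmpty()) return 0.0;
        double sum = 0.0;
        for (int i = 0; i < rankings.size(); i++) {
            sum += reciprocalRank(rankings.get(i), answers.get(i));
        }
        return sum / rankings.size();
    }

    public static void main(String[] args) {
        // Made-up result ids: the first query finds its answer at rank 2, the second at rank 1.
        List<List<String>> rankings = Arrays.asList(
                Arrays.asList("a", "b"),
                Arrays.asList("c", "d"));
        List<String> answers = Arrays.asList("b", "c");
        System.out.println(meanReciprocalRank(rankings, answers));
    }
}
```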
