
Unnecessary float[] (BM25Scorer) allocations for non-scoring queries #12297

@jainankitk

Description

While looking into a customer issue, I noticed an increase in GC time going from Lucene 7.x to 8.x. In the JVM histograms, one of the primary differences was float[] allocations. I took a heap dump to check the dominator of those arrays, and it was BM25Scorer.

The change seems to have come in with 8fd7ead, which removed some of the special-case logic around the "non-scoring similarity" embedded in IndexSearcher (returned in the false case from the old IndexSearcher#getSimilarity(boolean needsScores)).
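The sketch below models the behavioral difference described above under explicit assumptions. Every type in it (`Similarity`, `SimScorer`, `Bm25Like`, `NON_SCORING`) is a hypothetical stand-in, not Lucene's API, and the 256-entry float cache per scorer is an assumption modeled on how Lucene 8.x's BM25Similarity precomputes length normalization. In the 7.x-style path a non-scoring request gets a similarity with no per-scorer state; in the 8.x-style path the configured BM25-like similarity is always used, so every weight pays for a float[256].

```java
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical model of similarity selection before and after the change; not Lucene code. */
public class NonScoringSimilaritySketch {

    /** Counts float[] caches allocated by the modeled BM25-like scorers. */
    static final AtomicLong CACHES_ALLOCATED = new AtomicLong();

    interface SimScorer {
        float score(float freq, int normByte);
    }

    interface Similarity {
        SimScorer scorer(float idf);
    }

    /** BM25-like stand-in: assumed to precompute a 256-entry normalization cache per scorer. */
    static final class Bm25Like implements Similarity {
        private static final float K1 = 1.2f;
        private static final float B = 0.75f;
        private static final float AVG_DOC_LENGTH = 10f;

        @Override
        public SimScorer scorer(float idf) {
            float[] cache = new float[256]; // one allocation per scorer
            CACHES_ALLOCATED.incrementAndGet();
            for (int i = 0; i < cache.length; i++) {
                // Simplified: the norm byte is used directly as the document length.
                cache[i] = K1 * ((1 - B) + B * i / AVG_DOC_LENGTH);
            }
            return (freq, normByte) -> idf * freq / (freq + cache[normByte & 0xFF]);
        }
    }

    /** Stand-in for the 7.x non-scoring similarity: allocates no per-scorer cache. */
    static final Similarity NON_SCORING = idf -> (freq, normByte) -> 0f;

    /** Models 7.x: a non-scoring similarity is returned when scores are not needed. */
    static Similarity similarity7x(Similarity configured, boolean needsScores) {
        return needsScores ? configured : NON_SCORING;
    }

    /** Models 8.x: the configured similarity is used regardless of needsScores. */
    static Similarity similarity8x(Similarity configured, boolean needsScores) {
        return configured;
    }

    /** Builds one scorer per modeled weight and returns how many caches were allocated. */
    static long cachesAllocated(boolean legacy, boolean needsScores, int weights) {
        CACHES_ALLOCATED.set(0);
        Similarity configured = new Bm25Like();
        for (int i = 0; i < weights; i++) {
            Similarity sim = legacy
                    ? similarity7x(configured, needsScores)
                    : similarity8x(configured, needsScores);
            sim.scorer(1.0f);
        }
        return CACHES_ALLOCATED.get();
    }

    public static void main(String[] args) {
        System.out.println("7.x-style, no scores: " + cachesAllocated(true, false, 1_000)); // 0
        System.out.println("8.x-style, no scores: " + cachesAllocated(false, false, 1_000)); // 1000
    }
}
```

Running `main` prints 0 for the 7.x-style path and 1000 for the 8.x-style path, one cache per modeled weight even though no scores are ever computed.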

 num     #instances         #bytes  class name (module)
-------------------------------------------------------
   1:      24601972     4773247024  [B (java.base)
   2:       2100779     2061684496  [F (java.base)
   3:      33501475      804035400  java.util.ArrayList (java.base)
   4:      16232322      716523504  [Ljava.lang.Object; (java.base)
   5:      14819347      711328656  java.util.HashMap (java.base)
...
...
  34:       1106011       79632792  org.apache.lucene.store.ByteBufferIndexInput$SingleBufferImpl
  35:       1979609       79184360  org.apache.lucene.search.similarities.BM25Similarity$BM25Scorer
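The histogram rows can be cross-checked with a little arithmetic. The sketch below divides total bytes by instance count for the [F and BM25Scorer rows. The float[256] size is an assumption on two counts: that each scorer holds one 256-entry cache (as in Lucene 8.x's BM25Similarity), and that arrays carry a 16-byte header, as on a 64-bit HotSpot JVM with compressed class pointers.

```java
/** Back-of-the-envelope arithmetic on the jmap histogram rows quoted above. */
public class HistogramMath {

    /** Average shallow size per instance in bytes (integer division). */
    static long averageBytes(long totalBytes, long instances) {
        return totalBytes / instances;
    }

    /** Assumed float[n] size: 16-byte array header plus 4 bytes per element. */
    static long floatArrayBytes(int length) {
        return 16L + 4L * length;
    }

    public static void main(String[] args) {
        long floatArrays = 2_100_779L, floatBytes = 2_061_684_496L; // [F row
        long scorers = 1_979_609L, scorerBytes = 79_184_360L;       // BM25Scorer row

        System.out.println("avg float[] bytes:     " + averageBytes(floatBytes, floatArrays)); // 981
        System.out.println("avg BM25Scorer bytes:  " + averageBytes(scorerBytes, scorers));    // 40
        System.out.println("float[256] (assumed):  " + floatArrayBytes(256));                  // 1040
        System.out.printf("float[] per BM25Scorer: %.2f%n", (double) floatArrays / scorers);  // 1.06
    }
}
```

An average of about 981 bytes per float[], roughly one float[] per BM25Scorer instance, and an assumed 1040 bytes for a float[256] are consistent with most of these arrays being per-scorer caches, though the histogram alone cannot prove it.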

I also verified that the scoring mode for these queries is COMPLETE_NO_SCORES, for which needsScores() returns false:

method=org.apache.lucene.search.TermQuery$TermWeight.<init> location=AtExit
ts=2023-05-01 22:32:57; [cost=0.014381ms] result=@ArrayList[
    @ScoreMode[COMPLETE_NO_SCORES],
]
method=org.apache.lucene.search.TermQuery$TermWeight.<init> location=AtExit
ts=2023-05-01 22:32:56; [cost=0.029482ms] result=@ArrayList[
    @ScoreMode[COMPLETE_NO_SCORES],
]
method=org.apache.lucene.search.TermQuery$TermWeight.<init> location=AtExit
ts=2023-05-01 22:32:57; [cost=0.0135ms] result=@ArrayList[
    @ScoreMode[COMPLETE_NO_SCORES],
]
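Because needsScores() is false for COMPLETE_NO_SCORES, one possible direction is to skip building the similarity scorer's state in the weight when scores are not needed. The sketch below is hypothetical: `ScoreModeSketch` is a local enum mirroring Lucene's COMPLETE, COMPLETE_NO_SCORES and TOP_SCORES constants, and `TermWeightSketch` is an illustrative stand-in, not Lucene's TermQuery.TermWeight.

```java
import java.util.function.Supplier;

/** Hypothetical sketch: only allocate scoring state when the score mode needs scores. */
public class ScoreModeGuardSketch {

    /** Local mirror of a subset of Lucene's ScoreMode; not the real class. */
    enum ScoreModeSketch {
        COMPLETE(true),
        COMPLETE_NO_SCORES(false),
        TOP_SCORES(true);

        private final boolean needsScores;

        ScoreModeSketch(boolean needsScores) {
            this.needsScores = needsScores;
        }

        boolean needsScores() {
            return needsScores;
        }
    }

    /** Illustrative weight; scorerCache stays null when scores are not needed. */
    static final class TermWeightSketch {
        final float[] scorerCache;

        TermWeightSketch(ScoreModeSketch scoreMode, Supplier<float[]> cacheFactory) {
            // Skip the per-weight allocation for filter-only (non-scoring) queries.
            this.scorerCache = scoreMode.needsScores() ? cacheFactory.get() : null;
        }
    }

    public static void main(String[] args) {
        Supplier<float[]> cache = () -> new float[256];
        TermWeightSketch filter = new TermWeightSketch(ScoreModeSketch.COMPLETE_NO_SCORES, cache);
        TermWeightSketch scored = new TermWeightSketch(ScoreModeSketch.COMPLETE, cache);
        System.out.println(filter.scorerCache == null); // true
        System.out.println(scored.scorerCache.length);  // 256
    }
}
```

A real change would also need to ensure nothing downstream dereferences the missing scoring state when scores are not needed; the sketch only shows where the allocation could be avoided.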

Version and environment details

Using Lucene 8.10.1, though the issue has been present since 8.x and carries over into 9.x as well.

Screenshot (2023-05-19 at 5:56:08 PM)
