Closed
Description
While looking into a customer issue, I noticed an increase in GC time going from Lucene 7.x to 8.x. In the JVM class histograms, one of the primary differences was float[] allocation (shown as [F below). I took a heap dump to check the dominator tree, and the arrays were being retained by BM25Scorer.
The change seems to have come in with 8fd7ead, which removed some of the special-case logic around the "non-scoring similarity" embedded in IndexSearcher (returned when needsScores was false by the old IndexSearcher#getSimilarity(boolean needsScores)).
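To make the suspected mechanism concrete, here is a minimal sketch of the pattern. It is not Lucene code: `ScorerAllocationSketch`, `TermScorer`, `CachingScorer`, `NON_SCORING`, and `createScorer` are hypothetical names, the 256-entry cache size is an assumption about how a BM25-style scorer precomputes norm factors, and the score formula is simplified. The point is only that a per-term scorer which eagerly builds a float[] cache pays that allocation even when scores are never read, while a guard on needsScores can hand back a shared instance instead.

```java
// Hypothetical model of the allocation pattern; none of these types are Lucene classes.
public class ScorerAllocationSketch {

    /** Minimal stand-in for a per-term scorer. */
    interface TermScorer {
        float score(float freq, int normByte);
    }

    /** BM25-style scorer that eagerly builds a norm cache (size is an assumption). */
    static final class CachingScorer implements TermScorer {
        static final int CACHE_SIZE = 256;
        private final float[] cache = new float[CACHE_SIZE]; // allocated per instance
        private final float weight;

        CachingScorer(float weight, float k1, float b, float avgFieldLength) {
            this.weight = weight;
            for (int i = 0; i < CACHE_SIZE; i++) {
                // Simplification: treat the norm byte as the field length directly.
                float fieldLength = i + 1;
                cache[i] = k1 * (1 - b + b * fieldLength / avgFieldLength);
            }
        }

        @Override
        public float score(float freq, int normByte) {
            float norm = cache[normByte & 0xFF];
            return weight * freq / (freq + norm);
        }
    }

    /** Shared, allocation-free scorer for queries that never read scores. */
    static final TermScorer NON_SCORING = (freq, normByte) -> 0f;

    /** Returns the shared non-scoring instance unless scores are actually needed. */
    static TermScorer createScorer(boolean needsScores, float weight) {
        if (!needsScores) {
            return NON_SCORING;
        }
        // Illustrative parameter values, not taken from the report.
        return new CachingScorer(weight, 1.2f, 0.75f, 100f);
    }

    public static void main(String[] args) {
        TermScorer a = createScorer(false, 2.0f);
        TermScorer b = createScorer(false, 3.0f);
        System.out.println("non-scoring instances shared: " + (a == b));
        System.out.println("scoring score: " + createScorer(true, 2.0f).score(3f, 10));
    }
}
```

Without the `needsScores` guard, every call would construct a `CachingScorer`, which is the behavior the histogram below appears to reflect.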
 num     #instances         #bytes  class name (module)
-------------------------------------------------------
   1:      24601972     4773247024  [B ([email protected])
   2:       2100779     2061684496  [F ([email protected])
   3:      33501475      804035400  java.util.ArrayList ([email protected])
   4:      16232322      716523504  [Ljava.lang.Object; ([email protected])
   5:      14819347      711328656  java.util.HashMap ([email protected])
...
  34:       1106011       79632792  org.apache.lucene.store.ByteBufferIndexInput$SingleBufferImpl
  35:       1979609       79184360  org.apache.lucene.search.similarities.BM25Similarity$BM25Scorer
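As a rough cross-check of the histogram above, the following sketch estimates how many float[] bytes the BM25Scorer instances alone would account for. The instance and byte counts are copied from the histogram. The 256-entry cache per scorer and the 16-byte array header are assumptions (the header size in particular depends on the JVM and its settings), and `HistogramEstimate` is a hypothetical name.

```java
// Back-of-the-envelope check; cache size and header size are assumptions.
public class HistogramEstimate {
    // Values copied from the histogram in the report.
    static final long FLOAT_ARRAY_INSTANCES = 2_100_779L;
    static final long FLOAT_ARRAY_BYTES = 2_061_684_496L;
    static final long BM25_SCORER_INSTANCES = 1_979_609L;

    // Assumptions, not taken from the report.
    static final int ASSUMED_CACHE_ENTRIES = 256;
    static final int ASSUMED_ARRAY_HEADER_BYTES = 16;

    /** Size of one assumed cache array: header plus 4 bytes per float. */
    static long bytesPerCache() {
        return ASSUMED_ARRAY_HEADER_BYTES + 4L * ASSUMED_CACHE_ENTRIES;
    }

    /** Bytes expected if every BM25Scorer holds exactly one such cache. */
    static long estimatedCacheBytes() {
        return BM25_SCORER_INSTANCES * bytesPerCache();
    }

    /** Average size of a float[] instance as reported by the histogram. */
    static double averageFloatArrayBytes() {
        return (double) FLOAT_ARRAY_BYTES / FLOAT_ARRAY_INSTANCES;
    }

    /** Fraction of all float[] bytes explained by the estimate. */
    static double cacheShareOfFloatBytes() {
        return (double) estimatedCacheBytes() / FLOAT_ARRAY_BYTES;
    }

    public static void main(String[] args) {
        System.out.println("bytes per assumed cache:   " + bytesPerCache());
        System.out.println("estimated cache bytes:     " + estimatedCacheBytes());
        System.out.println("average float[] size:      " + averageFloatArrayBytes());
        System.out.println("share of float[] bytes:    " + cacheShareOfFloatBytes());
    }
}
```

Under these assumptions the estimate comes to 2,058,793,360 bytes, within 0.2% of the reported float[] total, which is consistent with nearly all of the float[] footprint being one cache per BM25Scorer.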
I also validated that the score mode for these queries is COMPLETE_NO_SCORES, for which needsScores() returns false:
method=org.apache.lucene.search.TermQuery$TermWeight.<init> location=AtExit
ts=2023-05-01 22:32:57; [cost=0.014381ms] result=@ArrayList[
@ScoreMode[COMPLETE_NO_SCORES],
]
method=org.apache.lucene.search.TermQuery$TermWeight.<init> location=AtExit
ts=2023-05-01 22:32:56; [cost=0.029482ms] result=@ArrayList[
@ScoreMode[COMPLETE_NO_SCORES],
]
method=org.apache.lucene.search.TermQuery$TermWeight.<init> location=AtExit
ts=2023-05-01 22:32:57; [cost=0.0135ms] result=@ArrayList[
@ScoreMode[COMPLETE_NO_SCORES],
]
Version and environment details
Using Lucene 8.10.1, though the issue has been present since 8.x and carries over into 9.x as well.
kkhatua
