Efficient iteration over deleted doc values #15226

@jainankitk

Description

As part of #14439, we introduced efficient histogram collection using PointTrees. Unfortunately, the optimization falls apart if a segment contains even a single deleted document. I am wondering whether there is a way to efficiently iterate over the doc values of the deleted documents and correct each bucket's count accordingly.

I had thought about this a while back (and kind of forgot about it), and @mikemccand brought it up again during my talk, "Efficient Histogram Collection in Lucene with BulkCollector", at the recently concluded Apache Community over Code.
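To make the idea concrete, here is a minimal, self-contained sketch of the correction step. It does not use any Lucene API: the `long[] values` array stands in for single-valued numeric doc values, the `java.util.BitSet` stands in for live docs (a set bit means the document is live), and `countAll` stands in for the bulk PointTree count that cannot see deletions. The class and method names (`DeletedDocCorrection`, `countAll`, `correct`, `bucketOf`) are hypothetical and exist only for illustration.

```java
import java.util.BitSet;

/** Illustrative only: subtracts deleted documents from bulk-computed histogram counts. */
public final class DeletedDocCorrection {
    private DeletedDocCorrection() {}

    /** Maps a value to its fixed-width bucket index. */
    public static int bucketOf(long value, long minValue, long bucketWidth) {
        return (int) Math.floorDiv(value - minValue, bucketWidth);
    }

    /** Stand-in for the bulk count: counts every document, deleted or not. */
    public static long[] countAll(long[] values, long minValue, long bucketWidth, int numBuckets) {
        long[] counts = new long[numBuckets];
        for (long v : values) {
            counts[bucketOf(v, minValue, bucketWidth)]++;
        }
        return counts;
    }

    /**
     * Returns a corrected copy of rawCounts by visiting only the deleted documents
     * (clear bits in liveDocs) and decrementing the bucket each one fell into.
     */
    public static long[] correct(long[] rawCounts, long[] values, BitSet liveDocs,
                                 long minValue, long bucketWidth) {
        long[] corrected = rawCounts.clone();
        int maxDoc = values.length;
        // nextClearBit jumps from one deleted doc to the next, skipping live docs.
        for (int doc = liveDocs.nextClearBit(0); doc < maxDoc; doc = liveDocs.nextClearBit(doc + 1)) {
            corrected[bucketOf(values[doc], minValue, bucketWidth)]--;
        }
        return corrected;
    }
}
```

In this sketch the correction visits only the deleted documents and does one value lookup per deletion, so it stays cheap when deletions are sparse. Whether the same shape can be achieved in Lucene depends on how efficiently the deleted doc IDs and their doc values can be iterated, which is the open question here.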
