@BryceKan3

Description

Currently, DirectoryReader.open() creates the segment reader for each segment sequentially.

This can be very slow because creating each SegmentReader involves I/O. By allowing an ExecutorService to be passed to DirectoryReader.open(), the segment reader creations can be submitted to a thread pool, which significantly reduces DirectoryReader.open() times. The implementation is fully backwards compatible and lets users supply their own executor service.
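To make the idea concrete, here is a minimal, self-contained sketch of the pattern described above. It is not the code in this PR: `SegmentHandle` and `openSegment` are hypothetical stand-ins for SegmentReader creation, and the `null` executor check is only an assumption about how a sequential fallback could look.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class ParallelSegmentOpenSketch {

  /** Hypothetical stand-in for a SegmentReader; not a Lucene class. */
  record SegmentHandle(String name) {}

  /** Hypothetical stand-in for the I/O-heavy per-segment open. */
  static SegmentHandle openSegment(String name) throws InterruptedException {
    Thread.sleep(20); // simulate I/O latency
    return new SegmentHandle(name);
  }

  /** Opens every segment, in parallel if an executor is given, sequentially otherwise. */
  static List<SegmentHandle> openAll(List<String> segmentNames, ExecutorService executor)
      throws Exception {
    List<SegmentHandle> handles = new ArrayList<>();
    if (executor == null) {
      // Sequential path: same behavior as before, one segment at a time.
      for (String name : segmentNames) {
        handles.add(openSegment(name));
      }
      return handles;
    }
    List<Callable<SegmentHandle>> tasks = new ArrayList<>();
    for (String name : segmentNames) {
      tasks.add(() -> openSegment(name));
    }
    // invokeAll returns futures in task order, so segment order is preserved.
    for (Future<SegmentHandle> future : executor.invokeAll(tasks)) {
      handles.add(future.get());
    }
    return handles;
  }

  public static void main(String[] args) throws Exception {
    ExecutorService executor = Executors.newFixedThreadPool(4);
    try {
      System.out.println(openAll(List.of("_0", "_1", "_2", "_3"), executor));
    } finally {
      executor.shutdown();
    }
  }
}
```

The point of the sketch is that the result order does not depend on which task finishes first, so callers see the same segment ordering with or without an executor.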

I have tested the changes and measured the performance improvement from opening the segment readers in parallel:

| Optimization | P50 (ms) | P90 (ms) | P99 (ms) | P50 Reduction % |
| --- | --- | --- | --- | --- |
| Baseline | 995 | 1020 | 1041 | N/A |
| Concurrent SegmentReader Initialization | 171 | 178 | 188 | 82.81% |

Fixes #15387

@github-actions github-actions bot added this to the 11.0.0 milestone Nov 14, 2025
Contributor

@vigyasharma vigyasharma left a comment

Thanks for these changes, Bryce. I'm curious about the use cases where opening segments becomes a bottleneck that should be parallelized. Have you seen any in production?

createMultiSegmentIndex(dir, 5);

ExecutorService executor =
Executors.newFixedThreadPool(1, new NamedThreadFactory("TestDirectoryReader"));
Is there a reason these tests only use one thread? Can we add more threads to test for edge cases and races, such as only one thread seeing an exception while opening a segment reader?
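As a rough illustration of the kind of multi-threaded test meant here, the sketch below fails exactly one of several concurrent opens and checks that the readers that did open get closed. Everything in it is hypothetical (`FakeReader`, `openAll`, the `failingIndex` parameter); it assumes, rather than confirms, that the desired behavior is "close what was opened, then rethrow".

```java
import java.io.Closeable;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class PartialFailureSketch {

  /** Hypothetical reader that only records whether it was closed. */
  static final class FakeReader implements Closeable {
    final int id;
    volatile boolean closed;
    FakeReader(int id) { this.id = id; }
    @Override public void close() { closed = true; }
  }

  /**
   * Opens {@code count} fake readers on the executor; the task at {@code failingIndex}
   * throws (pass -1 for no failure). Every reader that is created is also added to
   * {@code created} so a test can inspect it afterwards.
   */
  static List<FakeReader> openAll(
      int count, int failingIndex, ExecutorService executor, List<FakeReader> created)
      throws IOException, InterruptedException {
    List<Callable<FakeReader>> tasks = new ArrayList<>();
    for (int i = 0; i < count; i++) {
      final int id = i;
      tasks.add(() -> {
        if (id == failingIndex) {
          throw new IOException("simulated failure opening segment " + id);
        }
        FakeReader reader = new FakeReader(id);
        created.add(reader);
        return reader;
      });
    }
    List<FakeReader> readers = new ArrayList<>();
    IOException failure = null;
    // invokeAll blocks until every task has completed, successfully or not.
    for (Future<FakeReader> future : executor.invokeAll(tasks)) {
      try {
        readers.add(future.get());
      } catch (ExecutionException e) {
        IOException io = new IOException("segment open failed", e.getCause());
        if (failure == null) failure = io; else failure.addSuppressed(io);
      }
    }
    if (failure != null) {
      for (FakeReader reader : readers) reader.close(); // do not leak opened readers
      throw failure;
    }
    return readers;
  }

  public static void main(String[] args) throws Exception {
    ExecutorService executor = Executors.newFixedThreadPool(8);
    List<FakeReader> created = Collections.synchronizedList(new ArrayList<>());
    try {
      openAll(16, 5, executor, created);
    } catch (IOException expected) {
      System.out.println("failed as expected, readers created: " + created.size());
    } finally {
      executor.shutdown();
    }
  }
}
```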

@vigyasharma
Contributor

I was wondering whether this concurrency would also benefit openIfChanged calls. But it looks like the benefit comes from opening segment readers in parallel, and for openIfChanged calls we try to reuse old readers as much as possible?

Also curious if you tried using virtual threads for the executor in this use-case, and if they helped.
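For reference, trying virtual threads should only require swapping the executor. The sketch below assumes Java 21+ and uses a hypothetical blocking task in place of segment opening; it says nothing about whether virtual threads would actually help for this workload.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class VirtualThreadOpenSketch {

  /** Runs blocking tasks on the executor and reports whether each ran on a virtual thread. */
  static List<Boolean> runBlockingTasks(int taskCount, ExecutorService executor)
      throws Exception {
    List<Callable<Boolean>> tasks = new ArrayList<>();
    for (int i = 0; i < taskCount; i++) {
      tasks.add(() -> {
        Thread.sleep(10); // hypothetical stand-in for blocking segment I/O
        return Thread.currentThread().isVirtual();
      });
    }
    List<Boolean> ranOnVirtual = new ArrayList<>();
    for (Future<Boolean> future : executor.invokeAll(tasks)) {
      ranOnVirtual.add(future.get());
    }
    return ranOnVirtual;
  }

  public static void main(String[] args) throws Exception {
    // One new virtual thread per submitted task; requires Java 21 or later.
    ExecutorService executor = Executors.newVirtualThreadPerTaskExecutor();
    try {
      System.out.println(runBlockingTasks(8, executor));
    } finally {
      executor.shutdown();
    }
  }
}
```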


Successfully merging this pull request may close these issues.

[RFC] Add support for Concurrent SegmentReader Initialization in DirectoryReader.open()
