perf(core): parallelize memory discovery file operations performance gain #5751

mag123c · 2025-08-07T06:07:34Z

TLDR

Parallelizes file I/O in memoryDiscovery.ts by replacing sequential processing with batched Promise.allSettled() calls, improving benchmark performance by roughly 60% while maintaining backward compatibility.

Dive Deeper

Problem

The current implementation processes directories and files sequentially, causing unnecessary I/O wait times. When loading multiple GEMINI.md files across directories, each operation blocks until completion before starting the next.

Solution

- Converted sequential operations to parallel using Promise.allSettled()
- Implemented batched processing with concurrency limits (10 for directories, 20 for files) to prevent EMFILE errors
- Improved error isolation: individual failures don't block other operations
- Maintained result order and backward compatibility
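
The batching approach described above can be sketched roughly as follows. This is a minimal illustration, not the code from memoryDiscovery.ts: the helper name `runInBatches`, its signature, and the constant `FILE_READ_LIMIT` are assumptions made for the example. Only the limit of 20 for files and the use of Promise.allSettled() come from this description.

```typescript
// Hypothetical helper: names and signature are illustrative assumptions,
// not taken from memoryDiscovery.ts.
async function runInBatches<T, R>(
  items: readonly T[],
  limit: number,
  worker: (item: T) => Promise<R>,
): Promise<Array<PromiseSettledResult<R>>> {
  if (!Number.isInteger(limit) || limit < 1) {
    throw new RangeError('limit must be a positive integer');
  }
  const results: Array<PromiseSettledResult<R>> = [];
  for (let i = 0; i < items.length; i += limit) {
    // At most `limit` operations are in flight at any time.
    const batch = items.slice(i, i + limit);
    // allSettled never rejects, so one failing item cannot abort its batch.
    const settled = await Promise.allSettled(
      batch.map(async (item) => worker(item)),
    );
    // Batches are appended in sequence, so results stay in input order.
    results.push(...settled);
  }
  return results;
}

// Assumed constant name; the value 20 is the file limit quoted above.
const FILE_READ_LIMIT = 20;
```

Each entry in the returned array has a `status` of either `"fulfilled"` or `"rejected"`, so a caller can skip failed items without losing the position of the successful ones.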

Performance Results

From benchmark tests with real file I/O:

- 20 files: 59.3% improvement (2.46x faster)
- Processing rate: 1630+ files/second
- Sequential: 23.51 ms → Parallel: 9.57 ms
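
For reference, the improvement percentage and the speedup factor both follow from the two timings quoted above. The function below is only a worked calculation; `compareTimings` is a name made up for this example.

```typescript
// Illustrative arithmetic only; `compareTimings` is a hypothetical helper.
function compareTimings(sequentialMs: number, parallelMs: number) {
  return {
    // How many times faster the parallel run is.
    speedup: sequentialMs / parallelMs,
    // Share of the sequential time that was saved, as a percentage.
    improvementPct: ((sequentialMs - parallelMs) / sequentialMs) * 100,
  };
}

const reported = compareTimings(23.51, 9.57);
console.log(reported.speedup.toFixed(2), reported.improvementPct.toFixed(1));
// prints "2.46 59.3"
```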

Reviewer Test Plan

Run existing tests to verify backward compatibility: npm test -- src/utils/memoryDiscovery.test.ts

All 12 tests should pass.

Run performance benchmarks to verify improvements: npm test -- src/utils/memoryDiscovery.performance.test.ts

Should show 50%+ improvement.

Test with a real project containing multiple GEMINI.md files:

gemini "check performance with your command"

Verify file discovery order is maintained (important for hierarchical loading).
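
One way to reason about the ordering requirement: Promise.all() (and likewise Promise.allSettled()) returns results in the order of the input array, not in the order the operations finish. The sketch below simulates that with timers. The paths, the `Reader` type, and `fakeRead` are invented for illustration and do not come from the PR.

```typescript
// Illustrative only: `Reader` stands in for whatever loads a GEMINI.md file.
type Reader = (path: string) => Promise<string>;

async function loadInOrder(
  paths: readonly string[],
  read: Reader,
): Promise<string[]> {
  // Promise.all resolves to results in input order,
  // even if later inputs settle first.
  return Promise.all(paths.map((p) => read(p)));
}

// Simulated reader: the first path is the slowest, so it completes last.
const completionOrder: string[] = [];
const delays: Record<string, number> = {
  '/repo/GEMINI.md': 30,
  '/repo/packages/GEMINI.md': 20,
  '/repo/packages/core/GEMINI.md': 10,
};
const fakeRead: Reader = (path) =>
  new Promise((resolve) =>
    setTimeout(() => {
      completionOrder.push(path);
      resolve(`contents of ${path}`);
    }, delays[path]),
  );

const orderedPaths = Object.keys(delays);
const loaded = loadInOrder(orderedPaths, fakeRead);
loaded.then((contents) => {
  console.log(contents); // same order as orderedPaths
  console.log(completionOrder); // reverse order in this simulation
});
```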

Testing Matrix

	🍏	🪟	🐧
npm run	✅	❓	❓
npx	❓	❓	❓
Docker	❓	❓	❓
Podman	❓	-	-
Seatbelt	❓	-	-

Linked issues / bugs

Resolves #5749

…uential directory processing to parallel with Promise.all() - Convert sequential file reading to parallel processing - Maintain backward compatibility with all existing tests passing Related to google-gemini#5749

google-cla · 2025-08-07T06:07:39Z

Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

gemini-code-assist

Summary of Changes

Hello @mag123c, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

I've implemented a significant performance optimization for the memoryDiscovery module by parallelizing file I/O operations. This change converts sequential directory processing and file reading into concurrent operations using Promise.all(), drastically reducing the time it takes to discover and load GEMINI.md files. The primary goal is to improve the responsiveness and efficiency of the system when dealing with numerous GEMINI.md files, ensuring a smoother user experience without compromising backward compatibility.

Highlights

Parallel Directory Processing: I've refactored the getGeminiMdFilePathsInternal function to process directories concurrently using Promise.all(), which significantly speeds up the initial discovery of GEMINI.md files across the file system.
Concurrent File Reading: The readGeminiMdFiles function now reads and processes multiple GEMINI.md files in parallel. By leveraging Promise.all(), I've minimized I/O wait times, leading to a much faster overall file loading experience.
New Performance Benchmarking: I've introduced a new performance test file (memoryDiscovery.performance.test.ts) to rigorously validate the speed improvements. This test demonstrates over 50% gain in file processing, confirming the effectiveness of the parallelization efforts.
Significant Performance Gains: These changes collectively result in a substantial performance boost for memory discovery operations. Benchmarks show a 60%+ improvement and a processing rate exceeding 1920 files/second for GEMINI.md file discovery and reading.

¹ Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

gemini-code-assist

Code Review

This pull request significantly improves performance by parallelizing file I/O operations in memoryDiscovery.ts. The changes are well-structured and include performance tests to validate the improvements. However, the parallelization is unbounded, which introduces a critical risk of EMFILE: too many open files errors when processing a large number of files or directories. I've added comments highlighting where concurrency limits should be introduced to ensure stability. I've also suggested an improvement to the performance test to ensure correctness in addition to speed.

packages/core/src/utils/memoryDiscovery.ts

packages/core/src/utils/memoryDiscovery.performance.test.ts

…cessing - Remove memoryDiscovery.performance.test.ts to follow project conventions - Add functional tests to verify parallel directory processing - Add test for handling multiple directories correctly - Add test for order preservation and duplicate prevention - Project convention: performance benchmarks are not included in test suite

mag123c · 2025-08-07T07:36:39Z

/gemini review

gemini-code-assist

Code Review

This pull request significantly improves performance by parallelizing file I/O operations for memory discovery, achieving a substantial speed-up. The implementation correctly uses batched promises to avoid hitting system limits while maintaining the order of results. The added tests effectively validate the new parallel logic. I have one suggestion to enhance the robustness of the directory processing by adding error handling for each parallel task, similar to the pattern used for file reading. This will prevent a single failing directory from halting the entire process.

packages/core/src/utils/memoryDiscovery.ts

- Implement try-catch for each directory to prevent single failure from halting entire process - Return empty array for failed directories to allow others to continue - Add error logging for debugging - Remove performance test file to follow project conventions - Add functional tests for parallel processing behavior

mag123c · 2025-08-07T07:44:29Z

Review feedback addressed:

- Added concurrency limits (1st review)
  - Set CONCURRENT_LIMIT = 10 for directories, 20 for files
  - Prevents EMFILE errors when processing large numbers of files
- Added robust error handling (2nd review)
  - Individual try-catch for each directory to prevent single failures from halting the process
  - Failed directories return empty arrays, allowing others to continue
  - Added error logging for debugging
- Replaced performance tests with functional tests
  - Removed memoryDiscovery.performance.test.ts (follows project conventions)
  - Added comprehensive functional tests to existing test suite
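
The per-directory error handling listed above might look something like this sketch. The function names, the injected `findFiles` callback, and the log message are assumptions for illustration. The concurrency limit is left out here to keep the example short.

```typescript
// Hypothetical names throughout; this is not the code from memoryDiscovery.ts.
type FindFiles = (dir: string) => Promise<string[]>;

async function processDirectorySafely(
  dir: string,
  findFiles: FindFiles,
  log: (message: string) => void = console.warn,
): Promise<string[]> {
  try {
    return await findFiles(dir);
  } catch (error) {
    // A failed directory is logged and contributes no paths,
    // so the remaining directories can still be processed.
    const message = error instanceof Error ? error.message : String(error);
    log(`Skipping ${dir}: ${message}`);
    return [];
  }
}

async function collectPaths(
  dirs: readonly string[],
  findFiles: FindFiles,
  log: (message: string) => void = console.warn,
): Promise<string[]> {
  // No task can reject, because each one catches its own error.
  const perDir = await Promise.all(
    dirs.map((dir) => processDirectorySafely(dir, findFiles, log)),
  );
  // Flatten while keeping directory order.
  return ([] as string[]).concat(...perDir);
}
```

Because every task resolves, a single unreadable directory yields an empty array in its slot instead of rejecting the whole Promise.all() call.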

JULES-JUNIOR · 2025-08-07T07:46:26Z

E-MOBI / EKONOMIK MOBIL,S.R.L

THE COMPANY OF THE FUTURE IS IN YOUR MIDST.
E-MOBI Robotics Développement
The Next Way
I am the path of truth.

JULES-JUNIOR · 2025-08-07T07:46:43Z

Ok

mag123c · 2025-08-15T14:42:59Z

@jacob314, I noticed you reviewed PR #4763 which implemented a similar parallelization approach.
This PR follows the same pattern for memory discovery operations.

Would appreciate your insights if you have time to take a look. Thank you!

jacob314

Thanks for this polish. I appreciate that you used a fairly low limit, so I didn't have to worry too much about excessive numbers of simultaneous promises.

jacob314 · 2025-08-21T08:00:06Z

Please resolve the conflict and then I will approve again and you can land.

mag123c · 2025-08-21T09:04:41Z

@jacob314 Thank you for the review!
I've resolved the conflicts and applied the formatting fixes.

mag123c added 2 commits August 7, 2025 14:52

test(core): add performance benchmarks for memory discovery

94dc034

- Add benchmark tests demonstrating 63.8% improvement - Verify parallel processing achieves 2.76x speedup - Document performance gains with real file I/O

mag123c requested a review from a team as a code owner August 7, 2025 06:07

gemini-code-assist bot reviewed Aug 7, 2025

packages/core/src/utils/memoryDiscovery.ts (outdated, resolved)

packages/core/src/utils/memoryDiscovery.performance.test.ts (outdated, resolved)

mag123c added 2 commits August 7, 2025 15:18

fix(core): add concurrency limits to prevent EMFILE errors

853a4e6

- Limit concurrent directory processing to 10 - Limit concurrent file reads to 20 - Add deep equality check in tests - Addresses review feedback from gemini-code-assist

gemini-code-assist bot reviewed Aug 7, 2025

packages/core/src/utils/memoryDiscovery.ts (resolved)

mag123c changed the title ~~perf(core): parallelize memory discovery file operations for 60%+ performance gain~~ perf(core): parallelize memory discovery file operations performance gain Aug 8, 2025

injae-kim mentioned this pull request Aug 15, 2025

Parallelize memory discovery file operations for performance improvement #5749

Closed

refactor: use Promise.allSettled() for better error isolation

16c4f5b

Related google-gemini#5749

mag123c force-pushed the perf/parallelize-memory-discovery branch from 6d311c4 to 16c4f5b Compare August 15, 2025 14:38

jacob314 approved these changes Aug 21, 2025

mag123c added 2 commits August 21, 2025 17:56

Merge branch 'main' into perf/parallelize-memory-discovery

158e5da

fix: resolve CI formatting and TypeScript errors

78841b3

Merge branch 'main' into perf/parallelize-memory-discovery

fbef860

jacob314 enabled auto-merge August 21, 2025 18:11

jacob314 added this pull request to the merge queue Aug 21, 2025

Merged via the queue into google-gemini:main with commit 1e5ead6 Aug 21, 2025
18 checks passed

thacio added a commit to thacio/auditaria that referenced this pull request Aug 21, 2025

Merge commit '1e5ead6': perf(core): parallelize memory discovery file…

fdcd0c2

… operations performance gain (google-gemini#5751)

silviojr pushed a commit that referenced this pull request Aug 21, 2025

perf(core): parallelize memory discovery file operations performance …

45f14c0

…gain (#5751) Co-authored-by: Jacob Richman <[email protected]>

nandakishorereddy-chundi pushed a commit to nandakishorereddy-chundi/gemini-cli that referenced this pull request Aug 22, 2025

perf(core): parallelize memory discovery file operations performance …

f1d673d

…gain (google-gemini#5751) Co-authored-by: Jacob Richman <[email protected]>

Gosling-dude pushed a commit to Gosling-dude/gemini-cli that referenced this pull request Aug 23, 2025

perf(core): parallelize memory discovery file operations performance …

7473085

…gain (google-gemini#5751) Co-authored-by: Jacob Richman <[email protected]>

acoliver referenced this pull request in vybestack/llxprt-code Sep 11, 2025

perf(core): parallelize memory discovery file operations performance …

9551c5b

…gain (#5751) Co-authored-by: Jacob Richman <[email protected]>

involvex pushed a commit to involvex/gemini-cli that referenced this pull request Sep 11, 2025

perf(core): parallelize memory discovery file operations performance …

85beee3

…gain (google-gemini#5751) Co-authored-by: Jacob Richman <[email protected]>

reconsumeralization pushed a commit to reconsumeralization/gemini-cli that referenced this pull request Sep 19, 2025

perf(core): parallelize memory discovery file operations performance …

eda942c

…gain (google-gemini#5751) Co-authored-by: Jacob Richman <[email protected]>
