
@mag123c (Contributor) commented Aug 7, 2025

TLDR

Parallelizes file I/O operations in memoryDiscovery.ts by converting sequential processing to parallel using Promise.all(), cutting load time by roughly 60% in benchmarks while maintaining backward compatibility.

Dive Deeper

Problem

The current implementation processes directories and files sequentially, causing unnecessary I/O wait times. When loading multiple GEMINI.md files across directories, each operation must complete before the next one starts.

Solution

  • Converted sequential operations to parallel using Promise.allSettled()
  • Implemented batched processing with concurrency limits (10 for directories, 20 for files) to prevent EMFILE errors
  • Improved error isolation: individual failures don't block other operations
  • Maintained result order and backward compatibility
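A minimal sketch of the batched approach described above, in TypeScript. The helper name `processInBatches` and its signature are hypothetical, not taken from memoryDiscovery.ts; the batch size would be 10 for directories or 20 for files per the limits listed. Each batch is awaited with `Promise.allSettled()`, so at most `batchSize` operations are in flight, a rejected task cannot abort its siblings, and fulfilled results are collected in input order.

```typescript
// Hypothetical helper sketching batched parallelism; not the PR's actual code.
// At most `batchSize` tasks are in flight at once. Fulfilled results are
// returned in input order; rejected tasks are reported via `onError` and skipped.
async function processInBatches<T, R>(
  items: readonly T[],
  batchSize: number,
  task: (item: T) => Promise<R>,
  onError: (item: T, reason: unknown) => void = () => {},
): Promise<Awaited<R>[]> {
  if (!Number.isInteger(batchSize) || batchSize < 1) {
    throw new RangeError('batchSize must be a positive integer');
  }
  const results: Awaited<R>[] = [];
  for (let i = 0; i < items.length; i += batchSize) {
    const batch = items.slice(i, i + batchSize);
    // The async wrapper turns synchronous throws into rejections;
    // allSettled never rejects, so one failure cannot abort the batch.
    const settled = await Promise.allSettled(
      batch.map(async (item) => task(item)),
    );
    settled.forEach((outcome, j) => {
      if (outcome.status === 'fulfilled') {
        results.push(outcome.value);
      } else {
        onError(batch[j], outcome.reason);
      }
    });
  }
  return results;
}
```

Under these assumptions, a file-reading call might look like `processInBatches(filePaths, 20, (p) => readFile(p, 'utf-8'))` with `readFile` imported from `node:fs/promises`. Fixed batches are simple to reason about, but one slow item stalls its whole batch; a sliding-window limiter such as the `p-limit` package would keep the pool full at the cost of an extra dependency.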

Performance Results

From benchmark tests with real file I/O:

  • 20 files: 59.3% less wall-clock time (2.46x faster)
  • Processing rate: 1630+ files/second
  • Sequential: 23.51ms → Parallel: 9.57ms
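Both headline figures come from the same pair of timings: the improvement is the relative reduction in wall-clock time, (sequential − parallel) / sequential, while the speedup is sequential / parallel. A quick check of the numbers above (the `summarize` helper is illustrative only):

```typescript
// Derive the reported figures from measured timings in milliseconds.
function summarize(sequentialMs: number, parallelMs: number) {
  const reductionPct = ((sequentialMs - parallelMs) / sequentialMs) * 100;
  const speedup = sequentialMs / parallelMs;
  return { reductionPct, speedup };
}

const { reductionPct, speedup } = summarize(23.51, 9.57);
console.log(`${reductionPct.toFixed(1)}% less time, ${speedup.toFixed(2)}x faster`);
// prints "59.3% less time, 2.46x faster"
```

So the 2.46x speedup and the 59.3% improvement describe the same measurement, not two separate gains.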

Reviewer Test Plan

  1. Run existing tests to verify backward compatibility: npm test -- src/utils/memoryDiscovery.test.ts
      • All 12 tests should pass.
  2. Run performance benchmarks to verify improvements: npm test -- src/utils/memoryDiscovery.performance.test.ts
      • Should show 50%+ improvement.
  3. Test with a real project containing multiple GEMINI.md files:
      • gemini "check performance with your command"
  4. Verify file discovery order is maintained (important for hierarchical loading).
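The order check relies on a property of `Promise.all()`: its result array follows the order of the input promises, not the order in which they settle, so files that finish reading early are not moved ahead of their parents. A self-contained demonstration, assuming a hypothetical `simulatedRead` helper in place of real file reads and made-up paths:

```typescript
// Hypothetical stand-in for an async file read that takes `ms` milliseconds.
function simulatedRead(name: string, ms: number, log: string[]): Promise<string> {
  return new Promise((resolve) =>
    setTimeout(() => {
      log.push(name); // record the order in which reads complete
      resolve(`contents of ${name}`);
    }, ms),
  );
}

async function demo() {
  const completionOrder: string[] = [];
  // Parent first, deeper directories later (hierarchical order).
  const files: Array<[string, number]> = [
    ['/repo/GEMINI.md', 30],
    ['/repo/pkg/GEMINI.md', 5],
    ['/repo/pkg/sub/GEMINI.md', 15],
  ];
  const results = await Promise.all(
    files.map(([name, ms]) => simulatedRead(name, ms, completionOrder)),
  );
  return { results, completionOrder };
}

demo().then(({ results, completionOrder }) => {
  console.log(completionOrder); // order in which the simulated reads finished
  console.log(results); // same order as the `files` input
});
```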

Testing Matrix

| | 🍏 | 🪟 | 🐧 |
|---|---|---|---|
| npm run | | | |
| npx | | | |
| Docker | | | |
| Podman | | - | - |
| Seatbelt | | - | - |

Linked issues / bugs

Resolves #5749

mag123c added 2 commits August 7, 2025 14:52
…uential directory processing to parallel with Promise.all()

- Convert sequential file reading to parallel processing
- Maintain backward compatibility with all existing tests passing

Related to google-gemini#5749
- Add benchmark tests demonstrating 63.8% improvement
- Verify parallel processing achieves 2.76x speedup
- Document performance gains with real file I/O
@mag123c requested a review from a team as a code owner August 7, 2025 06:07
@google-cla bot commented Aug 7, 2025

Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

View this failed invocation of the CLA check for more information.

For the most up to date status, view the checks section at the bottom of the pull request.

@gemini-code-assist bot (Contributor) left a comment

Summary of Changes

Hello @mag123c, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request implements a significant performance optimization for the memoryDiscovery module by parallelizing file I/O operations. It converts sequential directory processing and file reading into concurrent operations using Promise.all(), substantially reducing the time needed to discover and load GEMINI.md files. The goal is to make the system more responsive when dealing with many GEMINI.md files, without compromising backward compatibility.

Highlights

  • Parallel Directory Processing: The getGeminiMdFilePathsInternal function has been refactored to process directories concurrently using Promise.all(), which speeds up the initial discovery of GEMINI.md files across the file system.
  • Concurrent File Reading: The readGeminiMdFiles function now reads and processes multiple GEMINI.md files in parallel. Using Promise.all() minimizes I/O wait time and makes overall file loading much faster.
  • New Performance Benchmarking: A new performance test file (memoryDiscovery.performance.test.ts) validates the speed improvements, demonstrating a gain of over 50% in file processing.
  • Significant Performance Gains: Together, these changes substantially speed up memory discovery. Benchmarks show a 60%+ improvement and a processing rate exceeding 1920 files/second for GEMINI.md file discovery and reading.

@gemini-code-assist bot (Contributor) left a comment

Code Review

This pull request significantly improves performance by parallelizing file I/O operations in memoryDiscovery.ts. The changes are well-structured and include performance tests to validate the improvements. However, the parallelization is unbounded, which introduces a critical risk of EMFILE ("too many open files") errors when processing a large number of files or directories. I've added comments highlighting where concurrency limits should be introduced to ensure stability. I've also suggested an improvement to the performance test to ensure correctness in addition to speed.

mag123c added 2 commits August 7, 2025 15:18
- Limit concurrent directory processing to 10
- Limit concurrent file reads to 20
- Add deep equality check in tests
- Addresses review feedback from gemini-code-assist
…cessing

  - Remove memoryDiscovery.performance.test.ts to follow project conventions
  - Add functional tests to verify parallel directory processing
  - Add test for handling multiple directories correctly
  - Add test for order preservation and duplicate prevention
  - Project convention: performance benchmarks are not included in test suite
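Order preservation plus duplicate prevention can be sketched by flattening per-directory results in input order and de-duplicating with a `Set`, whose iteration follows insertion order and therefore keeps the first occurrence of each path. `mergeDiscoveredPaths` is a hypothetical name used for illustration, not a function from the PR:

```typescript
// Hypothetical: merge per-directory discovery results into one ordered,
// duplicate-free list. `perDirectory` is in the same order as the input dirs.
function mergeDiscoveredPaths(perDirectory: string[][]): string[] {
  // flat() keeps directory order; the Set keeps the first occurrence of a path.
  return Array.from(new Set(perDirectory.flat()));
}

const merged = mergeDiscoveredPaths([
  ['/home/u/.gemini/GEMINI.md', '/repo/GEMINI.md'],
  ['/repo/GEMINI.md', '/repo/src/GEMINI.md'],
]);
console.log(merged);
```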
@mag123c (Contributor, Author) commented Aug 7, 2025

/gemini review

@gemini-code-assist bot (Contributor) left a comment

Code Review

This pull request significantly improves performance by parallelizing file I/O operations for memory discovery, achieving a substantial speed-up. The implementation correctly uses batched promises to avoid hitting system limits while maintaining the order of results. The added tests effectively validate the new parallel logic. I have one suggestion to enhance the robustness of the directory processing by adding error handling for each parallel task, similar to the pattern used for file reading. This will prevent a single failing directory from halting the entire process.

- Implement try-catch for each directory to prevent single failure from halting entire process
- Return empty array for failed directories to allow others to continue
- Add error logging for debugging
- Remove performance test file to follow project conventions
- Add functional tests for parallel processing behavior
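The per-directory try-catch described in this commit can be sketched as a wrapper that turns a failure into an empty result, so one unreadable directory does not reject the whole batch. `safeDiscover`, `DiscoverFn`, and the `flaky` stub are hypothetical, and `console.error` stands in for whatever logger the project uses:

```typescript
// Hypothetical per-directory discovery function type.
type DiscoverFn = (dir: string) => Promise<string[]>;

// Wrap discovery so a failing directory yields [] instead of rejecting.
function safeDiscover(discoverInDir: DiscoverFn): DiscoverFn {
  return async (dir) => {
    try {
      // `await` is required here so a rejection is caught below.
      return await discoverInDir(dir);
    } catch (error) {
      const message = error instanceof Error ? error.message : String(error);
      console.error(`Error discovering files in ${dir}: ${message}`);
      return []; // let the other directories continue
    }
  };
}

// Stub: one directory fails, the others return a path.
const flaky: DiscoverFn = async (dir) => {
  if (dir === '/restricted') throw new Error('EACCES: permission denied');
  return [`${dir}/GEMINI.md`];
};

async function run(): Promise<string[][]> {
  const dirs = ['/repo', '/restricted', '/repo/pkg'];
  return Promise.all(dirs.map(safeDiscover(flaky)));
}
```

Returning the bare promise inside `try` (without `await`) would let its rejection bypass the `catch`. Mapping failures to `[]` also keeps the results aligned one-to-one with the input directories.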
@mag123c (Contributor, Author) commented Aug 7, 2025

Review feedback addressed:

  1. Added concurrency limits (1st review)

    • Set CONCURRENT_LIMIT = 10 for directories, 20 for files
    • Prevents EMFILE errors when processing large numbers of files
  2. Added robust error handling (2nd review)

    • Individual try-catch for each directory to prevent single failures from halting the process
    • Failed directories return empty arrays, allowing others to continue
    • Added error logging for debugging
  3. Replaced performance tests with functional tests

    • Removed memoryDiscovery.performance.test.ts (follows project conventions)
    • Added comprehensive functional tests to existing test suite

@JULES-JUNIOR

E-MOBI / EKONOMIK MOBIL,S.R.L

THE COMPANY OF THE FUTURE IS IN YOUR MIDST.
E-MOBI Robotics Développement
The Next Way
I am the path of truth.

@JULES-JUNIOR

Ok

@mag123c changed the title from "perf(core): parallelize memory discovery file operations for 60%+ performance gain" to "perf(core): parallelize memory discovery file operations performance gain" Aug 8, 2025
@mag123c force-pushed the perf/parallelize-memory-discovery branch from 6d311c4 to 16c4f5b August 15, 2025 14:38
@mag123c (Contributor, Author) commented Aug 15, 2025

@jacob314, I noticed you reviewed PR #4763, which implemented a similar parallelization approach.
This PR follows the same pattern for memory discovery operations.

Would appreciate your insights if you have time to take a look. Thank you!

@jacob314 (Collaborator) left a comment

Thanks for this polish. Appreciate that you used a fairly low limit, so I didn't have to worry too much about excessive numbers of simultaneous promises.

@jacob314 (Collaborator) commented

Please resolve the conflict and then I will approve again and you can land.

@mag123c (Contributor, Author) commented Aug 21, 2025

@jacob314 Thank you for the review!
I've resolved the conflicts and applied the formatting fixes.

@jacob314 enabled auto-merge August 21, 2025 18:11
@jacob314 added this pull request to the merge queue Aug 21, 2025
Merged via the queue into google-gemini:main with commit 1e5ead6 Aug 21, 2025
18 checks passed
thacio added a commit to thacio/auditaria that referenced this pull request Aug 21, 2025
silviojr pushed a commit that referenced this pull request Aug 21, 2025
nandakishorereddy-chundi pushed a commit to nandakishorereddy-chundi/gemini-cli that referenced this pull request Aug 22, 2025
Gosling-dude pushed a commit to Gosling-dude/gemini-cli that referenced this pull request Aug 23, 2025
acoliver referenced this pull request in vybestack/llxprt-code Sep 11, 2025
involvex pushed a commit to involvex/gemini-cli that referenced this pull request Sep 11, 2025
reconsumeralization pushed a commit to reconsumeralization/gemini-cli that referenced this pull request Sep 19, 2025
Development

Linked issue: Parallelize memory discovery file operations for performance improvement

3 participants