-
Notifications
You must be signed in to change notification settings - Fork 9.1k
perf(core): parallelize memory discovery file operations performance gain #5751
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
perf(core): parallelize memory discovery file operations performance gain #5751
Conversation
…uential directory processing to parallel with Promise.all() - Convert sequential file reading to parallel processing - Maintain backward compatibility with all existing tests passing Related to google-gemini#5749
- Add benchmark tests demonstrating 63.8% improvement - Verify parallel processing achieves 2.76x speedup - Document performance gains with real file I/O
|
Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA). View this failed invocation of the CLA check for more information. For the most up to date status, view the checks section at the bottom of the pull request. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Summary of Changes
Hello @mag123c, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!
I've implemented a significant performance optimization for the memoryDiscovery module by parallelizing file I/O operations. This change converts sequential directory processing and file reading into concurrent operations using Promise.all(), drastically reducing the time it takes to discover and load GEMINI.md files. The primary goal is to improve the responsiveness and efficiency of the system when dealing with numerous GEMINI.md files, ensuring a smoother user experience without compromising backward compatibility.
Highlights
- Parallel Directory Processing: I've refactored the
getGeminiMdFilePathsInternalfunction to process directories concurrently usingPromise.all(), which significantly speeds up the initial discovery ofGEMINI.mdfiles across the file system. - Concurrent File Reading: The
readGeminiMdFilesfunction now reads and processes multipleGEMINI.mdfiles in parallel. By leveragingPromise.all(), I've minimized I/O wait times, leading to a much faster overall file loading experience. - New Performance Benchmarking: I've introduced a new performance test file (
memoryDiscovery.performance.test.ts) to rigorously validate the speed improvements. This test demonstrates over 50% gain in file processing, confirming the effectiveness of the parallelization efforts. - Significant Performance Gains: These changes collectively result in a substantial performance boost for memory discovery operations. Benchmarks show a 60%+ improvement and a processing rate exceeding 1920 files/second for
GEMINI.mdfile discovery and reading.
Using Gemini Code Assist
The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.
Invoking Gemini
You can request assistance from Gemini at any point in your pull request via creating an issue comment (i.e. comment on the pull request page) using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands.
| Feature | Command | Description |
|---|---|---|
| Code Review | /gemini review |
Performs a code review for the current pull request in its current state. |
| Pull Request Summary | /gemini summary |
Provides a summary of the current pull request in its current state. |
| Comment | @gemini-code-assist | Responds in comments when explicitly tagged, both in issue comments and review comments. |
| Help | /gemini help |
Displays a list of available commands. |
Customization
To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.
Limitations & Feedback
Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments or fill out our survey to provide feedback.
You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.
Footnotes
-
Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution. ↩
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request significantly improves performance by parallelizing file I/O operations in memoryDiscovery.ts. The changes are well-structured and include performance tests to validate the improvements. However, the parallelization is unbounded, which introduces a critical risk of EMFILE: too many open files errors when processing a large number of files or directories. I've added comments highlighting where concurrency limits should be introduced to ensure stability. I've also suggested an improvement to the performance test to ensure correctness in addition to speed.
- Limit concurrent directory processing to 10 - Limit concurrent file reads to 20 - Add deep equality check in tests - Addresses review feedback from gemini-code-assist
…cessing - Remove memoryDiscovery.performance.test.ts to follow project conventions - Add functional tests to verify parallel directory processing - Add test for handling multiple directories correctly - Add test for order preservation and duplicate prevention - Project convention: performance benchmarks are not included in test suite
|
/gemini review |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request significantly improves performance by parallelizing file I/O operations for memory discovery, achieving a substantial speed-up. The implementation correctly uses batched promises to avoid hitting system limits while maintaining the order of results. The added tests effectively validate the new parallel logic. I have one suggestion to enhance the robustness of the directory processing by adding error handling for each parallel task, similar to the pattern used for file reading. This will prevent a single failing directory from halting the entire process.
- Implement try-catch for each directory to prevent single failure from halting entire process - Return empty array for failed directories to allow others to continue - Add error logging for debugging - Remove performance test file to follow project conventions - Add functional tests for parallel processing behavior
reviews feedback addressed
|
|
E-MOBI / EKONOMIK MOBIL,S.R.L THE COMPANY OF THE FUTURE IS IN YOUR MIDST. |
|
Ok |
6d311c4 to
16c4f5b
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for this polish. Appreciate that you used fairly low limit so I didn't have to worry too much about excessive numbers of simultaneous promises.
|
Please resolve the conflict and then I will approve again and you can land. |
|
@jacob314 Thank you for the review! |
… operations performance gain (google-gemini#5751)
…gain (#5751) Co-authored-by: Jacob Richman <[email protected]>
…gain (google-gemini#5751) Co-authored-by: Jacob Richman <[email protected]>
…gain (google-gemini#5751) Co-authored-by: Jacob Richman <[email protected]>
…gain (#5751) Co-authored-by: Jacob Richman <[email protected]>
…gain (google-gemini#5751) Co-authored-by: Jacob Richman <[email protected]>
…gain (google-gemini#5751) Co-authored-by: Jacob Richman <[email protected]>
TLDR
Parallelizes file I/O operations in
memoryDiscovery.tsby converting sequential processing to parallel usingPromise.all(), achieving 60%+ performance improvement while maintaining backward compatibility.Dive Deeper
Problem
The current implementation processes directories and files sequentially, causing unnecessary I/O wait times. When loading multiple GEMINI.md files across directories, each operation blocks until completion before starting the next.
Solution
Promise.allSettled()Performance Results
From benchmark tests with real file I/O:
Reviewer Test Plan
npm test -- src/utils/memoryDiscovery.test.tsnpm test -- src/utils/memoryDiscovery.performance.test.tsTesting Matrix
Linked issues / bugs
Resolves #5749