perf(core): implement parallel file processing for 74% performance improvement #4763
Add test cases that require parallel processing in ReadManyFilesTool:
- Performance test expecting <200ms for 4 files (fails at ~400ms)
- Concurrency test verifying simultaneous execution patterns
Tests will pass after implementing Promise.allSettled for file operations.
Replace sequential for-loop with Promise.allSettled for concurrent file operations:
- Process file type detection and content reading in parallel
- Achieve 74% performance improvement (408ms → 107ms for 4 files)
- Maintain comprehensive error handling with graceful failure recovery
- Preserve all existing functionality and API compatibility
All tests pass including new parallel processing verification tests. Builds on Han5991's async file detection work (google-gemini#3286) to enable batch processing optimization.
Revised test comments to accurately reflect expectations for parallel execution. This aligns the documentation with the intended implementation and clarifies the purpose of the assertions.
Summary of Changes
Hello @Han5991, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!
This pull request introduces a major performance enhancement to the ReadManyFilesTool by transitioning its file processing mechanism from sequential to parallel. This fundamental change leverages asynchronous JavaScript capabilities to significantly reduce the time required for batch file operations, making the tool much more efficient for large projects and multi-file content analysis workflows.
Highlights
- Performance Improvement: The core ReadManyFilesTool has been refactored to process files in parallel, replacing the previous sequential for-loop with Promise.allSettled(). This change delivers a significant 74% performance improvement for batch file operations (e.g., 408ms down to 107ms for 4 files).
- Robustness and Error Handling: The parallel processing implementation ensures that individual file failures do not block or affect the processing of other files, maintaining error isolation. All existing functionality, including error handling, tracking of skipped files, and content formatting, has been preserved.
- Comprehensive Testing: New tests have been added to specifically verify the performance gains, concurrency, and graceful error handling of the parallel file processing. These tests include mocking I/O delays to simulate real-world scenarios and confirm the expected parallel execution patterns.
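To illustrate the kind of mocked-delay check described in the last highlight, here is a minimal sketch. It is not the PR's actual test code: `mockedRead`, `measureConcurrentRead`, and the 50ms delay are hypothetical names and values chosen for illustration. It counts how many mocked reads are in flight at once and how long the whole batch takes.

```typescript
// Hypothetical delay standing in for real file I/O latency.
const DELAY_MS = 50;

interface Measurement {
  elapsedMs: number; // wall-clock time for the whole batch
  peakInFlight: number; // highest number of reads running at the same time
}

// Runs one mocked read per path concurrently and records timing and overlap.
async function measureConcurrentRead(paths: string[]): Promise<Measurement> {
  let inFlight = 0;
  let peakInFlight = 0;

  // Hypothetical mocked read: bumps the in-flight counter, waits, then returns.
  const mockedRead = async (path: string): Promise<string> => {
    inFlight++;
    peakInFlight = Math.max(peakInFlight, inFlight);
    await new Promise<void>((resolve) => setTimeout(resolve, DELAY_MS));
    inFlight--;
    return `content of ${path}`;
  };

  const start = Date.now();
  await Promise.allSettled(paths.map((path) => mockedRead(path)));
  return { elapsedMs: Date.now() - start, peakInFlight };
}
```

With four paths, a sequential loop would need roughly four delays back to back, while the concurrent version should finish in about one delay with all four reads overlapping.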
Code Review
The code changes introduce parallel file processing, significantly improving performance. To enhance error handling, I suggest wrapping the async callback in a try...catch block to preserve file paths in case of unexpected errors.
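A rough sketch of what that suggestion could look like, assuming a hypothetical `readOne` helper and `Outcome` type (neither is taken from the PR). Because the `try...catch` sits inside the mapped callback, each failure is converted into a value that still carries its `filePath`, rather than a bare rejection reason.

```typescript
// Hypothetical result shape for this sketch; not the PR's actual type.
type Outcome =
  | { ok: true; filePath: string; content: string }
  | { ok: false; filePath: string; reason: string };

// Hypothetical reader that throws for some inputs to simulate an unexpected error.
async function readOne(filePath: string): Promise<string> {
  if (filePath.includes('missing')) {
    throw new Error('file not found');
  }
  return `content of ${filePath}`;
}

async function readAll(filePaths: string[]): Promise<Outcome[]> {
  const settled = await Promise.allSettled(
    filePaths.map(async (filePath): Promise<Outcome> => {
      try {
        return { ok: true, filePath, content: await readOne(filePath) };
      } catch (error) {
        // The path is still in scope here, so the failure stays attributable.
        const reason = error instanceof Error ? error.message : String(error);
        return { ok: false, filePath, reason };
      }
    }),
  );

  // The callback catches its own errors, so the rejected branch is defensive only.
  return settled.map((result, index): Outcome =>
    result.status === 'fulfilled'
      ? result.value
      : { ok: false, filePath: filePaths[index], reason: String(result.reason) },
  );
}
```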
fileResult;
if (typeof fileReadResult!.llmContent === 'string') {
const separator = DEFAULT_OUTPUT_SEPARATOR_FORMAT.replace(
Tweak this so you don't need the ! after fileReadResult here and elsewhere. You can likely fix it with a union type so that fileReadResult is required when success is true. Alternatively, add a case above that should never be hit, where you report an error if fileReadResult is undefined but success is true, with a comment that it shouldn't occur.
@jacob314
Fixed! Replaced non-null assertions with union types for better type safety. TypeScript now properly narrows types based on the success field. All tests passing ✅
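For readers unfamiliar with the pattern, here is a minimal sketch of a union type discriminated on `success`. The field names other than `success`, `fileReadResult`, and `llmContent`, and the `summarize` helper, are assumptions for illustration; the PR's actual `FileProcessingResult` definition may differ.

```typescript
// Hypothetical content shape; the real llmContent type in the codebase may differ.
interface ReadResult {
  llmContent: string | { mimeType: string };
}

// fileReadResult exists only on the success variant, so no `!` is needed after narrowing.
type FileProcessingResult =
  | { success: true; filePath: string; fileReadResult: ReadResult }
  | { success: false; filePath: string; reason: string };

function summarize(result: FileProcessingResult): string {
  if (result.success) {
    // Narrowed to the success variant: fileReadResult is guaranteed to be present.
    const { llmContent } = result.fileReadResult;
    return typeof llmContent === 'string'
      ? `${result.filePath}: ${llmContent.length} chars`
      : `${result.filePath}: non-text content`;
  }
  // Narrowed to the failure variant: only reason is available here.
  return `${result.filePath}: skipped (${result.reason})`;
}
```

Accessing `result.fileReadResult` in the failure branch would be a compile-time error, which is the safety the non-null assertion gave up.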
Refined the parallel file processing logic to include comprehensive error handling. Each file processing operation now catches unexpected errors, ensuring robust and detailed failure reporting without disrupting the overall execution flow.
Introduced `FileProcessingResult` type to unify success and error handling in file processing. Improved type safety, clarified logic, and simplified result handling.
(Just curious) When will this PR be merged? :)
…g for 74% performance improvement (google-gemini#4763)
…provement (google-gemini#4763) Co-authored-by: Jacob Richman <[email protected]> Co-authored-by: Sandy Tao <[email protected]>

TLDR
Implement parallel file processing in ReadManyFilesTool using Promise.allSettled to replace sequential for-loop, achieving 74% performance
improvement (408ms → 107ms for batch operations). Builds on Han5991's async file detection work (#3286) to enable concurrent file type detection
and content reading without breaking existing functionality.
Dive Deeper
Problem
The current implementation processes files sequentially using a for-loop with await, causing unnecessary waiting when processing multiple files.
Each file operation (type detection + content reading) blocks the next file from starting, which leads to poor performance on large projects.
Solution
- Promise.allSettled() for concurrent file processing
- Concurrent use of the detectFileType and processSingleFileContent functions
Technical Implementation
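The change from a sequential loop to `Promise.allSettled()` can be sketched roughly as follows. This is an illustration under stated assumptions, not the PR's code: `processFile` is a hypothetical stand-in for the per-file work (in the real tool, `detectFileType` followed by `processSingleFileContent`), and the `.bad` suffix is an arbitrary way to simulate a failing file.

```typescript
// Hypothetical stand-in for per-file work (type detection + content reading).
async function processFile(path: string, delayMs: number): Promise<string> {
  await new Promise<void>((resolve) => setTimeout(resolve, delayMs));
  if (path.endsWith('.bad')) {
    throw new Error(`cannot read ${path}`);
  }
  return `content of ${path}`;
}

interface BatchResult {
  contents: string[];
  skipped: string[];
}

// Before: each await blocks the next file from starting.
async function readSequentially(paths: string[], delayMs: number): Promise<BatchResult> {
  const batch: BatchResult = { contents: [], skipped: [] };
  for (const path of paths) {
    try {
      batch.contents.push(await processFile(path, delayMs));
    } catch {
      batch.skipped.push(path);
    }
  }
  return batch;
}

// After: all files start at once; allSettled waits for every outcome,
// so one failure does not discard the results of the others.
async function readConcurrently(paths: string[], delayMs: number): Promise<BatchResult> {
  const settled = await Promise.allSettled(paths.map((path) => processFile(path, delayMs)));
  const batch: BatchResult = { contents: [], skipped: [] };
  settled.forEach((result, index) => {
    if (result.status === 'fulfilled') {
      batch.contents.push(result.value);
    } else {
      batch.skipped.push(paths[index]);
    }
  });
  return batch;
}
```

Both functions return the same contents in the same order, since `Promise.allSettled()` preserves input order; only the total waiting time differs.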
Performance Results
Reviewer Test Plan
Performance Verification
Create test scenario
mkdir -p test-batch && cd test-batch
for i in {1..20}; do echo "Content $i" > file$i.txt; done
Test the CLI
gemini "read all txt files in this directory"
4. Performance comparison prompts:
- "Show me all TypeScript files in src/" (should be noticeably faster)
- "Read all package.json files in the project" (multi-file processing)
Concurrency Verification
The test suite includes mocked timing tests that verify:
Error Handling Validation
Testing Matrix
Tested on macOS with:
Linked issues / bugs
Builds on:
Performance improvement for:
This PR directly addresses performance bottlenecks in file-heavy operations while maintaining full backward compatibility and error handling
robustness.