Contributor

@Han5991 Han5991 commented Jul 24, 2025

TLDR

Implement parallel file processing in ReadManyFilesTool using Promise.allSettled to replace the sequential for-loop, achieving a 74% performance improvement (408ms → 107ms for batch operations). Builds on Han5991's async file detection work (#3286) to enable concurrent file type detection and content reading without breaking existing functionality.

Dive Deeper

Problem

The current implementation processes files sequentially using a for-loop with await, which causes unnecessary waiting when processing multiple files.
Each file operation (type detection + content reading) blocks the next file from starting, leading to poor performance on large projects.

Solution

  • Replace sequential for-loop with Promise.allSettled() for concurrent file processing
  • Maintain error isolation: Individual file failures don't affect other files
  • Preserve all existing functionality: Error handling, skipped file tracking, and content formatting remain identical
  • Leverage existing async infrastructure: Built on Han5991's async detectFileType and processSingleFileContent functions

Technical Implementation

  // Before: Sequential processing
  for (const filePath of sortedFiles) {
    const fileType = await detectFileType(filePath);
    const content = await processSingleFileContent(filePath, ...);
  }

  // After: Parallel processing
  const promises = sortedFiles.map(async (filePath) => {
    const fileType = await detectFileType(filePath);
    const content = await processSingleFileContent(filePath, ...);
    return { filePath, fileType, content };
  });
  const results = await Promise.allSettled(promises);
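
A minimal sketch of how the settled results might then be consumed. It assumes hypothetical stand-ins (`readOne`, `ProcessedFile`, `readMany`) rather than the tool's real `detectFileType` and `processSingleFileContent`. `Promise.allSettled` returns results in input order, so the sorted order survives, and a rejection for one path does not discard the others.

```typescript
// Hypothetical stand-in for the per-file work; not the tool's real API.
interface ProcessedFile {
  filePath: string;
  content: string;
}

async function readOne(filePath: string): Promise<ProcessedFile> {
  if (filePath.endsWith(".missing")) {
    throw new Error(`cannot read ${filePath}`);
  }
  return { filePath, content: `contents of ${filePath}` };
}

async function readMany(paths: string[]) {
  const settled = await Promise.allSettled(paths.map((p) => readOne(p)));
  const processed: ProcessedFile[] = [];
  const skipped: { filePath: string; reason: string }[] = [];

  // allSettled keeps input order, so settled[i] belongs to paths[i].
  settled.forEach((result, i) => {
    if (result.status === "fulfilled") {
      processed.push(result.value);
    } else {
      skipped.push({ filePath: paths[i], reason: String(result.reason) });
    }
  });
  return { processed, skipped };
}
```

A rejected entry carries only a `reason`, which is why the sketch recovers the path from the input array by index.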

Performance Results

  • 4 files: 408ms → 107ms (74% improvement)
  • Scaling: Performance gap increases with more files
  • Memory: Minimal overhead increase
  • Reliability: All existing error handling preserved

Reviewer Test Plan

Performance Verification

  1. Run existing tests: npm test -- --run src/tools/read-many-files.test.ts
    • All 25 tests should pass (22 existing + 3 new)
    • New tests specifically verify parallel processing performance and concurrency
  2. Manual testing with large file sets:

       # Create test scenario
       mkdir -p test-batch && cd test-batch
       for i in {1..20}; do echo "Content $i" > file$i.txt; done

       # Test the CLI
       gemini "read all txt files in this directory"

  3. Performance comparison prompts:
    • "Show me all TypeScript files in src/" (should be noticeably faster)
    • "Read all package.json files in the project" (multi-file processing)

Concurrency Verification

The test suite includes mocked timing tests that verify:

  • All file operations start simultaneously (not sequentially)
  • Total processing time is ~max(individual_time) rather than sum(individual_times)
  • Execution order patterns confirm parallel behavior
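
The max-versus-sum claim can be illustrated with a small timing sketch. The `delay` helper below is a hypothetical mock for file I/O and is not taken from the PR's test suite; the function names are likewise assumptions.

```typescript
// Hypothetical mocked I/O: resolves after `ms` milliseconds.
const delay = (ms: number): Promise<void> =>
  new Promise((resolve) => setTimeout(resolve, ms));

// Sequential: each await blocks the next, so elapsed time is about sum(delays).
async function runSequential(delays: number[]): Promise<void> {
  for (const ms of delays) {
    await delay(ms);
  }
}

// Concurrent: all timers start at once, so elapsed time is about max(delays).
async function runConcurrent(delays: number[]): Promise<void> {
  await Promise.allSettled(delays.map((ms) => delay(ms)));
}

// Wall-clock duration of one async run, in milliseconds.
async function elapsed(run: () => Promise<void>): Promise<number> {
  const start = Date.now();
  await run();
  return Date.now() - start;
}
```

With delays of 20, 30, 40 and 50 ms, the sequential run should take roughly 140 ms and the concurrent run roughly 50 ms. Real timers add jitter, so a test is better off asserting a generous bound than an exact figure.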

Error Handling Validation

  • Mixed success/failure: Create some unreadable files to verify partial success handling
  • All failures: Test with non-existent files to ensure graceful degradation
  • Memory pressure: Test with many files to verify no resource leaks

Testing Matrix

|          | 🍏 | 🪟 | 🐧 |
| -------- | --- | --- | --- |
| npm run  |     |     |     |
| npx      |     |     |     |
| Docker   |     |     |     |
| Podman   |     | -   | -   |
| Seatbelt |     | -   | -   |

Tested on macOS with:

  • npm run preflight - All tests, linting, and type checks pass
  • npm test - Full test suite passes (25/25 tests)
  • Manual CLI testing with various file sets

Linked issues / bugs

Builds on:

Performance improvement for:

  • Large project file reading operations
  • Batch file processing scenarios
  • Multi-file content analysis workflows

This PR directly addresses performance bottlenecks in file-heavy operations while maintaining full backward compatibility and error handling robustness.

Han5991 added 3 commits July 23, 2025 18:26
  Add test cases that require parallel processing in ReadManyFilesTool:
  - Performance test expecting <200ms for 4 files (fails at ~400ms)
  - Concurrency test verifying simultaneous execution patterns

  Tests will pass after implementing Promise.allSettled for file operations.

  Replace sequential for-loop with Promise.allSettled for concurrent file operations:
  - Process file type detection and content reading in parallel
  - Achieve 74% performance improvement (408ms → 107ms for 4 files)
  - Maintain comprehensive error handling with graceful failure recovery
  - Preserve all existing functionality and API compatibility

  All tests pass including new parallel processing verification tests.

  Builds on Han5991's async file detection work (google-gemini#3286) to enable batch processing optimization.

Revised test comments to accurately reflect expectations for parallel execution. This aligns the documentation with the intended implementation and clarifies the purpose of the assertions.
@Han5991 Han5991 requested a review from a team as a code owner July 24, 2025 04:28
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Summary of Changes

Hello @Han5991, I'm Gemini Code Assist[1]! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces a major performance enhancement to the ReadManyFilesTool by transitioning its file processing mechanism from sequential to parallel. This fundamental change leverages asynchronous JavaScript capabilities to significantly reduce the time required for batch file operations, making the tool much more efficient for large projects and multi-file content analysis workflows.

Highlights

  • Performance Improvement: The core ReadManyFilesTool has been refactored to process files in parallel, replacing the previous sequential for-loop with Promise.allSettled(). This change delivers a significant 74% performance improvement for batch file operations (e.g., 408ms down to 107ms for 4 files).
  • Robustness and Error Handling: The parallel processing implementation ensures that individual file failures do not block or affect the processing of other files, maintaining error isolation. All existing functionality, including error handling, tracking of skipped files, and content formatting, has been preserved.
  • Comprehensive Testing: New tests have been added to specifically verify the performance gains, concurrency, and graceful error handling of the parallel file processing. These tests include mocking I/O delays to simulate real-world scenarios and confirm the expected parallel execution patterns.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

The code changes introduce parallel file processing, significantly improving performance. To enhance error handling, I suggest wrapping the async callback in a try...catch block to preserve file paths in case of unexpected errors.
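
A rough sketch of that suggestion, assuming a hypothetical `processFile` helper and `FileOutcome` shape (the PR's actual code may differ). Catching inside the async callback turns an unexpected throw into a value, so every outcome still carries its `filePath`.

```typescript
// Hypothetical result shape, for illustration only.
interface FileOutcome {
  filePath: string;
  success: boolean;
  content?: string;
  error?: string;
}

// Hypothetical per-file work that may throw unexpectedly.
async function processFile(filePath: string): Promise<string> {
  if (filePath.includes("bad")) {
    throw new Error("unexpected failure");
  }
  return `contents of ${filePath}`;
}

async function processAll(paths: string[]): Promise<FileOutcome[]> {
  const settled = await Promise.allSettled(
    paths.map(async (filePath): Promise<FileOutcome> => {
      try {
        return { filePath, success: true, content: await processFile(filePath) };
      } catch (err) {
        // Convert the throw into a value so the path stays attached.
        const error = err instanceof Error ? err.message : String(err);
        return { filePath, success: false, error };
      }
    }),
  );
  // Each callback catches its own errors, so every entry is fulfilled;
  // flatMap only narrows the type here.
  return settled.flatMap((r) => (r.status === "fulfilled" ? [r.value] : []));
}
```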

fileResult;

if (typeof fileReadResult!.llmContent === 'string') {
const separator = DEFAULT_OUTPUT_SEPARATOR_FORMAT.replace(
Collaborator

@jacob314 jacob314 Jul 24, 2025

Tweak this so you don't need the ! after fileReadResult here and elsewhere. You can likely fix it with a union type so that fileReadResult is required when success is true. Alternatively, add a case above that should never be hit, where you report an error if fileReadResult is undefined but success is true, with a comment that it shouldn't occur.

Contributor Author

@jacob314
Fixed! Replaced non-null assertions with union types for better type safety. TypeScript now properly narrows types based on the success field. All tests passing ✅
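
For readers unfamiliar with the pattern, here is a minimal sketch of a discriminated union keyed on `success`. The field layout, the `reason` field, and the `summarize` helper are assumptions for illustration, and `llmContent` is assumed to be a string; the PR's real types may differ.

```typescript
// Hypothetical shapes for illustration; the PR's real types may differ.
interface FileReadResult {
  llmContent: string;
}

type FileProcessingResult =
  | { success: true; filePath: string; fileReadResult: FileReadResult }
  | { success: false; filePath: string; reason: string };

function summarize(result: FileProcessingResult): string {
  if (result.success) {
    // Narrowed to the success variant: fileReadResult is required, no `!` needed.
    return `${result.filePath}: ${result.fileReadResult.llmContent.length} chars`;
  }
  // Narrowed to the failure variant: only `reason` is available here.
  return `${result.filePath}: skipped (${result.reason})`;
}
```

Because `fileReadResult` exists only on the `success: true` variant, the compiler rejects any access to it before `success` has been checked.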

Refined the parallel file processing logic to include comprehensive error handling. Each file processing operation now catches unexpected errors, ensuring robust and detailed failure reporting without disrupting the overall execution flow.
Collaborator

@jacob314 jacob314 left a comment

Thanks for optimizing the performance of this and adding tests that verify that the performance is improved! lgtm

@jacob314 jacob314 enabled auto-merge July 24, 2025 04:44
@jacob314 jacob314 disabled auto-merge July 24, 2025 04:44
Introduced `FileProcessingResult` type to unify success and error handling in file processing. Improved type safety, clarified logic, and simplified result handling.
@gemini-cli gemini-cli bot added kind/enhancement priority/p1 Important and should be addressed in the near term. area/core Issues related to User Interface, OS Support, Core Functionality labels Jul 24, 2025
@injae-kim

(just curious) When will this PR be merged? :)

@SandyTao520 SandyTao520 enabled auto-merge August 5, 2025 22:42
@SandyTao520 SandyTao520 added this pull request to the merge queue Aug 5, 2025
Merged via the queue into google-gemini:main with commit aebe3ac Aug 5, 2025
14 checks passed
thacio added a commit to thacio/auditaria that referenced this pull request Aug 6, 2025
agmsb pushed a commit to agmsb/gemini-cli that referenced this pull request Aug 6, 2025
JunYang-tes pushed a commit to JunYang-tes/gemini-cli.nvim that referenced this pull request Aug 9, 2025
JeongJaeSoon pushed a commit to JeongJaeSoon/gemini-cli that referenced this pull request Aug 21, 2025
involvex pushed a commit to involvex/gemini-cli that referenced this pull request Sep 11, 2025
reconsumeralization pushed a commit to reconsumeralization/gemini-cli that referenced this pull request Sep 19, 2025

Development

Successfully merging this pull request may close these issues.

Implement batch/parallel file processing for performance optimization

4 participants