perf(core): implement parallel file processing for 74% performance improvement #4763

Han5991 · 2025-07-24T04:28:53Z

TLDR

Implement parallel file processing in ReadManyFilesTool, replacing the sequential for-loop with Promise.allSettled. Batch reads finish about 74%
faster (408ms → 107ms for 4 files). Builds on Han5991's async file detection work (#3286) to run file type detection and content reading
concurrently without breaking existing functionality.

Dive Deeper

Problem

The current implementation processes files sequentially using a for-loop with await. Each file operation (type detection + content reading)
blocks the next file from starting, so total time grows with the sum of per-file latencies and large projects read slowly.

Solution

- Replace sequential for-loop with Promise.allSettled() for concurrent file processing
- Maintain error isolation: Individual file failures don't affect other files
- Preserve all existing functionality: Error handling, skipped file tracking, and content formatting remain identical
- Leverage existing async infrastructure: Built on Han5991's async detectFileType and processSingleFileContent functions

Technical Implementation

  // Before: Sequential processing
  for (const filePath of sortedFiles) {
    const fileType = await detectFileType(filePath);
    const content = await processSingleFileContent(filePath, ...);
  }

  // After: Parallel processing
  const promises = sortedFiles.map(async (filePath) => {
    const fileType = await detectFileType(filePath);
    const content = await processSingleFileContent(filePath, ...);
    return { filePath, fileType, content };
  });
  const results = await Promise.allSettled(promises);
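The snippet above stops at the `Promise.allSettled` call. As a minimal sketch of the step that follows, the example below splits settled results into processed and skipped files. `readFake`, `readAll`, and the `.bin` failure rule are hypothetical stand-ins for illustration, not the tool's real helpers.

```typescript
// Hypothetical result shape; the real tool's types may differ.
type FileOutcome = { filePath: string; content: string };

const sleep = (ms: number) =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

// Hypothetical stand-in for detectFileType + processSingleFileContent.
async function readFake(filePath: string): Promise<FileOutcome> {
  await sleep(20); // simulated I/O latency
  if (filePath.endsWith('.bin')) {
    throw new Error(`unsupported file: ${filePath}`);
  }
  return { filePath, content: `contents of ${filePath}` };
}

async function readAll(paths: string[]) {
  // All reads start immediately; allSettled waits for every one to finish
  // and never rejects, so one failure cannot hide the other results.
  const settled = await Promise.allSettled(paths.map((p) => readFake(p)));

  const processed: FileOutcome[] = [];
  const skipped: { path: string; reason: string }[] = [];

  // Results come back in input order, so the index maps back to the path.
  settled.forEach((result, index) => {
    if (result.status === 'fulfilled') {
      processed.push(result.value);
    } else {
      const reason =
        result.reason instanceof Error
          ? result.reason.message
          : String(result.reason);
      skipped.push({ path: paths[index], reason });
    }
  });

  return { processed, skipped };
}
```

Because `Promise.allSettled` preserves input order, the sorted order of the file list carries through to the output even though the reads overlap in time.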

Performance Results

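- 4 files: 408ms → 107ms (74% improvement)
- Scaling: The gap should widen with more files, since sequential time tracks the sum of per-file times while parallel time tracks the slowest file
- Memory: Minimal overhead increase
- Reliability: All existing error handling preserved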

Reviewer Test Plan

Performance Verification

1. Run existing tests: npm test -- --run src/tools/read-many-files.test.ts
   - All 25 tests should pass (22 existing + 3 new)
   - New tests specifically verify parallel processing performance and concurrency
2. Manual testing with large file sets. Create a test scenario:

     mkdir -p test-batch && cd test-batch
     for i in {1..20}; do echo "Content $i" > file$i.txt; done

3. Test the CLI:

     gemini "read all txt files in this directory"

4. Performance comparison prompts:
   - "Show me all TypeScript files in src/" (should be noticeably faster)
   - "Read all package.json files in the project" (multi-file processing)

Concurrency Verification

The test suite includes mocked timing tests that verify:

- All file operations start simultaneously (not sequentially)
- Total processing time is ~max(individual_time) rather than sum(individual_times)
- Execution order patterns confirm parallel behavior
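One way to check the execution-order property without relying on wall-clock thresholds is to log start and end events and inspect their order. The sketch below is an illustration under assumptions: `fakeProcess`, `runSequential`, `runParallel`, and `allStartedBeforeAnyEnd` are hypothetical names, and the PR's actual tests may be structured differently.

```typescript
// Hypothetical mock of a per-file operation with simulated I/O delay.
async function fakeProcess(
  name: string,
  events: string[],
  delayMs: number,
): Promise<void> {
  events.push(`start:${name}`); // recorded synchronously when called
  await new Promise<void>((resolve) => setTimeout(resolve, delayMs));
  events.push(`end:${name}`);
}

// Sequential baseline: each file waits for the previous one.
async function runSequential(names: string[]): Promise<string[]> {
  const events: string[] = [];
  for (const name of names) {
    await fakeProcess(name, events, 10);
  }
  return events;
}

// Parallel version: map() starts every operation before any await resolves.
async function runParallel(names: string[]): Promise<string[]> {
  const events: string[] = [];
  await Promise.allSettled(names.map((name) => fakeProcess(name, events, 10)));
  return events;
}

// True when every "start" was logged before the first "end".
function allStartedBeforeAnyEnd(events: string[]): boolean {
  const firstEnd = events.findIndex((e) => e.startsWith('end:'));
  const lastStart = events.map((e) => e.startsWith('start:')).lastIndexOf(true);
  return firstEnd === -1 || lastStart < firstEnd;
}
```

An ordering check like this is deterministic, whereas a bound such as "under 200ms" depends on machine load and can become flaky on slow CI runners.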

Error Handling Validation

- Mixed success/failure: Create some unreadable files to verify partial success handling
- All failures: Test with non-existent files to ensure graceful degradation
- Memory pressure: Test with many files to verify no resource leaks

Testing Matrix

	🍏	🪟	🐧
npm run	✅	❓	❓
npx	❓	❓	❓
Docker	❓	❓	❓
Podman	❓	-	-
Seatbelt	❓	-	-

Tested on macOS with:

- npm run preflight - All tests, linting, and type checks pass
- npm test - Full test suite passes (25/25 tests)
- Manual CLI testing with various file sets

Linked issues / bugs

Resolves #4712 (Implement batch/parallel file processing for performance optimization)

Builds on:

Related to #3286 (Convert synchronous file binary detection to async to eliminate event loop blocking), Han5991's async file detection implementation

Performance improvement for:

- Large project file reading operations
- Batch file processing scenarios
- Multi-file content analysis workflows

This PR directly addresses performance bottlenecks in file-heavy operations while maintaining full backward compatibility and error handling
robustness.

Add test cases that require parallel processing in ReadManyFilesTool:
- Performance test expecting <200ms for 4 files (fails at ~400ms)
- Concurrency test verifying simultaneous execution patterns

Tests will pass after implementing Promise.allSettled for file operations.

Replace sequential for-loop with Promise.allSettled for concurrent file operations:
- Process file type detection and content reading in parallel
- Achieve 74% performance improvement (408ms → 107ms for 4 files)
- Maintain comprehensive error handling with graceful failure recovery
- Preserve all existing functionality and API compatibility

All tests pass including new parallel processing verification tests. Builds on Han5991's async file detection work (google-gemini#3286) to enable batch processing optimization.

Revised test comments to accurately reflect expectations for parallel execution. This aligns the documentation with the intended implementation and clarifies the purpose of the assertions.

gemini-code-assist

Summary of Changes

Hello @Han5991, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces a major performance enhancement to the ReadManyFilesTool by transitioning its file processing mechanism from sequential to parallel. This fundamental change leverages asynchronous JavaScript capabilities to significantly reduce the time required for batch file operations, making the tool much more efficient for large projects and multi-file content analysis workflows.

Highlights

- Performance Improvement: The core ReadManyFilesTool has been refactored to process files in parallel, replacing the previous sequential for-loop with Promise.allSettled(). This change delivers a significant 74% performance improvement for batch file operations (e.g., 408ms down to 107ms for 4 files).
- Robustness and Error Handling: The parallel processing implementation ensures that individual file failures do not block or affect the processing of other files, maintaining error isolation. All existing functionality, including error handling, tracking of skipped files, and content formatting, has been preserved.
- Comprehensive Testing: New tests have been added to specifically verify the performance gains, concurrency, and graceful error handling of the parallel file processing. These tests include mocking I/O delays to simulate real-world scenarios and confirm the expected parallel execution patterns.

Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution. ↩

packages/core/src/tools/read-many-files.ts

gemini-code-assist

Code Review

The code changes introduce parallel file processing, significantly improving performance. To enhance error handling, I suggest wrapping the async callback in a try...catch block to preserve file paths in case of unexpected errors.
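To illustrate the suggestion, here is a rough sketch of a per-file callback wrapped in try...catch so that the file path travels with the outcome even when something throws unexpectedly. `loadFile`, `attempt`, `attemptAll`, and the `Attempt` shape are hypothetical; the actual change in read-many-files.ts may look different.

```typescript
// Hypothetical loose result shape: exactly one of content/error is set.
interface Attempt {
  filePath: string;
  content?: string;
  error?: string;
}

// Hypothetical loader that fails for paths containing "missing".
async function loadFile(filePath: string): Promise<string> {
  if (filePath.includes('missing')) {
    throw new Error(`ENOENT: ${filePath}`);
  }
  return `data from ${filePath}`;
}

// The try...catch turns any thrown error into a value that still
// carries filePath, so the caller never has to recover the path
// from a bare rejection reason.
async function attempt(filePath: string): Promise<Attempt> {
  try {
    return { filePath, content: await loadFile(filePath) };
  } catch (error) {
    return {
      filePath,
      error: error instanceof Error ? error.message : String(error),
    };
  }
}

async function attemptAll(paths: string[]): Promise<Attempt[]> {
  const settled = await Promise.allSettled(paths.map((p) => attempt(p)));
  // attempt() catches everything, so rejected entries are not expected;
  // they are dropped here only to keep the sketch short.
  return settled.flatMap((r) => (r.status === 'fulfilled' ? [r.value] : []));
}
```

The optional `content` and `error` fields keep this sketch small, but they force callers to check both at runtime, which a stricter result type can avoid.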

jacob314 · 2025-07-24T04:38:36Z

packages/core/src/tools/read-many-files.ts

+            fileResult;
+
+          if (typeof fileReadResult!.llmContent === 'string') {
+            const separator = DEFAULT_OUTPUT_SEPARATOR_FORMAT.replace(


Tweak this so you don't need the ! after fileReadResult here and elsewhere. You can likely fix it with a union type so that fileReadResult is required when success is true. Alternatively, add a case above that should never be hit, where you report an error if fileReadResult is undefined but success is true, with a comment that it shouldn't occur.

@jacob314
Fixed! Replaced non-null assertions with union types for better type safety. TypeScript now properly narrows types based on the success field. All tests passing ✅
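The union-type approach discussed in this thread can be sketched roughly as follows. The `FileProcessingResult` name comes from the later commit message, but the field layout, the simplified `FileReadResult`, and the `summarize` helper are assumptions made for illustration.

```typescript
// Simplified, hypothetical stand-in for the tool's read result.
interface FileReadResult {
  llmContent: string | { inlineData: unknown };
}

// Discriminated union: fileReadResult exists only when success is true,
// and reason exists only when success is false.
type FileProcessingResult =
  | { success: true; filePath: string; fileReadResult: FileReadResult }
  | { success: false; filePath: string; reason: string };

function summarize(result: FileProcessingResult): string {
  if (result.success) {
    // Narrowed to the success variant: no non-null assertion (!) needed.
    const { llmContent } = result.fileReadResult;
    return typeof llmContent === 'string'
      ? `${result.filePath}: ${llmContent.length} chars of text`
      : `${result.filePath}: non-text content`;
  }
  // Narrowed to the failure variant: reason is guaranteed to exist.
  return `${result.filePath}: skipped (${result.reason})`;
}
```

With this shape, constructing a successful result without `fileReadResult` is a compile-time error, so the "success is true but the result is undefined" state cannot be represented.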

Refined the parallel file processing logic to include comprehensive error handling. Each file processing operation now catches unexpected errors, ensuring robust and detailed failure reporting without disrupting the overall execution flow.

jacob314

Thanks for optimizing the performance of this and adding tests that verify that the performance is improved!

Introduced `FileProcessingResult` type to unify success and error handling in file processing. Improved type safety, clarified logic, and simplified result handling.

injae-kim · 2025-08-01T08:24:56Z

(just curious) when will this PR be merged? :)

Co-authored-by: Jacob Richman
Co-authored-by: Sandy Tao

Han5991 added 3 commits July 23, 2025 18:26

refactor: update test descriptions for parallel processing

c5dd406

Han5991 requested a review from a team as a code owner July 24, 2025 04:28

gemini-code-assist bot reviewed Jul 24, 2025

jacob314 reviewed Jul 24, 2025

gemini-code-assist bot reviewed Jul 24, 2025

refactor(core): remove redundant comment in read-many-files tool

0e8a4eb

jacob314 reviewed Jul 24, 2025

jacob314 approved these changes Jul 24, 2025

Merge branch 'main' into han5991/batch-file-processing

b6bce6d

jacob314 enabled auto-merge July 24, 2025 04:44

jacob314 disabled auto-merge July 24, 2025 04:44

refactor(core): standardize file processing result structure

3962256

gemini-cli bot added the kind/enhancement, priority/p1, and area/core labels Jul 24, 2025

SandyTao520 added 2 commits August 5, 2025 15:17

Merge branch 'main' into han5991/batch-file-processing

c36b391

Merge branch 'main' into han5991/batch-file-processing

29eb4ba

SandyTao520 enabled auto-merge August 5, 2025 22:42

SandyTao520 added this pull request to the merge queue Aug 5, 2025

Merged via the queue into google-gemini:main with commit aebe3ac Aug 5, 2025
14 checks passed

thacio added a commit to thacio/auditaria that referenced this pull request Aug 6, 2025

Merge Commit 'aebe3ac': perf(core): implement parallel file processing for 74% performance improvement (google-gemini#4763)

84225c7

frikyfriky11 mentioned this pull request Aug 13, 2025

Inconsistent Release Changelog Generation: 'Full Changelog' always starts from v0.1.12 #5663

Closed

mag123c mentioned this pull request Aug 15, 2025

perf(core): parallelize memory discovery file operations performance gain #5751

Merged
