
removed all images except the latest two in AnthropicCuaClient #905


Merged
tkattkat merged 3 commits into main on Jul 22, 2025

Conversation

tkattkat
Collaborator

why

part of STG-586

Currently, every screenshot taken by the Anthropic CUA client stays in the LLM's context as the task progresses.

what changed

We now remove all screenshots except the last two when the experimental flag is set to true in the Stagehand config.

test plan

tested locally

changeset-bot bot commented Jul 22, 2025

🦋 Changeset detected

Latest commit: 63b38b5

The changes in this PR will be included in the next version bump.

Contributor

@greptile-apps greptile-apps bot left a comment

Greptile Summary

This PR implements an image compression optimization for the Anthropic CUA client to address memory and token usage concerns during long-running agent tasks. The change introduces an experimental feature that removes older screenshots from the conversation history while preserving the most recent two images.

The implementation adds an experimental flag that flows through the entire system architecture:

  • From the main Stagehand configuration (lib/index.ts)
  • Through the agent handler (lib/handlers/agentHandler.ts)
  • To the AgentProvider (lib/agent/AgentProvider.ts)
  • Finally reaching the AnthropicCUAClient (lib/agent/AnthropicCUAClient.ts)
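The four-layer flow above can be pictured with a minimal sketch. Every class and method name below is a hypothetical stand-in (suffixed `Sketch`), not the real Stagehand code; the only detail taken from this PR is that a boolean experimental flag is passed from the top-level config down to the CUA client.

```typescript
// Hypothetical, simplified stand-ins for the four layers listed above.
// None of these names or signatures are taken from the Stagehand source.
interface StagehandConfigSketch {
  experimental?: boolean;
}

class CuaClientSketch {
  private readonly experimental: boolean;

  constructor(experimental: boolean) {
    this.experimental = experimental;
  }

  // The client consults the flag before trimming old screenshots.
  shouldCompressImages(): boolean {
    return this.experimental;
  }
}

class AgentProviderSketch {
  // The provider forwards the flag when it constructs the client.
  getClient(experimental: boolean): CuaClientSketch {
    return new CuaClientSketch(experimental);
  }
}

class AgentHandlerSketch {
  readonly client: CuaClientSketch;

  constructor(experimental: boolean) {
    this.client = new AgentProviderSketch().getClient(experimental);
  }
}

class StagehandSketch {
  readonly handler: AgentHandlerSketch;

  constructor(config: StagehandConfigSketch = {}) {
    // Assumption: the flag defaults to false so existing users are unaffected.
    this.handler = new AgentHandlerSketch(config.experimental ?? false);
  }
}
```

Passing the flag explicitly through each constructor keeps the behaviour opt-in at a single point, the top-level config, which matches the "safe rollout" intent described in this review.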

When enabled, the system uses a new utility function (lib/agent/imageCompressionUtils.ts) that identifies conversation items containing images and replaces older screenshots with the text placeholder "screenshot taken". This compression happens before each new assistant message is added to the conversation history.
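As a rough illustration of the replacement step described above, the sketch below walks the history from newest to oldest, keeps the two most recent image-bearing messages, and swaps images in older ones for the "screenshot taken" placeholder. The message and block types are simplified assumptions loosely modeled on Anthropic-style content blocks, and `hasImage`, `stripImages`, and `compressImages` are hypothetical names; the actual code in lib/agent/imageCompressionUtils.ts may differ.

```typescript
// Assumed, simplified message shapes (not the real Stagehand or Anthropic SDK types).
type TextBlock = { type: "text"; text: string };
type ImageBlock = {
  type: "image";
  source: { type: "base64"; media_type: string; data: string };
};
type ToolResultBlock = {
  type: "tool_result";
  tool_use_id: string;
  content: string | Array<TextBlock | ImageBlock>;
};
type ContentBlock = TextBlock | ImageBlock | ToolResultBlock;
type Message = { role: "user" | "assistant"; content: string | ContentBlock[] };

const PLACEHOLDER = "screenshot taken";

// True if the message carries an image, either directly or inside a tool result.
function hasImage(message: Message): boolean {
  if (typeof message.content === "string") return false;
  return message.content.some(
    (block) =>
      block.type === "image" ||
      (block.type === "tool_result" &&
        Array.isArray(block.content) &&
        block.content.some((inner) => inner.type === "image")),
  );
}

function replaceImages(blocks: Array<TextBlock | ImageBlock>): TextBlock[] {
  return blocks.map((b): TextBlock =>
    b.type === "image" ? { type: "text", text: PLACEHOLDER } : b,
  );
}

// Returns a copy of the message with every image swapped for the placeholder.
function stripImages(message: Message): Message {
  if (typeof message.content === "string") return message;
  const content = message.content.map((block): ContentBlock => {
    if (block.type === "image") return { type: "text", text: PLACEHOLDER };
    if (block.type === "tool_result" && Array.isArray(block.content)) {
      return { ...block, content: replaceImages(block.content) };
    }
    return block;
  });
  return { ...message, content };
}

// Keeps images only in the `keep` most recent image-bearing messages.
function compressImages(messages: Message[], keep = 2): Message[] {
  let seen = 0;
  return [...messages]
    .reverse() // newest first, so the most recent screenshots are counted first
    .map((m) => {
      if (!hasImage(m)) return m;
      seen += 1;
      return seen > keep ? stripImages(m) : m;
    })
    .reverse(); // restore chronological order
}
```

This version returns a new array rather than mutating the history in place; whether the real utility mutates or copies is not stated in this PR, so treat that as a design choice of the sketch.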

The feature addresses STG-586, where the Anthropic CUA client was accumulating all screenshots throughout task progression, potentially causing context window overflow and excessive token consumption. By keeping only the two most recent screenshots, the system maintains relevant visual context while significantly reducing memory footprint. The experimental flag provides a safe rollout mechanism for testing this optimization without affecting existing users.

Confidence score: 4/5

  • This PR appears safe to merge with proper testing, implementing a well-architected experimental feature for memory optimization
  • The score reflects the experimental nature of the feature and potential edge cases around image content handling that should be monitored
  • The imageCompressionUtils.ts file needs attention for robust error handling and edge case management

6 files reviewed, 3 comments


@miguelg719 miguelg719 added the parity To note required feature parity in client SDKs label Jul 22, 2025
@stagehand-parity-bot

🔄 Feature Parity Issue Created

An issue has been automatically created in the Python SDK repository to track parity implementation:
browserbase/stagehand-python#156

Collaborator

@miguelg719 miguelg719 left a comment

lgtm just left a comment

@tkattkat tkattkat merged commit 023c2c2 into main Jul 22, 2025
25 of 27 checks passed
seanmcguire12 pushed a commit that referenced this pull request Jul 31, 2025
This PR was opened by the [Changesets
release](https://github.com/changesets/action) GitHub action. When
you're ready to do a release, you can merge this and the packages will
be published to npm automatically. If you're not ready to do a release
yet, that's fine, whenever you add more changesets to main, this PR will
be updated.


# Releases
## @browserbasehq/[email protected]

### Patch Changes

- [#865](#865)
[`6b4e6e3`](6b4e6e3)
Thanks [@seanmcguire12](https://github.com/seanmcguire12)! - improve
type safety for trimTrailingTextNode

- [#897](#897)
[`e77d018`](e77d018)
Thanks [@miguelg719](https://github.com/miguelg719)! - Fix selfHeal to
remember initially received arguments

- [#920](#920)
[`c20adb9`](c20adb9)
Thanks [@seanmcguire12](https://github.com/seanmcguire12)! - fix: tab
handling on API

- [#882](#882)
[`b86df93`](b86df93)
Thanks [@seanmcguire12](https://github.com/seanmcguire12)! - remove
elements that don't have xpaths from observe response

- [#905](#905)
[`023c2c2`](023c2c2)
Thanks [@tkattkat](https://github.com/tkattkat)! - Delete old images
from anthropic cua client

- [#925](#925)
[`8c28647`](8c28647)
Thanks [@miguelg719](https://github.com/miguelg719)! - Remove
\_refreshPageFromApi()

- [#887](#887)
[`87e09c6`](87e09c6)
Thanks [@seanmcguire12](https://github.com/seanmcguire12)! - fix: allow
xpaths with prepended 'xpath=' for targeted extract

- [#864](#864)
[`a611115`](a611115)
Thanks [@miguelg719](https://github.com/miguelg719)! - Temporarily patch
custom clients serialization error on api

- [#881](#881)
[`69913fe`](69913fe)
Thanks [@miguelg719](https://github.com/miguelg719)! - Pass sdk version
number to API for debugging

- [#913](#913)
[`b1b83a1`](b1b83a1)
Thanks [@seanmcguire12](https://github.com/seanmcguire12)! - move iframe
out of 'experimental'

- [#891](#891)
[`be8497c`](be8497c)
Thanks [@seanmcguire12](https://github.com/seanmcguire12)! - fix: nested
iframe xpath bug

- [#883](#883)
[`98704c9`](98704c9)
Thanks [@seanmcguire12](https://github.com/seanmcguire12)! - add timeout
for JS click

- [#907](#907)
[`04978bd`](04978bd)
Thanks [@seanmcguire12](https://github.com/seanmcguire12)! - store
mapping of CDP frame ID -> page

## @browserbasehq/[email protected]

### Patch Changes

- Updated dependencies
\[[`6b4e6e3`](6b4e6e3),
[`e77d018`](e77d018),
[`c20adb9`](c20adb9),
[`b86df93`](b86df93),
[`023c2c2`](023c2c2),
[`8c28647`](8c28647),
[`87e09c6`](87e09c6),
[`a611115`](a611115),
[`69913fe`](69913fe),
[`b1b83a1`](b1b83a1),
[`be8497c`](be8497c),
[`98704c9`](98704c9),
[`04978bd`](04978bd)]:
    -   @browserbasehq/[email protected]

## @browserbasehq/[email protected]

### Patch Changes

- Updated dependencies
\[[`6b4e6e3`](6b4e6e3),
[`e77d018`](e77d018),
[`c20adb9`](c20adb9),
[`b86df93`](b86df93),
[`023c2c2`](023c2c2),
[`8c28647`](8c28647),
[`87e09c6`](87e09c6),
[`a611115`](a611115),
[`69913fe`](69913fe),
[`b1b83a1`](b1b83a1),
[`be8497c`](be8497c),
[`98704c9`](98704c9),
[`04978bd`](04978bd)]:
    -   @browserbasehq/[email protected]

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Labels
parity To note required feature parity in client SDKs
2 participants