removed all images except the latest two in AnthropicCuaClient #905
🦋 Changeset detected. Latest commit: 63b38b5. The changes in this PR will be included in the next version bump.
Greptile Summary
This PR implements an image compression optimization for the Anthropic CUA client to address memory and token usage concerns during long-running agent tasks. The change introduces an experimental feature that removes older screenshots from the conversation history while preserving the most recent two images.
The implementation adds an experimental flag that flows through the entire system architecture:
- From the main Stagehand configuration (`lib/index.ts`)
- Through the agent handler (`lib/handlers/agentHandler.ts`)
- To the AgentProvider (`lib/agent/AgentProvider.ts`)
- Finally reaching the AnthropicCUAClient (`lib/agent/AnthropicCUAClient.ts`)
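The flag plumbing listed above can be pictured with a small sketch. Every class and method name here (`SketchAgentHandler`, `SketchAgentProvider`, `SketchCuaClient`, `shouldPruneImages`) is a hypothetical stand-in invented for illustration, not the actual Stagehand code; the only point is that one boolean is handed down layer by layer until the client reads it.

```typescript
// Hypothetical stand-ins only; none of these names come from Stagehand itself.
interface SketchOptions {
  experimental?: boolean;
}

// Innermost layer: the client decides whether to prune based on the flag.
class SketchCuaClient {
  private readonly experimental: boolean;

  constructor(options: SketchOptions) {
    // A missing flag is treated as "off" so existing callers are unaffected.
    this.experimental = options.experimental ?? false;
  }

  shouldPruneImages(): boolean {
    return this.experimental;
  }
}

// Middle layer: the provider forwards the options unchanged.
class SketchAgentProvider {
  private readonly options: SketchOptions;

  constructor(options: SketchOptions) {
    this.options = options;
  }

  getClient(): SketchCuaClient {
    return new SketchCuaClient(this.options);
  }
}

// Outer layer: the handler receives the top-level config and builds the provider.
class SketchAgentHandler {
  private readonly provider: SketchAgentProvider;

  constructor(options: SketchOptions) {
    this.provider = new SketchAgentProvider(options);
  }

  createClient(): SketchCuaClient {
    return this.provider.getClient();
  }
}
```

Defaulting to `false` at the innermost layer is one way to keep the behavior opt-in, which matches the summary's description of the flag as a rollout mechanism.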
When enabled, the system uses a new utility function (`lib/agent/imageCompressionUtils.ts`) that identifies conversation items containing images and replaces older screenshots with the text placeholder "screenshot taken". This compression happens before each new assistant message is added to the conversation history.
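A minimal sketch of the pruning step described above, under stated assumptions: the message and block types are simplified, hypothetical shapes loosely modeled on Anthropic-style content blocks, and `pruneOldScreenshots` is an illustrative name, not the function in `imageCompressionUtils.ts`. It walks the history newest-first, keeps the two most recent image-bearing tool results, and swaps the images in older ones for the "screenshot taken" placeholder.

```typescript
// Simplified, hypothetical message shapes (not the SDK's real types).
type TextBlock = { type: "text"; text: string };
type ImageBlock = {
  type: "image";
  source: { type: "base64"; media_type: string; data: string };
};
type ToolResultBlock = {
  type: "tool_result";
  tool_use_id: string;
  content: Array<TextBlock | ImageBlock>;
};
type ContentBlock = TextBlock | ImageBlock | ToolResultBlock;
type Message = { role: "user" | "assistant"; content: string | ContentBlock[] };

const PLACEHOLDER = "screenshot taken";

function hasImage(block: ToolResultBlock): boolean {
  return block.content.some((item) => item.type === "image");
}

// Returns a new history; the input array and its messages are not mutated.
function pruneOldScreenshots(messages: Message[], keep = 2): Message[] {
  let seen = 0; // image-bearing tool results encountered so far, newest first
  const result: Message[] = [];

  for (const message of [...messages].reverse()) {
    if (typeof message.content === "string") {
      result.unshift(message);
      continue;
    }
    const blocks: ContentBlock[] = [];
    for (const block of [...message.content].reverse()) {
      if (block.type === "tool_result" && hasImage(block)) {
        seen += 1;
        if (seen > keep) {
          // Older than the `keep` newest screenshots: replace images with text.
          const content = block.content.map((item): TextBlock | ImageBlock =>
            item.type === "image" ? { type: "text", text: PLACEHOLDER } : item,
          );
          blocks.unshift({ ...block, content });
          continue;
        }
      }
      blocks.unshift(block);
    }
    result.unshift({ ...message, content: blocks });
  }
  return result;
}
```

Replacing the image with a short text marker, rather than dropping the whole tool result, keeps each tool call paired with a result, so the conversation structure stays intact while the large base64 payload is gone.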
The feature addresses STG-586, where the Anthropic CUA client was accumulating all screenshots throughout task progression, potentially causing context window overflow and excessive token consumption. By keeping only the two most recent screenshots, the system maintains relevant visual context while significantly reducing memory footprint. The experimental flag provides a safe rollout mechanism for testing this optimization without affecting existing users.
Confidence score: 4/5
- This PR appears safe to merge with proper testing, implementing a well-architected experimental feature for memory optimization
- The score reflects the experimental nature of the feature and potential edge cases around image content handling that should be monitored
- The `imageCompressionUtils.ts` file needs attention for robust error handling and edge case management
6 files reviewed, 3 comments
🔄 Feature Parity Issue Created: An issue has been automatically created in the Python SDK repository to track parity implementation.
lgtm just left a comment
This PR was opened by the [Changesets release](https://github.com/changesets/action) GitHub action. When you're ready to do a release, you can merge this and the packages will be published to npm automatically. If you're not ready to do a release yet, that's fine, whenever you add more changesets to main, this PR will be updated.

# Releases

## @browserbasehq/[email protected]

### Patch Changes

- [#865](#865) [`6b4e6e3`](6b4e6e3) Thanks [@seanmcguire12](https://github.com/seanmcguire12)! - improve type safety for trimTrailingTextNode
- [#897](#897) [`e77d018`](e77d018) Thanks [@miguelg719](https://github.com/miguelg719)! - Fix selfHeal to remember intially received arguments
- [#920](#920) [`c20adb9`](c20adb9) Thanks [@seanmcguire12](https://github.com/seanmcguire12)! - fix: tab handling on API
- [#882](#882) [`b86df93`](b86df93) Thanks [@seanmcguire12](https://github.com/seanmcguire12)! - remove elements that don't have xpaths from observe response
- [#905](#905) [`023c2c2`](023c2c2) Thanks [@tkattkat](https://github.com/tkattkat)! - Delete old images from anthropic cua client
- [#925](#925) [`8c28647`](8c28647) Thanks [@miguelg719](https://github.com/miguelg719)! - Remove \_refreshPageFromApi()
- [#887](#887) [`87e09c6`](87e09c6) Thanks [@seanmcguire12](https://github.com/seanmcguire12)! - fix: allow xpaths with prepended 'xpath=' for targeted extract
- [#864](#864) [`a611115`](a611115) Thanks [@miguelg719](https://github.com/miguelg719)! - Temporarily patch custom clients serialization error on api
- [#881](#881) [`69913fe`](69913fe) Thanks [@miguelg719](https://github.com/miguelg719)! - Pass sdk version number to API for debugging
- [#913](#913) [`b1b83a1`](b1b83a1) Thanks [@seanmcguire12](https://github.com/seanmcguire12)! - move iframe out of 'experimental'
- [#891](#891) [`be8497c`](be8497c) Thanks [@seanmcguire12](https://github.com/seanmcguire12)! - fix: nested iframe xpath bug
- [#883](#883) [`98704c9`](98704c9) Thanks [@seanmcguire12](https://github.com/seanmcguire12)! - add timeout for JS click
- [#907](#907) [`04978bd`](04978bd) Thanks [@seanmcguire12](https://github.com/seanmcguire12)! - store mapping of CDP frame ID -> page

## @browserbasehq/[email protected]

### Patch Changes

- Updated dependencies \[[`6b4e6e3`](6b4e6e3), [`e77d018`](e77d018), [`c20adb9`](c20adb9), [`b86df93`](b86df93), [`023c2c2`](023c2c2), [`8c28647`](8c28647), [`87e09c6`](87e09c6), [`a611115`](a611115), [`69913fe`](69913fe), [`b1b83a1`](b1b83a1), [`be8497c`](be8497c), [`98704c9`](98704c9), [`04978bd`](04978bd)]:
  - @browserbasehq/[email protected]

## @browserbasehq/[email protected]

### Patch Changes

- Updated dependencies \[[`6b4e6e3`](6b4e6e3), [`e77d018`](e77d018), [`c20adb9`](c20adb9), [`b86df93`](b86df93), [`023c2c2`](023c2c2), [`8c28647`](8c28647), [`87e09c6`](87e09c6), [`a611115`](a611115), [`69913fe`](69913fe), [`b1b83a1`](b1b83a1), [`be8497c`](be8497c), [`98704c9`](98704c9), [`04978bd`](04978bd)]:
  - @browserbasehq/[email protected]

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
why
Part of STG-586.
Currently, the Anthropic CUA client keeps every screenshot in the LLM's context as the task progresses.
what changed
We now remove all screenshots except the two most recent when the experimental flag is set to true in the Stagehand config.
test plan
Tested locally.