Skip to content

wrap braintrust to get llm usage data #637

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 26 commits into from
Apr 11, 2025
Merged

wrap braintrust to get llm usage data #637

merged 26 commits into from
Apr 11, 2025

Conversation

kamath
Copy link
Contributor

@kamath kamath commented Apr 5, 2025

why

what changed

  • Edited Stagehand Default Error to forward the stack trace/message as well
  • Moved LLM Client evals into separate directory (need to re-add on CI)
  • Made a unit CI test for testing core features in act
  • Refactored Evals to use Braintrust AI proxy and thereby support p much any LLM by name
  • Used Gemini AI SDK for Gemini models since their OpenAI proxy is super finicky with structured outputs
  • Wrapped LLM Client in Braintrust to get LLM metrics data
  • ^ in order to do that I had to edit all evals to take in a Stagehand object instead of calling initStagehand in each eval

test plan

this is it

Copy link

changeset-bot bot commented Apr 5, 2025

🦋 Changeset detected

Latest commit: 142a8af

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 1 package
Name Type
@browserbasehq/stagehand Patch

Not sure what this means? Click here to learn what changesets are.

Click here if you're a maintainer who wants to add another changeset to this PR

constructor(error?: unknown) {
if (error instanceof Error || error instanceof StagehandError) {
super(
`\nHey! We're sorry you ran into an error. \nIf you need help, please open a Github issue or reach out to us on Slack: https://stagehand.dev/slack\n\nFull error:\n${error.message}`,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

helps us see the stacktrace when StagehandDefaultError is thrown

@@ -3,7 +3,7 @@ import dotenv from "dotenv";
dotenv.config();

const StagehandConfig: ConstructorParams = {
verbose: 1 /* Verbosity level for logging: 0 = silent, 1 = info, 2 = all */,
verbose: 2 /* Verbosity level for logging: 0 = silent, 1 = info, 2 = all */,
Copy link
Contributor Author

@kamath kamath Apr 5, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can change back but nice to have

@kamath kamath force-pushed the anirudh/wrap-braintrust branch from cd6e068 to 3bc4a7c Compare April 5, 2025 05:53
@kamath kamath changed the title [wip] wrap braintrust to get llm usage data wrap braintrust to get llm usage data Apr 6, 2025
"@browserbasehq/stagehand": patch
---

Fix: forward along the stack trace in StagehandDefaultError
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

now it's throwing the error multiple times, need a fast follow PR to only throw StagehandDefaultError once

@@ -1,19 +1,22 @@
import { Stagehand } from "@/dist";
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently not running these in CI. Will add in a fast-follow PR

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what is the run command for these? what is the benefit in taking them out of the tasks directory? Also make sure you remove the step from CI, otherwise it will keep failing

evals/args.ts Outdated
@@ -66,6 +66,8 @@ const DEFAULT_EVAL_CATEGORIES = process.env.EVAL_CATEGORIES
"regression_llm_providers",
"regression_text_extract",
"regression_dom_extract",
"llm_clients",
"unit",
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

currently running neither in CI, will add in a fast-follow PR

@kamath kamath marked this pull request as ready for review April 6, 2025 03:52
Copy link
Contributor

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR Summary

This PR centralizes Stagehand initialization to support Braintrust LLM metrics and standardizes error logging across evaluations.

  • /evals/evals.config.json: Reassigned evaluation categories and removed obsolete tasks.
  • /evals/initStagehand.ts: Removed modelName support; now requires a pre-initialized llmClient.
  • /evals/index.eval.ts: Wrapped LLM client initialization with Braintrust proxy and unified error forwarding.
  • /evals/tasks/*: All tasks now accept an externally provided stagehand instance (plus debugUrl/sessionUrl), eliminating internal initStagehand calls.
  • /evals/logger.ts & /lib/StagehandPage.ts: Improved error logging with full stack trace and message forwarding.

88 file(s) reviewed, 1 comment(s)
Edit PR Review Bot Settings | Greptile

Comment on lines +9 to 15
constructor(error?: unknown) {
if (error instanceof Error || error instanceof StagehandError) {
super(
`\nHey! We're sorry you ran into an error. \nIf you need help, please open a Github issue or reach out to us on Slack: https://stagehand.dev/slack\n\nFull error:\n${error.message}`,
);
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

logic: Ensure the constructor always calls super, even when error is undefined or not an instance of Error.

Suggested change
constructor(error?: unknown) {
if (error instanceof Error || error instanceof StagehandError) {
super(
`\nHey! We're sorry you ran into an error. \nIf you need help, please open a Github issue or reach out to us on Slack: https://stagehand.dev/slack\n\nFull error:\n${error.message}`,
);
}
}
constructor(error?: unknown) {
if (error instanceof Error || error instanceof StagehandError) {
super(
`\nHey! We're sorry you ran into an error. \nIf you need help, please open a Github issue or reach out to us on Slack: https://stagehand.dev/slack\n\nFull error:\n${error.message}`,
);
} else {
super('An unknown error occurred. If you need help, please open a Github issue or reach out to us on Slack: https://stagehand.dev/slack');
}
}

sessionUrl: string;
useTextExtract: boolean;
stagehandConfig: ConstructorParams;
};
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

MoveStagehandInitResult to types/evals.ts

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why delete?

@kamath kamath force-pushed the anirudh/wrap-braintrust branch from 2801646 to 78a21b1 Compare April 10, 2025 17:21
@seanmcguire12 seanmcguire12 added act These changes pertain to the act function extract These changes pertain to the extract function observe These changes pertain to the observe function combination These changes affect multiple Stagehand functions targeted-extract These changes pertain to targeted extract labels Apr 10, 2025
@seanmcguire12 seanmcguire12 self-requested a review April 11, 2025 01:39
@seanmcguire12 seanmcguire12 removed act These changes pertain to the act function extract These changes pertain to the extract function observe These changes pertain to the observe function combination These changes affect multiple Stagehand functions text-extract targeted-extract These changes pertain to targeted extract labels Apr 11, 2025
@seanmcguire12 seanmcguire12 added the targeted-extract These changes pertain to targeted extract label Apr 11, 2025
@kamath kamath merged commit 944bbbf into main Apr 11, 2025
14 of 26 checks passed
@github-actions github-actions bot mentioned this pull request Apr 11, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
targeted-extract These changes pertain to targeted extract
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants