Description
Stagehand's act() method incorrectly identifies a regular DOM input element as being inside a Shadow DOM, returning "method": "not-supported" and "selector": "not-supported" for an element that is accessible via standard DOM queries.
Environment
- Stagehand Version: ^2.4.2
- Browser: Chromium (Playwright)
- LLM: Ollama (gemma3:12b) using CustomOpenAIClient
- FE Framework: React with Material-UI (MUI) components
- OS: macOS (darwin 24.5.0)
Stagehand correctly identifies the target element but incorrectly reports it as being inside Shadow DOM:
[2025-08-06 18:12:10.123 +0300] INFO: Acting on instruction: click on create new
[2025-08-06 18:12:10.456 +0300] INFO: Found element: button "Create New"
[2025-08-06 18:12:10.789 +0300] INFO: Action completed successfully
[2025-08-06 18:12:12.091 +0300] INFO: Acting on instruction: fill the name input field
[2025-08-06 18:12:12.091 +0300] INFO: LLM identified: textbox: Enter name
[2025-08-06 18:12:12.091 +0300] ERROR: Element is inside a shadow DOM: 868
category: "observation"
[2025-08-06 18:12:12.091 +0300] INFO: found elements
category: "observation"
elements: [
{
"description": "an element inside a shadow DOM",
"method": "not-supported",
"selector": "not-supported"
}
]
CustomOpenAI.ts (complete implementation):
/**
 * Based on the official Stagehand custom OpenAI client template
 * Modified for Ollama integration via OpenAI-compatible API
 */
import {
  AvailableModel,
  CreateChatCompletionOptions,
  LLMClient,
} from "@browserbasehq/stagehand";
import OpenAI from "openai";
import { zodResponseFormat } from "openai/helpers/zod";
import type {
  ChatCompletion,
  ChatCompletionAssistantMessageParam,
  ChatCompletionContentPartImage,
  ChatCompletionContentPartText,
  ChatCompletionCreateParamsNonStreaming,
  ChatCompletionMessageParam,
  ChatCompletionSystemMessageParam,
  ChatCompletionUserMessageParam,
} from "openai/resources/chat/completions";
import { z } from "zod";

class CreateChatCompletionResponseError extends Error {
  constructor(message: string) {
    super(message);
    this.name = "CreateChatCompletionResponseError";
  }
}

function validateZodSchema(schema: z.ZodTypeAny, data: unknown) {
  try {
    schema.parse(data);
    return true;
  } catch {
    return false;
  }
}

export class CustomOpenAIClient extends LLMClient {
  public type = "openai" as const;
  private client: OpenAI;

  constructor({ modelName, client }: { modelName: string; client: OpenAI }) {
    super(modelName as AvailableModel);
    this.client = client;
    this.modelName = modelName as AvailableModel;
  }

  async createChatCompletion<T = ChatCompletion>({
    options,
    retries = 3,
    logger,
  }: CreateChatCompletionOptions): Promise<T> {
    const { image, requestId, ...optionsWithoutImageAndRequestId } = options;

    if (image) {
      console.warn(
        "Image provided. Vision is not currently supported for openai"
      );
    }

    logger({
      category: "openai",
      message: "creating chat completion",
      level: 1,
      auxiliary: {
        options: {
          value: JSON.stringify({
            ...optionsWithoutImageAndRequestId,
            requestId,
          }),
          type: "object",
        },
        modelName: {
          value: this.modelName,
          type: "string",
        },
      },
    });

    let responseFormat: any = undefined;
    if (options.response_model) {
      responseFormat = zodResponseFormat(
        options.response_model.schema,
        options.response_model.name
      );
    }

    const { response_model, ...openaiOptions } = {
      ...optionsWithoutImageAndRequestId,
      model: this.modelName,
    };

    const formattedMessages: ChatCompletionMessageParam[] =
      options.messages.map((message) => {
        if (Array.isArray(message.content)) {
          const contentParts = message.content.map((content) => {
            if ("image_url" in content && content.image_url) {
              const imageContent: ChatCompletionContentPartImage = {
                image_url: {
                  url: content.image_url.url,
                },
                type: "image_url",
              };
              return imageContent;
            } else {
              const textContent: ChatCompletionContentPartText = {
                text: content.text || "",
                type: "text",
              };
              return textContent;
            }
          });

          if (message.role === "system") {
            const formattedMessage: ChatCompletionSystemMessageParam = {
              ...message,
              role: "system",
              content: contentParts.filter(
                (content): content is ChatCompletionContentPartText =>
                  content.type === "text"
              ),
            };
            return formattedMessage;
          } else if (message.role === "user") {
            const formattedMessage: ChatCompletionUserMessageParam = {
              ...message,
              role: "user",
              content: contentParts,
            };
            return formattedMessage;
          } else {
            const formattedMessage: ChatCompletionAssistantMessageParam = {
              ...message,
              role: "assistant",
              content: contentParts.filter(
                (content): content is ChatCompletionContentPartText =>
                  content.type === "text"
              ),
            };
            return formattedMessage;
          }
        }

        const formattedMessage: ChatCompletionUserMessageParam = {
          role: "user",
          content: message.content || "",
        };
        return formattedMessage;
      });

    const body: ChatCompletionCreateParamsNonStreaming = {
      ...openaiOptions,
      model: this.modelName,
      messages: formattedMessages,
      response_format: responseFormat,
      stream: false,
      tools: options.tools?.map((tool) => ({
        function: {
          name: tool.name,
          description: tool.description,
          parameters: tool.parameters,
        },
        type: "function",
      })),
    };

    const response = await this.client.chat.completions.create(body);

    logger({
      category: "openai",
      message: "response",
      level: 1,
      auxiliary: {
        response: {
          value: JSON.stringify(response),
          type: "object",
        },
        requestId: {
          value: requestId || "",
          type: "string",
        },
      },
    });

    if (options.response_model) {
      const extractedData = response.choices[0].message.content;
      if (!extractedData) {
        throw new CreateChatCompletionResponseError("No content in response");
      }

      const parsedData = JSON.parse(extractedData);

      if (!validateZodSchema(options.response_model.schema, parsedData)) {
        if (retries > 0) {
          return this.createChatCompletion({
            options,
            logger,
            retries: retries - 1,
          });
        }
        throw new CreateChatCompletionResponseError("Invalid response schema");
      }

      return {
        data: parsedData,
        usage: {
          prompt_tokens: response.usage?.prompt_tokens ?? 0,
          completion_tokens: response.usage?.completion_tokens ?? 0,
          total_tokens: response.usage?.total_tokens ?? 0,
        },
      } as T;
    }

    return {
      data: response.choices[0].message.content,
      usage: {
        prompt_tokens: response.usage?.prompt_tokens ?? 0,
        completion_tokens: response.usage?.completion_tokens ?? 0,
        total_tokens: response.usage?.total_tokens ?? 0,
      },
    } as T;
  }
}
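A side note on the client above: JSON.parse runs outside any try/catch, so a response body that is not valid JSON would throw a SyntaxError instead of reaching the retry branch. Nothing in the logs shows that this happened here, but when ruling out the client as a factor, a guard along the following lines could help. This is only a sketch; tryParseJson and ParseResult are hypothetical names, not part of Stagehand or the OpenAI SDK.

```typescript
// Hypothetical helper: wraps JSON.parse so callers get a result object
// instead of an exception when the model returns malformed JSON.
type ParseResult =
  | { ok: true; value: unknown }
  | { ok: false; error: string };

export function tryParseJson(raw: string): ParseResult {
  try {
    return { ok: true, value: JSON.parse(raw) };
  } catch (err) {
    // JSON.parse throws SyntaxError on invalid input; keep only the message.
    return { ok: false, error: err instanceof Error ? err.message : String(err) };
  }
}
```

With a guard like this, a failed parse could be treated the same way as a schema mismatch (retry, then throw CreateChatCompletionResponseError), which would make malformed model output easier to tell apart from a Stagehand-side problem.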
stagehand.config.ts:
import { Stagehand } from "@browserbasehq/stagehand";
import OpenAI from "openai";
import { CustomOpenAIClient } from "./external_clients/customOpenAI";

export const createStagehandWithOllama = () => {
  return new Stagehand({
    env: "LOCAL" as const,
    llmClient: new CustomOpenAIClient({
      modelName: "gemma3:12b",
      client: new OpenAI({
        apiKey: "ollama",
        baseURL: "http://localhost:11434/v1",
      }),
    }),
    localBrowserLaunchOptions: {
      headless: false,
    },
    domSettleTimeoutMs: 30000,
    enableCaching: true,
    verbose: 1 as const,
    selfHeal: true,
  });
};
Some more info I gathered:
DOM Analysis Shows No Shadow DOM
Manual inspection via Playwright reveals:
- 0 iframes on the page
- 0 Shadow DOM hosts on the page
- Regular input element accessible at:
Input 20: placeholder="Enter name", type="text", name="null"
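For anyone who wants to repeat the shadow-host count, the walk can be sketched roughly as below. It is written against small structural types (TreeNode and ElementNode are stand-ins introduced here, not DOM or Playwright types) so it can run outside a browser. It only sees open shadow roots, because element.shadowRoot is null for closed ones.

```typescript
// Structural stand-ins for Element / ShadowRoot (assumed shapes, for illustration).
interface TreeNode {
  children: ArrayLike<ElementNode>;
}
interface ElementNode extends TreeNode {
  shadowRoot?: TreeNode | null;
}

/** Counts elements exposing an open shadow root, descending into those roots too. */
export function countOpenShadowHosts(root: ElementNode): number {
  let hosts = 0;
  const stack: ElementNode[] = [root];
  while (stack.length > 0) {
    const node = stack.pop() as ElementNode;
    const shadow = node.shadowRoot;
    if (shadow) {
      hosts += 1;
      // Nested hosts can live inside a shadow tree, so walk it as well.
      for (let i = 0; i < shadow.children.length; i++) stack.push(shadow.children[i]);
    }
    for (let i = 0; i < node.children.length; i++) stack.push(node.children[i]);
  }
  return hosts;
}
```

In a Playwright session the same loop would have to be inlined into the page.evaluate callback with document.documentElement as the root, since Playwright serializes the callback and does not carry outer helpers along with it.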
LLM Correctly Identifies Element
The LLM successfully identifies the target (textbox: Enter name), indicating that accessibility tree parsing is working correctly.
Playwright Can Interact Successfully
Standard Playwright can successfully interact with the same element:
const nameInput = page.locator('input[placeholder="Enter name"]');
await nameInput.fill("test"); // Works perfectly
Test Setup
// Using CustomOpenAI client for Ollama integration
const stagehand = createStagehandWithOllama();
await stagehand.init();
Since I can't provide the web app code or URL (it's an internal tool at the company I work for), here is the scenario I tested:
- Navigate to a form containing MUI input components
- Use Stagehand to interact with buttons/dropdowns (these work fine)
- Attempt to use await page.act("fill the input field") on a text input
- Observe the Shadow DOM error despite no Shadow DOM being present
Form Structure
The failing input element is within:
- React application
- Material-UI (MUI) form components
- Standard HTML form element (no custom shadow roots)
- Regular DOM hierarchy (verified via document.querySelectorAll('*'))
- Part of MUI TextField component (but rendered as standard DOM)
Element Details
<input placeholder="Enter name" type="text" name="null" />
This suggests an issue in:
- Shadow DOM detection logic (false positive)
- XPath/selector generation for certain input patterns
- Element traversal within form contexts
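To help separate a detection false positive from a real shadow tree, the failing element's root node can be checked directly. The sketch below is not Stagehand's detection logic (which I have not inspected); it only illustrates the standard DOM rule that getRootNode() returns a ShadowRoot, which has a host, for nodes inside a shadow tree, and the Document otherwise. RootLike and NodeLike are structural stand-ins introduced for illustration.

```typescript
// Assumed minimal shapes: a ShadowRoot has a `host`, a Document does not.
interface RootLike {
  host?: unknown;
}
interface NodeLike {
  getRootNode(): RootLike;
}

/** Returns true when the node's root exposes a host, i.e. the root is a shadow root. */
export function isInsideShadowTree(node: NodeLike): boolean {
  return node.getRootNode().host != null;
}
```

In the browser, the equivalent check on the real element is el.getRootNode() instanceof ShadowRoot, which also covers closed shadow roots. If that returns false for the "Enter name" input, the error most likely comes from the detection or selector-generation side rather than from the page.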
Workaround, in case someone needs it (I don't think this happens with a standard input element)
Currently using a hybrid approach:
// Use Stagehand for elements it handles well
await stagehand.page.act("click on create new");
await stagehand.page.act("click on item1");
// Fallback to Playwright for problematic inputs
const input = stagehand.page.locator('input[placeholder="Enter name"]');
await input.fill("test");
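The hybrid approach could also be wrapped in a small helper so the fallback only runs when act() does not succeed. This is a sketch under an assumption: that a failed act() either throws or resolves to an object with success set to false. ActOutcome and fillWithFallback are names introduced here, not Stagehand APIs.

```typescript
// Assumed minimal result shape; only `success` is relied on.
interface ActOutcome {
  success: boolean;
  message?: string;
}

/** Tries the AI-driven action first and falls back to a direct fill if it fails. */
export async function fillWithFallback(
  act: () => Promise<ActOutcome>,
  fallbackFill: () => Promise<void>,
): Promise<"act" | "fallback"> {
  try {
    const result = await act();
    if (result.success) return "act";
  } catch {
    // A thrown error is treated the same as an unsuccessful result.
  }
  await fallbackFill();
  return "fallback";
}

// Intended usage (needs a live Stagehand session, so left as a comment):
//   await fillWithFallback(
//     () => stagehand.page.act("fill the name input field"),
//     () => stagehand.page.locator('input[placeholder="Enter name"]').fill("test"),
//   );
```

The return value records which path was taken, which makes it easy to log how often the fallback is needed.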
Thanks.