Stagehand incorrectly identifies regular DOM input element as a Shadow DOM #943

@Kazaz-Or

Description

Stagehand's act() method incorrectly identifies a regular DOM input element as being inside a Shadow DOM, returning "method": "not-supported" and "selector": "not-supported" for an element that is accessible via standard DOM queries.

Environment

  • Stagehand Version: ^2.4.2
  • Browser: Chromium (Playwright)
  • LLM: Ollama (gemma3:12b) using CustomOpenAIClient
  • FE Framework: React with Material-UI (MUI) components
  • OS: macOS (darwin 24.5.0)

Stagehand correctly identifies the target element but incorrectly reports it as being inside Shadow DOM:

[2025-08-06 18:12:10.123 +0300] INFO: Acting on instruction: click on create new
[2025-08-06 18:12:10.456 +0300] INFO: Found element: button "Create New"
[2025-08-06 18:12:10.789 +0300] INFO: Action completed successfully
[2025-08-06 18:12:12.091 +0300] INFO: Acting on instruction: fill the name input field
[2025-08-06 18:12:12.091 +0300] INFO: LLM identified: textbox: Enter name
[2025-08-06 18:12:12.091 +0300] ERROR: Element is inside a shadow DOM: 868
    category: "observation"
[2025-08-06 18:12:12.091 +0300] INFO: found elements
    category: "observation"
    elements: [
      {
        "description": "an element inside a shadow DOM",
        "method": "not-supported",
        "selector": "not-supported"
      }
    ]

CustomOpenAI.ts (complete implementation):

/**
 * Based on the official Stagehand custom OpenAI client template
 * Modified for Ollama integration via OpenAI-compatible API
 */
import {
  AvailableModel,
  CreateChatCompletionOptions,
  LLMClient,
} from "@browserbasehq/stagehand";
import OpenAI from "openai";
import { zodResponseFormat } from "openai/helpers/zod";
import type {
  ChatCompletion,
  ChatCompletionAssistantMessageParam,
  ChatCompletionContentPartImage,
  ChatCompletionContentPartText,
  ChatCompletionCreateParamsNonStreaming,
  ChatCompletionMessageParam,
  ChatCompletionSystemMessageParam,
  ChatCompletionUserMessageParam,
} from "openai/resources/chat/completions";
import { z } from "zod";

class CreateChatCompletionResponseError extends Error {
  constructor(message: string) {
    super(message);
    this.name = "CreateChatCompletionResponseError";
  }
}

function validateZodSchema(schema: z.ZodTypeAny, data: unknown) {
  try {
    schema.parse(data);
    return true;
  } catch {
    return false;
  }
}

export class CustomOpenAIClient extends LLMClient {
  public type = "openai" as const;
  private client: OpenAI;

  constructor({ modelName, client }: { modelName: string; client: OpenAI }) {
    super(modelName as AvailableModel);
    this.client = client;
    this.modelName = modelName as AvailableModel;
  }

  async createChatCompletion<T = ChatCompletion>({
    options,
    retries = 3,
    logger,
  }: CreateChatCompletionOptions): Promise<T> {
    const { image, requestId, ...optionsWithoutImageAndRequestId } = options;

    if (image) {
      console.warn(
        "Image provided. Vision is not currently supported for openai"
      );
    }

    logger({
      category: "openai",
      message: "creating chat completion",
      level: 1,
      auxiliary: {
        options: {
          value: JSON.stringify({
            ...optionsWithoutImageAndRequestId,
            requestId,
          }),
          type: "object",
        },
        modelName: {
          value: this.modelName,
          type: "string",
        },
      },
    });

    let responseFormat: any = undefined;
    if (options.response_model) {
      responseFormat = zodResponseFormat(
        options.response_model.schema,
        options.response_model.name
      );
    }

    const { response_model, ...openaiOptions } = {
      ...optionsWithoutImageAndRequestId,
      model: this.modelName,
    };

    const formattedMessages: ChatCompletionMessageParam[] =
      options.messages.map((message) => {
        if (Array.isArray(message.content)) {
          const contentParts = message.content.map((content) => {
            if ("image_url" in content && content.image_url) {
              const imageContent: ChatCompletionContentPartImage = {
                image_url: {
                  url: content.image_url.url,
                },
                type: "image_url",
              };
              return imageContent;
            } else {
              const textContent: ChatCompletionContentPartText = {
                text: content.text || "",
                type: "text",
              };
              return textContent;
            }
          });

          if (message.role === "system") {
            const formattedMessage: ChatCompletionSystemMessageParam = {
              ...message,
              role: "system",
              content: contentParts.filter(
                (content): content is ChatCompletionContentPartText =>
                  content.type === "text"
              ),
            };
            return formattedMessage;
          } else if (message.role === "user") {
            const formattedMessage: ChatCompletionUserMessageParam = {
              ...message,
              role: "user",
              content: contentParts,
            };
            return formattedMessage;
          } else {
            const formattedMessage: ChatCompletionAssistantMessageParam = {
              ...message,
              role: "assistant",
              content: contentParts.filter(
                (content): content is ChatCompletionContentPartText =>
                  content.type === "text"
              ),
            };
            return formattedMessage;
          }
        }

        const formattedMessage: ChatCompletionUserMessageParam = {
          role: "user",
          content: message.content || "",
        };

        return formattedMessage;
      });

    const body: ChatCompletionCreateParamsNonStreaming = {
      ...openaiOptions,
      model: this.modelName,
      messages: formattedMessages,
      response_format: responseFormat,
      stream: false,
      tools: options.tools?.map((tool) => ({
        function: {
          name: tool.name,
          description: tool.description,
          parameters: tool.parameters,
        },
        type: "function",
      })),
    };

    const response = await this.client.chat.completions.create(body);

    logger({
      category: "openai",
      message: "response",
      level: 1,
      auxiliary: {
        response: {
          value: JSON.stringify(response),
          type: "object",
        },
        requestId: {
          value: requestId || "",
          type: "string",
        },
      },
    });

    if (options.response_model) {
      const extractedData = response.choices[0].message.content;
      if (!extractedData) {
        throw new CreateChatCompletionResponseError("No content in response");
      }
      const parsedData = JSON.parse(extractedData);

      if (!validateZodSchema(options.response_model.schema, parsedData)) {
        if (retries > 0) {
          return this.createChatCompletion({
            options,
            logger,
            retries: retries - 1,
          });
        }

        throw new CreateChatCompletionResponseError("Invalid response schema");
      }

      return {
        data: parsedData,
        usage: {
          prompt_tokens: response.usage?.prompt_tokens ?? 0,
          completion_tokens: response.usage?.completion_tokens ?? 0,
          total_tokens: response.usage?.total_tokens ?? 0,
        },
      } as T;
    }

    return {
      data: response.choices[0].message.content,
      usage: {
        prompt_tokens: response.usage?.prompt_tokens ?? 0,
        completion_tokens: response.usage?.completion_tokens ?? 0,
        total_tokens: response.usage?.total_tokens ?? 0,
      },
    } as T;
  }
}

stagehand.config.ts:

import { Stagehand } from "@browserbasehq/stagehand";
import OpenAI from "openai";
import { CustomOpenAIClient } from "./external_clients/customOpenAI";

export const createStagehandWithOllama = () => {
  return new Stagehand({
    env: "LOCAL" as const,
    llmClient: new CustomOpenAIClient({
      modelName: "gemma3:12b",
      client: new OpenAI({
        apiKey: "ollama",
        baseURL: "http://localhost:11434/v1",
      }),
    }),
    localBrowserLaunchOptions: {
      headless: false,
    },
    domSettleTimeoutMs: 30000,
    enableCaching: true,
    verbose: 1 as const,
    selfHeal: true,
  });
};

Some more info I gathered:

DOM Analysis Shows No Shadow DOM

Manual inspection via Playwright reveals:

  • 0 iframes on the page
  • 0 Shadow DOM hosts on the page
  • Regular input element accessible at: Input 20: placeholder="Enter name", type="text", name="null"
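For anyone who wants to repeat the shadow-host count, here is a minimal sketch of the kind of check I mean. The name countOpenShadowHosts and the two small interfaces are mine (hypothetical, not part of Stagehand or Playwright); they only describe the bits of the DOM the function touches so it can also be exercised outside a browser. Closed shadow roots are not visible through element.shadowRoot, so this only counts open ones.

```typescript
// Structural stand-ins for the DOM pieces used below (assumed names, not a library API).
interface NodeLike {
  shadowRoot?: RootLike | null;
}
interface RootLike {
  querySelectorAll(selector: string): ArrayLike<NodeLike>;
}

// Counts elements that host an open shadow root, recursing into nested roots.
// Closed shadow roots cannot be seen via element.shadowRoot and are not counted.
export function countOpenShadowHosts(root: RootLike): number {
  let count = 0;
  for (const el of Array.from(root.querySelectorAll("*"))) {
    if (el.shadowRoot) {
      count += 1 + countOpenShadowHosts(el.shadowRoot);
    }
  }
  return count;
}
```

In the browser this would be called as countOpenShadowHosts(document), for example from the DevTools console after stripping the type annotations.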

LLM Correctly Identifies Element

The LLM successfully identifies the target: textbox: Enter name - indicating the accessibility tree parsing is working correctly.

Playwright Can Interact Successfully

Standard Playwright can successfully interact with the same element:

const nameInput = page.locator('input[placeholder="Enter name"]');
await nameInput.fill("test"); // Works perfectly

Test Setup

// Using CustomOpenAI client for Ollama integration
const stagehand = createStagehandWithOllama();
await stagehand.init();

Since I can't share the web app's code or URL (it's an internal tool at the company I work for), here is the scenario I tested:

  1. Navigate to a form containing MUI input components
  2. Use Stagehand to interact with buttons/dropdowns (these work fine)
  3. Attempt to use await page.act("fill the input field") on a text input
  4. Observe the Shadow DOM error despite no Shadow DOM being present

Form Structure

The failing input element is within:

  • React application
  • Material-UI (MUI) form components
  • Standard HTML form element (no custom shadow roots)
  • Regular DOM hierarchy (verified via document.querySelectorAll('*'))
  • Part of MUI TextField component (but rendered as standard DOM)

Element Details

<input placeholder="Enter name" type="text" name="null" />

This suggests an issue in:

  • Shadow DOM detection logic (false positive)
  • XPath/selector generation for certain input patterns
  • Element traversal within form contexts
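To rule out the first possibility from the page side, a per-element check could look like the sketch below. It relies on two standard DOM facts: getRootNode() returns the containing Document or ShadowRoot, and a ShadowRoot is a DocumentFragment (nodeType 11) that has a host property. The function name and the interfaces are hypothetical and exist only so the check can run without a browser; I don't know how Stagehand itself makes this decision.

```typescript
// Structural stand-ins for the DOM pieces used below (assumed names).
interface RootNodeLike {
  nodeType: number; // 9 = Document, 11 = DocumentFragment (includes ShadowRoot)
  host?: unknown;   // only present on a ShadowRoot
}
interface ElementLike {
  getRootNode(): RootNodeLike;
}

const DOCUMENT_FRAGMENT_NODE = 11;

// Returns true when the element's root is a shadow root rather than the document.
export function isInsideShadowRoot(el: ElementLike): boolean {
  const root = el.getRootNode();
  return root.nodeType === DOCUMENT_FRAGMENT_NODE && root.host != null;
}
```

In the browser, the equivalent one-liner is el.getRootNode() instanceof ShadowRoot.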

Workaround, in case someone needs it, as I don't think this happens with a standard input element

Currently using a hybrid approach:

// Use Stagehand for elements it handles well
await stagehand.page.act("click on create new");
await stagehand.page.act("click on item1");

// Fallback to Playwright for problematic inputs
const input = stagehand.page.locator('input[placeholder="Enter name"]');
await input.fill("test");
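The same fallback could be wrapped in a helper so it only triggers when Stagehand actually returns the "not-supported" placeholder. This is a rough sketch, not tested against the real app: ObservedElement copies the fields visible in the log output, PageLike is a hypothetical minimal view of the page (observe, act with an observed element, and Playwright's locator), and both helper names are mine. Whether stagehand.page type-checks against PageLike as written is an assumption.

```typescript
// Shape of one observe() entry as it appears in the log (assumed, not imported).
interface ObservedElement {
  description: string;
  method?: string;
  selector: string;
}

// Hypothetical minimal view of the page: only the calls this sketch uses.
interface PageLike {
  observe(instruction: string): Promise<ObservedElement[]>;
  act(action: ObservedElement): Promise<unknown>;
  locator(selector: string): { fill(value: string): Promise<void> };
}

// True when there is no result, or the first one is the "not-supported" placeholder.
export function needsPlaywrightFallback(results: ObservedElement[]): boolean {
  const first = results[0];
  return (
    first === undefined ||
    first.method === "not-supported" ||
    first.selector === "not-supported"
  );
}

// Tries Stagehand first; falls back to a plain Playwright fill on the placeholder.
export async function fillWithFallback(
  page: PageLike,
  instruction: string,
  fallbackSelector: string,
  value: string
): Promise<"stagehand" | "playwright"> {
  const results = await page.observe(instruction);
  if (needsPlaywrightFallback(results)) {
    await page.locator(fallbackSelector).fill(value);
    return "playwright";
  }
  await page.act(results[0]);
  return "stagehand";
}
```

Usage would be along the lines of fillWithFallback(stagehand.page, "fill the name input field with test", 'input[placeholder="Enter name"]', "test"), and the return value tells you which path was taken.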

Thanks.
