Bug: AgentCoreMemorySaver causes Bedrock ValidationException with orphaned tool_calls #726

@murilosimao

Bug Description

When using AgentCoreMemorySaver with agents that make tool calls, the application fails with a Bedrock ValidationException on subsequent invocations after checkpoint restoration.

Error Message

ValidationException: messages.10: `tool_use` ids were found without `tool_result` blocks immediately after: toolu_bdrk_01VEoHG5RKNdXV5vX9HegPop. Each `tool_use` block must have a corresponding `tool_result` block in the next message.

Root Cause

The issue occurs when a checkpoint is saved during tool execution, that is, after an AIMessage with tool_calls has been added but before its corresponding ToolMessage. When this checkpoint is restored, it contains "orphaned" tool_calls (tool_calls without corresponding ToolMessages), which Bedrock rejects as invalid.
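
To make the "orphaned" condition concrete, here is a minimal sketch of how such tool_calls could be detected in a restored message list. It uses plain dicts as stand-ins for the LangChain message objects (an assumption made to keep the example dependency-free), and the function name find_orphaned_tool_call_ids is hypothetical, not part of any library. It only checks that a matching tool result exists somewhere in the history; Bedrock's actual rule is stricter (the result must be in the next message).

```python
from typing import Any


def find_orphaned_tool_call_ids(messages: list[dict[str, Any]]) -> list[str]:
    """Return ids of tool calls that have no matching tool result message."""
    # Collect every tool_call_id that some tool message answers.
    answered = {m["tool_call_id"] for m in messages if m.get("type") == "tool"}

    orphaned = []
    for m in messages:
        if m.get("type") != "ai":
            continue
        for call in m.get("tool_calls", []):
            if call["id"] not in answered:
                orphaned.append(call["id"])
    return orphaned


# Same shape as the checkpoint in the reproduction script: an AI message
# with a tool call, and no tool message after it.
history = [
    {"type": "human", "content": "What's the weather?"},
    {
        "type": "ai",
        "content": "",
        "tool_calls": [
            {"id": "toolu_123", "name": "get_weather", "args": {"city": "SF"}},
        ],
    },
]

orphans = find_orphaned_tool_call_ids(history)
print(orphans)
```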

Steps to Reproduce

  1. Create an agent with AgentCoreMemorySaver as checkpointer
  2. Invoke the agent with a query that triggers tool calls
  3. A checkpoint is saved during tool execution (before ToolMessage is added)
  4. Restore the checkpoint in a new invocation
  5. Bedrock raises ValidationException due to orphaned tool_calls

Minimal Reproduction Script

import asyncio
import os
from langchain_core.messages import AIMessage, HumanMessage
from langchain.chat_models import init_chat_model
from langgraph.checkpoint.base import Checkpoint
from langgraph_checkpoint_aws.agentcore.saver import AgentCoreMemorySaver


async def reproduce_bug():
    memory_id = os.getenv("AGENTCORE_MEMORY_ID")
    checkpointer = AgentCoreMemorySaver(memory_id, region_name="us-east-1")

    config = {
        "configurable": {
            "thread_id": "test-thread",
            "actor_id": "test-actor",
        }
    }

    # Simulate checkpoint during tool execution
    messages_with_orphan = [
        HumanMessage(content="What's the weather?"),
        AIMessage(
            content="",
            tool_calls=[{
                "id": "toolu_123",
                "name": "get_weather",
                "args": {"city": "SF"},
            }],
        ),
        # No ToolMessage - checkpoint saved during tool execution
    ]

    checkpoint = Checkpoint(
        v=1,
        id="checkpoint_1",
        ts="2024-01-01T00:00:00Z",
        channel_values={"messages": messages_with_orphan},
        channel_versions={"messages": "1"},
        versions_seen={},
        pending_sends=[],
    )

    checkpointer.put(config, checkpoint, {}, {"messages": "1"})
    
    # Load and try to use with Bedrock
    loaded_tuple = checkpointer.get_tuple(config)
    loaded_messages = loaded_tuple.checkpoint["channel_values"]["messages"]

    llm = init_chat_model(
        "us.anthropic.claude-haiku-4-5-20251001-v1:0",
        model_provider="bedrock",
        region_name="us-east-1",
    )

    # This will raise ValidationException
    await llm.ainvoke(loaded_messages)


if __name__ == "__main__":
    asyncio.run(reproduce_bug())

Expected Behavior

The checkpoint should load successfully and the agent should continue execution without errors.

Actual Behavior

Bedrock raises ValidationException because it requires all tool_use blocks to have corresponding tool_result blocks immediately after.

Environment

  • langgraph-checkpoint-aws version: 1.0.0
  • langchain-aws version: (latest)
  • Python version: 3.13
  • AWS Region: us-east-1
  • Bedrock Model: claude-haiku-4-5

Impact

This bug affects any agent using:

  • AgentCoreMemorySaver for checkpointing
  • Tool calls in their workflow
  • Bedrock (or similar LLM providers with strict tool_use/tool_result validation)

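Once a checkpoint containing orphaned tool_calls has been saved, the agent becomes unusable: every subsequent invocation that restores the checkpoint sends the orphaned tool_calls to the model and fails with the ValidationException.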

Proposed Solution

Clean orphaned tool_calls from AIMessages when loading checkpoints by:

  1. Identifying tool_calls without corresponding ToolMessages
  2. Removing those orphaned tool_calls from the AIMessage
  3. Preserving complete tool_call/ToolMessage pairs

This ensures messages are always in a valid state for LLM providers.
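
The three steps above could be sketched roughly as follows. This is an illustration only, not the code from the pending PR: it uses plain dicts as stand-ins for AIMessage and ToolMessage, and the name strip_orphaned_tool_calls is hypothetical.

```python
import copy
from typing import Any


def strip_orphaned_tool_calls(
    messages: list[dict[str, Any]],
) -> list[dict[str, Any]]:
    """Return a copy of messages with unanswered tool calls removed."""
    # Step 1: find which tool calls have a corresponding tool message.
    answered = {m["tool_call_id"] for m in messages if m.get("type") == "tool"}

    cleaned = []
    for m in messages:
        if m.get("type") == "ai" and m.get("tool_calls"):
            # Copy so the stored checkpoint data is not mutated.
            m = copy.deepcopy(m)
            # Steps 2 and 3: drop orphans, keep complete pairs.
            m["tool_calls"] = [
                call for call in m["tool_calls"] if call["id"] in answered
            ]
        cleaned.append(m)
    return cleaned


# One answered call ("toolu_1") and one orphaned call ("toolu_2").
restored = [
    {"type": "human", "content": "Weather in SF and NYC?"},
    {
        "type": "ai",
        "content": "Checking both cities.",
        "tool_calls": [
            {"id": "toolu_1", "name": "get_weather", "args": {"city": "SF"}},
            {"id": "toolu_2", "name": "get_weather", "args": {"city": "NYC"}},
        ],
    },
    {"type": "tool", "tool_call_id": "toolu_1", "content": "Sunny"},
]

cleaned = strip_orphaned_tool_calls(restored)
remaining_ids = [call["id"] for call in cleaned[1]["tool_calls"]]
print(remaining_ids)
```

One open design question this sketch does not settle: an AI message whose content is empty and whose tool_calls were all orphaned ends up with nothing in it, and a provider may reject that as well, so a real fix might need to drop or otherwise handle such messages.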

Related PR

I have a fix ready and will submit a PR shortly.
