Summary
This RFC proposes a new ML-Commons agent type, plan_execute_and_reflect, which breaks down complex user tasks into discrete steps, executes them via a lightweight executor (Conversational ReAct) agent, and dynamically refines its plan based on intermediate results. This separation of planning and execution leverages specialized LLMs and supports both step-wise and batch re-evaluation strategies.
Motivation
As AI-driven workflows grow more complex, single-turn agents struggle to orchestrate multi-step logic reliably. OpenSearch ML-Commons currently lacks a standardized long-running agent capable of:
- Decomposing user objectives into actionable sub-tasks
- Executing each sub-task efficiently
- Adapting the plan in-flight based on partial results
- Maintaining history and supporting asynchronous execution
By introducing a plan_execute_and_reflect agent, ML-Commons users gain a flexible, extensible framework for advanced, multi-step interactions with OpenSearch and external tools.
By leveraging this agent, users can build features such as deep research and observability agents.
Example: #3650
Goals
- Dynamic Planning: Use a dedicated planner LLM to generate and refine execution plans.
- Efficient Execution: Delegate sub-task execution to a ReAct executor that can employ a cheaper/faster model.
- Adaptive Re-evaluation: Support both per-step and batch re-evaluation strategies to balance accuracy and cost.
- Tooling Integration: Leverage existing OpenSearch tools (e.g., ListIndexTool, SearchIndexTool, IndexMappingTool) via function calling.
- Asynchronous & Long-running: Allow agents to run asynchronously with checkpointing and resume capabilities.
- Minimal User Overhead: Hide orchestration details; require minimal agent configuration.
Design
Phases
- Planning: Planner LLM produces an ordered list of steps.
- Step Execution: A ReAct executor LLM invokes defined tools to satisfy each step.
- Re-evaluation: Planner LLM ingests execution history and either refines the plan or returns final result.
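The three phases above can be sketched as a control loop. This is an illustrative sketch only, not the ML-Commons implementation: `plan_fn`, `execute_fn`, and `reevaluate_fn` are hypothetical stand-ins for the planner LLM call, the ReAct executor invocation, and the re-evaluation call.

```python
# Hypothetical sketch of the plan-execute-reflect control loop.
# plan_fn, execute_fn, and reevaluate_fn stand in for the planner LLM,
# the ReAct executor, and the re-evaluation call; none of these names
# come from the ML-Commons codebase.

def plan_execute_reflect(objective, plan_fn, execute_fn, reevaluate_fn, max_iters=10):
    """Run the three phases until the planner returns a final result."""
    steps = plan_fn(objective)      # Phase 1: planner LLM produces ordered steps
    history = []
    for _ in range(max_iters):
        if not steps:
            break
        step = steps[0]
        result = execute_fn(step)   # Phase 2: ReAct executor runs one step with tools
        history.append((step, result))
        # Phase 3: planner ingests execution history and either refines
        # the remaining plan or returns the final answer
        steps, final = reevaluate_fn(objective, history)
        if final:
            return final
    raise RuntimeError("no final result within iteration budget")
```

The `max_iters` guard reflects that a real long-running agent needs some budget to avoid re-planning indefinitely.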
Planner vs Executor LLM
- Planner: High-capability LLM responsible for plan generation and re-evaluation. Does not invoke tools directly.
- Executor (ReAct) Agent: Simpler LLM wired with tools via function calling to perform each plan step.
Re-evaluation Strategy
The agent supports two modes:
- Per-step: trigger re-evaluation after each step execution. More adaptive, but incurs more LLM calls.
- Batch: execute the full plan, then re-evaluate. Fewer calls, but potentially wasted work on steps a refined plan would have dropped.
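As a back-of-envelope comparison of planner-side cost (illustrative only; `planner_calls` is a hypothetical helper, and it ignores the executor's own tool-calling turns, which dominate total cost in practice):

```python
# Back-of-envelope planner LLM call count for the two re-evaluation modes.
# Assumes 1 call for the initial plan; per-step mode re-evaluates after
# every executed step, batch mode re-evaluates once per full pass ("round").

def planner_calls(n_steps, mode, rounds=1):
    """Planner LLM calls: 1 initial plan + re-evaluations."""
    if mode == "per_step":
        return 1 + n_steps * rounds   # re-evaluate after every step
    if mode == "batch":
        return 1 + rounds             # re-evaluate once per full pass
    raise ValueError(mode)
```

For a 5-step plan completed in one pass, per-step mode makes 6 planner calls versus 2 for batch mode, which is the accuracy/cost trade-off described above.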
Features
- MCP support: users can attach an MCP server so the agent can use the tools that server exposes.
- Asynchronous execution: the agent can run asynchronously as a long-running task.
- Function calling: tool execution uses the model's function-calling interface.
API Definition
1. Model Registration Example
POST /_plugins/_ml/models/_register
{
  "name": "My model",
  "function_name": "remote",
  "description": "test model",
  "connector": {
    "name": "My connector",
    "description": "my test connector",
    "version": 1,
    "protocol": "<PROTOCOL>",
    "parameters": {
      "region": "<REGION>",
      "service_name": "<SERVICE>",
      "model": "<MODEL>"
    },
    "credential": {
      "access_key": "{{ _.access_key }}",
      "secret_key": "{{ _.secret_key }}",
      "session_token": "{{ _.session_token }}"
    },
    "actions": [
      {
        "action_type": "predict",
        "method": "POST",
        "url": "<URL>",
        "headers": {
          "content-type": "application/json"
        },
        "request_body": "{ \"system\": [{\"text\": \"${parameters.system_prompt}\"}], \"messages\": [${parameters._chat_history:-}{\"role\":\"user\",\"content\":[{\"text\":\"${parameters.prompt}\"}]}${parameters._interactions:-}]${parameters.tool_configs:-} }"
      }
    ]
  }
}
2. Agent Registration Example
POST /_plugins/_ml/agents/_register
{
  "name": "My Chat Agent with Claude 3.7",
  "type": "plan_execute_and_reflect",
  "description": "this is a test agent",
  "llm": {
    "model_id": "<MODEL_ID>",
    "parameters": {
      "prompt": "${parameters.question}"
    }
  },
  "memory": {
    "type": "conversation_index"
  },
  "parameters": {
    "_llm_interface": "<Function calling interface for model type>" // Example: bedrock/converse/claude
  },
  "tools": [
    {
      "type": "ListIndexTool"
    },
    {
      "type": "SearchIndexTool"
    },
    {
      "type": "IndexMappingTool"
    }
    ....
  ],
  "app_type": "os_chat"
}
3. Execution Request
POST /_plugins/_ml/agents/<agent_id>/_execute
{
  "parameters": {
    "question": "Give me statistics about my cluster"
  }
}
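The `${parameters.question}` placeholder in the agent's `llm.parameters` is filled from the execute request at run time. A rough sketch of this substitution (the `substitute` helper is hypothetical, a simplification of ML-Commons' actual template handling, including the `:-` empty-default suffix seen in the connector's request_body):

```python
# Sketch of how ${parameters.*} placeholders in agent/connector templates
# are resolved against the execute request's "parameters" map.
# Simplified illustration, not the ML-Commons implementation.
import re

def substitute(template, parameters):
    """Replace ${parameters.key} with the request value; the ':-' suffix
    marks an optional placeholder that becomes empty when the key is absent."""
    def repl(match):
        key, optional = match.group(1), match.group(2)
        if key in parameters:
            return str(parameters[key])
        if optional:
            return ""
        raise KeyError(key)
    return re.sub(r"\$\{parameters\.(\w+)(:-)?\}", repl, template)
```

For the execution request above, `substitute("${parameters.question}", {"question": "Give me statistics about my cluster"})` yields the user's question, which becomes the planner's objective.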
Memory & History
- The deep-research agent stores top-level steps and results in ml-memory-* indices, and supports checkpoint/resume via memory_id.
- The ReAct executor holds sub-step history in memory per step; the final result of each step is saved to the memory indices.
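To make the checkpoint/resume idea concrete, here is a toy illustration keyed by memory_id (the real agent persists to ml-memory-* indices; the in-memory dict and the `save_step`/`resume` helpers are hypothetical and only mirror the idea):

```python
# Toy checkpoint/resume keyed by memory_id. The real agent writes each
# completed top-level step to ml-memory-* indices; a dict stands in here.

_memory = {}  # memory_id -> list of (step, result) pairs

def save_step(memory_id, step, result):
    """Persist one completed step under the conversation's memory_id."""
    _memory.setdefault(memory_id, []).append((step, result))

def resume(memory_id):
    """Return completed steps so a restarted agent can skip them
    and re-plan from where it left off."""
    return list(_memory.get(memory_id, []))
```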
Prompts
The prompts below can be overridden by the user by providing them as agent parameters.
Planner Prompt
For the given objective, come up with a simple step by step plan. This plan should involve individual tasks, that if executed correctly will yield the correct answer. Do not add any superfluous steps. The result of the final step should be the final answer. Make sure that each step has all the information needed - do not skip steps. At all costs, do not execute the steps. You will be told when to execute the steps.
Objective: <PROMPT>
ALWAYS follow the given response instructions. Do not return any content that does not follow the response instructions. Do not add anything before or after the expected JSON
Response Instructions: Always respond with a valid JSON object that strictly follows the below schema:
{
"steps": array[string],
"result": string
}
Use "steps" to return an array of strings where each string is a step to complete the objective, leave it empty if you know the final result. Please wrap each step in quotes and escape any special characters within the string.
Use "result" to return the final response when you have enough information, leave it empty if you want to execute more steps
Here are examples of valid responses:
Example 1 - When you need to execute steps:
{
"steps": ["Search for logs containing error messages in the last hour", "Analyze the frequency of each error type", "Check system metrics during error spikes"],
"result": ""
}
Example 2 - When you have the final result:
{
"steps": [],
"result": "Based on the analysis, the root cause of the system slowdown was a memory leak in the authentication service, which started at 14:30 UTC."
}
IMPORTANT RULES:
1. DO NOT use commas within individual steps
2. DO NOT add any content before or after the JSON
3. ONLY respond with a pure JSON object
4. DO NOT USE ANY TOOLS. TOOLS ARE PROVIDED ONLY FOR YOU TO MAKE A PLAN.
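A minimal validator for the response schema above can make the contract concrete. This is a sketch (the `parse_planner_response` helper is hypothetical, not the parser ML-Commons itself uses); it enforces the rule that exactly one of "steps" and "result" is non-empty:

```python
# Minimal validator for the planner's JSON response schema:
# {"steps": array[string], "result": string}, with exactly one
# of the two fields populated. Illustrative only.
import json

def parse_planner_response(raw):
    """Return (steps, result) after validating the planner's JSON output."""
    obj = json.loads(raw)
    steps = obj.get("steps", [])
    result = obj.get("result", "")
    if not isinstance(steps, list) or not all(isinstance(s, str) for s in steps):
        raise ValueError('"steps" must be an array of strings')
    if not isinstance(result, str):
        raise ValueError('"result" must be a string')
    if bool(steps) == bool(result):
        raise ValueError("respond with either steps or a result, not both or neither")
    return steps, result
```

Both example responses above pass this check: the first yields three steps and an empty result, the second an empty step list and a final result.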
Re-evaluation Prompt
For the given objective, come up with a simple step by step plan. This plan should involve individual tasks, that if executed correctly will yield the correct answer. Do not add any superfluous steps. The result of the final step should be the final answer. Make sure that each step has all the information needed - do not skip steps. At all costs, do not execute the steps. You will be told when to execute the steps.
Objective: <PROMPT>
Original plan:
[.., ...]
You have currently executed the following steps:
[Step 1, Result of Step 1]
Update your plan accordingly. If no more steps are needed and you can return to the user, then respond with that. Otherwise, fill out the plan. Only add steps to the plan that still NEED to be done. Do not return previously done steps as part of the plan. Please follow the below response format.
ALWAYS follow the given response instructions. Do not return any content that does not follow the response instructions. Do not add anything before or after the expected JSON
Always respond with a valid JSON object that strictly follows the below schema:
{
"steps": array[string],
"result": string
}
Use "steps" to return an array of strings where each string is a step to complete the objective, leave it empty if you know the final result. Please wrap each step in quotes and escape any special characters within the string.
Use "result" to return the final response when you have enough information, leave it empty if you want to execute more steps
Here are examples of valid responses:
Example 1 - When you need to execute steps:
{
"steps": ["Search for logs containing error messages in the last hour", "Analyze the frequency of each error type", "Check system metrics during error spikes"],
"result": ""
}
Example 2 - When you have the final result:
{
"steps": [],
"result": "Based on the analysis, the root cause of the system slowdown was a memory leak in the authentication service, which started at 14:30 UTC."
}
IMPORTANT RULES:
1. DO NOT use commas within individual steps
2. DO NOT add any content before or after the JSON
3. ONLY respond with a pure JSON object
4. DO NOT USE ANY TOOLS. TOOLS ARE PROVIDED ONLY FOR YOU TO MAKE A PLAN.
Example Workflows
Re-evaluate After Each Step
The workflow is as follows:
1. The user provides the deep-research agent with a task.
2. The deep-research agent forwards the task to the planner LLM.
3. The planner LLM returns a plan (a series of steps to execute).
4. The deep-research agent forwards the first unexecuted step of the plan to the ReAct executor agent.
5. The ReAct agent executes the step and returns the response.
6. The deep-research agent forwards the executed steps and the original plan to the planner LLM.
7. The planner LLM either returns the final result or a refined plan.
8. If the planner LLM returns the final result, it is returned to the user. Otherwise, steps 4-7 repeat with the refined plan until the planner has enough information to answer.
Re-evaluate After All Steps
The workflow is as follows:
1. The user provides the deep-research agent with a task.
2. The deep-research agent forwards the task to the planner LLM.
3. The planner LLM returns a plan (a series of steps to execute).
4. The deep-research agent forwards the steps of the plan to the ReAct executor agent, one at a time.
5. The ReAct agent executes each step and returns the response. (Steps 4-5 loop until all steps are executed.)
6. The deep-research agent forwards the executed steps and the original plan to the planner LLM.
7. The planner LLM either returns the final result or a refined plan.
8. If the planner LLM returns the final result, it is returned to the user. Otherwise, steps 4-7 repeat with the refined plan until the planner has enough information to answer.
Enhancements & Future Work
- Parallel Step Execution via function-calling optimizations.
- Checkpointing for fault tolerance.
Please share feedback, alternative designs, and suggested refinements.