Summary
This RFC proposes a new ML-Commons agent type, plan_execute_and_reflect, which breaks down complex user tasks into discrete steps, executes them via a lightweight executor (Conversational ReAct) agent, and dynamically refines its plan based on intermediate results. This separation of planning and execution leverages specialized LLMs and supports both step-wise and batch re-evaluation strategies.
Motivation
As AI-driven workflows grow more complex, single-turn agents struggle to orchestrate multi-step logic reliably. OpenSearch ML-Commons currently lacks a standardized long-running agent capable of:
- Decomposing user objectives into actionable sub-tasks
- Executing each sub-task efficiently
- Adapting the plan in-flight based on partial results
- Maintaining history and supporting asynchronous execution
By introducing a plan_execute_and_reflect agent, ML-Commons users gain a flexible, extensible framework for advanced, multi-step interactions with OpenSearch and external tools.
By leveraging this agent, users can build features such as deep research and observability agents.
Example: #3650
Goals
- Dynamic Planning: Use a dedicated planner LLM to generate and refine execution plans.
- Efficient Execution: Delegate sub-task execution to a ReAct executor that can employ a cheaper/faster model.
- Adaptive Re-evaluation: Support both per-step and batch re-evaluation strategies to balance accuracy and cost.
- Tooling Integration: Leverage existing OpenSearch tools (e.g., ListIndexTool, SearchIndexTool, IndexMappingTool) via function calling.
- Asynchronous & Long-running: Allow agents to run asynchronously with checkpointing and resume capabilities.
- Minimal User Overhead: Hide orchestration details; require minimal agent configuration.
Design
Phases
- Planning: Planner LLM produces an ordered list of steps.
- Step Execution: A ReAct executor LLM invokes defined tools to satisfy each step.
- Re-evaluation: Planner LLM ingests execution history and either refines the plan or returns final result.
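The three phases above can be sketched as a control loop. This is an illustrative sketch only, not the ML-Commons implementation: `plan_fn`, `execute_fn`, and `reevaluate_fn` are hypothetical stand-ins for the planner LLM call, the ReAct executor invocation, and the re-evaluation call.

```python
# Hypothetical sketch of the plan-execute-reflect control loop.
# plan_fn, execute_fn, and reevaluate_fn stand in for the planner LLM,
# the ReAct executor, and the re-evaluation call; none of these names
# come from the ML-Commons codebase.

def plan_execute_reflect(objective, plan_fn, execute_fn, reevaluate_fn, max_iters=10):
    """Run the three phases until the planner returns a final result."""
    steps = plan_fn(objective)      # Phase 1: planner LLM produces ordered steps
    history = []
    for _ in range(max_iters):
        if not steps:
            break
        step = steps[0]
        result = execute_fn(step)   # Phase 2: ReAct executor runs one step with tools
        history.append((step, result))
        # Phase 3: planner ingests execution history and either refines
        # the remaining plan or returns the final answer
        steps, final = reevaluate_fn(objective, history)
        if final:
            return final
    raise RuntimeError("no final result within iteration budget")
```

The `max_iters` guard reflects that a real long-running agent needs some budget to avoid re-planning indefinitely.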
Planner vs Executor LLM
- Planner: High-capability LLM responsible for plan generation and re-evaluation. Does not invoke tools directly.
- Executor (ReAct) Agent: Simpler LLM wired with tools via function calling to perform each plan step.
Re-evaluation Strategy
The agent supports two modes:
- Per-step: trigger re-evaluation after each step execution. More adaptive, but incurs more LLM calls.
- Batch: execute the full plan, then re-evaluate. Fewer calls, but potentially wasted work on steps a refined plan would have dropped.
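As a back-of-envelope comparison of planner-side cost (illustrative only; `planner_calls` is a hypothetical helper, and it ignores the executor's own tool-calling turns, which dominate total cost in practice):

```python
# Back-of-envelope planner LLM call count for the two re-evaluation modes.
# Assumes 1 call for the initial plan; per-step mode re-evaluates after
# every executed step, batch mode re-evaluates once per full pass ("round").

def planner_calls(n_steps, mode, rounds=1):
    """Planner LLM calls: 1 initial plan + re-evaluations."""
    if mode == "per_step":
        return 1 + n_steps * rounds   # re-evaluate after every step
    if mode == "batch":
        return 1 + rounds             # re-evaluate once per full pass
    raise ValueError(mode)
```

For a 5-step plan completed in one pass, per-step mode makes 6 planner calls versus 2 for batch mode, which is the accuracy/cost trade-off described above.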
Features
- MCP support: users can attach an MCP server so the agent can use the tools that server exposes.
- Asynchronous execution: the agent can run asynchronously as a long-running task.
- Function calling: tool execution uses the model's function-calling interface.
API Definition
1. Model Registration Example
POST /_plugins/_ml/models/_register
{
  "name": "My model",
  "function_name": "remote",
  "description": "test model",
  "connector": {
    "name": "My connector",
    "description": "my test connector",
    "version": 1,
    "protocol": "<PROTOCOL>",
    "parameters": {
      "region": "<REGION>",
      "service_name": "<SERVICE>",
      "model": "<MODEL>"
    },
    "credential": {
      "access_key": "{{ _.access_key }}",
      "secret_key": "{{ _.secret_key }}",
      "session_token": "{{ _.session_token }}"
    },
    "actions": [
      {
        "action_type": "predict",
        "method": "POST",
        "url": "<URL>",
        "headers": {
          "content-type": "application/json"
        },
        "request_body": "{ \"system\": [{\"text\": \"${parameters.system_prompt}\"}], \"messages\": [${parameters._chat_history:-}{\"role\":\"user\",\"content\":[{\"text\":\"${parameters.prompt}\"}]}${parameters._interactions:-}]${parameters.tool_configs:-} }"
      }
    ]
  }
}
2. Agent Registration Example
POST /_plugins/_ml/agents/_register
{
  "name": "My Chat Agent with Claude 3.7",
  "type": "plan_execute_and_reflect",
  "description": "this is a test agent",
  "llm": {
    "model_id": "<MODEL_ID>",
    "parameters": {
      "prompt": "${parameters.question}"
    }
  },
  "memory": {
    "type": "conversation_index"
  },
  "parameters": {
    "_llm_interface": "<Function calling interface for model type>" // Example: bedrock/converse/claude
  },
  "tools": [
    {
      "type": "ListIndexTool"
    },
    {
      "type": "SearchIndexTool"
    },
    {
      "type": "IndexMappingTool"
    }
    ....
  ],
  "app_type": "os_chat"
}
3. Execution Request
POST /_plugins/_ml/agents/<agent_id>/_execute
{
  "parameters": {
    "question": "Give me statistics about my cluster"
  }
}
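The `${parameters.question}` placeholder in the agent's `llm.parameters` is filled from the execute request at run time. A rough sketch of this substitution (the `substitute` helper is hypothetical, a simplification of ML-Commons' actual template handling, including the `:-` empty-default suffix seen in the connector's request_body):

```python
# Sketch of how ${parameters.*} placeholders in agent/connector templates
# are resolved against the execute request's "parameters" map.
# Simplified illustration, not the ML-Commons implementation.
import re

def substitute(template, parameters):
    """Replace ${parameters.key} with the request value; the ':-' suffix
    marks an optional placeholder that becomes empty when the key is absent."""
    def repl(match):
        key, optional = match.group(1), match.group(2)
        if key in parameters:
            return str(parameters[key])
        if optional:
            return ""
        raise KeyError(key)
    return re.sub(r"\$\{parameters\.(\w+)(:-)?\}", repl, template)
```

For the execution request above, `substitute("${parameters.question}", {"question": "Give me statistics about my cluster"})` yields the user's question, which becomes the planner's objective.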
Memory & History
- The deep-research agent stores top-level steps and results in ml-memory-* indices, and supports checkpoint/resume via memory_id.
- The ReAct executor holds sub-step history in memory per step; the final result of each step is saved to the memory indices.
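To make the checkpoint/resume idea concrete, here is a toy illustration keyed by memory_id (the real agent persists to ml-memory-* indices; the in-memory dict and the `save_step`/`resume` helpers are hypothetical and only mirror the idea):

```python
# Toy checkpoint/resume keyed by memory_id. The real agent writes each
# completed top-level step to ml-memory-* indices; a dict stands in here.

_memory = {}  # memory_id -> list of (step, result) pairs

def save_step(memory_id, step, result):
    """Persist one completed step under the conversation's memory_id."""
    _memory.setdefault(memory_id, []).append((step, result))

def resume(memory_id):
    """Return completed steps so a restarted agent can skip them
    and re-plan from where it left off."""
    return list(_memory.get(memory_id, []))
```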
Prompts
The prompts below can be overridden by the user by providing them as agent parameters.
Planner Prompt
For the given objective, come up with a simple step by step plan. This plan should involve individual tasks, that if executed correctly will yield the correct answer. Do not add any superfluous steps. The result of the final step should be the final answer. Make sure that each step has all the information needed - do not skip steps. At all costs, do not execute the steps. You will be told when to execute the steps.
Objective: <PROMPT>
ALWAYS follow the given response instructions. Do not return any content that does not follow the response instructions. Do not add anything before or after the expected JSON
Response Instructions: Always respond with a valid JSON object that strictly follows the below schema:
{
"steps": array[string],
"result": string
}
Use "steps" to return an array of strings where each string is a step to complete the objective, leave it empty if you know the final result. Please wrap each step in quotes and escape any special characters within the string.
Use "result" to return the final response when you have enough information, leave it empty if you want to execute more steps
Here are examples of valid responses:
Example 1 - When you need to execute steps:
{
"steps": ["Search for logs containing error messages in the last hour", "Analyze the frequency of each error type", "Check system metrics during error spikes"],
"result": ""
}
Example 2 - When you have the final result:
{
"steps": [],
"result": "Based on the analysis, the root cause of the system slowdown was a memory leak in the authentication service, which started at 14:30 UTC."
}
IMPORTANT RULES:
1. DO NOT use commas within individual steps
2. DO NOT add any content before or after the JSON
3. ONLY respond with a pure JSON object
4. DO NOT USE ANY TOOLS. TOOLS ARE PROVIDED ONLY FOR YOU TO MAKE A PLAN.
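A minimal validator for the response schema above can make the contract concrete. This is a sketch (the `parse_planner_response` helper is hypothetical, not the parser ML-Commons itself uses); it enforces the rule that exactly one of "steps" and "result" is non-empty:

```python
# Minimal validator for the planner's JSON response schema:
# {"steps": array[string], "result": string}, with exactly one
# of the two fields populated. Illustrative only.
import json

def parse_planner_response(raw):
    """Return (steps, result) after validating the planner's JSON output."""
    obj = json.loads(raw)
    steps = obj.get("steps", [])
    result = obj.get("result", "")
    if not isinstance(steps, list) or not all(isinstance(s, str) for s in steps):
        raise ValueError('"steps" must be an array of strings')
    if not isinstance(result, str):
        raise ValueError('"result" must be a string')
    if bool(steps) == bool(result):
        raise ValueError("respond with either steps or a result, not both or neither")
    return steps, result
```

Both example responses above pass this check: the first yields three steps and an empty result, the second an empty step list and a final result.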
Re-evaluation Prompt
For the given objective, come up with a simple step by step plan. This plan should involve individual tasks, that if executed correctly will yield the correct answer. Do not add any superfluous steps. The result of the final step should be the final answer. Make sure that each step has all the information needed - do not skip steps. At all costs, do not execute the steps. You will be told when to execute the steps.
Objective: <PROMPT>
Original plan:
[.., ...]
You have currently executed the following steps:
[Step 1, Result of Step 1]
Update your plan accordingly. If no more steps are needed and you can return to the user, then respond with that. Otherwise, fill out the plan. Only add steps to the plan that still NEED to be done. Do not return previously done steps as part of the plan. Please follow the below response format.
ALWAYS follow the given response instructions. Do not return any content that does not follow the response instructions. Do not add anything before or after the expected JSON
Always respond with a valid JSON object that strictly follows the below schema:
{
"steps": array[string],
"result": string
}
Use "steps" to return an array of strings where each string is a step to complete the objective, leave it empty if you know the final result. Please wrap each step in quotes and escape any special characters within the string.
Use "result" to return the final response when you have enough information, leave it empty if you want to execute more steps
Here are examples of valid responses:
Example 1 - When you need to execute steps:
{
"steps": ["Search for logs containing error messages in the last hour", "Analyze the frequency of each error type", "Check system metrics during error spikes"],
"result": ""
}
Example 2 - When you have the final result:
{
"steps": [],
"result": "Based on the analysis, the root cause of the system slowdown was a memory leak in the authentication service, which started at 14:30 UTC."
}
IMPORTANT RULES:
1. DO NOT use commas within individual steps
2. DO NOT add any content before or after the JSON
3. ONLY respond with a pure JSON object
4. DO NOT USE ANY TOOLS. TOOLS ARE PROVIDED ONLY FOR YOU TO MAKE A PLAN.
Example Workflows
Re-evaluate After Each Step
The workflow is as follows:
1. The user provides the deep-research agent with a task.
2. The deep-research agent forwards the task to the planner LLM.
3. The planner LLM returns a plan (a series of steps to execute).
4. The deep-research agent forwards the first unexecuted step of the plan to the ReAct executor agent.
5. The ReAct agent executes the step and returns the response.
6. The deep-research agent forwards the executed steps and the original plan to the planner LLM.
7. The planner LLM either returns the final result or a refined plan.
8. If the planner LLM returns the final result, it is returned to the user. Otherwise, steps 4-7 repeat with the refined plan until the planner has enough information to answer.
Re-evaluate After All Steps
The workflow is as follows:
1. The user provides the deep-research agent with a task.
2. The deep-research agent forwards the task to the planner LLM.
3. The planner LLM returns a plan (a series of steps to execute).
4. The deep-research agent forwards the steps of the plan to the ReAct executor agent, one at a time.
5. The ReAct agent executes each step and returns the response. (Steps 4-5 loop until all steps are executed.)
6. The deep-research agent forwards the executed steps and the original plan to the planner LLM.
7. The planner LLM either returns the final result or a refined plan.
8. If the planner LLM returns the final result, it is returned to the user. Otherwise, steps 4-7 repeat with the refined plan until the planner has enough information to answer.
Enhancements & Future Work
- Parallel Step Execution via function-calling optimizations.
- Checkpointing for fault tolerance.
Please share feedback, alternative designs, and suggested refinements.