Updated on January 19, 2025

LangChain Agent Executor Deep Dive


In this chapter, we continue from our introduction to agents and dive deeper, learning how to build a custom agent execution loop for LangChain v0.3.

What is the Agent Executor?

When we talk about agents, a significant part of an "agent" is simple code logic, iteratively rerunning LLM calls and processing their output. The exact logic varies significantly, but one well-known example is the ReAct agent.

ReAct process

Reason + Action (ReAct) agents use iterative reasoning and action steps to incorporate chain-of-thought and tool-use into their execution. During the reasoning step, the LLM generates the steps to take to answer the query. Next, the LLM generates the action input, which our code logic parses into a tool call.

Agentic graph of ReAct

Following our action step, we get an observation from the tool call. Then, we feed the observation back into the agent executor logic for a final answer or further reasoning and action steps.

The agent and agent executor we will be building will follow this pattern.
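Stripped of LangChain specifics, that pattern is just a loop. The toy sketch below (with a hard-coded fake_llm standing in for a real LLM; none of these names are LangChain APIs) shows the control flow we will recreate with real components in the rest of this chapter:

python
def fake_llm(query: str, scratchpad: str) -> dict:
    # reasoning step: with an empty scratchpad, decide to call the add tool;
    # once an observation is present, decide to answer directly
    if "returned" not in scratchpad:
        return {"action": "add", "args": {"x": 10, "y": 10}}
    return {"action": "final_answer", "args": {"answer": scratchpad.strip()}}


def run_tool(name: str, args: dict) -> float:
    # action step: execute the requested tool with the generated arguments
    tools = {"add": lambda x, y: x + y}
    return tools[name](**args)


def react_loop(query: str, max_steps: int = 5) -> dict:
    scratchpad = ""
    for _ in range(max_steps):
        step = fake_llm(query, scratchpad)
        if step["action"] == "final_answer":
            return step["args"]
        # observation step: feed the tool output back in via the scratchpad
        observation = run_tool(step["action"], step["args"])
        scratchpad += f"\nThe {step['action']} tool returned {observation}"
    return {"answer": "max steps reached"}


print(react_loop("What is 10 + 10"))
# -> {'answer': 'The add tool returned 20'}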

Creating an Agent

We will construct the agent using LangChain Expression Language (LCEL). We cover LCEL in more depth in the LCEL chapter; for now, all we need to know is that we construct our agent using syntax and components like so:

text
agent = (
    <input parameters, including chat history and user query>
    | <prompt>
    | <LLM with tools>
)

We need this agent to remember previous interactions within the conversation. To do that, we will use the ChatPromptTemplate with a system message, a placeholder for our chat history, a placeholder for the user query, and a placeholder for the agent scratchpad.

The agent scratchpad is where the agent writes its notes as it works through multiple internal thought and tool-use steps to produce a final output for the user.

python
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder

prompt = ChatPromptTemplate.from_messages([
("system", (
"You're a helpful assistant. When answering a user's question "
"you should first use one of the tools provided. After using a "
"tool the tool output will be provided in the "
"'scratchpad' below. If you have an answer in the "
"scratchpad you should not use any more tools and "
"instead answer directly to the user."
)),
MessagesPlaceholder(variable_name="chat_history"),
("human", "{input}"),
("ai", "Scratchpad: {agent_scratchpad}"),
])

Next, we must define our LLM. We will use the gpt-4o-mini model with a temperature of 0.0.

python
import os
from getpass import getpass
from langchain_openai import ChatOpenAI

os.environ["OPENAI_API_KEY"] = os.getenv("OPENAI_API_KEY") \
    or getpass("Enter your OpenAI API key: ")

llm = ChatOpenAI(
    model_name="gpt-4o-mini",
    temperature=0.0,
)
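
The add, subtract, multiply, and exponentiate tools carry over from the previous chapter. If you don't already have them defined, a minimal version (with our own docstrings) looks something like this:

python
from langchain_core.tools import tool

# minimal versions of the calculator tools used in the previous chapter
@tool
def add(x: float, y: float) -> float:
    """Add 'x' and 'y'."""
    return x + y

@tool
def subtract(x: float, y: float) -> float:
    """Subtract 'y' from 'x'."""
    return x - y

@tool
def multiply(x: float, y: float) -> float:
    """Multiply 'x' and 'y'."""
    return x * y

@tool
def exponentiate(x: float, y: float) -> float:
    """Raise 'x' to the power of 'y'."""
    return x ** y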

To add tools to our LLM, we will use the bind_tools method within the LCEL constructor, which binds our tools to the LLM. We'll also include the tool_choice="any" argument to bind_tools, which tells the LLM that it MUST use a tool, i.e., it cannot provide a final answer directly (without using a tool):

python
from langchain_core.runnables.base import RunnableSerializable

tools = [add, subtract, multiply, exponentiate]

# define the agent runnable
agent: RunnableSerializable = (
    {
        "input": lambda x: x["input"],
        "chat_history": lambda x: x["chat_history"],
        "agent_scratchpad": lambda x: x.get("agent_scratchpad", "")
    }
    | prompt
    | llm.bind_tools(tools, tool_choice="any")
)

We invoke the agent with the invoke method, passing in the input and chat history.

python
out = agent.invoke({"input": "What is 10 + 10", "chat_history": []})
out
python
AIMessage(
    content='',
    additional_kwargs={
        'tool_calls': [
            {
                'function': {
                    'arguments': '{"x":10,"y":10}',
                    'name': 'add'
                },
                'id': 'call_bI8aZpMN1y907LncsX9rhY6y',
                'type': 'function'
            }
        ],
        'refusal': None
    },
    response_metadata={
        'token_usage': {
            'completion_tokens': 18,
            'prompt_tokens': 205,
            'total_tokens': 223,
            'completion_tokens_details': {
                'accepted_prediction_tokens': 0,
                'audio_tokens': 0,
                'reasoning_tokens': 0,
                'rejected_prediction_tokens': 0
            },
            'prompt_tokens_details': {'audio_tokens': 0, 'cached_tokens': 0}
        }
    }
)

Because we set tool_choice="any" to force the tool output, the usual content field will be empty as LangChain reserves that field for natural language output, i.e. the final answer of the LLM. To find our tool output, we need to look at the tool_calls field:

python
out.tool_calls
python
[{'name': 'add',
  'args': {'x': 10, 'y': 10},
  'id': 'call_bI8aZpMN1y907LncsX9rhY6y',
  'type': 'tool_call'}]

From here, we have the tool name that our LLM wants to use and the args that it wants to pass to that tool. We can see that the tool add is being used with the arguments x=10 and y=10. The agent.invoke method has not executed the tool function; we need to write that part of the agent code ourselves.

Executing the tool code requires two steps:

  1. Map the tool name to the tool function.

  2. Execute the tool function with the generated args.

python
# create tool name to function mapping
name2tool = {tool.name: tool.func for tool in tools}

Now execute to get our answer:

python
tool_output = name2tool[out.tool_calls[0]["name"]](**out.tool_calls[0]["args"])
tool_output
text
20

That completes our answer and tool-execution logic. We now feed this result back into our LLM via the agent_scratchpad placeholder.

python
out = agent.invoke({
"input": "What is 10 + 10",
"chat_history": [],
"agent_scratchpad": (
f"The {out.tool_calls[0]['name']} tool returned {tool_output}"
)
})
out
python
AIMessage(
    content='',
    additional_kwargs={
        'tool_calls': [
            {
                'function': {
                    'arguments': '{"x":10,"y":10}',
                    'name': 'add'
                },
                'id': 'call_vIKn0eWVupXsSpJBT1budTHr',
                'type': 'function'
            }
        ],
        'refusal': None
    },
    response_metadata={
        'token_usage': {
            'completion_tokens': 18,
            'prompt_tokens': 210,
            'total_tokens': 228,
            'completion_tokens_details': {
                'accepted_prediction_tokens': 0, 'audio_tokens': 0,
                'reasoning_tokens': 0, 'rejected_prediction_tokens': 0
            },
            'prompt_tokens_details': {'audio_tokens': 0, 'cached_tokens': 0}
        }
    }
)

Despite having the answer in our agent_scratchpad, the LLM still tries to use the tool again. This behaviour happens because we bound the tools to the LLM with tool_choice="any". When we set tool_choice to "any" or "required", we tell the LLM that it MUST use a tool, i.e., it cannot provide a final answer.

There are two options to fix this:

  1. Set tool_choice="auto" to tell the LLM that it can choose to use a tool or provide a final answer.

  2. Create a final_answer tool - we'll explain this shortly.

First, let's try option 1:

python
agent: RunnableSerializable = (
    {
        "input": lambda x: x["input"],
        "chat_history": lambda x: x["chat_history"],
        "agent_scratchpad": lambda x: x.get("agent_scratchpad", "")
    }
    | prompt
    | llm.bind_tools(tools, tool_choice="auto")
)

We'll start again from the beginning, so the agent_scratchpad is empty:

python
out = agent.invoke({"input": "What is 10 + 10", "chat_history": []})
out
python
AIMessage(
    content='',
    additional_kwargs={
        'tool_calls': [
            {
                'function': {
                    'arguments': '{"x":10,"y":10}',
                    'name': 'add'
                },
                'id': 'call_YOCTOCe2iHyIJhcfaiDVafpA',
                'type': 'function'
            }
        ],
        'refusal': None
    },
    response_metadata={
        'token_usage': {
            'completion_tokens': 18,
            'prompt_tokens': 205,
            'total_tokens': 223,
            'completion_tokens_details': {
                'accepted_prediction_tokens': 0, 'audio_tokens': 0,
                'reasoning_tokens': 0, 'rejected_prediction_tokens': 0
            },
            'prompt_tokens_details': {'audio_tokens': 0, 'cached_tokens': 0}
        }
    }
)

Now we execute the tool and pass its output into the agent_scratchpad placeholder:

python
tool_output = name2tool[out.tool_calls[0]["name"]](**out.tool_calls[0]["args"])
out = agent.invoke({
"input": "What is 10 + 10",
"chat_history": [],
"agent_scratchpad": (
f"The {out.tool_calls[0]['name']} tool returned {tool_output}"
)
})
out
python
AIMessage(
    content='10 + 10 equals 20.',
    additional_kwargs={'refusal': None},
    response_metadata={
        'token_usage': {
            'completion_tokens': 10,
            'prompt_tokens': 210,
            'total_tokens': 220,
            'completion_tokens_details': {
                'accepted_prediction_tokens': 0, 'audio_tokens': 0,
                'reasoning_tokens': 0, 'rejected_prediction_tokens': 0
            },
            'prompt_tokens_details': {'audio_tokens': 0, 'cached_tokens': 0}
        }
    }
)

We now have the final answer in the content field! This method is perfectly functional; however, we recommend option 2 as it provides more control over the agent's output.

There are several reasons why option 2 provides more control:

  • It removes the possibility of an agent using the direct content field when it is not appropriate; for example, some LLMs (particularly smaller ones) may try to use the content field when using a tool.

  • We can enforce a specific structured output in our answers. Structured outputs are handy when we require particular fields for downstream code or multi-part answers. For example, a RAG agent may return a natural language answer and a list of sources used to generate that answer.
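As a quick illustration of that second point, a hypothetical RAG-style final answer tool might look like the sketch below (the rag_final_answer name and its fields are our own illustration, not part of this chapter's agent):

python
from langchain_core.tools import tool

# hypothetical structured final answer for a RAG agent: a natural language
# answer plus the list of sources used to generate it
@tool
def rag_final_answer(answer: str, sources: list[str]) -> None:
    """Provide the final answer to the user in natural language, along with
    a list of the sources used to generate that answer."""
    return None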

To implement option 2, we must create a final_answer tool. We will add a tools_used field to give our output some structure—in a real-world use case, we probably wouldn't want to generate this field, but it's useful for our example here.

python
from langchain_core.tools import tool


@tool
def final_answer(answer: str, tools_used: list[str]) -> str:
    """Use this tool to provide a final answer to the user.
    The answer should be in natural language as this will be provided
    to the user directly. The tools_used must include a list of tool
    names that were used within the `scratchpad`.
    """
    return None

Our final_answer tool doesn't necessarily need to do anything; in this example, we're using it purely to structure our final response. We can now add this tool to our agent:

python
tools = [final_answer, add, subtract, multiply, exponentiate]

agent: RunnableSerializable = (
    {
        "input": lambda x: x["input"],
        "chat_history": lambda x: x["chat_history"],
        "agent_scratchpad": lambda x: x.get("agent_scratchpad", "")
    }
    | prompt
    | llm.bind_tools(tools, tool_choice="any")  # we're forcing tool use again
)

Now we invoke:

python
out = agent.invoke({"input": "What is 10 + 10", "chat_history": []})
out.tool_calls
python
[{'name': 'add',
  'args': {'x': 10, 'y': 10},
  'id': 'call_fhhm33BCyJdxlyguAuP9STEK',
  'type': 'tool_call'}]

We execute the tool and provide its output to the agent again:

python
tool_out = name2tool[out.tool_calls[0]["name"]](**out.tool_calls[0]["args"])
out = agent.invoke({
"input": "What is 10 + 10",
"chat_history": [],
"agent_scratchpad": (
f"The {out.tool_calls[0]['name']} tool returned {tool_out}"
)
})
out
python
AIMessage(
    content='',
    additional_kwargs={
        'tool_calls': [
            {
                'function': {
                    'name': 'final_answer',
                    'arguments': '{"answer":"10 + 10 equals 20.","tools_used":["functions.add"]}'
                },
                'id': 'call_reBCXwxUOIePCItSSEuTKGCn',
                'type': 'function'
            }
        ],
        'refusal': None
    },
    response_metadata={
        'token_usage': {
            'completion_tokens': 28,
            'prompt_tokens': 282,
            'total_tokens': 310,
            'completion_tokens_details': {
                'accepted_prediction_tokens': 0,
                'audio_tokens': 0,
                'reasoning_tokens': 0,
                'rejected_prediction_tokens': 0
            },
            'prompt_tokens_details': {'audio_tokens': 0, 'cached_tokens': 0}
        }
    }
)

We see that content remains empty because we forced tool use, but we now have the final_answer tool call, which we can access via the tool_calls field:

python
out.tool_calls
python
[
    {
        'name': 'final_answer',
        'args': {
            'answer': '10 + 10 equals 20.',
            'tools_used': ['functions.add']
        },
        'id': 'call_reBCXwxUOIePCItSSEuTKGCn',
        'type': 'tool_call'
    }
]

Because we see the final_answer tool here, we don't pass this back into our agent; instead, it tells us to stop execution and pass the args output on to our downstream process or the user directly:

python
out.tool_calls[0]["args"]
python
{'answer': '10 + 10 equals 20.', 'tools_used': ['functions.add']}

Building a Custom Agent Execution Loop

We've worked through each step of our agent code, but nothing runs automatically; we have to execute every step ourselves. Let's now write a class that handles all the logic we just worked through.

python
import json

from langchain_core.messages import BaseMessage, HumanMessage, AIMessage


class CustomAgentExecutor:
    chat_history: list[BaseMessage]

    def __init__(self, max_iterations: int = 3):
        self.chat_history = []
        self.max_iterations = max_iterations
        self.agent: RunnableSerializable = (
            {
                "input": lambda x: x["input"],
                "chat_history": lambda x: x["chat_history"],
                "agent_scratchpad": lambda x: x.get("agent_scratchpad", "")
            }
            | prompt
            | llm.bind_tools(tools, tool_choice="any")  # we're forcing tool use again
        )

    def invoke(self, input: str) -> dict:
        # invoke the agent, but do this iteratively in a loop until
        # reaching a final answer
        count = 0
        agent_scratchpad = ""
        while count < self.max_iterations:
            # invoke a step for the agent to generate a tool call
            out = self.agent.invoke({
                "input": input,
                "chat_history": self.chat_history,
                "agent_scratchpad": agent_scratchpad
            })
            # if the tool call is the final answer tool, we stop
            if out.tool_calls[0]["name"] == "final_answer":
                break
            # otherwise we execute the tool and add its output to the agent scratchpad
            tool_out = name2tool[out.tool_calls[0]["name"]](**out.tool_calls[0]["args"])
            action_str = f"The {out.tool_calls[0]['name']} tool returned {tool_out}"
            agent_scratchpad += "\n" + action_str
            # add a print so we can see intermediate steps
            print(f"{count}: {action_str}")
            count += 1
        # the final answer is the args of the final_answer tool call; it is a
        # dictionary, so we convert it to a string for compatibility with the
        # chat history
        final_answer = out.tool_calls[0]["args"]
        final_answer_str = json.dumps(final_answer)
        # add the interaction to the chat history
        self.chat_history.extend([
            HumanMessage(content=input),
            AIMessage(content=final_answer_str)
        ])
        # return the final answer in dict form
        return final_answer

Now initialize the agent executor:

python
agent_executor = CustomAgentExecutor()

And test the invoke method:

python
agent_executor.invoke(input="What is 10 + 10")
text
0: The add tool returned 20

{'answer': '10 + 10 equals 20.', 'tools_used': ['functions.add']}

We then get our answer and the tools that were used — all through our custom agent executor.
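Because the executor now stores the conversation in chat_history, we could also ask a follow-up question that relies on the previous turn, for example (a hypothetical extra call; the exact output will vary between runs):

python
# a follow-up that relies on the chat history stored by the executor
agent_executor.invoke(input="Multiply that result by 5")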