OpenAI released a new open-source agents framework called Agents SDK. It is positioned as the production-grade successor to OpenAI's experimental Swarm multi-agent framework. In practice, Agents SDK is most comparable to frameworks like Pydantic AI, LangChain, or LlamaIndex, providing a structured way to build AI agent applications.
Through this article, we'll learn everything we need to start building with Agents SDK by working through practical examples and exploring its key capabilities.
What is Agents SDK?
OpenAI's Agents SDK provides a structured approach to building agent applications with these key features:
- Agent loop: Built-in loop handling tool calls, sending results to the LLM, and continuing execution until completion
- Python-first: Uses native Python features rather than introducing new abstraction layers
- Handoffs: Coordination and delegation capabilities between multiple agents
- Guardrails: Input/output validation with early termination for failed checks
- Function tools: Python functions as tools with automatic schema generation and Pydantic validation
- Tracing: Built-in visualization, debugging and monitoring of agent workflows
While the library is technically open-source, it's designed primarily to work with OpenAI's models. Support for other LLMs has since been added via LiteLLM.
Let's explore these features through practical implementation.
Installation and Setup
First, install the library:
pip install -qU openai-agents==0.0.3
Then set your OpenAI API key:
import os
from getpass import getpass
os.environ["OPENAI_API_KEY"] = os.getenv("OPENAI_API_KEY") or \
    getpass("Enter your OpenAI API key: ")
To get an API key, you'll need to visit platform.openai.com, create an account if you don't have one, and generate a new secret key from the API keys section.
Creating Your First Agent
Creating a basic agent requires minimal code:
from agents import Agent
agent = Agent(
    name="Assistant",
    instructions="You're a helpful assistant",
    model="gpt-4o-mini",
)
This simple initialization creates an agent with a name, basic instructions (i.e. a developer message), and a choice of model. We're using gpt-4o-mini here for a good balance of performance and cost.
Running Your Agent
The SDK provides three methods for executing agents:
- Runner.run() - Asynchronous execution
- Runner.run_sync() - Synchronous execution
- Runner.run_streamed() - Asynchronous execution with streamed responses
In production applications, you'll most likely want to use the asynchronous methods (run() or run_streamed()) as they provide better scalability and efficiency. Synchronous execution should generally be avoided except for simple scripts or testing.
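The practical difference is that run() returns a coroutine that must be awaited, while run_sync() drives the event loop for you. The sketch below mimics that wrapper pattern with plain asyncio — no SDK calls; fake_run is a hypothetical stand-in for Runner.run:

```python
import asyncio

# Hypothetical stand-in for Runner.run -- a coroutine that must be awaited.
async def fake_run(starting_agent: str, input: str) -> str:
    await asyncio.sleep(0)  # simulate waiting on the API
    return f"{starting_agent} answered: {input}"

# What a synchronous wrapper effectively does: spin up an event loop and
# block on the coroutine, so plain scripts can call it without async/await.
def run_sync(agent: str, query: str) -> str:
    return asyncio.run(fake_run(starting_agent=agent, input=query))

print(run_sync("Assistant", "hello"))  # Assistant answered: hello
```

This is also why run_sync() cannot be called from inside an already-running event loop (such as a Jupyter cell): asyncio.run refuses to nest loops.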
Basic Execution
Using the asynchronous approach:
from agents import Runner
result = await Runner.run(
    starting_agent=agent,
    input="tell me a short story"
)
result.final_output
This method runs the agent and waits for the complete response before returning it. It's straightforward but offers a less engaging user experience as users won't see any progress until the entire response is generated.
Streaming Responses
Streaming is particularly useful for user-facing applications as it provides immediate feedback:
response = Runner.run_streamed(
    starting_agent=agent,
    input="hello there"
)

async for event in response.stream_events():
    print(event)
When you run this code, you'll see a lot of information being returned - different event types for agent updates, token generation, tool calls, and more. This raw output can be overwhelming, so you'll want to filter for specific events:
from openai.types.responses import ResponseTextDeltaEvent
response = Runner.run_streamed(
    starting_agent=agent,
    input="tell me a short story"
)

async for event in response.stream_events():
    if event.type == "raw_response_event" and \
            isinstance(event.data, ResponseTextDeltaEvent):
        print(event.data.delta, end="", flush=True)
This filtered approach shows only the generated text tokens, creating a smoother user experience by displaying the response as it's being generated.
Function Tools
One of the SDK's key features is the ability to convert Python functions into tools the agent can use. OpenAI has cycled through various names for this capability - from "function calling" to "tool calling" and now "function tools" in this SDK.
Here's how to implement a simple multiplication tool:
from agents import function_tool
@function_tool
def multiply(x: float, y: float) -> float:
    """Multiplies `x` and `y` to provide a precise answer."""
    return x * y
When defining function tools, ensure you include:
- Clear function name
- Descriptive parameter names
- Type annotations for inputs and outputs
- Explanatory docstring (becomes the tool description)
These elements help the agent understand when and how to use the tool. The docstring is particularly important as it's what the agent will "read" to understand the tool's purpose.
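To see why these elements matter, here is a rough sketch of the metadata the decorator can extract via introspection — an illustration using only the standard library's inspect module; the real SDK builds a full JSON schema with Pydantic:

```python
import inspect

def multiply(x: float, y: float) -> float:
    """Multiplies `x` and `y` to provide a precise answer."""
    return x * y

# Name, docstring, and type annotations are the raw material the SDK
# turns into a tool schema the LLM can read.
sig = inspect.signature(multiply)
schema = {
    "name": multiply.__name__,
    "description": inspect.getdoc(multiply),
    "parameters": {
        name: param.annotation.__name__ for name, param in sig.parameters.items()
    },
}
print(schema["name"])        # multiply
print(schema["parameters"])  # {'x': 'float', 'y': 'float'}
```

If you omit the annotations or docstring, there is simply less signal for the model to decide when and how to call the tool.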
Pass your tools to the agent during initialization:
agent = Agent(
    name="Assistant",
    instructions=(
        "You're a helpful assistant, remember to always "
        "use the provided tools whenever possible. Do not "
        "rely on your own knowledge too much and instead "
        "use your tools to help you answer queries."
    ),
    model="gpt-4o-mini",
    tools=[multiply]  # list of available tools
)
Notice how we've extended the instructions to encourage the agent to use the provided tools rather than relying on its internal knowledge for tasks like calculations.
When executing, the agent will now have access to this function:
response = Runner.run_streamed(
    starting_agent=agent,
    input="what is 7.814 multiplied by 103.892?"
)
For real-world applications, you'll want to create a more sophisticated event handler for streaming that can display both tool usage and final responses. Here's a comprehensive example:
from openai.types.responses import (
    ResponseFunctionCallArgumentsDeltaEvent,  # tool call streaming
    ResponseTextDeltaEvent,  # text response streaming
    ResponseCreatedEvent,  # start of new event
)

response = Runner.run_streamed(
    starting_agent=agent,
    input="what is 7.814 multiplied by 103.892?"
)

async for event in response.stream_events():
    if event.type == "raw_response_event":
        if isinstance(event.data, ResponseFunctionCallArgumentsDeltaEvent):
            # streamed parameters for our tool call
            print(event.data.delta, end="", flush=True)
        elif isinstance(event.data, ResponseTextDeltaEvent):
            # streamed final answer tokens
            print(event.data.delta, end="", flush=True)
    elif event.type == "agent_updated_stream_event":
        # current agent in use
        print(f"> Current Agent: {event.new_agent.name}")
    elif event.type == "run_item_stream_event":
        # events for user-facing stream
        if event.name == "tool_called":
            # full tool call after all tokens streamed
            print()
            print(f"> Tool Called, name: {event.item.raw_item.name}")
            print(f"> Tool Called, args: {event.item.raw_item.arguments}")
        elif event.name == "tool_output":
            # response from tool execution
            print(f"> Tool Output: {event.item.raw_item['output']}")
This handler provides a cleaner output showing:
- Which agent is being used (important for multi-agent workflows)
- Tool call parameters as they're being generated
- When a tool is called with its name and arguments
- The output from the tool execution
- The final response text as it's generated
With gpt-4o-mini, these operations happen very quickly, but with more complex tools or slower models, streaming provides valuable feedback to users.
Guardrails
Guardrails are essential safety mechanisms for agent interactions, especially in production environments. The SDK supports both input and output guardrails to validate messages before/after LLM processing.
First, define a structure for guardrail outputs using Pydantic:
from pydantic import BaseModel
# Define structure for guardrail agent outputs
class GuardrailOutput(BaseModel):
    is_triggered: bool
    reasoning: str

# Create an agent that checks for political opinions
politics_agent = Agent(
    name="Politics check",
    instructions="Check if the user is asking you about political opinions",
    output_type=GuardrailOutput,
)
The output_type parameter forces the agent to provide structured output matching our defined schema, making it easier to process programmatically.
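Because the output type is a Pydantic model, the structured response can be validated and accessed as typed attributes. A quick standalone illustration — no API call involved, and the dict below is a made-up example payload:

```python
from pydantic import BaseModel

class GuardrailOutput(BaseModel):
    is_triggered: bool
    reasoning: str

# Simulate the structured payload an LLM might return (hypothetical values).
raw = {"is_triggered": True, "reasoning": "The query asks for a political opinion."}
out = GuardrailOutput.model_validate(raw)
print(out.is_triggered)  # True
print(out.reasoning)
```

If the payload is missing a field or has the wrong type, model_validate raises a ValidationError rather than silently passing bad data downstream.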
You can test this agent directly:
query = "what do you think about the labour party in the UK?"
result = await Runner.run(starting_agent=politics_agent, input=query)
result.final_output
When you run this, you'll get a nicely structured response with is_triggered=True and a reasoning field explaining why the guardrail was triggered.
To integrate this as a guardrail for your main agent, create a function with the @input_guardrail decorator:
from agents import (
    GuardrailFunctionOutput,
    RunContextWrapper,
    input_guardrail
)

@input_guardrail
async def politics_guardrail(
    ctx: RunContextWrapper[None],
    agent: Agent,
    input: str,
) -> GuardrailFunctionOutput:
    # Run agent to check if guardrail is triggered
    response = await Runner.run(starting_agent=politics_agent, input=input)
    # Format response into GuardrailFunctionOutput
    return GuardrailFunctionOutput(
        output_info=response.final_output,
        tripwire_triggered=response.final_output.is_triggered,
    )
This function must follow the exact signature pattern shown above, even if you don't use all the parameters. The return type must be GuardrailFunctionOutput for the SDK to process it correctly.
Apply the guardrail to your agent:
agent = Agent(
    name="Assistant",
    instructions=(
        "You're a helpful assistant, remember to always "
        "use the provided tools whenever possible. Do not "
        "rely on your own knowledge too much and instead "
        "use your tools to help you answer queries."
    ),
    model="gpt-4o-mini",
    tools=[multiply],
    input_guardrails=[politics_guardrail],  # list of input guardrails
)
When a guardrail is triggered, it raises an exception:
# This will execute normally
result = await Runner.run(
    starting_agent=agent,
    input="what is 7.814 multiplied by 103.892?"
)

# This will trigger the guardrail exception
result = await Runner.run(
    starting_agent=agent,
    input="what do you think about the labour party in the UK?"
)
# Raises: InputGuardrailTripwireTriggered: Guardrail InputGuardrail triggered tripwire
In production applications, you'll need to handle these exceptions appropriately, perhaps by displaying a user-friendly message explaining why their query was rejected.
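One common pattern is to wrap the run call and translate the exception into a fallback reply. A minimal sketch with stand-ins throughout: the exception class and fake_run below mimic agents.InputGuardrailTripwireTriggered and Runner.run so the snippet runs without the SDK installed:

```python
import asyncio

class InputGuardrailTripwireTriggered(Exception):
    """Stand-in for the SDK's exception of the same name."""

async def fake_run(query: str) -> str:
    # Stand-in for Runner.run: pretend political queries trip the guardrail.
    if "party" in query.lower():
        raise InputGuardrailTripwireTriggered("Politics check tripped")
    return f"Answer to: {query}"

async def handle_query(query: str) -> str:
    try:
        return await fake_run(query)
    except InputGuardrailTripwireTriggered:
        # Surface a user-friendly message instead of a stack trace.
        return "Sorry, I can't help with questions about political opinions."

print(asyncio.run(handle_query("what is 7.814 multiplied by 103.892?")))
print(asyncio.run(handle_query("what do you think about the labour party?")))
```

In real code you would catch the SDK's exception around Runner.run and decide per guardrail what message (or logging) is appropriate.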
Output guardrails follow a similar pattern but use the @output_guardrail decorator and the output_guardrails parameter when creating an agent, allowing you to validate the agent's responses before they reach the user.
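The output-side function has the same overall shape as the input version. Here is a sketch using plain-Python stand-ins for GuardrailFunctionOutput and the @output_guardrail decorator (both come from agents in real code, and the keyword check stands in for running a checker agent):

```python
from dataclasses import dataclass

@dataclass
class GuardrailFunctionOutput:
    """Stand-in mirroring the SDK type of the same name."""
    output_info: object
    tripwire_triggered: bool

def output_guardrail(fn):
    # Stand-in: the real decorator also registers the function with the SDK.
    return fn

@output_guardrail
def politics_output_guardrail(ctx, agent, output: str) -> GuardrailFunctionOutput:
    # Real code would run a checker agent here; a keyword check stands in.
    triggered = "politic" in output.lower()
    return GuardrailFunctionOutput(output_info=output, tripwire_triggered=triggered)

check = politics_output_guardrail(None, None, "Here is my political opinion...")
print(check.tripwire_triggered)  # True
```

The key difference from the input version is simply that the function receives the agent's generated output rather than the user's query.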
Conversational Agents
Most practical applications involve multi-turn conversations rather than single queries. The SDK makes it easy to maintain context across interactions:
# First message
result = await Runner.run(
    starting_agent=agent,
    input="remember the number 7.814 for me please"
)

# Convert to input list format
result.to_input_list()
# [{'content': 'remember the number 7.814 for me please', 'role': 'user'},
#  {'id': 'msg_67d17d702a708191ae8704bdade87f7803f330fa9bc6d689',
#   'content': [{'annotations': [],
#      'text': "I can't store or remember information for future use. However, you can save it in a note or use a reminder app. If you have any questions or need assistance with something else, feel free to ask!",
#      'type': 'output_text'}],
#   'role': 'assistant',
#   'status': 'completed',
#   'type': 'message'}]

# Pass history + new message
result = await Runner.run(
    starting_agent=agent,
    input=result.to_input_list() + [
        {"role": "user", "content": "multiply the last number by 103.892"}
    ]
)
# The agent will now have context of the previous conversation
The SDK provides a convenient to_input_list() method that converts the result into a properly formatted message history. You can then append new messages and pass the entire history to your next agent call.
Even though the agent might claim it can't remember information (as shown in the example), the conversation history is preserved in the input list format, allowing subsequent interactions to reference previous messages. When asked to "multiply the last number", the agent can identify 7.814 from the history and perform the calculation.
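The same append-history pattern generalizes to a chat loop. A pure-Python sketch, where the respond function is a hypothetical stand-in for calling Runner.run and reading to_input_list():

```python
def respond(history: list[dict]) -> dict:
    # Stand-in for Runner.run: echo how many user turns it has seen,
    # demonstrating that each call receives the full history.
    user_turns = sum(1 for m in history if m["role"] == "user")
    return {"role": "assistant", "content": f"reply #{user_turns}"}

history: list[dict] = []
for user_msg in ["remember the number 7.814", "multiply it by 103.892"]:
    history.append({"role": "user", "content": user_msg})
    history.append(respond(history))  # agent sees everything so far

print(len(history))            # 4
print(history[-1]["content"])  # reply #2
```

With the real SDK you would replace the manual appends with result.to_input_list() after each run, which also carries the assistant messages' IDs and metadata.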
Conclusion
OpenAI's Agents SDK provides a structured framework for building LLM-powered agents with tools, guardrails, and conversation management. The Python-first approach makes it accessible while still offering significant flexibility for complex agent applications.
While the library is positioned as "open-source," it's currently optimized for OpenAI's models. If you're building production systems with OpenAI, this SDK offers several advantages over building agent systems from scratch, particularly:
- Built-in streaming with comprehensive event types
- Structured guardrails for safety and control
- Simple tool definition and execution
- Conversation context management
- Tracing capabilities for debugging and monitoring
For production use cases, consider the performance implications of synchronous vs. asynchronous execution, and always implement guardrails appropriate to your application's requirements. As with most agent frameworks, there are still limitations in how agents understand complex instructions, but the SDK provides a solid foundation for building practical AI applications.
The library is evolving rapidly and includes additional features beyond what's covered here, such as agent handoffs and tracing capabilities, which make debugging and monitoring significantly easier in production environments.