OpenAI released a new open-source agents framework called Agents SDK. It is positioned as the production-grade successor to OpenAI's experimental Swarm multi-agent framework. In practice, Agents SDK is most comparable to frameworks like Pydantic AI, LangChain, or LlamaIndex, providing a structured way to build AI agent applications.
Through this article, we'll learn everything we need to start building with Agents SDK by working through practical examples and exploring its key capabilities.
What is Agents SDK?
OpenAI's Agents SDK provides a structured approach to building agent applications with these key features:
- Agent loop: Built-in loop handling tool calls, sending results to the LLM, and continuing execution until completion
- Python-first: Uses native Python features rather than introducing new abstraction layers
- Handoffs: Coordination and delegation capabilities between multiple agents
- Guardrails: Input/output validation with early termination for failed checks
- Function tools: Python functions as tools with automatic schema generation and Pydantic validation
- Tracing: Built-in visualization, debugging and monitoring of agent workflows
While the library is technically open-source, it's designed primarily to work with OpenAI's models. Support for other LLMs has since been added via LiteLLM.
Let's explore these features through practical implementation.
Installation and Setup
First, install the library:
pip install -qU openai-agents==0.0.3
Then set your OpenAI API key:
import os
from getpass import getpass
os.environ["OPENAI_API_KEY"] = os.getenv("OPENAI_API_KEY") or \
    getpass("Enter your OpenAI API key: ")
To get an API key, you'll need to visit platform.openai.com, create an account if you don't have one, and generate a new secret key from the API keys section.
Creating Your First Agent
Creating a basic agent requires minimal code:
from agents import Agent
agent = Agent(
    name="Assistant",
    instructions="You're a helpful assistant",
    model="gpt-4o-mini",
)
This simple initialization creates an agent with a name, basic instructions (i.e. a developer message), and a choice of model. We're using gpt-4o-mini here for a good balance of performance and cost.
Running Your Agent
The SDK provides three methods for executing agents:
- Runner.run() - Asynchronous execution
- Runner.run_sync() - Synchronous execution
- Runner.run_streamed() - Asynchronous execution with streamed responses
In production applications, you'll most likely want to use the asynchronous methods (run() or run_streamed()) as they provide better scalability and efficiency. Synchronous execution should generally be avoided except for simple scripts or testing.
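The practical difference is that run() returns a coroutine that must be awaited, while run_sync() drives the event loop for you. The sketch below mimics that wrapper pattern with plain asyncio — no SDK calls; fake_run is a hypothetical stand-in for Runner.run:

```python
import asyncio

# Hypothetical stand-in for Runner.run -- a coroutine that must be awaited.
async def fake_run(starting_agent: str, input: str) -> str:
    await asyncio.sleep(0)  # simulate waiting on the API
    return f"{starting_agent} answered: {input}"

# What a synchronous wrapper effectively does: spin up an event loop and
# block on the coroutine, so plain scripts can call it without async/await.
def run_sync(agent: str, query: str) -> str:
    return asyncio.run(fake_run(starting_agent=agent, input=query))

print(run_sync("Assistant", "hello"))  # Assistant answered: hello
```

This is also why run_sync() cannot be called from inside an already-running event loop (such as a Jupyter cell): asyncio.run refuses to nest loops.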
Basic Execution
Using the asynchronous approach:
from agents import Runner
result = await Runner.run(
    starting_agent=agent,
    input="tell me a short story"
)
result.final_output
This method runs the agent and waits for the complete response before returning it. It's straightforward but offers a less engaging user experience as users won't see any progress until the entire response is generated.
Streaming Responses
Streaming is particularly useful for user-facing applications as it provides immediate feedback:
response = Runner.run_streamed(
    starting_agent=agent,
    input="hello there"
)

async for event in response.stream_events():
    print(event)
When you run this code, you'll see a lot of information being returned - different event types for agent updates, token generation, tool calls, and more. This raw output can be overwhelming, so you'll want to filter for specific events:
from openai.types.responses import ResponseTextDeltaEvent
response = Runner.run_streamed(
    starting_agent=agent,
    input="tell me a short story"
)

async for event in response.stream_events():
    if event.type == "raw_response_event" and \
            isinstance(event.data, ResponseTextDeltaEvent):
        print(event.data.delta, end="", flush=True)
This filtered approach shows only the generated text tokens, creating a smoother user experience by displaying the response as it's being generated.
Function Tools
One of the SDK's key features is the ability to convert Python functions into tools the agent can use. OpenAI has cycled through various names for this capability - from "function calling" to "tool calling" and now "function tools" in this SDK.
Here's how to implement a simple multiplication tool:
from agents import function_tool
@function_tool
def multiply(x: float, y: float) -> float:
    """Multiplies `x` and `y` to provide a precise answer."""
    return x * y
When defining function tools, ensure you include:
- Clear function name
- Descriptive parameter names
- Type annotations for inputs and outputs
- Explanatory docstring (becomes the tool description)
These elements help the agent understand when and how to use the tool. The docstring is particularly important as it's what the agent will "read" to understand the tool's purpose.
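To see why these elements matter, here is a rough sketch of the metadata the decorator can extract via introspection — an illustration using only the standard library's inspect module; the real SDK builds a full JSON schema with Pydantic:

```python
import inspect

def multiply(x: float, y: float) -> float:
    """Multiplies `x` and `y` to provide a precise answer."""
    return x * y

# Name, docstring, and type annotations are the raw material the SDK
# turns into a tool schema the LLM can read.
sig = inspect.signature(multiply)
schema = {
    "name": multiply.__name__,
    "description": inspect.getdoc(multiply),
    "parameters": {
        name: param.annotation.__name__ for name, param in sig.parameters.items()
    },
}
print(schema["name"])        # multiply
print(schema["parameters"])  # {'x': 'float', 'y': 'float'}
```

If you omit the annotations or docstring, there is simply less signal for the model to decide when and how to call the tool.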
Pass your tools to the agent during initialization:
agent = Agent(
    name="Assistant",
    instructions=(
        "You're a helpful assistant, remember to always "
        "use the provided tools whenever possible. Do not "
        "rely on your own knowledge too much and instead "
        "use your tools to help you answer queries."
    ),
    model="gpt-4o-mini",
    tools=[multiply]  # list of available tools
)
Notice how we've extended the instructions to encourage the agent to use the provided tools rather than relying on its internal knowledge for tasks like calculations.
When executing, the agent will now have access to this function:
response = Runner.run_streamed(
    starting_agent=agent,
    input="what is 7.814 multiplied by 103.892?"
)
For real-world applications, you'll want to create a more sophisticated event handler for streaming that can display both tool usage and final responses. Here's a comprehensive example:
from openai.types.responses import (
    ResponseFunctionCallArgumentsDeltaEvent,  # tool call streaming
    ResponseTextDeltaEvent,  # text response streaming
    ResponseCreatedEvent,  # start of new event
)

response = Runner.run_streamed(
    starting_agent=agent,
    input="what is 7.814 multiplied by 103.892?"
)

async for event in response.stream_events():
    if event.type == "raw_response_event":
        if isinstance(event.data, ResponseFunctionCallArgumentsDeltaEvent):
            # streamed parameters for our tool call
            print(event.data.delta, end="", flush=True)
        elif isinstance(event.data, ResponseTextDeltaEvent):
            # streamed final answer tokens
            print(event.data.delta, end="", flush=True)
    elif event.type == "agent_updated_stream_event":
        # current agent in use
        print(f"> Current Agent: {event.new_agent.name}")
    elif event.type == "run_item_stream_event":
        # events for user-facing stream
        if event.name == "tool_called":
            # full tool call after all tokens streamed
            print()
            print(f"> Tool Called, name: {event.item.raw_item.name}")
            print(f"> Tool Called, args: {event.item.raw_item.arguments}")
        elif event.name == "tool_output":
            # response from tool execution
            print(f"> Tool Output: {event.item.raw_item['output']}")
This handler provides a cleaner output showing:
- Which agent is being used (important for multi-agent workflows)
- Tool call parameters as they're being generated
- When a tool is called with its name and arguments
- The output from the tool execution
- The final response text as it's generated
With gpt-4o-mini, these operations happen very quickly, but with more complex tools or slower models, streaming provides valuable feedback to users.
Guardrails
Guardrails are essential safety mechanisms for agent interactions, especially in production environments. The SDK supports both input and output guardrails to validate messages before/after LLM processing.
First, define a structure for guardrail outputs using Pydantic:
from pydantic import BaseModel
# Define structure for guardrail agent outputs
class GuardrailOutput(BaseModel):
    is_triggered: bool
    reasoning: str

# Create an agent that checks for political opinions
politics_agent = Agent(
    name="Politics check",
    instructions="Check if the user is asking you about political opinions",
    output_type=GuardrailOutput,
)
The output_type parameter forces the agent to provide structured output matching our defined schema, making it easier to process programmatically.
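Because the output type is a Pydantic model, the structured response can be validated and accessed as typed attributes. A quick standalone illustration — no API call involved, and the dict below is a made-up example payload:

```python
from pydantic import BaseModel

class GuardrailOutput(BaseModel):
    is_triggered: bool
    reasoning: str

# Simulate the structured payload an LLM might return (hypothetical values).
raw = {"is_triggered": True, "reasoning": "The query asks for a political opinion."}
out = GuardrailOutput.model_validate(raw)
print(out.is_triggered)  # True
print(out.reasoning)
```

If the payload is missing a field or has the wrong type, model_validate raises a ValidationError rather than silently passing bad data downstream.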
You can test this agent directly:
query = "what do you think about the labour party in the UK?"
result = await Runner.run(starting_agent=politics_agent, input=query)
result.final_output
When you run this, you'll get a nicely structured response with is_triggered=True and a reasoning field explaining why the guardrail was triggered.
To integrate this as a guardrail for your main agent, create a function with the @input_guardrail decorator:
from agents import (
    GuardrailFunctionOutput,
    RunContextWrapper,
    input_guardrail
)

@input_guardrail
async def politics_guardrail(
    ctx: RunContextWrapper[None],
    agent: Agent,
    input: str,
) -> GuardrailFunctionOutput:
    # Run agent to check if guardrail is triggered
    response = await Runner.run(starting_agent=politics_agent, input=input)
    # Format response into GuardrailFunctionOutput
    return GuardrailFunctionOutput(
        output_info=response.final_output,
        tripwire_triggered=response.final_output.is_triggered,
    )
This function must follow the exact signature pattern shown above, even if you don't use all the parameters. The return type must be GuardrailFunctionOutput for the SDK to process it correctly.
Apply the guardrail to your agent:
agent = Agent(
    name="Assistant",
    instructions=(
        "You're a helpful assistant, remember to always "
        "use the provided tools whenever possible. Do not "
        "rely on your own knowledge too much and instead "
        "use your tools to help you answer queries."
    ),
    model="gpt-4o-mini",
    tools=[multiply],
    input_guardrails=[politics_guardrail],  # list of input guardrails
)
When a guardrail is triggered, it raises an exception:
# This will execute normally
result = await Runner.run(
    starting_agent=agent,
    input="what is 7.814 multiplied by 103.892?"
)

# This will trigger the guardrail exception
result = await Runner.run(
    starting_agent=agent,
    input="what do you think about the labour party in the UK?"
)
# Raises: InputGuardrailTripwireTriggered: Guardrail InputGuardrail triggered tripwire
In production applications, you'll need to handle these exceptions appropriately, perhaps by displaying a user-friendly message explaining why their query was rejected.
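One common pattern is to wrap the run call and translate the exception into a fallback reply. A minimal sketch with stand-ins throughout: the exception class and fake_run below mimic agents.InputGuardrailTripwireTriggered and Runner.run so the snippet runs without the SDK installed:

```python
import asyncio

class InputGuardrailTripwireTriggered(Exception):
    """Stand-in for the SDK's exception of the same name."""

async def fake_run(query: str) -> str:
    # Stand-in for Runner.run: pretend political queries trip the guardrail.
    if "party" in query.lower():
        raise InputGuardrailTripwireTriggered("Politics check tripped")
    return f"Answer to: {query}"

async def handle_query(query: str) -> str:
    try:
        return await fake_run(query)
    except InputGuardrailTripwireTriggered:
        # Surface a user-friendly message instead of a stack trace.
        return "Sorry, I can't help with questions about political opinions."

print(asyncio.run(handle_query("what is 7.814 multiplied by 103.892?")))
print(asyncio.run(handle_query("what do you think about the labour party?")))
```

In real code you would catch the SDK's exception around Runner.run and decide per guardrail what message (or logging) is appropriate.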
Output guardrails follow a similar pattern but use the @output_guardrail decorator and the output_guardrails parameter when creating an agent, allowing you to validate the agent's responses before they reach the user.
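The output-side function has the same overall shape as the input version. Here is a sketch using plain-Python stand-ins for GuardrailFunctionOutput and the @output_guardrail decorator (both come from agents in real code, and the keyword check stands in for running a checker agent):

```python
from dataclasses import dataclass

@dataclass
class GuardrailFunctionOutput:
    """Stand-in mirroring the SDK type of the same name."""
    output_info: object
    tripwire_triggered: bool

def output_guardrail(fn):
    # Stand-in: the real decorator also registers the function with the SDK.
    return fn

@output_guardrail
def politics_output_guardrail(ctx, agent, output: str) -> GuardrailFunctionOutput:
    # Real code would run a checker agent here; a keyword check stands in.
    triggered = "politic" in output.lower()
    return GuardrailFunctionOutput(output_info=output, tripwire_triggered=triggered)

check = politics_output_guardrail(None, None, "Here is my political opinion...")
print(check.tripwire_triggered)  # True
```

The key difference from the input version is simply that the function receives the agent's generated output rather than the user's query.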
Conversational Agents
Most practical applications involve multi-turn conversations rather than single queries. The SDK makes it easy to maintain context across interactions:
# First message
result = await Runner.run(
    starting_agent=agent,
    input="remember the number 7.814 for me please"
)

# Convert to input list format
result.to_input_list()
# [{'content': 'remember the number 7.814 for me please', 'role': 'user'},
#  {'id': 'msg_67d17d702a708191ae8704bdade87f7803f330fa9bc6d689',
#   'content': [{'annotations': [],
#      'text': "I can't store or remember information for future use. However, you can save it in a note or use a reminder app. If you have any questions or need assistance with something else, feel free to ask!",
#      'type': 'output_text'}],
#   'role': 'assistant',
#   'status': 'completed',
#   'type': 'message'}]

# Pass history + new message
result = await Runner.run(
    starting_agent=agent,
    input=result.to_input_list() + [
        {"role": "user", "content": "multiply the last number by 103.892"}
    ]
)
# The agent will now have context of the previous conversation
The SDK provides a convenient to_input_list() method that converts the result into a properly formatted message history. You can then append new messages and pass the entire history to your next agent call.
Even though the agent might claim it can't remember information (as shown in the example), the conversation history is preserved in the input list format, allowing subsequent interactions to reference previous messages. When asked to "multiply the last number", the agent can identify 7.814 from the history and perform the calculation.
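The same append-history pattern generalizes to a chat loop. A pure-Python sketch, where the respond function is a hypothetical stand-in for calling Runner.run and reading to_input_list():

```python
def respond(history: list[dict]) -> dict:
    # Stand-in for Runner.run: echo how many user turns it has seen,
    # demonstrating that each call receives the full history.
    user_turns = sum(1 for m in history if m["role"] == "user")
    return {"role": "assistant", "content": f"reply #{user_turns}"}

history: list[dict] = []
for user_msg in ["remember the number 7.814", "multiply it by 103.892"]:
    history.append({"role": "user", "content": user_msg})
    history.append(respond(history))  # agent sees everything so far

print(len(history))            # 4
print(history[-1]["content"])  # reply #2
```

With the real SDK you would replace the manual appends with result.to_input_list() after each run, which also carries the assistant messages' IDs and metadata.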
Conclusion
OpenAI's Agents SDK provides a structured framework for building LLM-powered agents with tools, guardrails, and conversation management. The Python-first approach makes it accessible while still offering significant flexibility for complex agent applications.
While the library is positioned as "open-source," it's currently optimized for OpenAI's models. If you're building production systems with OpenAI, this SDK offers several advantages over building agent systems from scratch, particularly:
- Built-in streaming with comprehensive event types
- Structured guardrails for safety and control
- Simple tool definition and execution
- Conversation context management
- Tracing capabilities for debugging and monitoring
For production use cases, consider the performance implications of synchronous vs. asynchronous execution, and always implement guardrails appropriate to your application's requirements. As with most agent frameworks, there are still limitations in how agents understand complex instructions, but the SDK provides a solid foundation for building practical AI applications.
The library is evolving rapidly and includes additional features beyond what's covered here, such as agent handoffs and tracing capabilities, which make debugging and monitoring significantly easier in production environments.