Building with OpenAI's Agents SDK
AI Engineering
OpenAI released a new "open-source" agents framework called Agents SDK. This well-built library is positioned as the production-grade successor to OpenAI's experimental Swarm multi-agent framework. In practice, Agents SDK is most comparable to frameworks like Pydantic AI, LangChain, or LlamaIndex, providing a structured way to build AI agent applications.
In this article, we'll learn everything we need to start building with Agents SDK by working through practical examples and exploring its key capabilities.
What is Agents SDK?
OpenAI's Agents SDK provides a structured approach to building agent applications with these key features:
- Agent loop: Built-in loop handling tool calls, sending results to the LLM, and continuing execution until completion
- Python-first: Uses native Python features rather than introducing new abstraction layers
- Handoffs: Coordination and delegation capabilities between multiple agents
- Guardrails: Input/output validation with early termination for failed checks
- Function tools: Python functions as tools with automatic schema generation and Pydantic validation
- Tracing: Built-in visualization, debugging and monitoring of agent workflows
While the library is technically open-source, it's currently designed to work primarily with OpenAI's models. However, since it is open-source, the broader community should be able to extend the library to better support other LLMs in the future.
Let's explore these features through practical implementation.
Installation and Setup
First, install the library:
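At the time of writing, the SDK is distributed on PyPI as openai-agents (the import name is agents):

```shell
pip install openai-agents
```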
Then set your OpenAI API key:
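One simple approach is to set the key as an environment variable, which the SDK reads automatically. The value below is a placeholder; substitute your own secret key:

```python
import os

# The SDK looks for the key in the OPENAI_API_KEY environment variable.
# Replace the placeholder with your own secret key.
os.environ["OPENAI_API_KEY"] = "sk-..."
```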
To get an API key, you'll need to visit platform.openai.com, create an account if you don't have one, and generate a new secret key from the API keys section.
Creating Your First Agent
Creating a basic agent requires minimal code:
This simple initialization creates an agent with a name, basic instructions (essentially a system prompt), and a model. We're using gpt-4o-mini here for a good balance of performance and cost.
Running Your Agent
The SDK provides three methods for executing agents:
- Runner.run() - Asynchronous execution
- Runner.run_sync() - Synchronous execution
- Runner.run_streamed() - Asynchronous execution with streamed responses
In production applications, you'll most likely want to use the asynchronous methods (run() or run_streamed()), as they provide better scalability and efficiency. Synchronous execution should generally be avoided except in simple scripts or tests.
Basic Execution
Using the asynchronous approach:
This method runs the agent and waits for the complete response before returning it. It's straightforward but offers a less engaging user experience as users won't see any progress until the entire response is generated.
Streaming Responses
Streaming is particularly useful for user-facing applications as it provides immediate feedback:
When you run this code, you'll see a lot of information being returned - different event types for agent updates, token generation, tool calls, and more. This raw output can be overwhelming, so you'll want to filter for specific events:
This filtered approach shows only the generated text tokens, creating a smoother user experience by displaying the response as it's being generated.
Function Tools
One of the SDK's key features is the ability to convert Python functions into tools the agent can use. OpenAI has cycled through various names for this capability - from "function calling" to "tool calling" and now "function tools" in this SDK.
Here's how to implement a simple multiplication tool:
When defining function tools, ensure you include:
- Clear function name
- Descriptive parameter names
- Type annotations for inputs and outputs
- Explanatory docstring (becomes the tool description)
These elements help the agent understand when and how to use the tool. The docstring is particularly important as it's what the agent will "read" to understand the tool's purpose.
Pass your tools to the agent during initialization:
Notice how we've extended the instructions to encourage the agent to use the provided tools rather than relying on its internal knowledge for tasks like calculations.
When executing, the agent will now have access to this function:
For real-world applications, you'll want to create a more sophisticated event handler for streaming that can display both tool usage and final responses. Here's a comprehensive example:
This handler provides a cleaner output showing:
- Which agent is being used (important for multi-agent workflows)
- Tool call parameters as they're being generated
- When a tool is called with its name and arguments
- The output from the tool execution
- The final response text as it's generated
With gpt-4o-mini, these operations happen very quickly, but with more complex tools or slower models, streaming provides valuable feedback to users.
Guardrails
Guardrails are essential safety mechanisms for agent interactions, especially in production environments. The SDK supports both input and output guardrails to validate messages before and after LLM processing.
First, define a structure for guardrail outputs using Pydantic:
The output_type parameter forces the agent to provide structured output matching our defined schema, making it easier to process programmatically.
You can test this agent directly:
When you run this, you'll get a nicely structured response with is_triggered=True and reasoning explaining why the guardrail was triggered.
To integrate this as a guardrail for your main agent, create a function with the @input_guardrail decorator:
This function must follow the exact signature pattern shown above, even if you don't use all the parameters. The return type must be GuardrailFunctionOutput for the SDK to process it correctly.
Apply the guardrail to your agent:
When a guardrail is triggered, it raises an exception:
In production applications, you'll need to handle these exceptions appropriately, perhaps by displaying a user-friendly message explaining why their query was rejected.
Output guardrails follow a similar pattern but use the @output_guardrail decorator and the output_guardrails parameter when creating an agent, allowing you to validate the agent's responses before they reach the user.
Conversational Agents
Most practical applications involve multi-turn conversations rather than single queries. The SDK makes it easy to maintain context across interactions:
The SDK provides a convenient to_input_list() method that converts the result into a properly formatted message history. You can then append new messages and pass the entire history to your next agent call.
Even though the agent might claim it can't remember information (as shown in the example), the conversation history is preserved in the input list format, allowing subsequent interactions to reference previous messages. When asked to "multiply the last number", the agent can identify 7.814 from the history and perform the calculation.
Conclusion
OpenAI's Agents SDK provides a structured framework for building LLM-powered agents with tools, guardrails, and conversation management. The Python-first approach makes it accessible while still offering significant flexibility for complex agent applications.
While the library is positioned as "open-source," it's currently optimized for OpenAI's models. If you're building production systems with OpenAI, this SDK offers several advantages over building agent systems from scratch, particularly:
- Built-in streaming with comprehensive event types
- Structured guardrails for safety and control
- Simple tool definition and execution
- Conversation context management
- Tracing capabilities for debugging and monitoring
For production use cases, consider the performance implications of synchronous vs. asynchronous execution, and always implement guardrails appropriate to your application's requirements. As with most agent frameworks, there are still limitations in how agents understand complex instructions, but the SDK provides a solid foundation for building practical AI applications.
The library is evolving rapidly and includes additional features beyond what's covered here, such as agent handoffs and tracing capabilities, which make debugging and monitoring significantly easier in production environments.