Conversational Memory in LangChain
Conversational memory allows our chatbots and agents to remember previous interactions within a conversation. Without conversational memory, our chatbots would only ever be able to respond to the last message they received, essentially forgetting all previous messages with each new one.
Naturally, conversations require our chatbots to be able to respond over multiple interactions and refer to previous messages to understand the context of the conversation.
⚠️ We will be using OpenAI for this example allowing us to run everything via API. If you would like to use Ollama instead, please see the Ollama version of this example.
LangChain's Memory Types
LangChain versions `0.0.x` consisted of various conversational memory types. Most of these are due for deprecation but still hold value in understanding the different approaches we can take to building conversational memory.
Throughout the notebook we will refer to these older memory types and then rewrite them using the recommended `RunnableWithMessageHistory` class. We will learn about:
- `ConversationBufferMemory`: the simplest and most intuitive form of conversational memory, keeping track of a conversation without any additional bells and whistles.
- `ConversationBufferWindowMemory`: similar to `ConversationBufferMemory`, but only keeps track of the last `k` messages.
- `ConversationSummaryMemory`: rather than keeping track of the entire conversation, this memory type keeps track of a summary of the conversation.
- `ConversationSummaryBufferMemory`: merges the `ConversationSummaryMemory` and `ConversationTokenBufferMemory` types.
We'll work through each of these memory types in turn, and rewrite each one using the `RunnableWithMessageHistory` class.
Initialize our LLM
Before jumping into our memory types, let's initialize our LLM. We will use OpenAI's `gpt-4o-mini` model; if you need an API key you can get one from OpenAI's website.
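A minimal initialization sketch (the low temperature setting is our own choice, not prescribed by the original):

```python
import os
from getpass import getpass
from langchain_openai import ChatOpenAI

# Read the API key from the environment, or prompt for it
os.environ["OPENAI_API_KEY"] = os.getenv("OPENAI_API_KEY") or getpass("OpenAI API key: ")

# gpt-4o-mini with a low temperature for more consistent responses
llm = ChatOpenAI(model="gpt-4o-mini", temperature=0.0)
```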
1. ConversationBufferMemory
`ConversationBufferMemory` is the simplest form of conversational memory: it is literally just a place where we store messages, which we then feed into our LLM with each request.
Let's start with LangChain's original `ConversationBufferMemory` object. We set `return_messages=True` to return the messages as a list of `ChatMessage` objects. Unless using a non-chat model, we should always set this to `True`; without it the messages are passed as a single string, which can lead to unexpected behavior from chat LLMs.
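For example, with the original (now deprecated) class:

```python
from langchain.memory import ConversationBufferMemory

memory = ConversationBufferMemory(return_messages=True)
```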
There are several ways to add messages to our memory. Using the `save_context` method, we can add a user query (via the `input` key) and the AI's response (via the `output` key). So, to build up a short example conversation, we can call `save_context` repeatedly.
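A sketch of that pattern (the exact messages are illustrative only):

```python
# Each call stores one user query and the corresponding AI response
memory.save_context(
    {"input": "Hi, my name is James"},
    {"output": "Hey James, what's up? I'm an AI model called Zeta."},
)
memory.save_context(
    {"input": "I'm researching the different types of conversational memory."},
    {"output": "That's interesting, what are some examples?"},
)
```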
Before using the memory, we need to load in any variables for that memory type — in this case, there are none, so we just pass an empty dictionary:
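```python
# Buffer memory has no extra variables, so we pass an empty dict
memory.load_memory_variables({})
```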
With that, we've created our buffer memory. Before feeding it into our LLM, let's quickly view the alternative method for adding messages to our memory. With this other method, we pass individual user and AI messages via the `add_user_message` and `add_ai_message` methods, reproducing what we did above.
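Roughly, with the same illustrative content as before:

```python
memory = ConversationBufferMemory(return_messages=True)

# The underlying chat history exposes per-message helpers
memory.chat_memory.add_user_message("Hi, my name is James")
memory.chat_memory.add_ai_message("Hey James, what's up? I'm an AI model called Zeta.")
memory.chat_memory.add_user_message("I'm researching the different types of conversational memory.")
memory.chat_memory.add_ai_message("That's interesting, what are some examples?")
```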
The outcome is exactly the same in either case. To pass this on to our LLM, we need to create a `ConversationChain` object (already deprecated in favor of the `RunnableWithMessageHistory` class, which we will cover in a moment).
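Something along these lines (the question asked here is just an example):

```python
from langchain.chains import ConversationChain

# Deprecated, but shown for comparison with the newer approach below
chain = ConversationChain(llm=llm, memory=memory)
chain.invoke({"input": "What is my name again?"})
```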
ConversationBufferMemory with RunnableWithMessageHistory
As mentioned, the `ConversationBufferMemory` type is due for deprecation. Instead, we can use the `RunnableWithMessageHistory` class to implement the same functionality.
When implementing `RunnableWithMessageHistory` we will use LangChain Expression Language (LCEL), and for this we need to define our prompt template and LLM components. Our `llm` has already been defined, so now we just define a `ChatPromptTemplate` object.
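A sketch of such a template (the system message wording is our own; the `history` and `query` variable names are reused below):

```python
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder

prompt_template = ChatPromptTemplate.from_messages([
    ("system", "You are a helpful assistant."),
    MessagesPlaceholder(variable_name="history"),  # past messages are injected here
    ("human", "{query}"),                          # the new user query
])
```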
We can link our `prompt_template` and our `llm` together to create a pipeline via LCEL.
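For example:

```python
# LCEL: the prompt output is piped straight into the LLM
pipeline = prompt_template | llm
```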
Our `pipeline` now needs to be wrapped in a `RunnableWithMessageHistory` object. This object requires a few input parameters. One of those is `get_session_history`, which requires a function that returns a `ChatMessageHistory` object based on a session ID. We define this function ourselves.
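A minimal version, using the `chat_map` dictionary and `get_chat_history` name referenced later in the article:

```python
from langchain_core.chat_history import InMemoryChatMessageHistory

# One chat history object per session ID
chat_map = {}

def get_chat_history(session_id: str) -> InMemoryChatMessageHistory:
    if session_id not in chat_map:
        chat_map[session_id] = InMemoryChatMessageHistory()
    return chat_map[session_id]
```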
We also need to tell our runnable which variable name to use for the chat history (i.e. `history`) and which to use for the user's query (i.e. `query`).
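Wiring this together might look like:

```python
from langchain_core.runnables.history import RunnableWithMessageHistory

pipeline_with_history = RunnableWithMessageHistory(
    pipeline,
    get_session_history=get_chat_history,
    input_messages_key="query",      # the user's new message
    history_messages_key="history",  # where past messages are injected
)
```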
Now we invoke our runnable:
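(The session ID below is an arbitrary string of our choosing.)

```python
pipeline_with_history.invoke(
    {"query": "Hi, my name is James"},
    config={"configurable": {"session_id": "id_123"}},
)
```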
Our chat history will now be stored and retrieved whenever we invoke our runnable with the same session ID.
We have now recreated the `ConversationBufferMemory` type using the `RunnableWithMessageHistory` class. Let's continue on to the other memory types and see how they can be implemented.
2. ConversationBufferWindowMemory
The `ConversationBufferWindowMemory` type is similar to `ConversationBufferMemory`, but only keeps track of the last `k` messages. There are a few reasons why we would want to keep only the last `k` messages:
- More messages mean more tokens are sent with each request, and more tokens increase latency and cost.
- LLMs tend to perform worse when given more tokens, making them more likely to deviate from instructions, hallucinate, or "forget" information provided to them. Conciseness is key to high-performing LLMs.
- If we keep all messages we will eventually hit the LLM's context window limit; by adding a window size `k` we can ensure we never hit this limit.
The buffer window solves many problems that we encounter with the standard buffer memory, while still being a very simple and intuitive form of conversational memory.
We populate this memory using the same methods as before:
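(The window size and message content below are illustrative.)

```python
from langchain.memory import ConversationBufferWindowMemory

# Keep only a small window of recent messages
memory = ConversationBufferWindowMemory(k=4, return_messages=True)
memory.save_context(
    {"input": "Hi, my name is James"},
    {"output": "Hey James, what's up? I'm an AI model called Zeta."},
)
memory.save_context(
    {"input": "I'm researching the different types of conversational memory."},
    {"output": "That's interesting, what are some examples?"},
)
```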
As before, we use the `ConversationChain` object (again, this is deprecated and we will rewrite it with `RunnableWithMessageHistory` in a moment).
Now let's see if our LLM remembers our name:
The reason our LLM can no longer remember our name is that we set the `k` parameter to `4`, meaning only the most recent messages are stored in memory; as we can see above, these do not include the first message where we introduced ourselves.
Based on the agent forgetting our name, we might wonder why we would ever use this memory type over the standard buffer memory. As with most things in AI, it is a trade-off: we can support much longer conversations, use fewer tokens, and improve latency, but at the cost of forgetting non-recent messages.
ConversationBufferWindowMemory with RunnableWithMessageHistory
To implement this memory type using the `RunnableWithMessageHistory` class, we can use the same approach as before. We define our `prompt_template` and `llm` as before, and then wrap our pipeline in a `RunnableWithMessageHistory` object.
For the window feature, we need to define a custom version of the `InMemoryChatMessageHistory` class that removes any messages beyond the last `k` messages.
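A sketch of how this might look; the class name `BufferWindowMessageHistory` is our own, and we also expose `session_id` and `k` as configurable fields so they can be set at invoke time:

```python
from langchain_core.chat_history import InMemoryChatMessageHistory
from langchain_core.messages import BaseMessage
from langchain_core.runnables import ConfigurableFieldSpec
from langchain_core.runnables.history import RunnableWithMessageHistory

class BufferWindowMessageHistory(InMemoryChatMessageHistory):
    """Chat history that only retains the most recent k messages."""
    k: int

    def add_messages(self, messages: list[BaseMessage]) -> None:
        self.messages.extend(messages)
        # Trim the buffer so only the last k messages remain
        self.messages = self.messages[-self.k:]

    def clear(self) -> None:
        self.messages = []

chat_map = {}

def get_chat_history(session_id: str, k: int = 4) -> BufferWindowMessageHistory:
    if session_id not in chat_map:
        chat_map[session_id] = BufferWindowMessageHistory(k=k)
    return chat_map[session_id]

pipeline_with_history = RunnableWithMessageHistory(
    pipeline,
    get_session_history=get_chat_history,
    input_messages_key="query",
    history_messages_key="history",
    # Extra configurable fields so session_id and k can be set per invocation
    history_factory_config=[
        ConfigurableFieldSpec(
            id="session_id",
            annotation=str,
            name="Session ID",
            description="Unique identifier for the conversation.",
            default="id_default",
        ),
        ConfigurableFieldSpec(
            id="k",
            annotation=int,
            name="k",
            description="Number of most recent messages to keep.",
            default=4,
        ),
    ],
)
```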
Now we invoke our runnable, this time passing a `k` parameter via the `config` parameter.
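For example:

```python
pipeline_with_history.invoke(
    {"query": "Hi, my name is James"},
    config={"configurable": {"session_id": "id_k4", "k": 4}},
)
```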
We can also modify the messages that are stored in memory by modifying the records inside the `chat_map` dictionary directly.
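For instance (hypothetical content, reusing the session ID from above):

```python
from langchain_core.messages import AIMessage, HumanMessage

# Back-fill earlier turns straight into the stored history
chat_map["id_k4"].messages.extend([
    HumanMessage(content="I'm researching the different types of conversational memory."),
    AIMessage(content="That's interesting, what are some examples?"),
])
```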
Now let's see at which `k` value our LLM remembers our name. From the above we can already see that with `k=4` our name is no longer in the window, so when running with `k=4` we should expect the LLM to forget it:
With `k=4` our LLM is unable to remember our name, so let's initialize a new session with `k=14`.
We'll manually insert the remaining messages as before:
Now let's see if the LLM remembers our name:
That's it! Our LLM now remembers our name, confirming that we've correctly refactored our buffer window memory with the recommended `RunnableWithMessageHistory` class.
3. ConversationSummaryMemory
Next up we have `ConversationSummaryMemory`. This memory type keeps track of a summary of the conversation rather than the entire conversation. This is useful for long conversations where we don't need to keep the full transcript, but we do want to retain a thread of the whole discussion.
As before, we'll start with the original memory class before reimplementing it with the `RunnableWithMessageHistory` class.
Unlike with the previous memory types, we need to provide an `llm` to initialize `ConversationSummaryMemory`. The reason for this is that we need an LLM to generate the conversation summaries.
Beyond this small tweak, using `ConversationSummaryMemory` is the same as with our previous memory types when using the deprecated `ConversationChain` object.
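For example:

```python
from langchain.chains import ConversationChain
from langchain.memory import ConversationSummaryMemory

# The LLM is used both to answer and to summarize the conversation
memory = ConversationSummaryMemory(llm=llm)
chain = ConversationChain(llm=llm, memory=memory)
```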
Let's test:
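(The queries below are illustrative.)

```python
chain.invoke({"input": "Hi, my name is James"})
chain.invoke({"input": "I'm researching the different types of conversational memory."})
print(memory.buffer)  # the running summary is kept in the memory buffer
```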
We can see how the conversation summary varies with each new message. Let's see if the LLM is able to recall our name:
As this information was stored in the summary, the LLM successfully recalled our name. This may not always be the case: by summarizing the conversation we inevitably compress the full amount of information, so we may occasionally lose key details. Nonetheless, this is a great memory type for long conversations while retaining some key information.
ConversationSummaryMemory with RunnableWithMessageHistory
Let's implement this memory type using the `RunnableWithMessageHistory` class. As with the window buffer memory, we need to define a custom implementation of the `InMemoryChatMessageHistory` class. We'll call this one `ConversationSummaryMessageHistory`.
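A minimal sketch, assuming the summary is kept as a single `SystemMessage` that is regenerated on each update (the summarization prompt wording is our own):

```python
from langchain_core.chat_history import InMemoryChatMessageHistory
from langchain_core.messages import BaseMessage, SystemMessage
from langchain_openai import ChatOpenAI

class ConversationSummaryMessageHistory(InMemoryChatMessageHistory):
    """Chat history that stores only a continually updated summary of the conversation."""
    llm: ChatOpenAI

    def add_messages(self, messages: list[BaseMessage]) -> None:
        # Fold the new messages into the existing summary (if any)
        existing_summary = self.messages[0].content if self.messages else ""
        new_lines = "\n".join(f"{m.type}: {m.content}" for m in messages)
        prompt = (
            "Given the existing conversation summary and the new messages, "
            "produce an updated summary of the conversation.\n\n"
            f"Existing summary:\n{existing_summary}\n\n"
            f"New messages:\n{new_lines}"
        )
        summary = self.llm.invoke(prompt)
        # The stored history now consists of a single summary message
        self.messages = [SystemMessage(content=summary.content)]

    def clear(self) -> None:
        self.messages = []
```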
As before, we need to define our `RunnableWithMessageHistory` with the `ConfigurableFieldSpec` objects.
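Roughly:

```python
from langchain_core.runnables import ConfigurableFieldSpec
from langchain_core.runnables.history import RunnableWithMessageHistory

chat_map = {}

def get_chat_history(session_id: str, llm: ChatOpenAI) -> ConversationSummaryMessageHistory:
    if session_id not in chat_map:
        chat_map[session_id] = ConversationSummaryMessageHistory(llm=llm)
    return chat_map[session_id]

pipeline_with_history = RunnableWithMessageHistory(
    pipeline,
    get_session_history=get_chat_history,
    input_messages_key="query",
    history_messages_key="history",
    history_factory_config=[
        ConfigurableFieldSpec(
            id="session_id",
            annotation=str,
            name="Session ID",
            description="Unique identifier for the conversation.",
            default="id_default",
        ),
        ConfigurableFieldSpec(
            id="llm",
            annotation=ChatOpenAI,
            name="LLM",
            description="LLM used to generate the conversation summary.",
            default=llm,
        ),
    ],
)
```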
Now we invoke our runnable, this time passing an `llm` parameter via the `config` parameter.
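For example (the session ID is again arbitrary):

```python
pipeline_with_history.invoke(
    {"query": "Hi, my name is James"},
    config={"configurable": {"session_id": "id_summary", "llm": llm}},
)
```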
Let's see what summary was generated:
Let's continue the conversation and see if the summary is updated:
The summary has been updated to include the new messages. So far so good! Let's continue by adding a few more messages before returning to the name question.
Let's see the latest summary:
The information about our name has been maintained. Let's see if this is enough for our LLM to correctly recall our name.
Perfect! We've successfully implemented the `ConversationSummaryMemory` type using the `RunnableWithMessageHistory` class.
4. ConversationSummaryBufferMemory
Our final memory type acts as a combination of `ConversationSummaryMemory` and `ConversationBufferMemory`. It keeps the buffer of the conversation up to the previous `n` tokens; anything beyond that limit is summarized and then dropped from the buffer.
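This produces a stored history that looks roughly like the following (placeholders only):

```python
[
    SystemMessage(content="<summary of the older messages dropped from the buffer>"),
    HumanMessage(content="<recent user message kept verbatim>"),
    AIMessage(content="<recent AI response kept verbatim>"),
]
```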
As before, we set up the deprecated memory type using the `ConversationChain` object.
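A sketch (the `max_token_limit` value is our own choice):

```python
from langchain.chains import ConversationChain
from langchain.memory import ConversationSummaryBufferMemory

memory = ConversationSummaryBufferMemory(
    llm=llm,
    max_token_limit=300,  # messages beyond this token budget get summarized
    return_messages=True,
)
chain = ConversationChain(llm=llm, memory=memory)
```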
First invoke with a single message:
Looks good so far; let's continue with a few more messages:
We can see that with each new message the initial `SystemMessage` is updated with a new summary of the conversation. This initial `SystemMessage` is then followed by the most recent `AIMessage` and `HumanMessage` objects.
ConversationSummaryBufferMemory with RunnableWithMessageHistory
As with the previous memory types, we will implement this memory type again using the `RunnableWithMessageHistory` class. In our implementation we will modify the buffer window to be based on the number of messages rather than the number of tokens. This tweak makes our implementation more closely aligned with the original buffer window memory. We will implement all of this via a new `ConversationSummaryBufferMessageHistory` class.
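A sketch of such a class, assuming the summary is kept as a leading `SystemMessage` and the last `k` messages are kept verbatim (the summarization prompt is our own; the printed message matches the output shown later):

```python
from langchain_core.chat_history import InMemoryChatMessageHistory
from langchain_core.messages import BaseMessage, SystemMessage
from langchain_openai import ChatOpenAI

class ConversationSummaryBufferMessageHistory(InMemoryChatMessageHistory):
    """Keeps the last k messages verbatim and folds older messages into a running summary."""
    llm: ChatOpenAI
    k: int

    def add_messages(self, messages: list[BaseMessage]) -> None:
        # Pull out the existing summary (stored as the leading SystemMessage), if any
        existing_summary = None
        if self.messages and isinstance(self.messages[0], SystemMessage):
            existing_summary = self.messages.pop(0)
        # Add the new messages, then split off anything beyond the last k
        self.messages.extend(messages)
        old_messages = None
        if len(self.messages) > self.k:
            old_messages = self.messages[:-self.k]
            self.messages = self.messages[-self.k:]
        if old_messages is None:
            print("No old messages to update summary with")
        else:
            # Fold the dropped messages into the summary via the LLM
            prompt = (
                "Given the existing conversation summary and the messages being "
                "dropped from the buffer, produce an updated summary.\n\n"
                f"Existing summary:\n{existing_summary.content if existing_summary else ''}\n\n"
                "Dropped messages:\n"
                + "\n".join(f"{m.type}: {m.content}" for m in old_messages)
            )
            existing_summary = SystemMessage(content=self.llm.invoke(prompt).content)
        # Re-insert the summary (if one exists) at the front of the buffer
        if existing_summary is not None:
            self.messages = [existing_summary] + self.messages

    def clear(self) -> None:
        self.messages = []
```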
Redefine the `get_chat_history` function to use our new `ConversationSummaryBufferMessageHistory` class.
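Along the lines of:

```python
def get_chat_history(
    session_id: str, llm: ChatOpenAI, k: int = 4
) -> ConversationSummaryBufferMessageHistory:
    if session_id not in chat_map:
        chat_map[session_id] = ConversationSummaryBufferMessageHistory(llm=llm, k=k)
    return chat_map[session_id]
```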
Set up our pipeline with the new configurable fields.
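A sketch with `session_id`, `llm`, and `k` all exposed via `ConfigurableFieldSpec`:

```python
from langchain_core.runnables import ConfigurableFieldSpec
from langchain_core.runnables.history import RunnableWithMessageHistory

pipeline_with_history = RunnableWithMessageHistory(
    pipeline,
    get_session_history=get_chat_history,
    input_messages_key="query",
    history_messages_key="history",
    history_factory_config=[
        ConfigurableFieldSpec(
            id="session_id",
            annotation=str,
            name="Session ID",
            description="Unique identifier for the conversation.",
            default="id_default",
        ),
        ConfigurableFieldSpec(
            id="llm",
            annotation=ChatOpenAI,
            name="LLM",
            description="LLM used to generate the conversation summary.",
            default=llm,
        ),
        ConfigurableFieldSpec(
            id="k",
            annotation=int,
            name="k",
            description="Number of most recent messages to keep verbatim.",
            default=4,
        ),
    ],
)
```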
Finally, we invoke our runnable:
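(The session ID is again arbitrary.)

```python
pipeline_with_history.invoke(
    {"query": "Hi, my name is James"},
    config={"configurable": {"session_id": "id_sbm", "llm": llm, "k": 4}},
)
```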
On the first invocation our class prints `No old messages to update summary with`, since the buffer does not yet contain more than `k` messages to summarize.
With that, we've successfully implemented the `ConversationSummaryBufferMemory` type using `RunnableWithMessageHistory`!