Conversational Memory in LangChain
Conversational memory allows our chatbots and agents to remember previous interactions within a conversation. Without conversational memory, our chatbots would only ever be able to respond to the last message they received, essentially forgetting all previous messages with each new one.
Naturally, conversations require our chatbots to be able to respond over multiple interactions and refer to previous messages to understand the context of the conversation.
⚠️ We will be using OpenAI for this example allowing us to run everything via API. If you would like to use Ollama instead, please see the Ollama version of this example.
LangChain's Memory Types
LangChain versions `0.0.x` consisted of various conversational memory types. Most of these are due for deprecation but still hold value in understanding the different approaches we can take to building conversational memory.
Throughout the notebook we will refer to these older memory types and then rewrite them using the recommended `RunnableWithMessageHistory` class. We will learn about:
- `ConversationBufferMemory`: the simplest and most intuitive form of conversational memory, keeping track of a conversation without any additional bells and whistles.
- `ConversationBufferWindowMemory`: similar to `ConversationBufferMemory`, but only keeps track of the last `k` messages.
- `ConversationSummaryMemory`: rather than keeping track of the entire conversation, this memory type keeps track of a summary of the conversation.
- `ConversationSummaryBufferMemory`: merges the `ConversationSummaryMemory` and `ConversationTokenBufferMemory` types.
We'll work through each of these memory types in turn, and rewrite each one using the `RunnableWithMessageHistory` class.
Initialize our LLM
Before jumping into our memory types, let's initialize our LLM. We will use OpenAI's `gpt-4o-mini` model; if you need an API key you can get one from OpenAI's website.
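A minimal initialization sketch (the low temperature setting is our own choice, not prescribed by the original):

```python
import os
from getpass import getpass
from langchain_openai import ChatOpenAI

# Read the API key from the environment, or prompt for it
os.environ["OPENAI_API_KEY"] = os.getenv("OPENAI_API_KEY") or getpass("OpenAI API key: ")

# gpt-4o-mini with a low temperature for more consistent responses
llm = ChatOpenAI(model="gpt-4o-mini", temperature=0.0)
```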
1. ConversationBufferMemory
`ConversationBufferMemory` is the simplest form of conversational memory: it is literally just a place where we store messages, which we then feed into our LLM with each request.
Let's start with LangChain's original `ConversationBufferMemory` object. We set `return_messages=True` to return the messages as a list of `ChatMessage` objects. Unless using a non-chat model, we should always set this to `True`; without it the messages are passed as a single string, which can lead to unexpected behavior from chat LLMs.
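For example, with the original (now deprecated) class:

```python
from langchain.memory import ConversationBufferMemory

memory = ConversationBufferMemory(return_messages=True)
```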
There are several ways to add messages to our memory. Using the `save_context` method, we can add a user query (via the `input` key) and the AI's response (via the `output` key). So, to build up a short example conversation, we can call `save_context` repeatedly.
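A sketch of that pattern (the exact messages are illustrative only):

```python
# Each call stores one user query and the corresponding AI response
memory.save_context(
    {"input": "Hi, my name is James"},
    {"output": "Hey James, what's up? I'm an AI model called Zeta."},
)
memory.save_context(
    {"input": "I'm researching the different types of conversational memory."},
    {"output": "That's interesting, what are some examples?"},
)
```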
Before using the memory, we need to load in any variables for that memory type — in this case, there are none, so we just pass an empty dictionary:
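```python
# Buffer memory has no extra variables, so we pass an empty dict
memory.load_memory_variables({})
```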
With that, we've created our buffer memory. Before feeding it into our LLM, let's quickly view the alternative method for adding messages to our memory. With this other method, we pass individual user and AI messages via the `add_user_message` and `add_ai_message` methods, reproducing what we did above.
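Roughly, with the same illustrative content as before:

```python
memory = ConversationBufferMemory(return_messages=True)

# The underlying chat history exposes per-message helpers
memory.chat_memory.add_user_message("Hi, my name is James")
memory.chat_memory.add_ai_message("Hey James, what's up? I'm an AI model called Zeta.")
memory.chat_memory.add_user_message("I'm researching the different types of conversational memory.")
memory.chat_memory.add_ai_message("That's interesting, what are some examples?")
```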
The outcome is exactly the same in either case. To pass this on to our LLM, we need to create a `ConversationChain` object (already deprecated in favor of the `RunnableWithMessageHistory` class, which we will cover in a moment).
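Something along these lines (the question asked here is just an example):

```python
from langchain.chains import ConversationChain

# Deprecated, but shown for comparison with the newer approach below
chain = ConversationChain(llm=llm, memory=memory)
chain.invoke({"input": "What is my name again?"})
```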
ConversationBufferMemory with RunnableWithMessageHistory
As mentioned, the `ConversationBufferMemory` type is due for deprecation. Instead, we can use the `RunnableWithMessageHistory` class to implement the same functionality.
When implementing `RunnableWithMessageHistory` we will use LangChain Expression Language (LCEL), and for this we need to define our prompt template and LLM components. Our `llm` has already been defined, so now we just define a `ChatPromptTemplate` object.
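A sketch of such a template (the system message wording is our own; the `history` and `query` variable names are reused below):

```python
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder

prompt_template = ChatPromptTemplate.from_messages([
    ("system", "You are a helpful assistant."),
    MessagesPlaceholder(variable_name="history"),  # past messages are injected here
    ("human", "{query}"),                          # the new user query
])
```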
We can link our `prompt_template` and our `llm` together to create a pipeline via LCEL.
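For example:

```python
# LCEL: the prompt output is piped straight into the LLM
pipeline = prompt_template | llm
```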
Our `pipeline` now needs to be wrapped in a `RunnableWithMessageHistory` object. This object requires a few input parameters. One of those is `get_session_history`, which requires a function that returns a `ChatMessageHistory` object based on a session ID. We define this function ourselves.
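A minimal version, using the `chat_map` dictionary and `get_chat_history` name referenced later in the article:

```python
from langchain_core.chat_history import InMemoryChatMessageHistory

# One chat history object per session ID
chat_map = {}

def get_chat_history(session_id: str) -> InMemoryChatMessageHistory:
    if session_id not in chat_map:
        chat_map[session_id] = InMemoryChatMessageHistory()
    return chat_map[session_id]
```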
We also need to tell our runnable which variable name to use for the chat history (i.e. `history`) and which to use for the user's query (i.e. `query`).
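Wiring this together might look like:

```python
from langchain_core.runnables.history import RunnableWithMessageHistory

pipeline_with_history = RunnableWithMessageHistory(
    pipeline,
    get_session_history=get_chat_history,
    input_messages_key="query",      # the user's new message
    history_messages_key="history",  # where past messages are injected
)
```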
Now we invoke our runnable:
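(The session ID below is an arbitrary string of our choosing.)

```python
pipeline_with_history.invoke(
    {"query": "Hi, my name is James"},
    config={"configurable": {"session_id": "id_123"}},
)
```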
Our chat history will now be stored and retrieved whenever we invoke our runnable with the same session ID.
We have now recreated the `ConversationBufferMemory` type using the `RunnableWithMessageHistory` class. Let's continue on to the other memory types and see how they can be implemented.
2. ConversationBufferWindowMemory
The `ConversationBufferWindowMemory` type is similar to `ConversationBufferMemory`, but only keeps track of the last `k` messages. There are a few reasons why we would want to keep only the last `k` messages:
- More messages mean more tokens are sent with each request, and more tokens increase latency and cost.
- LLMs tend to perform worse when given more tokens, making them more likely to deviate from instructions, hallucinate, or "forget" information provided to them. Conciseness is key to high-performing LLMs.
- If we keep all messages we will eventually hit the LLM's context window limit; by adding a window size `k` we can ensure we never hit this limit.
The buffer window solves many problems that we encounter with the standard buffer memory, while still being a very simple and intuitive form of conversational memory.
We populate this memory using the same methods as before:
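(The window size and message content below are illustrative.)

```python
from langchain.memory import ConversationBufferWindowMemory

# Keep only a small window of recent messages
memory = ConversationBufferWindowMemory(k=4, return_messages=True)
memory.save_context(
    {"input": "Hi, my name is James"},
    {"output": "Hey James, what's up? I'm an AI model called Zeta."},
)
memory.save_context(
    {"input": "I'm researching the different types of conversational memory."},
    {"output": "That's interesting, what are some examples?"},
)
```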
As before, we use the `ConversationChain` object (again, this is deprecated and we will rewrite it with `RunnableWithMessageHistory` in a moment).
Now let's see if our LLM remembers our name:
The reason our LLM can no longer remember our name is that we set the `k` parameter to `4`, meaning only the most recent messages are stored in memory; as we can see above, these do not include the first message where we introduced ourselves.
Based on the agent forgetting our name, we might wonder why we would ever use this memory type over the standard buffer memory. As with most things in AI, it is a trade-off: we can support much longer conversations, use fewer tokens, and improve latency, but at the cost of forgetting non-recent messages.
ConversationBufferWindowMemory with RunnableWithMessageHistory
To implement this memory type using the `RunnableWithMessageHistory` class, we can use the same approach as before. We define our `prompt_template` and `llm` as before, and then wrap our pipeline in a `RunnableWithMessageHistory` object.
For the window feature, we need to define a custom version of the `InMemoryChatMessageHistory` class that removes any messages beyond the last `k` messages.
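A sketch of how this might look; the class name `BufferWindowMessageHistory` is our own, and we also expose `session_id` and `k` as configurable fields so they can be set at invoke time:

```python
from langchain_core.chat_history import InMemoryChatMessageHistory
from langchain_core.messages import BaseMessage
from langchain_core.runnables import ConfigurableFieldSpec
from langchain_core.runnables.history import RunnableWithMessageHistory

class BufferWindowMessageHistory(InMemoryChatMessageHistory):
    """Chat history that only retains the most recent k messages."""
    k: int

    def add_messages(self, messages: list[BaseMessage]) -> None:
        self.messages.extend(messages)
        # Trim the buffer so only the last k messages remain
        self.messages = self.messages[-self.k:]

    def clear(self) -> None:
        self.messages = []

chat_map = {}

def get_chat_history(session_id: str, k: int = 4) -> BufferWindowMessageHistory:
    if session_id not in chat_map:
        chat_map[session_id] = BufferWindowMessageHistory(k=k)
    return chat_map[session_id]

pipeline_with_history = RunnableWithMessageHistory(
    pipeline,
    get_session_history=get_chat_history,
    input_messages_key="query",
    history_messages_key="history",
    # Extra configurable fields so session_id and k can be set per invocation
    history_factory_config=[
        ConfigurableFieldSpec(
            id="session_id",
            annotation=str,
            name="Session ID",
            description="Unique identifier for the conversation.",
            default="id_default",
        ),
        ConfigurableFieldSpec(
            id="k",
            annotation=int,
            name="k",
            description="Number of most recent messages to keep.",
            default=4,
        ),
    ],
)
```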
Now we invoke our runnable, this time passing a `k` parameter via the `config` parameter.
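For example:

```python
pipeline_with_history.invoke(
    {"query": "Hi, my name is James"},
    config={"configurable": {"session_id": "id_k4", "k": 4}},
)
```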
We can also modify the messages that are stored in memory by modifying the records inside the `chat_map` dictionary directly.
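For instance (hypothetical content, reusing the session ID from above):

```python
from langchain_core.messages import AIMessage, HumanMessage

# Back-fill earlier turns straight into the stored history
chat_map["id_k4"].messages.extend([
    HumanMessage(content="I'm researching the different types of conversational memory."),
    AIMessage(content="That's interesting, what are some examples?"),
])
```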
Now let's see at which `k` value our LLM remembers our name. From the above we can already see that with `k=4` our name is no longer in the window, so when running with `k=4` we should expect the LLM to forget it:
With `k=4` our LLM is unable to remember our name, so let's initialize a new session with `k=14`.
We'll manually insert the remaining messages as before:
Now let's see if the LLM remembers our name:
That's it! Our LLM now remembers our name, confirming that we've correctly refactored our buffer window memory with the recommended `RunnableWithMessageHistory` class.
3. ConversationSummaryMemory
Next up we have `ConversationSummaryMemory`. This memory type keeps track of a summary of the conversation rather than the entire conversation. This is useful for long conversations where we don't need to keep the full transcript, but we do want to retain a thread of the whole discussion.
As before, we'll start with the original memory class before reimplementing it with the `RunnableWithMessageHistory` class.
Unlike with the previous memory types, we need to provide an `llm` to initialize `ConversationSummaryMemory`. The reason for this is that we need an LLM to generate the conversation summaries.
Beyond this small tweak, using `ConversationSummaryMemory` is the same as with our previous memory types when using the deprecated `ConversationChain` object.
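For example:

```python
from langchain.chains import ConversationChain
from langchain.memory import ConversationSummaryMemory

# The LLM is used both to answer and to summarize the conversation
memory = ConversationSummaryMemory(llm=llm)
chain = ConversationChain(llm=llm, memory=memory)
```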
Let's test:
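(The queries below are illustrative.)

```python
chain.invoke({"input": "Hi, my name is James"})
chain.invoke({"input": "I'm researching the different types of conversational memory."})
print(memory.buffer)  # the running summary is kept in the memory buffer
```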
We can see how the conversation summary varies with each new message. Let's see if the LLM is able to recall our name:
As this information was stored in the summary, the LLM successfully recalled our name. This may not always be the case: by summarizing the conversation we inevitably compress the full amount of information, so we may occasionally lose key details. Nonetheless, this is a great memory type for long conversations while retaining some key information.
ConversationSummaryMemory with RunnableWithMessageHistory
Let's implement this memory type using the `RunnableWithMessageHistory` class. As with the window buffer memory, we need to define a custom implementation of the `InMemoryChatMessageHistory` class. We'll call this one `ConversationSummaryMessageHistory`.
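A minimal sketch, assuming the summary is kept as a single `SystemMessage` that is regenerated on each update (the summarization prompt wording is our own):

```python
from langchain_core.chat_history import InMemoryChatMessageHistory
from langchain_core.messages import BaseMessage, SystemMessage
from langchain_openai import ChatOpenAI

class ConversationSummaryMessageHistory(InMemoryChatMessageHistory):
    """Chat history that stores only a continually updated summary of the conversation."""
    llm: ChatOpenAI

    def add_messages(self, messages: list[BaseMessage]) -> None:
        # Fold the new messages into the existing summary (if any)
        existing_summary = self.messages[0].content if self.messages else ""
        new_lines = "\n".join(f"{m.type}: {m.content}" for m in messages)
        prompt = (
            "Given the existing conversation summary and the new messages, "
            "produce an updated summary of the conversation.\n\n"
            f"Existing summary:\n{existing_summary}\n\n"
            f"New messages:\n{new_lines}"
        )
        summary = self.llm.invoke(prompt)
        # The stored history now consists of a single summary message
        self.messages = [SystemMessage(content=summary.content)]

    def clear(self) -> None:
        self.messages = []
```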
As before, we need to define our `RunnableWithMessageHistory` with the `ConfigurableFieldSpec` objects.
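Roughly:

```python
from langchain_core.runnables import ConfigurableFieldSpec
from langchain_core.runnables.history import RunnableWithMessageHistory

chat_map = {}

def get_chat_history(session_id: str, llm: ChatOpenAI) -> ConversationSummaryMessageHistory:
    if session_id not in chat_map:
        chat_map[session_id] = ConversationSummaryMessageHistory(llm=llm)
    return chat_map[session_id]

pipeline_with_history = RunnableWithMessageHistory(
    pipeline,
    get_session_history=get_chat_history,
    input_messages_key="query",
    history_messages_key="history",
    history_factory_config=[
        ConfigurableFieldSpec(
            id="session_id",
            annotation=str,
            name="Session ID",
            description="Unique identifier for the conversation.",
            default="id_default",
        ),
        ConfigurableFieldSpec(
            id="llm",
            annotation=ChatOpenAI,
            name="LLM",
            description="LLM used to generate the conversation summary.",
            default=llm,
        ),
    ],
)
```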
Now we invoke our runnable, this time passing an `llm` parameter via the `config` parameter.
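For example (the session ID is again arbitrary):

```python
pipeline_with_history.invoke(
    {"query": "Hi, my name is James"},
    config={"configurable": {"session_id": "id_summary", "llm": llm}},
)
```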
Let's see what summary was generated:
Let's continue the conversation and see if the summary is updated:
The summary has been updated to include the new messages. So far so good! Let's continue by adding a few more messages before returning to the name question.
Let's see the latest summary:
The information about our name has been maintained. Let's see if this is enough for our LLM to correctly recall our name.
Perfect! We've successfully implemented the `ConversationSummaryMemory` type using the `RunnableWithMessageHistory` class.
4. ConversationSummaryBufferMemory
Our final memory type acts as a combination of `ConversationSummaryMemory` and `ConversationBufferMemory`. It keeps the buffer of the conversation up to the previous `n` tokens; anything beyond that limit is summarized and then dropped from the buffer.
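This produces a stored history that looks roughly like the following (placeholders only):

```python
[
    SystemMessage(content="<summary of the older messages dropped from the buffer>"),
    HumanMessage(content="<recent user message kept verbatim>"),
    AIMessage(content="<recent AI response kept verbatim>"),
]
```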
As before, we set up the deprecated memory type using the `ConversationChain` object.
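A sketch (the `max_token_limit` value is our own choice):

```python
from langchain.chains import ConversationChain
from langchain.memory import ConversationSummaryBufferMemory

memory = ConversationSummaryBufferMemory(
    llm=llm,
    max_token_limit=300,  # messages beyond this token budget get summarized
    return_messages=True,
)
chain = ConversationChain(llm=llm, memory=memory)
```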
First invoke with a single message:
Looks good so far; let's continue with a few more messages:
We can see that with each new message the initial `SystemMessage` is updated with a new summary of the conversation. This initial `SystemMessage` is then followed by the most recent `AIMessage` and `HumanMessage` objects.
ConversationSummaryBufferMemory with RunnableWithMessageHistory
As with the previous memory types, we will implement this memory type again using the `RunnableWithMessageHistory` class. In our implementation we will modify the buffer window to be based on the number of messages rather than the number of tokens. This tweak makes our implementation more closely aligned with the original buffer window memory. We will implement all of this via a new `ConversationSummaryBufferMessageHistory` class.
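A sketch of such a class, assuming the summary is kept as a leading `SystemMessage` and the last `k` messages are kept verbatim (the summarization prompt is our own; the printed message matches the output shown later):

```python
from langchain_core.chat_history import InMemoryChatMessageHistory
from langchain_core.messages import BaseMessage, SystemMessage
from langchain_openai import ChatOpenAI

class ConversationSummaryBufferMessageHistory(InMemoryChatMessageHistory):
    """Keeps the last k messages verbatim and folds older messages into a running summary."""
    llm: ChatOpenAI
    k: int

    def add_messages(self, messages: list[BaseMessage]) -> None:
        # Pull out the existing summary (stored as the leading SystemMessage), if any
        existing_summary = None
        if self.messages and isinstance(self.messages[0], SystemMessage):
            existing_summary = self.messages.pop(0)
        # Add the new messages, then split off anything beyond the last k
        self.messages.extend(messages)
        old_messages = None
        if len(self.messages) > self.k:
            old_messages = self.messages[:-self.k]
            self.messages = self.messages[-self.k:]
        if old_messages is None:
            print("No old messages to update summary with")
        else:
            # Fold the dropped messages into the summary via the LLM
            prompt = (
                "Given the existing conversation summary and the messages being "
                "dropped from the buffer, produce an updated summary.\n\n"
                f"Existing summary:\n{existing_summary.content if existing_summary else ''}\n\n"
                "Dropped messages:\n"
                + "\n".join(f"{m.type}: {m.content}" for m in old_messages)
            )
            existing_summary = SystemMessage(content=self.llm.invoke(prompt).content)
        # Re-insert the summary (if one exists) at the front of the buffer
        if existing_summary is not None:
            self.messages = [existing_summary] + self.messages

    def clear(self) -> None:
        self.messages = []
```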
Redefine the `get_chat_history` function to use our new `ConversationSummaryBufferMessageHistory` class.
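Along the lines of:

```python
def get_chat_history(
    session_id: str, llm: ChatOpenAI, k: int = 4
) -> ConversationSummaryBufferMessageHistory:
    if session_id not in chat_map:
        chat_map[session_id] = ConversationSummaryBufferMessageHistory(llm=llm, k=k)
    return chat_map[session_id]
```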
Set up our pipeline with the new configurable fields.
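A sketch with `session_id`, `llm`, and `k` all exposed via `ConfigurableFieldSpec`:

```python
from langchain_core.runnables import ConfigurableFieldSpec
from langchain_core.runnables.history import RunnableWithMessageHistory

pipeline_with_history = RunnableWithMessageHistory(
    pipeline,
    get_session_history=get_chat_history,
    input_messages_key="query",
    history_messages_key="history",
    history_factory_config=[
        ConfigurableFieldSpec(
            id="session_id",
            annotation=str,
            name="Session ID",
            description="Unique identifier for the conversation.",
            default="id_default",
        ),
        ConfigurableFieldSpec(
            id="llm",
            annotation=ChatOpenAI,
            name="LLM",
            description="LLM used to generate the conversation summary.",
            default=llm,
        ),
        ConfigurableFieldSpec(
            id="k",
            annotation=int,
            name="k",
            description="Number of most recent messages to keep verbatim.",
            default=4,
        ),
    ],
)
```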
Finally, we invoke our runnable:
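(The session ID is again arbitrary.)

```python
pipeline_with_history.invoke(
    {"query": "Hi, my name is James"},
    config={"configurable": {"session_id": "id_sbm", "llm": llm, "k": 4}},
)
```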
On the first invocation our class prints `No old messages to update summary with`, since the buffer does not yet contain more than `k` messages to summarize.
With that, we've successfully implemented the `ConversationSummaryBufferMemory` type using `RunnableWithMessageHistory`!