Updated on January 19, 2025

LangChain Expression Language (LCEL)

AI Engineering

This chapter will introduce the LangChain Expression Language (LCEL). We'll focus on understanding how LCEL works under the hood and how to implement it. We'll provide examples for both OpenAI's gpt-4o-mini and Meta's llama3.2 via Ollama!

Traditional Chains vs LCEL

In this section, we will first build a chain using the traditional method, then rebuild it with LCEL so we can compare the two. The pipeline takes a topic from the user and asks the LLM to generate a short research report on it.

Traditional LLMChain

The LLMChain is the simplest chain originally introduced in LangChain. This chain takes a prompt, feeds it into an LLM, and optionally adds an output parsing step before returning the result.

Let's see how we construct this using the traditional method. For this, we need the following:

  • prompt — a PromptTemplate that we use to generate the prompt for the LLM.

  • llm — the LLM we will be using to generate the output.

  • output_parser — an optional output parser that we will use to parse the structured output of the LLM.

python
from langchain import PromptTemplate

prompt_template = "Give me a small report on {topic}"

prompt = PromptTemplate(
    input_variables=["topic"],
    template=prompt_template
)

We'll start by initializing our connection to the OpenAI API for the LLM. We do need an OpenAI API key, which you can get from the OpenAI platform.

We will use the gpt-4o-mini model with a temperature of 0.0:

python
import os
from getpass import getpass
from langchain_openai import ChatOpenAI

os.environ["OPENAI_API_KEY"] = os.getenv("OPENAI_API_KEY") \
or getpass("Enter your OpenAI API key: ")

llm = ChatOpenAI(
model_name="gpt-4o-mini",
temperature=0.0,
)
python
llm_out = llm.invoke("Hello there")
llm_out
python
AIMessage(
    content='Hello! How can I assist you today?',
    response_metadata={
        'token_usage': {
            'completion_tokens': 10, 'prompt_tokens': 9, 'total_tokens': 19,
            'completion_tokens_details': {
                'accepted_prediction_tokens': 0, 'audio_tokens': 0,
                'reasoning_tokens': 0, 'rejected_prediction_tokens': 0
            },
            'prompt_tokens_details': {
                'audio_tokens': 0, 'cached_tokens': 0
            }
        },
        'model_name': 'gpt-4o-mini-2024-07-18', 'system_fingerprint': 'fp_72ed7ab54c',
        'finish_reason': 'stop', 'logprobs': None
    },
    additional_kwargs={'refusal': None},
    id='run-fdebe966-3308-422d-a7a0-e7557e744551-0',
    usage_metadata={
        'input_tokens': 9, 'output_tokens': 10, 'total_tokens': 19,
        'input_token_details': {'audio': 0, 'cache_read': 0},
        'output_token_details': {'audio': 0, 'reasoning': 0}
    }
)

Then, we define our output parser, which will be used to parse the LLM's output. In this case, we will use the StrOutputParser, which will parse the AIMessage output from our LLM into a single string.

python
from langchain.schema.output_parser import StrOutputParser

output_parser = StrOutputParser()

We run the parser by calling its invoke method:

python
out = output_parser.invoke(llm_out)
out
text
'Hello! How can I assist you today?'

Using the LLMChain class, we can combine each of these components into a single linear chain.

python
from langchain.chains import LLMChain

chain = LLMChain(prompt=prompt, llm=llm, output_parser=output_parser)
text
LangChainDeprecationWarning: The class `LLMChain` was deprecated in LangChain 0.1.17 and
will be removed in 1.0. Use :meth:`~RunnableSequence, e.g., `prompt | llm`` instead.
chain = LLMChain(prompt=prompt, llm=llm, output_parser=output_parser)

Note that the LLMChain was deprecated in LangChain 0.1.17. Today, the expected way of constructing these chains is through LCEL, which we'll cover in a moment.

We provide a topic that we'd like researched and invoke our chain:

python
result = chain.invoke("retrieval augmented generation")
result
python
{'topic': 'retrieval augmented generation',
'text': (
'### Report on Retrieval-Augmented Generation (RAG)\n\n#### Introduction\n'
'Retrieval-Augmented Generation (RAG) is an innovative approach that combines the '
'strengths of information retrieval and natural language generation. This method '
'enhances the capabilities of language models by allowing them to access external '
'knowledge sources, thereby improving the accuracy and relevance of generated '
'responses.\n\n'
'#### Concept Overview\nRAG operates on the principle of integrating a retrieval '
'mechanism with a generative model. The process typically involves two main '
'components:\n\n'
'1. **Retrieval Component**: This part of the system retrieves relevant documents '
'or pieces of information from a large corpus based on a given query. It employs '
'techniques such as vector embeddings and similarity search to identify the most '
'pertinent data.\n\n'
'2. **Generation Component**: Once the relevant information is retrieved, the '
'generative model (often based on architectures like Transformers) processes this '
'data to produce coherent and contextually appropriate text. The model can '
'leverage the retrieved information to enhance its responses, making them more '
'informative and grounded in factual content.\n\n'
'#### Advantages\n- **Improved Accuracy**: By accessing up-to-date and '
'domain-specific information, RAG can produce more accurate and contextually '
'relevant outputs compared to traditional generative models that rely solely on '
'pre-existing training data.\n- **Dynamic Knowledge Integration**: RAG allows for '
'the integration of external knowledge bases, enabling the model to stay current '
'with new information and trends without requiring retraining.\n'
'- **Enhanced Contextual Understanding**: The retrieval mechanism helps the model '
'understand the context better, leading to more nuanced and relevant responses.\n\n'
'#### Applications\nRAG has a wide range of applications, including:\n'
'- **Question Answering**: Providing precise answers to user queries by retrieving '
'relevant documents and generating responses based on them.\n'
'- **Chatbots and Virtual Assistants**: Enhancing conversational agents with the '
'ability to pull in real-time information, making interactions more informative.\n'
'- **Content Creation**: Assisting in generating articles, reports, or summaries '
'by retrieving relevant data and synthesizing it into coherent narratives.\n\n'
'#### Challenges\nDespite its advantages, RAG faces several challenges:\n'
'- **Retrieval Quality**: The effectiveness of the system heavily depends on the '
'quality of the retrieval component. Poor retrieval can lead to irrelevant or '
'misleading information being used in generation.\n- **Complexity**: Integrating '
'retrieval and generation processes adds complexity to the system architecture, '
'which can complicate training and deployment.\n- **Latency**: The retrieval '
'process can introduce latency, which may affect real-time applications where '
'speed is critical.\n\n#### Conclusion\nRetrieval-Augmented Generation represents '
'a significant advancement in the field of natural language processing. By '
'effectively combining retrieval and generation, RAG systems can produce more '
'accurate, relevant, and contextually aware outputs. As research and development '
'in this area continue, RAG is poised to play a crucial role in various '
'applications, enhancing the capabilities of AI-driven communication and '
'information systems.'
)
}

We can view a formatted version of this output using the Markdown display:

python
from IPython.display import display, Markdown

display(Markdown(result["text"]))
text
### Report on Retrieval-Augmented Generation (RAG)

#### Introduction
Retrieval-Augmented Generation (RAG) is an innovative approach that combines the
strengths of information retrieval and natural language generation. This method enhances
the capabilities of language models by allowing them to access external knowledge
sources, thereby improving the accuracy and relevance of generated responses.

#### Concept Overview
RAG operates on the principle of integrating a retrieval mechanism with a generative
model. The process typically involves two main components:

1. **Retrieval Component**: This part of the system retrieves relevant documents or
pieces of information from a large corpus based on a given query. It employs techniques
such as vector embeddings and similarity search to identify the most pertinent data.

2. **Generation Component**: Once the relevant information is retrieved, the generative
model (often based on architectures like Transformers) processes this data to produce
coherent and contextually appropriate text. The model can leverage the retrieved
information to enhance its responses, making them more informative and grounded in
factual content.

#### Advantages
- **Improved Accuracy**: By accessing up-to-date and domain-specific information, RAG
can produce more accurate and contextually relevant outputs compared to traditional
generative models that rely solely on pre-existing training data.
- **Dynamic Knowledge Integration**: RAG allows for the integration of external
knowledge bases, enabling the model to stay current with new information and trends
without requiring retraining.
- **Enhanced Contextual Understanding**: The retrieval mechanism helps the model
understand the context better, leading to more nuanced and relevant responses.

#### Applications
RAG has a wide range of applications, including:
- **Question Answering**: Providing precise answers to user queries by retrieving
relevant documents and generating responses based on them.
- **Chatbots and Virtual Assistants**: Enhancing conversational agents with the ability
to pull in real-time information, making interactions more informative.
- **Content Creation**: Assisting in generating articles, reports, or summaries by
retrieving relevant data and synthesizing it into coherent narratives.

#### Challenges
Despite its advantages, RAG faces several challenges:
- **Retrieval Quality**: The effectiveness of the system heavily depends on the quality
of the retrieval component. Poor retrieval can lead to irrelevant or misleading
information being used in generation.
- **Complexity**: Integrating retrieval and generation processes adds complexity to the
system architecture, which can complicate training and deployment.
- **Latency**: The retrieval process can introduce latency, which may affect real-time
applications where speed is critical.

#### Conclusion
Retrieval-Augmented Generation represents a significant advancement in the field of
natural language processing. By effectively combining retrieval and generation, RAG
systems can produce more accurate, relevant, and contextually aware outputs. As research
and development in this area continue, RAG is poised to play a crucial role in various
applications, enhancing the capabilities of AI-driven communication and information
systems.

That is a simple LLMChain using the traditional LangChain method. Now, let's move on to LCEL.

LangChain Expression Language (LCEL)

LangChain Expression Language (LCEL) is the recommended approach to building chains in LangChain, having superseded traditional methods such as LLMChain. LCEL gives us a more flexible system for building chains, using the pipe operator | to connect components. Let's see how we'd construct the equivalent of our LLMChain using LCEL.

python
lcel_chain = prompt | llm | output_parser

We can invoke this chain in the same way as we did before:

python
result = lcel_chain.invoke("retrieval augmented generation")
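
One difference worth noting: the LLMChain returned a dictionary containing our input topic and the generated text, whereas this LCEL chain ends with StrOutputParser and therefore returns the parsed string directly. A quick, purely illustrative check:

python
# The final StrOutputParser step means the LCEL chain returns a plain string,
# not the {"topic": ..., "text": ...} dictionary we got from LLMChain.
type(result)  # str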

The underlying functionality and generated content are the same as before. We can again view a formatted version of the output using the Markdown display:

python
display(Markdown(result))
text
### Report on Retrieval-Augmented Generation (RAG)

#### Introduction
Retrieval-Augmented Generation (RAG) is an advanced approach in natural language
processing (NLP) that combines the strengths of information retrieval and generative
models. This technique enhances the capabilities of language models by allowing them to
access external knowledge sources, thereby improving the accuracy and relevance of
generated responses.

#### Concept Overview
RAG operates on the principle of integrating a retrieval mechanism with a generative
model. The process typically involves two main components:

1. **Retrieval Component**: This part of the system retrieves relevant documents or
pieces of information from a large corpus based on a given query. It often employs
techniques such as vector embeddings and similarity search to identify the most
pertinent data.

2. **Generative Component**: Once relevant information is retrieved, a generative model
(often based on architectures like Transformers) processes this information to produce
coherent and contextually appropriate text. The generative model can be fine-tuned to
ensure that the output is not only relevant but also stylistically aligned with the
desired output.

#### Advantages of RAG
- **Enhanced Knowledge Access**: By leveraging external databases or corpora, RAG can
provide more accurate and up-to-date information than standalone generative models,
which may be limited by their training data.
- **Improved Contextual Relevance**: The retrieval step ensures that the generated
content is closely aligned with the user's query, leading to more relevant and
context-aware responses.
- **Scalability**: RAG systems can scale to incorporate vast amounts of information,
making them suitable for applications requiring extensive knowledge bases.

#### Applications
RAG has a wide range of applications, including but not limited to:
- **Question Answering**: Providing precise answers to user queries by retrieving
relevant documents and generating responses based on that information.
- **Chatbots and Virtual Assistants**: Enhancing conversational agents with the ability
to pull in real-time information from external sources.
- **Content Creation**: Assisting in generating articles, summaries, or reports by
retrieving relevant data and synthesizing it into coherent text.

#### Challenges
Despite its advantages, RAG also faces several challenges:
- **Complexity**: The integration of retrieval and generation components can complicate
the system architecture and require careful tuning.
- **Latency**: The retrieval process can introduce delays, which may affect the
responsiveness of applications, especially in real-time scenarios.
- **Quality of Retrieved Information**: The effectiveness of RAG heavily depends on the
quality and relevance of the retrieved documents. Poor retrieval can lead to inaccurate
or misleading generated content.

#### Conclusion
Retrieval-Augmented Generation represents a significant advancement in the field of NLP,
combining the strengths of information retrieval and generative modeling. By enabling
access to external knowledge, RAG enhances the relevance and accuracy of generated text,
making it a powerful tool for various applications. As research and development in this
area continue, RAG is likely to play an increasingly important role in the evolution of
intelligent systems and applications.

How Does the Pipe Operator Work?

Before discussing other LCEL features, let's examine the pipe operator | to understand its purpose and how it works.

Functionally, the pipe takes the output of the object on its left and feeds it as input to the object on its right. In the chain prompt | llm | output_parser, the output of prompt is passed to llm, and the output of llm is passed to output_parser.

In other words, the pipe operator chains components together: whatever the left side outputs becomes the input to the right side.

Let's create a simple class named Runnable that transforms a provided function into a "runnable" class that we will use with the pipe | operator.

python
class Runnable:
    def __init__(self, func):
        self.func = func

    def __or__(self, other):
        # build a new function that feeds our output into the next runnable
        def chained_func(*args, **kwargs):
            return other.invoke(self.func(*args, **kwargs))
        return Runnable(chained_func)

    def invoke(self, *args, **kwargs):
        return self.func(*args, **kwargs)

The Runnable class wraps a plain function in an object, which lets us attach extra methods to it. In particular, the __or__ method is what allows us to chain multiple wrapped functions together.

First, let's create a few functions that we'll chain together:

python
def add_five(x):
    return x + 5

def sub_five(x):
    return x - 5

def mul_five(x):
    return x * 5

Now we wrap our functions with the Runnable:

python
add_five_runnable = Runnable(add_five)
sub_five_runnable = Runnable(sub_five)
mul_five_runnable = Runnable(mul_five)

Finally, we can chain these together using the __or__ method from the Runnable class:

python
chain = (add_five_runnable).__or__(sub_five_runnable).__or__(mul_five_runnable)

chain.invoke(3)
text
15

We can chain our functions by calling __or__ directly, but the pipe operator | is simply syntactic sugar for the __or__ method. Using the pipe, we can build the same chain:

python
chain = add_five_runnable | sub_five_runnable | mul_five_runnable

chain.invoke(3)
text
15

LCEL RunnableLambda

The RunnableLambda class is LangChain's built-in method for constructing a runnable object from a function. It does the same thing as the custom Runnable class we created earlier. Let's try it out with the same functions.

python
from langchain_core.runnables import RunnableLambda

add_five_runnable = RunnableLambda(add_five)
sub_five_runnable = RunnableLambda(sub_five)
mul_five_runnable = RunnableLambda(mul_five)

We chain these together again with the pipe | operator:

python
chain = add_five_runnable | sub_five_runnable | mul_five_runnable

And call them using the invoke method:

python
chain.invoke(3)
text
15

Let's try something a little more complex. This time, we will generate and edit our report using an LCEL chain.

python
prompt_str = "give me a small report about {topic}"
prompt = PromptTemplate(
    input_variables=["topic"],
    template=prompt_str
)
python
chain = prompt | llm | output_parser
python
result = chain.invoke("AI")
display(Markdown(result))
text
### Report on Artificial Intelligence (AI)

#### Introduction
Artificial Intelligence (AI) refers to the simulation of human intelligence processes by
machines, particularly computer systems. These processes include learning (the
acquisition of information and rules for using it), reasoning (using rules to reach
approximate or definite conclusions), and self-correction. AI has become a
transformative technology across various sectors, influencing how we work, communicate,
and solve problems.

#### Types of AI
AI can be categorized into two main types:

1. **Narrow AI (Weak AI)**: This type of AI is designed and trained for a specific task.
Examples include virtual assistants like Siri and Alexa, recommendation systems on
streaming platforms, and image recognition software.

2. **General AI (Strong AI)**: This is a theoretical form of AI that possesses the
ability to understand, learn, and apply intelligence across a wide range of tasks,
similar to a human being. As of now, General AI remains largely conceptual and has not
yet been realized.

#### Applications of AI
AI technologies are being applied in numerous fields, including:

- **Healthcare**: AI is used for diagnostic purposes, personalized medicine, and
predictive analytics to improve patient outcomes.
- **Finance**: Algorithms analyze market trends, detect fraud, and automate trading
processes.
- **Transportation**: AI powers autonomous vehicles, optimizing routes and improving
safety.
- **Customer Service**: Chatbots and virtual assistants enhance customer interactions
and streamline support services.
- **Manufacturing**: AI-driven robots and predictive maintenance systems increase
efficiency and reduce downtime.

#### Challenges and Ethical Considerations
Despite its potential, AI poses several challenges:

- **Bias and Fairness**: AI systems can perpetuate existing biases present in training
data, leading to unfair outcomes.
- **Privacy Concerns**: The use of AI in data collection raises significant privacy
issues, particularly regarding personal information.
- **Job Displacement**: Automation driven by AI may lead to job losses in certain
sectors, necessitating workforce retraining and adaptation.
- **Security Risks**: AI can be exploited for malicious purposes, including cyberattacks
and misinformation campaigns.

#### Future Outlook
The future of AI is promising, with ongoing advancements in machine learning, natural
language processing, and robotics. As AI continues to evolve, it is expected to play an
increasingly integral role in society, driving innovation and efficiency. However, it is
crucial to address ethical concerns and ensure that AI development is guided by
principles of fairness, accountability, and transparency.

#### Conclusion
AI is a powerful tool that has the potential to revolutionize various aspects of life
and industry. As we move forward, it is essential to harness its capabilities
responsibly, ensuring that its benefits are widely shared while mitigating risks and
challenges.

---

This report provides a brief overview of AI, its applications, challenges, and future
prospects. For a more in-depth analysis, further research and exploration of specific AI
technologies and their implications would be beneficial.

Here, we define two functions: extract_fact removes the first block of text before the first paragraph break (in our reports, the title heading), and replace_word replaces the substring "RAG" with "hot potato".

python
def extract_fact(x):
    if "\n\n" in x:
        return "\n".join(x.split("\n\n")[1:])
    else:
        return x

old_word = "RAG"
new_word = "hot potato"

def replace_word(x):
    return x.replace(old_word, new_word)

Let's wrap these functions and see what the output is!

python
extract_fact_runnable = RunnableLambda(extract_fact)
replace_word_runnable = RunnableLambda(replace_word)
python
chain = prompt | llm | output_parser | extract_fact_runnable | replace_word_runnable
python
result = chain.invoke("retrieval augmented generation")
display(Markdown(result))
text
#### Introduction
Retrieval-Augmented Generation (hot potato) is an innovative approach that combines the
strengths of information retrieval and natural language generation. This method enhances
the capabilities of language models by allowing them to access external knowledge
sources, thereby improving the accuracy and relevance of generated responses.
#### Concept Overview
hot potato operates on the principle of integrating a retrieval mechanism with a
generative model. The process typically involves two main components:
1. **Retrieval Component**: This part of the system searches a large corpus of documents
or knowledge bases to find relevant information based on a given query. It uses
techniques such as vector embeddings and similarity search to identify the most
pertinent documents.
2. **Generation Component**: Once relevant documents are retrieved, the generative model
(often based on architectures like Transformers) processes this information to produce
coherent and contextually appropriate text. The model can leverage the retrieved content
to enhance its responses, making them more informative and grounded in factual data.
#### Advantages
- **Improved Accuracy**: By accessing up-to-date and domain-specific information, hot
potato can provide more accurate answers compared to traditional generative models that
rely solely on pre-existing training data.
- **Contextual Relevance**: The retrieval mechanism allows the model to tailor its
responses based on the specific context of the query, leading to more relevant and
useful outputs.
- **Scalability**: hot potato systems can scale to incorporate vast amounts of external
knowledge, making them suitable for applications requiring extensive information
retrieval.
#### Applications
hot potato has a wide range of applications, including:
- **Question Answering**: Enhancing systems that need to provide precise answers to user
queries by retrieving relevant documents.
- **Chatbots and Virtual Assistants**: Improving conversational agents by allowing them
to pull in real-time information from external sources.
- **Content Creation**: Assisting writers and content creators by providing relevant
data and insights during the writing process.
#### Challenges
Despite its advantages, hot potato also faces several challenges:
- **Latency**: The retrieval process can introduce delays, which may affect the
responsiveness of applications.
- **Quality of Retrieved Information**: The effectiveness of hot potato heavily depends
on the quality and relevance of the retrieved documents. Poor retrieval can lead to
inaccurate or misleading outputs.
- **Complexity**: Implementing a hot potato system requires careful integration of
retrieval and generation components, which can complicate the development process.
#### Conclusion
Retrieval-Augmented Generation represents a significant advancement in the field of
natural language processing. By effectively combining retrieval and generation, hot
potato systems can produce more accurate, relevant, and context-aware responses. As
research and development in this area continue to evolve, hot potato is poised to play a
crucial role in enhancing various applications, from customer support to content
generation. Future work will likely focus on improving retrieval efficiency, ensuring
the quality of information, and minimizing latency to create even more robust systems.

Those are our RunnableLambda functions. Note that the function we wrap must accept a SINGLE argument. If we need to pass multiple values, we provide a dictionary of key-value pairs and unpack it inside the function, as in the sketch below.
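
Here is a minimal sketch of that pattern. The add_and_scale function is a hypothetical example, not something from LangChain; it simply shows a single dictionary argument being unpacked inside the function.

python
from langchain_core.runnables import RunnableLambda

# hypothetical helper: takes ONE argument (a dict) and unpacks it internally
def add_and_scale(inputs: dict) -> int:
    return (inputs["a"] + inputs["b"]) * inputs["scale"]

add_and_scale_runnable = RunnableLambda(add_and_scale)

add_and_scale_runnable.invoke({"a": 2, "b": 3, "scale": 5})  # returns 25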

LCEL RunnableParallel and RunnablePassthrough

LCEL provides us with various Runnable classes that allow us to control the flow of data and execution order through our chains. Two of these are RunnableParallel and RunnablePassthrough.

  • RunnableParallel allows us to run multiple Runnable instances in parallel, acting almost like a Y-fork in the chain.

  • RunnablePassthrough allows us to pass a variable through to the next Runnable without modification.
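
Before using them with real data sources, here is a minimal, illustrative sketch of how the two fit together (the doubling function is just a placeholder):

python
from langchain_core.runnables import (
    RunnableLambda, RunnableParallel, RunnablePassthrough
)

# fork the input: one branch transforms it, the other passes it through untouched
fork = RunnableParallel(
    {
        "doubled": RunnableLambda(lambda x: x * 2),
        "original": RunnablePassthrough(),
    }
)

fork.invoke(3)  # {'doubled': 6, 'original': 3}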

To see these runnables in action, we will create two data sources. Each source holds only part of the information needed, so both will need to be fed to the LLM to answer our question.

python
from langchain_openai import OpenAIEmbeddings
from langchain.vectorstores import DocArrayInMemorySearch

embedding = OpenAIEmbeddings()

vecstore_a = DocArrayInMemorySearch.from_texts(
    ["half the info is here", "James' birthday is December the 7th"],
    embedding=embedding
)
vecstore_b = DocArrayInMemorySearch.from_texts(
    ["the other half of the info is here", "James was born in 1994"],
    embedding=embedding
)

Here, you can see the prompt has three inputs: two for context ({context_a} and {context_b}) and one for the question itself ({question}).

python
prompt_str = """Using the context provided, answer the user's question.
Context:
{context_a}
{context_b}

Question:
{question}

Answer: """
python
from langchain.prompts import (
    ChatPromptTemplate,
    SystemMessagePromptTemplate,
    HumanMessagePromptTemplate,
)

prompt = ChatPromptTemplate.from_messages([
    SystemMessagePromptTemplate.from_template(prompt_str),
    HumanMessagePromptTemplate.from_template("{question}")
])

Next, we must create retriever objects from our vector stores, which we can use in our chain to retrieve information.

python
from langchain_core.runnables import RunnablePassthrough, RunnableParallel

retriever_a = vecstore_a.as_retriever()
retriever_b = vecstore_b.as_retriever()

retrieval = RunnableParallel(
    {
        "context_a": retriever_a,
        "context_b": retriever_b,
        "question": RunnablePassthrough()
    }
)

The chain we'll be constructing will look something like this:

python
chain = retrieval | prompt | llm | output_parser

We invoke it as usual.

python
result = chain.invoke("What was the date when James was born")
result
text
'James was born on December 07, 1994.'

With that, we've implemented both the RunnableParallel and RunnablePassthrough classes to control the flow of data and execution order in our LCEL chains.


That concludes this chapter on LangChain Expression Language (LCEL). We've covered the essentials: the pipe operator (|) and how it works under the hood, plus building LCEL chains with RunnableLambda, RunnableParallel, and RunnablePassthrough.

It's important to remember that LCEL is a fundamental component when using the core LangChain package. Naturally, that means we also see LCEL when working with other LangChain-ecosystem libraries that rely on LangChain — such as LangGraph.

As we've seen, it is possible to use a more traditional object-oriented approach to building chains and agents, but these older methods receive less support than their LCEL equivalents. When working within the LangChain ecosystem, we recommend sticking with LCEL and taking the time to understand how it works.