Building RAG from Day One: Evolving from RAG to RAG-as-a-Tool with Function Calling (Part 2)
To move towards an agentic chatbot, we need a fresh approach to harnessing LLM capabilities alongside our documents. This is where function-calling comes to the rescue.
RAG is indeed a powerful method for integrating our knowledge base or documents into an LLM, but it does come with certain limitations. Today, I’m excited to share a new LLM design that leverages advanced prompt engineering to enhance our chatbot's capabilities. These improvements include a more natural conversational flow, smarter semantic routing, and access to a wider range of document types through function-calling.
To recap from Part 1, the main limitations are:
Retrieving documents for every user question leads to higher costs and added latency, especially when retrieval isn't always necessary.
It also limits the chatbot's flexibility, making it less adaptable to general questions that don't rely on documentation.
Function calling
To deepen your understanding, check out OpenAI's function calling guide. It covers the key concepts and practical examples that illustrate how function calling can enhance chatbot capabilities.
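As a quick refresher, here is a minimal sketch of a single function-calling round trip with the openai Python SDK. The get_weather tool is purely illustrative and not part of this project; the point is the shape of the exchange:

import json
import openai

client = openai.OpenAI()  # assumes OPENAI_API_KEY is set in your environment

# Illustrative tool schema: the model may "call" get_weather with a city name
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {
                "city": {"type": "string", "description": "Name of the city"}
            },
            "required": ["city"],
        },
    },
}]

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "How hot is it in Hanoi today?"}],
    tools=tools,
)

# Instead of a plain answer, the model can return a structured tool call;
# our code is responsible for executing it and sending the result back.
tool_call = response.choices[0].message.tool_calls[0]
print(tool_call.function.name)                   # get_weather
print(json.loads(tool_call.function.arguments))  # e.g. {'city': 'Hanoi'}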
Enhancing Our Application with Function Calling
Function calling adds versatility to our chatbot, but let's focus on how it can specifically enhance our RAG system. I like to call this approach 'RAG-as-a-tool.' Below is a simple code snippet that recaps basic RAG behavior before we integrate it with function calling.
import numpy as np
import openai
from google.colab import userdata

# Load the API key from Colab secrets
key = userdata.get('OPENAI_API_KEY')

# Set up the OpenAI API client
client = openai.OpenAI(api_key=key)
# Cosine similarity function
def cosine_similarity(a, b):
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
# Embedding function
def get_embedding(text):
    return client.embeddings.create(
        input=text,
        model="text-embedding-3-small"
    ).data[0].embedding
# RAG function
def rag(user_input, database):
    # Step 1: Vectorize user_input
    input_embedding = get_embedding(user_input)

    # Step 2: Calculate similarity with each document in the database
    similarities = []
    for doc in database:
        doc_embedding = get_embedding(doc)
        similarity = cosine_similarity(input_embedding, doc_embedding)
        similarities.append((doc, similarity))

    # Step 3: Sort documents by similarity and retrieve the top 3
    top_3_docs = sorted(similarities, key=lambda x: x[1], reverse=True)[:3]
    context = "\n".join([doc for doc, _ in top_3_docs])
    print("Context:", context)

    # Generate an answer using the OpenAI API
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {
                "role": "system",
                "content": f"Answer the user's question based on the context below:\n{context}"
            },
            {
                "role": "user",
                "content": user_input
            }
        ],
        temperature=1,
        max_tokens=2048,
        top_p=1,
        frequency_penalty=0,
        presence_penalty=0
    )

    # Step 4: Return the generated response
    return response.choices[0].message.content
docs = [
    "Soccer helps improve cardiovascular health and increases endurance.",
    "Soccer is the most popular sport in the world.",
    "Playing soccer helps reduce stress and improves mental well-being.",
    "Soccer can help you connect with more people.",
    "Soccer is not only a sport but also a form of entertainment."
]
query = "What are the health benefits of playing soccer?"
# Call the RAG function
response_text = rag(query, docs)
print("Generated response:", response_text)
>>> Generated response: Soccer offers many health benefits, including:
1. **Improved Cardiovascular Health**: Playing soccer strengthens the cardiovascular system and improves blood circulation.
2. **Increased Endurance**: Continuous activity during a game helps develop endurance and overall fitness.
3. **Stress Reduction**: Participating in soccer reduces stress and enhances mental well-being, providing a sense of relaxation.
4. **Social Connection**: Soccer provides opportunities to connect and interact with others, enhancing social skills.
In summary, soccer is beneficial not only for physical health but also for mental wellness and community connection.
While the example above works well within a RAG setup, let's take it a step further by incorporating more structured data for improved precision and flexibility.
docs = [
    "Product name: iPhone 13, Brand: Apple, Price: 20000000, Category: smartphone",
    "Product name: Galaxy S21, Brand: Samsung, Price: 18000000, Category: smartphone",
    "Product name: Xiaomi Mi 11, Brand: Xiaomi, Price: 15000000, Category: smartphone",
    "Product name: Oppo Find X3 Pro, Brand: Oppo, Price: 22000000, Category: smartphone",
    "Product name: Vivo X60 Pro, Brand: Vivo, Price: 17000000, Category: smartphone",
    "Product name: Sony Xperia 5 II, Brand: Sony, Price: 21000000, Category: smartphone",
    "Product name: Google Pixel 5, Brand: Google, Price: 16000000, Category: smartphone",
    "Product name: OnePlus 9, Brand: OnePlus, Price: 19000000, Category: smartphone",
    "Product name: Huawei P40 Pro, Brand: Huawei, Price: 20000000, Category: smartphone",
    "Product name: Realme GT, Brand: Realme, Price: 13000000, Category: smartphone",
    "Product name: Asus ROG Phone 5, Brand: Asus, Price: 25000000, Category: smartphone",
    "Product name: Nokia 8.3 5G, Brand: Nokia, Price: 14000000, Category: smartphone"
]
query = "Are there any phones over 20 million?"
# Call the RAG function
response_text = rag(query, docs)
print("Generated response:", response_text)
>>> Generated response: Yes, in the list provided, there are 2 phones priced over 20 million VND:
1. Oppo Find X3 Pro - Price: 22,000,000 VND
2. Asus ROG Phone 5 - Price: 25,000,000 VND
From the response above, we can see that the LLM might not cover all products due to the limitations of top-k retrieval (it misses Product name: Sony Xperia 5 II, Brand: Sony, Price: 21000000). Additionally, the data follows a structured format that could benefit from a traditional database. So, can we guide our LLM to interact with structured data the way we would? The answer is yes: we can build a tool tailored for this purpose.
To keep things simple, I'll create a basic search_database tool that takes name and price parameters, along with a function to execute the search.
tools = [
    {
        "type": "function",
        "function": {
            "name": "search_database",
            "description": "Use this to search for phone information",
            "parameters": {
                "type": "object",
                "properties": {
                    "name": {
                        "type": "string",
                        "description": "name of the product",
                    },
                    "price": {
                        "type": "string",
                        "description": "price of the product",
                    },
                },
                "additionalProperties": False,
            },
        }
    }
]
def execute_database(name, price):
    # --- In a real application we would implement this function:
    # based on the parameters, run a query against our database, e.g.
    #   SELECT * FROM our_database WHERE name LIKE %name%
    #   SELECT * FROM our_database WHERE price >= price
    # ---
    # Fake return; in reality we would run the SQL and return the rows
    return "Product name: iPhone 13, Brand: Apple, Price: 20,000,000 VND, Category: smartphone"
For the rest of this walkthrough, though, I'll simulate the model's tool-call response so the example is reproducible. It looks like this:
# Simulate the tool call response
response = {
    "choices": [
        {
            "message": {
                "role": "assistant",
                "tool_calls": [
                    {
                        "id": "call_62136354",
                        "type": "function",
                        "function": {
                            "arguments": "{\"price\":\"20000000\"}",
                            "name": "search_database"
                        }
                    }
                ]
            }
        }
    ]
}
# Create a message containing the result of the function call
function_call_result_message = {
    "role": "tool",
    "content": "Product name: iPhone 13, Brand: Apple, Price: 20,000,000 VND, Category: smartphone",
    "tool_call_id": response['choices'][0]['message']['tool_calls'][0]['id']
}

messages = [
    {"role": "system", "content": "You are a helpful customer support assistant. Use the supplied tools to assist the user"},
    {"role": "user", "content": "Are there any phones over 20 million?"},
    response['choices'][0]['message'],
    function_call_result_message
]
response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=messages,
    tools=tools,
)
print(response.choices[0].message.content)
>>> A phone priced over 20 million VND is the iPhone 13 by Apple, with a price of 20 million VND.
Returning to our initial RAG example, we can now treat document retrieval itself as a tool, functioning like this:
tools = [
    {
        "type": "function",
        "function": {
            "name": "search_documents",
            "description": "Use this to search for documents",
            "parameters": {
                "type": "object",
                "properties": {
                    "query": {
                        "type": "string",
                        "description": "User intention throughout the conversation",
                    }
                },
                "additionalProperties": False,
            },
        }
    }
]
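To make 'RAG-as-a-tool' concrete, here is a minimal sketch of the full loop, under a couple of assumptions of mine: a retrieve helper that reuses get_embedding and cosine_similarity from earlier (steps 1-3 of rag, without generation), and a system prompt that instructs the model to condense the conversation into the query parameter:

import json

def retrieve(query, database):
    # Retrieval-only version of steps 1-3 of the earlier rag() function:
    # embed the query, score every document, return the top 3 as context
    query_embedding = get_embedding(query)
    scored = [(doc, cosine_similarity(query_embedding, get_embedding(doc)))
              for doc in database]
    top_docs = sorted(scored, key=lambda x: x[1], reverse=True)[:3]
    return "\n".join(doc for doc, _ in top_docs)

def chat(user_input, history, database):
    messages = [
        {"role": "system", "content": (
            "You are a helpful assistant. When the user asks about the documents, "
            "call search_documents with a condensed, standalone query that captures "
            "the user's intent across the conversation. Otherwise, answer directly."
        )},
        *history,
        {"role": "user", "content": user_input},
    ]

    # First call: the model routes between chitchat and retrieval
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=messages,
        tools=tools,
    )
    message = response.choices[0].message

    # No tool call: the model answered directly (chitchat path)
    if not message.tool_calls:
        return message.content

    # Tool call: run retrieval with the condensed query and feed the context back
    tool_call = message.tool_calls[0]
    query = json.loads(tool_call.function.arguments)["query"]
    messages.append(message)
    messages.append({
        "role": "tool",
        "content": retrieve(query, database),
        "tool_call_id": tool_call.id,
    })

    # Second call: the model writes the final answer from the retrieved context
    final = client.chat.completions.create(model="gpt-4o-mini", messages=messages)
    return final.choices[0].message.content

For example, chat("What are the health benefits of playing soccer?", [], docs) would trigger retrieval, while chat("hello!", [], docs) would be answered directly, without a single embedding call.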
This approach brings several benefits:
Enhanced Chitchat Handling: The chatbot can now answer casual questions, or any question where the LLM deems retrieval unnecessary, without touching the document store, guided by your tool's description.
Condensed Query: Have you heard of this technique? Originating from LangChain's early RAG implementations, it condenses the chat history between the user and the chatbot into a single standalone query to improve retrieval accuracy. With function calling, the model can fill the query parameter with this condensed intent if your prompt instructs it to, as in the sketch above.
Improved Document Retrieval: Retrieval is grounded in the most recent conversation context, which may reduce the likelihood of hallucination.
Final Thought
Function-calling is a versatile approach for designing an LLM that best aligns with your specific data and product needs. It’s valuable for a wide range of use cases, including:
Enabling assistants to fetch data: an AI assistant can fetch the latest customer data from an internal system when a user asks "What are my recent orders?" before generating its response.
Enabling assistants to take actions: an AI assistant needs to schedule meetings based on user preferences and calendar availability.
Enabling assistants to perform computation: a math tutor assistant needs to perform a math computation.
Building rich workflows: a data extraction pipeline that fetches raw text, then converts it to structured data and saves it in a database.
Modifying your application's UI: you can use function calls that update the UI based on user input, for example, rendering a pin on a map.
If you're familiar with LlamaIndex, it offers a separate setup for its Query Engine, similar to our basic RAG in Part 1 (reference here), and a Chat Engine, which aligns with our current approach (reference here).
What's Next
We're on the brink of developing a truly agentic chatbot, one capable of automating its own interactions. Stay tuned, and be sure to subscribe to stay updated on these exciting advancements. See you in the next part!