RAG: Building blocks (Part - 1)

Introduction

LLM/Inference

LLMs are the "brain" of RAG systems, responsible for interpreting input queries, synthesizing retrieved knowledge, and generating coherent, context-aware responses. Key roles of the LLM in a RAG system include Natural Language Understanding, Knowledge Synthesis, and Contextual Adaptation.

Below are a few of the most popular LLMs currently available.

GPT (Generative Pre-trained Transformer): This is a model developed by OpenAI with advanced natural language generation capabilities and extensive fine-tuning options for specific domains. GPT is widely adopted for both general-purpose and domain-specific applications.

Use Cases in RAG: Chatbots, content generation, and document summarization.

LLaMA (Large Language Model Meta AI): LLaMA is an open-source model developed by Meta AI (Facebook). It is lightweight and optimized for research and deployment; Meta focused on efficiency, with smaller parameter sizes compared to GPT. Due to its open-source licensing, the model is widely available, enabling extensive customization.

Use Cases in RAG: Academic research, question-answering systems, and specialized retrieval tasks.

Claude: Claude was developed by Anthropic with a focus on safety and interpretability. Claude has robust guardrails for reducing harmful outputs. The model was designed for human-centric applications with ethical considerations in mind, which leads to more descriptive and human-like responses from the model.

Use Cases in RAG: Enterprise-grade customer support, legal documentation analysis, and compliance tasks.
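Regardless of which LLM is chosen, the generation step of a RAG system looks roughly the same: retrieved passages are stitched into a prompt alongside the user's question. The sketch below illustrates this; `llm_generate` is a hypothetical placeholder standing in for a real call to GPT, LLaMA, or Claude.

```python
# Minimal sketch of handing retrieved context to an LLM in a RAG pipeline.
# llm_generate() is a placeholder -- in practice it would wrap a model API
# (e.g. OpenAI, Meta, or Anthropic), not generate anything itself.

def build_rag_prompt(query: str, retrieved_docs: list[str]) -> str:
    """Combine the user query with retrieved passages into one prompt."""
    context = "\n\n".join(f"[{i + 1}] {doc}" for i, doc in enumerate(retrieved_docs))
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}\nAnswer:"
    )

def llm_generate(prompt: str) -> str:
    # Hypothetical stand-in for a real LLM call.
    return f"(model response to a {len(prompt)}-character prompt)"

docs = [
    "Q3 revenue grew 12% year over year.",
    "Customer retention rose to 94% in Q3.",
]
answer = llm_generate(build_rag_prompt("How did Q3 revenue change?", docs))
```

Numbering the passages (`[1]`, `[2]`, ...) is a common convention that also lets the model cite which retrieved document supports its answer.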

Prompt Engineering

Prompt Engineering is the process of designing and optimizing input queries to guide LLMs toward generating accurate and relevant outputs. Prompts serve as the bridge between user intent and model-generated responses: the prompt dictates the behavior of the LLM and thus the resulting response. There are several flavors of prompt engineering.

Clear Context Definition: Ensures that the retrieval and generation steps align with user expectations, and reduces ambiguity, improving the relevance of both the retrieved information and the final output. Ex:- Explain the key findings of the quarterly sales report, focusing on trends in revenue growth and customer retention.

Few-Shot Prompting: Provide the model with examples within the prompt to improve accuracy. Ex:- Translate these phrases to German: 1. Hello. 2. How are you? 3. Good morning.

Chain-of-Thought Prompting: Guide the model to reason through problems step by step. Ex:- A classroom has two blue chairs for every three red chairs. If there are a total of 30 chairs in the classroom, how many blue chairs are there? Describe your reasoning step by step.

Instruction Tuning: This is the most common type of prompt used in a QnA-style RAG system. It explicitly instructs the model on how to behave. Ex:- You are a financial advisor. Respond concisely to client queries about investments.
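In code, these flavors usually reduce to string templates assembled before the LLM call. The sketch below shows two of the patterns above as plain template functions; the exact wording of each template is illustrative, not prescriptive.

```python
# Hedged sketch: the few-shot and instruction-tuning patterns as plain
# string templates. Template wording is illustrative only.

def few_shot_prompt(task: str, examples: list[tuple[str, str]], query: str) -> str:
    """Few-shot: prepend worked input/output pairs before the real query."""
    shots = "\n".join(f"Input: {x}\nOutput: {y}" for x, y in examples)
    return f"{task}\n{shots}\nInput: {query}\nOutput:"

def instruction_prompt(role: str, instruction: str, query: str) -> str:
    """Instruction style: state the persona and desired behavior explicitly."""
    return f"You are {role}. {instruction}\n\nQuery: {query}"

# Usage, echoing the translation example above:
p = few_shot_prompt(
    "Translate these phrases to German:",
    [("Hello.", "Hallo."), ("Good morning.", "Guten Morgen.")],
    "How are you?",
)

q = instruction_prompt(
    "a financial advisor",
    "Respond concisely to client queries about investments.",
    "Should I rebalance quarterly?",
)
```

Ending the few-shot template with a bare `Output:` cues the model to complete the pattern rather than restate the examples.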

Text Embedding

A computer or LLM operates on numbers, but most human interaction involves natural language consisting of words and sentences. To bridge the gap, we use text embedding, a method of converting text into numerical vectors that represent the text's meaning. Each piece of text is represented by a vector: an ordered list of numbers in a high-dimensional space.

Similar words have similar vector representations, enabling semantic similarity searches. Vector representations also capture relationships (e.g., "king - man + woman ≈ queen"). Text embedding is foundational to tasks like clustering, semantic search, and machine translation.
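The "king - man + woman ≈ queen" analogy can be made concrete with toy vectors. The 2-D values below are fabricated purely to show the arithmetic (real embedding models learn hundreds of dimensions from data); here the two dimensions loosely stand for "royalty" and "masculinity".

```python
# Toy illustration of vector arithmetic on word embeddings.
# The 2-D vectors are hand-made (dimensions: royalty, masculinity);
# a real model would produce learned, high-dimensional vectors.
import math

vectors = {
    "king":  [1.0,  1.0],
    "queen": [1.0, -1.0],
    "man":   [0.0,  1.0],
    "woman": [0.0, -1.0],
}

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity: 1.0 means same direction, 0.0 means orthogonal."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

# king - man + woman, computed component-wise.
target = [k - m + w for k, m, w in zip(vectors["king"],
                                       vectors["man"],
                                       vectors["woman"])]

# Find the vocabulary word whose vector is closest to the result.
closest = max(vectors, key=lambda word: cosine(vectors[word], target))
# closest == "queen"
```

With these particular vectors the analogy is exact; with learned embeddings it holds only approximately, which is why the nearest-neighbor lookup is needed.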

An embedding model transforms text queries and documents into vector embeddings for efficient similarity searches. This enables meaningful retrieval of contextually relevant data from vector databases.
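The retrieval mechanics can be sketched with a toy bag-of-words "embedding model" and cosine similarity. A production system would use a learned model (e.g. a sentence-transformer) and a vector database; the counting scheme below only demonstrates how vector similarity ranks documents against a query.

```python
# Sketch of embedding-based retrieval. embed() is a crude bag-of-words
# stand-in for a real embedding model; the ranking logic mirrors what a
# vector database does at scale.
import math

def embed(text: str, vocab: list[str]) -> list[float]:
    """Toy embedding: count occurrences of each vocabulary word."""
    words = text.lower().split()
    return [float(words.count(w)) for w in vocab]

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

docs = [
    "revenue growth trends in the quarterly sales report",
    "employee onboarding checklist and HR policies",
]
query = "quarterly revenue growth"

# Build a shared vocabulary so all texts embed into the same vector space.
vocab = sorted({w for text in docs + [query] for w in text.lower().split()})

# Rank documents by similarity to the query and keep the best match.
best = max(docs, key=lambda d: cosine(embed(d, vocab), embed(query, vocab)))
```

Because all texts are embedded into the same vector space, "closeness" between a query vector and a document vector stands in for semantic relevance, which is exactly the property RAG retrieval relies on.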