Simple RAG Application Using CrewAI and OpenAI API
A support engineer drops a 200-page internal PDF into a shared drive. Within days, teammates are asking the same question in Slack: “Where exactly does the deployment checklist live?” The document exists. The knowledge is there. But searching, interpreting, and summarizing it repeatedly drains engineering time.
This is the practical motivation behind efforts to create a simple RAG application using CrewAI: connect large language models to internal documentation so answers reflect organizational context instead of generic training data. Retrieval-augmented generation (RAG) has moved from research prototypes into production copilots, internal search assistants, and knowledge bots. At the same time, orchestration frameworks such as CrewAI, LangChain, and AutoGen have matured, offering different abstractions for managing LLM workflows.
This article explains how RAG works mechanically, where CrewAI fits in the architecture, and includes a complete working example with source code. It also outlines trade-offs compared to LangChain and AutoGen so engineering teams can reason about architectural choices rather than framework branding.
How a Simple RAG Application Using CrewAI Works
At its core, retrieval-augmented generation (RAG) splits the problem of answering questions into two stages:
- Retrieve relevant context from a knowledge base.
- Generate an answer grounded strictly in that context.
Instead of relying on what the model “remembers” from pretraining, RAG injects fresh, domain-specific material at runtime.
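That runtime injection is, mechanically, just prompt assembly. A minimal sketch of the idea, with an illustrative function name and wording that are not from any framework:

```python
def build_grounded_prompt(question: str, retrieved_chunks: list[str]) -> str:
    """Assemble a prompt that restricts the model to retrieved context."""
    context = "\n\n".join(retrieved_chunks)
    return (
        "Answer using only the documentation below.\n\n"
        f"Documentation:\n{context}\n\n"
        f"Question: {question}\n"
        "If the answer is not present, say so explicitly."
    )

prompt = build_grounded_prompt(
    "Where does the deployment checklist live?",
    ["Deployment Checklist: see the team handbook, section 1."],
)
print(prompt)
```

Everything downstream, whether a LangChain chain or a CrewAI task, is a variation of this pattern with retrieval plugged in front.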
The Retrieval Pipeline
A typical RAG pipeline includes:
- Document ingestion (PDF, Markdown, text files)
- Chunking into smaller segments
- Embedding generation (vector representations of text)
- Vector storage (FAISS, Pinecone, Weaviate, etc.)
- Similarity search at query time
When a user asks a question, the system embeds the query and retrieves the most semantically similar chunks. Those chunks are then inserted into the LLM prompt.
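"Most semantically similar" typically means highest cosine similarity between the query vector and each chunk vector. A toy sketch with hand-made 3-dimensional vectors (real embeddings have hundreds or thousands of dimensions, and a vector store does this search efficiently at scale):

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Pretend embeddings for three chunks (values are illustrative only)
chunks = {
    "deployment checklist": [0.9, 0.1, 0.0],
    "incident response":    [0.1, 0.9, 0.1],
    "office holidays":      [0.0, 0.2, 0.9],
}
query_vec = [0.8, 0.2, 0.1]  # pretend embedding of "how do I deploy?"

# Rank chunks by similarity to the query and keep the top k
top_k = sorted(chunks, key=lambda c: cosine(query_vec, chunks[c]), reverse=True)[:2]
print(top_k)  # ['deployment checklist', 'incident response']
```

FAISS, Pinecone, and Weaviate implement this ranking (or an approximate version of it) over millions of vectors.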
Chunking strategy often influences answer quality more than the model choice itself. Oversized chunks dilute relevance; extremely small chunks lose context continuity.
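The trade-off is easy to see in a simplified character-level splitter (a bare-bones stand-in for RecursiveCharacterTextSplitter): the overlap repeats the tail of each chunk, so a sentence straddling a boundary survives intact in at least one chunk.

```python
def split_with_overlap(text: str, chunk_size: int, overlap: int) -> list[str]:
    """Fixed-size character chunks; each chunk repeats the previous tail."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]

text = "Run migrations before deployment. Verify env vars in production."
for chunk in split_with_overlap(text, chunk_size=40, overlap=10):
    print(repr(chunk))
```

Tuning chunk_size and overlap against real queries is usually a better use of time than swapping models.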
Where CrewAI Fits Architecturally
CrewAI introduces agents and tasks as orchestration primitives. Instead of a single monolithic RAG chain, engineers define roles such as:
- A retrieval-focused agent
- A synthesis agent
- Optionally, a validation or planning agent
For a simple implementation, only one answer-focused agent may be necessary, while retrieval logic remains external. As workflows expand—adding tool-calling or structured outputs—CrewAI’s separation of responsibilities becomes more valuable.
The practical distinction is this: CrewAI structures who does what in an LLM workflow. It does not replace embeddings, vector stores, or model providers.
Complete Example: Create a Simple RAG Application Using CrewAI
Below is a minimal working example. It demonstrates:
- Document loading
- Chunking
- Embedding creation
- FAISS vector similarity search
- Context injection
- CrewAI agent orchestration
This version uses OpenAI-compatible models for simplicity. The same pattern applies to Azure-hosted or open-weight deployments.
Project Structure
rag-crewai/
│
├── main.py
├── requirements.txt
└── docs/
└── handbook.txt
requirements.txt
crewai
langchain
langchain-openai
langchain-community
langchain-text-splitters
faiss-cpu
tiktoken
python-dotenv
Install dependencies:
pip install -r requirements.txt
Sample Internal Document
Create a file at docs/handbook.txt:
Deployment Checklist:
1. Ensure Docker image is built with correct tag.
2. Run database migrations before deployment.
3. Verify environment variables in production.
4. Confirm health check endpoint returns 200.
5. Monitor logs for 15 minutes after release.
Incident Response:
- Escalate P1 incidents immediately.
- Notify Slack channel #oncall.
- Create Jira ticket within 30 minutes.
main.py (Full Working Code)
from dotenv import load_dotenv
from crewai import Agent, Task, Crew, LLM
from langchain_openai import OpenAIEmbeddings
from langchain_community.vectorstores import FAISS
from langchain_community.document_loaders import TextLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter

# --------------------------------------------------
# Load environment variables
# --------------------------------------------------
load_dotenv()
# .env should contain: OPENAI_API_KEY=your_key_here

# --------------------------------------------------
# 1. Load and Chunk Documents
# --------------------------------------------------
loader = TextLoader("docs/handbook.txt")
documents = loader.load()

text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=500,
    chunk_overlap=50
)
docs = text_splitter.split_documents(documents)

# --------------------------------------------------
# 2. Create Vector Store
# --------------------------------------------------
embeddings = OpenAIEmbeddings()
vectorstore = FAISS.from_documents(docs, embeddings)
retriever = vectorstore.as_retriever(search_kwargs={"k": 3})

# --------------------------------------------------
# 3. Initialize LLM
# --------------------------------------------------
# CrewAI's own LLM wrapper (recent releases); older CrewAI
# versions also accepted LangChain chat models directly.
llm = LLM(
    model="gpt-4o-mini",
    temperature=0
)

# --------------------------------------------------
# 4. Define Agents
# --------------------------------------------------
answer_agent = Agent(
    role="Technical Support Assistant",
    goal="Provide accurate and concise answers using retrieved context",
    backstory="You answer questions strictly based on provided documentation.",
    llm=llm,
    verbose=True
)

# --------------------------------------------------
# 5. Retrieval Function
# --------------------------------------------------
def retrieve_context(query: str) -> str:
    # invoke() replaces the deprecated get_relevant_documents()
    results = retriever.invoke(query)
    return "\n\n".join(doc.page_content for doc in results)

# --------------------------------------------------
# 6. Query Execution
# --------------------------------------------------
user_query = input("Ask a question about internal documentation: ")
context = retrieve_context(user_query)

task = Task(
    description=f"""
You are given internal documentation below:

{context}

Using only this documentation, answer the following question:

Question: {user_query}

If the answer is not found in the documentation, say:
"The documentation does not contain this information."
""",
    expected_output="A concise answer grounded strictly in the provided documentation.",
    agent=answer_agent
)

crew = Crew(
    agents=[answer_agent],
    tasks=[task],
    verbose=True
)

result = crew.kickoff()

print("\nFinal Answer:\n")
print(result)
Example Run
Input:
What should be done before deployment?
Output:
Before deployment, ensure the Docker image is built with the correct tag,
run database migrations, verify environment variables, confirm the health
check endpoint returns 200, and monitor logs for 15 minutes after release.
This demonstrates a functioning simple RAG application using CrewAI. The LLM response is grounded strictly in retrieved document content.
Practical Applications in Real Engineering Environments
A minimal example like this scales surprisingly well for internal use cases.
Internal Documentation Assistants
Teams of 10–100 engineers frequently index:
- Architecture decision records
- Deployment playbooks
- Runbooks
- Security procedures
CrewAI allows retrieval and response synthesis to remain logically separate, making it easier to add guardrails later.
Regulated Environments
Organizations operating in regulated industries may introduce an additional agent that verifies retrieved passages against compliance constraints before response synthesis. This increases token usage but improves traceability.
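Before wiring such a check into a dedicated CrewAI agent, it can be prototyped as a plain function that screens retrieved passages before they reach the synthesis prompt. The blocklist, function name, and terms below are hypothetical, not part of any framework:

```python
# Hypothetical compliance filter: separate retrieved passages that mention
# restricted topics before they are injected into the synthesis prompt.
RESTRICTED_TERMS = {"customer pii", "internal salaries", "security keys"}

def filter_passages(passages: list[str]) -> tuple[list[str], list[str]]:
    """Split passages into (allowed, flagged) based on restricted terms."""
    allowed, flagged = [], []
    for passage in passages:
        lowered = passage.lower()
        if any(term in lowered for term in RESTRICTED_TERMS):
            flagged.append(passage)
        else:
            allowed.append(passage)
    return allowed, flagged

allowed, flagged = filter_passages([
    "Escalate P1 incidents immediately.",
    "Security keys are stored in the vault.",
])
print(len(allowed), len(flagged))  # 1 1
```

Once the rules stabilize, the same logic can move into a validation agent whose task runs between retrieval and synthesis, giving an auditable record of what was withheld.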
Startups Building AI Copilots
Early-stage teams often begin with direct LangChain chains. As workflows expand to include tool calls or multi-step reasoning, CrewAI’s agent abstraction can reduce coupling and clarify responsibility boundaries.
Key Trade-Offs and Framework Comparison
Architectural differences matter more than marketing narratives. Below is a structural comparison.
| Feature | CrewAI | LangChain | AutoGen |
|---|---|---|---|
| Core Abstraction | Role-based agents & tasks | Chains, retrievers, tools | Conversational multi-agents |
| RAG Pattern | Agent-coordinated retrieval & synthesis | Prebuilt retriever chains | Chat-based tool invocation |
| Ecosystem Breadth | Growing | Broad integration ecosystem | Expanding enterprise focus |
| Orchestration Style | Explicit task delegation | Composable pipelines | Agent-to-agent dialogue |
| Complexity Overhead | Moderate | Moderate to high | Moderate |
LangChain provides wide integration support and utilities. CrewAI emphasizes clarity of workflow ownership. AutoGen leans into conversational coordination between agents.
Latency, cost control, and document quality often influence outcomes more than framework choice.
Common Mistakes When Building RAG Systems
Three issues appear repeatedly:
- Overestimating model intelligence. RAG improves grounding but cannot compensate for outdated or poorly structured documents.
- Neglecting chunk strategy. Retrieval quality often hinges on chunk size and overlap.
- Over-orchestrating prematurely. Multi-agent setups add clarity at scale but increase debugging complexity.
The simplest functional system is often the best starting point.
Decision Framework
When a Simple RAG Application Using CrewAI Makes Sense
- Multi-step internal copilots
- Workflows requiring explicit role separation
- Systems expected to evolve into tool-integrated agents
When Alternatives May Be Preferable
- Minimal document Q&A bots
- Ultra-low-latency requirements
- Teams already heavily invested in existing LangChain chains
The decision is architectural rather than ideological.
FAQ: Create a Simple RAG Application Using CrewAI
Q: What is required to create a simple RAG application using CrewAI?
A: A document source, embedding model, vector database, LLM, and CrewAI agents to orchestrate retrieval and generation.
Q: Does CrewAI handle embeddings and vector storage?
A: No. CrewAI coordinates agents and tasks. Embeddings and vector databases are provided by separate libraries or services.
Q: Can this example scale to production?
A: Yes, provided the underlying vector store, model hosting, and monitoring infrastructure are production-grade.
Q: How does CrewAI compare to LangChain for RAG?
A: CrewAI emphasizes role-based agents and task delegation. LangChain centers on chains and tool integrations. Both support RAG workflows.
Q: Is multi-agent RAG more accurate?
A: It can improve structure and validation, but accuracy still depends heavily on document quality and retrieval strategy.
Closing Thoughts
Efforts to create a simple RAG application using CrewAI reflect a broader shift toward structured, context-aware AI systems. Retrieval ensures domain relevance. Agent orchestration clarifies workflow boundaries. The trade-offs revolve around complexity, latency, and maintainability.
The example above demonstrates that a working RAG system can be built in under a hundred lines of code. What determines success in production is less about framework choice and more about data discipline, evaluation strategy, and architectural clarity.