Simple RAG Application Using CrewAI and OpenAI API
A support engineer drops a 200-page internal PDF into a shared drive. Within days, teammates are asking the same question in Slack: “Where exactly does the deployment checklist live?” The document exists. The knowledge is there. But searching, interpreting, and summarizing it repeatedly drains engineering time.
This is the practical motivation behind efforts to create a simple RAG application using CrewAI: connect large language models to internal documentation so answers reflect organizational context instead of generic training data. Retrieval-augmented generation (RAG) has moved from research prototypes into production copilots, internal search assistants, and knowledge bots. At the same time, orchestration frameworks such as CrewAI, LangChain, and AutoGen have matured, offering different abstractions for managing LLM workflows.
This article explains how RAG works mechanically, where CrewAI fits in the architecture, and includes a complete working example with source code. It also outlines trade-offs compared to LangChain and AutoGen so engineering teams can reason about architectural choices rather than framework branding.
How a Simple RAG Application Using CrewAI Works
At its core, retrieval-augmented generation (RAG) splits the problem of answering questions into two stages:
- Retrieve relevant context from a knowledge base.
- Generate an answer grounded strictly in that context.
Instead of relying on what the model “remembers” from pretraining, RAG injects fresh, domain-specific material at runtime.
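That runtime injection is, mechanically, just prompt assembly. A minimal sketch of the idea, with an illustrative function name and wording that are not from any framework:

```python
def build_grounded_prompt(question: str, retrieved_chunks: list[str]) -> str:
    """Assemble a prompt that restricts the model to retrieved context."""
    context = "\n\n".join(retrieved_chunks)
    return (
        "Answer using only the documentation below.\n\n"
        f"Documentation:\n{context}\n\n"
        f"Question: {question}\n"
        "If the answer is not present, say so explicitly."
    )

prompt = build_grounded_prompt(
    "Where does the deployment checklist live?",
    ["Deployment Checklist: see the team handbook, section 1."],
)
print(prompt)
```

Everything downstream, whether a LangChain chain or a CrewAI task, is a variation of this pattern with retrieval plugged in front.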
The Retrieval Pipeline
A typical RAG pipeline includes:
- Document ingestion (PDF, Markdown, text files)
- Chunking into smaller segments
- Embedding generation (vector representations of text)
- Vector storage (FAISS, Pinecone, Weaviate, etc.)
- Similarity search at query time
When a user asks a question, the system embeds the query and retrieves the most semantically similar chunks. Those chunks are then inserted into the LLM prompt.
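"Most semantically similar" typically means highest cosine similarity between the query vector and each chunk vector. A toy sketch with hand-made 3-dimensional vectors (real embeddings have hundreds or thousands of dimensions, and a vector store does this search efficiently at scale):

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Pretend embeddings for three chunks (values are illustrative only)
chunks = {
    "deployment checklist": [0.9, 0.1, 0.0],
    "incident response":    [0.1, 0.9, 0.1],
    "office holidays":      [0.0, 0.2, 0.9],
}
query_vec = [0.8, 0.2, 0.1]  # pretend embedding of "how do I deploy?"

# Rank chunks by similarity to the query and keep the top k
top_k = sorted(chunks, key=lambda c: cosine(query_vec, chunks[c]), reverse=True)[:2]
print(top_k)  # ['deployment checklist', 'incident response']
```

FAISS, Pinecone, and Weaviate implement this ranking (or an approximate version of it) over millions of vectors.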
Chunking strategy often influences answer quality more than the model choice itself. Oversized chunks dilute relevance; extremely small chunks lose context continuity.
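The trade-off is easy to see in a simplified character-level splitter (a bare-bones stand-in for RecursiveCharacterTextSplitter): the overlap repeats the tail of each chunk, so a sentence straddling a boundary survives intact in at least one chunk.

```python
def split_with_overlap(text: str, chunk_size: int, overlap: int) -> list[str]:
    """Fixed-size character chunks; each chunk repeats the previous tail."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]

text = "Run migrations before deployment. Verify env vars in production."
for chunk in split_with_overlap(text, chunk_size=40, overlap=10):
    print(repr(chunk))
```

Tuning chunk_size and overlap against real queries is usually a better use of time than swapping models.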
Where CrewAI Fits Architecturally
CrewAI introduces agents and tasks as orchestration primitives. Instead of a single monolithic RAG chain, engineers define roles such as:
- A retrieval-focused agent
- A synthesis agent
- Optionally, a validation or planning agent
For a simple implementation, only one answer-focused agent may be necessary, while retrieval logic remains external. As workflows expand—adding tool-calling or structured outputs—CrewAI’s separation of responsibilities becomes more valuable.
The practical distinction is this: CrewAI structures who does what in an LLM workflow. It does not replace embeddings, vector stores, or model providers.
Complete Example: Create a Simple RAG Application Using CrewAI
Below is a minimal working example. It demonstrates:
- Document loading
- Chunking
- Embedding creation
- FAISS vector similarity search
- Context injection
- CrewAI agent orchestration
This version uses OpenAI-compatible models for simplicity. The same pattern applies to Azure-hosted or open-weight deployments.
Project Structure
rag-crewai/
│
├── main.py
├── requirements.txt
└── docs/
└── handbook.txt
requirements.txt
crewai
langchain
langchain-openai
langchain-community
langchain-text-splitters
faiss-cpu
tiktoken
python-dotenv
Install dependencies:
pip install -r requirements.txt
Sample Internal Document
Create a file at docs/handbook.txt:
Deployment Checklist:
1. Ensure Docker image is built with correct tag.
2. Run database migrations before deployment.
3. Verify environment variables in production.
4. Confirm health check endpoint returns 200.
5. Monitor logs for 15 minutes after release.
Incident Response:
- Escalate P1 incidents immediately.
- Notify Slack channel #oncall.
- Create Jira ticket within 30 minutes.
main.py (Full Working Code)
from dotenv import load_dotenv
from crewai import Agent, Task, Crew, LLM
from langchain_openai import OpenAIEmbeddings
from langchain_community.vectorstores import FAISS
from langchain_community.document_loaders import TextLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter

# --------------------------------------------------
# Load environment variables
# --------------------------------------------------
load_dotenv()
# .env should contain: OPENAI_API_KEY=your_key_here

# --------------------------------------------------
# 1. Load and Chunk Documents
# --------------------------------------------------
loader = TextLoader("docs/handbook.txt")
documents = loader.load()

text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=500,
    chunk_overlap=50
)
docs = text_splitter.split_documents(documents)

# --------------------------------------------------
# 2. Create Vector Store
# --------------------------------------------------
embeddings = OpenAIEmbeddings()
vectorstore = FAISS.from_documents(docs, embeddings)
retriever = vectorstore.as_retriever(search_kwargs={"k": 3})

# --------------------------------------------------
# 3. Initialize LLM
# --------------------------------------------------
# CrewAI's own LLM wrapper (recent releases); older CrewAI
# versions also accepted LangChain chat models directly.
llm = LLM(
    model="gpt-4o-mini",
    temperature=0
)

# --------------------------------------------------
# 4. Define Agents
# --------------------------------------------------
answer_agent = Agent(
    role="Technical Support Assistant",
    goal="Provide accurate and concise answers using retrieved context",
    backstory="You answer questions strictly based on provided documentation.",
    llm=llm,
    verbose=True
)

# --------------------------------------------------
# 5. Retrieval Function
# --------------------------------------------------
def retrieve_context(query: str) -> str:
    # invoke() replaces the deprecated get_relevant_documents()
    results = retriever.invoke(query)
    return "\n\n".join(doc.page_content for doc in results)

# --------------------------------------------------
# 6. Query Execution
# --------------------------------------------------
user_query = input("Ask a question about internal documentation: ")
context = retrieve_context(user_query)

task = Task(
    description=f"""
You are given internal documentation below:

{context}

Using only this documentation, answer the following question:

Question: {user_query}

If the answer is not found in the documentation, say:
"The documentation does not contain this information."
""",
    expected_output="A concise answer grounded strictly in the provided documentation.",
    agent=answer_agent
)

crew = Crew(
    agents=[answer_agent],
    tasks=[task],
    verbose=True
)

result = crew.kickoff()

print("\nFinal Answer:\n")
print(result)
Example Run
Input:
What should be done before deployment?
Output:
Before deployment, ensure the Docker image is built with the correct tag,
run database migrations, verify environment variables, confirm the health
check endpoint returns 200, and monitor logs for 15 minutes after release.
This demonstrates a functioning simple RAG application using CrewAI. The LLM response is grounded strictly in retrieved document content.
Practical Applications in Real Engineering Environments
A minimal example like this scales surprisingly well for internal use cases.
Internal Documentation Assistants
Teams of 10–100 engineers frequently index:
- Architecture decision records
- Deployment playbooks
- Runbooks
- Security procedures
CrewAI allows retrieval and response synthesis to remain logically separate, making it easier to add guardrails later.
Regulated Environments
Organizations operating in regulated industries may introduce an additional agent that verifies retrieved passages against compliance constraints before response synthesis. This increases token usage but improves traceability.
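Before wiring such a check into a dedicated CrewAI agent, it can be prototyped as a plain function that screens retrieved passages before they reach the synthesis prompt. The blocklist, function name, and terms below are hypothetical, not part of any framework:

```python
# Hypothetical compliance filter: separate retrieved passages that mention
# restricted topics before they are injected into the synthesis prompt.
RESTRICTED_TERMS = {"customer pii", "internal salaries", "security keys"}

def filter_passages(passages: list[str]) -> tuple[list[str], list[str]]:
    """Split passages into (allowed, flagged) based on restricted terms."""
    allowed, flagged = [], []
    for passage in passages:
        lowered = passage.lower()
        if any(term in lowered for term in RESTRICTED_TERMS):
            flagged.append(passage)
        else:
            allowed.append(passage)
    return allowed, flagged

allowed, flagged = filter_passages([
    "Escalate P1 incidents immediately.",
    "Security keys are stored in the vault.",
])
print(len(allowed), len(flagged))  # 1 1
```

Once the rules stabilize, the same logic can move into a validation agent whose task runs between retrieval and synthesis, giving an auditable record of what was withheld.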
Startups Building AI Copilots
Early-stage teams often begin with direct LangChain chains. As workflows expand to include tool calls or multi-step reasoning, CrewAI’s agent abstraction can reduce coupling and clarify responsibility boundaries.
Key Trade-Offs and Framework Comparison
Architectural differences matter more than marketing narratives. Below is a structural comparison.
| Feature | CrewAI | LangChain | AutoGen |
|---|---|---|---|
| Core Abstraction | Role-based agents & tasks | Chains, retrievers, tools | Conversational multi-agents |
| RAG Pattern | Agent-coordinated retrieval & synthesis | Prebuilt retriever chains | Chat-based tool invocation |
| Ecosystem Breadth | Growing | Broad integration ecosystem | Expanding enterprise focus |
| Orchestration Style | Explicit task delegation | Composable pipelines | Agent-to-agent dialogue |
| Complexity Overhead | Moderate | Moderate to high | Moderate |
LangChain provides wide integration support and utilities. CrewAI emphasizes clarity of workflow ownership. AutoGen leans into conversational coordination between agents.
Latency, cost control, and document quality often influence outcomes more than framework choice.
Common Mistakes When Building RAG Systems
Three issues appear repeatedly:
- Overestimating model intelligence. RAG improves grounding but cannot compensate for outdated or poorly structured documents.
- Neglecting chunk strategy. Retrieval quality often hinges on chunk size and overlap.
- Over-orchestrating prematurely. Multi-agent setups add clarity at scale but increase debugging complexity.
The simplest functional system is often the best starting point.
Decision Framework
When a Simple RAG Application Using CrewAI Makes Sense
- Multi-step internal copilots
- Workflows requiring explicit role separation
- Systems expected to evolve into tool-integrated agents
When Alternatives May Be Preferable
- Minimal document Q&A bots
- Ultra-low-latency requirements
- Teams already heavily invested in existing LangChain chains
The decision is architectural rather than ideological.
FAQ: Create a Simple RAG Application Using CrewAI
Q: What is required to create a simple RAG application using CrewAI?
A: A document source, embedding model, vector database, LLM, and CrewAI agents to orchestrate retrieval and generation.
Q: Does CrewAI handle embeddings and vector storage?
A: No. CrewAI coordinates agents and tasks. Embeddings and vector databases are provided by separate libraries or services.
Q: Can this example scale to production?
A: Yes, provided the underlying vector store, model hosting, and monitoring infrastructure are production-grade.
Q: How does CrewAI compare to LangChain for RAG?
A: CrewAI emphasizes role-based agents and task delegation. LangChain centers on chains and tool integrations. Both support RAG workflows.
Q: Is multi-agent RAG more accurate?
A: It can improve structure and validation, but accuracy still depends heavily on document quality and retrieval strategy.
Closing Thoughts
Efforts to create a simple RAG application using CrewAI reflect a broader shift toward structured, context-aware AI systems. Retrieval ensures domain relevance. Agent orchestration clarifies workflow boundaries. The trade-offs revolve around complexity, latency, and maintainability.
The example above demonstrates that a working RAG system can be built in under a hundred lines of code. What determines success in production is less about framework choice and more about data discipline, evaluation strategy, and architectural clarity.