An in-depth look at the Retrieval Augmented Generation (RAG) pipeline and how search functionality is implemented.
AI Docs provides powerful search and AI assistant capabilities, fundamentally transforming how you interact with your project's documentation and codebase. This deep dive explores the sophisticated mechanisms behind these features, focusing on hybrid search for documentation and Retrieval Augmented Generation (RAG) for the AI assistant. These systems ensure you receive accurate, context-rich, and semantically relevant information, whether you're searching for a specific topic or asking a complex question.
AI Docs offers a robust search functionality that combines the strengths of semantic search and traditional keyword-based full-text search. This hybrid approach ensures that your queries yield comprehensive and highly relevant results, understanding both the meaning and the exact terms you're looking for.
When you initiate a search within your documentation site, the following process unfolds:
Parallel Search Execution: Your search query triggers two distinct search mechanisms simultaneously:

Semantic Search: Your query is converted into a vector embedding and compared against the stored embeddings in Qdrant, surfacing chunks whose meaning matches your query even when the exact wording differs.

Full-Text Search: In parallel, a keyword search runs against the generatedDocs table in PostgreSQL. This leverages advanced text indexing to find precise keyword matches within documentation titles and content, providing highly accurate results for specific terms.

Result Merging and Deduplication: The results from both Qdrant and PostgreSQL are then intelligently merged. Semantic search results are prioritized, and unique full-text matches not already found by the vector search are added. This ensures broad coverage of relevant information without redundancy.
Intelligent Re-ranking: The merged results undergo a re-ranking process. A "quality score" is applied to each result, boosting generated documentation for conceptual queries and code chunks (especially those with exports or functions) for implementation-focused queries. Results from very small chunks that might lack sufficient context are penalized, ensuring higher quality snippets rise to the top. The final list is sorted by this combined score, presenting the most relevant items first.
The /api/search endpoint orchestrates this entire process, providing a seamless and highly effective search experience.
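The merge, dedupe, and re-rank steps above can be sketched in TypeScript. This is an illustrative sketch, not the actual AI Docs code: the SearchResult shape, the mergeAndRerank name, the score weights, and the 100-character "small chunk" threshold are all assumptions.

```typescript
// Illustrative shapes only; the real AI Docs types are not shown in this page.
interface SearchResult {
  id: string;
  score: number; // similarity or full-text rank, assumed normalized to 0..1
  sourceType: "generated-doc" | "code";
  content: string;
  hasExports?: boolean;
  hasFunctions?: boolean;
}

// Merge vector and full-text results (vector first), drop duplicates by id,
// then apply a quality multiplier and sort by the combined score.
function mergeAndRerank(
  vectorResults: SearchResult[],
  fullTextResults: SearchResult[],
  isConceptualQuery: boolean,
): SearchResult[] {
  const seen = new Set(vectorResults.map((r) => r.id));
  const merged = [
    ...vectorResults,
    ...fullTextResults.filter((r) => !seen.has(r.id)), // only unique full-text matches
  ];

  const MIN_CHUNK_LENGTH = 100; // assumed threshold for "very small" chunks

  return merged
    .map((r) => {
      let quality = 1.0;
      if (isConceptualQuery && r.sourceType === "generated-doc") quality += 0.2;
      if (!isConceptualQuery && r.sourceType === "code" && (r.hasExports || r.hasFunctions)) {
        quality += 0.2;
      }
      if (r.content.length < MIN_CHUNK_LENGTH) quality -= 0.3; // penalize thin context
      return { ...r, score: r.score * quality };
    })
    .sort((a, b) => b.score - a.score);
}
```

The key design point is that deduplication keeps the vector result when both searches return the same chunk, so the semantic score (not the keyword rank) drives its final position.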
The AI assistant feature, accessible through the chat interface, is powered by Retrieval Augmented Generation (RAG). This advanced technique enhances the AI model's responses by dynamically providing it with relevant, up-to-date context from your project's documentation and codebase, ensuring highly accurate and project-specific answers.
Here's how the RAG flow works when you interact with the AI assistant via the DocsChat component:
User Query: You ask a question to the AI assistant through the chat interface.
Query Embedding: Your question is converted into a vector embedding using OpenAI's embedding models. This transforms your natural language query into a numerical representation that captures its semantic meaning.
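Once the query is a vector, "semantically similar" becomes a geometric notion: embeddings are typically compared by cosine similarity. A minimal sketch; the toy vectors below stand in for real text-embedding-3-small output (which has 1,536 dimensions), and the commented embed() call shows the general Vercel AI SDK pattern, not AI Docs' exact code:

```typescript
// Cosine similarity: 1 for vectors pointing the same way, 0 for orthogonal ones.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Conceptually, the query vector would come from an embedding call such as:
//   const { embedding } = await embed({
//     model: openai.embedding("text-embedding-3-small"),
//     value: userQuestion,
//   });
```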
Vector Search: AI Docs performs a sophisticated vector search in Qdrant using your query embedding. This search, handled by the searchRelevantChunks function, retrieves the most semantically similar chunks of information from your project's stored documentation and code. This includes both the documentation generated by AI Docs and the original source code.
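Restricting that vector search to the current project is done with a payload filter. A hedged sketch of what such a filter object might look like: the payload field names (projectId, sourceType) follow the metadata mentioned later in this page, but the real schema may differ.

```typescript
// Qdrant-style filter: every clause in "must" has to match the point's payload.
interface QdrantFilter {
  must: Array<{ key: string; match: { value: string } }>;
}

function buildProjectFilter(projectId: string, sourceType?: string): QdrantFilter {
  const must: QdrantFilter["must"] = [{ key: "projectId", match: { value: projectId } }];
  if (sourceType) {
    must.push({ key: "sourceType", match: { value: sourceType } });
  }
  return { must };
}

// The filter would accompany the query vector in the search call, e.g.
// client.search(collection, { vector: queryEmbedding, limit: 10, filter }).
```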
Context Assembly: The retrieved chunks are carefully assembled into a structured context. Documentation chunks are prioritized for conceptual understanding and usage questions, while relevant code snippets are included for implementation details. This ensures the AI receives a balanced and comprehensive view of the relevant information.
The system intelligently separates documentation and code context, presenting documentation first as it's often more useful for conceptual understanding.
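A simplified sketch of that assembly step. The chunk shapes and function signature are assumptions (the real assembleContext implementation is not shown here); the output format mirrors the [Doc n] / [Source n] labeling used in the system prompt.

```typescript
interface DocChunk {
  group: string;
  title: string;
  heading?: string;
  content: string;
}

interface CodeChunk {
  filePath: string;
  startLine: number;
  endLine: number;
  language: string;
  content: string;
}

// Documentation first, then code; each chunk is numbered and labeled so the
// model can cite it back in its answer.
function assembleContext(docs: DocChunk[], code: CodeChunk[]): string {
  const parts: string[] = [];
  if (docs.length > 0) {
    parts.push("## Documentation Context");
    docs.forEach((d, i) => {
      const path = [d.group, d.title, d.heading].filter(Boolean).join(" > ");
      parts.push(`[Doc ${i + 1}] ${path}\n${d.content}`);
    });
  }
  if (code.length > 0) {
    parts.push("## Code Context");
    code.forEach((c, i) => {
      parts.push(
        `[Source ${i + 1}] ${c.filePath}:${c.startLine}-${c.endLine}\n` +
          `Language: ${c.language}\n` +
          "```" + c.language + "\n" + c.content + "\n```",
      );
    });
  }
  return parts.join("\n\n");
}
```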
System Prompt Augmentation: This assembled context, along with a concise overview of your project (name, repository, descriptions), is dynamically injected into the AI model's system prompt. This guides the AI to use the specific, relevant information from your project to formulate its answer, rather than relying solely on its general training data.
AI Response Generation: The AI model, specifically gpt-4o-mini from OpenAI, generates a response based on its general knowledge, the augmented system prompt, and the conversation history. The response is streamed back to your chat interface in real-time, often including citations to the specific documentation sections or code files that informed the answer.
The /api/chatrag endpoint orchestrates the entire RAG process, ensuring that the AI assistant provides intelligent, context-aware, and actionable responses. Note that AI chat is a Pro plan feature.
The system prompt provided to the LLM is dynamically constructed to give the AI assistant clear instructions and context. While the exact prompt can vary, it generally follows this structure:
```typescript
// Conceptual example of system prompt construction
const systemPrompt = `You are a knowledgeable AI assistant for the project "${projectName}". You help developers understand this project's codebase, architecture, and usage.
## Project Overview
- **Name**: ${projectName}
- **Repository**: ${repoOwner}/${repoName}
- **URL**: ${repoUrl}
- **Description**: ${projectDescription}
## Retrieved Context
// Documentation context (if available)
## Documentation Context
[Doc 1] Group > Title > Section Heading
Content of documentation chunk 1
[Doc 2] Another Group > Another Title
Content of documentation chunk 2
// Code context (if available)
## Code Context
[Source 1] path/to/file.ts:10-25
Language: typescript
\`\`\`typescript
// Code snippet 1
\`\`\`
[Source 2] another/file.js:50-70
Language: javascript
\`\`\`javascript
// Code snippet 2
\`\`\`
## Instructions
- Use the project overview to answer general questions.
- Use the documentation context for conceptual and usage questions.
- Use the code context for implementation-specific questions.
- Synthesize information from both when relevant.
- Reference specific files, sections, and line numbers.
- If context is insufficient, state uncertainty.
- Format code snippets properly using markdown.
- Be concise but thorough.`;
```
The advanced search and RAG capabilities in AI Docs are built upon a foundation of powerful AI and database technologies:
Qdrant (Vector Database): Qdrant is central to AI Docs' semantic understanding. It stores high-dimensional vector embeddings of all your project's documentation and code chunks, along with rich metadata. This enables lightning-fast semantic similarity searches, forming the backbone of both hybrid search and RAG. For more details, refer to Vector Search with Qdrant.
OpenAI (LLMs and Embeddings):
Embeddings: The text-embedding-3-small model is used to convert text (user queries, code, documentation) into vector embeddings, which are then stored and queried in Qdrant.

Chat Completions: Responses are generated by the gpt-4o-mini model. This model processes the augmented system prompts and generates intelligent, context-aware responses.

PostgreSQL (Relational Database & Full-Text Search): PostgreSQL serves as the primary relational database, storing project metadata, generated documentation (generatedDocs table), and other application data. It also powers the traditional full-text search component of the hybrid search, using ts_rank and to_tsvector functions for efficient keyword matching. For more details, refer to Database Architecture with Drizzle ORM.
AI SDK (Orchestration): AI Docs uses the Vercel AI SDK (@ai-sdk/openai, ai) to orchestrate interactions with LLMs. This SDK simplifies tasks such as managing chat history, converting messages, constructing prompts, and streaming responses, providing a robust framework for building AI-powered features.
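The full-text half of the hybrid search described above boils down to a PostgreSQL query of roughly this shape. This is a hedged sketch: the actual table and column names (and the Drizzle query AI Docs builds) may differ, but the to_tsvector / ts_rank mechanics are standard PostgreSQL.

```typescript
// Returns a parameterized SQL string: $1 = user query, $2 = result limit.
// plainto_tsquery parses the raw query text; @@ tests for a match; ts_rank
// orders matches by relevance. Table and column names here are assumptions.
function buildFullTextSearchSql(): string {
  return `
    SELECT id, title, content,
           ts_rank(to_tsvector('english', title || ' ' || content),
                   plainto_tsquery('english', $1)) AS rank
    FROM generated_docs
    WHERE to_tsvector('english', title || ' ' || content)
          @@ plainto_tsquery('english', $1)
    ORDER BY rank DESC
    LIMIT $2;
  `;
}
```

In production, the to_tsvector expression would usually be backed by a GIN index (or a stored tsvector column) so the match does not rescan every row.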
AI Docs incorporates several optimizations to ensure the search and RAG systems are performant, accurate, and deliver high-quality results:
Quality-Based Re-ranking: Hybrid search boosts generated-doc chunks for conceptual queries and code chunks with features like hasExports or hasFunctions for implementation-focused queries. It also penalizes very small chunks that might lack sufficient context.

Metadata Filtering: Vector searches in Qdrant are filtered by projectId, sourceType, language, and other metadata, ensuring only relevant information is considered.

Rate Limiting: The search API (/api/search) and the AI chat API (/api/chatrag) are protected by rate limits (publicApiLimiter, aiApiLimiter) to ensure fair usage and prevent abuse.

Structured Context Assembly: The assembleContext function is designed to prioritize and structure retrieved information logically for the LLM, placing documentation before code for better conceptual understanding.

By understanding these advanced topics, you can fully appreciate the depth and sophistication of AI Docs' search and AI assistant capabilities, and leverage them to their fullest potential.