Learn how Qdrant is used for semantic code search and analysis, and for providing relevant context to the AI model.
AI Docs leverages advanced vector search capabilities, powered by Qdrant, to transform your GitHub repositories into an intelligent, queryable knowledge base. This system is fundamental to how the AI assistant answers questions and how the search functionality provides highly relevant results, enabling semantic understanding beyond simple keyword matching.
Vector search is a technique that allows you to find items based on their semantic similarity rather than exact keyword matches. This approach enables AI Docs to understand the intent behind your questions or search queries, even if the exact words aren't present in the stored content.
At its core, vector search involves:
- Embeddings: Each piece of text (a code chunk or a section of generated documentation) is converted into a numerical vector using OpenAI's text-embedding-3-small model, which generates 1536-dimensional vectors.
- Similarity: Semantically similar pieces of text will have vectors that are numerically "close" to each other in this high-dimensional space.

This process allows AI Docs to go beyond simple keyword matching, providing more intelligent and contextually relevant answers.
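As a minimal sketch of what "close" means here, cosine similarity between two embedding vectors can be computed like this. (In AI Docs the comparison happens inside Qdrant, not in application code; this just illustrates the metric.)

```typescript
// Cosine similarity: dot product of the vectors divided by the product
// of their magnitudes. Parallel vectors score ~1, orthogonal vectors 0.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("dimension mismatch");
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Vectors pointing the same way score near 1; orthogonal vectors score 0.
const similar = cosineSimilarity([0.1, 0.9, 0.3], [0.1, 0.9, 0.3]);
const unrelated = cosineSimilarity([1, 0, 0], [0, 1, 0]);
```

Real embeddings from text-embedding-3-small have 1536 dimensions, but the computation is identical.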
Qdrant serves as the high-performance vector database for AI Docs. It is chosen for its speed, scalability, and robust feature set, which includes advanced indexing and filtering capabilities crucial for handling diverse documentation and codebase content.
When you set up a project, AI Docs initializes a dedicated Qdrant collection named "docup". This collection is configured for optimal performance:
- Vector size: 1536, to match the output of OpenAI's text-embedding-3-small model.
- Distance metric: Cosine distance is used, which is ideal for measuring semantic similarity between embeddings.
- Quantization: Scalar quantization (int8) is enabled. This compresses the vectors, reducing memory usage and speeding up search queries with minimal impact on accuracy.
- Payload indexes: Indexes are created on projectId (to isolate data for specific projects), fileType, language, isDocumentation, and sourceType (to distinguish between generated documentation content and raw source code).

When AI Docs processes your GitHub repository or generates documentation, it breaks the content (code files, generated markdown) into smaller, manageable "chunks." Each chunk is then embedded and stored in Qdrant.
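As a hedged sketch, a collection configuration along these lines could be expressed as follows. The field names follow Qdrant's REST API; with the @qdrant/js-client-rest client, such an object would be passed to `client.createCollection`, but no client call is made here.

```typescript
// Sketch of a collection configuration matching the settings described
// above (field names follow Qdrant's REST API). With @qdrant/js-client-rest
// this object would be passed to client.createCollection("docup", ...).
const collectionConfig = {
  vectors: {
    size: 1536,          // matches text-embedding-3-small output
    distance: "Cosine",  // semantic similarity metric
  },
  quantization_config: {
    scalar: { type: "int8" }, // compress vectors to cut memory usage
  },
};

// Payload fields that receive dedicated indexes for fast filtering.
const indexedPayloadFields = [
  "projectId", "fileType", "language", "isDocumentation", "sourceType",
];
```

Each indexed field would additionally be registered with `client.createPayloadIndex` so that filters on it stay fast as the collection grows.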
The payload associated with each vector is rich with metadata, enabling powerful filtering and re-ranking. This detailed metadata allows AI Docs to intelligently retrieve and prioritize information based on the context of the query. For example, a query about "how to implement X" might prioritize code chunks with functions, while a query about "what is Y" might prioritize generated documentation chunks.
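For illustration, the payload stored alongside one code chunk's vector might look like this. The field names are those mentioned in this document; the concrete values are made up.

```typescript
// Illustrative payload for a single indexed code chunk. Field names
// (projectId, sourceType, isDocumentation, hasExports, hasFunctions,
// hasClasses, hasComments, etc.) come from the text above; the values
// are hypothetical.
const codeChunkPayload = {
  projectId: "proj_123",
  sourceType: "code",        // vs. "generated-doc" for documentation chunks
  isDocumentation: false,
  fileType: "ts",
  language: "typescript",
  path: "src/search/qdrant.ts",
  startLine: 10,
  endLine: 48,
  hasExports: true,
  hasFunctions: true,
  hasClasses: false,
  hasComments: true,
};
```

A query classified as implementation-focused could then boost hits whose payload has `hasFunctions: true`, while a conceptual query could boost hits with `sourceType: "generated-doc"`.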
For more details on how the database schema is managed, refer to Database Architecture with Drizzle ORM.
The core of AI Docs' AI capabilities, particularly for answering questions about your codebase, relies on a Retrieval Augmented Generation (RAG) pipeline. Vector search with Qdrant is a critical component of this pipeline, responsible for retrieving the most relevant context. You can learn more about the overall RAG pipeline in Deep Dive into Search & RAG.
When you ask a question to the AI assistant, the following steps occur:
1. Query embedding: Your question is converted into an embedding using OpenAI's text-embedding-3-small model.
2. Semantic retrieval: The searchRelevantChunks function performs a semantic search in Qdrant using your query's embedding. It searches within the "docup" collection, filtering results by your specific projectId to ensure data isolation.
3. Result processing: The raw hits are passed to the processResults function, which applies several intelligent optimizations:
   - generated-doc chunks (documentation) receive a boost for conceptual queries.
   - Code chunks with hasExports, hasFunctions, or hasClasses receive a boost for implementation-focused queries.
   - Chunks with hasComments are also boosted, as they often provide better context.
4. Selection: The top K most relevant chunks are selected.
5. Context assembly: Documentation chunks are presented with their docGroup, docTitle, and sectionHeading. Code chunks are presented with their path, startLine, endLine, and language, enclosed in code blocks.
6. Answer generation: The AI model (gpt-4o-mini) processes the assembled prompt and your original query to generate a comprehensive answer, often citing specific sources from the retrieved chunks.

AI Docs also provides specialized functions built on top of searchRelevantChunks to optimize for specific use cases:
- searchCodeExamples(projectId, query, topK): Prioritizes code files by boosting chunks with hasExports, hasFunctions, or hasClasses. This is ideal when you're looking for implementation details or code snippets.
- searchDocumentation(projectId, query, topK): Prioritizes generated documentation files, making it suitable for conceptual questions or understanding how to use features.

These functions allow AI Docs to fine-tune its retrieval strategy based on the likely intent of your query.
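A hedged sketch of how such specialized functions could layer boosting on top of a generic search: the `Chunk` shape, the boost factor, and the ranking helpers here are assumptions for illustration, not the actual implementation of searchCodeExamples or searchDocumentation.

```typescript
// Hypothetical result shape, for illustration only.
interface Chunk {
  score: number;
  payload: {
    isDocumentation: boolean;
    hasExports?: boolean;
    hasFunctions?: boolean;
    hasClasses?: boolean;
  };
}

// Sketch of a searchCodeExamples-style re-rank: boost chunks that look
// like implementations (exports, functions, or classes), then re-sort.
function rankCodeExamples(hits: Chunk[], topK: number): Chunk[] {
  return hits
    .map((h) => {
      const p = h.payload;
      const boost = p.hasExports || p.hasFunctions || p.hasClasses ? 1.2 : 1.0;
      return { ...h, score: h.score * boost };
    })
    .sort((a, b) => b.score - a.score)
    .slice(0, topK);
}

// Sketch of a searchDocumentation-style re-rank: boost generated docs.
function rankDocumentation(hits: Chunk[], topK: number): Chunk[] {
  return hits
    .map((h) => ({
      ...h,
      score: h.score * (h.payload.isDocumentation ? 1.2 : 1.0),
    }))
    .sort((a, b) => b.score - a.score)
    .slice(0, topK);
}
```

The design point is that both wrappers can share one underlying Qdrant query and differ only in how they weight the payload metadata afterward.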
Beyond the RAG pipeline, AI Docs provides powerful search functionality that combines the strengths of vector search (semantic understanding) and traditional full-text search (keyword matching). This hybrid approach ensures comprehensive and highly relevant search results when you use the search dialog.
When you use the search dialog, the following happens:
1. Vector search: The searchRelevantChunks function is called to find semantically similar documentation and code chunks. This excels at finding related concepts even if exact keywords aren't present.
2. Full-text search: In parallel, a full-text search is performed against the generatedDocs table in PostgreSQL. This uses a ts_rank function to find keyword matches within documentation titles and content, providing precise results for specific terms.
3. Re-ranking: The processResults function applies a "quality score" to Qdrant hits, boosting generated documentation for conceptual queries and code chunks with features like exports or functions for implementation-focused queries. It also penalizes very small chunks that might lack sufficient context.

This hybrid strategy provides a robust search experience, balancing semantic understanding with keyword precision.
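The merge step of such a hybrid search can be sketched as follows. The `Hit` shape, the small-chunk threshold, and the penalty factor are assumptions for illustration, not the actual merge logic.

```typescript
// Hypothetical result shape shared by both search backends.
interface Hit {
  id: string;
  score: number;
  content: string;
}

// Sketch of a hybrid merge: combine vector and full-text hits, penalize
// very small chunks, dedupe by id (keeping the higher score), and sort.
function mergeHybridResults(
  vectorHits: Hit[],
  fullTextHits: Hit[],
  minChunkLength = 40, // assumed threshold for "very small" chunks
): Hit[] {
  const byId = new Map<string, Hit>();
  for (const hit of [...vectorHits, ...fullTextHits]) {
    const penalty = hit.content.length < minChunkLength ? 0.5 : 1.0;
    const scored = { ...hit, score: hit.score * penalty };
    const existing = byId.get(hit.id);
    if (!existing || scored.score > existing.score) byId.set(hit.id, scored);
  }
  return [...byId.values()].sort((a, b) => b.score - a.score);
}
```

Deduplicating by id matters because the same documentation section can surface in both backends; keeping only the higher-scoring copy avoids showing duplicate results in the search dialog.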
AI Docs provides robust mechanisms for managing the embeddings stored in Qdrant. These mechanisms keep your knowledge base accurate and ensure that the collection remains clean, up-to-date, and reflective of the current state of your project's documentation and codebase.
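As one example of this kind of maintenance (a hypothetical helper, not a documented AI Docs function), stale points for a project can be removed with a filtered delete. The filter shape follows Qdrant's REST API.

```typescript
// Builds a Qdrant filter matching every point belonging to one project.
// With @qdrant/js-client-rest this would be passed to
// client.delete("docup", { filter, wait: true }) before re-indexing,
// so old chunks never linger alongside freshly embedded ones.
function projectDeleteFilter(projectId: string) {
  return {
    must: [{ key: "projectId", match: { value: projectId } }],
  };
}
```

Because projectId has a payload index, such a filtered delete stays fast even when the shared "docup" collection holds many projects.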