Learn how Qdrant is used for semantic code search and analysis, and for providing relevant context to the AI model.
AI Docs leverages advanced vector search capabilities, powered by Qdrant, to transform your GitHub repositories into an intelligent, queryable knowledge base. This system is fundamental to how the AI assistant answers questions and how the search functionality provides highly relevant results, enabling semantic understanding beyond simple keyword matching.
Vector search is a technique that allows you to find items based on their semantic similarity rather than exact keyword matches. This approach enables AI Docs to understand the intent behind your questions or search queries, even if the exact words aren't present in the stored content.
At its core, vector search involves:
- Embeddings: Each piece of text (a code chunk or a section of generated documentation) is converted into a numerical vector using OpenAI's text-embedding-3-small model, which generates 1536-dimensional vectors.
- Similarity: Semantically similar pieces of text will have vectors that are numerically "close" to each other in this high-dimensional space.

This process allows AI Docs to go beyond simple keyword matching, providing more intelligent and contextually relevant answers.
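As a minimal sketch of what "close" means here, cosine similarity between two embedding vectors can be computed like this. (In AI Docs the comparison happens inside Qdrant, not in application code; this just illustrates the metric.)

```typescript
// Cosine similarity: dot product of the vectors divided by the product
// of their magnitudes. Parallel vectors score ~1, orthogonal vectors 0.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("dimension mismatch");
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Vectors pointing the same way score near 1; orthogonal vectors score 0.
const similar = cosineSimilarity([0.1, 0.9, 0.3], [0.1, 0.9, 0.3]);
const unrelated = cosineSimilarity([1, 0, 0], [0, 1, 0]);
```

Real embeddings from text-embedding-3-small have 1536 dimensions, but the computation is identical.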
Qdrant serves as the high-performance vector database for AI Docs. It is chosen for its speed, scalability, and robust feature set, which includes advanced indexing and filtering capabilities crucial for handling diverse documentation and codebase content.
When you set up a project, AI Docs initializes a dedicated Qdrant collection named "docup". This collection is configured for optimal performance:
- Vector size: 1536, to match the output of OpenAI's text-embedding-3-small model.
- Distance metric: Cosine distance is used, which is ideal for measuring semantic similarity between embeddings.
- Quantization: Scalar quantization (int8) is enabled. This compresses the vectors, reducing memory usage and speeding up search queries with minimal impact on accuracy.
- Payload indexes: Indexes are created on projectId (to isolate data for specific projects), fileType, language, isDocumentation, and sourceType (to distinguish between generated documentation content and raw source code).

When AI Docs processes your GitHub repository or generates documentation, it breaks the content (code files, generated markdown) into smaller, manageable "chunks." Each chunk is then embedded and stored in Qdrant.
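As a hedged sketch, a collection configuration along these lines could be expressed as follows. The field names follow Qdrant's REST API; with the @qdrant/js-client-rest client, such an object would be passed to `client.createCollection`, but no client call is made here.

```typescript
// Sketch of a collection configuration matching the settings described
// above (field names follow Qdrant's REST API). With @qdrant/js-client-rest
// this object would be passed to client.createCollection("docup", ...).
const collectionConfig = {
  vectors: {
    size: 1536,          // matches text-embedding-3-small output
    distance: "Cosine",  // semantic similarity metric
  },
  quantization_config: {
    scalar: { type: "int8" }, // compress vectors to cut memory usage
  },
};

// Payload fields that receive dedicated indexes for fast filtering.
const indexedPayloadFields = [
  "projectId", "fileType", "language", "isDocumentation", "sourceType",
];
```

Each indexed field would additionally be registered with `client.createPayloadIndex` so that filters on it stay fast as the collection grows.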
The payload associated with each vector is rich with metadata, enabling powerful filtering and re-ranking. This detailed metadata allows AI Docs to intelligently retrieve and prioritize information based on the context of the query. For example, a query about "how to implement X" might prioritize code chunks with functions, while a query about "what is Y" might prioritize generated documentation chunks.
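For illustration, the payload stored alongside one code chunk's vector might look like this. The field names are those mentioned in this document; the concrete values are made up.

```typescript
// Illustrative payload for a single indexed code chunk. Field names
// (projectId, sourceType, isDocumentation, hasExports, hasFunctions,
// hasClasses, hasComments, etc.) come from the text above; the values
// are hypothetical.
const codeChunkPayload = {
  projectId: "proj_123",
  sourceType: "code",        // vs. "generated-doc" for documentation chunks
  isDocumentation: false,
  fileType: "ts",
  language: "typescript",
  path: "src/search/qdrant.ts",
  startLine: 10,
  endLine: 48,
  hasExports: true,
  hasFunctions: true,
  hasClasses: false,
  hasComments: true,
};
```

A query classified as implementation-focused could then boost hits whose payload has `hasFunctions: true`, while a conceptual query could boost hits with `sourceType: "generated-doc"`.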
For more details on how the database schema is managed, refer to Database Architecture with Drizzle ORM.
The core of AI Docs' AI capabilities, particularly for answering questions about your codebase, relies on a Retrieval Augmented Generation (RAG) pipeline. Vector search with Qdrant is a critical component of this pipeline, responsible for retrieving the most relevant context. You can learn more about the overall RAG pipeline in Deep Dive into Search & RAG.
When you ask a question to the AI assistant, the following steps occur:
1. Query embedding: Your question is converted into an embedding using OpenAI's text-embedding-3-small model.
2. Semantic retrieval: The searchRelevantChunks function performs a semantic search in Qdrant using your query's embedding. It searches within the "docup" collection, filtering results by your specific projectId to ensure data isolation.
3. Result processing: The raw hits are passed to the processResults function, which applies several intelligent optimizations:
   - generated-doc chunks (documentation) receive a boost for conceptual queries.
   - Code chunks with hasExports, hasFunctions, or hasClasses receive a boost for implementation-focused queries.
   - Chunks with hasComments are also boosted, as they often provide better context.
4. Selection: The top K most relevant chunks are selected.
5. Context assembly: Documentation chunks are presented with their docGroup, docTitle, and sectionHeading. Code chunks are presented with their path, startLine, endLine, and language, enclosed in code blocks.
6. Answer generation: The AI model (gpt-4o-mini) processes the assembled prompt and your original query to generate a comprehensive answer, often citing specific sources from the retrieved chunks.

AI Docs also provides specialized functions built on top of searchRelevantChunks to optimize for specific use cases:
- searchCodeExamples(projectId, query, topK): Prioritizes code files by boosting chunks with hasExports, hasFunctions, or hasClasses. This is ideal when you're looking for implementation details or code snippets.
- searchDocumentation(projectId, query, topK): Prioritizes generated documentation files, making it suitable for conceptual questions or understanding how to use features.

These functions allow AI Docs to fine-tune its retrieval strategy based on the likely intent of your query.
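A hedged sketch of how such specialized functions could layer boosting on top of a generic search: the `Chunk` shape, the boost factor, and the ranking helpers here are assumptions for illustration, not the actual implementation of searchCodeExamples or searchDocumentation.

```typescript
// Hypothetical result shape, for illustration only.
interface Chunk {
  score: number;
  payload: {
    isDocumentation: boolean;
    hasExports?: boolean;
    hasFunctions?: boolean;
    hasClasses?: boolean;
  };
}

// Sketch of a searchCodeExamples-style re-rank: boost chunks that look
// like implementations (exports, functions, or classes), then re-sort.
function rankCodeExamples(hits: Chunk[], topK: number): Chunk[] {
  return hits
    .map((h) => {
      const p = h.payload;
      const boost = p.hasExports || p.hasFunctions || p.hasClasses ? 1.2 : 1.0;
      return { ...h, score: h.score * boost };
    })
    .sort((a, b) => b.score - a.score)
    .slice(0, topK);
}

// Sketch of a searchDocumentation-style re-rank: boost generated docs.
function rankDocumentation(hits: Chunk[], topK: number): Chunk[] {
  return hits
    .map((h) => ({
      ...h,
      score: h.score * (h.payload.isDocumentation ? 1.2 : 1.0),
    }))
    .sort((a, b) => b.score - a.score)
    .slice(0, topK);
}
```

The design point is that both wrappers can share one underlying Qdrant query and differ only in how they weight the payload metadata afterward.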
Beyond the RAG pipeline, AI Docs provides powerful search functionality that combines the strengths of vector search (semantic understanding) and traditional full-text search (keyword matching). This hybrid approach ensures comprehensive and highly relevant search results when you use the search dialog.
When you use the search dialog, the following happens:
1. Vector search: The searchRelevantChunks function is called to find semantically similar documentation and code chunks. This excels at finding related concepts even if exact keywords aren't present.
2. Full-text search: In parallel, a full-text search is performed against the generatedDocs table in PostgreSQL. This uses a ts_rank function to find keyword matches within documentation titles and content, providing precise results for specific terms.
3. Re-ranking: The processResults function applies a "quality score" to Qdrant hits, boosting generated documentation for conceptual queries and code chunks with features like exports or functions for implementation-focused queries. It also penalizes very small chunks that might lack sufficient context.

This hybrid strategy provides a robust search experience, balancing semantic understanding with keyword precision.
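The merge step of such a hybrid search can be sketched as follows. The `Hit` shape, the small-chunk threshold, and the penalty factor are assumptions for illustration, not the actual merge logic.

```typescript
// Hypothetical result shape shared by both search backends.
interface Hit {
  id: string;
  score: number;
  content: string;
}

// Sketch of a hybrid merge: combine vector and full-text hits, penalize
// very small chunks, dedupe by id (keeping the higher score), and sort.
function mergeHybridResults(
  vectorHits: Hit[],
  fullTextHits: Hit[],
  minChunkLength = 40, // assumed threshold for "very small" chunks
): Hit[] {
  const byId = new Map<string, Hit>();
  for (const hit of [...vectorHits, ...fullTextHits]) {
    const penalty = hit.content.length < minChunkLength ? 0.5 : 1.0;
    const scored = { ...hit, score: hit.score * penalty };
    const existing = byId.get(hit.id);
    if (!existing || scored.score > existing.score) byId.set(hit.id, scored);
  }
  return [...byId.values()].sort((a, b) => b.score - a.score);
}
```

Deduplicating by id matters because the same documentation section can surface in both backends; keeping only the higher-scoring copy avoids showing duplicate results in the search dialog.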
AI Docs provides robust mechanisms for managing the embeddings stored in Qdrant. These mechanisms keep your knowledge base accurate and ensure that the collection remains clean, up-to-date, and reflective of the current state of your project's documentation and codebase.
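As one example of this kind of maintenance (a hypothetical helper, not a documented AI Docs function), stale points for a project can be removed with a filtered delete. The filter shape follows Qdrant's REST API.

```typescript
// Builds a Qdrant filter matching every point belonging to one project.
// With @qdrant/js-client-rest this would be passed to
// client.delete("docup", { filter, wait: true }) before re-indexing,
// so old chunks never linger alongside freshly embedded ones.
function projectDeleteFilter(projectId: string) {
  return {
    must: [{ key: "projectId", match: { value: projectId } }],
  };
}
```

Because projectId has a payload index, such a filtered delete stays fast even when the shared "docup" collection holds many projects.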