Semantic search revolutionizes information retrieval by understanding the meaning and context of queries rather than relying solely on keyword matching.
This approach uses vector embeddings to represent both documents and queries in a high-dimensional space,
enabling similarity-based retrieval through mathematical operations.
Vector embeddings transform text into numerical representations that capture semantic meaning.
Documents with similar meanings cluster together in the vector space, regardless of exact word matches.
Similarity Metrics
- Cosine Similarity: Measures the angle between two vectors (most common for text).
- Euclidean Distance (L2): Measures the straight-line distance between two vectors.
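Both metrics reduce to a few lines of NumPy. A minimal sketch (the function names are illustrative, not from any particular library):

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine of the angle between two vectors; 1.0 means same direction."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def euclidean_distance(a: np.ndarray, b: np.ndarray) -> float:
    """Straight-line (L2) distance; 0.0 means identical vectors."""
    return float(np.linalg.norm(a - b))

a = np.array([1.0, 2.0, 3.0])
b = np.array([2.0, 4.0, 6.0])  # same direction, twice the magnitude
print(cosine_similarity(a, b))   # → 1.0 (the angle between them is zero)
print(euclidean_distance(a, b))  # nonzero despite the identical direction
```

Note the difference in behavior: cosine similarity ignores vector magnitude, which is one reason it is preferred for text embeddings.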
Search Algorithms
- k-Nearest Neighbors (kNN): Exact search; suitable for smaller datasets.
- Approximate Nearest Neighbors (ANN): Faster search with a slight accuracy trade-off; ideal for large-scale applications (millions of embeddings or more).
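Exact kNN is simply a full scan: score every document against the query and keep the top k. A minimal sketch using cosine similarity (ANN libraries such as FAISS or HNSW replace this scan with an index):

```python
import numpy as np

def knn(query: np.ndarray, docs: np.ndarray, k: int = 2) -> np.ndarray:
    """Exact kNN: indices of the k rows of `docs` most similar to `query`."""
    # Unit-normalize so a dot product equals cosine similarity.
    q = query / np.linalg.norm(query)
    d = docs / np.linalg.norm(docs, axis=1, keepdims=True)
    sims = d @ q                   # one similarity score per document
    return np.argsort(-sims)[:k]   # indices in descending-similarity order

doc_vectors = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]])
print(knn(np.array([1.0, 0.05]), doc_vectors))  # → [0 1]
```

This scan is O(n) per query, which is exactly why ANN indexes become necessary at large scale.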
Semantic search workflow:
- Indexing Phase: split documents into chunks, embed each chunk, and store the embeddings in a vector database.
- Search Phase: embed the query with the same model, then retrieve the nearest document embeddings.
When indexing or storing a document in a vector database, one approach is to generate a single embedding vector for the entire document.
However, this can result in loss of contextual information, leading to less accurate search results.
To address this, the document can be split into smaller chunks, and an embedding is generated for each chunk to preserve finer-grained semantic details.
Text Chunking Strategies
- Fixed-Size Chunking
  - Character-based: Split at fixed character counts (e.g., 512, 1024 characters).
  - Token-based: Split at fixed token counts (respects model limits).
- Semantic Chunking
  - Sentence-based: Maintain sentence boundaries.
  - Paragraph-based: Preserve logical document structure.
- Overlapping Windows
  - Maintain context continuity between chunks.
  - Typical overlap: 10-20% of chunk size.
  - Helps capture cross-boundary information.
- Advanced Techniques
  - LLM-guided chunking: Use language models to identify optimal split points.
  - Metadata enrichment: Add summaries, keywords, or extracted entities.
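The simplest of these strategies, character-based fixed-size chunking with overlapping windows, can be sketched as follows (the 20% overlap matches the typical range above; the function name and defaults are illustrative):

```python
def chunk_text(text: str, chunk_size: int = 200, overlap: int = 40) -> list[str]:
    """Character-based fixed-size chunking with overlapping windows.

    Each chunk starts `chunk_size - overlap` characters after the previous
    one, so consecutive chunks share `overlap` characters of context.
    """
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]

chunks = chunk_text("a" * 500, chunk_size=200, overlap=40)
print(len(chunks))      # → 4 (starts at 0, 160, 320, 480)
print(len(chunks[0]))   # → 200
```

Production chunkers (e.g., token-based or sentence-aware splitters) refine this by respecting token limits and natural boundaries, but the sliding-window idea is the same.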
We’ll use a few examples to demonstrate how semantic search works.
The first step is to embed all documents with an embedding model and store the resulting vectors in a vector database.
Queries must then be embedded using the same embedding model that was used to generate the document embeddings.
The goal is to retrieve the most relevant documents: those whose embeddings are the nearest neighbors of the query embedding in the vector space, as measured by cosine similarity.
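The end-to-end workflow can be sketched in a few lines. The `embed` function below is a toy stand-in (a bag-of-words vector over a tiny fixed vocabulary); a real system would call an embedding model such as a sentence-transformer, but the indexing and search steps are the same:

```python
import numpy as np

# Toy stand-in for a real embedding model: a bag-of-words vector over a
# fixed vocabulary. Real embeddings capture semantics beyond word overlap.
VOCAB = ["cat", "dog", "pet", "stock", "market", "price"]

def embed(text: str) -> np.ndarray:
    words = text.lower().split()
    vec = np.array([float(words.count(w)) for w in VOCAB])
    norm = np.linalg.norm(vec)
    return vec / norm if norm else vec  # unit-normalize

# Indexing phase: embed every document once and keep the matrix.
documents = ["the cat is a pet", "dog pet care", "stock market price today"]
index = np.stack([embed(d) for d in documents])

# Search phase: embed the query with the SAME model, rank by cosine
# similarity (dot product of unit vectors).
query_vec = embed("pet cat")
scores = index @ query_vec
best = int(np.argmax(scores))
print(documents[best])  # → "the cat is a pet"
```

Swapping the toy `embed` for a real model and the in-memory matrix for a vector database yields a production-shaped pipeline.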
Search results can be further enhanced using Large Language Models (LLMs) through techniques such as re-ranking and Retrieval-Augmented Generation (RAG).