Transform Search Accuracy! The Frontline of Index Strategy - Dense/Sparse/Hybrid Comparison
Solve search omissions and noise through strategic use of FAISS, Milvus, Pinecone, and Weaviate. Learn combination strategies for BM25, SPLADE, and DPR.
In RAG systems, the index strategy largely determines search accuracy. This article compares traditional keyword search (Sparse), semantic search (Dense), and hybrid search that combines the two, with practical implementation notes for each.
Why Do Search Accuracy Problems Occur?
Many RAG systems experience the following issues:
1. Limitations of Exact Matching: Keyword search cannot handle expression variations
2. Ambiguity in Semantic Search: Vector search tends to miss specific proper nouns
3. Lack of Context: Single search methods cannot consider context
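The first issue can be made concrete with a toy example. The sketch below is only an illustration: `keyword_overlap` is a hypothetical helper that counts shared tokens, and the two sample documents are invented for the demonstration.

```python
def keyword_overlap(query, document):
    """Count the distinct tokens shared by the query and the document."""
    return len(set(query.lower().split()) & set(document.lower().split()))

documents = [
    "How to repair an automobile engine",  # relevant, but uses different words
    "Car rental prices in Tokyo",          # irrelevant, but shares the word "car"
]
query = "fix car motor"

overlaps = [keyword_overlap(query, doc) for doc in documents]
```

Here the relevant document shares no tokens with the query, while the irrelevant one shares one, so pure exact matching would rank them in the wrong order.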
Types and Characteristics of Index Strategies
1. Sparse Search (BM25)
Features: Keyword matching, statistical weighting
Strengths: Proper nouns, technical terms, exact string matching
Weaknesses: Synonyms, expression variations, semantic similarity
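from rank_bm25 import BM25Okapi
import jieba

# BM25 index over pre-tokenized documents.
# Note: jieba is a Chinese word segmenter. For Japanese text, swap in a
# Japanese morphological analyzer (for example MeCab or Janome).
def create_bm25_index(documents):
    # BM25Okapi expects each document as a list of tokens
    tokenized_docs = [list(jieba.cut(doc)) for doc in documents]
    return BM25Okapi(tokenized_docs)

# Build the index once; `documents` is assumed to be a list of strings
bm25 = create_bm25_index(documents)

# Execute search: tokenize the query the same way as the documents
query = "machine learning accuracy improvement"
tokenized_query = list(jieba.cut(query))
scores = bm25.get_scores(tokenized_query)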
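To show what "statistical weighting" means in practice, here is a minimal pure-Python sketch of the Okapi BM25 formula. It is not the `rank_bm25` implementation: the function name `bm25_scores`, the parameter defaults `k1=1.5` and `b=0.75`, and the smoothed IDF variant with `+ 1` inside the logarithm are assumptions made for this illustration, and library implementations may differ in those details.

```python
import math
from collections import Counter

def bm25_scores(query_tokens, tokenized_docs, k1=1.5, b=0.75):
    """Score every document against the query with the Okapi BM25 formula."""
    n_docs = len(tokenized_docs)
    avg_len = sum(len(doc) for doc in tokenized_docs) / n_docs
    # Document frequency: number of documents that contain each term
    doc_freq = Counter(term for doc in tokenized_docs for term in set(doc))

    scores = []
    for doc in tokenized_docs:
        term_freq = Counter(doc)
        score = 0.0
        for term in query_tokens:
            if term not in term_freq:
                continue
            # Rare terms receive a higher inverse document frequency
            idf = math.log((n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5) + 1)
            tf = term_freq[term]
            # Length normalization: b controls how strongly long documents are penalized
            norm = tf + k1 * (1 - b + b * len(doc) / avg_len)
            score += idf * tf * (k1 + 1) / norm
        scores.append(score)
    return scores

docs = [
    "faiss builds vector indexes".split(),
    "bm25 ranks documents by keyword statistics".split(),
    "hybrid search combines bm25 and vector search".split(),
]
scores = bm25_scores("bm25 keyword".split(), docs)
```

The second document matches both query terms and ranks first, the third matches only the more common term, and the first matches nothing and scores zero.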
2. Dense Search (Vector Search)
Features: Semantic similarity, contextual understanding
Strengths: Synonym search, semantic similarity, multilingual support
Weaknesses: Proper nouns, numerical values, exact string matching
import faiss
from sentence_transformers import SentenceTransformer

# Dense search implementation (substitute any embedding model that
# supports your language if this one is unavailable)
model = SentenceTransformer('BAAI/bge-large-ja')
# Normalize so that inner product equals cosine similarity
embeddings = model.encode(documents, normalize_embeddings=True)

# Build FAISS index
dimension = embeddings.shape[1]
index = faiss.IndexFlatIP(dimension)  # Inner Product
index.add(embeddings.astype('float32'))

# Execute search
query_embedding = model.encode([query], normalize_embeddings=True)
k = 5
scores, indices = index.search(query_embedding.astype('float32'), k)
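An inner-product index ranks by cosine similarity only when the vectors are L2-normalized. The following sketch, using plain Python lists and made-up three-dimensional vectors, illustrates why: without normalization, a vector with a large magnitude can outrank one that points in exactly the same direction as the query.

```python
import math

def inner_product(a, b):
    return sum(x * y for x, y in zip(a, b))

def l2_normalize(vec):
    norm = math.sqrt(inner_product(vec, vec))
    return [x / norm for x in vec]

def cosine_similarity(a, b):
    return inner_product(a, b) / (
        math.sqrt(inner_product(a, a)) * math.sqrt(inner_product(b, b))
    )

query_vec = [1.0, 2.0, 2.0]
same_direction = [1.0, 2.0, 2.0]   # identical direction to the query
large_vector = [3.0, 0.0, 4.0]     # different direction, larger magnitude

# Raw inner product favors the larger vector
raw_same = inner_product(query_vec, same_direction)
raw_large = inner_product(query_vec, large_vector)

# After normalization, inner product matches cosine similarity
norm_same = inner_product(l2_normalize(query_vec), l2_normalize(same_direction))
norm_large = inner_product(l2_normalize(query_vec), l2_normalize(large_vector))
```

With these toy values the raw inner product prefers `large_vector`, while the normalized scores prefer `same_direction`, which is the behavior usually wanted for semantic search.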
3. Hybrid Search
Features: Combination of Sparse and Dense
Benefits: Leverages strengths of both methods
Implementation: Weighted score fusion
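# Import paths vary by LangChain version; recent releases expose
# BM25Retriever and FAISS under langchain_community instead.
from langchain.retrievers import EnsembleRetriever
from langchain.retrievers import BM25Retriever
from langchain.vectorstores import FAISS

# Build hybrid retriever
# `documents` is a list of LangChain Document objects; `embedding_model` is a
# LangChain Embeddings object (not the NumPy array from the Dense example)
bm25_retriever = BM25Retriever.from_documents(documents)
vector_store = FAISS.from_documents(documents, embedding_model)
vector_retriever = vector_store.as_retriever()

# Ensemble retriever
hybrid_retriever = EnsembleRetriever(
    retrievers=[bm25_retriever, vector_retriever],
    weights=[0.4, 0.6]  # BM25: 40%, Vector: 60%
)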
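Weighted score fusion itself is independent of any framework. The sketch below shows one possible version in plain Python. It assumes min-max normalization to bring BM25 scores and vector similarities onto the same scale before weighting; the function names and the sample scores are hypothetical.

```python
def min_max_normalize(scores):
    """Rescale a {doc_id: score} mapping to the range [0, 1]."""
    low, high = min(scores.values()), max(scores.values())
    if high == low:
        return {doc_id: 1.0 for doc_id in scores}
    return {doc_id: (s - low) / (high - low) for doc_id, s in scores.items()}

def weighted_fusion(sparse_scores, dense_scores, sparse_weight=0.4, dense_weight=0.6):
    """Combine normalized sparse and dense scores with fixed weights."""
    sparse_norm = min_max_normalize(sparse_scores)
    dense_norm = min_max_normalize(dense_scores)
    fused = {}
    # A document missing from one result list contributes 0 for that method
    for doc_id in set(sparse_norm) | set(dense_norm):
        fused[doc_id] = (
            sparse_weight * sparse_norm.get(doc_id, 0.0)
            + dense_weight * dense_norm.get(doc_id, 0.0)
        )
    return sorted(fused.items(), key=lambda item: item[1], reverse=True)

keyword_scores = {"doc_a": 12.0, "doc_b": 3.0, "doc_c": 0.5}
vector_scores = {"doc_b": 0.82, "doc_c": 0.80, "doc_d": 0.40}
ranking = weighted_fusion(keyword_scores, vector_scores)
```

Normalization matters here because raw BM25 scores are unbounded while cosine similarities stay within a narrow range, so unnormalized weights would not mean what they appear to mean.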
Vector Database Comparison
FAISS (Facebook AI Similarity Search)
Features:
- Pros: High speed, diverse index types, free
- Cons: Difficult distributed processing, limited metadata features
- Use Cases: Single machine, prototypes
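import faiss

# `dimension` and `embeddings` come from the Dense search example

# IVFPQ (memory efficiency focused): 100 clusters, 8 sub-vectors, 8 bits each.
# dimension must be divisible by 8, and the index must be trained before use.
quantizer = faiss.IndexFlatL2(dimension)
ivfpq_index = faiss.IndexIVFPQ(quantizer, dimension, 100, 8, 8)
ivfpq_index.train(embeddings.astype('float32'))
ivfpq_index.add(embeddings.astype('float32'))

# HNSW (speed focused): 32 graph neighbors per node, no training required
hnsw_index = faiss.IndexHNSWFlat(dimension, 32)
hnsw_index.add(embeddings.astype('float32'))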
Milvus
Features:
- Pros: Distributed processing, scalability, rich indexes
- Cons: Complex setup, high resource consumption
- Use Cases: Large-scale data, production environments
from pymilvus import connections, Collection

# Milvus connection
connections.connect("default", host="localhost", port="19530")

# Open an existing collection and load it into memory before searching
collection = Collection("rag_collection")
collection.load()

# Execute search
# nprobe applies to IVF-type indexes; metric_type must match the index
search_params = {"metric_type": "IP", "params": {"nprobe": 10}}
results = collection.search(
    data=query_embedding.tolist(),  # list of query vectors
    anns_field="embeddings",
    param=search_params,
    limit=5
)
Pinecone
Features:
- Pros: Managed service, easy setup
- Cons: Cloud dependency, cost
- Use Cases: Rapid development, reduced operational burden
from pinecone import Pinecone

# Initialize Pinecone (client v3 and later; older clients used pinecone.init)
pc = Pinecone(api_key="your-api-key")

# Index operations
index = pc.Index("rag-index")

# Insert vectors as (id, values, metadata) tuples
index.upsert(vectors=[
    (doc_id, embedding.tolist(), meta)
    for doc_id, embedding, meta in zip(ids, embeddings, metadata)
])

# Execute search (query_embedding has shape (1, dimension), so take row 0)
results = index.query(
    vector=query_embedding[0].tolist(),
    top_k=5,
    include_metadata=True
)
Weaviate
Features:
- Pros: GraphQL API, schema flexibility, AI integration
- Cons: Learning curve, complex performance tuning
- Use Cases: Complex schemas, multimodal
import weaviate

# Weaviate connection (v3 Python client syntax)
client = weaviate.Client("http://localhost:8080")

# Execute search (query_embedding has shape (1, dimension), so take row 0)
result = (
    client.query.get("Document", ["content", "title"])
    .with_near_vector({"vector": query_embedding[0].tolist()})
    .with_limit(5)
    .do()
)
Advanced Index Optimization
Hierarchical Indexing
Hierarchical approach utilizing document structure:
class HierarchicalIndex:
    def __init__(self):
        self.document_index = {}   # Document level
        self.section_index = {}    # Section level
        self.paragraph_index = {}  # Paragraph level (not populated in this excerpt)

    def embed_document(self, doc):
        # Placeholder: return an embedding for the whole document
        raise NotImplementedError

    def embed_section(self, section):
        # Placeholder: return an embedding for one section
        raise NotImplementedError

    def build_hierarchy(self, documents):
        for doc in documents:
            # Document level index
            doc_embedding = self.embed_document(doc)
            self.document_index[doc.id] = doc_embedding

            # Section level
            for section in doc.sections:
                section_embedding = self.embed_section(section)
                self.section_index[section.id] = section_embedding
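One way such a hierarchy might be queried is coarse-to-fine: select the best documents first, then rank only the sections that belong to them. The sketch below assumes toy two-dimensional embeddings and a hypothetical `section_parent` mapping from section ID to document ID, neither of which appears in the class above.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def hierarchical_search(query_vec, document_index, section_index, section_parent,
                        top_docs=1, top_sections=2):
    """Pick the best documents first, then rank only their sections."""
    ranked_docs = sorted(
        document_index,
        key=lambda doc_id: dot(query_vec, document_index[doc_id]),
        reverse=True,
    )
    selected = set(ranked_docs[:top_docs])
    # Only sections whose parent document survived the first stage are scored
    candidates = [s for s in section_index if section_parent[s] in selected]
    candidates.sort(key=lambda s: dot(query_vec, section_index[s]), reverse=True)
    return candidates[:top_sections]

document_index = {"law_1": [1.0, 0.0], "law_2": [0.0, 1.0]}
section_index = {
    "law_1/art_1": [0.9, 0.1],
    "law_1/art_2": [0.6, 0.4],
    "law_2/art_1": [0.1, 0.9],
}
section_parent = {
    "law_1/art_1": "law_1",
    "law_1/art_2": "law_1",
    "law_2/art_1": "law_2",
}
hits = hierarchical_search([1.0, 0.2], document_index, section_index, section_parent)
```

The trade-off of this design is that a relevant section inside a document rejected at the first stage can never be returned, so `top_docs` is usually set generously.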
Time-Aware Indexing
Considering temporal weighting:
import datetime
import math

def time_weighted_search(query_embedding, results, decay_factor=0.1):
    # Re-ranks existing results; each result needs .score and a naive
    # datetime in .timestamp. query_embedding is not used by the re-weighting.
    current_time = datetime.datetime.now()
    weighted_results = []

    for result in results:
        # Apply temporal decay (exponential in the age measured in days)
        time_diff = (current_time - result.timestamp).days
        time_weight = math.exp(-decay_factor * time_diff)

        final_score = result.score * time_weight
        weighted_results.append((result, final_score))

    return sorted(weighted_results, key=lambda x: x[1], reverse=True)
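Choosing `decay_factor` is easier when it is expressed as a half-life. With exponential decay `exp(-decay_factor * days)`, the score halves every `ln(2) / decay_factor` days. The helper names below are hypothetical, and the 365-day half-life is only an example value.

```python
import math

def decay_factor_from_half_life(half_life_days):
    """Return the decay_factor at which a score halves every half_life_days."""
    return math.log(2) / half_life_days

def time_weight(age_days, decay_factor):
    return math.exp(-decay_factor * age_days)

# A decay_factor of 0.1 per day corresponds to a half-life of about a week
half_life_of_default = math.log(2) / 0.1

# Example: make documents lose half their weight after one year
yearly_decay = decay_factor_from_half_life(365)
weight_after_one_year = time_weight(365, yearly_decay)
weight_after_30_days_default = time_weight(30, 0.1)
```

A decay factor of 0.1 per day suppresses a 30-day-old document to roughly 5% of its original score, which may be too aggressive for slowly changing corpora such as case law.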
Search Strategy Optimization
Reciprocal Rank Fusion (RRF)
Effectively fusing multiple search results:
def reciprocal_rank_fusion(rankings, k=60):
    # rankings: list of ranked doc_id lists, best first
    fused_scores = {}

    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, 1):
            if doc_id not in fused_scores:
                fused_scores[doc_id] = 0
            # Each list contributes 1 / (k + rank) for the document
            fused_scores[doc_id] += 1 / (k + rank)

    return sorted(fused_scores.items(), key=lambda x: x[1], reverse=True)
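A short usage example may help make the behavior concrete. The function is repeated so the snippet runs on its own, and the two rankings are invented sample data.

```python
def reciprocal_rank_fusion(rankings, k=60):
    fused_scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, 1):
            fused_scores[doc_id] = fused_scores.get(doc_id, 0.0) + 1 / (k + rank)
    return sorted(fused_scores.items(), key=lambda item: item[1], reverse=True)

bm25_ranking = ["doc_a", "doc_b", "doc_c"]    # best first
dense_ranking = ["doc_b", "doc_d", "doc_a"]   # best first

fused = reciprocal_rank_fusion([bm25_ranking, dense_ranking])
fused_order = [doc_id for doc_id, _ in fused]
```

Documents that appear in both lists (`doc_b`, `doc_a`) rise above those found by only one method. Because RRF uses ranks rather than raw scores, it needs no score normalization between BM25 and vector similarity.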
Query Expansion
Improving search accuracy through query expansion:
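from transformers import pipeline

# Query expansion implementation
def expand_query(query, model_name="rinna/japanese-gpt-neox-3.6b"):
    # Loading the model is slow; in production, create the pipeline once and reuse it
    generator = pipeline("text-generation", model=model_name)

    prompt = f"Keywords related to question '{query}':"
    # max_new_tokens limits only the generated part, not prompt + output
    expanded = generator(prompt, max_new_tokens=50, num_return_sequences=1)

    # generated_text includes the prompt, so strip it off
    return expanded[0]['generated_text'][len(prompt):].strip()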
INDX Practical Case Studies
Law Firm C
Challenge: Poor case law search accuracy (45% precision)
Solution:
- BM25 (for statutes & case numbers) + Dense (for content search)
- Hierarchical indexing (law → article → clause)
- Temporal weighting (prioritizing recent cases)
Implementation:
# Legal-specialized index (illustrative: the retriever and weighting
# components are placeholders supplied by the caller)
class LegalHybridRetriever:
    def __init__(self, statute_bm25, case_dense, temporal_weights):
        self.statute_bm25 = statute_bm25          # For statutes
        self.case_dense = case_dense              # For cases
        self.temporal_weights = temporal_weights

    def search(self, query):
        statute_results = self.statute_bm25.search(query)
        case_results = self.case_dense.search(query)

        # Apply temporal weighting
        weighted_results = self.temporal_weights.apply(
            statute_results + case_results
        )

        return self.fuse_results(weighted_results)

    def fuse_results(self, weighted_results):
        # Placeholder: merge and rank the weighted results (for example with RRF)
        raise NotImplementedError
Results: Precision 45% → 78%, User satisfaction 90%
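For readers who want to track a similar metric on their own data, precision is commonly computed as the share of retrieved results that are relevant. The sketch below shows precision at k under that common definition; it is not a description of how the figures above were measured, and the case IDs are invented.

```python
def precision_at_k(retrieved_ids, relevant_ids, k):
    """Fraction of the top-k retrieved documents that are relevant."""
    top_k = retrieved_ids[:k]
    if not top_k:
        return 0.0
    hits = sum(1 for doc_id in top_k if doc_id in relevant_ids)
    return hits / len(top_k)

retrieved = ["case_12", "case_07", "case_33", "case_02", "case_19"]
relevant = {"case_12", "case_33", "case_19", "case_40"}

score = precision_at_k(retrieved, relevant, k=5)
```

Three of the five retrieved cases are in the relevant set here, so the score is 0.6.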
Manufacturing Company D
Challenge: Multilingual technical specification search
Solution:
- Multilingual Dense search (Japanese/English/Chinese support)
- Integration with technical terminology dictionary
- Domain-specific fine-tuning
Results: Search time 30s → 3s, Accuracy 60% → 85%
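Integration with a terminology dictionary can take many forms. One simple possibility, sketched below, is to append known translations of technical terms to the query before it reaches the sparse retriever. The function name, the dictionary contents, and the substring-matching rule are all assumptions made for illustration, not the implementation used in this case.

```python
def expand_with_dictionary(query, term_dictionary):
    """Append known variants of any dictionary term found in the query."""
    expanded_terms = [query]
    lowered = query.lower()
    for term, variants in term_dictionary.items():
        # Simple substring match; a real system would match on token boundaries
        if term.lower() in lowered:
            expanded_terms.extend(v for v in variants if v.lower() not in lowered)
    return " ".join(expanded_terms)

# Hypothetical English -> Japanese/Chinese terminology dictionary
term_dictionary = {
    "torque": ["トルク", "扭矩"],
    "bearing": ["ベアリング", "轴承"],
}

expanded = expand_with_dictionary("max torque of the spindle", term_dictionary)
```

This kind of expansion mainly helps keyword search, since a multilingual embedding model is expected to map translated terms close together on its own.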
Performance Optimization
Index Compression
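import faiss

# Product Quantization (PQ): 8 sub-vectors x 8 bits = 8 bytes per vector
pq_index = faiss.IndexPQ(dimension, 8, 8)

# Scalar Quantization (SQ): 8 bits per dimension
sq_index = faiss.IndexScalarQuantizer(dimension, faiss.ScalarQuantizer.QT_8bit)

# Both index types must be trained on sample vectors before calling add()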
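The memory impact of these settings can be estimated with simple arithmetic. The sketch below counts only the stored vector codes and ignores codebooks and index overhead. The dimension of 1024 and the corpus size of one million vectors are example values, not figures from this article.

```python
def flat_bytes_per_vector(dimension):
    return dimension * 4  # uncompressed float32

def pq_bytes_per_vector(num_subquantizers, bits_per_code):
    return num_subquantizers * bits_per_code // 8

def sq8_bytes_per_vector(dimension):
    return dimension  # 8 bits (1 byte) per dimension

dimension = 1024        # assumed embedding size
n_vectors = 1_000_000   # assumed corpus size

flat_gb = flat_bytes_per_vector(dimension) * n_vectors / 1e9
sq_gb = sq8_bytes_per_vector(dimension) * n_vectors / 1e9
pq_mb = pq_bytes_per_vector(8, 8) * n_vectors / 1e6
compression_ratio = flat_bytes_per_vector(dimension) / pq_bytes_per_vector(8, 8)
```

Under these assumptions the flat index needs about 4.1 GB, 8-bit scalar quantization about 1.0 GB, and PQ with 8 one-byte codes about 8 MB. Compression this aggressive usually costs recall, so it is worth measuring accuracy before and after.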
Parallel Processing Optimization
import concurrent.futures

def parallel_search(queries, retrievers):
    # Runs retrievers[i].search(queries[i]) for each pair, concurrently
    with concurrent.futures.ThreadPoolExecutor(max_workers=4) as executor:
        futures = [
            executor.submit(retriever.search, query)
            for query, retriever in zip(queries, retrievers)
        ]
        # Collect results in submission order
        results = [future.result() for future in futures]
    return results
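In a hybrid setup, a common variation is to send one query to several retrievers at once and fuse the results afterwards. The sketch below shows that fan-out pattern; `StaticRetriever` is a hypothetical stand-in that returns fixed results, used only so the example runs without a real index.

```python
import concurrent.futures

class StaticRetriever:
    """Hypothetical stand-in for a real retriever; returns fixed results."""
    def __init__(self, name, results):
        self.name = name
        self.results = results

    def search(self, query):
        return [(self.name, doc_id) for doc_id in self.results]

def fan_out_search(query, retrievers, max_workers=4):
    """Run the same query against every retriever concurrently."""
    with concurrent.futures.ThreadPoolExecutor(max_workers=max_workers) as executor:
        # executor.map returns results in the same order as `retrievers`
        return list(executor.map(lambda r: r.search(query), retrievers))

retrievers = [
    StaticRetriever("bm25", ["doc_a", "doc_b"]),
    StaticRetriever("dense", ["doc_b", "doc_c"]),
]
results = fan_out_search("index strategy", retrievers)
```

Threads help most when each retriever waits on I/O, such as a network call to a vector database. CPU-bound retrievers running in pure Python gain little from threading.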
Conclusion
Index strategy optimization is a crucial element that determines RAG system success. Rather than relying on a single method, choose among Sparse, Dense, and Hybrid approaches according to the use case, and combine them with advanced techniques such as hierarchical indexing and temporal weighting to achieve significant accuracy improvements. At INDX, we design and implement optimal index strategies tailored to each client's data characteristics and requirements.