RAG
Retrieval-Augmented Generation powered by your local documents.
Sparky includes a production-grade RAG system that runs entirely on your machine. Add documents, code files, and URLs as knowledge sources. Sparky chunks, embeds, and indexes everything locally using llama.cpp. No data leaves your machine and no external API calls are made.
When knowledge is enabled for a chat, Sparky automatically retrieves relevant chunks from your indexed sources and includes them in the context sent to the agent. The agent can then reference your documents to answer questions, synthesize information, or provide grounded responses.
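The augmentation step can be sketched as a simple prompt assembly: retrieved chunks are formatted and placed ahead of the user's question. The function and the prompt format below are illustrative, not Sparky's actual internals.

```python
def augment_context(question: str, chunks: list[str]) -> str:
    """Prepend retrieved chunks to the question so the agent can ground
    its answer. The labels and wording here are hypothetical."""
    sources = "\n\n".join(f"[Source {i + 1}]\n{c}" for i, c in enumerate(chunks))
    return (
        "Answer using the sources below where relevant.\n\n"
        f"{sources}\n\nQuestion: {question}"
    )
```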
How It Works
- Add sources — add files, folders, or URLs from the Knowledge section in the sidebar
- Indexing — Sparky splits documents into chunks, generates embeddings, and builds a full-text search index
- Retrieval — when you ask a question with knowledge enabled, Sparky searches your index using hybrid search (BM25 + vector similarity) and reranks the results
- Augmentation — the most relevant chunks are injected into the conversation context before the agent responds
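The indexing step above starts with chunking. A minimal fixed-size chunker with overlap (so sentences straddling a boundary appear in both neighbours) might look like this; the sizes are illustrative and Sparky's real chunking strategy may differ.

```python
def chunk_text(text: str, size: int = 400, overlap: int = 50) -> list[str]:
    """Split text into overlapping fixed-size chunks.

    Each chunk shares `overlap` characters with the previous one, so
    content near a boundary is indexed twice rather than split.
    """
    step = size - overlap
    chunks = []
    for start in range(0, max(len(text) - overlap, 1), step):
        chunks.append(text[start:start + size])
    return chunks
```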
Supported Indexing Formats
Text files, Markdown, PDF, HTML, and URLs are supported out of the box.
BM25 (Keyword Search)
BM25 is a traditional keyword-based ranking algorithm powered by SQLite FTS5. It ranks documents by exact term matches, weighting term frequency against document length, and is fast, predictable, and effective when you know the exact terminology.
Best for: exact matches, code symbols, technical terms, known keywords.
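A BM25 index like this can be built with nothing but Python's standard library, since SQLite's FTS5 extension exposes a `bm25()` ranking function (this assumes your SQLite build includes FTS5, which most do; table and column names are illustrative).

```python
import sqlite3

def build_fts_index(docs: list[tuple[str, str]]) -> sqlite3.Connection:
    """Index (doc_id, body) pairs in an in-memory FTS5 table."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE VIRTUAL TABLE chunks USING fts5(doc_id UNINDEXED, body)")
    con.executemany("INSERT INTO chunks VALUES (?, ?)", docs)
    return con

def bm25_search(con: sqlite3.Connection, query: str, k: int = 3):
    """Return the top-k matches; FTS5's bm25() is lower-is-better,
    so ascending order puts the best match first."""
    return con.execute(
        "SELECT doc_id, bm25(chunks) FROM chunks WHERE chunks MATCH ? "
        "ORDER BY bm25(chunks) LIMIT ?",
        (query, k),
    ).fetchall()
```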
Hybrid Search
Hybrid search combines BM25 keyword matching with vector similarity search. Documents are embedded using a local model (via llama.cpp) and stored in sqlite-vec. At query time, both BM25 and vector results are merged and reranked using a cross-encoder model to produce the final ranking.
Best for: natural language questions, conceptual queries, finding related content when you don't know the exact wording.
Hybrid search requires two additional models (~1.3 GB total) for embedding and reranking. They are downloaded automatically in the background when you first enable hybrid mode. You can continue using the application while the download completes.
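The merge of BM25 and vector results can be sketched with Reciprocal Rank Fusion, a common rank-combination technique. Sparky's actual fusion method is not documented here, so treat this as one plausible approach, not the implementation.

```python
def rrf_merge(bm25_ids: list[str], vector_ids: list[str], k: int = 60) -> list[str]:
    """Reciprocal Rank Fusion: each list contributes 1 / (k + rank + 1)
    per document, and documents ranked well in both lists rise to the top.
    k=60 is a conventional damping constant, assumed here."""
    scores: dict[str, float] = {}
    for ranking in (bm25_ids, vector_ids):
        for rank, doc_id in enumerate(ranking):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)
```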
Reranking
After the initial retrieval, a reranking model scores each candidate chunk against your query for deeper semantic relevance. This two-stage approach (retrieve broadly, then rerank precisely) gives you the best of both worlds: fast initial retrieval with high-quality final results.
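The two-stage shape can be sketched as follows, with `score_fn` standing in for the local cross-encoder (which scores each query–chunk pair jointly rather than comparing precomputed embeddings):

```python
from typing import Callable

def rerank(
    query: str,
    candidates: list[str],
    score_fn: Callable[[str, str], float],
    top_n: int = 5,
) -> list[str]:
    """Second stage: score every (query, chunk) pair and keep the
    highest-scoring chunks. score_fn is a placeholder for the reranker."""
    scored = sorted(candidates, key=lambda c: score_fn(query, c), reverse=True)
    return scored[:top_n]
```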
Comparison
| | BM25 | Hybrid |
|---|---|---|
| Speed | Faster | Slightly slower |
| Exact matches | Excellent | Excellent |
| Semantic understanding | None | Strong |
| Best use case | Known terms | Natural language |
Models
All models run locally via llama.cpp. No data is sent to external services.
| Model | Purpose |
|---|---|
| nomic-embed-text-v1.5.Q4_0.gguf | Text embeddings for vector search |
| bge-reranker-v2-m3-Q4_K_M.gguf | Cross-encoder reranking |
| qwen2.5-1.5b-instruct-q4_k_m.gguf | Keyword extraction and query expansion |