← tomás.erdmannsdörffer ● retrieval runs in your browser · zero backend

Ask my research corpus.

A transparent RAG system over a curated corpus of papers on physics-informed neural networks, neural operators, model compression, and LLM inference, the literature I actually use day-to-day. BM25 retrieval runs client-side, and answers work instantly with no setup (extractive mode), or plug in any OpenAI-compatible LLM for generative answers. The behind-the-scenes panel shows retrieved chunks, scores, and the exact prompt: if retrieval fails, you can see why.

RAGBM2524 chunks · 16 papers extractive + generativezero backend

answers:⚡ extractive mode, works now, no key needed · index:loading…

🔎

Ask anything about the corpus.

Retrieval is instant and local. Try one of these, no API key required:

retrieval[01]

AlgorithmBM25 + tag boost

Corpus size-

Top-k retrieved4

Last query-

retrieved chunks[02]

Ask a question to see retrieved chunks, scores and score bars.

prompt / context[03]

What's actually happening here?

Most "chat with your PDFs" demos hide retrieval behind a black box. This one shows everything: the chunks pulled, their scores, and the exact prompt. That transparency is the point, when retrieval fails, you can see why.

architecture

Static corpus: 24 hand-curated chunks from 16 scientific ML papers, shipped as one small JSON file. No vector DB, no backend.
Retrieval: BM25 with light stemming, computed in the browser. Title, section, and tag tokens are up-weighted; a small tag-overlap rerank runs after scoring. Top-4 retained.
Extractive mode (default): with no LLM configured, the answer is stitched from the highest-scoring sentences inside the retrieved chunks, each sentence scored by IDF-weighted query overlap. Zero setup, fully local, honestly labeled.
Generative mode: any OpenAI-compatible endpoint, OpenAI, Groq, OpenRouter, local Ollama. Your key stays in localStorage and is sent only to the endpoint you configure.

honest trade-offs

BM25 vs dense embeddings. Production RAG uses dense embeddings + a reranker; they handle synonyms and paraphrase better. BM25 wins for a 24-chunk corpus and ships with zero infrastructure, at scale I'd swap it.
Extractive answers can read choppy. They're real sentences from real papers, in rank order, with citations, not fluent prose. That's the trade for zero-setup honesty.
No conversational memory. Each question is independent; multi-turn needs history-aware query rewriting (one extra LLM call).

why a small corpus?