I turned off one component at a time and measured what broke.

30 code chunks from the GitAsk codebase. 15 questions with known answers. Deterministic embeddings (384-dim, seeded PRNG) so results are reproducible. Ran in Vitest, single thread, no GPU.

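
To make "deterministic embeddings (384-dim, seeded PRNG)" concrete, here is a minimal sketch of one way to get reproducible vectors. It is not the benchmark's actual generator, which isn't shown here: `hashSeed`, `mulberry32`, and `fakeEmbedding` are hypothetical names, and seeding from an FNV-1a hash of the text is my assumption. It only illustrates the reproducibility part; purely random vectors carry no semantic signal on their own.

```typescript
// Sketch only: hypothetical helpers, not the benchmark's real code.
const EMBED_DIM = 384;

// FNV-1a: hash a string into a 32-bit unsigned seed.
function hashSeed(text: string): number {
  let h = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    h ^= text.charCodeAt(i);
    h = Math.imul(h, 0x01000193);
  }
  return h >>> 0;
}

// mulberry32: a small seeded PRNG that returns floats in [0, 1).
function mulberry32(seed: number): () => number {
  let a = seed >>> 0;
  return () => {
    a = (a + 0x6d2b79f5) | 0;
    let t = Math.imul(a ^ (a >>> 15), 1 | a);
    t = (t + Math.imul(t ^ (t >>> 7), 61 | t)) ^ t;
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Same text in, same unit-length vector out, on every run.
function fakeEmbedding(text: string, dim: number = EMBED_DIM): Float32Array {
  const rand = mulberry32(hashSeed(text));
  const v = new Float32Array(dim);
  let sumSquares = 0;
  for (let i = 0; i < dim; i++) {
    v[i] = rand() * 2 - 1; // uniform in [-1, 1)
    sumSquares += v[i] * v[i];
  }
  const norm = Math.sqrt(sumSquares);
  for (let i = 0; i < dim; i++) v[i] /= norm;
  return v;
}
```
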
| Config | Recall@5 | MRR | Latency | Quant | Keyword | RRF | Rerank |
|---|---|---|---|---|---|---|---|
| Full Pipeline (baseline) | 100.0% | 1.000 | 522μs | ✓ | ✓ | ✓ | ✓ |
| No Quantization | 100.0% | 1.000 | 290μs | — | ✓ | ✓ | ✓ |
| Vector-Only | 100.0% | 1.000 | 242μs | ✓ | — | — | ✓ |
| No Reranking | 96.7% | 0.867 | 325μs | ✓ | ✓ | ✓ | — |
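
For reference, the two quality columns in the table can be computed roughly like this. It is a sketch under assumptions: the `QueryResult` shape and the function names are hypothetical, and I'm assuming Recall@5 is averaged per question over that question's relevant chunks, which the numbers above don't confirm.

```typescript
// Hypothetical shape: ranked chunk ids for one question plus its known answers.
type QueryResult = { ranked: string[]; relevant: Set<string> };

// Recall@k: fraction of a question's relevant chunks found in the top k,
// averaged over all questions.
function recallAtK(results: QueryResult[], k: number): number {
  if (results.length === 0) return 0;
  let total = 0;
  for (const { ranked, relevant } of results) {
    if (relevant.size === 0) continue; // nothing to find, contributes 0
    const hits = ranked.slice(0, k).filter((id) => relevant.has(id)).length;
    total += hits / relevant.size;
  }
  return total / results.length;
}

// MRR: 1 / rank of the first relevant chunk (0 if none), averaged over questions.
function meanReciprocalRank(results: QueryResult[]): number {
  if (results.length === 0) return 0;
  let total = 0;
  for (const { ranked, relevant } of results) {
    const idx = ranked.findIndex((id) => relevant.has(id));
    total += idx === -1 ? 0 : 1 / (idx + 1);
  }
  return total / results.length;
}
```

An MRR of 1.000 means the first relevant chunk was ranked first for every question; a lower MRR means that for at least one question it slipped below the top spot.
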

For 500 chunks, quantization shrinks the stored vectors from 750KB to 23KB. Same recall. Fits in IndexedDB easily.

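
The 750KB and 23KB figures are consistent with going from float32 (4 bytes per dimension) to 1 bit per dimension at 384 dimensions. That is my reading of the numbers, not something stated above, so treat the sketch below as an assumption. `binarize` and `hammingDistance` are hypothetical names for a sign-bit packing scheme and its distance function.

```typescript
const chunkCount = 500;
const dims = 384;

const float32Bytes = chunkCount * dims * 4;  // 768000 bytes = 750 KiB
const binaryBytes = chunkCount * (dims / 8); // 24000 bytes, about 23.4 KiB

const toKiB = (bytes: number): number => bytes / 1024;

// Assumed scheme: keep only the sign of each dimension, 8 dimensions per byte.
function binarize(v: Float32Array): Uint8Array {
  const out = new Uint8Array(Math.ceil(v.length / 8));
  for (let i = 0; i < v.length; i++) {
    if (v[i] > 0) out[i >> 3] |= 1 << (i & 7);
  }
  return out;
}

// Hamming distance: number of differing bits between two packed vectors.
function hammingDistance(a: Uint8Array, b: Uint8Array): number {
  let distance = 0;
  for (let i = 0; i < a.length; i++) {
    let x = a[i] ^ b[i];
    while (x) {
      x &= x - 1; // clear the lowest set bit
      distance++;
    }
  }
  return distance;
}
```

Under this assumption each 384-dim vector takes 48 bytes instead of 1536, a 32x reduction.
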
Can't benchmark automatically — it needs the LLM. Here's what I observed manually: