pathfinder

Architecture

Measured per-stage latency on the live deploy (rrf3_rerank pipeline, 119-query overall mean): 315 ms p95 end-to-end, well under the 2 s target. Stage timings are emitted on every search via Server-Sent Events.

StagePipeline stepBudgetModel / library
0Intent router150 msGemini 2.5 Flash-Lite + Instructor (lru-cached)
1ABM25 retrieval0.2 msBM25S BM25+, k1=1.5, b=0.75
1BDense retrieval2.3 msBGE-M3 (FP16) → in-memory NumPy cosine
1CKG retrieval25 msCypher templates over Neo4j AuraDB Free
2RRF3 fusion0.1 msk=60, BM25 + dense + KG ranks
3Cross-encoder rerank285 msbge-reranker-v2-m3 FP16, top-25 funnel
4Per-stage scores → SSE5 msFastAPI StreamingResponse
        query
          │
          ▼
   intent (Gemini)
          │
   ┌──────┼──────┐
   ▼      ▼      ▼
  BM25  dense   KG          ← three parallel channels
   │      │      │
   └──────┼──────┘
          ▼
       RRF k=60               ← rank-fusion (RRF3)
          │
          ▼
   cross-encoder rerank       ← bge-reranker-v2-m3, top-25 funnel
          │
          ▼
   results + per-stage
   scores via SSE