pathfinder

Evaluation

Locked snapshot of every offline metric: corpus + KG composition, the full ablation matrix across 119 queries on two strata, and the per-stage latency budget on a single RTX 4060.

Best nDCG@10: 0.557 (RRF3: BM25 + dense + KG)
Recall@100: 0.700 (target ≥ 0.97)
MRR@10: 0.591 (target ≥ 0.65)
Full pipeline latency: 315 ms (target p95 < 2000 ms)
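The headline figures are standard ranking metrics. Below is a minimal sketch of their usual binary-relevance definitions. It assumes each query has a set of relevant IDs and that each table value is a mean of per-query scores. The evaluation harness may instead use graded relevance or different tie handling, and the function names are illustrative.

```python
import math
from typing import Sequence, Set


def ndcg_at_k(ranked: Sequence[str], relevant: Set[str], k: int = 10) -> float:
    """Binary-gain nDCG@k: DCG of the ranking divided by the ideal DCG."""
    # Ranks are 1-based, so the item at index i is discounted by log2(i + 2).
    dcg = sum(
        1.0 / math.log2(i + 2)
        for i, doc in enumerate(ranked[:k])
        if doc in relevant
    )
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(i + 2) for i in range(ideal_hits))
    return dcg / idcg if idcg > 0 else 0.0


def recall_at_k(ranked: Sequence[str], relevant: Set[str], k: int = 100) -> float:
    """Fraction of the relevant items that appear in the top k."""
    if not relevant:
        return 0.0
    return len(set(ranked[:k]) & relevant) / len(relevant)


def mrr_at_k(ranked: Sequence[str], relevant: Set[str], k: int = 10) -> float:
    """Reciprocal rank of the first relevant item within the top k, else 0."""
    for i, doc in enumerate(ranked[:k]):
        if doc in relevant:
            return 1.0 / (i + 1)
    return 0.0


ranking = ["j7", "j2", "j9", "j4"]  # hypothetical retrieved job IDs
gold = {"j2", "j4"}                 # hypothetical relevant set
print(round(ndcg_at_k(ranking, gold), 3))  # 0.651
print(recall_at_k(ranking, gold, k=2))     # 0.5
print(mrr_at_k(ranking, gold))             # 0.5
```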

Corpus

Profiles: 1,782
Jobs: 1,370
Canonical skills: 4,553
HAS_SKILL edges: 30,369
REQUIRES_SKILL edges: 5,129
Eval queries: 119 (50 candidate · 50 job · 19 paraphrase)

Knowledge graph

Person nodes: 1,782
Job nodes: 1,370
Skill nodes: 4,553
Other nodes: 1,433 (834 role · 415 designation · 53 industry · 131 location)
Total relationships: 45,731 across HAS_SKILL, REQUIRES_SKILL, CAN_FILL, IS_DESIGNATION, AT_LOCATION, IN_INDUSTRY
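The KG channel ranks candidates by skill overlap along these edges. The latency budget lists it as a Cypher skill-overlap query, but the query itself is not shown. A plausible shape is a two-hop match from a job through REQUIRES_SKILL to Skill nodes and back through HAS_SKILL to people, counting shared skills. The in-memory sketch below assumes the score is simply that shared-skill count. The toy edges and function names are hypothetical, and the real query may weight or normalize skills differently.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical toy edges mirroring HAS_SKILL (Person -> Skill)
# and REQUIRES_SKILL (Job -> Skill).
HAS_SKILL: List[Tuple[str, str]] = [
    ("p1", "python"), ("p1", "sql"), ("p1", "airflow"),
    ("p2", "python"), ("p2", "react"),
    ("p3", "sql"), ("p3", "tableau"),
]
REQUIRES_SKILL: List[Tuple[str, str]] = [
    ("job42", "python"), ("job42", "sql"), ("job42", "airflow"),
]


def index(edges: List[Tuple[str, str]]) -> Dict[str, set]:
    """Group (node, skill) edges into node -> set of skills."""
    out: Dict[str, set] = defaultdict(set)
    for node, skill in edges:
        out[node].add(skill)
    return out


def skill_overlap_candidates(job_id: str, top_k: int = 100) -> List[Tuple[str, int]]:
    """Rank people by how many of the job's required skills they hold."""
    required = index(REQUIRES_SKILL).get(job_id, set())
    people = index(HAS_SKILL)
    scored = [(pid, len(skills & required)) for pid, skills in people.items()]
    # Drop zero-overlap people; sort by overlap descending, then by id for stable ties.
    scored = [(pid, s) for pid, s in scored if s > 0]
    scored.sort(key=lambda item: (-item[1], item[0]))
    return scored[:top_k]


print(skill_overlap_candidates("job42"))  # [('p1', 3), ('p2', 1), ('p3', 1)]
```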

Ablation matrix

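Seven retrieval configurations, reported overall and on each of the two strata (original and paraphrase). The best nDCG@10 is RRF3 overall (0.557) and on the original stratum (0.621), and the full pipeline on the paraphrase stratum (0.224).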
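The RRF and RRF3 configurations combine channel rankings with reciprocal rank fusion. The section names the technique but not its parameters. The sketch below uses the common unweighted form score(d) = Σ 1 / (k + rank_d) and assumes k = 60, a widely used default. The ranked lists are toy stand-ins for the BM25, dense, and KG channels.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def rrf_fuse(rankings: Sequence[Sequence[str]], k: int = 60, top_n: int = 100) -> List[str]:
    """Reciprocal rank fusion: each list adds 1 / (k + rank) to every doc it ranks."""
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by doc id for determinism.
    fused = sorted(scores, key=lambda d: (-scores[d], d))
    return fused[:top_n]


bm25 = ["a", "b", "c"]   # hypothetical BM25 top-3
dense = ["b", "d", "a"]  # hypothetical dense top-3
kg = ["d", "b", "e"]     # hypothetical KG top-3

print(rrf_fuse([bm25, dense]))      # ['b', 'a', 'd', 'c']       (RRF)
print(rrf_fuse([bm25, dense, kg]))  # ['b', 'd', 'a', 'c', 'e']  (RRF3)
```

Because RRF uses only ranks, the channels' raw scores (BM25 weights, cosine similarities, overlap counts) never need to be calibrated against each other.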

Overall (mean of candidate + job tasks)

Configuration                         nDCG@10  R@10   R@100  MRR@10  Latency
BM25                                  0.540    0.514  0.704  0.567   0.2 ms
BGE-M3 dense                          0.527    0.516  0.703  0.548   2.3 ms
RRF (BM25 + dense)                    0.551    0.521  0.696  0.585   2.5 ms
Cross-encoder rerank top-25           0.538    0.535  0.696  0.551   285 ms
KG channel only                       0.425    0.443  0.686  0.456   25 ms
RRF3 (BM25 + dense + KG)              0.557    0.540  0.700  0.591   30 ms
Full pipeline (RRF3 + rerank top-25)  0.544    0.536  0.700  0.565   315 ms
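The cross-encoder reranks only the top 25, which explains why Recall@100 does not change when it is added. RRF3 and the full pipeline both sit at 0.700 overall. Reordering inside the first 25 positions cannot change which documents fall in the top 100, so only rank-sensitive metrics (nDCG@10, R@10, MRR@10) can move. The toy sketch below assumes the reranker keeps positions 26 onward in first-stage order. The function `toy_score` is a hypothetical stand-in for bge-reranker-v2-m3.

```python
from typing import Callable, List, Sequence, Set


def rerank_top_k(
    ranked: Sequence[str], score: Callable[[str], float], k: int = 25
) -> List[str]:
    """Re-sort only the first k candidates by score; keep the tail in order."""
    head = sorted(ranked[:k], key=score, reverse=True)  # stable for tied scores
    return head + list(ranked[k:])


def recall_at(ranked: Sequence[str], relevant: Set[str], n: int) -> float:
    """Fraction of relevant documents found in the top n."""
    return len(set(ranked[:n]) & relevant) / len(relevant)


# Hypothetical first-stage list (d0 is rank 1) and gold set.
FIRST_STAGE = [f"d{i}" for i in range(120)]
RELEVANT = {"d3", "d24", "d60", "d110"}


def toy_score(doc: str) -> float:
    """Hypothetical cross-encoder stand-in: relevant docs score 1, others 0."""
    return 1.0 if doc in RELEVANT else 0.0


reranked = rerank_top_k(FIRST_STAGE, toy_score)
print(reranked[:3])                                                               # ['d3', 'd24', 'd0']
print(recall_at(FIRST_STAGE, RELEVANT, 10), recall_at(reranked, RELEVANT, 10))    # 0.25 0.5
print(recall_at(FIRST_STAGE, RELEVANT, 100), recall_at(reranked, RELEVANT, 100))  # 0.75 0.75
```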

Original stratum (100 lexical-anchor queries)

Configuration                         nDCG@10  R@10   R@100  MRR@10  Latency
BM25                                  0.604    0.580  0.770  0.634   0.1 ms
BGE-M3 dense                          0.589    0.583  0.759  0.612   2.3 ms
RRF (BM25 + dense)                    0.618    0.589  0.760  0.657   2.5 ms
Cross-encoder rerank top-25           0.598    0.596  0.760  0.614   285 ms
KG channel only                       0.476    0.497  0.747  0.512   25 ms
RRF3 (BM25 + dense + KG)              0.621    0.601  0.756  0.663   30 ms
Full pipeline (RRF3 + rerank top-25)  0.605    0.597  0.756  0.630   315 ms

Paraphrase stratum (19 Gemini-generated queries)

Configuration                         nDCG@10  R@10   R@100  MRR@10
BM25                                  0.201    0.162  0.359  0.211
BGE-M3 dense                          0.201    0.162  0.406  0.211
RRF (BM25 + dense)                    0.201    0.162  0.358  0.211
Cross-encoder rerank top-25           0.220    0.215  0.358  0.219
KG channel only                       0.158    0.158  0.368  0.158
RRF3 (BM25 + dense + KG)              0.217    0.215  0.400  0.216
Full pipeline (RRF3 + rerank top-25)  0.224    0.215  0.400  0.224

Latency was not reported for this stratum.
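The Overall figures are consistent with a query-weighted mean of the two strata (100 original + 19 paraphrase queries) rather than an unweighted average of the two tables. For RRF3, an unweighted average would give (0.621 + 0.217) / 2 = 0.419, not 0.557. The check below uses only numbers copied from this section's three tables. The one-score-per-query weighting is inferred from the numbers, not stated by the source. Because the per-stratum inputs are already rounded, agreement is only to within about ±0.001.

```python
# nDCG@10 per configuration, copied from this section's three tables.
ORIGINAL = {
    "BM25": 0.604, "BGE-M3 dense": 0.589, "RRF": 0.618, "Cross-encoder": 0.598,
    "KG only": 0.476, "RRF3": 0.621, "Full pipeline": 0.605,
}
PARAPHRASE = {
    "BM25": 0.201, "BGE-M3 dense": 0.201, "RRF": 0.201, "Cross-encoder": 0.220,
    "KG only": 0.158, "RRF3": 0.217, "Full pipeline": 0.224,
}
OVERALL = {
    "BM25": 0.540, "BGE-M3 dense": 0.527, "RRF": 0.551, "Cross-encoder": 0.538,
    "KG only": 0.425, "RRF3": 0.557, "Full pipeline": 0.544,
}
N_ORIGINAL, N_PARAPHRASE = 100, 19


def pooled_mean(x_orig: float, x_para: float) -> float:
    """Query-weighted mean of the two strata (assumes one score per query)."""
    total = N_ORIGINAL + N_PARAPHRASE
    return (N_ORIGINAL * x_orig + N_PARAPHRASE * x_para) / total


for name, reported in OVERALL.items():
    pooled = pooled_mean(ORIGINAL[name], PARAPHRASE[name])
    unweighted = (ORIGINAL[name] + PARAPHRASE[name]) / 2
    print(f"{name:<14} pooled={pooled:.3f} unweighted={unweighted:.3f} reported={reported:.3f}")
```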

Latency budget per stage

  • Intent classification: 150 ms
  • Query encode (BGE-M3): 1.7 ms
  • BM25 retrieve top-100: 0.1 ms
  • Dense cosine search: 0.6 ms
  • KG Cypher (skill overlap): 25 ms
  • RRF fusion: 2.5 ms
  • Cross-encoder rerank (top-25): 285 ms
  • Total, full pipeline (all stages above except intent classification): 315 ms
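The listed stage times reach the 315 ms total only if intent classification is left out. With it included, the sum is about 465 ms, still well under the p95 < 2000 ms target. The arithmetic below assumes the stages run sequentially and that the figures can be added directly. The source does not say whether they are means or percentiles, so the comparison with a p95 target is only indicative. Stage keys are hypothetical labels.

```python
# Per-stage latencies in ms, copied from the latency budget list.
STAGES_MS = {
    "intent_classification": 150.0,
    "query_encode_bge_m3": 1.7,
    "bm25_top100": 0.1,
    "dense_cosine": 0.6,
    "kg_cypher": 25.0,
    "rrf_fusion": 2.5,
    "cross_encoder_top25": 285.0,
}
P95_TARGET_MS = 2000.0  # headline target; stage figures may not be p95 values

retrieval_and_rerank = sum(
    ms for stage, ms in STAGES_MS.items() if stage != "intent_classification"
)
with_intent = sum(STAGES_MS.values())

print(f"{retrieval_and_rerank:.1f} ms")  # 314.9 ms
print(f"{with_intent:.1f} ms")           # 464.9 ms
print(with_intent < P95_TARGET_MS)       # True
```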

Findings

  • RRF3 (BM25 + dense + KG) is the new best on the original (lexical-anchor) stratum, with nDCG@10 = 0.621.
  • Full pipeline (RRF3 + cross-encoder rerank top-25) is the new best on the paraphrase stratum (nDCG@10 = 0.224).
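  • Each retrieval stage favors a different query distribution, and no single configuration wins both strata. The full pipeline comes closest: it leads on paraphrase and trails RRF3 on the original stratum by only 0.016 nDCG@10 (0.605 vs 0.621).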
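  • The cross-encoder lifts paraphrase metrics. Adding it to RRF3 raises paraphrase nDCG@10 from 0.217 to 0.224, and the standalone rerank configuration (0.220) beats BM25, dense, and RRF (all 0.201). This is consistent with bge-reranker-v2-m3 being trained on natural-language pairs (MS-MARCO, MIRACL). It is not the only stage that helps, though: adding the KG channel (RRF → RRF3) also raises paraphrase nDCG@10, from 0.201 to 0.217.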
  • The paraphrase stratum has 19 of a planned 100 queries, capped by the Gemini Flash-Lite requests-per-day (RPD) quota. Backfill is pending, though the relative ranking is already established.