Title: AUDITING HNSW INDEX LEAKAGE: RECOVERING EMBEDDING GEOMETRY FROM GRAPH TOPOLOGY FARS
PDF: hnsw-topology-deanonymization.pdf
Score: 4.2
Verdict: Reject
Confidence: 0.60
Elapsed: 57.4s

Strengths:
1. Novel and timely security analysis: The paper identifies a previously underappreciated attack surface—HNSW index topology—and demonstrates that it leaks more geometric information than adjacency lists alone. This is a meaningful security finding for the RAG/vector database community (Section 1, Section 5 'Security Implications').
2. Well-motivated technical insight with empirical support: The degree-penalized edge weighting (Equation 1) is intuitive and directly addresses a real problem (hub distortion in small-world graphs). The sanity check against Erdős-Rényi random graphs (Table 3) compellingly shows the attack exploits genuine metric structure, not mere graph density—ER yields near-chance recall (0.0013) vs. HNSW's 0.3475, a 269× gap.
3. Dimension-dependent finding adds nuance: The discovery that the attack helps on high-dimensional (768-d MSMARCO-10K, +28% Recall@10) but hurts on low-dimensional (128-d SIFT10K, -23%) data is an interesting and non-obvious result with practical implications for risk assessment (Table 1, Section 4.2).
4. Robustness to construction randomness: Results are consistent across three HNSW construction seeds with standard deviation <0.003 (Section 4.2), lending credibility to the findings.

Weaknesses:
1. Extremely limited experimental scale: Only two datasets, both at n=10,000. Real HNSW deployments operate at millions to billions of vectors. The authors acknowledge this limitation (Section 5) but it fundamentally undermines the practical relevance of the security claim. Whether degree-penalized geodesic embedding scales to n=1M+ with acceptable computational cost and reconstruction quality is entirely unknown.
2. Main result is modest and methodologically questionable: Recall@10 of 0.4164 means the attacker recovers fewer than half of the true 10-nearest neighbors even under the most favorable (high-dim) setting. The '28% improvement' claim is relative (0.3248→0.4164), while the absolute gain is only +0.092 in Recall@10. This raises the question: is this level of leakage practically threatening? The paper does not discuss what an attacker could actually do with ~42% kNN recovery (Section 4.2).
3. Missing critical baselines and comparisons: No comparison to ordinal embedding methods (Terada & Luxburg 2014, Cucuringu & Woodworth 2015) cited in related work, which also aim to reconstruct geometry from graph structure. No comparison to other graph-based reconstruction approaches (e.g., Laplacian eigenmaps, node2vec-style embeddings). The adjacency-only baseline is trivially simple; the absence of more sophisticated baselines makes it impossible to assess whether degree-penalized geodesic embedding is the best approach or merely a reasonable one (Section 4.1, Table 1).
4. Ablation studies use mismatched hyperparameters: Table 2 ablation uses α=1.0, d'=32, L=256, but the main results in Table 1 use 'optimized parameters (L=2000–3000, d'=128, α=3–4)' (Section 4.1). The ablation thus does not actually explain the main result's behavior—why does the method work better on high-dim data? The ablation is only on SIFT10K (the low-dim dataset where the method underperforms), with no ablation on MSMARCO-10K (Table 2 caption).
5. No statistical significance testing: Despite claiming 'highly consistent' results across seeds (Section 4.2), no p-values, confidence intervals, or formal significance tests are reported. With only 3 seeds and std <0.003, the comparison between methods may not be statistically meaningful, especially for small absolute differences.
6. Incomplete threat model analysis: The paper assumes access to the full layer-0 adjacency list, which is a strong assumption. In many deployments, an attacker who can read the index file could also directly access the vectors. The scenario where topology is leaked but vectors remain secret is not clearly motivated as realistic (Section 3.1).

Must Fix Items:
1. Add ablation studies on MSMARCO-10K (the high-dimensional dataset where the method actually works) to explain the dimension-dependent effect, rather than only ablating on SIFT10K where the method underperforms.
2. Add at least one non-trivial reconstruction baseline from the ordinal/graph embedding literature (e.g., Cucuringu & Woodworth 2015 synchronization method, or Laplacian eigenmaps) to properly contextualize the contribution.
3. Report statistical significance or confidence intervals for all key comparisons, not just standard deviations across 3 seeds.
4. Clarify the realistic threat scenario: when would an attacker have access to the full HNSW adjacency list but not the underlying vectors? Discuss deployment configurations where this is plausible.

Runs:
- run=1 score=4.2 verdict=Reject confidence=0.6 error=None