{
  "pdf": "secure-simhash-privacy-fingerprints.pdf",
  "title": "DISTANCE-HIDING FINGERPRINTS FOR TEXT EMBED-DINGS VIA SECURE SIMHASH FARS Analemma",
  "elapsed": 184.2,
  "runs_mode": 1,
  "valid_runs": 1,
  "avg_score": 4.2,
  "scores": [
    4.2
  ],
  "score_std": 0,
  "final_verdict": "Reject",
  "final_confidence": 0.6,
  "conference_scores": {
    "soundness": 2.5,
    "presentation": 3,
    "contribution": 2.2,
    "overall_rating": 4.2,
    "confidence": 3
  },
  "strengths": [
    "Clean and elegant theoretical formulation: The XOR composition idea is simple to state and yields a clean closed-form collision probability Psec(s) = 1/2 + 1/2·p(s)^k (Equation 3), making the privacy-utility tradeoff analytically transparent. The derivation is correct and the flattening effect is mathematically well-grounded.",
    "Strong empirical Pareto dominance: Table 1 shows Secure SimHash k=4 L=512 achieves AUC@0.5=0.463 (below random guess) while maintaining Recall@10=0.780, whereas RR-SimHash and Noise-SimHash baselines cannot simultaneously achieve both meaningful privacy and utility. This is a convincing demonstration of the core claim.",
    "Well-designed ablation study: Table 2 properly isolates the XOR composition mechanism from the confound of reduced effective bits by comparing against Compute-Matched SimHash (same number of independent projections but no XOR). The 29-38% relative AUC improvement confirms the privacy gain is structural, not merely from information reduction."
  ],
  "weaknesses": [
    "Single-dataset evaluation is a major limitation: All experiments are conducted only on BEIR Quora (Section 4.1). Quora duplicate-question retrieval is a particularly favorable benchmark because it has high cosine similarities for true pairs (easy near-neighbor/far-neighbor separation). The method's effectiveness on harder benchmarks with less clear similarity structure (e.g., MS MARCO, NFCorpus, or domain-specific datasets) is entirely unknown. The authors acknowledge this only briefly in Section 5.",
    "No formal privacy guarantee: The paper uses AUC against specific attackers as a privacy metric but provides no differential privacy bound or information-theoretic guarantee. The collision probability flattening is argued intuitively ('approaches 0.5') but there is no theorem bounding an adversary's estimation error or proving indistinguishability. This is a significant gap compared to Riazi et al. (2016) which provides information-theoretic privacy bounds, and compared to RR-SimHash which inherits DP guarantees from randomized response. The AUC metric is attacker-dependent and does not guarantee security against stronger adversaries.",
    "Misleading comparison with RR-SimHash and Noise-SimHash — baseline fairness concern: The paper compares against only two specific baselines (RR-SimHash with two α values, Noise-SimHash with two σ values) but does not explore the full parameter space. For instance, RR-SimHash with intermediate α values (e.g., 1.5, 2.0) could potentially achieve Pareto points closer to Secure SimHash. Similarly, Noise-SimHash with σ between 0.01 and 0.05 is not reported. This selective reporting risks unfair comparison (HF_UNFAIR_BASELINE risk). Additionally, the paper does not compare against Riazi et al. (2016)'s secure binary embeddings, which is directly related prior work."
  ],
  "must_fix_items": [
    "Evaluate on at least 2-3 additional BEIR datasets (e.g., MS MARCO, NFCorpus, SciFact) to demonstrate generalizability beyond Quora's favorable similarity distribution.",
    "Provide a formal privacy guarantee (e.g., ε-DP bound on the fingerprint, or indistinguishability theorem), rather than relying solely on empirical AUC against fixed attackers.",
    "Report full parameter sweeps for RR-SimHash and Noise-SimHash baselines to enable fair Pareto frontier comparison, and include Riazi et al. (2016) as a baseline."
  ],
  "runs": [
    {
      "run": 1,
      "score": 4.2,
      "verdict": "Reject",
      "confidence": 0.6,
      "strengths": [
        "Clean and elegant theoretical formulation: The XOR composition idea is simple to state and yields a clean closed-form collision probability Psec(s) = 1/2 + 1/2·p(s)^k (Equation 3), making the privacy-utility tradeoff analytically transparent. The derivation is correct and the flattening effect is mathematically well-grounded.",
        "Strong empirical Pareto dominance: Table 1 shows Secure SimHash k=4 L=512 achieves AUC@0.5=0.463 (below random guess) while maintaining Recall@10=0.780, whereas RR-SimHash and Noise-SimHash baselines cannot simultaneously achieve both meaningful privacy and utility. This is a convincing demonstration of the core claim.",
        "Well-designed ablation study: Table 2 properly isolates the XOR composition mechanism from the confound of reduced effective bits by comparing against Compute-Matched SimHash (same number of independent projections but no XOR). The 29-38% relative AUC improvement confirms the privacy gain is structural, not merely from information reduction."
      ],
      "weaknesses": [
        "Single-dataset evaluation is a major limitation: All experiments are conducted only on BEIR Quora (Section 4.1). Quora duplicate-question retrieval is a particularly favorable benchmark because it has high cosine similarities for true pairs (easy near-neighbor/far-neighbor separation). The method's effectiveness on harder benchmarks with less clear similarity structure (e.g., MS MARCO, NFCorpus, or domain-specific datasets) is entirely unknown. The authors acknowledge this only briefly in Section 5.",
        "No formal privacy guarantee: The paper uses AUC against specific attackers as a privacy metric but provides no differential privacy bound or information-theoretic guarantee. The collision probability flattening is argued intuitively ('approaches 0.5') but there is no theorem bounding an adversary's estimation error or proving indistinguishability. This is a significant gap compared to Riazi et al. (2016) which provides information-theoretic privacy bounds, and compared to RR-SimHash which inherits DP guarantees from randomized response. The AUC metric is attacker-dependent and does not guarantee security against stronger adversaries.",
        "Misleading comparison with RR-SimHash and Noise-SimHash — baseline fairness concern: The paper compares against only two specific baselines (RR-SimHash with two α values, Noise-SimHash with two σ values) but does not explore the full parameter space. For instance, RR-SimHash with intermediate α values (e.g., 1.5, 2.0) could potentially achieve Pareto points closer to Secure SimHash. Similarly, Noise-SimHash with σ between 0.01 and 0.05 is not reported. This selective reporting risks unfair comparison (HF_UNFAIR_BASELINE risk). Additionally, the paper does not compare against Riazi et al. (2016)'s secure binary embeddings, which is directly related prior work."
      ],
      "must_fix_items": [
        "Evaluate on at least 2-3 additional BEIR datasets (e.g., MS MARCO, NFCorpus, SciFact) to demonstrate generalizability beyond Quora's favorable similarity distribution.",
        "Provide a formal privacy guarantee (e.g., ε-DP bound on the fingerprint, or indistinguishability theorem), rather than relying solely on empirical AUC against fixed attackers.",
        "Report full parameter sweeps for RR-SimHash and Noise-SimHash baselines to enable fair Pareto frontier comparison, and include Riazi et al. (2016) as a baseline."
      ],
      "conference_scores": {
        "soundness": 2.5,
        "presentation": 3,
        "contribution": 2.2,
        "overall_rating": 4.2,
        "confidence": 3
      }
    }
  ]
}