{
  "pdf": "dp-eigenspectrum-monitor-logs.pdf",
  "title": "DIFFERENTIALLY PRIVATE EIGENSPECTRUM MONI-TOR LOGS FOR HALLUCINATION DETECTION FARS Analemma",
  "elapsed": 52.4,
  "runs_mode": 1,
  "valid_runs": 1,
  "avg_score": 3.5,
  "scores": [
    3.5
  ],
  "score_std": 0,
  "final_verdict": "Strong Reject",
  "final_confidence": 0.6,
  "conference_scores": {
    "soundness": 2.3,
    "presentation": 2.8,
    "contribution": 2,
    "overall_rating": 3.5,
    "confidence": 3
  },
  "strengths": [
    "Pre-registered decision criterion (Section 3.5) establishes a clear viability threshold (≤5-point AUROC drop) before experiments, preventing post-hoc rationalization of negative results. This is a methodological best practice rarely seen in ML privacy work.",
    "The finding that eigenspectrum compression (40,960→10 dimensions) provides inherent privacy (Table 2: attackers at chance level even without DP noise, Top-1: 0.72% vs 0.50% chance) is a genuinely novel and useful observation for practitioners designing LLM monitoring systems.",
    "Honest reporting of a negative result with careful analysis of why it fails (Section 4.5: nonlinear eigenvalue transformation amplifies perturbations, small eigenvalues especially vulnerable with relative error δ/λ diverging as λ→0). This provides actionable insight for future mechanism design.",
    "The pilot diagnostic framework (Table 1, Section 4.2) accurately predicts full-experiment noise-to-signal ratios within 2-3%, enabling efficient screening of DP mechanisms before costly full evaluations."
  ],
  "weaknesses": [
    "Extremely narrow experimental scope: only one model (Llama-3.1-8B-Instruct), one dataset (modified SQuAD v2.0), one hidden layer (layer 16), one value of K (10), and a single hallucination detection method (INSIDE's EigenScore). The negative conclusion 'DP-protected eigenspectrum monitoring is not viable' is vastly overgeneralized from this single configuration. Different models, layers, or detection methods could yield very different results.",
    "The privacy evaluation is limited to canary-ID attacks only (Section 3.4). This is a narrow threat model—membership inference via synthetic IDs. The paper does not evaluate prompt reconstruction attacks, attribute inference, or training data extraction, which are the primary privacy concerns cited in the motivation (Morris et al., 2023; Nikolaou et al., 2025; Dong et al., 2025). Claiming 'substantial inherent privacy' from chance-level canary-ID attacks overreaches the evidence.",
    "The DP mechanism is applied to vec(Z) before eigenspectrum computation (Section 3.3), which is suboptimal. Adding noise to the 40,960-dimensional vectorized matrix and then computing eigenvalues is wasteful—noise directly perturbing the K=10 eigenvalues or the K×K Gram matrix would be far more efficient and is a natural alternative the paper mentions in Section 4.5 but never tests. The negative result may be an artifact of this suboptimal mechanism placement rather than a fundamental incompatibility.",
    "The clipping radius CZ = 34.05 is set to the 95th percentile (Section 4.1), meaning ~5% of examples are clipped. This introduces bias before any DP noise is added, yet the 'clip-only baseline' is treated as the unattainable gold standard. The actual utility gap attributable to DP alone may be smaller than reported, since clipping itself already degrades signal.",
    "No statistical significance tests are reported for the AUROC comparisons (Table 2). The standard deviations (±0.010, ±0.008, ±0.009) suggest the differences may be statistically significant, but this is never verified. Additionally, the AUROC values themselves are quite low (0.672 baseline), raising questions about whether eigenspectrum monitoring is practically useful even without DP."
  ],
  "must_fix_items": [
    "Soften the generalization of the negative conclusion: 'DP-protected eigenspectrum monitoring is not viable' should be qualified as 'at tested privacy budgets, with the specific mechanism placement (noise on vec(Z)), on this single model/dataset configuration.'",
    "Evaluate at least one alternative DP mechanism placement: noise on the Gram matrix Σ (K×K, only 100 entries for K=10) rather than on vec(Z) (40,960 entries). This is the most obvious improvement and is acknowledged in Section 4.5 but never tested.",
    "Broaden the privacy evaluation beyond canary-ID attacks. At minimum, evaluate a prompt reconstruction or embedding inversion attack to justify the claim of 'substantial inherent privacy.'"
  ],
  "runs": [
    {
      "run": 1,
      "score": 3.5,
      "verdict": "Strong Reject",
      "confidence": 0.6,
      "strengths": [
        "Pre-registered decision criterion (Section 3.5) establishes a clear viability threshold (≤5-point AUROC drop) before experiments, preventing post-hoc rationalization of negative results. This is a methodological best practice rarely seen in ML privacy work.",
        "The finding that eigenspectrum compression (40,960→10 dimensions) provides inherent privacy (Table 2: attackers at chance level even without DP noise, Top-1: 0.72% vs 0.50% chance) is a genuinely novel and useful observation for practitioners designing LLM monitoring systems.",
        "Honest reporting of a negative result with careful analysis of why it fails (Section 4.5: nonlinear eigenvalue transformation amplifies perturbations, small eigenvalues especially vulnerable with relative error δ/λ diverging as λ→0). This provides actionable insight for future mechanism design.",
        "The pilot diagnostic framework (Table 1, Section 4.2) accurately predicts full-experiment noise-to-signal ratios within 2-3%, enabling efficient screening of DP mechanisms before costly full evaluations."
      ],
      "weaknesses": [
        "Extremely narrow experimental scope: only one model (Llama-3.1-8B-Instruct), one dataset (modified SQuAD v2.0), one hidden layer (layer 16), one value of K (10), and a single hallucination detection method (INSIDE's EigenScore). The negative conclusion 'DP-protected eigenspectrum monitoring is not viable' is vastly overgeneralized from this single configuration. Different models, layers, or detection methods could yield very different results.",
        "The privacy evaluation is limited to canary-ID attacks only (Section 3.4). This is a narrow threat model—membership inference via synthetic IDs. The paper does not evaluate prompt reconstruction attacks, attribute inference, or training data extraction, which are the primary privacy concerns cited in the motivation (Morris et al., 2023; Nikolaou et al., 2025; Dong et al., 2025). Claiming 'substantial inherent privacy' from chance-level canary-ID attacks overreaches the evidence.",
        "The DP mechanism is applied to vec(Z) before eigenspectrum computation (Section 3.3), which is suboptimal. Adding noise to the 40,960-dimensional vectorized matrix and then computing eigenvalues is wasteful—noise directly perturbing the K=10 eigenvalues or the K×K Gram matrix would be far more efficient and is a natural alternative the paper mentions in Section 4.5 but never tests. The negative result may be an artifact of this suboptimal mechanism placement rather than a fundamental incompatibility.",
        "The clipping radius CZ = 34.05 is set to the 95th percentile (Section 4.1), meaning ~5% of examples are clipped. This introduces bias before any DP noise is added, yet the 'clip-only baseline' is treated as the unattainable gold standard. The actual utility gap attributable to DP alone may be smaller than reported, since clipping itself already degrades signal.",
        "No statistical significance tests are reported for the AUROC comparisons (Table 2). The standard deviations (±0.010, ±0.008, ±0.009) suggest the differences may be statistically significant, but this is never verified. Additionally, the AUROC values themselves are quite low (0.672 baseline), raising questions about whether eigenspectrum monitoring is practically useful even without DP."
      ],
      "must_fix_items": [
        "Soften the generalization of the negative conclusion: 'DP-protected eigenspectrum monitoring is not viable' should be qualified as 'at tested privacy budgets, with the specific mechanism placement (noise on vec(Z)), on this single model/dataset configuration.'",
        "Evaluate at least one alternative DP mechanism placement: noise on the Gram matrix Σ (K×K, only 100 entries for K=10) rather than on vec(Z) (40,960 entries). This is the most obvious improvement and is acknowledged in Section 4.5 but never tested.",
        "Broaden the privacy evaluation beyond canary-ID attacks. At minimum, evaluate a prompt reconstruction or embedding inversion attack to justify the claim of 'substantial inherent privacy.'"
      ],
      "conference_scores": {
        "soundness": 2.3,
        "presentation": 2.8,
        "contribution": 2,
        "overall_rating": 3.5,
        "confidence": 3
      }
    }
  ]
}