{
  "pdf": "lora-svd-equalization-cl.pdf",
  "title": "DELTA SVD-EQ: POST-HOC SPECTRAL EQUALIZATION FOR LORA CONTINUAL LEARNING FARS Analemma",
  "elapsed": 218.9,
  "runs_mode": 1,
  "valid_runs": 1,
  "avg_score": 3.8,
  "scores": [
    3.8
  ],
  "score_std": 0,
  "final_verdict": "Strong Reject",
  "final_confidence": 0.6,
  "conference_scores": {
    "soundness": 2.2,
    "presentation": 2.8,
    "contribution": 2,
    "overall_rating": 3.8,
    "confidence": 3
  },
  "strengths": [
    "The proposed method is genuinely post-hoc, training-free, and memory-free — a practical and underexplored niche in continual learning. The four-property characterization (training-free, memory-free, post-hoc, norm-preserving) is a clean and useful framing that clearly distinguishes SVD-EQ from prior work like EWC, O-LoRA, and FOREVER (Section 3.4).",
    "The paper includes honest and informative ablation studies. The mean-smoothing and random-spectrum baselines in Table 1 isolate norm shrinkage vs. spectral redistribution, and the null-result correlation analysis in Section 4.5 (Pearson r = −0.027, p = 0.893) that contradicts the motivating hypothesis shows intellectual honesty and strengthens the paper's credibility.",
    "The variance reduction finding (3–10× within-order std reduction, Table 2) is practically significant and arguably the strongest empirical result. Even if the mean BWT improvement is marginal, stabilizing outcomes across seeds is a meaningful contribution for practitioners (Table 2: baseline std 2.5–4.1pp vs. SVD-EQ std 0.4–1.0pp)."
  ],
  "weaknesses": [
    "The primary metric improvement is marginal and not statistically significant: BWT improves by only +2.3pp with p = 0.059 (Section 4.2), which does not reach the standard p < 0.05 threshold. With only 9 runs (3 orders × 3 seeds), statistical power is very low, and the claim of '15% relative reduction in forgetting' overstates a result that is at best marginally significant.",
    "The method fails on 1 out of 3 task orders: Order 6 shows degradation of −1.0pp BWT (Table 2). The paper does not provide a satisfactory explanation for why the method helps some orderings but hurts others. An order-dependent intervention that can harm performance is a significant practical concern that is acknowledged but not analyzed deeply enough.",
    "The ablation results actually undermine the core motivation. Mean-smoothing achieves +2.2pp BWT (nearly identical to SVD-EQ's +2.3pp) while shrinking the norm by 13.3% (Table 1), and the spectral concentration hypothesis is falsified (Section 4.5, r = −0.027). This suggests the equalization mechanism itself is not the primary driver: norm shrinkage is doing most of the work, which contradicts the paper's framing that spectral redistribution is the key insight. The 'norm-preserving' property claimed as a strength may actually be a design limitation.",
    "The experimental scale is very limited: only 4 text classification tasks, a single small model (Qwen3-0.6B), rank r=8, and LoRA applied only to query/value projections (Section 4.1). There is no evaluation on generation tasks, larger models, higher ranks, or full LoRA application, making generalizability highly uncertain."
  ],
  "must_fix_items": [
    "The core mechanistic claim is unsupported: the ablation shows norm shrinkage accounts for nearly all the benefit (+2.2pp for mean-smoothing vs. +2.3pp for SVD-EQ), and the spectral concentration hypothesis is falsified. The paper must either (a) revise its framing to acknowledge that norm shrinkage is the primary mechanism and spectral redistribution provides marginal additional benefit, or (b) provide evidence that spectral redistribution contributes meaningfully beyond what simple norm scaling achieves.",
    "Statistical significance is not achieved (p = 0.059). The paper should either run more seeds/orders to establish significance or substantially moderate the claims about BWT improvement. Claims like '15% relative reduction in forgetting' should be qualified as marginally significant.",
    "The negative result on Order 6 needs deeper analysis. Why does the method hurt for certain orderings? This is critical for practical adoption."
  ],
  "runs": [
    {
      "run": 1,
      "score": 3.8,
      "verdict": "Strong Reject",
      "confidence": 0.6,
      "strengths": [
        "The proposed method is genuinely post-hoc, training-free, and memory-free — a practical and underexplored niche in continual learning. The four-property characterization (training-free, memory-free, post-hoc, norm-preserving) is a clean and useful framing that clearly distinguishes SVD-EQ from prior work like EWC, O-LoRA, and FOREVER (Section 3.4).",
        "The paper includes honest and informative ablation studies. The mean-smoothing and random-spectrum baselines in Table 1 isolate norm shrinkage vs. spectral redistribution, and the null-result correlation analysis in Section 4.5 (Pearson r = −0.027, p = 0.893) that contradicts the motivating hypothesis shows intellectual honesty and strengthens the paper's credibility.",
        "The variance reduction finding (3–10× within-order std reduction, Table 2) is practically significant and arguably the strongest empirical result. Even if the mean BWT improvement is marginal, stabilizing outcomes across seeds is a meaningful contribution for practitioners (Table 2: baseline std 2.5–4.1pp vs. SVD-EQ std 0.4–1.0pp)."
      ],
      "weaknesses": [
        "The primary metric improvement is marginal and not statistically significant: BWT improves by only +2.3pp with p = 0.059 (Section 4.2), which does not reach the standard p < 0.05 threshold. With only 9 runs (3 orders × 3 seeds), statistical power is very low, and the claim of '15% relative reduction in forgetting' overstates a result that is at best marginally significant.",
        "The method fails on 1 out of 3 task orders: Order 6 shows degradation of −1.0pp BWT (Table 2). The paper does not provide a satisfactory explanation for why the method helps some orderings but hurts others. An order-dependent intervention that can harm performance is a significant practical concern that is acknowledged but not analyzed deeply enough.",
        "The ablation results actually undermine the core motivation. Mean-smoothing achieves +2.2pp BWT (nearly identical to SVD-EQ's +2.3pp) while shrinking the norm by 13.3% (Table 1), and the spectral concentration hypothesis is falsified (Section 4.5, r = −0.027). This suggests the equalization mechanism itself is not the primary driver: norm shrinkage is doing most of the work, which contradicts the paper's framing that spectral redistribution is the key insight. The 'norm-preserving' property claimed as a strength may actually be a design limitation.",
        "The experimental scale is very limited: only 4 text classification tasks, a single small model (Qwen3-0.6B), rank r=8, and LoRA applied only to query/value projections (Section 4.1). There is no evaluation on generation tasks, larger models, higher ranks, or full LoRA application, making generalizability highly uncertain."
      ],
      "must_fix_items": [
        "The core mechanistic claim is unsupported: the ablation shows norm shrinkage accounts for nearly all the benefit (+2.2pp for mean-smoothing vs. +2.3pp for SVD-EQ), and the spectral concentration hypothesis is falsified. The paper must either (a) revise its framing to acknowledge that norm shrinkage is the primary mechanism and spectral redistribution provides marginal additional benefit, or (b) provide evidence that spectral redistribution contributes meaningfully beyond what simple norm scaling achieves.",
        "Statistical significance is not achieved (p = 0.059). The paper should either run more seeds/orders to establish significance or substantially moderate the claims about BWT improvement. Claims like '15% relative reduction in forgetting' should be qualified as marginally significant.",
        "The negative result on Order 6 needs deeper analysis. Why does the method hurt for certain orderings? This is critical for practical adoption."
      ],
      "conference_scores": {
        "soundness": 2.2,
        "presentation": 2.8,
        "contribution": 2,
        "overall_rating": 3.8,
        "confidence": 3
      }
    }
  ]
}