{
  "pdf": "a94e486a-d870-4ced-93c6-1f1f577f39d1.pdf",
  "title": "MIDPC LORA: INTERMEDIATE SVD SLICES CONTINUAL LEARNING WITH LOW-RANK ADAPTATION FARS",
  "elapsed": 81.0,
  "runs_mode": 1,
  "valid_runs": 1,
  "avg_score": 4.8,
  "scores": [
    4.8
  ],
  "score_std": 0.0,
  "final_verdict": "Reject",
  "final_confidence": 0.78,
  "conference_scores": null,
  "strengths": [
    "The paper identifies a legitimate and previously unexplored design choice in the SVD-slice initialization space: the choice of which singular components to use for LoRA initialization is not binary (top vs. bottom), and the intermediate region is worth investigating. The unifying framework (Eq. 1–2, Section 3.3) that shows PiSSA (s=0), MiLoRA (s=rmax−r), and MidPC (s=⌊rmax/2⌋) are special cases of a single parameterized initialization is clean and intellectually satisfying. Evidence: Section 3.3, Equations 1–2, Figure 1.",
    "The learning rate matching experiment (Table 2, Section 4.4) is a meaningful confound control. Different SVD slices have different singular value magnitudes, which change effective learning rates. The 91.9% gap retention vs MiLoRA after LR matching provides evidence that the MidPC advantage over MiLoRA is not merely a conditioning artifact. This is a non-trivial experimental control that many similar papers omit. Evidence: Table 2, Section 4.4.",
    "MidPC demonstrates notably stable BWT (−1.7%) across both task orders (Table 3), while MiLoRA shows order-dependent variance (BWT ranges from −7.2% to −2.8%). This order-robustness is a practically useful property. Evidence: Table 3, Section 4.5."
  ],
  "weaknesses": [
    "The core contribution is extremely thin: it is a one-line change to an existing initialization scheme (set s = rmax/2 instead of s = 0 or s = rmax − r). There is no theoretical justification for why the middle slice should be optimal; the paper offers only a hypothesis (Section 3.2) and 'intuition' (Section 3.4). The paper does not derive any property of the middle slice (e.g., conditioning bounds, gradient dynamics, interference analysis) that would distinguish it from any other fixed slice index. The entire contribution reduces to a single hyperparameter setting, validated empirically on one model and one benchmark. Evidence: Section 3.3 (Eq. 1 simply slices the SVD at index s), Section 3.4 (an 'Intuition' section with no formal content).",
    "Evaluation is critically narrow: single model (Qwen3-0.6B), single benchmark (FOREVER Standard CL, 5 text classification tasks), single rank (r=8), and single start index (s=508). No sweep over s values is presented anywhere. This is especially problematic because the central claim is that the *intermediate* region is best, yet no evidence shows what happens at, e.g., s=128, s=256, s=384, s=640, s=768, or s=896. Without such a sweep, we cannot tell whether performance follows a smooth curve (consistent with the 'intermediate is best' story) or whether s=508 simply happens to work well for this particular model/benchmark/rank combination. Evidence: Table 1 (only 3 s values reported: 0, 508, 1016); no figure or table shows OP/BWT as a function of s.",
    "The paper claims statistical significance (Section 4.2: 'p < 0.01 vs PiSSA, p < 0.05 vs MiLoRA') but reports only n=6 runs (2 orders × 3 seeds). With n=6 and the large standard deviations reported (PiSSA: ±4.9/±6.5, MiLoRA: ±2.7/±2.5, MidPC: ±1.3/±2.0), these significance tests are fragile. More critically, the FOREVER† baseline results (Table 1) come from a fundamentally different setup (4 tasks, different model) and are included for comparison despite being non-comparable; the paper does not acknowledge this confound clearly enough. The comparison against PiSSA is further weakened by the LR-matching analysis itself: only 35.2% of the BWT gap is retained after LR matching, meaning nearly two-thirds of MidPC's advantage over PiSSA is attributable to a conditioning artifact rather than a genuine spectral effect. Evidence: Table 1 (FOREVER† footnote), Table 2 (35.2% gap retained vs PiSSA), Section 4.2 (significance claims with n=6)."
  ],
  "must_fix_items": [
    "Add a sweep over multiple s values (e.g., s ∈ {0, 128, 256, 384, 508, 640, 768, 896, 1016}) to demonstrate that the intermediate region systematically outperforms endpoints, rather than just comparing three points. Without this, the 'intermediate is best' claim is supported by exactly one data point (s=508).",
    "Provide theoretical or analytical justification for why s = rmax/2 is a good choice. Currently the paper offers only hand-waving intuition (Section 3.4). At minimum, derive conditions under which the intermediate slice provides better stability-plasticity trade-off, or analyze the gradient dynamics as a function of slice position.",
    "Evaluate on at least one additional model (e.g., a 1B+ model) and one additional continual learning benchmark. Single-model, single-benchmark evaluation is insufficient for a general claim about SVD initialization strategy."
  ],
  "runs": [
    {
      "run": 1,
      "score": 4.8,
      "verdict": "Reject",
      "confidence": 0.78,
      "strengths": [
        "The paper identifies a legitimate and previously unexplored design choice in the SVD-slice initialization space: the choice of which singular components to use for LoRA initialization is not binary (top vs. bottom), and the intermediate region is worth investigating. The unifying framework (Eq. 1–2, Section 3.3) that shows PiSSA (s=0), MiLoRA (s=rmax−r), and MidPC (s=⌊rmax/2⌋) are special cases of a single parameterized initialization is clean and intellectually satisfying. Evidence: Section 3.3, Equations 1–2, Figure 1.",
        "The learning rate matching experiment (Table 2, Section 4.4) is a meaningful confound control. Different SVD slices have different singular value magnitudes, which change effective learning rates. The 91.9% gap retention vs MiLoRA after LR matching provides evidence that the MidPC advantage over MiLoRA is not merely a conditioning artifact. This is a non-trivial experimental control that many similar papers omit. Evidence: Table 2, Section 4.4.",
        "MidPC demonstrates notably stable BWT (−1.7%) across both task orders (Table 3), while MiLoRA shows order-dependent variance (BWT ranges from −7.2% to −2.8%). This order-robustness is a practically useful property. Evidence: Table 3, Section 4.5."
      ],
      "weaknesses": [
        "The core contribution is extremely thin: it is a one-line change to an existing initialization scheme (set s = rmax/2 instead of s = 0 or s = rmax − r). There is no theoretical justification for why the middle slice should be optimal; the paper offers only a hypothesis (Section 3.2) and 'intuition' (Section 3.4). The paper does not derive any property of the middle slice (e.g., conditioning bounds, gradient dynamics, interference analysis) that would distinguish it from any other fixed slice index. The entire contribution reduces to a single hyperparameter setting, validated empirically on one model and one benchmark. Evidence: Section 3.3 (Eq. 1 simply slices the SVD at index s), Section 3.4 (an 'Intuition' section with no formal content).",
        "Evaluation is critically narrow: single model (Qwen3-0.6B), single benchmark (FOREVER Standard CL, 5 text classification tasks), single rank (r=8), and single start index (s=508). No sweep over s values is presented anywhere. This is especially problematic because the central claim is that the *intermediate* region is best, yet no evidence shows what happens at, e.g., s=128, s=256, s=384, s=640, s=768, or s=896. Without such a sweep, we cannot tell whether performance follows a smooth curve (consistent with the 'intermediate is best' story) or whether s=508 simply happens to work well for this particular model/benchmark/rank combination. Evidence: Table 1 (only 3 s values reported: 0, 508, 1016); no figure or table shows OP/BWT as a function of s.",
        "The paper claims statistical significance (Section 4.2: 'p < 0.01 vs PiSSA, p < 0.05 vs MiLoRA') but reports only n=6 runs (2 orders × 3 seeds). With n=6 and the large standard deviations reported (PiSSA: ±4.9/±6.5, MiLoRA: ±2.7/±2.5, MidPC: ±1.3/±2.0), these significance tests are fragile. More critically, the FOREVER† baseline results (Table 1) come from a fundamentally different setup (4 tasks, different model) and are included for comparison despite being non-comparable; the paper does not acknowledge this confound clearly enough. The comparison against PiSSA is further weakened by the LR-matching analysis itself: only 35.2% of the BWT gap is retained after LR matching, meaning nearly two-thirds of MidPC's advantage over PiSSA is attributable to a conditioning artifact rather than a genuine spectral effect. Evidence: Table 1 (FOREVER† footnote), Table 2 (35.2% gap retained vs PiSSA), Section 4.2 (significance claims with n=6)."
      ],
      "must_fix_items": [
        "Add a sweep over multiple s values (e.g., s ∈ {0, 128, 256, 384, 508, 640, 768, 896, 1016}) to demonstrate that the intermediate region systematically outperforms endpoints, rather than just comparing three points. Without this, the 'intermediate is best' claim is supported by exactly one data point (s=508).",
        "Provide theoretical or analytical justification for why s = rmax/2 is a good choice. Currently the paper offers only hand-waving intuition (Section 3.4). At minimum, derive conditions under which the intermediate slice provides better stability-plasticity trade-off, or analyze the gradient dynamics as a function of slice position.",
        "Evaluate on at least one additional model (e.g., a 1B+ model) and one additional continual learning benchmark. Single-model, single-benchmark evaluation is insufficient for a general claim about SVD initialization strategy."
      ],
      "conference_scores": null
    }
  ]
}
