{
  "pdf": "overlap-lbfgs-collocation-resampling.pdf",
  "title": "OVERLAP-RESAMPLED L-BFGS PHYSICS-INFORMED NEURAL NETWORKS",
  "elapsed": 54.9,
  "runs_mode": 1,
  "valid_runs": 1,
  "avg_score": 4,
  "scores": [
    4
  ],
  "score_std": 0,
  "final_verdict": "Reject",
  "final_confidence": 0.6,
  "conference_scores": {
    "soundness": 2.5,
    "presentation": 2.8,
    "contribution": 2.2,
    "overall_rating": 4,
    "confidence": 3
  },
  "strengths": [
    "Identifies and addresses a real and practical incompatibility in PINN training: L-BFGS requires consistent gradients but collocation resampling changes the loss function. This tension is well-known among practitioners but had no principled solution. (Section 3.1, Eq. 2-3)",
    "The overlap-set curvature pair computation (Eq. 3) is a clean, principled adaptation of multi-batch L-BFGS (Berahas et al., 2016) to the PINN setting. The mapping from mini-batch stochasticity to collocation resampling is natural and the solution is algorithmically simple—compute gradient differences only on shared points. (Section 3.2)",
    "Cautious update rule (Eq. 4) provides practical robustness: the ablation in Table 3 shows the method works stably even at o=0.25, well below the theoretical o≥0.8 threshold, with skip rate adapting from 15.2% to 21.4% as overlap decreases. This demonstrates the filtering mechanism effectively handles noisy curvature estimates. (Section 3.3, Table 3, Figure 2b)"
  ],
  "weaknesses": [
    "Extremely limited experimental evaluation: only two benchmark problems (1D ice-shelf inverse, 2D Poisson forward), each relatively simple. No higher-dimensional problems, no time-dependent PDEs, no problems with complex geometries, and no comparison against adaptive sampling methods like RAD/RAR-G that also address coverage. This makes it hard to assess generalizability. (Section 4, Tables 1-3)",
    "Statistical rigor is weak: only 3 random seeds per configuration, and standard deviations are large relative to means (e.g., Berr = 8.06 ± 4.71 for Overlap-LBFGS in Table 1, where the std is 58% of the mean). The 7% improvement over Adam+Resampling (8.06 vs 8.63 × 10^-4) is far smaller than that standard deviation and cannot be considered statistically significant with only 3 seeds, yet it is highlighted as a key result. (Table 1, Section 4.2)",
    "On the forward problem (Table 2), Overlap-LBFGS is 2.1× worse than the standard Adam→Fixed L-BFGS baseline (7.03 vs 3.42 × 10^-4) while using a 67% larger gradient budget (50K vs 30K). The paper frames this as acceptable, but the method's advantage is confined to inverse problems where resampling helps; on the one canonical forward problem tested, it is inferior to the standard approach in both error and cost. (Table 2, Section 4.3)",
    "The three-phase pipeline introduces multiple hyperparameters (phase transition points, overlap fraction o, cautious update threshold ε) that require problem-specific tuning. The paper provides no guidance on how to set these, and the warmstart ablation (Section 4.6) shows catastrophic failure without it (19× worse), suggesting the method is brittle to pipeline configuration. (Section 3.4, Section 4.6)"
  ],
  "must_fix_items": [
    "Add statistical significance testing (e.g., paired t-test or bootstrap) for the ice-shelf results—the claimed 7% improvement over Adam+Resampling is within noise given the large standard deviations and only 3 seeds.",
    "Report results with matched budgets for Table 2: Overlap-LBFGS uses a 50K gradient budget vs 30K for baselines, making the comparison unfair. Either match budgets or explicitly account for the budget difference in the analysis.",
    "Test on two or three additional problem types (e.g., a time-dependent PDE, a higher-dimensional problem, or a problem with complex geometry) to demonstrate generalizability beyond the two simple benchmarks."
  ],
  "runs": [
    {
      "run": 1,
      "score": 4,
      "verdict": "Reject",
      "confidence": 0.6,
      "strengths": [
        "Identifies and addresses a real and practical incompatibility in PINN training: L-BFGS requires consistent gradients but collocation resampling changes the loss function. This tension is well-known among practitioners but had no principled solution. (Section 3.1, Eq. 2-3)",
        "The overlap-set curvature pair computation (Eq. 3) is a clean, principled adaptation of multi-batch L-BFGS (Berahas et al., 2016) to the PINN setting. The mapping from mini-batch stochasticity to collocation resampling is natural and the solution is algorithmically simple—compute gradient differences only on shared points. (Section 3.2)",
        "Cautious update rule (Eq. 4) provides practical robustness: the ablation in Table 3 shows the method works stably even at o=0.25, well below the theoretical o≥0.8 threshold, with skip rate adapting from 15.2% to 21.4% as overlap decreases. This demonstrates the filtering mechanism effectively handles noisy curvature estimates. (Section 3.3, Table 3, Figure 2b)"
      ],
      "weaknesses": [
        "Extremely limited experimental evaluation: only two benchmark problems (1D ice-shelf inverse, 2D Poisson forward), each relatively simple. No higher-dimensional problems, no time-dependent PDEs, no problems with complex geometries, and no comparison against adaptive sampling methods like RAD/RAR-G that also address coverage. This makes it hard to assess generalizability. (Section 4, Tables 1-3)",
        "Statistical rigor is weak: only 3 random seeds per configuration, and standard deviations are large relative to means (e.g., Berr = 8.06 ± 4.71 for Overlap-LBFGS in Table 1, where the std is 58% of the mean). The 7% improvement over Adam+Resampling (8.06 vs 8.63 × 10^-4) is far smaller than that standard deviation and cannot be considered statistically significant with only 3 seeds, yet it is highlighted as a key result. (Table 1, Section 4.2)",
        "On the forward problem (Table 2), Overlap-LBFGS is 2.1× worse than the standard Adam→Fixed L-BFGS baseline (7.03 vs 3.42 × 10^-4) while using a 67% larger gradient budget (50K vs 30K). The paper frames this as acceptable, but the method's advantage is confined to inverse problems where resampling helps; on the one canonical forward problem tested, it is inferior to the standard approach in both error and cost. (Table 2, Section 4.3)",
        "The three-phase pipeline introduces multiple hyperparameters (phase transition points, overlap fraction o, cautious update threshold ε) that require problem-specific tuning. The paper provides no guidance on how to set these, and the warmstart ablation (Section 4.6) shows catastrophic failure without it (19× worse), suggesting the method is brittle to pipeline configuration. (Section 3.4, Section 4.6)"
      ],
      "must_fix_items": [
        "Add statistical significance testing (e.g., paired t-test or bootstrap) for the ice-shelf results—the claimed 7% improvement over Adam+Resampling is within noise given the large standard deviations and only 3 seeds.",
        "Report results with matched budgets for Table 2: Overlap-LBFGS uses a 50K gradient budget vs 30K for baselines, making the comparison unfair. Either match budgets or explicitly account for the budget difference in the analysis.",
        "Test on two or three additional problem types (e.g., a time-dependent PDE, a higher-dimensional problem, or a problem with complex geometry) to demonstrate generalizability beyond the two simple benchmarks."
      ],
      "conference_scores": {
        "soundness": 2.5,
        "presentation": 2.8,
        "contribution": 2.2,
        "overall_rating": 4,
        "confidence": 3
      }
    }
  ]
}