Title: LOCAL-TIME ADAMW FOR STABILITY-GAP REDUCTION IN CONTINUAL LEARNING FARS Analemma
PDF: localtime-adamw-task-switch.pdf
Score: 4.5
Verdict: Reject
Confidence: 0.60
Elapsed: 62.3s

Strengths:
1. Clean mechanistic insight: The paper identifies a specific, concrete mechanism (AdamW bias correction becoming counterproductive at task boundaries) and traces its effect through a well-defined causal chain from timestep reset → scale factor s(1)≈0.316 → ~3× update dampening → reduced stability gap. This is a genuine mechanistic contribution rather than a black-box trick. (Sections 3.2-3.3, Equation 4, Figure 1)
2. Strong mechanism validation: The empirical update norm ratios (3.42× on Split CIFAR-100, 3.65× on Rotated MNIST) closely match the theoretical prediction of √10 ≈ 3.16, providing direct confirmation that the proposed mechanism operates as claimed. The NoBiasCorr control condition is a particularly valuable design choice—it shows that simply disabling bias correction does not help (SG worsens from 0.486 to 0.519 on Split CIFAR-100), confirming the benefit is specifically from the timestep reset dampening schedule, not from any bias-correction modification. (Section 4.3, Figure 3, Table 1)
3. Extreme simplicity and practicality: The method requires literally one line of code (setting optimizer.state[p]['step'] = 0 for each parameter p at a task boundary), introduces no new hyperparameters, and is a drop-in replacement for standard AdamW. This is a rare case where the simplicity is justified by the mechanistic explanation rather than being a limitation. (Section 3.4)
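
The scale factor and dampening ratio quoted in Strengths 1 and 2 can be reproduced with a few lines of arithmetic. The sketch below assumes the standard Adam bias-correction form and the common defaults β1 = 0.9, β2 = 0.999 (this review does not state the values the paper uses); the function name is illustrative and not taken from the paper.

```python
import math

def bias_correction_scale(t, beta1=0.9, beta2=0.999):
    """Ratio of Adam's bias-corrected step to the uncorrected m / sqrt(v).

    Assumption: the moment buffers m and v are carried over (already warm)
    and only the step counter t is reset; eps is ignored.
    """
    if t < 1:
        raise ValueError("t must be >= 1")
    return math.sqrt(1.0 - beta2 ** t) / (1.0 - beta1 ** t)

s1 = bias_correction_scale(1)
print(f"s(1) = {s1:.3f}, first-step dampening = {1.0 / s1:.2f}x")
```

Under these assumptions s(1) = √0.001 / 0.1 ≈ 0.316, so the first update after a reset is about √10 ≈ 3.16× smaller than it would be without the reset, and s(t) approaches 1 for large t.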

Weaknesses:
1. Extremely narrow experimental evaluation—only 2 benchmarks (Rotated MNIST, Split CIFAR-100) with a single architecture (ResNet-18). Rotated MNIST is a very simple benchmark; Split CIFAR-100, while more challenging, is still a standard small-scale benchmark. No evaluation on larger-scale benchmarks (e.g., Split ImageNet-R, DomainNet, CLEAR, or any NLP/RL continual learning benchmarks). The paper does not test whether the insight generalizes beyond these simple vision settings or to different architectures (transformers, MLPs). This severely limits confidence in the generality of the claims. (Section 4.1, Table 1)
2. No comparison with existing stability-gap mitigation methods or standard continual learning baselines. The paper only compares against CarryAll-AdamW and NoBiasCorr (a control), but does not compare against any replay method (e.g., ER, DER), any regularization method (e.g., EWC, SI), or any method that specifically targets the stability gap (e.g., methods from De Lange et al. 2022 or subsequent work). It is unclear whether LT-AdamW provides additive benefits on top of these methods or is redundant. The paper mentions combining with replay as 'future work', but this is a critical omission for evaluating practical impact. (Section 4, Tables 1-2)
3. The NoBiasCorr control has a fundamental confound: disabling bias correction changes the optimizer's behavior globally (at all timesteps, not just at task boundaries), making it an imperfect control for isolating the timestep-reset mechanism. A better control would be an optimizer that resets the timestep AND the moment buffers (full reset), which would disentangle the contribution of preserved moments vs. the dampening schedule. Additionally, the paper does not compare against simply reducing the learning rate at task boundaries (e.g., by 3×) with a warmup schedule, which would be the most natural alternative and would test whether the benefit is specifically from the bias-correction mechanism or just from dampening updates generally. (Sections 4.2-4.3)
4. Statistical concerns: On Rotated MNIST, ACC slightly degrades rather than improves (0.628→0.621), and the improvement in min-ACC is modest (0.688→0.719). With only 5 seeds and the high variance visible in the error bars (e.g., LT-AdamW SG on Split CIFAR-100: 0.333±0.066, NoBiasCorr SG: 0.519±0.066; the ± intervals are separated by only about 0.05), the statistical robustness is questionable. The p-values are reported in text but no multiple-comparison correction is mentioned. (Table 1, Section 4.2)
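
Weakness 4 notes that no multiple-comparison correction is mentioned. One common option is the Holm-Bonferroni step-down procedure. The sketch below is a generic implementation of that procedure, not something the paper is known to use, and the p-values in the usage lines are made-up placeholders rather than figures from the paper.

```python
def holm_bonferroni(p_values, alpha=0.05):
    """Return one boolean per hypothesis: True where the null is rejected."""
    m = len(p_values)
    # Visit hypotheses from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, idx in enumerate(order):
        # The threshold loosens as hypotheses are rejected: alpha/m, alpha/(m-1), ...
        if p_values[idx] <= alpha / (m - rank):
            reject[idx] = True
        else:
            break  # step-down: stop at the first non-rejection
    return reject

# Placeholder p-values for illustration only (NOT from the paper).
placeholder_p = [0.01, 0.04, 0.03, 0.20]
decisions = holm_bonferroni(placeholder_p)
print(decisions)
```

With these placeholder inputs only the smallest p-value survives correction, which illustrates how several nominally significant comparisons can stop being significant once the number of tests is accounted for.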

Must Fix Items:
1. Add a learning-rate dampening control: compare LT-AdamW against simply multiplying the learning rate by ~0.33 at task boundaries, followed by a linear warmup back to the base rate over the same period in which the bias-correction factor recovers. This is essential to establish that the benefit is specifically from the bias-correction mechanism rather than from generic update dampening.
2. Add comparison with at least one replay-based and one regularization-based continual learning baseline, and test whether LT-AdamW provides complementary benefits when combined with these methods.
3. Evaluate on at least one additional, more challenging benchmark (e.g., Split ImageNet-R, DomainNet, or a task from a different modality) to assess generality beyond small-scale vision benchmarks.
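
The learning-rate dampening control requested in Must Fix item 1 could be as simple as the multiplier sketched below. This is a minimal illustration under stated assumptions: the function name, the `warmup_steps` parameter, and the linear shape are hypothetical choices, and the 0.33 floor is the value suggested in this review, not one taken from the paper.

```python
def boundary_lr_multiplier(steps_since_boundary, warmup_steps, floor=0.33):
    """Learning-rate multiplier after a task switch.

    Starts at `floor` on the first step of a new task and rises linearly
    to 1.0 after `warmup_steps` optimizer steps.
    """
    if warmup_steps <= 0 or steps_since_boundary >= warmup_steps:
        return 1.0
    frac = max(steps_since_boundary, 0) / warmup_steps
    return floor + (1.0 - floor) * frac

# Example: effective learning rate over a hypothetical 100-step warmup.
base_lr = 1e-3
schedule = [base_lr * boundary_lr_multiplier(s, 100) for s in (0, 50, 100)]
print(schedule)
```

In a training loop the counter `steps_since_boundary` would be reset to zero at each task switch while the optimizer's own step count and moment buffers are left untouched, so that any difference from LT-AdamW can be attributed to the bias-correction mechanism rather than to dampening in general.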

Runs:
- run=1 score=4.5 verdict=Reject confidence=0.6 error=None