Evaluation Dashboard

Review Quality

Conference-level performance metrics, per-tier precision/recall, and confusion matrices.

Accuracy: 73.0%
Macro F1:  0.680
Samples:   500

Per-Tier Metrics (ICLR 2025)

Tier         Precision   Recall   F1-Score   Samples
Best Paper       85.0%    70.0%      0.770        12
Oral             72.0%    68.0%      0.700        45
Spotlight        65.0%    60.0%      0.620        80
Poster           78.0%    82.0%      0.800       250
Reject           80.0%    75.0%      0.770       113

Confusion Matrix

Actual \ Pred   Best Paper   Oral   Spotlight   Poster   Reject
Best Paper               8      2           1        1        0
Oral                     1     31           7        5        1
Spotlight                0      5          48       22        5
Poster                   0      2          15      205       28
Reject                   0      0           3       25       85
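The per-tier metrics can be recomputed directly from the confusion matrix: precision divides each diagonal entry by its column sum (everything predicted as that tier), recall divides it by its row sum (everything actually in that tier). A minimal NumPy sketch using the matrix as printed above; the derived figures may differ slightly from the dashboard's headline numbers, which were presumably computed before rounding:

```python
import numpy as np

tiers = ["Best Paper", "Oral", "Spotlight", "Poster", "Reject"]

# Rows = actual tier, columns = predicted tier (values from the matrix above).
cm = np.array([
    [8,   2,   1,   1,   0],   # Best Paper
    [1,  31,   7,   5,   1],   # Oral
    [0,   5,  48,  22,   5],   # Spotlight
    [0,   2,  15, 205,  28],   # Poster
    [0,   0,   3,  25,  85],   # Reject
])

tp = np.diag(cm).astype(float)          # correct predictions per tier
precision = tp / cm.sum(axis=0)         # tp / all predicted as that tier
recall = tp / cm.sum(axis=1)            # tp / all actually in that tier
f1 = 2 * precision * recall / (precision + recall)

accuracy = tp.sum() / cm.sum()          # diagonal mass / total samples
macro_f1 = f1.mean()                    # unweighted mean of per-tier F1

for t, p, r, f in zip(tiers, precision, recall, f1):
    print(f"{t:<12} P={p:6.1%}  R={r:6.1%}  F1={f:.3f}")
print(f"accuracy={accuracy:.1%}  macro_f1={macro_f1:.3f}")
```

Row sums of the matrix reproduce the per-tier sample counts (12, 45, 80, 250, 113), which is a quick sanity check that the matrix and the metrics table describe the same 500 evaluation samples.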