# Evaluation Dashboard

## Review Quality

Conference-level performance metrics, per-tier precision/recall, and the confusion matrix.
| Metric | Value |
|---|---|
| Accuracy | 75.4% |
| Macro F1 | 0.731 |
| Samples | 500 |
## Per-Tier Metrics (ICLR 2025)
| Tier | Precision | Recall | F1-Score | Samples |
|---|---|---|---|---|
| Best Paper | 88.9% | 66.7% | 0.762 | 12 |
| Oral | 77.5% | 68.9% | 0.729 | 45 |
| Spotlight | 64.9% | 60.0% | 0.623 | 80 |
| Poster | 79.5% | 82.0% | 0.807 | 250 |
| Reject | 71.4% | 75.2% | 0.733 | 113 |
## Confusion Matrix
| Actual \ Predicted | Best Paper | Oral | Spotlight | Poster | Reject |
|---|---|---|---|---|---|
| Best Paper | 8 | 2 | 1 | 1 | 0 |
| Oral | 1 | 31 | 7 | 5 | 1 |
| Spotlight | 0 | 5 | 48 | 22 | 5 |
| Poster | 0 | 2 | 15 | 205 | 28 |
| Reject | 0 | 0 | 3 | 25 | 85 |
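As a sanity check, every dashboard figure above is derivable from the confusion matrix alone: precision is the diagonal cell over its column sum, recall is the diagonal cell over its row sum, and accuracy is the diagonal total over all samples. A minimal plain-Python sketch (matrix values copied from the table; no external libraries assumed):

```python
# Recompute per-tier and aggregate metrics from the confusion matrix.
# Rows = actual tier, columns = predicted tier, in the order below.
tiers = ["Best Paper", "Oral", "Spotlight", "Poster", "Reject"]
cm = [
    [8, 2, 1, 1, 0],
    [1, 31, 7, 5, 1],
    [0, 5, 48, 22, 5],
    [0, 2, 15, 205, 28],
    [0, 0, 3, 25, 85],
]

n = len(tiers)
total = sum(sum(row) for row in cm)
accuracy = sum(cm[i][i] for i in range(n)) / total

f1s = []
for i, tier in enumerate(tiers):
    tp = cm[i][i]
    fn = sum(cm[i]) - tp                       # row sum minus diagonal
    fp = sum(cm[r][i] for r in range(n)) - tp  # column sum minus diagonal
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    f1s.append(f1)
    print(f"{tier}: P={precision:.1%} R={recall:.1%} F1={f1:.3f}")

macro_f1 = sum(f1s) / n  # unweighted mean of per-tier F1 scores
print(f"Accuracy={accuracy:.1%} Macro F1={macro_f1:.3f}")
```

Note that macro F1 weights all five tiers equally, so the small Best Paper tier (12 samples) influences it as much as the Poster tier (250 samples); a sample-weighted average would behave differently.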