2026-03-12 19:55:51 +11:00
parent 76851f0816
commit d1237eed44
12 changed files with 1444 additions and 82 deletions

todo.md

@@ -12,7 +12,7 @@ Priority key: `P0` = critical/blocking, `P1` = important, `P2` = later optimizat
 - [x] [P0] Audit `observations_ws90` and `observations_baro` for missingness, gaps, duplicates, and out-of-order rows. (completed on runtime machine)
 - [x] [P0] Validate rain label construction from `rain_mm` (counter resets, negative deltas, spikes). (completed on runtime machine)
 - [x] [P0] Measure class balance by week (rain-positive vs rain-negative). (completed on runtime machine)
-- [ ] [P1] Document known data issues and mitigation rules.
+- [x] [P1] Document known data issues and mitigation rules. (see `docs/rain_data_issues.md`)
 ## 3) Dataset and Feature Engineering
 - [x] [P1] Extract reusable dataset-builder logic from training script into a maintainable module/workflow.
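The label-validation item checked off above turns the cumulative `rain_mm` gauge counter into per-interval rainfall while handling counter resets, negative deltas, and spikes. A minimal sketch of that derivation — assuming `rain_mm` is a monotone counter that occasionally resets to a lower value; the `spike_threshold` here is illustrative, not the project's audited rule:

```python
def rain_deltas(counter_values, spike_threshold=50.0):
    """Convert a cumulative rain_mm counter into per-interval rainfall.

    A negative delta is treated as a counter reset, so the interval's
    rainfall is taken as the new counter value. Implausibly large deltas
    are treated as sensor spikes and zeroed out. Both rules (and the
    threshold) are illustrative assumptions, not the project's actual
    mitigation policy.
    """
    deltas = []
    for prev, curr in zip(counter_values, counter_values[1:]):
        delta = curr - prev
        if delta < 0:
            # Counter reset: count rainfall from the new (post-reset) value.
            delta = curr
        if delta > spike_threshold:
            # Spike: discard as a sensor artifact.
            delta = 0.0
        deltas.append(delta)
    return deltas
```

With this rule, `rain_deltas([0.0, 0.2, 0.5, 0.1])` yields one non-negative delta per interval, treating the drop from 0.5 to 0.1 as a reset rather than negative rainfall.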
@@ -26,16 +26,16 @@ Priority key: `P0` = critical/blocking, `P1` = important, `P2` = later optimizat
 - [x] [P0] Keep logistic regression as baseline.
 - [x] [P1] Add at least one tree-based baseline (e.g. gradient boosting). (implemented via `hist_gb`; runtime evaluation pending local Python deps)
 - [x] [P0] Use strict time-based train/validation/test splits (no random shuffling).
-- [ ] [P1] Add walk-forward backtesting across multiple temporal folds.
-- [ ] [P1] Tune hyperparameters on validation data only.
-- [ ] [P1] Calibrate probabilities (Platt or isotonic) and compare calibration quality.
+- [x] [P1] Add walk-forward backtesting across multiple temporal folds. (`train_rain_model.py --walk-forward-folds`)
+- [x] [P1] Tune hyperparameters on validation data only. (`train_rain_model.py --tune-hyperparameters`)
+- [x] [P1] Calibrate probabilities (Platt or isotonic) and compare calibration quality. (`--calibration-methods`)
 - [x] [P0] Choose and lock the operating threshold based on use-case costs.
 ## 5) Evaluation and Reporting
 - [x] [P0] Report ROC-AUC, PR-AUC, confusion matrix, precision, recall, and Brier score.
-- [ ] [P1] Compare against naive baselines (persistence and simple forecast-threshold rules).
+- [x] [P1] Compare against naive baselines (persistence and simple forecast-threshold rules).
 - [ ] [P2] Slice performance by periods/weather regimes (day/night, rainy weeks, etc.).
-- [ ] [P1] Produce a short model card (data window, features, metrics, known limitations).
+- [x] [P1] Produce a short model card (data window, features, metrics, known limitations). (`--model-card-out`)
 ## 6) Packaging and Deployment
 - [x] [P1] Version model artifacts and feature schema together.
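The walk-forward and calibration items above imply expanding-window splits where each fold trains only on rows strictly before its test window, scored with a proper scoring rule such as the Brier score. A hedged sketch of both ideas — an assumption about how `--walk-forward-folds` might split, not the actual `train_rain_model.py` implementation:

```python
def walk_forward_folds(n_rows, n_folds, min_train=1):
    """Yield (train_indices, test_indices) for expanding-window backtesting.

    Rows are assumed to be in strict time order. Every fold trains on all
    rows before its contiguous test block, so no future data can leak into
    training. Parameter names are illustrative.
    """
    fold_size = (n_rows - min_train) // n_folds
    if fold_size <= 0:
        raise ValueError("not enough rows for the requested number of folds")
    for k in range(n_folds):
        test_start = min_train + k * fold_size
        # Last fold absorbs any remainder rows at the end of the series.
        test_end = n_rows if k == n_folds - 1 else test_start + fold_size
        yield list(range(test_start)), list(range(test_start, test_end))


def brier_score(y_true, p_pred):
    """Mean squared error between predicted probabilities and 0/1 labels.

    Lower is better; useful for comparing calibration methods per fold.
    """
    return sum((p - y) ** 2 for y, p in zip(y_true, p_pred)) / len(y_true)
```

Comparing Platt vs. isotonic calibration then reduces to computing `brier_score` on each fold's held-out test block and averaging across folds.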
@@ -45,10 +45,10 @@ Priority key: `P0` = critical/blocking, `P1` = important, `P2` = later optimizat
 - [ ] [P2] Add scheduled retraining with rollback to last-known-good model.
 ## 7) Monitoring and Operations
-- [ ] [P1] Track feature drift and prediction drift over time.
-- [ ] [P1] Track calibration drift and realized performance after deployment.
-- [ ] [P1] Add alerts for training/inference/data pipeline failures.
-- [ ] [P1] Document runbook for train/evaluate/deploy/rollback.
+- [x] [P1] Track feature drift and prediction drift over time. (view: `rain_feature_drift_daily`, `rain_prediction_drift_daily`)
+- [x] [P1] Track calibration drift and realized performance after deployment. (view: `rain_calibration_drift_daily`)
+- [x] [P1] Add alerts for training/inference/data pipeline failures. (`scripts/check_rain_pipeline_health.py`)
+- [x] [P1] Document runbook for train/evaluate/deploy/rollback. (see `docs/rain_model_runbook.md`)
 ## 8) Immediate Next Steps (This Week)
 - [x] [P0] Run first full data audit and label-quality checks. (completed on runtime machine)
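The drift-tracking views checked off above presumably compare recent feature and prediction distributions against a training-time baseline. One common statistic for that comparison is the population stability index (PSI); this is a sketch of the general technique, not the logic inside the `rain_*_drift_daily` views:

```python
import math


def psi(expected_fracs, actual_fracs, eps=1e-6):
    """Population stability index between two binned distributions.

    Both inputs are per-bin fractions summing to 1 over the same bins
    (e.g. training-time vs. last-24h histograms of a feature or of
    predicted probabilities). Conventional rule of thumb — not calibrated
    to this project: < 0.1 stable, 0.1-0.25 moderate shift, > 0.25
    significant drift.
    """
    total = 0.0
    for expected, actual in zip(expected_fracs, actual_fracs):
        # Clamp to avoid log(0) / division by zero for empty bins.
        e = max(expected, eps)
        a = max(actual, eps)
        total += (a - e) * math.log(a / e)
    return total
```

A daily monitoring job could compute this per feature and per prediction histogram and alert when the index crosses the chosen threshold.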