bugfixes
todo.md
@@ -12,7 +12,7 @@ Priority key: `P0` = critical/blocking, `P1` = important, `P2` = later optimizat
 - [x] [P0] Audit `observations_ws90` and `observations_baro` for missingness, gaps, duplicates, and out-of-order rows. (completed on runtime machine)
 - [x] [P0] Validate rain label construction from `rain_mm` (counter resets, negative deltas, spikes). (completed on runtime machine)
 - [x] [P0] Measure class balance by week (rain-positive vs rain-negative). (completed on runtime machine)
-- [ ] [P1] Document known data issues and mitigation rules.
+- [x] [P1] Document known data issues and mitigation rules. (see `docs/rain_data_issues.md`)
 
 ## 3) Dataset and Feature Engineering
 - [x] [P1] Extract reusable dataset-builder logic from training script into a maintainable module/workflow.
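The label-construction checks ticked off above (counter resets, negative deltas, spikes) can be sketched roughly as follows. This assumes `rain_mm` is a cumulative tip counter; the function name and the 50 mm spike threshold are illustrative, not values taken from the project:

```python
def rain_deltas(counter_mm, spike_mm=50.0):
    """Derive per-interval rainfall from a cumulative rain counter.

    A drop in the counter is treated as a counter reset (the delta is the
    new reading); an implausibly large jump is flagged as a sensor spike
    and zeroed rather than counted as rain.
    """
    deltas, flags = [], []
    prev = None
    for value in counter_mm:
        if value is None:                   # missing reading: no delta
            deltas.append(None)
            flags.append("missing")
        elif prev is None:                  # first usable reading
            deltas.append(0.0)
            flags.append("start")
        elif value < prev:                  # counter reset (e.g. reboot)
            deltas.append(value)
            flags.append("reset")
        elif value - prev > spike_mm:       # out-of-range jump: spike
            deltas.append(0.0)
            flags.append("spike")
        else:
            deltas.append(value - prev)
            flags.append("ok")
        if value is not None:
            prev = value
    return deltas, flags
```

The flag column makes the mitigation rules auditable: each interval records why its delta was kept, zeroed, or replaced.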
@@ -26,16 +26,16 @@ Priority key: `P0` = critical/blocking, `P1` = important, `P2` = later optimizat
 - [x] [P0] Keep logistic regression as baseline.
 - [x] [P1] Add at least one tree-based baseline (e.g. gradient boosting). (implemented via `hist_gb`; runtime evaluation pending local Python deps)
 - [x] [P0] Use strict time-based train/validation/test splits (no random shuffling).
-- [ ] [P1] Add walk-forward backtesting across multiple temporal folds.
-- [ ] [P1] Tune hyperparameters on validation data only.
-- [ ] [P1] Calibrate probabilities (Platt or isotonic) and compare calibration quality.
+- [x] [P1] Add walk-forward backtesting across multiple temporal folds. (`train_rain_model.py --walk-forward-folds`)
+- [x] [P1] Tune hyperparameters on validation data only. (`train_rain_model.py --tune-hyperparameters`)
+- [x] [P1] Calibrate probabilities (Platt or isotonic) and compare calibration quality. (`--calibration-methods`)
 - [x] [P0] Choose and lock the operating threshold based on use-case costs.
 
 ## 5) Evaluation and Reporting
 - [x] [P0] Report ROC-AUC, PR-AUC, confusion matrix, precision, recall, and Brier score.
-- [ ] [P1] Compare against naive baselines (persistence and simple forecast-threshold rules).
+- [x] [P1] Compare against naive baselines (persistence and simple forecast-threshold rules).
 - [ ] [P2] Slice performance by periods/weather regimes (day/night, rainy weeks, etc.).
-- [ ] [P1] Produce a short model card (data window, features, metrics, known limitations).
+- [x] [P1] Produce a short model card (data window, features, metrics, known limitations). (`--model-card-out`)
 
 ## 6) Packaging and Deployment
 - [x] [P1] Version model artifacts and feature schema together.
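The splitting and backtesting items above come down to one invariant: every validation row must be strictly later than every training row, in every fold. A minimal expanding-window fold generator, assuming rows are already sorted by timestamp (the equal fold sizing is an illustrative choice; the actual `--walk-forward-folds` logic may differ):

```python
def walk_forward_folds(n_rows, n_folds):
    """Yield (train_indices, val_indices) for expanding-window walk-forward
    evaluation: each fold trains on everything before its validation window,
    so no future data can leak into training or tuning."""
    fold_size = n_rows // (n_folds + 1)   # first chunk is training-only
    for k in range(1, n_folds + 1):
        train_end = k * fold_size
        val_end = min(train_end + fold_size, n_rows)
        yield list(range(train_end)), list(range(train_end, val_end))
```

Hyperparameter tuning and calibration then operate only on each fold's validation slice, never on the held-out final test window.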
@@ -45,10 +45,10 @@ Priority key: `P0` = critical/blocking, `P1` = important, `P2` = later optimizat
 - [ ] [P2] Add scheduled retraining with rollback to last-known-good model.
 
 ## 7) Monitoring and Operations
-- [ ] [P1] Track feature drift and prediction drift over time.
-- [ ] [P1] Track calibration drift and realized performance after deployment.
-- [ ] [P1] Add alerts for training/inference/data pipeline failures.
-- [ ] [P1] Document runbook for train/evaluate/deploy/rollback.
+- [x] [P1] Track feature drift and prediction drift over time. (view: `rain_feature_drift_daily`, `rain_prediction_drift_daily`)
+- [x] [P1] Track calibration drift and realized performance after deployment. (view: `rain_calibration_drift_daily`)
+- [x] [P1] Add alerts for training/inference/data pipeline failures. (`scripts/check_rain_pipeline_health.py`)
+- [x] [P1] Document runbook for train/evaluate/deploy/rollback. (see `docs/rain_model_runbook.md`)
 
 ## 8) Immediate Next Steps (This Week)
 - [x] [P0] Run first full data audit and label-quality checks. (completed on runtime machine)
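The drift views ticked off above presumably compare a recent window of feature or prediction values against a reference window. One common one-number summary is the population stability index (PSI); the sketch below uses fixed shared bin edges and the conventional ~0.2 "significant drift" rule of thumb, both of which are illustrative assumptions rather than what `rain_feature_drift_daily` necessarily computes:

```python
import math

def psi(reference, current, edges):
    """Population stability index between two samples over shared bin edges.

    PSI near 0 means the distributions match; values above roughly 0.2 are
    commonly treated as significant drift worth investigating.
    """
    def bin_shares(sample):
        counts = [0] * (len(edges) - 1)
        for x in sample:
            for i in range(len(edges) - 1):
                if edges[i] <= x < edges[i + 1]:
                    counts[i] += 1
                    break
        total = sum(counts)
        # Small floor avoids log(0) / division by zero for empty bins.
        return [max(c / total, 1e-6) for c in counts]

    ref, cur = bin_shares(reference), bin_shares(current)
    return sum((c - r) * math.log(c / r) for r, c in zip(ref, cur))
```

Computed daily per feature (and over predicted probabilities), this gives a single series an alerting script like `scripts/check_rain_pipeline_health.py` could threshold on.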