work on model training

2026-03-05 11:03:20 +11:00
parent 96e72d7c43
commit c8e38cd597
10 changed files with 534 additions and 30 deletions

todo.md

@@ -9,9 +9,9 @@ Priority key: `P0` = critical/blocking, `P1` = important, `P2` = later optimizat
- [x] [P0] Freeze training window with explicit UTC start/end timestamps.
## 2) Data Quality and Label Validation
- [ ] [P0] Audit `observations_ws90` and `observations_baro` for missingness, gaps, duplicates, and out-of-order rows. (script ready: `scripts/audit_rain_data.py`; run on runtime machine)
- [ ] [P0] Validate rain label construction from `rain_mm` (counter resets, negative deltas, spikes). (script ready: `scripts/audit_rain_data.py`; run on runtime machine)
- [ ] [P0] Measure class balance by week (rain-positive vs rain-negative). (script ready: `scripts/audit_rain_data.py`; run on runtime machine)
- [x] [P0] Audit `observations_ws90` and `observations_baro` for missingness, gaps, duplicates, and out-of-order rows. (completed on runtime machine)
- [x] [P0] Validate rain label construction from `rain_mm` (counter resets, negative deltas, spikes). (completed on runtime machine)
- [x] [P0] Measure class balance by week (rain-positive vs rain-negative). (completed on runtime machine)
- [ ] [P1] Document known data issues and mitigation rules.
## 3) Dataset and Feature Engineering
@@ -38,10 +38,10 @@ Priority key: `P0` = critical/blocking, `P1` = important, `P2` = later optimizat
- [ ] [P1] Produce a short model card (data window, features, metrics, known limitations).
## 6) Packaging and Deployment
- [ ] [P1] Version model artifacts and feature schema together.
- [x] [P1] Version model artifacts and feature schema together.
- [x] [P0] Implement inference path with feature parity between training and serving.
- [x] [P0] Add prediction storage table for predicted probabilities and realized outcomes.
- [ ] [P1] Expose predictions via API and optionally surface in web dashboard.
- [x] [P1] Expose predictions via API and optionally surface in web dashboard.
- [ ] [P2] Add scheduled retraining with rollback to last-known-good model.
## 7) Monitoring and Operations
@@ -51,7 +51,7 @@ Priority key: `P0` = critical/blocking, `P1` = important, `P2` = later optimizat
- [ ] [P1] Document runbook for train/evaluate/deploy/rollback.
## 8) Immediate Next Steps (This Week)
- [ ] [P0] Run first full data audit and label-quality checks. (blocked here; run on runtime machine)
- [ ] [P0] Train baseline model on full available history and capture metrics. (blocked here; run on runtime machine)
- [x] [P0] Run first full data audit and label-quality checks. (completed on runtime machine)
- [x] [P0] Train baseline model on full available history and capture metrics. (completed on runtime machine)
- [ ] [P1] Add one expanded feature set and rerun evaluation.
- [x] [P0] Decide v1 threshold and define deployment interface.
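
The label-validation item in section 2 (counter resets, negative deltas, spikes in `rain_mm`) could be checked with something like the sketch below. This is not the contents of `scripts/audit_rain_data.py`; it assumes `rain_mm` is a cumulative counter, and the function name `rain_deltas` and the `spike_mm` threshold are hypothetical choices for illustration.

```python
from typing import List, Sequence, Tuple


def rain_deltas(
    readings: Sequence[float], spike_mm: float = 50.0
) -> Tuple[List[float], List[str]]:
    """Convert cumulative rain_mm readings into per-interval deltas.

    Assumption: readings are time-ordered values of a cumulative counter.
    spike_mm is a hypothetical cutoff; tune it to the sampling interval.
    """
    deltas: List[float] = []
    flags: List[str] = []
    prev = None
    for value in readings:
        if prev is None:
            # No previous reading, so no delta can be computed.
            deltas.append(0.0)
            flags.append("first")
        else:
            delta = value - prev
            if delta < 0:
                # Counter went backwards: treat as a reset, not negative rain.
                deltas.append(0.0)
                flags.append("reset")
            elif delta > spike_mm:
                # Implausibly large jump: zero it out and flag for review.
                deltas.append(0.0)
                flags.append("spike")
            else:
                deltas.append(delta)
                flags.append("ok")
        prev = value
    return deltas, flags


if __name__ == "__main__":
    deltas, flags = rain_deltas([0.0, 0.5, 1.5, 0.2, 100.2, 100.2])
    for d, f in zip(deltas, flags):
        print(f"{d:6.2f}  {f}")
```

Flagged intervals are zeroed rather than dropped so the output stays aligned with the input rows; whether to exclude them from training is a separate decision.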