more work on model training

2026-03-05 20:49:19 +11:00
parent 76270e5650
commit 5b8cad905f
9 changed files with 380 additions and 48 deletions

12
todo.md

@@ -15,12 +15,12 @@ Priority key: `P0` = critical/blocking, `P1` = important, `P2` = later optimizat
 - [ ] [P1] Document known data issues and mitigation rules.
 ## 3) Dataset and Feature Engineering
-- [ ] [P1] Extract reusable dataset-builder logic from training script into a maintainable module/workflow.
-- [ ] [P1] Add lag/rolling features (means, stddev, deltas) for core sensor inputs.
-- [ ] [P1] Encode wind direction properly (cyclical encoding).
+- [x] [P1] Extract reusable dataset-builder logic from training script into a maintainable module/workflow.
+- [x] [P1] Add lag/rolling features (means, stddev, deltas) for core sensor inputs.
+- [x] [P1] Encode wind direction properly (cyclical encoding).
 - [ ] [P2] Add calendar features (hour-of-day, day-of-week, seasonality proxies).
-- [ ] [P1] Join aligned forecast features from `forecast_openmeteo_hourly` (precip prob, cloud cover, wind, pressure).
-- [ ] [P1] Persist versioned dataset snapshots for reproducibility.
+- [x] [P1] Join aligned forecast features from `forecast_openmeteo_hourly` (precip prob, cloud cover, wind, pressure).
+- [x] [P1] Persist versioned dataset snapshots for reproducibility.
 ## 4) Modeling and Validation
 - [x] [P0] Keep logistic regression as baseline.
@@ -53,5 +53,5 @@ Priority key: `P0` = critical/blocking, `P1` = important, `P2` = later optimizat
 ## 8) Immediate Next Steps (This Week)
 - [x] [P0] Run first full data audit and label-quality checks. (completed on runtime machine)
 - [x] [P0] Train baseline model on full available history and capture metrics. (completed on runtime machine)
-- [ ] [P1] Add one expanded feature set and rerun evaluation.
+- [ ] [P1] Add one expanded feature set and rerun evaluation. (feature-set plumbing implemented; rerun pending on runtime machine)
 - [x] [P0] Decide v1 threshold and define deployment interface.
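
Note on the feature items checked off above: the implementation is in the other changed files, which this view does not show. As a rough illustration only, cyclical wind-direction encoding and trailing-window lag/rolling features might look like the sketch below. The function names, the window handling, and the use of population standard deviation are all assumptions made for the example, not the repository's actual code.

```python
import math
from statistics import mean, pstdev


def encode_wind_direction(degrees: float) -> tuple[float, float]:
    """Map a compass bearing to (sin, cos) so that 359 and 1 degrees end up close.

    Hypothetical helper: the name and return shape are assumptions.
    """
    radians = math.radians(degrees % 360.0)
    return math.sin(radians), math.cos(radians)


def rolling_features(values: list[float], window: int) -> list[dict[str, float]]:
    """Compute trailing-window features for each index that has a full window.

    Each row holds the window mean, population stddev, the previous reading
    (lag 1), and the delta between the last two readings.
    """
    if window < 2:
        raise ValueError("window must be at least 2")
    rows = []
    for end in range(window, len(values) + 1):
        chunk = values[end - window:end]
        rows.append(
            {
                "mean": mean(chunk),
                "std": pstdev(chunk),
                "lag1": chunk[-2],
                "delta": chunk[-1] - chunk[-2],
            }
        )
    return rows
```

Encoding the bearing as a (sin, cos) pair avoids the artificial jump between 359 and 0 degrees that a raw numeric column would present to a linear model such as the logistic regression baseline.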