
# Predictive Model TODO

Priority key: `P0` = critical/blocking, `P1` = important, `P2` = later optimization.

## 1) Scope and Success Criteria
- [x] [P0] Lock v1 target: predict `rain_next_1h >= 0.2mm`.
- [x] [P0] Define the decision use-case (alerts vs dashboard signal).
- [x] [P0] Set acceptance metrics and thresholds (precision, recall, ROC-AUC).
- [x] [P0] Freeze training window with explicit UTC start/end timestamps.

## 2) Data Quality and Label Validation
- [x] [P0] Audit `observations_ws90` and `observations_baro` for missingness, gaps, duplicates, and out-of-order rows. (completed on runtime machine)
- [x] [P0] Validate rain label construction from `rain_mm` (counter resets, negative deltas, spikes). (completed on runtime machine)
- [x] [P0] Measure class balance by week (rain-positive vs rain-negative). (completed on runtime machine)
- [x] [P1] Document known data issues and mitigation rules. (see `docs/rain_data_issues.md`)
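
The label checks above (counter resets, negative deltas, spikes) can be sketched as below. This is a minimal illustration, not the logic used in the project's scripts. It assumes `rain_mm` is a cumulative counter, and the function name `rain_increments` and the `max_step_mm` spike cap are hypothetical.

```python
from typing import List, Optional


def rain_increments(rain_mm: List[float], max_step_mm: float = 10.0) -> List[Optional[float]]:
    """Convert cumulative counter readings into per-interval rainfall (mm).

    Assumptions (illustrative only):
    - a negative delta means the counter reset, so the interval contributes 0.0
    - a delta above max_step_mm is an implausible spike and is marked None
    - the first row has no previous reading and is marked None
    """
    if not rain_mm:
        return []
    out: List[Optional[float]] = [None]
    for prev, cur in zip(rain_mm, rain_mm[1:]):
        delta = cur - prev
        if delta < 0:
            out.append(0.0)  # assumed counter reset
        elif delta > max_step_mm:
            out.append(None)  # assumed sensor spike, excluded from labels
        else:
            out.append(round(delta, 3))
    return out
```

Marking spikes as `None` rather than clipping them keeps suspect intervals out of the label instead of turning them into false rain-positive rows.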

## 3) Dataset and Feature Engineering
- [x] [P1] Extract reusable dataset-builder logic from training script into a maintainable module/workflow.
- [x] [P1] Add lag/rolling features (means, stddev, deltas) for core sensor inputs.
- [x] [P1] Encode wind direction properly (cyclical encoding).
- [x] [P2] Add calendar features (hour-of-day, day-of-week, seasonality proxies). (`feature-set=extended_calendar`)
- [x] [P1] Join aligned forecast features from `forecast_openmeteo_hourly` (precip prob, cloud cover, wind, pressure).
- [x] [P1] Persist versioned dataset snapshots for reproducibility.
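
For reference, cyclical encoding of wind direction maps the angle onto the unit circle so that 359° and 1° end up close together. A minimal sketch follows; the function name `encode_wind_direction` is hypothetical and not taken from the dataset builder.

```python
import math
from typing import Tuple


def encode_wind_direction(deg: float) -> Tuple[float, float]:
    """Return (sin, cos) of a wind direction given in degrees.

    Angles are wrapped into [0, 360) first, so -10 and 350 encode identically.
    """
    rad = math.radians(deg % 360.0)
    return math.sin(rad), math.cos(rad)


def encoded_distance(a_deg: float, b_deg: float) -> float:
    """Euclidean distance between two encoded directions (at most 2.0)."""
    sa, ca = encode_wind_direction(a_deg)
    sb, cb = encode_wind_direction(b_deg)
    return math.hypot(sa - sb, ca - cb)
```

With this encoding, `encoded_distance(359, 1)` is small, while the raw difference in degrees (358) would suggest the two directions are far apart.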

## 4) Modeling and Validation
- [x] [P0] Keep logistic regression as baseline.
- [x] [P1] Add at least one tree-based baseline (e.g. gradient boosting). (implemented via `hist_gb`; runtime evaluation pending local Python deps)
- [x] [P0] Use strict time-based train/validation/test splits (no random shuffling).
- [x] [P1] Add walk-forward backtesting across multiple temporal folds. (`train_rain_model.py --walk-forward-folds`)
- [x] [P1] Tune hyperparameters on validation data only. (`train_rain_model.py --tune-hyperparameters`)
- [x] [P1] Calibrate probabilities (Platt or isotonic) and compare calibration quality. (`--calibration-methods`)
- [x] [P0] Choose and lock the operating threshold based on use-case costs.
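
The time-based split and walk-forward items above can be illustrated with an expanding-window fold generator. This is a sketch under stated assumptions, not a description of how `train_rain_model.py --walk-forward-folds` is implemented: rows are assumed to be sorted by timestamp already, and `walk_forward_folds` and its parameters are hypothetical names.

```python
from typing import Iterator, Tuple


def walk_forward_folds(n_rows: int, n_folds: int, min_train: int) -> Iterator[Tuple[range, range]]:
    """Yield (train_idx, test_idx) index ranges over time-ordered rows.

    Each fold trains on everything before its test block, so no future
    rows leak into training. Leftover rows at the end are unused.
    """
    if n_folds < 1 or min_train < 1:
        raise ValueError("n_folds and min_train must be positive")
    test_size = (n_rows - min_train) // n_folds
    if test_size < 1:
        raise ValueError("not enough rows for the requested folds")
    for k in range(n_folds):
        train_end = min_train + k * test_size
        yield range(0, train_end), range(train_end, train_end + test_size)
```

For example, `list(walk_forward_folds(100, 3, 40))` yields test blocks covering rows 40-59, 60-79, and 80-99, each preceded by a training range that ends where the test block begins.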

## 5) Evaluation and Reporting
- [x] [P0] Report ROC-AUC, PR-AUC, confusion matrix, precision, recall, and Brier score.
- [x] [P1] Compare against naive baselines (persistence and simple forecast-threshold rules).
- [x] [P2] Slice performance by periods/weather regimes (day/night, rainy weeks, etc.). (`sliced_performance_test`)
- [x] [P1] Produce a short model card (data window, features, metrics, known limitations). (`--model-card-out`)
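
As a reminder of what the threshold-dependent metrics and the Brier score measure, here is a small pure-Python sketch. It is illustrative only: the project's reports may compute these differently (for example via a library), and the function name `evaluate_at_threshold` is hypothetical.

```python
from typing import Dict, Sequence


def evaluate_at_threshold(y_true: Sequence[int], y_prob: Sequence[float], threshold: float) -> Dict[str, float]:
    """Compute confusion counts, precision, recall, and Brier score.

    y_true holds 0/1 outcomes; y_prob holds predicted rain probabilities.
    A prediction counts as positive when its probability >= threshold.
    """
    if len(y_true) != len(y_prob) or not y_true:
        raise ValueError("inputs must be non-empty and the same length")
    tp = fp = fn = tn = 0
    for y, p in zip(y_true, y_prob):
        pred = p >= threshold
        if pred and y == 1:
            tp += 1
        elif pred and y == 0:
            fp += 1
        elif not pred and y == 1:
            fn += 1
        else:
            tn += 1
    # Brier score: mean squared error between probability and outcome.
    brier = sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)
    return {
        "tp": tp, "fp": fp, "fn": fn, "tn": tn,
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
        "brier": brier,
    }
```

Precision and recall change with the operating threshold, while the Brier score depends only on the probabilities, which is why it is useful for comparing calibration methods.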

## 6) Packaging and Deployment
- [x] [P1] Version model artifacts and feature schema together.
- [x] [P0] Implement inference path with feature parity between training and serving.
- [x] [P0] Add prediction storage table for predicted probabilities and realized outcomes.
- [x] [P1] Expose predictions via API and optionally surface in web dashboard.
- [x] [P2] Add scheduled retraining with rollback to last-known-good model. (`run_rain_ml_worker.py` candidate promote + `RAIN_MODEL_BACKUP_PATH`)

## 7) Monitoring and Operations
- [x] [P1] Track feature drift and prediction drift over time. (view: `rain_feature_drift_daily`, `rain_prediction_drift_daily`)
- [x] [P1] Track calibration drift and realized performance after deployment. (view: `rain_calibration_drift_daily`)
- [x] [P1] Add alerts for training/inference/data pipeline failures. (`scripts/check_rain_pipeline_health.py`)
- [x] [P1] Document runbook for train/evaluate/deploy/rollback. (see `docs/rain_model_runbook.md`)

## 8) Immediate Next Steps (This Week)
- [x] [P0] Run first full data audit and label-quality checks. (completed on runtime machine)
- [x] [P0] Train baseline model on full available history and capture metrics. (completed on runtime machine)
- [x] [P1] Add one expanded feature set and rerun evaluation. (completed on runtime machine 2026-03-12 with `feature_set=extended`, `model_version=rain-auto-v1-extended-202603120932`)
- [x] [P0] Decide v1 threshold and define deployment interface.

## 9) Extension Plan: 4-Hour Precipitation Window (In Progress)
- [x] [P0] Lock v2 target definition for horizon extension: `rain_next_4h_mm >= <threshold_mm>` and explicitly decide whether the threshold remains `0.2mm` or is increased for 4-hour labeling. (implemented with `0.2mm` carry-forward)
- [x] [P0] Decide rollout strategy: additive dual-horizon support (`1h` + `4h`) vs direct replacement; prefer dual-horizon for safe cutover. (implemented as additive dual-horizon)
- [x] [P0] Parameterize label horizon in shared ML code (`scripts/rain_model_common.py`) so target columns are generated for 4-hour windows (48 x 5-minute buckets) instead of hard-coded 1-hour columns.
- [x] [P1] Revisit persistence/context features currently tied to `rain_last_1h_mm`; decide whether to keep 1-hour context, add 4-hour context, or both for the 4-hour target. (implemented horizon-aware context column selection)
- [x] [P0] Update training pipeline (`scripts/train_rain_model.py`) to train against the 4-hour target column, including reports, model-card content, dataset snapshot columns, and artifact metadata.
- [x] [P0] Update audit pipeline (`scripts/audit_rain_data.py`) to report class balance and target definition for 4-hour labels.
- [x] [P0] Update inference pipeline (`scripts/predict_rain_model.py`) to use the 4-hour target, including realized-outcome availability checks (`pred_ts + 4h`) and metadata/reporting fields.
- [x] [P0] Finalize DB storage design for 4-hour predictions (new `predictions_rain_4h` table vs generic horizon column strategy) before migrations. (implemented dedicated `predictions_rain_4h` table)
- [x] [P0] Create schema migration (recommended: new hypertable `predictions_rain_4h` with `rain_next_4h_mm_actual` and `rain_next_4h_actual` fields) and matching indexes.
- [x] [P0] Update prediction upsert SQL to write to the 4-hour prediction table/columns.
- [x] [P0] Update monitoring views in `db/init/002_rain_monitoring_views.sql` so drift/calibration/pipeline-health views include the 4-hour prediction path.
- [x] [P0] Update Go DB query layer (`internal/db/series.go`) to read 4-hour prediction rows/fields.
- [x] [P1] Update dashboard API defaults (`cmd/ingestd/web.go`) from `rain_next_1h` to the selected 4-hour model name (or make model name configurable).
- [x] [P1] Update web UI labels/semantics (`cmd/ingestd/web/index.html`, `cmd/ingestd/web/app.js`) from “Rain 1h %” to “Rain 4h %” and verify chart legends/tooltips match the new horizon.
- [x] [P1] Update worker/runtime defaults (`docker-compose.yml`, `scripts/run_rain_ml_worker.py`, `scripts/run_p0_rain_workflow.sh`) to use `rain_next_4h` naming/versioning.
- [x] [P1] Add dual-run deployment support with isolated artifacts for 4h and 1h workers (`docker-compose.yml` + `scripts/rainml_py.sh` service targeting).
- [x] [P0] Update health-check defaults (`scripts/check_rain_pipeline_health.py`) for 4-hour evaluation latency (e.g., pending-eval age threshold > 4h).
- [x] [P1] Update docs and runbooks (`README.md`, `docs/rain_prediction.md`, `docs/rain_model_runbook.md`) so commands, table names, and target definitions match the 4-hour system.
- [x] [P1] Add explicit automated cutover gate script for baseline-vs-candidate decisioning. (`scripts/check_rain_cutover_gate.py`)
- [x] [P0] Run full retraining/evaluation for the 4-hour target and compare against current 1-hour model metrics before production cutover. (completed on runtime machine 2026-04-06; comparison vs `rain_model_report_extended_eval.json` showed regression: precision `0.7103 -> 0.5545`, PR-AUC `0.7245 -> 0.5850`, ROC-AUC `0.9184 -> 0.7843`, Brier `0.0931 -> 0.2276`)
- [ ] [P0] Execute staged rollout: deploy schema + views, deploy model + inference, verify dashboard/health checks, then switch default model name. (schema/views/deploy/inference/health completed on runtime machine 2026-04-06; final production cutover decision remains open due to the 4h metric regression vs the 1h baseline)
- [ ] [P0] Improve 4-hour model quality to meet cutover gate (at minimum recover precision and calibration relative to current 1-hour production baseline) before declaring rollout complete.
- [x] [P1] Keep rollback path documented: retain `rain_next_1h` artifacts/table access until 4-hour monitoring is stable. (documented in `docs/rain_model_runbook.md` staged rollout/rollback section)
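
The horizon parameterization described in this section (48 five-minute buckets for a 4-hour window, with the `0.2mm` threshold carried forward) can be sketched as a forward-window sum. This is a hedged illustration rather than the code in `scripts/rain_model_common.py`: the function name `forward_rain_label` is hypothetical, and it assumes the window covers the buckets strictly after the prediction time.

```python
from typing import List, Optional


def forward_rain_label(
    incr_mm: List[float],
    horizon_buckets: int = 48,  # 48 x 5-minute buckets = 4 hours
    threshold_mm: float = 0.2,
) -> List[Optional[bool]]:
    """Label each bucket by whether rain over the next horizon reaches the threshold.

    For bucket i the window is buckets i+1 .. i+horizon_buckets (assumption).
    Rows whose future window is incomplete get None instead of a label.
    """
    n = len(incr_mm)
    # Prefix sums make each window total an O(1) lookup.
    prefix = [0.0]
    for v in incr_mm:
        prefix.append(prefix[-1] + v)
    labels: List[Optional[bool]] = []
    for i in range(n):
        end = i + horizon_buckets
        if end >= n:
            labels.append(None)  # outcome not yet observable
        else:
            total = prefix[end + 1] - prefix[i + 1]
            # Round to absorb floating-point noise from the subtraction.
            labels.append(round(total, 6) >= threshold_mm)
    return labels
```

Passing `horizon_buckets=12` would give the 1-hour label under the same assumptions, which is the point of parameterizing the horizon instead of hard-coding the column. The trailing `None` rows correspond to predictions whose realized outcome is not available until `pred_ts + 4h`.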