Predictive Model TODO

Priority key: P0 = critical/blocking, P1 = important, P2 = later optimization.

1) Scope and Success Criteria

[P0] Lock v1 target: predict rain_next_1h >= 0.2mm.
[P0] Define the decision use-case (alerts vs dashboard signal).
[P0] Set acceptance metrics and thresholds (precision, recall, ROC-AUC).
[P0] Freeze training window with explicit UTC start/end timestamps.

2) Data Quality and Label Validation

[P0] Audit observations_ws90 and observations_baro for missingness, gaps, duplicates, and out-of-order rows. (completed on runtime machine)
[P0] Validate rain label construction from rain_mm (counter resets, negative deltas, spikes). (completed on runtime machine)
[P0] Measure class balance by week (rain-positive vs rain-negative). (completed on runtime machine)
[P1] Document known data issues and mitigation rules. (see docs/rain_data_issues.md)
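
Note: a minimal sketch of the label-validation rules above (counter resets, negative deltas, spikes), assuming rain_mm is a cumulative counter. The function name rain_deltas, the max_step_mm spike cap, and the choice to count a reset step as 0.0 are illustrative assumptions, not the rules recorded in docs/rain_data_issues.md.

```python
from typing import List, Optional, Sequence


def rain_deltas(counter_mm: Sequence[float], max_step_mm: float = 10.0) -> List[Optional[float]]:
    """Convert a cumulative rain counter into per-step rainfall.

    Assumptions (hypothetical, adjust to the documented mitigation rules):
    - a negative delta means the counter was reset; that step counts as 0.0 mm
    - a delta above max_step_mm is a spike; it is flagged as None for review
    - the first reading has no predecessor, so its delta is None
    """
    deltas: List[Optional[float]] = [None]
    for prev, cur in zip(counter_mm, counter_mm[1:]):
        step = cur - prev
        if step < 0:
            deltas.append(0.0)   # counter reset
        elif step > max_step_mm:
            deltas.append(None)  # implausible spike
        else:
            deltas.append(step)
    return deltas
```

Flagging spikes as None instead of clipping them keeps suspicious rows visible to the audit step rather than silently turning them into rain-positive labels.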

3) Dataset and Feature Engineering

[P1] Extract reusable dataset-builder logic from training script into a maintainable module/workflow.
[P1] Add lag/rolling features (means, stddev, deltas) for core sensor inputs.
[P1] Encode wind direction properly (cyclical encoding).
[P2] Add calendar features (hour-of-day, day-of-week, seasonality proxies). (feature-set=extended_calendar)
[P1] Join aligned forecast features from forecast_openmeteo_hourly (precip prob, cloud cover, wind, pressure).
[P1] Persist versioned dataset snapshots for reproducibility.
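
Note: the cyclical wind-direction encoding above can be sketched as a sine/cosine pair, so that 359 degrees and 1 degree end up close together instead of at opposite ends of the scale. The function name encode_wind_direction is hypothetical and not taken from the training script.

```python
import math
from typing import Tuple


def encode_wind_direction(degrees: float) -> Tuple[float, float]:
    """Map a compass bearing in degrees to a (sin, cos) pair on the unit circle."""
    radians = math.radians(degrees % 360.0)
    return math.sin(radians), math.cos(radians)


# Bearings on either side of north stay neighbours after encoding.
north_west_side = encode_wind_direction(359.0)
north_east_side = encode_wind_direction(1.0)
gap = math.dist(north_west_side, north_east_side)
```

Both components are needed as features: sine alone cannot tell north from south, and cosine alone cannot tell east from west.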

4) Modeling and Validation

[P0] Keep logistic regression as baseline.
[P1] Add at least one tree-based baseline (e.g. gradient boosting). (implemented via hist_gb; runtime evaluation pending local Python deps)
[P0] Use strict time-based train/validation/test splits (no random shuffling).
[P1] Add walk-forward backtesting across multiple temporal folds. (train_rain_model.py --walk-forward-folds)
[P1] Tune hyperparameters on validation data only. (train_rain_model.py --tune-hyperparameters)
[P1] Calibrate probabilities (Platt or isotonic) and compare calibration quality. (--calibration-methods)
[P0] Choose and lock the operating threshold based on use-case costs.
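
Note: a minimal sketch of expanding-window walk-forward folds over time-ordered rows, illustrating the "no random shuffling" rule above. This is not the implementation behind train_rain_model.py --walk-forward-folds; the function name and the min_train parameter are assumptions made for illustration.

```python
from typing import List, Tuple


def walk_forward_folds(n_rows: int, n_folds: int, min_train: int) -> List[Tuple[range, range]]:
    """Return (train_rows, test_rows) index ranges for time-ordered data.

    Each fold trains on everything before its cut point and tests on the
    block right after it, so no test row ever precedes a training row.
    """
    if n_folds < 1 or min_train < 1 or n_rows - min_train < n_folds:
        raise ValueError("not enough rows for the requested folds")
    test_size = (n_rows - min_train) // n_folds
    folds = []
    for k in range(n_folds):
        cut = min_train + k * test_size
        # The last fold absorbs any remainder rows.
        end = n_rows if k == n_folds - 1 else cut + test_size
        folds.append((range(0, cut), range(cut, end)))
    return folds
```

For example, walk_forward_folds(10, 3, 4) trains on rows 0-3, 0-5, and 0-7 and tests on rows 4-5, 6-7, and 8-9 respectively.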

5) Evaluation and Reporting

[P0] Report ROC-AUC, PR-AUC, confusion matrix, precision, recall, and Brier score.
[P1] Compare against naive baselines (persistence and simple forecast-threshold rules).
[P2] Slice performance by periods/weather regimes (day/night, rainy weeks, etc.). (sliced_performance_test)
[P1] Produce a short model card (data window, features, metrics, known limitations). (--model-card-out)
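
Note: a dependency-free sketch of three of the metrics listed above (precision, recall, Brier score) at a chosen operating threshold. The function name binary_report and the returned keys are hypothetical; the actual report format produced by the training script is not shown in this file.

```python
from typing import Dict, Sequence


def binary_report(y_true: Sequence[int], y_prob: Sequence[float], threshold: float = 0.5) -> Dict[str, float]:
    """Confusion counts, precision, recall, and Brier score for binary labels."""
    if not y_true or len(y_true) != len(y_prob):
        raise ValueError("y_true and y_prob must be non-empty and the same length")
    tp = fp = fn = tn = 0
    for y, p in zip(y_true, y_prob):
        predicted = p >= threshold
        if predicted and y:
            tp += 1
        elif predicted and not y:
            fp += 1
        elif y:
            fn += 1
        else:
            tn += 1
    # Brier score: mean squared error between predicted probability and outcome.
    brier = sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)
    return {
        "tp": tp, "fp": fp, "fn": fn, "tn": tn,
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
        "brier": brier,
    }
```

Precision and recall depend on the threshold, while the Brier score does not, which is why it is useful for comparing calibration across models that use different operating points.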

6) Packaging and Deployment

[P1] Version model artifacts and feature schema together.
[P0] Implement inference path with feature parity between training and serving.
[P0] Add prediction storage table for predicted probabilities and realized outcomes.
[P1] Expose predictions via API and optionally surface in web dashboard.
[P2] Add scheduled retraining with rollback to last-known-good model. (run_rain_ml_worker.py candidate promote + RAIN_MODEL_BACKUP_PATH)
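
Note: one possible way to enforce feature parity between training and serving is to store a fingerprint of the ordered feature list next to the model artifact and verify it at inference time. The sketch below is an assumption about how that could look; schema_fingerprint and check_feature_parity are hypothetical names, not functions in this repo.

```python
import hashlib
import json
from typing import Sequence


def schema_fingerprint(feature_names: Sequence[str]) -> str:
    """SHA-256 over the ordered feature names; order matters for most models."""
    payload = json.dumps(list(feature_names), separators=(",", ":")).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def check_feature_parity(trained_fingerprint: str, serving_features: Sequence[str]) -> None:
    """Raise if the serving feature list differs from the one used in training."""
    if schema_fingerprint(serving_features) != trained_fingerprint:
        raise ValueError("serving features do not match the trained feature schema")
```

Saving the fingerprint in the artifact metadata at training time lets the inference path fail fast on a renamed, reordered, or missing column instead of producing silently wrong probabilities.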

7) Monitoring and Operations

[P1] Track feature drift and prediction drift over time. (view: rain_feature_drift_daily, rain_prediction_drift_daily)
[P1] Track calibration drift and realized performance after deployment. (view: rain_calibration_drift_daily)
[P1] Add alerts for training/inference/data pipeline failures. (scripts/check_rain_pipeline_health.py)
[P1] Document runbook for train/evaluate/deploy/rollback. (see docs/rain_model_runbook.md)

8) Immediate Next Steps (This Week)

[P0] Run first full data audit and label-quality checks. (completed on runtime machine)
[P0] Train baseline model on full available history and capture metrics. (completed on runtime machine)
[P1] Add one expanded feature set and rerun evaluation. (completed on runtime machine 2026-03-12 with feature_set=extended, model_version=rain-auto-v1-extended-202603120932)
[P0] Decide v1 threshold and define deployment interface.

9) Extension Plan: 4-Hour Precipitation Window (In Progress)

[P0] Lock v2 target definition for horizon extension: rain_next_4h_mm >= <threshold_mm> and explicitly decide whether the threshold remains 0.2mm or is increased for 4-hour labeling. (implemented with 0.2mm carry-forward)
[P0] Decide rollout strategy: additive dual-horizon support (1h + 4h) vs direct replacement; prefer dual-horizon for safe cutover. (implemented as additive dual-horizon)
[P0] Parameterize label horizon in shared ML code (scripts/rain_model_common.py) so target columns are generated for 4-hour windows (48 x 5-minute buckets) instead of hard-coded 1-hour columns.
[P1] Revisit persistence/context features currently tied to rain_last_1h_mm; decide whether to keep 1-hour context, add 4-hour context, or both for the 4-hour target. (implemented horizon-aware context column selection)
[P0] Update training pipeline (scripts/train_rain_model.py) to train against the 4-hour target column, including reports, model-card content, dataset snapshot columns, and artifact metadata.
[P0] Update audit pipeline (scripts/audit_rain_data.py) to report class balance and target definition for 4-hour labels.
[P0] Update inference pipeline (scripts/predict_rain_model.py) to use the 4-hour target, including realized-outcome availability checks (pred_ts + 4h) and metadata/reporting fields.
[P0] Finalize DB storage design for 4-hour predictions (new predictions_rain_4h table vs generic horizon column strategy) before migrations. (implemented dedicated predictions_rain_4h table)
[P0] Create schema migration (recommended: new hypertable predictions_rain_4h with rain_next_4h_mm_actual and rain_next_4h_actual fields) and matching indexes.
[P0] Update prediction upsert SQL to write to the 4-hour prediction table/columns.
[P0] Update monitoring views in db/init/002_rain_monitoring_views.sql so drift/calibration/pipeline-health views include the 4-hour prediction path.
[P0] Update Go DB query layer (internal/db/series.go) to read 4-hour prediction rows/fields.
[P1] Update dashboard API defaults (cmd/ingestd/web.go) from rain_next_1h to the selected 4-hour model name (or make model name configurable).
[P1] Update web UI labels/semantics (cmd/ingestd/web/index.html, cmd/ingestd/web/app.js) from “Rain 1h %” to “Rain 4h %” and verify chart legends/tooltips match the new horizon.
[P1] Update worker/runtime defaults (docker-compose.yml, scripts/run_rain_ml_worker.py, scripts/run_p0_rain_workflow.sh) to use rain_next_4h naming/versioning.
[P1] Add dual-run deployment support with isolated artifacts for 4h and 1h workers (docker-compose.yml + scripts/rainml_py.sh service targeting).
[P0] Update health-check defaults (scripts/check_rain_pipeline_health.py) for 4-hour evaluation latency (e.g., pending-eval age threshold > 4h).
[P1] Update docs and runbooks (README.md, docs/rain_prediction.md, docs/rain_model_runbook.md) so commands, table names, and target definitions match the 4-hour system.
[P1] Add explicit automated cutover gate script for baseline-vs-candidate decisioning. (scripts/check_rain_cutover_gate.py)
[P0] Run full retraining/evaluation for the 4-hour target and compare against current 1-hour model metrics before production cutover. (completed on runtime machine 2026-04-06; comparison vs rain_model_report_extended_eval.json showed regression: precision 0.7103 -> 0.5545, PR-AUC 0.7245 -> 0.5850, ROC-AUC 0.9184 -> 0.7843, Brier 0.0931 -> 0.2276)
[P0] Execute staged rollout: deploy schema + views, deploy model + inference, verify dashboard/health checks, then switch default model name. (schema/views/deploy/inference/health completed on runtime machine 2026-04-06; final production cutover decision remains open due to the 4h metric regression vs the 1h baseline)
[P0] Improve 4-hour model quality to meet cutover gate (at minimum recover precision and calibration relative to current 1-hour production baseline) before declaring rollout complete.
[P1] Keep rollback path documented: retain rain_next_1h artifacts/table access until 4-hour monitoring is stable. (documented in docs/rain_model_runbook.md staged rollout/rollback section)
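
Note: a sketch of horizon-parameterized labeling as described above, where a 4-hour window spans 48 five-minute buckets and a 1-hour window spans 12. It assumes per-bucket rainfall amounts (not the raw counter) and a window that starts at the bucket after the prediction time. The function name and the window convention are assumptions; the actual logic in scripts/rain_model_common.py is not shown in this file.

```python
from typing import List, Optional, Sequence

BUCKET_MINUTES = 5  # assumption: observations are bucketed at 5-minute resolution


def future_rain_labels(
    rain_mm_per_bucket: Sequence[float],
    horizon_hours: int,
    threshold_mm: float = 0.2,
) -> List[Optional[bool]]:
    """Label each bucket by whether rain over the following horizon reaches threshold_mm.

    Buckets near the end of the series, where the full future window is not
    yet available, get None so they can be excluded from training and from
    realized-outcome evaluation.
    """
    n_buckets = horizon_hours * 60 // BUCKET_MINUTES
    labels: List[Optional[bool]] = []
    for i in range(len(rain_mm_per_bucket)):
        window = rain_mm_per_bucket[i + 1 : i + 1 + n_buckets]
        if len(window) < n_buckets:
            labels.append(None)  # outcome not yet observable
        else:
            labels.append(sum(window) >= threshold_mm)
    return labels
```

The None tail mirrors the realized-outcome availability check (pred_ts + 4h): a prediction can only be scored once its whole future window has been observed.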
