# Rain Prediction (Next 1 Hour)

This project now includes a starter training script for a **binary rain prediction**:

> **Will we see >= 0.2 mm of rain in the next hour?**

It uses local observations (WS90 + barometric pressure) and trains a lightweight logistic regression model. This is a baseline you can iterate on as you collect more data.

## What the script does

- Pulls data from TimescaleDB.
- Resamples observations to 5-minute buckets.
- Derives **pressure trend (1h)** from barometer data.
- Computes **future 1-hour rainfall** from the cumulative `rain_mm` counter.
- Trains a model and prints evaluation metrics.

The output is a saved model file (optional) you can use later for inference.

## Requirements

Python 3.10+ and:

```
pandas
numpy
scikit-learn
psycopg2-binary
joblib
```

Install with:

```sh
python3 -m venv .venv
source .venv/bin/activate
pip install -r scripts/requirements.txt
```

## Usage

```sh
python scripts/train_rain_model.py \
  --db-url "postgres://postgres:postgres@localhost:5432/micrometeo?sslmode=disable" \
  --site "home" \
  --start "2026-01-01" \
  --end "2026-02-01" \
  --out "models/rain_model.pkl"
```

You can also provide the connection string via `DATABASE_URL`:

```sh
export DATABASE_URL="postgres://postgres:postgres@localhost:5432/micrometeo?sslmode=disable"
python scripts/train_rain_model.py --site home
```

## Output

The script prints metrics including:

- accuracy
- precision / recall
- ROC AUC
- confusion matrix

If `joblib` is installed, it saves a model bundle:

```
models/rain_model.pkl
```

This bundle contains:

- The trained model pipeline
- The feature list used during training

## Data needs / when to run

For a reliable model, you will want:

- **At least 2-4 weeks** of observations
- A mix of rainy and non-rainy periods

Training with only a few days will produce an unstable model.

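To judge whether a date range has a usable mix of rainy and non-rainy periods, it helps to build the label first and look at its balance. The following is a minimal sketch of how the "rain in the next hour" label could be derived from the cumulative `rain_mm` counter. It is not necessarily how `train_rain_model.py` does it: the function name `label_future_rain` is hypothetical, and treating a negative counter difference as a reset is an assumption.

```python
import pandas as pd

RAIN_THRESHOLD_MM = 0.2  # threshold from the prediction target above
BUCKET = "5min"          # resampling interval described above
HORIZON_STEPS = 12       # 12 buckets x 5 minutes = 1 hour


def label_future_rain(rain_mm: pd.Series) -> pd.DataFrame:
    """Build a next-hour rain label from a cumulative rain counter.

    `rain_mm` is assumed to be a cumulative counter with a UTC DatetimeIndex.
    """
    # Keep the last reading in each 5-minute bucket and carry it over gaps.
    counter = rain_mm.resample(BUCKET).last().ffill()

    # Rain that fell within each bucket. A negative difference is treated
    # as a counter reset (assumption) and clipped to zero.
    increment = counter.diff().clip(lower=0).fillna(0.0)

    # Rain in the 12 buckets after each timestamp t:
    # increment[t+1] + ... + increment[t+12].
    future = increment.rolling(HORIZON_STEPS).sum().shift(-HORIZON_STEPS)

    # The final hour has no complete future window, so those rows are dropped.
    frame = pd.DataFrame({"rain_next_1h_mm": future}).dropna()
    frame["rain_next_1h"] = (
        frame["rain_next_1h_mm"] >= RAIN_THRESHOLD_MM
    ).astype(int)
    return frame
```

Calling `label_future_rain(series)["rain_next_1h"].mean()` gives the share of rainy samples in a date range, which is a quick way to check the class mix before training.
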
## Features used

The baseline model uses:

- `pressure_trend_1h` (hPa)
- `humidity` (%)
- `temperature_c` (°C)
- `wind_avg_m_s` (m/s)
- `wind_max_m_s` (m/s)

These are easy to expand once you have more data (e.g. add forecast features).

## Notes / assumptions

- Rain detection is based on **incremental rain** derived from the WS90 `rain_mm` cumulative counter.
- Pressure comes from `observations_baro`.
- All timestamps are treated as UTC.

## Next improvements

Ideas once more data is available:

- Add forecast precipitation and cloud cover as features
- Try gradient boosted trees (e.g. XGBoost / LightGBM)
- Train per-season models
- Calibrate probabilities (Platt scaling / isotonic regression)
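
As an illustration of the calibration idea, here is a minimal sketch that wraps a logistic regression baseline in scikit-learn's `CalibratedClassifierCV`. It assumes a feature frame with the column names listed under "Features used" and a 0/1 label. The helper names `fit_calibrated_baseline` and `make_placeholder_data` are hypothetical, and the synthetic data exists only so the snippet runs; it says nothing about real weather.

```python
import numpy as np
import pandas as pd
from sklearn.calibration import CalibratedClassifierCV
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Feature names taken from the "Features used" list.
FEATURES = [
    "pressure_trend_1h",
    "humidity",
    "temperature_c",
    "wind_avg_m_s",
    "wind_max_m_s",
]


def fit_calibrated_baseline(X: pd.DataFrame, y: pd.Series) -> CalibratedClassifierCV:
    """Fit a scaled logistic regression and calibrate its probabilities."""
    base = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
    # method="sigmoid" is Platt scaling; method="isotonic" is the alternative.
    model = CalibratedClassifierCV(base, method="sigmoid", cv=3)
    model.fit(X[FEATURES], y)
    return model


def make_placeholder_data(n: int = 600, seed: int = 0):
    """Random placeholder data (not real observations) for a smoke test."""
    rng = np.random.default_rng(seed)
    X = pd.DataFrame({
        "pressure_trend_1h": rng.normal(0.0, 1.0, n),
        "humidity": rng.uniform(40, 100, n),
        "temperature_c": rng.normal(10, 5, n),
        "wind_avg_m_s": rng.uniform(0, 8, n),
        "wind_max_m_s": rng.uniform(0, 15, n),
    })
    # Arbitrary rule that yields both classes; purely illustrative.
    y = ((X["pressure_trend_1h"] < -0.5) & (X["humidity"] > 70)).astype(int)
    return X, y


if __name__ == "__main__":
    X, y = make_placeholder_data()
    model = fit_calibrated_baseline(X, y)
    # Column 1 holds the calibrated probability of the positive class.
    print(model.predict_proba(X[FEATURES])[:5, 1])
```

With `cv=3`, scikit-learn uses stratified folds that ignore time order. For weather data a time-based split is probably a better fit, so treat this as a starting point rather than a recommended setup.
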