Rain Prediction (Next 1 Hour)
This project now includes a starter training script for a binary rain prediction task:
Will we see >= 0.2 mm of rain in the next hour?
It uses local observations (WS90 + barometric pressure) and trains a lightweight logistic regression model. This is a baseline you can iterate on as you collect more data.
What the script does
- Pulls data from TimescaleDB.
- Resamples observations to 5-minute buckets.
- Derives pressure trend (1h) from barometer data.
- Computes future 1-hour rainfall from the cumulative rain_mm counter.
- Trains a model and prints evaluation metrics.
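The resampling, pressure trend, and labeling steps above can be sketched roughly as follows. This is a minimal illustration, not the script's actual code: the function name `build_training_frame`, the `pressure_hpa` column name, and the output column names are assumptions, and it assumes the rain counter does not reset inside the window.

```python
import pandas as pd

STEPS_PER_HOUR = 12  # twelve 5-minute buckets span one hour


def build_training_frame(obs: pd.DataFrame, threshold_mm: float = 0.2) -> pd.DataFrame:
    """Build trend and label columns from raw observations.

    `obs` is assumed to be indexed by UTC timestamp, with a `pressure_hpa`
    column (hypothetical name) and the cumulative `rain_mm` counter.
    """
    # Resample to 5-minute buckets: average the pressure, keep the last counter reading.
    df = obs.resample("5min").agg({"pressure_hpa": "mean", "rain_mm": "last"})
    df["rain_mm"] = df["rain_mm"].ffill()  # carry the counter across empty buckets

    # Pressure trend: change in hPa over the previous hour.
    df["pressure_trend_1h"] = df["pressure_hpa"] - df["pressure_hpa"].shift(STEPS_PER_HOUR)

    # Future rainfall: counter one hour ahead minus the counter now.
    # Rounded to avoid float artifacts such as 0.19999999999999996.
    df["rain_next_1h"] = (df["rain_mm"].shift(-STEPS_PER_HOUR) - df["rain_mm"]).round(3)

    # Drop rows that lack a full hour of history or a full hour of future data.
    df = df.dropna(subset=["pressure_trend_1h", "rain_next_1h"])
    return df.assign(rain_next_1h_flag=(df["rain_next_1h"] >= threshold_mm).astype(int))
```

Note that the first and last hour of any training window are dropped, because those rows have no complete trend or no complete label.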
Optionally, the script saves a model file that you can use later for inference.
Requirements
Python 3.10+ and:
pandas
numpy
scikit-learn
psycopg2-binary
joblib
Install with:
python3 -m venv .venv
source .venv/bin/activate
pip install -r scripts/requirements.txt
Usage
python scripts/train_rain_model.py \
--db-url "postgres://postgres:postgres@localhost:5432/micrometeo?sslmode=disable" \
--site "home" \
--start "2026-01-01" \
--end "2026-02-01" \
--out "models/rain_model.pkl"
You can also provide the connection string via DATABASE_URL:
export DATABASE_URL="postgres://postgres:postgres@localhost:5432/micrometeo?sslmode=disable"
python scripts/train_rain_model.py --site home
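One common way to support both the --db-url flag and the DATABASE_URL variable is to use the environment value as the flag's default. The sketch below uses only the flag names shown above; whether the real script makes any flag required, and what defaults it uses, is not known here, so treat `parse_args` as a hypothetical helper.

```python
import argparse
import os


def parse_args(argv=None):
    parser = argparse.ArgumentParser(description="Train the 1-hour rain model")
    # Fall back to DATABASE_URL when --db-url is not given on the command line.
    parser.add_argument("--db-url", default=os.environ.get("DATABASE_URL"))
    parser.add_argument("--site")
    parser.add_argument("--start")  # e.g. "2026-01-01"
    parser.add_argument("--end")    # e.g. "2026-02-01"
    parser.add_argument("--out")    # e.g. "models/rain_model.pkl"
    args = parser.parse_args(argv)
    if not args.db_url:
        parser.error("provide --db-url or set DATABASE_URL")
    return args
```

With this arrangement an explicit --db-url always wins over the environment variable.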
Output
The script prints metrics including:
- accuracy
- precision / recall
- ROC AUC
- confusion matrix
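For reference, the metrics listed above can all be computed with scikit-learn. The following is a sketch under assumptions: the `evaluate` helper and the 0.5 decision threshold are illustrative and may differ from what the script does.

```python
from sklearn.metrics import (
    accuracy_score,
    confusion_matrix,
    precision_score,
    recall_score,
    roc_auc_score,
)


def evaluate(y_true, y_prob, threshold=0.5):
    """Return the printed metrics as a dict, given true labels and rain probabilities."""
    # Turn probabilities into hard 0/1 predictions at the chosen threshold.
    y_pred = [int(p >= threshold) for p in y_prob]
    return {
        "accuracy": accuracy_score(y_true, y_pred),
        "precision": precision_score(y_true, y_pred, zero_division=0),
        "recall": recall_score(y_true, y_pred, zero_division=0),
        # ROC AUC is computed from the probabilities, not the thresholded labels.
        "roc_auc": roc_auc_score(y_true, y_prob),
        # Rows are true classes, columns are predicted classes: [[TN, FP], [FN, TP]].
        "confusion_matrix": confusion_matrix(y_true, y_pred, labels=[0, 1]).tolist(),
    }
```

Because rain is usually the minority class, precision and recall tend to be more informative than accuracy alone.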
If joblib is installed, it saves a model bundle:
models/rain_model.pkl
This bundle contains:
- The trained model pipeline
- The feature list used during training
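Storing the feature list next to the model lets inference code select columns in the same order as training. A minimal sketch of saving and using such a bundle is shown below; the dictionary keys `"model"` and `"features"` and both helper names are assumptions, so check the actual bundle layout before relying on them.

```python
import joblib
import pandas as pd


def save_bundle(model, features, path):
    # Assumed layout: a plain dict holding the fitted pipeline and its feature names.
    joblib.dump({"model": model, "features": list(features)}, path)


def predict_rain_probability(path, latest: dict) -> float:
    """Return P(rain in the next hour) for one observation given as a dict."""
    bundle = joblib.load(path)
    # Select and order columns exactly as they were used during training.
    row = pd.DataFrame([latest])[bundle["features"]]
    return float(bundle["model"].predict_proba(row)[0, 1])
```

Only load bundles you created yourself, since joblib files are pickle-based and can execute code on load.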
Data needs / when to run
For a reliable model, you will want:
- At least 2-4 weeks of observations
- A mix of rainy and non-rainy periods
Training on only a few days of data will produce an unstable model, typically because the sample contains too few rain events.
Features used
The baseline model uses:
- pressure_trend_1h (hPa)
- humidity (%)
- temperature_c (°C)
- wind_avg_m_s (m/s)
- wind_max_m_s (m/s)
These are easy to expand once you have more data (e.g. add forecast features).
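A baseline of this kind might look like the sketch below, where adding a feature means appending a column name to one list. The feature names come from the list above; the `StandardScaler` step, `class_weight="balanced"`, and `max_iter=1000` are assumptions rather than the script's confirmed settings.

```python
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

FEATURES = [
    "pressure_trend_1h",
    "humidity",
    "temperature_c",
    "wind_avg_m_s",
    "wind_max_m_s",
]


def make_baseline_model():
    # Scale features so the logistic regression coefficients are comparable,
    # and reweight classes because rainy hours are usually rare (assumption).
    return make_pipeline(
        StandardScaler(),
        LogisticRegression(class_weight="balanced", max_iter=1000),
    )


def train(df, label_column):
    """Fit the baseline on a DataFrame that holds FEATURES and a 0/1 label column."""
    model = make_baseline_model()
    model.fit(df[FEATURES], df[label_column])
    return model
```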
Notes / assumptions
- Rain detection is based on incremental rain derived from the WS90 rain_mm cumulative counter.
- Pressure comes from observations_baro.
- All timestamps are treated as UTC.
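Deriving incremental rain from a cumulative counter amounts to differencing consecutive readings. The sketch below shows one way to do it; `incremental_rain` is a hypothetical helper, and the handling of counter resets (treating any negative step as zero rain) is an assumption, since the source does not say how or whether the WS90 counter resets.

```python
import pandas as pd


def incremental_rain(counter: pd.Series) -> pd.Series:
    """Convert a cumulative rain counter (mm) into per-step rainfall (mm)."""
    delta = counter.diff()
    # A negative step would mean the counter reset; count it as zero rain (assumption).
    delta = delta.clip(lower=0)
    # The first reading has no predecessor, so its increment is taken as zero.
    return delta.fillna(0.0)
```

For example, readings of 10.0, 10.0, 10.5, 11.0, 0.0, 0.5 yield increments of 0.0, 0.0, 0.5, 0.5, 0.0, 0.5.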
Next improvements
Ideas once more data is available:
- Add forecast precipitation and cloud cover as features
- Try gradient boosted trees (e.g. XGBoost / LightGBM)
- Train per-season models
- Calibrate probabilities (Platt scaling / isotonic regression)
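As an illustration of the calibration idea, scikit-learn's `CalibratedClassifierCV` supports both Platt scaling (`method="sigmoid"`) and isotonic regression (`method="isotonic"`). The `calibrate` wrapper and the choice of 3 folds below are assumptions for the sketch, not part of the current script.

```python
from sklearn.calibration import CalibratedClassifierCV


def calibrate(base_model, X, y, method="sigmoid"):
    """Fit a calibrated copy of an unfitted classifier using 3-fold cross-validation."""
    # The base model is cloned and refit on each fold; a calibrator is fit on the held-out part.
    calibrated = CalibratedClassifierCV(base_model, method=method, cv=3)
    calibrated.fit(X, y)
    return calibrated
```

Isotonic regression generally needs more data than Platt scaling, so `"sigmoid"` is the safer default while the dataset is small.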