Weather forecast models (precipitation, wind) based on input data that can be estimated or measured by anyone with a smartphone – even without special equipment and internet connection.
Deployed version – weather.pesout.net
I trained weather forecast models using a long-term dataset, keeping only inputs that can be collected without specialized instruments. The resulting Python ML models (XGBoost – gradient boosted trees) were tested and exported to JavaScript with m2cgen so they can run in any browser.
The fastest way to try it is to use the prepared HTML page with the pre-trained models already imported.
1. Start a local web server from the project root:
python3 -m http.server 8000
2. Open it in your browser: go to http://localhost:8000
Requirements:
- Python 3.8 or higher
- Weather dataset (CSV format)
The original dataset is published in a separate GitHub repository; see weather-dataset.
1. Install dependencies:
pip install -e .
2. Train and export models:
python3 train_exportable.py --input weather.csv
See train_exportable.py for all possible advanced options.
3. Check exported models:
- ./public/precipitation_model.js - precipitation occurrence classifier
- ./public/precipitation_amt_model.js - precipitation amount regressor
- ./public/wind_model.js - wind speed regressor
import { predictWeather } from './predict.js';
const forecast = predictWeather({
latitude: 50.0755,
longitude: 14.4378,
altitude: 200,
airPressure: 1013.25,
temperature: 15,
cloudCover: 0.5,
windDirection: 'NW',
windCategory: 'light',
hour: 14,
dayOfYear: 180
});
console.log(forecast);
// { precipitationProb: 0.23, precipitationAmount: 0.45, windSpeed: 3.2 }

Input requirements:
- Basic location data (latitude, longitude, altitude)
- Date and time
- Simple observations: temperature, air pressure, cloud cover, current wind (speed and direction)
Since exact wind speed is hard to measure without any tools, I use categories based on the Beaufort scale.
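As a rough illustration of how a measured wind speed could be turned into a Beaufort-based category, here is a minimal Python sketch. The force boundaries follow the standard Beaufort scale in m/s. The coarse labels returned by `wind_category` and their cut-offs are assumptions made for this example ('light' appears in the usage example, but the full set of categories used by the project is not listed here).

```python
from bisect import bisect_right

# Lower bounds (m/s) of Beaufort forces 1 to 12, from the standard scale.
# Speeds below the first bound count as force 0 (calm).
BEAUFORT_LOWER_BOUNDS_MS = [0.3, 1.6, 3.4, 5.5, 8.0, 10.8,
                            13.9, 17.2, 20.8, 24.5, 28.5, 32.7]


def beaufort_force(wind_speed_ms: float) -> int:
    """Return the Beaufort force (0-12) for a wind speed in m/s."""
    if wind_speed_ms < 0:
        raise ValueError("wind speed cannot be negative")
    # Count how many lower bounds the speed has reached.
    return bisect_right(BEAUFORT_LOWER_BOUNDS_MS, wind_speed_ms)


def wind_category(wind_speed_ms: float) -> str:
    """HYPOTHETICAL grouping of Beaufort forces into coarse labels.

    The labels and cut-offs are assumptions for illustration only,
    not the categories used by train_exportable.py.
    """
    force = beaufort_force(wind_speed_ms)
    if force == 0:
        return "calm"
    if force <= 3:
        return "light"
    if force <= 5:
        return "moderate"
    return "strong"


if __name__ == "__main__":
    for speed in (0.1, 3.2, 9.0, 15.0):
        print(speed, beaufort_force(speed), wind_category(speed))
```

Coarse categories like these can be estimated by eye (smoke drift, moving leaves, swaying branches), which is the reason for preferring them over an exact speed.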
Output (6-hour forecast):
- Precipitation occurrence - probability of rain/snow in the next 6 hours
- Precipitation amount - expected rainfall in millimeters (if precipitation occurs)
- Wind speed - wind speed in meters per second
At the end of 2021, I launched a cron job that downloads current weather and forecasts from Locationforecast by MET Norway. It saves data from five different locations (CZ, SK) twice a day (7 AM and 2 PM). As of December 2025, this amounts to about 14,600 records (5 locations × 2 downloads per day × roughly 4 years).
The original dataset had 27 variables (see weather.example.csv), but I excluded or modified most of them. Besides dropping variables that are unnecessary for training the ML model, I had to ignore inputs that cannot be obtained without equipment (e.g., air humidity). I also excluded the data on changes after 1 hour, because I wanted to generate predictions from immediate observations.
Variables/columns in the final dataset:
- date - Date string (e.g., "2025-01-03")
- time - Time string (e.g., "14:00:00")
- latitude - Latitude in decimal degrees
- longitude - Longitude in decimal degrees
- altitude - Altitude in meters above sea level
- air_pressure - Air pressure in hPa
- air_temperature - Temperature in Celsius
- cloud_area_fraction - Cloud cover in percentage (0-100)
- wind_from_direction - Wind direction in degrees (0-360)
- wind_speed - Current wind speed in m/s (converted to a category before training)
- precipitation_amount_next_6h - Precipitation amount in next 6h (mm) - target variable
- wind_speed_next_6h - Wind speed in next 6h (m/s) - target variable
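The JavaScript usage example takes `hour` and `dayOfYear`, while the dataset stores `date` and `time` strings. A minimal sketch of deriving one from the other is shown below, assuming the string formats from the column examples. The function name `time_features` and the key `day_of_year` are hypothetical and may not match the names used in train_exportable.py.

```python
from datetime import datetime


def time_features(date_str: str, time_str: str) -> dict:
    """Derive hour of day and day of year from the dataset's date/time strings.

    Assumes the formats shown in the column list: "YYYY-MM-DD" and "HH:MM:SS".
    """
    moment = datetime.strptime(f"{date_str} {time_str}", "%Y-%m-%d %H:%M:%S")
    return {
        "hour": moment.hour,
        # tm_yday counts from 1 (January 1st) to 365 or 366.
        "day_of_year": moment.timetuple().tm_yday,
    }


if __name__ == "__main__":
    print(time_features("2025-01-03", "14:00:00"))
```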
The performance of the models should be viewed in the context that many variables that are otherwise important for weather forecasting had to be excluded. At the same time, there are more suitable algorithms for this case than XGBoost, which was chosen because it can be exported to JavaScript. The nature of the project also ruled out processing the data as a time series.
The referenced model was trained with n_estimators=250, and its performance was evaluated on a 20% holdout taken from the most recent data.
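A holdout taken from the most recent data means the split is chronological rather than random. The sketch below shows one way to do that in plain Python, assuming each row is a dict with the `date` and `time` columns in zero-padded ISO-like form so that string sorting matches chronological order. The helper name `chronological_holdout` is hypothetical, and the actual training script may split differently.

```python
def chronological_holdout(rows, holdout_fraction=0.2):
    """Split rows into (train, test), with test being the most recent part.

    Assumes zero-padded "YYYY-MM-DD" dates and "HH:MM:SS" times, which
    sort correctly as plain strings.
    """
    ordered = sorted(rows, key=lambda row: (row["date"], row["time"]))
    n_test = round(len(ordered) * holdout_fraction)
    split = len(ordered) - n_test
    return ordered[:split], ordered[split:]


if __name__ == "__main__":
    # Made-up rows, deliberately out of order.
    rows = [{"date": f"2025-01-{day:02d}", "time": "07:00:00"}
            for day in (5, 1, 9, 3, 10, 2, 8, 4, 7, 6)]
    train, test = chronological_holdout(rows)
    print(len(train), len(test), [row["date"] for row in test])
```

Evaluating on the newest records avoids scoring the model on periods that are interleaved with its training data.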
Precipitation occurrence classifier:
- ROC-AUC: 0.7959 - Good discrimination between rain/no-rain
- PR-AUC: 0.7502 - Decent precision-recall balance for imbalanced precipitation events
- Brier Score: 0.1844 - Mean squared error of the predicted probabilities (lower is better)
With a ROC-AUC of about 0.80, the model ranks a randomly chosen rainy case above a randomly chosen dry case roughly 80% of the time.
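To make the classifier metrics more concrete, here is a small dependency-free Python sketch of ROC-AUC and the Brier score. It is only an illustration of the definitions on made-up numbers, not the evaluation code used for the results above (a library such as scikit-learn would normally be used instead).

```python
def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probability and outcome (0 or 1)."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def roc_auc(y_true, y_prob):
    """Probability that a random positive case gets a higher score than a
    random negative case; ties count as half. Quadratic time, fine for a sketch."""
    positives = [p for y, p in zip(y_true, y_prob) if y == 1]
    negatives = [p for y, p in zip(y_true, y_prob) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative case")
    wins = sum(
        1.0 if pos > neg else 0.5 if pos == neg else 0.0
        for pos in positives
        for neg in negatives
    )
    return wins / (len(positives) * len(negatives))


if __name__ == "__main__":
    # Made-up labels (1 = precipitation occurred) and predicted probabilities.
    y_true = [0, 0, 1, 1]
    y_prob = [0.1, 0.4, 0.35, 0.8]
    print(roc_auc(y_true, y_prob), brier_score(y_true, y_prob))
```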
Precipitation amount regressor:
- MAE (conditional on precipitation): 1.1370 mm - Mean absolute error for cases when precipitation actually occurs
- MAE (general): 0.6443 mm - Mean absolute error over all cases, with the prediction computed as occurrence probability × amount
Baseline comparison:
- Always predicting zero: 0.6329 mm MAE
- Always predicting mean precipitation: 1.3750 mm MAE
The model slightly underperforms a naive "always dry" baseline (0.6443 mm vs. 0.6329 mm MAE) but clearly beats predicting average rainfall. This is expected when most 6-hour windows are dry, because predicting "no rain" is then often correct.
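The general MAE and the two baselines can be sketched in a few lines of Python. The numbers below are made up purely to show the calculation. Taking the mean for the second baseline from the same list is also an assumption of this sketch, since the exact evaluation procedure is not described here.

```python
def mean_absolute_error(actual, predicted):
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def expected_precipitation(probabilities, amounts):
    """Combine the two models: occurrence probability x predicted amount."""
    return [p * a for p, a in zip(probabilities, amounts)]


# Made-up example: three dry periods and one with 2 mm of precipitation.
actual_mm = [0.0, 0.0, 0.0, 2.0]
occurrence_prob = [0.1, 0.2, 0.1, 0.8]    # classifier output
amount_if_wet_mm = [1.0, 1.0, 2.0, 2.5]   # regressor output

combined = expected_precipitation(occurrence_prob, amount_if_wet_mm)
model_mae = mean_absolute_error(actual_mm, combined)

# Baseline 1: always predict zero precipitation.
zero_mae = mean_absolute_error(actual_mm, [0.0] * len(actual_mm))

# Baseline 2: always predict the mean precipitation.
mean_mm = sum(actual_mm) / len(actual_mm)
mean_mae = mean_absolute_error(actual_mm, [mean_mm] * len(actual_mm))

if __name__ == "__main__":
    print(model_mae, zero_mae, mean_mae)
```

Because absolute error is minimized by the median, and the median of a mostly dry series is zero, the "always dry" baseline is hard to beat on MAE.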
Wind speed regressor:
- MAE: 0.8128 m/s
- RMSE: 1.0281 m/s
Baseline comparison:
- Always predicting the same value as input: 1.2124 m/s MAE
The model predicts wind speed with a mean absolute error of about 0.81 m/s, beating the persistence baseline (1.2124 m/s) by about 33%. It therefore adds real value over simply assuming that the wind will not change.
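The persistence baseline and the 33% figure can be reproduced with the short Python sketch below. The column names come from the dataset description, while the toy rows and the helper names are made up for illustration.

```python
def mean_absolute_error(actual, predicted):
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def persistence_mae(rows):
    """MAE of predicting that wind speed in 6 hours equals the current wind speed."""
    actual = [row["wind_speed_next_6h"] for row in rows]
    predicted = [row["wind_speed"] for row in rows]
    return mean_absolute_error(actual, predicted)


def relative_improvement(model_mae, baseline_mae):
    """Fraction by which the model's MAE is lower than the baseline's."""
    return (baseline_mae - model_mae) / baseline_mae


# Made-up rows, only to show the baseline calculation.
toy_rows = [
    {"wind_speed": 2.0, "wind_speed_next_6h": 3.0},
    {"wind_speed": 5.0, "wind_speed_next_6h": 4.0},
    {"wind_speed": 1.0, "wind_speed_next_6h": 1.0},
]

if __name__ == "__main__":
    print(persistence_mae(toy_rows))
    # Reported values from the evaluation: model 0.8128, persistence 1.2124.
    print(round(relative_improvement(0.8128, 1.2124), 2))
```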
The training data comes from only five locations, which I chose quite randomly without any particular strategy. It can therefore be expected that the accuracy of the predictions will decrease in geographical locations far from the coordinates in the dataset.
Furthermore, the target variable values are not actual observed weather conditions but the meteorological institute's forecast for the next 6 hours. It is an open question whether this really has a negative impact on accuracy, because I assume that this approach excludes unexpected situations from the data.