Load ML changes to allow up to 28 days of data for training #3399
springfall2008 merged 3 commits into main
…g for incremental training
Pull request overview
This PR enhances the ML Load Prediction component to support training on up to 28 days of historical data (previously limited to 7 days). The key improvement is that fine-tuning now uses the full dataset with time-weighted sampling rather than just the last 24 hours, which prevents catastrophic forgetting while still prioritizing recent patterns through exponential decay weighting.
Changes:
- Increases the default historical data fetch from 7 to 28 days, with a new configurable parameter `load_ml_max_days_history`
- Removes the distinction between initial training and fine-tuning data ranges; both now use the full available dataset
- Updates Temperature API to fetch 28 days of historical temperature data instead of 7
- Adds comprehensive documentation explaining the time-weighted sampling approach and rationale for using full dataset during fine-tuning
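The time-weighted sampling described above can be illustrated with a minimal sketch. It is not the code in `load_predictor.py`: the function names are hypothetical, and reading the "7-day decay" as an exponential time constant (weight `exp(-age / 7)`) is an assumption made for illustration.

```python
import math
import random


def time_decay_weights(ages_days, decay_days=7.0):
    """Return one weight per sample: exp(-age / decay_days).

    decay_days=7.0 is an assumed reading of the "7-day decay" in the PR text.
    """
    return [math.exp(-age / decay_days) for age in ages_days]


def sample_training_indices(ages_days, k, decay_days=7.0, seed=0):
    """Draw k sample indices with replacement, favouring recent samples."""
    rng = random.Random(seed)
    weights = time_decay_weights(ages_days, decay_days)
    return rng.choices(range(len(ages_days)), weights=weights, k=k)


# One hypothetical sample per day over a 28-day history (age 0 = today).
ages = list(range(28))
weights = time_decay_weights(ages)
batch = sample_training_indices(ages, k=1000)

# Share of the batch drawn from the most recent 7 days.
recent_share = sum(1 for i in batch if ages[i] < 7) / len(batch)
print(f"weight at 0 days: {weights[0]:.3f}, at 7 days: {weights[7]:.3f}")
print(f"share of batch from last 7 days: {recent_share:.2f}")
```

Under this weighting, older samples are never dropped outright, so the model keeps seeing them during fine-tuning, while recent days dominate each batch.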
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| docs/load-ml.md | Documents new 28-day configurable limit, explains time-weighted sampling (7-day decay), and clarifies that fine-tuning uses full dataset |
| apps/predbat/load_predictor.py | Removes conditional logic for fine-tuning vs initial training data ranges; both now use full available dataset with consistent time-weighted sampling |
| apps/predbat/load_ml_component.py | Adds load_ml_max_days_history parameter (default: 28), updates data fetching to use configurable days instead of hardcoded 7 |
| apps/predbat/components.py | Registers new config parameter and updates Temperature API URL to fetch 28 days of past data |

Diff under review:

    - # Determine how many days of history to fetch, up to 7 days back
    - days_to_fetch = max(7, self.ml_min_days)
    + # Determine how many days of history to fetch, up to N days back
    + days_to_fetch = max(self.ml_max_days_history, self.ml_min_days)
The use of `max()` with `ml_max_days_history` is confusing. The parameter name suggests an upper limit, but `max(ml_max_days_history, ml_min_days)` treats it as the target number of days to fetch. Consider renaming it to `ml_days_history` or `ml_target_days_history` to better reflect its purpose as the target number of days to fetch (which also serves as the maximum). Alternatively, if it is truly meant to be a cap, the logic could be written more explicitly as `min(max(ml_min_days, available_days), ml_max_days_history)`.
Suggested change:

    - days_to_fetch = max(self.ml_max_days_history, self.ml_min_days)
    + days_to_fetch = min(max(self.ml_min_days, 1), self.ml_max_days_history)
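To make the difference between the two expressions concrete, here is a small sketch that evaluates both for a few settings. The function names and the input values are hypothetical and chosen only for illustration; they are not Predbat defaults, apart from the 28-day default for `load_ml_max_days_history` stated in the PR.

```python
def days_as_merged(ml_min_days, ml_max_days_history):
    """Expression from the diff: fetch the larger of the two values."""
    return max(ml_max_days_history, ml_min_days)


def days_as_suggested(ml_min_days, ml_max_days_history):
    """Expression from the suggestion: at least 1 day, capped at the maximum."""
    return min(max(ml_min_days, 1), ml_max_days_history)


# (ml_min_days, ml_max_days_history) pairs; illustrative values only.
cases = [(1, 28), (7, 28), (30, 28)]

for min_days, max_days in cases:
    merged = days_as_merged(min_days, max_days)
    suggested = days_as_suggested(min_days, max_days)
    print(f"min={min_days:2d} max={max_days:2d} -> merged={merged:2d}, suggested={suggested:2d}")
```

The two forms are not equivalent. The expression in the diff fetches `ml_max_days_history` days unless `ml_min_days` is larger, in which case it exceeds the nominal maximum. The suggested expression never exceeds the cap, but it fetches only `ml_min_days` days whenever that value is below the cap, so it would not request the full 28 days by default.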
#3388