Conversation
fix: CumberlandCouncil - correct year assignment for all months - previous had accidentally hardcoded the year as 2025 for all months except Jan/Feb
📝 WalkthroughWalkthroughRefactors CumberlandCouncil.py's calendar parsing: introduces a module-level month-name list, extracts calendar year from a "Collection calendar" heading, computes month numbers and year rollover logic, generalizes day/bin parsing, constructs/sorts collection dates using the configured date_format, and adds error handling for undetermined years. Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Possibly related PRs
Poem
🚥 Pre-merge checks | ✅ 1 | ❌ 2❌ Failed checks (1 warning, 1 inconclusive)
✅ Passed checks (1 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
🧹 Nitpick comments (2)
uk_bin_collection/uk_bin_collection/councils/CumberlandCouncil.py (2)
56-59: Month name list is duplicated.The same 12-element tuple appears twice: once in the
whileloop's month-detection branch and again in the subtype-skip condition. Extract it to a module-level constant to keep them in sync.♻️ Proposed refactor
+_MONTH_NAMES = [ + "January", "February", "March", "April", "May", "June", + "July", "August", "September", "October", "November", "December", +] ... - if line in [ - "January", "February", "March", "April", "May", "June", - "July", "August", "September", "October", "November", "December", - ]: + if line in _MONTH_NAMES: ... - and lines[i + 2] not in [ - "January", "February", "March", "April", "May", "June", - "July", "August", "September", "October", "November", "December", - ] + and lines[i + 2] not in _MONTH_NAMESAlso applies to: 85-88
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@uk_bin_collection/uk_bin_collection/councils/CumberlandCouncil.py` around lines 56 - 59, Extract the duplicated 12-month sequence into a module-level constant (e.g., MONTH_NAMES) and replace both inline lists used in the month-detection branch (where `if line in [...]` is used) and the subtype-skip condition with that constant; update any references in the CumberlandCouncil class or its methods to use MONTH_NAMES so the two checks stay in sync.
44-50: Year extraction falls back silently and assumes start/end months share the same year.Two concerns with this block:
Silent fallback: If no "Collection calendar" line is found, or it contains no 4-digit year,
current_yearsilently stays asdatetime.now().year. This masks future format changes. Per the project guideline, an explicit exception should be raised on unexpected formats.Cross-year calendar assumption: The code extracts the last 4-digit word from the heading and assigns it to all months as the base year. A heading like
"Collection calendar: November to March 2026"yieldscurrent_year = 2026, but November and December should be 2025. The subsequent rollover logic (month_num < previous_month_num) would then further corrupt the year (January rolls from 2026 → 2027 instead of landing on 2026). This only works correctly when the entire calendar falls within one calendar year (the current Feb–Aug case).💡 Suggested fix: raise on missing year + document the single-year assumption
- # Determine the base year from the page heading, e.g. - # "Collection calendar: February to August 2026" - # This is more reliable than checking whether "2026" appears anywhere - # in the page, which broke the year assignment for all non-Jan/Feb months. - for line in lines: - if "Collection calendar" in line: - for word in reversed(line.split()): - if word.isdigit() and len(word) == 4: - current_year = int(word) - break - break + # Determine the base year from the page heading, e.g. + # "Collection calendar: February to August 2026" + # NOTE: Assumes the calendar does not span a year boundary + # (i.e. all months share the same calendar year). If a cross-year + # schedule is ever published, this logic must be revisited. + calendar_year = None + for line in lines: + if "Collection calendar" in line: + for word in reversed(line.split()): + if word.isdigit() and len(word) == 4: + calendar_year = int(word) + break + break + if calendar_year is None: + raise ValueError( + "Could not determine collection year from 'Collection calendar' heading. " + "Page format may have changed." + ) + current_year = calendar_yearBased on learnings: "prefer explicit failures (raise exceptions on unexpected formats) over silent defaults or swallowed errors. This ensures format changes are detected early."
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@uk_bin_collection/uk_bin_collection/councils/CumberlandCouncil.py` around lines 44 - 50, The code currently falls back silently to datetime.now().year and assigns that single year to all months; update the "Collection calendar" parsing loop (where current_year is set) to (1) raise a clear ValueError if no 4-digit year is found in the "Collection calendar" line instead of silently using datetime.now().year, and (2) handle cross-year headings by detecting the start-month in the same header (parse the first month name in the heading) and compute each row's year relative to that start month: keep current_year as the header year but when assigning a year for a given month_num compare it to the parsed start_month_num and if month_num < start_month_num subtract 1 from current_year (use the existing previous_month_num/month_num rollover logic to maintain consistency).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@uk_bin_collection/uk_bin_collection/councils/CumberlandCouncil.py`:
- Around line 56-59: Extract the duplicated 12-month sequence into a
module-level constant (e.g., MONTH_NAMES) and replace both inline lists used in
the month-detection branch (where `if line in [...]` is used) and the
subtype-skip condition with that constant; update any references in the
CumberlandCouncil class or its methods to use MONTH_NAMES so the two checks stay
in sync.
- Around line 44-50: The code currently falls back silently to
datetime.now().year and assigns that single year to all months; update the
"Collection calendar" parsing loop (where current_year is set) to (1) raise a
clear ValueError if no 4-digit year is found in the "Collection calendar" line
instead of silently using datetime.now().year, and (2) handle cross-year
headings by detecting the start-month in the same header (parse the first month
name in the heading) and compute each row's year relative to that start month:
keep current_year as the header year but when assigning a year for a given
month_num compare it to the parsed start_month_num and if month_num <
start_month_num subtract 1 from current_year (use the existing
previous_month_num/month_num rollover logic to maintain consistency).
further feedback from AI agent. this fixes the issues flagged and builds on previous commit
There was a problem hiding this comment.
Actionable comments posted: 3
🧹 Nitpick comments (2)
uk_bin_collection/uk_bin_collection/councils/CumberlandCouncil.py (2)
99-100: Inconsistent (and locale-sensitive) month-to-number conversion.Lines 72 and 74 use
_MONTH_NAMES.index(lines[j]) + 1, but line 100 re-derives the same value viadatetime.strptime(line, "%B").month.strptimeis locale-dependent and adds unnecessary overhead when the index into the already-checked_MONTH_NAMESlist gives an identical, locale-agnostic result.♻️ Proposed fix
- month_num = datetime.strptime(line, "%B").month + month_num = _MONTH_NAMES.index(line) + 1🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@uk_bin_collection/uk_bin_collection/councils/CumberlandCouncil.py` around lines 99 - 100, The code uses datetime.strptime(line, "%B").month to get month_num despite having already validated the token against the _MONTH_NAMES list; replace that locale-sensitive call with a direct, locale-agnostic lookup using _MONTH_NAMES.index(line) + 1 so month_num is computed the same way as elsewhere (consistent with uses like _MONTH_NAMES.index(lines[j]) + 1) and remove the unnecessary strptime import/usage in the CumberlandCouncil parsing logic.
67-68: Redundant.strip()on an already-stripped list element.
linesis built with.strip()inside the list comprehension at line 43, soline.strip()at line 68 is a no-op.♻️ Proposed fix
- if line.strip().startswith("Collection calendar"): + if line.startswith("Collection calendar"):🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@uk_bin_collection/uk_bin_collection/councils/CumberlandCouncil.py` around lines 67 - 68, The loop in the CumberlandCouncil parser redundantly calls line.strip() because lines are already stripped when constructed; update the conditional in the for loop (the enumerate over lines in the method that processes the calendar lines) to use line.startswith("Collection calendar") instead of line.strip().startswith(...). Locate the for i, line in enumerate(lines): block in CumberlandCouncil.py and change the check to rely on the already-stripped line variable.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@uk_bin_collection/uk_bin_collection/councils/CumberlandCouncil.py`:
- Around line 85-89: The current is_same_year logic silently treats unparsed
months as same-year because it returns True when start_month_num or
end_month_num is None; update the code around is_same_year (the variables
start_month_num, end_month_num, and is_same_year) to fail explicitly instead: if
either month number is None, raise a clear parsing/format exception (e.g.,
ValueError with context including the raw heading) or skip the cross-year branch
and raise when cross-year handling would be needed, so that unparseable headings
are surfaced rather than defaulting to same-year.
- Around line 134-144: The try/except block around building date_str and parsing
it (using date_str, collection_date, current_month, current_year, date_format
and appending dict_data to bindata["bins"]) silently swallows ValueError; change
it to surface the failure by either logging the failing date string and bin_type
then re-raising a ValueError or simply re-raising with contextual information
(e.g. include date_str and bin_type), so format regressions in
CumberlandCouncil.py are not ignored; do not keep a bare `except ValueError:
pass`.
- Line 100: Replace the star import in CumberlandCouncil.py with explicit
imports for the used names: import datetime (or from datetime import datetime as
needed) and import date_format from its defining module; then update usages like
month_num = datetime.strptime(line, "%B").month and any other references at the
locations that previously relied on the star import (the occurrences of datetime
and date_format) so they resolve to the explicitly imported symbols and avoid
the Ruff F405 ambiguity.
---
Nitpick comments:
In `@uk_bin_collection/uk_bin_collection/councils/CumberlandCouncil.py`:
- Around line 99-100: The code uses datetime.strptime(line, "%B").month to get
month_num despite having already validated the token against the _MONTH_NAMES
list; replace that locale-sensitive call with a direct, locale-agnostic lookup
using _MONTH_NAMES.index(line) + 1 so month_num is computed the same way as
elsewhere (consistent with uses like _MONTH_NAMES.index(lines[j]) + 1) and
remove the unnecessary strptime import/usage in the CumberlandCouncil parsing
logic.
- Around line 67-68: The loop in the CumberlandCouncil parser redundantly calls
line.strip() because lines are already stripped when constructed; update the
conditional in the for loop (the enumerate over lines in the method that
processes the calendar lines) to use line.startswith("Collection calendar")
instead of line.strip().startswith(...). Locate the for i, line in
enumerate(lines): block in CumberlandCouncil.py and change the check to rely on
the already-stripped line variable.
| is_same_year = ( | ||
| start_month_num is None | ||
| or end_month_num is None | ||
| or end_month_num >= start_month_num | ||
| ) |
There was a problem hiding this comment.
Silent same-year fallback when month names cannot be parsed.
is_same_year evaluates to True whenever start_month_num or end_month_num is None. This means a page whose heading contains the year but not parseable month names will silently produce wrong years for a cross-year calendar instead of surfacing the problem. Per the project's explicit-failure preference, consider raising here (or at least guarding the cross-year path) so a structural change in the heading is caught immediately.
Based on learnings from the UKBinCollectionData project: prefer explicit failures (raise exceptions on unexpected formats) over silent defaults, to ensure format changes are detected early.
🛡️ Proposed fix
+ if start_month_num is None or end_month_num is None:
+ raise ValueError(
+ "Could not determine start/end months from 'Collection calendar' heading. "
+ "Page format may have changed."
+ )
+
is_same_year = (
- start_month_num is None
- or end_month_num is None
- or end_month_num >= start_month_num
+ end_month_num >= start_month_num
)📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| is_same_year = ( | |
| start_month_num is None | |
| or end_month_num is None | |
| or end_month_num >= start_month_num | |
| ) | |
| if start_month_num is None or end_month_num is None: | |
| raise ValueError( | |
| "Could not determine start/end months from 'Collection calendar' heading. " | |
| "Page format may have changed." | |
| ) | |
| is_same_year = ( | |
| end_month_num >= start_month_num | |
| ) |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@uk_bin_collection/uk_bin_collection/councils/CumberlandCouncil.py` around
lines 85 - 89, The current is_same_year logic silently treats unparsed months as
same-year because it returns True when start_month_num or end_month_num is None;
update the code around is_same_year (the variables start_month_num,
end_month_num, and is_same_year) to fail explicitly instead: if either month
number is None, raise a clear parsing/format exception (e.g., ValueError with
context including the raw heading) or skip the cross-year branch and raise when
cross-year handling would be needed, so that unparseable headings are surfaced
rather than defaulting to same-year.
| if year_2026: | ||
| current_year = "2026" if line in ["January", "February"] else "2025" | ||
| if line in _MONTH_NAMES: | ||
| month_num = datetime.strptime(line, "%B").month |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
# First, find the file
fd -t f "CumberlandCouncil.py"Repository: robbrad/UKBinCollectionData
Length of output: 136
🏁 Script executed:
# Check the import statements and lines mentioned
cat -n uk_bin_collection/uk_bin_collection/councils/CumberlandCouncil.py | head -20Repository: robbrad/UKBinCollectionData
Length of output: 853
🏁 Script executed:
# Check around the lines mentioned (100, 136, 140, 153)
sed -n '95,160p' uk_bin_collection/uk_bin_collection/councils/CumberlandCouncil.py | cat -nRepository: robbrad/UKBinCollectionData
Length of output: 2671
🏁 Script executed:
# Check what's exported from common.py
cat -n uk_bin_collection/uk_bin_collection/common.py | head -100Repository: robbrad/UKBinCollectionData
Length of output: 3297
🏁 Script executed:
# Check if datetime is imported in common.py
rg -n "^import datetime|^from datetime" uk_bin_collection/uk_bin_collection/common.pyRepository: robbrad/UKBinCollectionData
Length of output: 113
Replace star import with explicit imports of datetime and date_format.
Lines 100, 136, 140, and 153 use datetime and date_format from the star import. Explicit imports remove the Ruff F405 ambiguity and clarify dependencies.
Also applies to: 136-136, 140-140, 153-153
🧰 Tools
🪛 Ruff (0.15.1)
[error] 100-100: datetime may be undefined, or defined from star imports
(F405)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@uk_bin_collection/uk_bin_collection/councils/CumberlandCouncil.py` at line
100, Replace the star import in CumberlandCouncil.py with explicit imports for
the used names: import datetime (or from datetime import datetime as needed) and
import date_format from its defining module; then update usages like month_num =
datetime.strptime(line, "%B").month and any other references at the locations
that previously relied on the star import (the occurrences of datetime and
date_format) so they resolve to the explicitly imported symbols and avoid the
Ruff F405 ambiguity.
| try: | ||
| date_str = f"{day} {current_month} {current_year}" | ||
| collection_date = datetime.strptime(date_str, "%d %B %Y") | ||
|
|
||
| dict_data = { | ||
| "type": bin_type, | ||
| "collectionDate": collection_date.strftime(date_format), | ||
| } | ||
| bindata["bins"].append(dict_data) | ||
| except ValueError: | ||
| pass |
There was a problem hiding this comment.
Silent except ValueError: pass masks format regressions.
The bare pass means any date-construction failure (including a silently wrong year assignment that produces an out-of-range date) is swallowed with no signal. Per the project's coding preference, this should propagate or at minimum log the failure so format changes are detected immediately rather than producing an empty result set.
Based on learnings from the UKBinCollectionData project: in uk_bin_collection/**/*.py, prefer explicit failures (raise exceptions on unexpected formats) over silent defaults or swallowed errors, and use clear exception types so format changes are detected early.
🛡️ Proposed fix
try:
date_str = f"{day} {current_month} {current_year}"
collection_date = datetime.strptime(date_str, "%d %B %Y")
- except ValueError:
- pass
+ except ValueError as e:
+ raise ValueError(
+ f"Failed to parse collection date '{date_str}': {e}"
+ ) from e📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| try: | |
| date_str = f"{day} {current_month} {current_year}" | |
| collection_date = datetime.strptime(date_str, "%d %B %Y") | |
| dict_data = { | |
| "type": bin_type, | |
| "collectionDate": collection_date.strftime(date_format), | |
| } | |
| bindata["bins"].append(dict_data) | |
| except ValueError: | |
| pass | |
| try: | |
| date_str = f"{day} {current_month} {current_year}" | |
| collection_date = datetime.strptime(date_str, "%d %B %Y") | |
| dict_data = { | |
| "type": bin_type, | |
| "collectionDate": collection_date.strftime(date_format), | |
| } | |
| bindata["bins"].append(dict_data) | |
| except ValueError as e: | |
| raise ValueError( | |
| f"Failed to parse collection date '{date_str}': {e}" | |
| ) from e |
🧰 Tools
🪛 Ruff (0.15.1)
[error] 136-136: datetime may be undefined, or defined from star imports
(F405)
[error] 140-140: date_format may be undefined, or defined from star imports
(F405)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@uk_bin_collection/uk_bin_collection/councils/CumberlandCouncil.py` around
lines 134 - 144, The try/except block around building date_str and parsing it
(using date_str, collection_date, current_month, current_year, date_format and
appending dict_data to bindata["bins"]) silently swallows ValueError; change it
to surface the failure by either logging the failing date string and bin_type
then re-raising a ValueError or simply re-raising with contextual information
(e.g. include date_str and bin_type), so format regressions in
CumberlandCouncil.py are not ignored; do not keep a bare `except ValueError:
pass`.
Admittedly this was Claude but i have checked it and it appears to work as expected (with my limited capability)
Problem
The CumberlandCouncil scraper returns February collections with the
correct year (2026), but assigns year 2025 to every other month
(March through August). Home Assistant therefore only shows the
current month — future collections appear to be in the past.
Root Cause
Two bugs were found:
Bug 1 — Hardcoded year assumption
The code used this logic to assign a year to each month:
This was written for a specific point in time (when the calendar
straddled Dec 2025 and Jan/Feb 2026). It now assigns 2025 to every
month other than January and February, which is wrong for the live
calendar that currently shows February through August 2026.
Bug 2 — Incomplete subtype skip
The code only skipped "Refuse" and "Recycling" subtypes:
The page also has "Paper" and "Green" subtypes which were not
being skipped, causing occasional parsing errors.
Fix
Extract the base year from the page heading:
"Collection calendar: February to August 2026" -> year = 2026
This works correctly for any year, not just 2026.
Track previous month number; if months go backwards
(e.g. December -> January) increment the year.
This handles year-boundary calendars robustly.
Replace the hardcoded subtype list with a general check:
skip any line that is not a digit and not a month name.
This correctly handles all four subtypes on the page:
Refuse, Recycling, Paper, Green.
Tested Against
Live URL: https://www.cumberland.gov.uk/bins-recycling-and-street-cleaning/
waste-collections/bin-collection-schedule/view/100110734613
UPRN: 100110734613
Result: All 20+ collections from February to August 2026 now return
with the correct year 2026. All four subtype variants confirmed working.
Summary by CodeRabbit